
alinaqi/skills/autonomous-testing

Name: skills/autonomous-testing
Author: alinaqi

skills/autonomous-testing/SKILL.md

npx skillsauth add alinaqi/claude-bootstrap skills/autonomous-testing


Autonomous Testing Agent

Overview

An AI-driven testing agent that auto-discovers, generates, executes, evaluates, and fixes tests for any project type. Inspired by the edubites autonomous test runner pattern, generalized for Claude Bootstrap + Maggy.

Pipeline

Source Scan → Discover Gaps → Generate Tests → Execute → Evaluate → Fix Loop → Report

Phase 1: Discover — What Needs Testing?

Auto-detect project type:
  Python    → scan for *.py files, extract public functions/classes
  TypeScript → scan for *.ts/*.tsx files, extract exports
  API       → scan FastAPI/Express routes, extract endpoints + methods
  Web       → scan React/Vue components, extract user flows

Map existing tests:
  Python    → pytest --collect-only
  TypeScript → npx vitest list
  API       → scan tests/ for endpoint coverage

Compute coverage gaps:
  - Functions with 0 tests
  - API endpoints with 0 tests
  - Components with 0 tests
  - Files with <80% branch coverage
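
The Python side of this discovery step could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the helper names `public_functions` and `untested` are hypothetical, and the sketch assumes tests are named `test_<name>` or `test_<name>_...`, a convention the skill does not state.

```python
import ast


def public_functions(source: str) -> set[str]:
    """Return names of top-level functions and classes that do not start with '_'."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and not node.name.startswith("_")
    }


def untested(source: str, test_names: set[str]) -> set[str]:
    """Return public names with no matching test (assumed naming convention)."""
    gaps = set()
    for name in public_functions(source):
        prefix = f"test_{name.lower()}"
        covered = any(t == prefix or t.startswith(prefix + "_") for t in test_names)
        if not covered:
            gaps.add(name)
    return gaps
```

In practice `test_names` would come from the collected test list (for example the output of `pytest --collect-only`), and name matching is only a rough proxy for real coverage data.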

Phase 2: Generate — AI-Written Tests

For each uncovered function/endpoint/component:
  1. Read source code → understand inputs, outputs, edge cases
  2. Generate test scaffold using ~/bin/deepseek --pro
  3. Include: happy path, error cases, edge cases, auth checks
  4. Write to appropriate test directory

Model routing for generation:
  - Simple functions    → ~/bin/deepseek --flash (cheap, fast)
  - Complex logic       → ~/bin/deepseek --pro (thorough)
  - Auth/security tests → ~/bin/deepseek --pro (quality-critical)
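
The routing table above can be expressed as a small lookup. This is a sketch under stated assumptions: `GenerationKind` and `generation_command` are hypothetical names, and how a function is judged "simple" or "complex" is left out because the skill does not say. Only the wrapper paths and flags come from the text.

```python
import os
from enum import Enum


class GenerationKind(Enum):
    SIMPLE = "simple"        # simple functions
    COMPLEX = "complex"      # complex logic
    SECURITY = "security"    # auth/security tests


# Mirrors the routing table in the skill text.
ROUTES = {
    GenerationKind.SIMPLE: ("~/bin/deepseek", "--flash"),
    GenerationKind.COMPLEX: ("~/bin/deepseek", "--pro"),
    GenerationKind.SECURITY: ("~/bin/deepseek", "--pro"),
}


def generation_command(kind: GenerationKind) -> list[str]:
    """Return the argv prefix for the generator, with '~' expanded."""
    binary, flag = ROUTES[kind]
    return [os.path.expanduser(binary), flag]
```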

Phase 3: Execute — Run Everything

# Python
pytest --cov --cov-report=json  # requires pytest-cov; -x dropped so one failure does not stop the run

# TypeScript
npx vitest run --coverage

# E2E (if Playwright detected)
npx playwright test

# Parse results → structured TestRun { pass/fail, coverage, duration, failures[] }
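
One way to build the `TestRun` structure is sketched below. The field names follow the comment above. The `outcomes` mapping (test name to pass/fail) is a hypothetical input, and the sketch assumes the coverage JSON exposes `totals.percent_covered`, as coverage.py's JSON report does to my knowledge.

```python
from dataclasses import dataclass, field


@dataclass
class TestRun:
    passed: int
    failed: int
    coverage: float                      # fraction between 0 and 1
    duration: float                      # seconds
    failures: list[str] = field(default_factory=list)


def build_test_run(coverage_report: dict, outcomes: dict[str, bool], duration: float) -> TestRun:
    """Combine a parsed coverage report and per-test outcomes into one TestRun."""
    failures = [name for name, ok in outcomes.items() if not ok]
    percent = coverage_report["totals"]["percent_covered"]
    return TestRun(
        passed=len(outcomes) - len(failures),
        failed=len(failures),
        coverage=percent / 100.0,
        duration=duration,
        failures=failures,
    )
```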

Phase 4: Evaluate — AI-Powered Assessment

For each test failure:
  1. Capture: test name, error message, stack trace, source code diff
  2. Classify failure:
     - TEST_BUG: test is wrong (outdated expectation, bad mock)
     - CODE_BUG: code is wrong (regression, edge case)
     - ENV_BUG: environment issue (missing dep, config)
  3. AI evaluation: ~/bin/deepseek --pro analyzes failure and classifies

For E2E/web tests:
  - Capture screenshots at failure points
  - ~/bin/gemini --flash evaluates visual state (multimodal)
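
The three failure classes map naturally onto an enum. The sketch below also adds a cheap rule-based pre-check for obvious environment errors before any model call. That pre-check is an assumption of this example, not something the skill describes, and `Failure`, `ENV_MARKERS`, and `precheck` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureClass(Enum):
    TEST_BUG = "TEST_BUG"
    CODE_BUG = "CODE_BUG"
    ENV_BUG = "ENV_BUG"


@dataclass
class Failure:
    test_name: str
    error_message: str
    stack_trace: str


# Exception names that usually point at the environment rather than the code (assumed list).
ENV_MARKERS = ("ModuleNotFoundError", "ImportError", "ConnectionRefusedError")


def precheck(failure: Failure) -> Optional[FailureClass]:
    """Return ENV_BUG for obvious environment errors, else None (defer to the model)."""
    text = f"{failure.error_message}\n{failure.stack_trace}"
    if any(marker in text for marker in ENV_MARKERS):
        return FailureClass.ENV_BUG
    return None
```

A `None` result means the failure still goes to the AI evaluation step for a TEST_BUG versus CODE_BUG decision.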

Phase 5: Fix — Autonomous Repair

TEST_BUG → regenerate test with corrected expectation
CODE_BUG → propose fix with ~/bin/deepseek --pro, apply, re-run
ENV_BUG → report to user with fix instructions

Auto-fix loop:
  attempts = 0
  while test_failures > 0 and attempts < 3:
    for each failure:
      classify → fix → re-run
      if fixed: record as "auto-fixed"
    attempts += 1
  if test_failures > 0: escalate remaining failures to CLAUDE tier
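
The auto-fix loop could be written as runnable Python along these lines. This is a sketch, not the skill's code: `auto_fix_loop` and `try_fix` are hypothetical names, and the classify, fix, and re-run steps are collapsed into one injected callable that returns True when the test passes afterwards.

```python
from typing import Callable, Iterable


def auto_fix_loop(
    failures: Iterable[str],
    try_fix: Callable[[str], bool],
    max_attempts: int = 3,
) -> tuple[list[str], list[str]]:
    """Return (auto_fixed, escalated) test names after at most max_attempts rounds."""
    remaining = list(failures)
    auto_fixed: list[str] = []
    attempts = 0
    while remaining and attempts < max_attempts:
        still_failing = []
        for name in remaining:
            if try_fix(name):
                auto_fixed.append(name)      # recorded as "auto-fixed"
            else:
                still_failing.append(name)
        remaining = still_failing
        attempts += 1
    # Whatever is left after the last round is escalated.
    return auto_fixed, remaining
```

Bounding the loop by a round counter, rather than by the failure count alone, guarantees termination even when a fix never succeeds.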

Phase 6: Report — Structured Output

{
  "project": "my-app",
  "timestamp": "2026-05-16T12:00:00Z",
  "summary": {
    "tests_run": 247,
    "passed": 231,
    "failed": 12,
    "auto_fixed": 8,
    "needs_manual": 4,
    "coverage": 0.83
  },
  "gaps_found": 15,
  "tests_generated": 15,
  "next_actions": [
    "4 manual fixes needed in auth module",
    "Coverage gap: src/payment.py has 0 tests",
    "3 E2E flows untested: signup, checkout, profile-edit"
  ]
}
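
The `summary` block can be derived from raw counts, as in the sketch below. `build_report` is a hypothetical helper, and only `project`, `timestamp`, and `summary` are produced. It assumes `needs_manual = failed - auto_fixed`, which matches the sample (12 - 8 = 4), and `passed = tests_run - failed`, which the sample does not confirm (231 + 12 is less than 247, so other outcomes such as skips may exist).

```python
from datetime import datetime, timezone
from typing import Optional


def build_report(
    project: str,
    tests_run: int,
    failed_names: list[str],
    auto_fixed_names: list[str],
    coverage: float,
    now: Optional[datetime] = None,
) -> dict:
    """Assemble the top of the report; counting rules are assumptions of this sketch."""
    now = now or datetime.now(timezone.utc)
    failed = len(failed_names)
    auto_fixed = len(auto_fixed_names)
    return {
        "project": project,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "summary": {
            "tests_run": tests_run,
            "passed": tests_run - failed,       # assumes no skipped tests
            "failed": failed,
            "auto_fixed": auto_fixed,
            "needs_manual": failed - auto_fixed,
            "coverage": round(coverage, 2),
        },
    }
```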

Integration with Maggy

Maggy Dashboard → Testing tab shows:
  - Coverage trend over time
  - Auto-generated test count
  - Failure classification (TEST_BUG vs CODE_BUG)
  - "Generate tests for gaps" one-click button

Heartbeat job: auto-generate tests weekly for new untested code
Auto-review hook: triggers test generation after significant PR merges

Usage

# Discover test gaps
maggy test discover

# Generate tests for all gaps
maggy test generate --all

# Generate tests for specific module
maggy test generate --module auth

# Run full test cycle (discover → generate → execute → evaluate → fix → report)
maggy test autonomous

# Watch mode — auto-test on file changes
maggy test watch

Configuration

// ~/.claude/testing-config.json
{
  "auto_generate": true,
  "auto_fix": true,
  "max_fix_attempts": 3,
  "min_coverage": 0.8,
  "generate_model": "deepseek-pro",
  "evaluate_model": "gemini-flash",
  "fix_model": "deepseek-pro",
  "exclude_patterns": ["*/migrations/*", "*/node_modules/*"]
}
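
A loader for this file might look like the sketch below. The `DEFAULTS` simply copy the sample values above. Whether the skill itself applies defaults or validates anything is an assumption, and `load_config` and `is_excluded` are hypothetical names. Note that the leading `//` path comment is not valid JSON and would have to be left out of the real file.

```python
import json
from fnmatch import fnmatch

# Copied from the sample configuration above.
DEFAULTS = {
    "auto_generate": True,
    "auto_fix": True,
    "max_fix_attempts": 3,
    "min_coverage": 0.8,
    "generate_model": "deepseek-pro",
    "evaluate_model": "gemini-flash",
    "fix_model": "deepseek-pro",
    "exclude_patterns": ["*/migrations/*", "*/node_modules/*"],
}


def load_config(text: str) -> dict:
    """Merge user JSON over the defaults and sanity-check two numeric fields."""
    config = {**DEFAULTS, **json.loads(text)}
    if not 0.0 <= config["min_coverage"] <= 1.0:
        raise ValueError("min_coverage must be between 0 and 1")
    if config["max_fix_attempts"] < 0:
        raise ValueError("max_fix_attempts must not be negative")
    return config


def is_excluded(path: str, config: dict) -> bool:
    """True if the path matches any exclude pattern (shell-style globbing)."""
    return any(fnmatch(path, pattern) for pattern in config["exclude_patterns"])
```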


Stars: 631
Category: development
Updated: May 17, 2026

Install this skill globally with one command:

npx skillsauth add alinaqi/claude-bootstrap skills/autonomous-testing

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Status | Scanner | Description | %
Clean | Trivy | Container and dependency vulnerability scanner | 95%
Clean | Semgrep | Static code analysis for vulnerabilities | 95%
Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95%
Skipped | Snyk (dep) | Open source security scanning | 50%
Skipped | Socket.dev | Supply chain security analysis | 50%
Skipped | VirusTotal | Multi-engine malware detection | 50%
Skipped | CrowdStrike | Advanced threat intelligence | 50%
Skipped | OSV-Scanner | Open Source Vulnerability database check | 50%
Skipped | OWASP Dep-Check | | 50%

Last scanned: May 17, 2026, 3:57 AM (141.6s, 1 file scanned)


Related Skills

alinaqi/council-review
Category: testing
Verified · Trusted · Community
Multi-model validation council: auto-validate plans, architecture changes, and PRs via validate-plan/review before executing
686 · SKILL.md · Updated Jun 5, 2026

alinaqi/code-review
Category: development
Verified · Trusted · Community
Mandatory code reviews via /code-review before commits and deploys
641 · SKILL.md · Updated Mar 20, 2026

alinaqi/skills/visual-validation
Category: development
Verified · Trusted · Community
Visual Validation: Autonomous Screenshot Verification. Philosophy: Every UI change should be visually verified before it ships. Peekaboo captures pixel-accurate screenshots. The system compares before/after and flags visual regressions. No manual "looks good to me": the machine verifies what the machine built. Autonomous Flow: static/* files modified (detected by auto-review-hook or E2E testkit) ↓ peekaboo image --mode screen → ~/.maggy/visual-verify/after-{ts}.png ↓ Compa
635 · SKILL.md · Updated May 19, 2026

alinaqi/skills/model-routing
Category: tools
Verified · Trusted · Community
Model Routing System. How Routing Decisions Are Made: Every user prompt goes through a 9-tier classification pipeline before any AI model processes it. The system answers three questions:
  1. Which model should handle this? 9-tier cost/complexity classification
  2. Is the classifier itself working? Cascading fallback (qwen3 → kimi → deepseek → cache)
  3. Can we verify the result? Tool-level fallback + auto-evaluation
The Pipeline: User types prompt ↓ UserPromptS
631 · SKILL.md · Updated May 17, 2026


Install manually


# Clone the repo
git clone https://github.com/alinaqi/claude-bootstrap.git

# Copy into Claude Code skills folder (global)
cp -r claude-bootstrap/skills/autonomous-testing ~/.claude/skills/

For the skills folder location, see the official Claude Code Skills documentation.

Repository: alinaqi/claude-bootstrap (631 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT