skills/test-audit/SKILL.md
---
name: test-audit
description: Audit test suites for T1-T4 violations using AST analysis, mock detection, and multi-stage synthesis. Invoke when user asks to audit tests, check test quality, find mock violations, review test effectiveness, or inspect test suites for over-mocking. Triggers automatic rewrites when quality gates fail.
user-invocable: true
argument-hint: [path] [--threshold=N]
skills:
  - test-classification
  - mock-detection
  - assertion-patterns
  - component-pattern
---
User-facing entry point for test suite quality auditing. Orchestrates classification, mock detection, and synthesis stages to identify T1-T4 violations and trigger automatic rewrites when required.
Load this skill when the user request matches ANY of these patterns:
| Trigger Pattern | Example User Request |
|-----------------|---------------------|
| Test quality audit | "Audit my tests", "Check test quality", "Review test suite" |
| Mock detection | "Find mock violations", "Check for T1 violations", "Are my tests over-mocked?" |
| Test effectiveness | "How effective are my tests?", "Are my tests real or mocked?" |
| After writing tests | "I just wrote tests for X, can you audit them?" |
| CI/CD integration | "Add test audit to pipeline", "Validate tests before merge" |
DO NOT use for:
- Running tests (use `npx jest` or your project test runner)
- General code review (use the code-review skill)
- Debugging failures (use the issue-debugging skill)

STOP. Before ANY analysis, you MUST acknowledge what this skill requires.
This skill uses a multi-stage pipeline with sub-agents. You are the orchestrator, NOT the executor.
Run Stage 0 AST scripts before any LLM stages:
- `npx tsx skills/test-audit/scripts/verification-counter.ts {target}` → `/tmp/claude/ast-verify-count.json`
- `npx tsx skills/test-audit/scripts/skip-detector.ts {target}` → `/tmp/claude/ast-skip-detect.json`
- `npx tsx skills/test-audit/scripts/data-flow-analyzer.ts {target}` → `/tmp/claude/ast-data-flow.json`

Select mode based on file count and threshold (default 5).

Spawn sub-agents for each applicable stage:

- Classification (Scale mode only): `Task(subagent_type="general-purpose", model="haiku", ...)`
- Detection: `Task(subagent_type="general-purpose", model="sonnet", ...)`
- Synthesis: `Task(subagent_type="general-purpose", model="sonnet", ...)`

Write outputs to logs/:

- `logs/test-classification-{YYYYMMDD-HHMMSS}.yaml` (Scale mode only)
- `logs/mock-detection-{YYYYMMDD-HHMMSS}.yaml`
- `logs/test-audit-{YYYYMMDD-HHMMSS}.yaml`
- `logs/diagnostics/test-audit-{YYYYMMDD-HHMMSS}.yaml`

Follow the orchestration instructions exactly - do not substitute your own judgment.
The pipeline exists for:
If you find yourself thinking "I can just analyze this directly" - STOP. That violates SC1-SC2 in Rules.md.
Before returning to user, verify ALL items:
- `/tmp/claude/ast-*.json` (or graceful degradation logged)
- `logs/test-classification-*.yaml`
- `logs/mock-detection-*.yaml`
- `logs/test-audit-*.yaml`
- `logs/diagnostics/test-audit-*.yaml` (includes mode, threshold, AST status)

If REWRITE_REQUIRED == true, also verify:

- bug-magnet-data context file loaded for component type
- Entries flagged (`safe_for_automation: false`) excluded or marked manual-only

Do NOT return to user until all applicable checklist items are verified.
/test-audit [path] [--threshold=N]
Examples:
- `/test-audit tests/` - Audit all tests in the tests/ directory
- `/test-audit src/__tests__/api.test.ts` - Audit a specific file
- `/test-audit tests/ --threshold=10` - Raise the threshold so Deep mode applies to ≤10 files
- `/test-audit` - Audit tests mentioned in recent context (or prompt for path)

/test-audit tests/
↓
┌─────────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR (Opus) - Main Context │
│ │
│ Stage 0: AST Pre-Processing (deterministic, no LLM) │
│ └─ npx tsx skills/test-audit/scripts/verification-counter.ts {target} │
│ └─ npx tsx skills/test-audit/scripts/skip-detector.ts {target} │
│ └─ npx tsx skills/test-audit/scripts/data-flow-analyzer.ts {target} │
│ └─ Output: /tmp/claude/ast-*.json │
│ │
│ Mode Selection: file_count ≤ threshold → Deep, else → Scale │
│ │
│ ┌─── DEEP MODE (≤5 files) ──────── SCALE MODE (>5 files) ────┐ │
│ │ │ │
│ │ [skip classification] Stage 1: Classification │ │
│ │ └─ Haiku + AST hints │ │
│ │ │ │
│ │ Stage 2: Detection Stage 2: Detection │ │
│ │ └─ Sonnet, ALL files └─ Sonnet, flagged only │ │
│ │ └─ Self-computes metadata └─ Uses classification │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ Stage 3: Synthesis (Sonnet) — unified for both modes │
│ │
│ Step 4: Present summary to user │
│ │
│ Step 5: If REWRITE_REQUIRED → Implement rewrites (Opus) │
│ │
└─────────────────────────────────────────────────────────────────────┘
When this skill is loaded, follow these steps exactly:
IF $ARGUMENTS provided:
  target = $1 (first argument)
  Parse optional flags:
    --threshold=N → override default threshold (default: 5)
ELSE:
  Look for test files in recent conversation context
  IF found: target = that path
  ELSE: Ask user: "Which test directory or file should I audit?"
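The argument handling above can be sketched in TypeScript. This is only an illustration: `parseAuditArgs`, `AuditArgs`, and `DEFAULT_THRESHOLD` are hypothetical names, not part of the skill, and the sketch treats the first non-flag token as the target rather than strictly `$1`.

```typescript
// Hypothetical helper (not part of the skill): parses an argument string
// such as "tests/ --threshold=10" into a target path and a threshold.
interface AuditArgs {
  target: string | null; // null: fall back to conversation context or ask the user
  threshold: number; // defaults to 5, as stated in the skill
}

const DEFAULT_THRESHOLD = 5;

function parseAuditArgs(raw: string): AuditArgs {
  const tokens = raw.trim().split(/\s+/).filter((t) => t.length > 0);
  let target: string | null = null;
  let threshold = DEFAULT_THRESHOLD;

  for (const token of tokens) {
    const value = /^--threshold=(\d+)$/.exec(token)?.[1];
    if (value !== undefined) {
      threshold = Number.parseInt(value, 10);
    } else if (!token.startsWith("--") && target === null) {
      target = token; // first positional token is taken as the target
    }
  }
  return { target, threshold };
}
```

A `null` target corresponds to the ELSE branch: look in recent context, then ask the user.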
This step is BINDING. Do NOT skip it. AST scripts provide deterministic metadata that replaces heuristic estimates. Skipping Stage 0 degrades audit accuracy.
- Timestamp format: `YYYYMMDD-HHMMSS`
- Test file globs: `**/*.test.{ts,tsx,js,jsx}` + `**/*.spec.{ts,tsx,js,jsx}`

npx tsx skills/test-audit/scripts/verification-counter.ts {target} > /tmp/claude/ast-verify-count.json
npx tsx skills/test-audit/scripts/skip-detector.ts {target} > /tmp/claude/ast-skip-detect.json
npx tsx skills/test-audit/scripts/data-flow-analyzer.ts {target} > /tmp/claude/ast-data-flow.json
npx tsx skills/test-audit/scripts/integration-mock-detector.ts {target} > /tmp/claude/ast-integration-mocks.json
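The `{YYYYMMDD-HHMMSS}` placeholder used in the log file names could be produced along these lines. `auditTimestamp` and `logPath` are hypothetical helpers, and using UTC is an assumption; the skill does not say which time zone applies.

```typescript
// Hypothetical helper: formats a Date as YYYYMMDD-HHMMSS.
// Assumption: UTC is used; the skill does not specify a time zone.
function auditTimestamp(date: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const day = `${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}`;
  const time = `${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}`;
  return `${day}-${time}`;
}

// Builds a log path following the naming pattern listed in the skill.
type LogStage = "test-classification" | "mock-detection" | "test-audit";

function logPath(stage: LogStage, timestamp: string): string {
  return `logs/${stage}-${timestamp}.yaml`;
}
```

Generating the timestamp once and reusing it for every stage keeps the logs of a single run easy to correlate.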
AST output schemas (for prompt injection into LLM stages):
// verify-count output (per file)
{ "file": "tests/user.test.ts", "metrics": { "total_lines": 156, "test_logic_lines": 98, "assertion_lines": 42, "setup_lines": 56, "effectiveness_percent": 42.86, "framework_detected": "jest" } }
// skip-detect output (per file)
{ "file": "tests/user.test.ts", "markers": [{ "type": "test.skip", "line": 42, "test_name": "should handle edge case", "severity": "medium", "rule": "T4" }], "summary": { "skip_count": 1, "only_count": 0, "todo_count": 0 } }
// ast-analyze output (per file)
{ "file": "tests/workflow.integration.ts", "violations": [{ "line": 42, "type": "T3+", "confidence": "high", "variable": "orderData", "source": "object_literal", "message": "Variable 'orderData' is manually constructed", "suggestion": "Replace with factory function or upstream function output" }] }
// integration-mocks output (per file)
{ "file": "tests/error-handler.test.ts", "sections": [{ "name": "Error Handler Integration", "type": "integration", "signal": "keyword_in_name", "line_start": 559, "line_end": 628 }], "leads": [{ "line": 562, "type": "T3", "confidence": "high", "mock_pattern": "jest.fn().mockImplementation()", "enclosing_block": "Error Handler Integration", "block_type": "integration", "message": "Mock call in integration test block", "suggestion": "Replace mock with actual implementation" }], "summary": { "sections_found": 1, "integration_sections": 1, "e2e_sections": 0, "leads_count": 1, "mock_calls_in_integration": 1, "mock_calls_in_e2e": 0 } }
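For readers consuming the verify-count output, the sketch below types the sample shape and recomputes `effectiveness_percent`. The formula (assertion lines divided by test logic lines) is an assumption inferred from the sample numbers, 42 / 98 ≈ 42.86%; the scripts themselves may compute it differently.

```typescript
// Shape copied from the verify-count sample in the skill.
interface VerifyCountMetrics {
  total_lines: number;
  test_logic_lines: number;
  assertion_lines: number;
  setup_lines: number;
  effectiveness_percent: number;
  framework_detected: string;
}

// Assumption: effectiveness = assertion_lines / test_logic_lines,
// expressed as a percentage rounded to two decimals.
function effectivenessPercent(
  m: Pick<VerifyCountMetrics, "assertion_lines" | "test_logic_lines">
): number {
  if (m.test_logic_lines === 0) return 0; // avoid division by zero for empty files
  return Math.round((m.assertion_lines / m.test_logic_lines) * 10000) / 100;
}

const sample: VerifyCountMetrics = {
  total_lines: 156,
  test_logic_lines: 98,
  assertion_lines: 42,
  setup_lines: 56,
  effectiveness_percent: 42.86,
  framework_detected: "jest",
};
```

Under this assumption, `effectivenessPercent(sample)` reproduces the `effectiveness_percent` value in the sample.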
threshold = $THRESHOLD_FLAG OR 5 (default)
file_count = count of test files in target
IF file_count <= threshold:
  IF file_count > 25:
    mode = "scale"
    WARN "Deep mode safety cap exceeded (>25 files). Falling back to Scale mode."
  ELSE:
    mode = "deep"
ELSE:
  mode = "scale"
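The mode selection rule above translates directly into a small function. `selectMode`, `ModeDecision`, and `DEEP_MODE_SAFETY_CAP` are illustrative names, not identifiers from the skill.

```typescript
type Mode = "deep" | "scale";

interface ModeDecision {
  mode: Mode;
  warning?: string; // set only when the safety cap forces Scale mode
}

// The 25-file cap comes from the pseudocode above; the constant name is hypothetical.
const DEEP_MODE_SAFETY_CAP = 25;

function selectMode(fileCount: number, threshold: number = 5): ModeDecision {
  if (fileCount <= threshold) {
    if (fileCount > DEEP_MODE_SAFETY_CAP) {
      return {
        mode: "scale",
        warning: "Deep mode safety cap exceeded (>25 files). Falling back to Scale mode.",
      };
    }
    return { mode: "deep" };
  }
  return { mode: "scale" };
}
```

The safety cap only matters when `--threshold` is set above 25; with the default of 5 the inner check never fires.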
Display mode selection to user:
## Test Audit: {mode} Mode
**Target:** {target}
**Files:** {file_count}
**Threshold:** {threshold}
**Mode:** {mode} ({rationale})
Stage 0 (AST): {status — success/partial/failed}
verify-count: {ok/failed}
skip-detect: {ok/failed}
ast-analyze: {ok/failed}
Proceeding with {mode} mode pipeline...
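One way to derive the `success/partial/failed` status shown in the template is to fold the per-script results. This is a guess at the intended semantics (all ok, some ok, none ok); `stage0Status` and its types are hypothetical.

```typescript
type ScriptStatus = "ok" | "failed";
type Stage0Status = "success" | "partial" | "failed";

// Assumed semantics: every script ok → success, none ok → failed, otherwise partial.
function stage0Status(results: Record<string, ScriptStatus>): Stage0Status {
  const statuses = Object.values(results);
  const okCount = statuses.filter((s) => s === "ok").length;
  if (statuses.length > 0 && okCount === statuses.length) return "success";
  if (okCount === 0) return "failed";
  return "partial";
}
```

A `partial` result would correspond to the graceful-degradation case, where the pipeline continues with whatever AST metadata is available.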
Skip this step entirely in Deep mode. In Deep mode, detection (Step 5) self-computes classification metadata using AST output.
- test-classification skill (loaded via frontmatter dependency)

Batching check:

IF file_count > 20:
  Split files into batches of 20-25
  FOR each batch IN PARALLEL:
    Construct 4-part prompt with batch file list
    INCLUDE AST hints in CONTEXT (verify-count + skip-detect per file)
    Task(subagent_type="general-purpose", model="haiku",
         prompt=batch_prompt, run_in_background=true)
  Read all batch outputs
  Merge into single classification YAML
ELSE:
  Construct 4-part prompt using the skill's template
  INCLUDE AST hints in CONTEXT (verify-count + skip-detect per file)
  Task(subagent_type="general-purpose", model="haiku", prompt=...)
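The "split files into batches" step can be sketched as a balanced splitter that caps each batch at a maximum size. The skill does not say how batch boundaries are chosen, so the balancing strategy and the name `splitIntoBatches` are assumptions.

```typescript
// Hypothetical helper: splits items into the fewest batches that respect
// maxBatchSize, with batch sizes differing by at most one.
function splitIntoBatches<T>(items: readonly T[], maxBatchSize: number): T[][] {
  if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1) {
    throw new RangeError("maxBatchSize must be a positive integer");
  }
  const batchCount = Math.ceil(items.length / maxBatchSize);
  const batches: T[][] = [];
  let start = 0;
  for (let i = 0; i < batchCount; i++) {
    // Divide what is left evenly over the remaining batches.
    const size = Math.ceil((items.length - start) / (batchCount - i));
    batches.push(items.slice(start, start + size));
    start += size;
  }
  return batches;
}
```

With a cap of 25, 45 files yield batches of 23 and 22. File counts just above the cap (for example 26) necessarily produce batches smaller than 20. The same helper would serve the detection stage with a cap of 15.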
AST hints for classification CONTEXT:
The following AST-computed metadata is available for each file.
Use this to improve classification accuracy — these are deterministic,
not heuristic.
{for each file in target}:
  file: {path}
  ast_verification_lines: {metrics.test_logic_lines}
  ast_assertion_lines: {metrics.assertion_lines}
  ast_skip_markers: {markers array or "none"}
  ast_data_flow_violations: {violations array or "none"}
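Filling the per-file hint template above might look like the following. `AstFileHints` and `renderAstHints` are hypothetical, and how arrays are serialized (here, joined with `; `) is an assumption, since the template only says "array or none".

```typescript
// Hypothetical per-file summary assembled from the Stage 0 JSON outputs.
interface AstFileHints {
  file: string;
  testLogicLines: number;
  assertionLines: number;
  skipMarkers: string[];
  dataFlowViolations: string[];
}

// Renders one file's hints in the field order used by the template.
function renderAstHints(h: AstFileHints): string {
  const listOrNone = (items: string[]): string =>
    items.length > 0 ? items.join("; ") : "none";
  return [
    `file: ${h.file}`,
    `ast_verification_lines: ${h.testLogicLines}`,
    `ast_assertion_lines: ${h.assertionLines}`,
    `ast_skip_markers: ${listOrNone(h.skipMarkers)}`,
    `ast_data_flow_violations: ${listOrNone(h.dataFlowViolations)}`,
  ].join("\n");
}
```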
- `logs/test-classification-{YYYYMMDD-HHMMSS}.yaml`
- `files` array with classification data

Behavior differs by mode:
In Deep mode, ALL files are analyzed (no classification filtering). The detection agent self-computes classification metadata from AST output.
- mock-detection skill (loaded via frontmatter dependency)

Batching check (deep mode):

IF file_count > 10:
  Split files into batches of 10-15
  FOR each batch:
    Include full AST metadata per file
    Task(subagent_type="general-purpose", model="sonnet",
         prompt=deep_mode_batch_prompt, run_in_background=true)
  Read all batch outputs
  Merge into single detection YAML
ELSE:
  Task(subagent_type="general-purpose", model="sonnet",
       prompt=deep_mode_prompt)
In Scale mode, only files flagged by classification are analyzed.
- mock-detection skill (loaded via frontmatter dependency)
- `needs_deep_analysis: true` from classification output

Batching check (scale mode):

IF flagged_file_count > 10:
  Split flagged files into batches of 10-15
  FOR each batch:
    Include verification_lines from classification for each file
    Include AST metadata (data-flow violations, skip markers) per file
    Task(subagent_type="general-purpose", model="sonnet",
         prompt=batch_prompt, run_in_background=true)
  Read all batch outputs
  Merge into single detection YAML
ELSE:
  Construct 4-part prompt using the skill's template
  Include AST metadata in CONTEXT
  Task(subagent_type="general-purpose", model="sonnet", prompt=...)
- `logs/mock-detection-{YYYYMMDD-HHMMSS}.yaml`
- `violations` array and `file_summaries`

Task(
  subagent_type="general-purpose",
  model="sonnet",
  prompt="[synthesis 4-part prompt]"
)

- `logs/test-audit-{YYYYMMDD-HHMMSS}.yaml`
- `directive.REWRITE_REQUIRED` field

Display audit summary to user before any rewrites:
## Test Audit Complete ({mode} Mode)
**Target:** {target}
**Files audited:** {total_files}
**Files analyzed:** {files_analyzed} (deep: all, scale: flagged only)
**Overall test effectiveness:** {percentage}%
### Stage 0 (AST)
- Verification lines: AST-precise (not heuristic)
- Skip markers (T4): {count} found
- Data flow leads (T3+): {count} found
### Violations by Priority
- P0 (False confidence): {count}
- P1 (Incomplete verification): {count}
- P2 (Pattern issues): {count}
### REWRITE_REQUIRED: {true/false}
Gate triggered: {gate description}
[If true] Proceeding with automatic rewrites...
[If false] No automatic rewrites needed. See recommendations below.
Apply two-gate logic from audit report:
Gate 1 (Impact):
  IF any P0 violations exist:
    REWRITE_REQUIRED = true
    gate_triggered = "Gate 1: Impact (P0 violations - false confidence)"
Gate 2 (Threshold):
  ELSE IF P1 violations exist:
    IF any file has test_effectiveness < 95%:
      REWRITE_REQUIRED = true
      gate_triggered = "Gate 2: Threshold (P1 + effectiveness < 95%)"
    ELSE:
      REWRITE_REQUIRED = false
      status = "Advisory only (P1 above 95% threshold)"
Advisory:
  ELSE (P2 only):
    REWRITE_REQUIRED = false
    status = "Advisory only (P2 pattern issues)"
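The two-gate logic above can be expressed as a pure function, which makes the decision easy to check in isolation. The type and function names (`ViolationCounts`, `GateResult`, `evaluateGates`) are illustrative; only the strings and the 95% threshold come from the pseudocode.

```typescript
interface ViolationCounts {
  p0: number;
  p1: number;
  p2: number;
}

interface FileEffectiveness {
  file: string;
  test_effectiveness: number; // percentage, 0-100
}

interface GateResult {
  REWRITE_REQUIRED: boolean;
  gate_triggered: string | null;
  status: string | null;
}

function evaluateGates(counts: ViolationCounts, files: FileEffectiveness[]): GateResult {
  // Gate 1 (Impact): any P0 violation forces a rewrite.
  if (counts.p0 > 0) {
    return {
      REWRITE_REQUIRED: true,
      gate_triggered: "Gate 1: Impact (P0 violations - false confidence)",
      status: null,
    };
  }
  // Gate 2 (Threshold): P1 violations matter only if some file is below 95%.
  if (counts.p1 > 0) {
    if (files.some((f) => f.test_effectiveness < 95)) {
      return {
        REWRITE_REQUIRED: true,
        gate_triggered: "Gate 2: Threshold (P1 + effectiveness < 95%)",
        status: null,
      };
    }
    return {
      REWRITE_REQUIRED: false,
      gate_triggered: null,
      status: "Advisory only (P1 above 95% threshold)",
    };
  }
  // Advisory: the pseudocode's final ELSE, which this sketch also applies when there are no violations.
  return {
    REWRITE_REQUIRED: false,
    gate_triggered: null,
    status: "Advisory only (P2 pattern issues)",
  };
}
```

Note that Gate 2 fires if any single file is below 95%, not when the aggregate effectiveness is.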
IF REWRITE_REQUIRED == true:
  Read `references/rewrite-instructions.md` and follow the procedure
  for each file in directive.files_to_rewrite (ordered by priority, then effectiveness).
  Uses: assertion-patterns, component-patterns, bug-magnet-data skills.
ELSE:
  Display recommendations without auto-rewrite
Read references/prompts/deep-mode-detection.md and use as the Task() prompt for the Sonnet detection sub-agent in Deep mode. Inject per-file AST metadata into the prompt's CONTEXT placeholders (verification_lines, skip_markers, data_flow_leads, integration_mock_leads from Stage 0 output).
Read references/prompts/synthesis.md and use as the Task() prompt for the Sonnet synthesis sub-agent. Inject the following into the prompt's CONTEXT placeholders:
- `{deep or scale}` → current mode
- `{classification_yaml_path}` → classification log path (Scale) or "N/A" (Deep)
- `{detection_yaml_path}` → detection log path
- `{skip_detect_json}` → AST skip-detect output
- `{verify_count_json}` → AST verify-count output

Full definitions: references/priority-classification.md
Full schema with example: references/schemas/audit-output.yaml
Key fields the orchestrator validates after synthesis:
- `directive.REWRITE_REQUIRED`: boolean, drives Step 9
- `directive.gate_triggered`: which gate fired
- `directive.files_to_rewrite`: ordered list for rewrite step
- `audit.file_analysis[].test_effectiveness`: per-file percentage
- `audit.overview.overall_effectiveness`: aggregate metric

Write diagnostic output to `logs/diagnostics/test-audit-{YYYYMMDD-HHMMSS}.yaml`.
Schema: references/schemas/diagnostic-output.yaml. Includes mode selection, Stage 0 AST status, gate evaluation, and per-file decisions with verification_lines_source: ast | heuristic.
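Validating the `directive` fields listed above, once the synthesis YAML has been parsed into a plain object, could be sketched as a type guard. The field types beyond `REWRITE_REQUIRED` being a boolean are assumptions; the authoritative schema is `references/schemas/audit-output.yaml`, which is not reproduced here.

```typescript
// Assumed shape: only REWRITE_REQUIRED's boolean type is stated in the skill.
interface Directive {
  REWRITE_REQUIRED: boolean;
  gate_triggered: string | null; // assumption: a string, or null when no gate fired
  files_to_rewrite: unknown[]; // assumption: an array; element shape not specified here
}

// Hypothetical type guard for an already-parsed directive object.
function isDirective(value: unknown): value is Directive {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const gate = v["gate_triggered"];
  return (
    typeof v["REWRITE_REQUIRED"] === "boolean" &&
    (typeof gate === "string" || gate === null) &&
    Array.isArray(v["files_to_rewrite"])
  );
}
```

Rejecting a malformed directive before Step 9 avoids starting rewrites based on an incomplete synthesis output.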
This skill can be triggered by:
- Direct invocation: `/test-audit [path]`
- Work on `*.test.*` files suggests the Test Audit pipeline

Both paths use the same orchestration flow.
All AST scripts live in skills/test-audit/scripts/ and are invoked directly via npx tsx:
| Recipe | Script | Purpose |
|--------|--------|---------|
| npx tsx skills/test-audit/scripts/verification-counter.ts | verification-counter.ts | Precise line counting (replaces heuristic) |
| npx tsx skills/test-audit/scripts/skip-detector.ts | skip-detector.ts | T4 skip/only/todo marker detection |
| npx tsx skills/test-audit/scripts/data-flow-analyzer.ts | data-flow-analyzer.ts | T3+ broken chain detection via data flow tracing |
Scripts use ts-morph for AST parsing, run via npx tsx, and output JSON to stdout. Dependencies are in skills/test-audit/scripts/package.json.
See references/known-limitations.md for full details including resolved limitations history.
Active limitations: T3+ single-file scope (~90% coverage), manual stub detection gaps (mitigated by Deep mode + extended patterns), context limits at scale (mitigated by batching).
| File | Purpose |
|------|---------|
| references/prompts/deep-mode-detection.md | 4-part prompt for Deep mode detection sub-agent |
| references/prompts/synthesis.md | 4-part prompt for synthesis sub-agent |
| references/schemas/audit-output.yaml | Output schema with example for audit report |
| references/schemas/diagnostic-output.yaml | Diagnostic output schema |
| references/priority-classification.md | P0/P1/P2 definitions with T-rule impact tables |
| references/known-limitations.md | Active and resolved limitations |
| references/rewrite-instructions.md | Step 9 rewrite procedure with bug-magnet-data integration |
- test-classification (P0.6) - Classification prompt template
- mock-detection (P0.7) - Detection prompt template + references/stub-patterns.md, references/false-positive-prevention.md
- pipeline-templates (P0.3) - Test Audit pipeline definition
- subagent-prompting (P0.1) - 4-part template reference
- bug-magnet-data (P4.2) - Curated edge case test data