skills/mock-detection/SKILL.md
Deep mock appropriateness analysis for Test Audit pipeline
Prompt template for deep mock appropriateness analysis using call graph tracing. Designed for a Sonnet sub-agent to detect T1-T4 violations and track violation scope.
This is an internal skill loaded by the orchestrator during the Test Audit pipeline.
| Context | Action |
|---------|--------|
| /test-audit invoked | Orchestrator loads this skill for Stage 2 |
| Test Audit pipeline triggered by hook | Orchestrator loads this skill for Stage 2 |
| Need deep mock analysis | Load directly as prompt template for Sonnet |
| Files flagged by test-classification | Analyze only needs_deep_analysis: true files |
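DO NOT use for:
- Surface classification of test files (use the test-classification skill)
- Orchestration and synthesis of the full audit (use the test-audit skill)

This skill provides the second stage prompt template: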
test-audit (P0.8) orchestrates:
  Stage 1: test-classification (Haiku) → classification YAML
  Stage 2: mock-detection (Sonnet) → violations YAML ← THIS SKILL
  Stage 3: synthesis (Sonnet) → audit report
The orchestrator loads this skill and constructs a 4-part prompt for a general-purpose Sonnet sub-agent.
Analyze flagged test files for T1-T4 violations using mock appropriateness rubric and call graph analysis. Track the full scope of each violation for test effectiveness calculation.
Only analyze files with needs_deep_analysis: true from classification.
Classification output: {classification_yaml_path}
Files to analyze: List of files with needs_deep_analysis: true
Mock appropriateness rubric: See "Mock Appropriateness Rubric" section below
T1-T4 detection patterns: See "T1-T4 Detection Patterns" section below
Violation scope tracking: See "Violation Scope Tracking" section below
Extended stub/fake patterns: See references/stub-patterns.md — Meszaros taxonomy, class hierarchy detection, factory function classification
False positive prevention: See references/false-positive-prevention.md — Two-tier allowlist (Universal Safe / Context-Dependent) and decision tree. Consult BEFORE flagging borderline patterns.
Write violations to: logs/mock-detection-{YYYYMMDD-HHMMSS}.yaml
Write diagnostics to: logs/diagnostics/mock-detection-{YYYYMMDD-HHMMSS}.yaml
Use the schema specified in "Output Schema" section below.
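The two output paths share the `{YYYYMMDD-HHMMSS}` placeholder. A minimal sketch of how an orchestrator might fill it in follows. The helper names (`pad2`, `timestampSlug`, `outputPaths`), the use of UTC, and the choice to reuse one timestamp for both files are assumptions, not something this skill specifies.

```typescript
// Hypothetical helper: zero-pads a number to two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Formats a Date as YYYYMMDD-HHMMSS. Using UTC is an assumption.
function timestampSlug(d: Date): string {
  const date = `${d.getUTCFullYear()}${pad2(d.getUTCMonth() + 1)}${pad2(d.getUTCDate())}`;
  const time = `${pad2(d.getUTCHours())}${pad2(d.getUTCMinutes())}${pad2(d.getUTCSeconds())}`;
  return `${date}-${time}`;
}

// Builds both log paths from a single timestamp (assumed, so the pair is easy to match up).
function outputPaths(d: Date): { violations: string; diagnostics: string } {
  const slug = timestampSlug(d);
  return {
    violations: `logs/mock-detection-${slug}.yaml`,
    diagnostics: `logs/diagnostics/mock-detection-${slug}.yaml`,
  };
}

const examplePaths = outputPaths(new Date(Date.UTC(2025, 0, 5, 9, 3, 7)));
```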
Determine whether mocks are appropriate based on test type:
| Test Type | Expected Mocks (OK) | Inappropriate Mocks (VIOLATION) |
|-----------|---------------------|--------------------------------|
| Unit | External deps (DB, HTTP, fs) to isolate unit | Mocking function/module under test (T1) |
| Integration | Unrelated systems only | Mocking integration boundaries (T3), broken chain (T3+) |
| E2E | Almost never | Any mock breaking end-to-end flow |
Test files commonly contain multiple test types in different describe blocks (e.g., unit tests at top, integration tests at bottom). You MUST evaluate mock appropriateness per describe block/section, not per file. A jest.fn() that is safe in a unit test section is a T3 violation in an integration test section of the same file.
Classification signals (language-agnostic — apply to TypeScript, Python, Java, Go, Ruby, etc.):
- Section or describe-block names containing: integration, e2e, end-to-end, acceptance, system
- Comment banners such as: // INTEGRATION TESTS, # E2E, /* system tests */

If AST integration-mock metadata is available (from npx tsx skills/test-audit/scripts/integration-mock-detector.ts), use it as ground truth for section boundaries and mock locations. Validate AST leads and add any mocks the AST missed.
BINDING: AST classification is final. When the AST script classifies a section as integration or e2e, that classification is NOT subject to LLM override. You MUST evaluate mocks in that section against integration/e2e rules — even if you believe the section is "actually" a unit test. Dismissing an AST T3 lead by re-classifying the section as a different test type is a rule violation. If you believe the section is mislabeled, note it as advisory — but still flag T3 violations against the classified type.
See references/false-positive-prevention.md § "Worked Example: Mixed-Type File" for a concrete demonstration.
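As a rough illustration of per-section classification, the sketch below maps a describe-block title or comment banner to a test type using the keywords listed above. It is not the AST script. `classifySection` is a hypothetical name, and both the grouping of "acceptance" and "system" under `e2e` and the default of `unit` are assumptions.

```typescript
type SectionType = "unit" | "integration" | "e2e";

// Keywords come from the classification signals above; the grouping is assumed.
const E2E_PATTERN = /\b(e2e|end-to-end|acceptance|system)\b/i;
const INTEGRATION_PATTERN = /\bintegration\b/i;

// Classifies one section label (describe title or comment banner).
function classifySection(label: string): SectionType {
  if (E2E_PATTERN.test(label)) return "e2e";
  if (INTEGRATION_PATTERN.test(label)) return "integration";
  return "unit"; // assumed default when no signal is present
}

// A mixed-type file yields a different type per section, so mocks are judged per section.
const sectionLabels = ["parseConfig()", "// INTEGRATION TESTS", "# E2E"];
const sectionTypes = sectionLabels.map(classifySection);
```

Keyword matching of this kind is only a lead: a label such as "file system helpers" would match "system", which is one reason AST metadata takes precedence when it is available.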
A mock is inappropriate when it defeats the purpose of the test:
T1 (Mock SUT) | Severity: Critical | Priority: P0 (False Confidence)
Detection patterns:
- jest.spyOn(ModuleUnderTest, 'functionBeingTested')
- jest.mock('./file-being-tested')
- vi.mock() on the module imported as the test's subject

Call graph check: Trace from the test assertion back to setup. If a mock sits between "action" and "assertion" for the claimed behavior, it's T1.
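A text-level pre-filter for the `jest.mock` / `vi.mock` patterns might look like the sketch below. It only produces leads for the call graph check and does not replace it. `findModuleMocks`, `findT1Leads`, and the idea that the caller already knows the subject module's specifier are all assumptions; `jest.spyOn` is not covered.

```typescript
interface MockLead {
  line: number;   // 1-based line of the mock call
  module: string; // module specifier passed to jest.mock / vi.mock
}

// Finds module-level mocks, one per line (string-literal specifiers only).
function findModuleMocks(source: string): MockLead[] {
  const leads: MockLead[] = [];
  const pattern = /\b(?:jest|vi)\.mock\(\s*['"]([^'"]+)['"]/;
  source.split("\n").forEach((text, index) => {
    const match = pattern.exec(text);
    if (match !== null) leads.push({ line: index + 1, module: match[1] });
  });
  return leads;
}

// A mock of the subject module itself is a T1 lead.
function findT1Leads(source: string, subjectModule: string): MockLead[] {
  return findModuleMocks(source).filter((lead) => lead.module === subjectModule);
}

const sampleTestFile = [
  "import { parse } from './parser';",
  "jest.mock('./parser');",
  "jest.mock('fs');",
].join("\n");

const t1Leads = findT1Leads(sampleTestFile, "./parser");
```

Here the mock of `fs` is ignored because it is an external dependency, while the mock of `./parser` is reported on line 2.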
T2 (Call-only) | Severity: High | Priority: P1 (Incomplete Verification)
Detection patterns:
- expect(fn).toHaveBeenCalled() without result assertion
- expect(fn).toHaveBeenCalledWith(...) without verifying the effect
- verify(mock).someMethod() without outcome check

Call graph check: After the toHaveBeenCalled assertion, is there a result/state assertion for the same operation?
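The call-only check can be approximated on a single test body as sketched below: collect the matcher names and flag the test when every matcher is a call-verification matcher. `isCallOnlyTest` and the matcher list are assumptions. The sketch covers Jest-style `expect(...)` chains only, not the `verify(mock)` style, and it does not confirm that a result assertion targets the same operation.

```typescript
// Jest matchers that verify a call happened rather than its outcome (assumed list).
const CALL_ONLY_MATCHERS = ["toHaveBeenCalled", "toHaveBeenCalledWith", "toHaveBeenCalledTimes"];

// Extracts matcher names that follow an expect(...) call, allowing .not/.resolves/.rejects.
function matchersUsed(testBody: string): string[] {
  const names: string[] = [];
  const pattern = /\)\s*\.(?:(?:not|resolves|rejects)\.)*(to[A-Z]\w*)\s*\(/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(testBody)) !== null) names.push(match[1]);
  return names;
}

// True when the test asserts something, but only that calls happened.
function isCallOnlyTest(testBody: string): boolean {
  const names = matchersUsed(testBody);
  if (names.length === 0) return false;
  return names.every((name) => CALL_ONLY_MATCHERS.indexOf(name) !== -1);
}

const callOnlyBody = "await saveConfig(newConfig);\nexpect(db.save).toHaveBeenCalled();";
const verifiedBody = callOnlyBody + "\nexpect(saved.value).toBe(newConfig.value);";

const callOnlyFlagged = isCallOnlyTest(callOnlyBody);
const verifiedFlagged = isCallOnlyTest(verifiedBody);
```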
T3 (Mock boundary) | Severity: Critical | Priority: P1 (Incomplete Verification)
Detection patterns:
- .integration.* file: jest.mock('node-fetch'), jest.mock('fs'), jest.mock('http')

Call graph check: Does the test claim to verify "integration with X" while mocking X?
T3+ (Broken chain) | Severity: Critical | Priority: P0 (False Confidence)
Detection patterns:
- mockData used where real function output should flow
- mockOrder.id passed into new objects

Class hierarchy signals (see references/stub-patterns.md for full taxonomy):
- Factory functions named buildMock* or createFake*

Call graph check: Trace data flow. If Component A should output to Component B, but the test injects mockAOutput into B, the chain is broken.
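To make the broken chain concrete, the sketch below contrasts a test that hands Component B hand-written data with one that lets Component A's real output flow into B. `createOrder` and `processOrder` borrow their names from the output example later in this document; their bodies, the `Order` shape, and the synchronous signatures are hypothetical.

```typescript
interface Order {
  id: string;
  total: number;
}

// Hypothetical Component A: builds an order from raw input.
function createOrder(input: { items: number[] }): Order {
  const total = input.items.reduce((sum, price) => sum + price, 0);
  return { id: "order-" + input.items.length, total };
}

// Hypothetical Component B: consumes Component A's output.
function processOrder(order: Order): string {
  return `${order.id}:${order.total.toFixed(2)}`;
}

// Broken chain (T3+): B receives hand-written data, so createOrder never runs.
const mockOrderData: Order = { id: "order-2", total: 30 };
const brokenChainResult = processOrder(mockOrderData);

// Intact chain: B receives what A actually produced.
const realOrder = createOrder({ items: [10, 20] });
const intactChainResult = processOrder(realOrder);
```

Both results are equal today, which is exactly why the broken version is dangerous: if `createOrder` later changes its `id` format or miscalculates `total`, only the intact-chain test would notice.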
T4 | Severity: Medium | Priority: P2 (Pattern Issues)
Detection patterns:
- npx jest (or your project test runner) not run

Note: T4 is primarily a process check. Flag for manual review.
Track the full scope of each violation - not just the violation line, but all lines affected by it. This enables accurate test effectiveness calculation.
| Violation Type | Scope Definition |
|----------------|------------------|
| T1 (Mock SUT) | All lines that use the mock: assertions depending on mock, calls using mock return value |
| T2 (Call-only) | The assertion line itself (single line) |
| T3 (Mock boundary) | All lines using the mocked boundary (similar to T1) |
| T3+ (Broken chain) | All lines using the incorrect/mocked data downstream |
// Line 15: Mock setup (violation line)
const mockSpawn = jest.spyOn(child_process, 'spawn')
  .mockReturnValue(mockProcess);
// Lines 20-95: All use mockSpawn results
const proxy = startProxy(); // Line 20 - uses mock
await proxy.waitForReady(); // Line 21 - uses mock
expect(proxy.port).toBe(8080); // Line 25 - assertion on mock
// ... more lines using mock ...
expect(proxy.isRunning()).toBe(true); // Line 95 - still mock
// violation_scope: [15, 95]
// affected_lines: 80
For each flagged file, perform systematic analysis:
1. What does each test claim to verify? Look at:
   - Test names (e.g., it('starts proxy correctly', ...))
2. How does data flow from setup → action → assertion?
3. Where do mocks intercept this flow?
4. Does the mock defeat the test's purpose?
metadata:
  skill: mock-detection
  timestamp: "{ISO-8601}"
  classification_source: logs/test-classification-{YYYYMMDD-HHMMSS}.yaml
  model: sonnet
  files_analyzed: 5

violations:
  - file: tests/proxy.test.ts
    line: 15
    violation_scope: [15, 95]
    affected_lines: 80
    rule: T1
    severity: critical
    priority: P0
    pattern: "jest.spyOn(child_process, 'spawn')"
    code_snippet: |
      const mockSpawn = jest.spyOn(child_process, 'spawn')
        .mockReturnValue(mockProcess);
    reason: |
      Test claims to verify "proxy starts correctly" but mocks spawn().
      This provides false confidence - mock always succeeds.
      Lines 15-95 all use this mock, making them ineffective.
    suggested_fix: |
      Replace mock with real spawn. Use port check to verify proxy started.

  - file: tests/api.integration.ts
    line: 8
    violation_scope: [8, 45]
    affected_lines: 37
    rule: T3
    severity: critical
    priority: P1
    pattern: "jest.mock('node-fetch')"
    code_snippet: |
      jest.mock('node-fetch');
      // ... later in test
      const response = await fetchUserData(userId);
    reason: |
      Integration test should verify real HTTP communication.
      Mocking fetch defeats the purpose of integration testing.
    suggested_fix: |
      Remove jest.mock('node-fetch'). Use test server or MSW.

  - file: tests/workflow.integration.ts
    line: 42
    violation_scope: [42, 78]
    affected_lines: 36
    rule: T3+
    severity: critical
    priority: P0
    pattern: "Broken integration chain"
    code_snippet: |
      const result = await processOrder(mockOrderData);
      // mockOrderData should come from createOrder() output
    reason: |
      Test uses mockOrderData instead of real createOrder() output.
      This breaks the integration chain - no real integration tested.
    suggested_fix: |
      Replace mockOrderData with: const order = await createOrder(input);

  - file: tests/config.test.ts
    line: 42
    violation_scope: [42, 42]
    affected_lines: 1
    rule: T2
    severity: high
    priority: P1
    pattern: "expect(db.save).toHaveBeenCalled()"
    code_snippet: |
      await saveConfig(newConfig);
      expect(db.save).toHaveBeenCalled();
    reason: |
      Verifies db.save was called but not what was saved.
      Call verification without result verification is incomplete.
    suggested_fix: |
      Add result verification: expect(saved.value).toBe(newConfig.value);

totals:
  critical: 3
  high: 1
  medium: 0
  low: 0
  total_affected_lines: 154

file_summaries:
  # For each file, compute affected_lines as the UNION of all violation_scope ranges
  # (merge overlapping/identical ranges). Do NOT sum individual affected_lines values.
  # Example: two violations both scoped to [228, 269] = 42 affected lines, not 84.
  - file: tests/proxy.test.ts
    verification_lines: 95
    affected_lines: 80
    test_effectiveness: 16%
  - file: tests/api.integration.ts
    verification_lines: 55
    affected_lines: 37
    test_effectiveness: 33%
  - file: tests/workflow.integration.ts
    verification_lines: 50
    affected_lines: 36
    test_effectiveness: 28%
  - file: tests/config.test.ts
    verification_lines: 40
    affected_lines: 1
    test_effectiveness: 98%

summary: |
  Analyzed 5 flagged files. Found 4 violations affecting 154 lines.
  3 files below 95% test effectiveness threshold.
  P0 violations (false confidence): proxy.test.ts, workflow.integration.ts
  P1 violations (incomplete): api.integration.ts, config.test.ts
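The union rule stated in the `file_summaries` comment, and the effectiveness percentages in the example, can be sketched as follows. `unionLineCount` counts each range inclusively, which is inferred from the "[228, 269] = 42" comment. The `testEffectiveness` formula is an assumption reverse-engineered from the sample figures (for instance 95 verification lines and 80 affected lines giving 16%); the skill does not state it explicitly.

```typescript
type LineRange = [number, number]; // inclusive [startLine, endLine]

// Counts the lines covered by at least one range (overlaps are counted once).
function unionLineCount(ranges: LineRange[]): number {
  const sorted = ranges.slice().sort((a, b) => a[0] - b[0]);
  let count = 0;
  let coveredUpTo = -Infinity; // last line already counted
  for (const [start, end] of sorted) {
    if (start > coveredUpTo) {
      count += end - start + 1;   // disjoint range: count it whole
      coveredUpTo = end;
    } else if (end > coveredUpTo) {
      count += end - coveredUpTo; // overlapping range: count only the new tail
      coveredUpTo = end;
    }
  }
  return count;
}

// Assumed formula: share of verification lines not affected by any violation, in percent.
function testEffectiveness(verificationLines: number, affectedLines: number): number {
  if (verificationLines <= 0) return 0; // assumed convention for empty files
  return Math.round(((verificationLines - affectedLines) * 100) / verificationLines);
}

const duplicateScopes = unionLineCount([[228, 269], [228, 269]]);
const overlappingScopes = unionLineCount([[1, 5], [4, 8]]);
const proxyEffectiveness = testEffectiveness(95, 80);
```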
Write diagnostic output to logs/diagnostics/mock-detection-{YYYYMMDD-HHMMSS}.yaml:
diagnostic:
  skill: mock-detection
  timestamp: "{ISO-8601}"
  model: sonnet

  execution:
    tool_calls: 35
    files_analyzed: 5
    analysis_depth: "call graph tracing"

  decisions:
    - file: tests/proxy.test.ts
      decision: T1_violation
      call_graph_analysis: |
        Test claims: "proxy starts correctly"
        Action: startProxy() calls child_process.spawn()
        Mock: jest.spyOn intercepts spawn()
        Result: Assertion verifies mock behavior, not real spawn
      confidence: high

    - file: tests/config.test.ts
      decision: T2_violation
      call_graph_analysis: |
        Assertion: toHaveBeenCalled() on db.save
        Missing: No assertion on saved data value
        Scope: Single assertion line (minimal impact)
      confidence: high

  errors: []
P0 (False Confidence): tests that pass but provide no real assurance (T1, T3+).
P1 (Incomplete Verification): tests that run real code but don't fully verify (T2, T3).
P2 (Pattern Issues): style and organization issues (T4).
The orchestrator (P0.8) constructs the full prompt by:
- Invoking Task(subagent_type="general-purpose", model="sonnet", prompt=...)
- Reading the results from logs/mock-detection-{YYYYMMDD-HHMMSS}.yaml

From P0.6 (test-classification):
- needs_deep_analysis: true file list
- verification_lines count per file
- mock_indicators as analysis starting points

To P0.8 (test-audit synthesis):
- affected_lines per file
- test_effectiveness per file

When processing many flagged files (>10), the orchestrator must batch detection to avoid context limits.
IF flagged_file_count > 10:
  Split flagged files into batches of 10-15
  FOR each batch:
    Spawn Sonnet sub-agent with batch file list
    Include verification_lines from classification for each file
    Collect violations YAML for batch
  Merge all batch results into single detection output
ELSE:
  Process all flagged files in single sub-agent call
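The split step of the pseudocode above could be sketched as below. `planBatches` is a hypothetical name, and the default batch size of 10 is one choice within the stated 10-15 range; the final batch may be smaller.

```typescript
// Splits flagged files into batches; 10 or fewer files stay in a single batch.
function planBatches(files: string[], batchSize: number = 10): string[][] {
  if (files.length === 0) return [];
  if (files.length <= 10) return [files.slice()];
  const batches: string[][] = [];
  for (let i = 0; i < files.length; i += batchSize) {
    batches.push(files.slice(i, i + batchSize));
  }
  return batches;
}

// Example: 25 flagged files become batches of 10, 10 and 5.
const flaggedFiles: string[] = [];
for (let i = 0; i < 25; i++) flaggedFiles.push("tests/file" + i + ".test.ts");
const plannedBatches = planBatches(flaggedFiles);
```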
When merging batch results:
- Concatenate violations arrays
- Sum totals across all batches
- Combine file_summaries

For optimal performance, spawn batch sub-agents in parallel:
Task(subagent_type="general-purpose", model="sonnet", prompt=batch1_prompt, run_in_background=true)
Task(subagent_type="general-purpose", model="sonnet", prompt=batch2_prompt, run_in_background=true)
...
Read all outputs after completion, then merge.
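Once every batch has written its YAML, the merge could look like the sketch below, operating on already-parsed results. The interfaces are trimmed-down, hypothetical versions of the output schema. The sketch recounts severity totals from the merged violations instead of adding up each batch's `totals`, and it assumes each file appears in exactly one batch.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface Violation {
  file: string;
  line: number;
  rule: string;
  severity: Severity;
}

interface FileSummary {
  file: string;
  verification_lines: number;
  affected_lines: number;
}

interface BatchResult {
  violations: Violation[];
  file_summaries: FileSummary[];
}

interface MergedResult extends BatchResult {
  totals: Record<Severity, number>;
}

// Concatenates violations and summaries, then recounts totals by severity.
function mergeBatches(batches: BatchResult[]): MergedResult {
  const merged: MergedResult = {
    violations: [],
    file_summaries: [],
    totals: { critical: 0, high: 0, medium: 0, low: 0 },
  };
  for (const batch of batches) {
    for (const violation of batch.violations) {
      merged.violations.push(violation);
      merged.totals[violation.severity] += 1;
    }
    for (const summary of batch.file_summaries) {
      merged.file_summaries.push(summary);
    }
  }
  return merged;
}

const mergedExample = mergeBatches([
  {
    violations: [{ file: "tests/proxy.test.ts", line: 15, rule: "T1", severity: "critical" }],
    file_summaries: [{ file: "tests/proxy.test.ts", verification_lines: 95, affected_lines: 80 }],
  },
  {
    violations: [{ file: "tests/config.test.ts", line: 42, rule: "T2", severity: "high" }],
    file_summaries: [{ file: "tests/config.test.ts", verification_lines: 40, affected_lines: 1 }],
  },
]);
```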
| Document | Purpose |
|----------|---------|
| references/stub-patterns.md | Meszaros test double taxonomy, class hierarchy detection, factory function classification |
| references/false-positive-prevention.md | Two-tier allowlist (Universal Safe / Context-Dependent), decision tree for violation evaluation |
- test-classification (P0.6) - Surface classification (upstream)
- test-audit (P0.8) - Orchestration and synthesis (downstream)
- pipeline-templates (P0.3) - Test Audit pipeline definition