skills/testing/testing-preferred-patterns/SKILL.md
Identify and fix testing mistakes: flaky, brittle, over-mocked tests.
npx skillsauth add notque/claude-code-toolkit testing-preferred-patternsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill identifies and fixes common testing mistakes across unit, integration, and E2E test suites. Tests should verify behavior, be reliable, run fast, and fail for the right reasons.
Scope: This skill focuses on improving test quality and reliability. It complements test-driven-development by addressing what goes wrong with tests, complementing how to write them correctly from scratch.
Out of scope: Writing new tests from scratch (use test-driven-development), fixing fundamental architectural issues (use systematic-refactoring), or profiling test performance with external tools.
| Signal | Load These Files | Why |
|---|---|---|
| implementation patterns | preferred-pattern-catalog.md | Loads detailed guidance from preferred-pattern-catalog.md. |
| tasks related to this reference | blind-spot-taxonomy.md | Loads detailed guidance from blind-spot-taxonomy.md. |
| errors, error handling | error-handling.md | Loads detailed guidance from error-handling.md. |
| fixing review feedback | fix-strategies.md | Loads detailed guidance from fix-strategies.md. |
| tests | load-test-scenarios.md | Loads detailed guidance from load-test-scenarios.md. |
| tasks related to this reference | quality-catalog.md | Loads detailed guidance from quality-catalog.md. |
| tasks related to this reference | quick-reference.md | Loads detailed guidance from quick-reference.md. |
Goal: Identify quality issues present in the target test code.
Step 1: Locate test files
Use Grep/Glob to find test files in the relevant area. If user pointed to specific files, start there. Common patterns:
*_test.gotest_*.py or *_test.py*.test.ts, *.spec.ts, *.test.js, *.spec.jsStep 2: Read CLAUDE.md
Check for project-specific testing conventions before flagging quality issues. Some projects intentionally deviate from general best practices. This prevents false positives based on organizational standards.
Step 3: Classify quality issues
For each test file, scan for these 10 categories (detailed examples in references/preferred-pattern-catalog.md):
| # | Pattern to Fix | Detection Signal |
|---|-------------|-----------------|
| 1 | Testing implementation details | Asserts on private fields, internal regex, spy on private methods |
| 2 | Over-mocking / brittle selectors | Mock setup > 50% of test code, CSS nth-child selectors |
| 3 | Order-dependent tests | Shared mutable state, class-level variables, numbered test names |
| 4 | Incomplete assertions | != nil, > 0, toBeTruthy(), no value checks |
| 5 | Over-specification | Exact timestamps, hardcoded IDs, asserting every default field |
| 6 | Ignored failures | @skip, .skip, xit, empty catch blocks, _ = err |
| 7 | Poor naming | testFunc2, test_new, it('works'), it('handles case') |
| 8 | Missing edge cases | Only happy path, no empty/null/boundary/error tests |
| 9 | Slow test suites | Full DB reset per test, no parallelization, no fixture sharing |
| 10 | Flaky tests | sleep(), time.Sleep(), setTimeout(), unsynchronized goroutines |
Step 4: Document findings
## Pattern Quality Report
### [File:Line] - [Pattern Name]
- **Severity**: HIGH / MEDIUM / LOW
- **Issue**: [What is wrong]
- **Impact**: [Flaky / slow / false-confidence / maintenance burden]
Gate: At least one quality issue identified with file:line reference. Proceed only when gate passes.
Goal: Rank findings by impact to fix the most damaging patterns first.
Priority order:
Constraint: Fix one pattern at a time. Mechanical bulk fixes (applying the same pattern to 50 tests without running them) miss context-specific nuances and cause regressions. Fix one, verify it works, then move to the next.
Constraint: Preserve test intent. When fixing quality issues, maintain what the test was originally trying to verify. Preserve the original test coverage scope.
Constraint: Prevent over-engineering. Fix the specific quality issue identified; make targeted fixes to the specific failure mode or delete tests and write new ones from scratch. Institutional knowledge lives in the existing tests.
Gate: Findings ranked. User agrees on scope of fixes. Proceed only when gate passes.
Goal: Apply targeted fixes to identified quality issues.
Step 1: For each quality issue (highest priority first):
ISSUE: [Name]
Location: [file:line]
Issue: [What is wrong]
Impact: [Flaky/slow/false-confidence/maintenance burden]
Current:
[problematic code snippet]
Fixed:
[improved code snippet]
Priority: [HIGH/MEDIUM/LOW]
Step 2: Apply fix
Constraint: Show real examples. Point to actual code when identifying quality issues, not abstract descriptions. Check for rationalization — if a test breaks during refactoring, that test was relying on buggy behavior. Investigate and fix the root cause, investigate and fix the root cause.
Constraint: Guide toward behavior testing. Always recommend testing observable behavior, not implementation internals. For example:
_getUser() → FIX: Test what happens when a user exists or doesn't existChange only what is needed to fix the failure mode. Consult references/fix-strategies.md for language-specific patterns.
Step 3: Run tests after each fix
Gate: Each fix verified individually. Tests pass after each change.
Goal: Confirm all fixes work together and suite is healthier.
Step 1: Run full test suite — all pass
Step 2: Verify previously-flaky tests are now deterministic (run 3x if applicable)
go test -count=3 -run TestFixed ./...pytest --count=3 tests/test_fixed.pyStep 3: Confirm no test was accidentally deleted or skipped
@skip or .skip annotations introducedStep 4: Summary report
## Fix Summary
Anti-patterns fixed: [count]
Files modified: [list]
Tests affected: [count]
Suite status: all passing / [details]
Remaining issues: [any deferred items]
Gate: Full suite passes. All fixes verified. Summary delivered.
See references/quality-catalog.md for detailed descriptions of all 10 failure modes (signals, why each is problematic, and fixes).
See references/error-handling.md for handling ambiguous patterns, fixes that change test behavior, and suites with hundreds of quality issues.
See references/quick-reference.md for the quick reference table, red flags during review, and TDD relationship notes.
${CLAUDE_SKILL_DIR}/references/quality-catalog.md: Detailed descriptions of all 10 failure modes${CLAUDE_SKILL_DIR}/references/error-handling.md: Ambiguous patterns and large-scale cleanup guidance${CLAUDE_SKILL_DIR}/references/quick-reference.md: Quick reference table, red flags, TDD relationship${CLAUDE_SKILL_DIR}/references/preferred-pattern-catalog.md: Detailed code examples for all 10 failure modes (Go, Python, JavaScript)${CLAUDE_SKILL_DIR}/references/fix-strategies.md: Language-specific fix patterns and tooling${CLAUDE_SKILL_DIR}/references/blind-spot-taxonomy.md: 6-category taxonomy of what high-coverage test suites commonly miss (concurrency, state, boundaries, security, integration, resilience)${CLAUDE_SKILL_DIR}/references/load-test-scenarios.md: 6 load test scenario types (smoke, load, stress, spike, soak, breakpoint) with configurations and critical endpoint prioritiesdocumentation
Document translation: quick/normal/refined modes with chunked parallel subagents and glossary support.
development
AI image generation: Gemini and Nano Banana backends; single/series/batch workflows with prompt-to-disk.
testing
Unified voice content generation pipeline with mandatory validation and joy-check. 13-phase pipeline: LOAD, GROUND, STATS-CHECKPOINT, GENERATE, HOOK-GATE, VALIDATE, REFINE, VARIETY-GATE, JOY-CHECK, ANTI-AI, CLOSE-GATE, OUTPUT, CLEANUP. Use when writing articles, blog posts, or any content that uses a voice profile. Use for "write article", "blog post", "write in voice", "generate content", "draft article", "write about".
documentation
Critique-and-rewrite loop for voice fidelity validation.