skills/atdd-mutate/SKILL.md
Use to add a third validation layer to the ATDD workflow — after acceptance tests verify WHAT and unit tests verify HOW, mutation testing verifies the tests actually catch bugs. Triggers — "/mutate", "/kill-mutants", "run mutation testing", "mutate my code", "kill mutants", "check test quality", "find surviving mutants", "run stryker", "run mutmut", "run pitest", "are my tests catching bugs".
Add a third validation layer to the ATDD two-stream testing approach. Acceptance tests verify WHAT, unit tests verify HOW, mutation testing verifies that the tests actually catch bugs.
Mutation testing introduces deliberate bugs (mutants) into source code, then runs the test suite. If tests fail, the mutant is killed (good). If tests pass despite the bug, the mutant survives (test gap found).
Source code → introduce mutation → run tests
├── tests FAIL → mutant killed ✓
└── tests PASS → mutant survived ✗
A project with 100% code coverage can still have a 60% mutation score — meaning 40% of introduced bugs go undetected by the test suite.
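The gap between coverage and mutation score can be seen in a few lines. The sketch below is illustrative only: `is_adult`, its hand-written mutant, and the two suites are hypothetical names, not part of this skill. Both suites execute every line of the function, yet only the one with a boundary case kills the `>=` → `>` mutant.

```python
def is_adult(age: int) -> bool:
    """Original implementation."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Mutant: `>=` replaced with `>`."""
    return age > 18


def weak_suite(fn) -> bool:
    """Runs every line of fn (100% line coverage) but skips the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case that distinguishes `>=` from `>`."""
    return weak_suite(fn) and fn(18) is True


def mutant_status(suite, original, mutant) -> str:
    """Report whether a suite kills the mutant; requires a green baseline."""
    if not suite(original):
        raise AssertionError("baseline must be green before mutating")
    return "survived" if suite(mutant) else "killed"
```

Calling `mutant_status(weak_suite, is_adult, is_adult_mutant)` reports a survivor, while the same call with `strong_suite` reports a kill.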
Run mutation testing after both test streams are green:
This is Phase 6 in the team-based ATDD workflow, or a standalone quality check at any point during development.
The preferred approach is to build a custom mutation tool for the project. This follows the methodology Uncle Bob developed for empire-2025 — a project-specific tool that walks the AST/source tree, applies one mutation at a time, runs targeted tests, and reports survivors.
+ → -, true → false,
>= → >) plus matching logic that walks the AST/form tree
dae_mutmap.py to mutate only changed functions and to update
the manifest after the run. See the Differential Mutation Testing section.

| Category | Examples |
|----------|----------|
| Arithmetic | + ↔ -, * ↔ /, ++ ↔ -- |
| Comparison | > ↔ >=, < ↔ <= |
| Equality | == ↔ != |
| Boolean | true ↔ false, && ↔ \|\| |
| Conditional | negate conditions, swap if/if-not |
| Constant | 0 ↔ 1, "" ↔ "mutant" |
| Return value | return true → return false |
| Void method | remove method call entirely |
For the full architecture and detailed reference, see
references/frameworks.md.
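As a rough illustration of the "walk the tree, apply one mutation at a time" idea, here is a minimal sketch for Python source using the standard `ast` module (Python 3.9+ for `ast.unparse`). It is not the tool described in the references: the class name, the `mutants` generator, and the two operator tables are assumptions, and they cover only a few rows of the table above.

```python
import ast

# Assumed operator tables: a small subset of the categories listed above.
BINOP_SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add,
               ast.Mult: ast.Div, ast.Div: ast.Mult}
COMPARE_SWAPS = {ast.Gt: ast.GtE, ast.GtE: ast.Gt,
                 ast.Lt: ast.LtE, ast.LtE: ast.Lt,
                 ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}


class SingleMutation(ast.NodeTransformer):
    """Mutates only the site whose visit index equals `target`."""

    def __init__(self, target: int):
        self.target = target
        self.seen = 0  # mutation sites visited so far

    def _hit(self) -> bool:
        hit = self.seen == self.target
        self.seen += 1
        return hit

    def visit_BinOp(self, node):
        self.generic_visit(node)  # visit nested expressions first
        swap = BINOP_SWAPS.get(type(node.op))
        if swap and self._hit():
            node.op = swap()
        return node

    def visit_Compare(self, node):
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            swap = COMPARE_SWAPS.get(type(op))
            if swap and self._hit():
                node.ops[i] = swap()
        return node


def mutants(source: str):
    """Yield one mutated source string per mutation site."""
    target = 0
    while True:
        tree = ast.parse(source)
        mutator = SingleMutation(target)
        mutator.visit(tree)
        if target >= mutator.seen:  # no site left at this index
            return
        yield ast.unparse(tree)
        target += 1
```

A driver would write each yielded source to disk (or `exec` it in isolation), run only the tests that cover the mutated function, and record whether they fail. That part is left out here because it depends on the project's test runner.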
When speed of setup is more important than tight integration, use an established mutation framework as a secondary option:
| Language | Framework |
|----------|-----------|
| JavaScript/TypeScript | Stryker |
| Python | mutmut |
| Java/JVM | PIT (pitest) |
| C# | Stryker.NET |
| Rust | cargo-mutants |
| Go | go-mutesting |
| Ruby | mutant |
| Scala | Stryker4s |
For install commands, configuration, and CLI reference, see
references/frameworks.md.
Mutation testing is slow — re-running it after a small change re-mutates every function. Differential mutation testing re-mutates only the functions whose code or covering tests changed, reusing cached results for the rest.
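One way to picture the selection rule is a fingerprint per function: hash the function's code, its covering tests, and the operator set, then re-mutate whenever the stored fingerprint differs. The sketch below is a simplified assumption, not the actual `dae_mutmap.py` logic or the real `mutation-manifest.json` schema. The function names and the manifest shape (function name mapped to a hash) are hypothetical.

```python
import hashlib
import json


def fingerprint(code: str, tests: str, operators: list[str]) -> str:
    """Hash the three inputs that should trigger a re-mutation when changed."""
    payload = json.dumps(
        {"code": code, "tests": tests, "operators": sorted(operators)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def select(functions: dict[str, dict], manifest: dict[str, str]) -> list[str]:
    """Return the functions whose fingerprint is new or has changed."""
    return sorted(
        name for name, parts in functions.items()
        if manifest.get(name) != fingerprint(**parts)
    )


def update(functions: dict[str, dict]) -> dict[str, str]:
    """Build the manifest to store after a completed run."""
    return {name: fingerprint(**parts) for name, parts in functions.items()}
```

With an empty manifest every function is selected, which mirrors a first full run. After `update`, an unchanged codebase selects nothing.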
dae_mutmap.py (select before the run,
update after). Results live in a committed mutation-manifest.json
beside the tool, so the saving reaches CI and every clone. A function is
re-mutated when its code, its covering tests, or the mutation operator set
changed. See ${CLAUDE_PLUGIN_ROOT}/references/differential-mutation.md.
--incremental), PIT (withHistory), and
mutmut have native incremental modes; enable the framework's incremental flag
and commit its history file. Do not build a separate manifest for the
framework path.

Before Step 1, create one TodoWrite todo per step of this workflow (Steps 1–6),
all at once — the full list up front, as a roadmap. Flip each todo to
in_progress / completed as you go. See
${CLAUDE_PLUGIN_ROOT}/references/progress-indicator.md.
Before running mutation testing, confirm:
If no mutation tool is configured:
.build/ (generated tests and IR) and the acceptance/ pipeline code from mutation

Important: Configure mutation testing to target source code only. Never mutate test files, spec files, or generated pipeline code.
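For a custom tool, the source-only rule can be enforced with a small path filter applied before any file is mutated. This is a sketch under assumptions: the directory names come from the text above, but the file-name patterns are guesses at common test and spec naming conventions and should be adapted to the project.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Directories named in the text above as off-limits for mutation.
EXCLUDED_DIRS = {".build", "acceptance"}
# Assumed naming conventions for test and spec files; adjust per project.
EXCLUDED_NAMES = ["test_*.py", "*_test.py", "*.spec.*", "*.test.*"]


def is_mutation_target(path: str) -> bool:
    """Return True only for source files that are safe to mutate."""
    p = PurePosixPath(path)
    if EXCLUDED_DIRS & set(p.parts[:-1]):  # any excluded parent directory
        return False
    return not any(fnmatch(p.name, pattern) for pattern in EXCLUDED_NAMES)
```

Framework users would express the same intent through the framework's own include and exclude settings rather than code like this.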
On the custom-tool path, run dae_mutmap.py select first and mutate only the
functions it returns — or all of them when it returns ALL. On the framework
path, the incremental flag handles this. Then execute and collect results:
For each surviving mutant:
>= → >, removed function call)

Equivalent mutants are mutations that don't change observable behavior
(e.g., mutating x + 0 to x - 0, which yields the same value). These can be ignored.
For each real survivor:
On the custom-tool path, run dae_mutmap.py update to refresh
mutation-manifest.json. The report combines this run's fresh results with the
manifest's cached entries for unchanged functions — mark the cached ones
("unchanged since last_mutated"). Present a summary:
Mutation Testing Report
═══════════════════════
Score: 87% → 95% (after killing survivors)
Killed: 190 / 200
Survived: 10 → 5 (5 equivalent mutants ignored)
New tests: 5 unit tests added
Remaining survivors (equivalent mutants):
- src/utils.js:42 - changed `x + 0` to `x - 0` (no-op mutation)
- ...
| Score | Assessment |
|-------|-----------|
| 90%+ | Strong test suite: minor gaps only |
| 70-89% | Moderate: meaningful gaps to address |
| < 70% | Weak: significant untested behavior |
A 100% mutation score is not always practical or necessary. Focus on killing mutants that represent real behavioral gaps, not chasing equivalent mutants.
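The score and its assessment band can be computed with a couple of small helpers. This is a sketch, not part of the skill: the function names are hypothetical, and removing equivalent mutants from the denominator is an assumption about how a team might choose to report the score.

```python
def mutation_score(killed: int, survived: int, equivalent: int = 0) -> float:
    """Percentage of mutants killed.

    `equivalent` is the number of survivors judged equivalent; they are
    dropped from the denominator (an assumed reporting choice).
    """
    if not 0 <= equivalent <= survived:
        raise ValueError("equivalent must be between 0 and survived")
    total = killed + survived - equivalent
    if total == 0:
        raise ValueError("no mutants to score")
    return 100.0 * killed / total


def assess(score: float) -> str:
    """Map a score to the bands in the table above."""
    if score >= 90:
        return "Strong"
    if score >= 70:
        return "Moderate"
    return "Weak"
```

For example, 190 killed and 10 survived gives a raw score of 95%, which falls in the "Strong" band.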
Mutation testing extends the existing two-stream approach:
1. Write specs (WHAT) ← acceptance tests
2. Implement with TDD (HOW) ← unit tests
3. Verify test quality (REAL?) ← mutation testing
When using the atdd-team skill, mutation testing is part of Phase 6
(Verify & Harden), run by the architect — an agent whose agent_id
is independent of the implementer and the refiner.
No. Fix failing tests first. Mutation testing assumes a green baseline.
Not practical. Equivalent mutants inflate the denominator. Aim for 90%+ and document the equivalent mutants that remain.
Never mutate generated test files or the acceptance pipeline. Only mutate source code under development.
For detailed framework setup and configuration:
references/frameworks.md — Installation, configuration, and CLI
reference for each supported mutation testing framework