skills/triaging-mutants/SKILL.md
Use when mutation testing reveals survived mutants — guides deep analysis of whether each mutant signals a missing test, a design improvement opportunity, or an equivalent mutation
A survived mutant is a question, not just a gap in test
coverage. When cargo mutants reports a miss, the obvious
response is to write a test that kills it. But that's often the
shallow fix. The deeper question is: why did this mutation
produce code that still works?
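Three possibilities:
1. Missing test: the behaviour matters, but no test pins it down.
2. Design weakness: the design tolerates both variants, so a better model would rule the mutation out.
3. Equivalent mutant: the mutation does not change observable behaviour.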
Option 2 is the most valuable and most commonly overlooked. Changing the modelling often obviates entire classes of mutants — making the code both more correct and simpler to test.
Don't use when:
dev mutate first)

Don't ask which mutant to triage next; just pick one and go. After presenting a verdict and acting on it, immediately move to the next untriaged mutant. Keep momentum. The user will interrupt if they want to change direction.
Prefer domain-layer mutants over adapter-layer, and single-mutant files over multi-mutant files (quick wins build momentum).
The skill expects one of:
yx show "adapters-event_store-mod mutants")
mutants.out/missed.txt
mutants.out/missed.txt for a triage session

For each mutant (or group of mutants in the same file):
Read the mutant description from the yak context or
missed.txt:
src/domain/yak_map.rs:277:59: replace && with || in YakMap::update_state
Read the source code around the mutation site. Read enough context to understand the function's purpose — not just the mutated line, but the whole function and its callers.
Read existing tests that exercise this code path. Search for tests that call the function, Cucumber scenarios that exercise the behaviour, and any test helpers involved.
Understand the domain intent. What business rule or invariant does this code enforce? What would go wrong in the real world if this mutation were live?
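The mutant description read in the first step can be split into its parts before opening the source file. The following is a minimal sketch, assuming every entry has the same `path:line:column: description` shape as the example above; `MutantLocation` and `parse_missed_line` are hypothetical names, not part of cargo-mutants or yx.

```rust
/// One entry from the mutant list, borrowed from the input line.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct MutantLocation<'a> {
    pub file: &'a str,
    pub line: u32,
    pub column: u32,
    pub description: &'a str,
}

/// Splits an entry shaped like `path:line:column: description`.
/// Returns `None` when the entry does not match that assumed shape.
pub fn parse_missed_line(entry: &str) -> Option<MutantLocation<'_>> {
    // Split on the first three colons only, so `::` inside the
    // description (e.g. `YakMap::update_state`) stays intact.
    let mut parts = entry.splitn(4, ':');
    let file = parts.next()?;
    let line = parts.next()?.trim().parse().ok()?;
    let column = parts.next()?.trim().parse().ok()?;
    let description = parts.next()?.trim();
    Some(MutantLocation { file, line, column, description })
}
```

The reported line can be stale if the file changed after the mutation run, so treat the parsed location as a starting point and confirm the code at that line by reading it. A path that itself contains a colon would not parse correctly with this sketch.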
For each mutant, work through this decision tree:
Is the mutated code reachable from tests?
├─ No → MISSING TEST (coverage gap)
│       But ask: why is it unreachable? Dead code?
╰─ Yes
   Does the mutation change observable behaviour?
   ├─ No → EQUIVALENT MUTANT
   │       The mutation is semantically identical.
   │       Document why and mark accepted.
   ╰─ Yes
      Do existing tests exercise the affected behaviour?
      ├─ No → MISSING TEST (specific scenario gap)
      ╰─ Yes
         Why don't they catch it?
         ├─ Assertions are too weak → MISSING TEST
         │     (tighten assertions or add a specific case)
         ╰─ The design tolerates both variants
               → DESIGN WEAKNESS
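The tree above can also be read as a short function. This is only a sketch of the branching order: `Findings`, `Classification`, and `classify` are hypothetical names, and the real work is the reading needed to answer each question, which no function can do for you.

```rust
/// The three verdicts the decision tree can reach.
#[derive(Debug, PartialEq, Eq)]
pub enum Classification {
    MissingTest,
    EquivalentMutant,
    DesignWeakness,
}

/// Answers collected while reading the source and the tests.
/// Hypothetical struct; each field mirrors one question in the tree.
pub struct Findings {
    pub reachable_from_tests: bool,
    pub changes_observable_behaviour: bool,
    pub tests_exercise_behaviour: bool,
    pub assertions_too_weak: bool,
}

/// Walks the questions in the same order as the tree.
pub fn classify(f: &Findings) -> Classification {
    if !f.reachable_from_tests {
        // Coverage gap. Still ask whether the code is dead.
        return Classification::MissingTest;
    }
    if !f.changes_observable_behaviour {
        return Classification::EquivalentMutant;
    }
    if !f.tests_exercise_behaviour {
        // Specific scenario gap.
        return Classification::MissingTest;
    }
    if f.assertions_too_weak {
        Classification::MissingTest
    } else {
        // Tests run the code and assert on it, yet both variants pass.
        Classification::DesignWeakness
    }
}
```

Note that later questions are only meaningful once the earlier ones are answered "yes", which is why the function returns early instead of combining the flags into one expression.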
The behaviour is real and untested. Before writing the test, ask:
This is the high-value finding. The mutation survived because the design permits it. Common patterns:
| Pattern | Example | Better Design |
|---------|---------|---------------|
| Boolean blindness | `if flag && other_flag`, where swapping `&&` and `\|\|` still works | Replace booleans with an enum that names the states |
| Primitive obsession | Using i32, String, bool where a domain type would constrain meaning | Introduce a newtype, enum, or value object that makes invalid states unrepresentable |
| Conditional logic | if/else or match chains selecting behaviour — flipping conditions still works | Replace with strategy pattern, polymorphism, or trait objects so the type system enforces the dispatch |
| Stringly-typed data | String manipulation where a type would constrain values | Introduce a newtype or enum |
| Redundant paths | Two code paths that happen to produce the same result | Unify the paths or make them clearly distinct |
| Weak preconditions | Function accepts broad input, only uses narrow range | Narrow the input type or add validation |
| Arithmetic on indices | + vs - doesn't matter because ranges overlap | Use named operations or iterator methods |
| Match arm equivalence | Deleting a match arm still works because the default covers it | Make the match exhaustive and meaningful |
| Dead code | Code exists but nothing exercises it | Remove it |
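As an illustration of the first row, here is a minimal sketch of trading two booleans for an enum. The names `is_blocked_flags`, `Children`, and `is_blocked` are hypothetical and are not taken from the yaks codebase.

```rust
/// Before: two flags. Tests that only try (true, false) and (false, true)
/// get the same answers from `&&` and `||`, so the mutant survives.
/// The combination (false, false) is not even a meaningful state.
pub fn is_blocked_flags(has_children: bool, all_children_done: bool) -> bool {
    has_children && !all_children_done
}

/// After: one value that names every state that can actually occur.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Children {
    None,
    AllDone,
    SomePending,
}

/// No boolean operator is left to flip, and the meaningless
/// "no children, but not all done" state cannot be constructed.
pub fn is_blocked(children: Children) -> bool {
    matches!(children, Children::SomePending)
}
```

A mutation tool can still replace the body of `is_blocked` with a constant, but a test for `SomePending` plus a test for either other variant would catch that, and each of those tests names a business state instead of a flag combination.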
When you identify a design weakness, stop and discuss it with the user. Design changes have broader implications than adding a test. Present:
Wait for the user to decide: improve the design now, defer it to a new yak, or fall back to a test.
The mutation doesn't change behaviour. This is rare but real. Examples:
- `>` with `>=` when the boundary value never occurs
- `>` with `>=` when a preceding `==` check already handles the boundary (fix: use `else if` to make the mutual exclusion explicit)

Be sceptical of this classification. Most "equivalent" mutants are actually design weaknesses in disguise: the boundary value should occur, or the code should handle it.
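The second example can be sketched as follows. The function names and return strings are hypothetical. The first two functions show the early-return shape and the `else if` shape mentioned in the fix; the third is a different technique, not one this skill prescribes, that matches on `std::cmp::Ordering` so that no `>` remains to be flipped.

```rust
use std::cmp::Ordering;

/// Early-return form: by the time `>` is evaluated, `current == target`
/// has already returned, so `>=` would give the same results.
pub fn status_early_return(current: u32, target: u32) -> &'static str {
    if current == target {
        return "up to date";
    }
    if current > target {
        return "ahead";
    }
    "behind"
}

/// `else if` form: same behaviour, but the mutual exclusion of the
/// three branches is visible in the structure of the code.
pub fn status_else_if(current: u32, target: u32) -> &'static str {
    if current == target {
        "up to date"
    } else if current > target {
        "ahead"
    } else {
        "behind"
    }
}

/// Alternative sketch: an exhaustive match on the comparison result.
pub fn status_match(current: u32, target: u32) -> &'static str {
    match current.cmp(&target) {
        Ordering::Equal => "up to date",
        Ordering::Greater => "ahead",
        Ordering::Less => "behind",
    }
}
```

All three functions return the same value for every input, so choosing between them is a design discussion to have with the user, not a behaviour change.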
To mark as accepted, propose adding a #[mutants::skip]
annotation with a comment explaining why:
#[mutants::skip] // Equivalent: migration version is never 0 in practice
fn source_version() -> u32 { 4 }
Always discuss with the user before marking equivalent.
Every mutant analysis MUST end with a clear verdict block. This is the deliverable — it tells the user what you found and what you recommend, in a format that supports quick decisions.
## Triage Verdict
**Mutant:** `<mutation description>` in `<file>:<line>`
**Classification:** Missing Test | Design Weakness | Equivalent Mutant
**Why it survives:** <1-2 sentence explanation of the mechanism>
**Recommendation:** <what to do — write a test, refactor, annotate>
**Test level:** Cucumber scenario | Unit test | Integration test
**Effort:** Small (1 assertion) | Medium (new test) | Large (refactoring)
### Questions (if any)
- <open question for the user>
For design weaknesses, add:
### Design Analysis
**Current design:** <what permits the mutation>
**Proposed improvement:** <how to fix it>
**Trade-off:** <cost vs benefit>
Group multiple mutants in the same file into a single verdict when they share the same root cause.
For each classified mutant, take the appropriate action:
Follow TDD:
Add test for [rule] — kills mutant in [file]:[line]
yx add "improve [description]" --under "[parent yak]"
cat <<'EOF' | yx context "improve [description]"
# Motivation
Mutant survived: [mutant description]
# Current Design
[What permits the mutation]
# Proposed Change
[How to improve it]
EOF
#[mutants::skip] with explanation

After triaging all mutants in the session, summarise:
## Triage Summary
| Category | Count | Actions Taken |
|----------|-------|---------------|
| Missing test | N | N tests written |
| Design weakness | N | N fixed, N deferred |
| Equivalent | N | N annotated |
### Design Improvements Identified
- [Brief description of each]
### Deferred Items
- [Yaks created for later]
Update the mutant yak's state:
yx done "yak name"

This skill grows through use. When a triage session produces a new insight (a design weakness pattern we haven't seen before, a classification heuristic that proved useful, or a mistake we learned from), update this skill file before closing the session.
Specifically:
The goal is that every triage session leaves the skill slightly better for the next one.
| Phase | What | Gate |
|-------|------|------|
| Gather | Read mutant, source, tests, domain intent | - |
| Classify | Decision tree → missing test / design / equivalent | - |
| Verdict | Present findings, recommendation, questions | User decides |
| Act (test) | Write test, verify it kills mutant | Tests pass |
| Act (design) | Present analysis, discuss with user | User approves |
| Act (equivalent) | Annotate with skip + reason | User approves |
| Report | Summary table, update yak state | - |

| Mistake | Fix |
|---------|-----|
| Immediately writing a test without analysing why the mutant survived | Always classify first — the test might be the wrong fix |
| Classifying a design weakness as "missing test" | If `&&` → `\|\|` survives, ask why both work, not just "add a test for `&&`" |
| Treating all mutants as independent | Group by file/function — patterns often emerge |
| Making design changes without discussing | Design weakness → always present to user first |
| Marking mutants as equivalent too readily | Most "equivalents" are design weaknesses in disguise |
| Writing tests that test the implementation, not the behaviour | Name the business rule, not the code line |
| Skipping the domain intent step | Understanding why the code exists is the whole point |
| Presenting analysis without a clear verdict | Always end with the Triage Verdict block — findings, recommendation, questions |
| Trusting stale line numbers from missed.txt | The file may have changed since the mutation run — always verify the actual code at the reported line |
| Assuming a test covers a mutant without checking | Read the test carefully — a test that sets state to "wip" doesn't help if the mutant is about "done" |
| Testing with only one instance when the mutant is about iteration | + offset vs - offset is invisible with one item — always test with 2+ items for loop/index mutants |
| Forgetting that test mode matters | cargo mutants uses in-process Cucumber mode — fullstack-only scenarios won't catch mutants |
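
The row about iteration mutants can be checked with a small sketch. `item_offsets` is a hypothetical helper, and `item_offsets_mutated` is the same helper with `+` replaced by `-`, written out by hand so the two can be compared directly.

```rust
/// Hypothetical helper: the offset at which each of `count` items starts.
pub fn item_offsets(start: i32, step: i32, count: usize) -> Vec<i32> {
    (0..count).map(|i| start + (i as i32) * step).collect()
}

/// The same helper with `+` mutated to `-`.
pub fn item_offsets_mutated(start: i32, step: i32, count: usize) -> Vec<i32> {
    (0..count).map(|i| start - (i as i32) * step).collect()
}
```

With a single item the index is 0, so the step term vanishes and both versions return the same vector. The difference only appears from the second item onward, which is why a test with one item cannot kill this kind of mutant.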