skills/mine.mutation-test/SKILL.md
Use when the user says: "mutation test", "do my tests actually catch bugs", or wants to verify test quality. Intentionally breaks code to check if tests catch real bugs. No framework needed.
`npx skillsauth add NodeJSmith/Claudefiles mine.mutation-test`

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Intentionally introduce bugs into source code and verify that tests catch them. Each surviving mutation is a concrete, demonstrable gap in test coverage. Claude-driven — no external framework needed.
$ARGUMENTS — what to mutation-test. Can be:
- A file: `/mine.mutation-test src/services/payment.py`
- A function: `/mine.mutation-test calculate_total`
- A directory: `/mine.mutation-test src/auth/`

A clean working tree is the safety net under the safety net. If an Edit-based revert fails, `git checkout -- <file>` must be available.
Run `git status` to check for uncommitted changes. If the tree is dirty:

AskUserQuestion:
question: "The working tree has uncommitted changes. Mutation testing needs a clean tree as a safety net for reverting mutations. How should we proceed?"
header: "Dirty tree"
multiSelect: false
options:
- label: "Stash changes (Recommended)"
description: "Run git stash, proceed with mutation testing, then git stash pop when done"
- label: "Commit first"
description: "Commit the current changes before starting"
- label: "Proceed anyway"
description: "Skip the safety net — I'll handle reverting manually if needed"
If the user chose stash, run git stash and remember to git stash pop at the end.
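The clean-tree check above can be sketched in Python. This is a minimal sketch, assuming `git` is on the PATH; the helper names `is_dirty` and `working_tree_is_dirty` are hypothetical and not part of the skill.

```python
import subprocess


def is_dirty(porcelain_output: str) -> bool:
    """Return True if `git status --porcelain` output lists any change."""
    # Porcelain output has one line per changed or untracked path; empty means clean.
    return any(line.strip() for line in porcelain_output.splitlines())


def working_tree_is_dirty(repo_dir: str = ".") -> bool:
    """Ask git whether the working tree has uncommitted changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return is_dirty(result.stdout)
```

Splitting the parsing from the subprocess call keeps the decision logic testable without a repository.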
- Look for matching tests: `tests/test_<module>.py`, `tests/<module>/test_*.py`
- Run `git-branch-diff-files` to get changed source files on this branch (exclude test files). This handles base-branch detection automatically.
- `git diff --name-only HEAD~3`

AskUserQuestion:
question: "Which file should I mutation-test?"
header: "Target"
multiSelect: false
options:
- label: "src/services/payment.py (Recommended)"
description: "Changed in 3 recent commits, has test_payment.py"
- label: "src/auth/tokens.py"
description: "Changed in 1 recent commit, has test_tokens.py"
Before mutating anything, run the relevant tests and confirm they all pass. If tests are already failing, stop:
Tests are failing before any mutations. Fix the failing tests first — mutation testing requires a green baseline.
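The green-baseline check can be sketched as follows. This is an illustration only, assuming the project uses pytest; `pytest_command` and `baseline_is_green` are hypothetical helper names.

```python
import subprocess
import sys


def pytest_command(test_path: str) -> list[str]:
    """Build a pytest invocation that uses the current interpreter."""
    return [sys.executable, "-m", "pytest", test_path, "-q"]


def baseline_is_green(test_command: list[str]) -> bool:
    """Run the test command once; exit code 0 means the baseline is green."""
    completed = subprocess.run(test_command, capture_output=True, text=True)
    return completed.returncode == 0
```

A call such as `baseline_is_green(pytest_command("tests/test_payment.py"))` would gate the rest of the run: if it returns `False`, stop before mutating anything.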
Read the target code carefully. Identify mutation points that represent real bugs a developer might introduce — not every possible character swap.
| Category | Example | Severity |
|----------|---------|----------|
| Guard removal | Delete an `if not user:` early return | Critical |
| Logic flip | `and` to `or`, negate a condition | Critical |
| Exception removal | Remove a `raise ValueError(...)` | Critical |
| Return value | `return True` to `return False`, `return x` to `return None` | High |
| Boundary | `>` to `>=`, `<` to `<=`, `==` to `!=` | High |
| Off-by-one | `range(n)` to `range(n-1)`, `[1:]` to `[2:]` | High |
| Arithmetic | `+` to `-`, `*` to `/` | Medium |
| Constant | Change a string literal or numeric constant | Low |
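To make the Boundary row concrete, here is a small sketch of one mutation and two checks against it. The function `meets_minimum` and both checks are hypothetical examples, not code from the skill or from any real project.

```python
def meets_minimum(total: float, minimum: float) -> bool:
    """Original code: an order qualifies when it reaches the minimum."""
    return total >= minimum


def meets_minimum_mutant(total: float, minimum: float) -> bool:
    """Boundary mutation: `>=` changed to `>`."""
    return total > minimum


def weak_check(fn) -> bool:
    """Only uses values far from the boundary, so it passes for both versions."""
    return fn(100.0, 50.0) is True and fn(10.0, 50.0) is False


def boundary_check(fn) -> bool:
    """Uses the exact boundary value, so it passes only for the original."""
    return fn(50.0, 50.0) is True
```

With only `weak_check` in the suite, the mutant survives; adding `boundary_check` kills it. That difference is what each row of the table is probing for.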
Skip low-value targets:

- Logging (`logger.info(...)`, `print(...)`)
- Debug-only code (`if DEBUG:`, `assert` statements)
- `__repr__` / `__str__` methods

After identifying mutation points, present the plan:
## Mutation Plan: src/services/payment.py
| # | Line | Category | Mutation |
|---|------|----------|----------|
| 1 | 23 | Guard removal | Remove `if not amount > 0: raise ValueError` |
| 2 | 31 | Logic flip | `and` → `or` in discount eligibility check |
| 3 | 45 | Boundary | `>=` → `>` in minimum order threshold |
| 4 | 52 | Return value | `return total` → `return None` |
| 5 | 67 | Exception removal | Remove `raise InsufficientFunds` |
| 6 | 78 | Arithmetic | `price * quantity` → `price + quantity` |
| 7 | 89 | Off-by-one | `items[1:]` → `items[2:]` |
**7 mutations planned** — running tests/test_payment.py per mutation.
AskUserQuestion:
question: "Here's the mutation plan. Proceed with all 7, or narrow the scope?"
header: "Scope"
multiSelect: false
options:
- label: "Run all 7 (Recommended)"
description: "Full mutation testing pass"
- label: "Critical + High only"
description: "Skip Medium/Low severity — focus on the mutations that matter most"
- label: "Let me pick"
description: "I'll choose specific mutations from the list"
For each mutation, follow this exact sequence:
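1. Record the original code (the exact `old_string` that will be used to revert).
2. Apply the mutation with an Edit (`old_string` and `new_string`).
3. Run the tests, record the result, and revert the mutation. If the Edit-based revert fails, run `git checkout -- <file>`.

Maintain a running results table: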
| # | Mutation | Result | Details |
|---|----------|--------|---------|
| 1 | Remove guard (L23) | Killed | test_negative_amount caught it (ValueError) |
| 2 | Logic flip (L31) | Survived | No test checks discount eligibility edge case |
| 3 | Boundary >= → > (L45) | Killed | test_minimum_order caught it |
| ...
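The apply, test, revert sequence can be sketched as a single function. The skill itself performs these steps with Edit operations; this sketch swaps in a plain file rewrite to show the control flow, and `run_mutation` is a hypothetical name.

```python
import subprocess
from pathlib import Path


def run_mutation(path: Path, old: str, new: str, test_command: list[str]) -> str:
    """Apply one mutation, run the tests, and always restore the original file.

    Returns "Killed" if the tests fail under the mutation, "Survived" otherwise.
    """
    original = path.read_text()
    if original.count(old) != 1:
        # Require an unambiguous target so the revert is exact.
        raise ValueError(f"expected exactly one occurrence of {old!r}")
    path.write_text(original.replace(old, new))
    try:
        completed = subprocess.run(test_command, capture_output=True, text=True)
    finally:
        # Revert even if launching the tests raised.
        path.write_text(original)
    return "Killed" if completed.returncode != 0 else "Survived"
```

Restoring inside `finally` means the source file returns to its original content whether the tests pass, fail, or cannot be started.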
## Mutation Testing Results: src/services/payment.py
**Score: 5/7 killed (71%)**
### Killed (tests caught the bug)
| # | Mutation | Caught by |
|---|----------|-----------|
| 1 | Remove guard (L23) | test_negative_amount |
| 3 | Boundary >= → > (L45) | test_minimum_order |
| 4 | Return None (L52) | test_calculate_total_returns_value |
| 5 | Remove raise (L67) | test_insufficient_funds |
| 6 | price * qty → price + qty (L78) | test_multi_item_order |
### Survived (tests missed the bug)
| # | Mutation | Severity | Why it matters |
|---|----------|----------|----------------|
| 2 | `and` → `or` in discount check (L31) | Critical | A logic flip in eligibility could give discounts to ineligible users |
| 7 | `items[1:]` → `items[2:]` (L89) | High | Off-by-one would silently skip the second item in processing |
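The score line in the report is the number of killed mutations over the total. A minimal sketch of that arithmetic, using the outcomes from the tables above (`mutation_score` is a hypothetical helper):

```python
def mutation_score(results: dict[int, str]) -> str:
    """Format the score line from a mapping of mutation number to outcome."""
    killed = sum(1 for outcome in results.values() if outcome == "Killed")
    total = len(results)
    percent = round(100 * killed / total) if total else 0
    return f"{killed}/{total} killed ({percent}%)"


results = {
    1: "Killed", 2: "Survived", 3: "Killed", 4: "Killed",
    5: "Killed", 6: "Killed", 7: "Survived",
}
print(mutation_score(results))  # 5/7 killed (71%)
```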
AskUserQuestion:
question: "2 mutations survived — these are gaps in your test coverage. Which should I write tests for?"
header: "Fix"
multiSelect: true
options:
- label: "Discount logic flip (Recommended)"
description: "Critical severity — and/or flip could give unauthorized discounts"
- label: "Off-by-one in item processing"
description: "High severity — would silently skip items"
- label: "Skip all"
description: "I'll address these later"
This is a mini TDD cycle: the mutation is the "bug" and the new test is the "fix."
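As an illustration of that cycle, the sketch below writes a test that kills the surviving `and` to `or` mutant. The eligibility rule (members only, orders of 100 or more) is an assumption made up for this example; the real `payment.py` logic is not shown in this document.

```python
def eligible_for_discount(is_member: bool, order_total: float) -> bool:
    """Hypothetical rule: members only, and only on orders of 100 or more."""
    return is_member and order_total >= 100


def eligible_for_discount_mutant(is_member: bool, order_total: float) -> bool:
    """The surviving mutation: `and` flipped to `or`."""
    return is_member or order_total >= 100


def test_discount_ineligible_user(fn=eligible_for_discount) -> None:
    """Users who meet only one criterion must not get the discount."""
    assert fn(True, 50.0) is False    # member, but the order is too small
    assert fn(False, 150.0) is False  # large order, but not a member
```

The test passes against the original and fails against the mutant, so re-running the mutation after adding it should move the result from Survived to Killed.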
## Mutation Testing Summary: src/services/payment.py
### Score
- **Before**: 5/7 killed (71%)
- **After**: 7/7 killed (100%)
### Tests added
- `test_discount_ineligible_user` — verifies discount logic rejects users who don't meet all criteria
- `test_process_all_items` — verifies no items are skipped during processing
### Remaining survivors
None — all mutations killed.
### Confidence assessment
The test suite for payment.py now covers guard clauses, boundary conditions, return values, exception paths, and logic branches. The main remaining risk area is integration behavior (payment gateway responses), which mutation testing can't cover — integration tests are needed for that.
If the working tree was stashed in Phase 0, run git stash pop now.