skills/testing/mutation-test-suite-optimizer/SKILL.md
Uses mutation testing to find weak assertions and missing tests — injects small bugs and checks if the suite catches them, then generates tests targeting the surviving mutants. Use when coverage is high but bugs still ship, when auditing test quality, or when deciding if the suite is good enough.
Coverage says "this line ran." Mutation testing says "this line ran AND if it were wrong, a test would catch it." Surviving mutants are lines where the test suite is blind.
Example mutations: > → >=, + → -, return x → return None.

| Operator | Mutation | Exposes |
| -------------- | ------------------------------------- | ------------------------------------- |
| AOR (arithmetic) | + ↔ -, * ↔ / | Tests that don't check actual values |
| ROR (relational) | < ↔ <= ↔ == ↔ != ↔ >= ↔ > | Off-by-one, boundary untested |
| COR (conditional) | and ↔ or, negate condition | Branches where only one arm matters |
| LVR (literal value) | 0 → 1, 1 → 0, "x" → "" | Magic numbers with no assertion |
| SDL (statement delete) | Remove a line | Dead code, or unchecked side effects |
| RVR (return value) | return x → return None, return 0 | Caller ignores return value |
→ smart-mutation-operator-generator for domain-specific mutations beyond these.
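To make the mechanics concrete, here is a minimal sketch of a single AOR mutation applied with Python's `ast` module. The `total` function, the `AddToSub` transformer, and the two checks are hypothetical illustrations, not how any particular tool implements its operators.

```python
import ast

SOURCE = """
def total(price, shipping):
    return price + shipping
"""


class AddToSub(ast.NodeTransformer):
    """AOR-style mutation: rewrite every `a + b` into `a - b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load(tree):
    """Compile a module AST and return the namespace it defines."""
    namespace = {}
    exec(compile(tree, "<mutation-sketch>", "exec"), namespace)
    return namespace


def weak_test(ns):
    # Runs the line but only checks the type, not the value.
    return isinstance(ns["total"](10, 5), int)


def strong_test(ns):
    # Checks the actual value.
    return ns["total"](10, 5) == 15


original = load(ast.parse(SOURCE))
mutant_tree = ast.fix_missing_locations(AddToSub().visit(ast.parse(SOURCE)))
mutant = load(mutant_tree)


def survives(test):
    """A mutant survives when the test passes on both original and mutant."""
    return test(original) and test(mutant)


weak_survives = survives(weak_test)      # True: the type check cannot see + vs -
strong_survives = survives(strong_test)  # False: 5 != 15 kills the mutant
```

Both checks execute the mutated line, so coverage is identical. Only the value assertion notices the change.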
| Ecosystem | Tool |
| --------- | --------------------------------------------------- |
| Python | mutmut, cosmic-ray |
| Java | pitest |
| JS/TS | stryker |
| Ruby | mutant |
| C/C++ | mull, dextool mutate |
These tools can take hours on big codebases. Scope to changed files: `mutmut run --paths-to-mutate src/pricing.py`.
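One way to scope a run is to build the invocations from a list of changed paths. The sketch below assumes that list comes from your version control system (for example `git diff --name-only`) and that sources live under a hypothetical `src/` directory. The `--paths-to-mutate` flag is the one shown above, so check that your installed mutmut version accepts it.

```python
def mutmut_commands(changed_paths, source_root="src/"):
    """Build one scoped mutmut invocation per changed Python source file.

    `source_root` is an assumed project layout; tests and non-Python
    files are skipped because they are not mutation targets.
    """
    commands = []
    for path in changed_paths:
        is_source = path.startswith(source_root) and path.endswith(".py")
        if is_source:
            commands.append(["mutmut", "run", "--paths-to-mutate", path])
    return commands


# Hypothetical change set for illustration.
changed = ["src/pricing.py", "tests/test_pricing.py", "README.md", "src/cart.py"]
commands = mutmut_commands(changed)
```

Each inner list can be passed to `subprocess.run` as is.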
Not every survivor needs a test. Classify:
| Survivor type | Action |
| ------------------------------------ | ----------------------------------------------------- |
| Equivalent mutant | x = x * 1 → x = x / 1 (x a float): same behavior. Ignore. |
| Dead code mutant | Mutated line never runs. → dead-code-eliminator |
| Weak assertion | Test ran the line but didn't check the result. Fix the test. |
| Missing boundary | < vs <= both pass — never tested the boundary. Add test. |
| Unchecked side effect | Mutant deletes a log call, nothing notices. Decide: is this worth testing? |
Code:
def discount(price, tier):
    if tier == "gold":
        return price * 0.8
    return price
Mutation report:
SURVIVED: discount.py:3 — `price * 0.8` → `price * 0.9`
Existing test:
def test_gold_discount():
    assert discount(100, "gold") < 100  # ← too weak
Both 0.8 and 0.9 give something < 100. The test is imprecise.
Fix — strengthen the assertion:
def test_gold_discount():
    assert discount(100, "gold") == 80  # 20% off, exactly
Now * 0.9 → 90 != 80 → test fails → mutant killed.
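The difference between the two assertions can be checked by running each against the original and a hand-written copy of the mutant. This is a sketch: `discount_mutant`, `weak_assertion`, and `exact_assertion` are names introduced here for illustration.

```python
def discount(price, tier):
    if tier == "gold":
        return price * 0.8
    return price


def discount_mutant(price, tier):
    # Hand-written copy of the surviving mutant: 0.8 replaced by 0.9.
    if tier == "gold":
        return price * 0.9
    return price


def weak_assertion(fn):
    return fn(100, "gold") < 100


def exact_assertion(fn):
    return fn(100, "gold") == 80


# Each tuple is (passes on original, passes on mutant).
results = {
    "weak": (weak_assertion(discount), weak_assertion(discount_mutant)),
    "exact": (exact_assertion(discount), exact_assertion(discount_mutant)),
}
```

A mutant is killed only when the second element is False. For prices where float rounding matters, comparing with `pytest.approx` is one option that stays strict about the value.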
Another survivor:
SURVIVED: discount.py:2 — `tier == "gold"` → `tier >= "gold"`
The only test uses gold, so the condition is only ever evaluated with one input, and both == and >= are true for "gold". Need a second input:
def test_non_gold_no_discount():
    assert discount(100, "silver") == 100
Now >= is also true for silver ("silver" >= "gold" as strings) → 0.8 * 100 = 80 != 100 → killed.
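The effect of the second input can be checked with a small table-driven harness. The sketch assumes a hypothetical variant, `discount_every_tier`, whose condition holds for any tier. `CASES_BEFORE`, `CASES_AFTER`, and `suite_passes` are likewise illustrative names.

```python
def discount(price, tier):
    if tier == "gold":
        return price * 0.8
    return price


def discount_every_tier(price, tier):
    # Hypothetical mutant: the tier check no longer restricts anything.
    return price * 0.8


# Each case is ((price, tier), expected result).
CASES_BEFORE = [((100, "gold"), 80)]
CASES_AFTER = CASES_BEFORE + [((100, "silver"), 100)]


def suite_passes(fn, cases):
    """Return True when fn gives the expected result for every case."""
    return all(fn(*args) == expected for args, expected in cases)


survives_before = suite_passes(discount_every_tier, CASES_BEFORE)  # True
survives_after = suite_passes(discount_every_tier, CASES_AFTER)    # False
```

The original function passes both case lists, so the added case costs nothing on correct code and removes the blind spot.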
Some mutants can't be killed because they're behaviorally identical:
for i in range(len(xs)): # mutant: range(len(xs)) → range(0, len(xs))
range(n) and range(0, n) are the same. No test can distinguish them. Mark as equivalent and move on.
Detecting equivalence is undecidable in general. A practical heuristic: if you've spent 5 minutes trying to kill a mutant and every test you write passes on both versions, it's probably equivalent.
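A cheap complement to the manual heuristic is a differential check: run the original and the mutant on many random inputs and look for any disagreement. The sketch below is an assumption-laden illustration (`double_all` and its variants are hypothetical). Finding no witness is only evidence of equivalence, never proof.

```python
import random


def double_all(xs):
    return [xs[i] * 2 for i in range(len(xs))]


def double_all_mutant(xs):
    # Mutant from the text: range(len(xs)) → range(0, len(xs)).
    return [xs[i] * 2 for i in range(0, len(xs))]


def double_all_off_by_one(xs):
    # A non-equivalent mutant for contrast: skips the first element.
    return [xs[i] * 2 for i in range(1, len(xs))]


def find_distinguishing_input(f, g, trials=1000, seed=0):
    """Return an input where f and g disagree, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        if f(list(xs)) != g(list(xs)):
            return xs
    return None


no_witness = find_distinguishing_input(double_all, double_all_mutant)
witness = find_distinguishing_input(double_all, double_all_off_by_one)
```

When a witness is found, it is a ready-made input for a new test that kills the mutant.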
Full mutation on a big codebase takes hours to days. Scope it to the files that changed.

`assert discount.__code__.co_consts[1] == 0.8` kills the mutant and is a terrible test: it asserts on the implementation, not the behavior.

Deleting `logger.info(...)` doesn't fail any test. No, you probably don't want to assert on every log line.

## Mutation run
Tool: <mutmut/pitest/stryker> Scope: <files>
Mutants: <total> Killed: <N> Survived: <M> Timeout: <T> Score: <%>
## Survivors — triaged
### Weak assertions (fix the test)
| Mutant | Location | Existing test | Why it survived | Fixed assertion |
| ------ | -------- | ------------- | --------------- | --------------- |
### Missing tests (add a test)
| Mutant | Location | Missing case | New test |
| ------ | -------- | ------------ | -------- |
### Equivalent (ignore)
| Mutant | Why equivalent |
| ------ | -------------- |
### Dead code (remove)
| Mutant | Evidence |
| ------ | -------- |
## After fixes
Projected score: <%> (killed +<N>, marked equivalent +<M>)
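The score fields in the template can be filled with a small helper. This is a sketch under two stated assumptions that vary by tool: timeouts count as detected, and mutants marked equivalent are excluded from the denominator. The numbers in the example are made up.

```python
def mutation_score(killed, survived, timeout=0, equivalent=0):
    """Percentage of non-equivalent mutants the suite detected.

    Assumptions: timeouts count as detected, and `equivalent` is the
    number of survivors triaged as equivalent (dropped from the total).
    """
    detected = killed + timeout
    scorable = killed + survived + timeout - equivalent
    if scorable <= 0:
        return 0.0
    return round(100 * detected / scorable, 1)


# Hypothetical run: 100 mutants, 70 killed, 28 survived, 2 timed out.
current = mutation_score(killed=70, survived=28, timeout=2)

# After fixes: 20 survivors now killed, 5 of the remaining 8 marked equivalent.
projected = mutation_score(killed=90, survived=8, timeout=2, equivalent=5)
```

Reporting both numbers keeps the gain from new tests separate from the gain from triage.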