skills/testing/smart-mutation-operator-generator/SKILL.md
Generates domain-specific mutation operators beyond the standard arithmetic/relational set — mutations tailored to your codebase's idioms, APIs, and bug history that standard tools don't try. Use when generic mutation testing plateaus, when your domain has specific failure modes, or when mining bug history reveals patterns standard operators miss.
Standard mutation (+→-, <→<=) tests generic arithmetic. Your bugs aren't generic. If your last three bugs were "forgot to check .is_expired()," no standard operator catches that. Write one that does.
| Standard operator | Catches | Misses |
| ----------------- | ---------------------------- | --------------------------------------------- |
| + ↔ - | Arithmetic errors | API misuse, resource leaks, protocol errors |
| < ↔ <= | Off-by-one | Wrong method called on similar API |
| Remove statement | Unchecked side effects | Wrong order of operations on stateful API |
| Negate condition | Untested branches | Missing check entirely (not wrong check) |
Once the mutation score on standard operators passes roughly 80%, the survivors tend to be equivalent mutants. The real bugs live elsewhere.
Your git log is a mutation operator database:
```shell
git log --all --grep="fix\|bug" -p --since="2 years ago" > fix-commits.txt
```
For each fix diff, ask: what's the inverse? The inverse of the fix is the mutation.
| Bug fix | Inverse = mutation operator |
| -------------------------------------------- | ------------------------------------------------- |
| Added `if session.is_expired(): raise ...` | Delete `is_expired()` checks |
| Changed `json.loads(s)` → `json.loads(s, parse_float=Decimal)` | Drop `parse_float` kwarg |
| Changed `.get(k)` → `.get(k, default)` | Drop the default from `dict.get` |
| Added `conn.commit()` before return | Delete commit calls before return |
| Swapped `list.pop()` → `list.pop(0)` | Swap pop index (LIFO ↔ FIFO) |
| Added `@transaction.atomic` | Remove `@transaction.atomic` decorators |
Each of these is a mutation your standard tool doesn't try. Each of them represents a real past bug — if it slips in again, will tests catch it?
| Domain | Operator | Bug it simulates |
| ---------------- | ----------------------------------------------------------- | --------------------------------------- |
| Web/HTTP | Swap status codes: 200↔201, 400↔404, 401↔403 | Wrong status → client mishandles |
| Web/HTTP | Drop a response header | Missing CORS/cache header |
| Async | Remove `await` (Python) / `.await` (Rust) | Fire-and-forget bug |
| Async | Swap `asyncio.gather` ↔ sequential loop | Lost concurrency, or race introduced |
| DB/ORM | Remove `.select_for_update()` | Lost locking → race condition |
| DB/ORM | Swap `.filter()` ↔ `.exclude()` | Inverted query |
| DB/ORM | Drop `.distinct()` | Duplicate rows |
| Locking | Remove `with lock:` wrapper | Unprotected critical section |
| Resource | Remove `.close()` / drop `with` block | Resource leak |
| Serialization | Swap field order in tuple unpacking | Misaligned deserialization |
| Retry logic | Set `max_attempts=1` | Retry logic never exercised |
| Auth | Replace `check_permission(user)` with `True` | Auth bypass: does any test notice? |
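One way to implement a row from this table without string matching is to work on the syntax tree. The sketch below uses Python's standard `ast` module (`ast.unparse` needs Python 3.9+) for the `.filter()` ↔ `.exclude()` swap. The function name `swap_filter_exclude` is hypothetical, and it matches any method with those names, not only ORM querysets, so expect some false sites.

```python
import ast

SWAPS = {"filter": "exclude", "exclude": "filter"}


def swap_filter_exclude(source: str) -> list[str]:
    """Return one mutated source string per .filter()/.exclude() call site."""
    tree = ast.parse(source)
    sites = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr in SWAPS
    ]
    mutants = []
    for site in sites:
        original = site.func.attr
        site.func.attr = SWAPS[original]   # apply the mutation at this site only
        mutants.append(ast.unparse(tree))  # render the single-site mutant
        site.func.attr = original          # restore before the next site
    return mutants
```

Each mutant changes exactly one call site, so a surviving mutant points at one specific untested query. Note that `ast.unparse` drops comments and original formatting, which is usually acceptable for throwaway mutants.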
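Many mutation tools can be extended with custom operators through a plugin or a config hook. A mutmut-style sketch follows. mutmut's `pre_mutation(context)` hook exposes `context.current_source_line`, but `context.mutations` is not in mutmut's documented hook API; treat it as a placeholder for "register an extra mutant" and adapt it to your tool's extension point.

```python
# mutmut_config.py
import re

# Matches `.get(key, default)`; the key may not contain commas or parentheses.
GET_WITH_DEFAULT = re.compile(r"\.get\(([^,()]+),\s*[^)]+\)")


def pre_mutation(context):
    line = context.current_source_line

    # Custom operator: remove `parse_float=Decimal` from json.loads calls
    # (Simulates bug #4521: float precision loss in financial fields)
    if "json.loads" in line and "parse_float=Decimal" in line:
        context.mutations.append(
            line.replace(", parse_float=Decimal", "")
        )

    # Custom operator: remove `@transaction.atomic` decorator
    # (Simulates bug #3390: partial writes on exception)
    if line.strip() == "@transaction.atomic":
        context.mutations.append("")  # delete the decorator line

    # Custom operator: swap `.get(k, default)` → `.get(k)`
    # (Simulates bug #2817: KeyError on missing config)
    m = GET_WITH_DEFAULT.search(line)
    if m:
        context.mutations.append(
            line[:m.start()] + f".get({m.group(1)})" + line[m.end():]
        )
```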
Each operator cites the bug it came from. When a mutant from operator X survives, you know: "the test suite wouldn't catch bug #4521 if it recurred."
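To keep that link between operator and bug explicit, one option is a small registry where every operator carries its bug reference. This is a tool-independent sketch; `Operator`, `drop_parse_float`, `OPERATORS`, and `survivor_message` are hypothetical names, and the bug number is the one cited in the example above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Operator:
    name: str
    bug_ref: str                             # bug the operator was mined from
    mutate: Callable[[str], Optional[str]]   # returns mutated line, or None


def drop_parse_float(line: str) -> Optional[str]:
    """Remove `parse_float=Decimal` from a json.loads call, if present."""
    needle = ", parse_float=Decimal"
    if "json.loads" in line and needle in line:
        return line.replace(needle, "")
    return None  # operator does not apply to this line


OPERATORS = [Operator("drop-parse-float", "#4521", drop_parse_float)]


def survivor_message(op: Operator) -> str:
    """Explain what a surviving mutant from this operator means."""
    return (
        f"Surviving mutant from {op.name}: the test suite "
        f"would not catch bug {op.bug_ref} if it recurred."
    )
```

Returning `None` for non-matching lines keeps the operators pure functions, so they can be unit-tested on single lines before being wired into any mutation tool.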
Before adding an operator permanently:
- Too broad: `Replace any string with ""` produces thousands of mutants, mostly noise. `Replace auth token with ""` produces a few, all meaningful.
- Too narrow: `Delete line 47 of auth.py` is a test, not an operator. Operators are patterns.
- `swap pop index` is in the list.

## Bug history mined
| Bug # | Fix summary | Inverse mutation |
| ----- | ----------- | ---------------- |
## Proposed operators
### OP-<N>: <name>
Pattern: <what it matches>
Mutation: <what it changes>
Bug class: <cites bug # or CWE>
Expected sites: ~<N> in current codebase
Equivalence risk: <low | medium | high — and why>
## Operator implementation
<tool-specific config — mutmut_config.py, pitest plugin, stryker plugin>
## Dry run
| Operator | Mutants generated | Killed | Survived | Equivalent (est.) |
| -------- | ----------------- | ------ | -------- | ----------------- |
## Recommended for adoption
<subset with good kill/survive/equivalent ratio>
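The "Expected sites" field in the template can be estimated before any mutant is run. The sketch below counts lines matching an operator's pattern across a source tree; `count_sites` is a hypothetical helper, and the regex-per-line approach is an assumption that only suits operators whose pattern fits on one line.

```python
import re
from pathlib import Path


def count_sites(root: str, pattern: str, glob: str = "*.py") -> dict[str, int]:
    """Map each file under root to the number of lines the pattern matches."""
    regex = re.compile(pattern)
    counts: dict[str, int] = {}
    for path in sorted(Path(root).rglob(glob)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = sum(1 for line in text.splitlines() if regex.search(line))
        if hits:
            counts[str(path.relative_to(root))] = hits
    return counts
```

For example, `count_sites("src", r"^\s*@transaction\.atomic\s*$")` gives a per-file site count for the decorator-removal operator. A count in the thousands hints that the pattern is too broad; a count of one or two hints that it is too narrow to be worth keeping.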