santosomar/mutation-test-suite-optimizer

Name: mutation-test-suite-optimizer
Author: santosomar

skills/testing/mutation-test-suite-optimizer/SKILL.md

npx skillsauth add santosomar/general-secure-coding-agent-skills mutation-test-suite-optimizer

Mutation Test Suite Optimizer

Coverage says "this line ran." Mutation testing says "this line ran AND if it were wrong, a test would catch it." Surviving mutants are lines where the test suite is blind.

How it works

1. Mutate: apply a small change to the code, such as `>` → `>=`, `+` → `-`, or `return x` → `return None`.
2. Run tests: does any test fail?
3. Killed if a test fails: the suite noticed the bug. Survived if all tests pass: the suite is blind here.
4. Mutation score = killed / total mutants. Higher is better; above 80% is good.
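The loop above can be sketched in a few lines of Python. This is a toy illustration, not how a real tool works internally: the mutants are written by hand, and `is_adult`, `MUTANTS`, `weak_suite`, `strong_suite`, and `run_mutants` are hypothetical names introduced here.

```python
def is_adult(age):
    """Code under test."""
    return age >= 18

# Hand-written mutants standing in for what a tool would generate.
MUTANTS = {
    ">= -> >": lambda age: age > 18,
    ">= -> <=": lambda age: age <= 18,
    "return x -> return None": lambda age: None,
}

def weak_suite(fn):
    """True if every check passes. Never probes the boundary at 18."""
    return fn(30) is True and fn(5) is False

def strong_suite(fn):
    """Same checks plus the boundary value."""
    return weak_suite(fn) and fn(18) is True

def run_mutants(suite, mutants):
    """Return (killed, total): a mutant is killed when the suite fails on it."""
    killed = sum(1 for fn in mutants.values() if not suite(fn))
    return killed, len(mutants)

for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
    killed, total = run_mutants(suite, MUTANTS)
    print(f"{name}: killed {killed}/{total}, score {killed / total:.0%}")
```

Both suites pass on the original function. Only the one that checks `18` notices the `>` mutant.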

Standard mutation operators

| Operator | Mutation | Exposes |
| --- | --- | --- |
| AOR (arithmetic) | `+` ↔ `-`, `*` ↔ `/` | Tests that don't check actual values |
| ROR (relational) | `<` ↔ `<=` ↔ `==` ↔ `!=` ↔ `>=` ↔ `>` | Off-by-one, boundary untested |
| COR (conditional) | `and` ↔ `or`, negate condition | Branches where only one arm matters |
| LVR (literal value) | `0` → `1`, `1` → `0`, `"x"` → `""` | Magic numbers with no assertion |
| SDL (statement delete) | Remove a line | Dead code, or unchecked side effects |
| RVR (return value) | `return x` → `return None`, `return 0` | Caller ignores return value |

→ smart-mutation-operator-generator for domain-specific mutations beyond these.
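To make one operator concrete, here is a minimal sketch of a single ROR mutation (`<` → `<=`) applied with Python's standard `ast` module. It is an illustration only and does not claim to mirror how mutmut or any other tool implements its operators; `is_minor`, `LtToLtE`, and `load_function` are hypothetical names.

```python
import ast

SOURCE = '''
def is_minor(age):
    return age < 18
'''

class LtToLtE(ast.NodeTransformer):
    """ROR mutation: rewrite every `<` comparison as `<=`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op for op in node.ops]
        return node

def load_function(tree, name):
    """Compile a module AST and return one function defined in it."""
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<mutation-sketch>", "exec"), namespace)
    return namespace[name]

original = load_function(ast.parse(SOURCE), "is_minor")
mutant = load_function(LtToLtE().visit(ast.parse(SOURCE)), "is_minor")

print(original(17), mutant(17))  # same answer away from the boundary
print(original(18), mutant(18))  # the two differ only at the boundary
```

A test that never passes `18` cannot tell these two functions apart, which is the "boundary untested" case in the table.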

Step 1 — Run the mutation tool

| Ecosystem | Tool |
| --- | --- |
| Python | mutmut, cosmic-ray |
| Java | pitest |
| JS/TS | stryker |
| Ruby | mutant |
| C/C++ | mull, dextool mutate |

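These tools can take hours on big codebases. Scope the run to changed files: `mutmut run --paths-to-mutate src/pricing.py`. The flag shown is from mutmut 2.x; other versions may expect the path in configuration instead, so check `mutmut run --help` for your version.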
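One way to scope a run is to build one command per changed source file. The sketch below assumes the list of changed paths comes from somewhere else (for example `git diff --name-only`), that sources live under `src`, and that test files start with `test_`. `mutation_commands` is a hypothetical helper, and it reuses the `--paths-to-mutate` form shown above without checking whether your mutmut version accepts it.

```python
from pathlib import PurePosixPath

def mutation_commands(changed_files, source_root="src"):
    """Build one mutmut command per changed Python source file."""
    commands = []
    for path in changed_files:
        p = PurePosixPath(path)
        is_source = p.suffix == ".py" and p.parts[:1] == (source_root,)
        is_test = p.name.startswith("test_")
        if is_source and not is_test:
            # Assumption: same flag form as the one-file example above.
            commands.append(["mutmut", "run", "--paths-to-mutate", path])
    return commands

changed = [
    "src/pricing.py",
    "src/test_pricing.py",
    "README.md",
    "tests/test_pricing.py",
]
for cmd in mutation_commands(changed):
    print(" ".join(cmd))
```

Only `src/pricing.py` produces a command here. Test files and non-Python files are filtered out.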

Step 2 — Triage survivors

Not every survivor needs a test. Classify:

| Survivor type | Action |
| --- | --- |
| Equivalent mutant | `x = x * 1` → `x = x / 1`: same behavior. Ignore. |
| Dead code mutant | Mutated line never runs. → dead-code-eliminator |
| Weak assertion | Test ran the line but didn't check the result. Fix the test. |
| Missing boundary | `<` vs `<=` both pass: never tested the boundary. Add test. |
| Unchecked side effect | Mutant deletes a log call, nothing notices. Decide: is this worth testing? |

Worked example — killing a survivor

Code:

def discount(price, tier):
    if tier == "gold":
        return price * 0.8
    return price

Mutation report:

SURVIVED: discount.py:3 — `price * 0.8` → `price * 0.9`

Existing test:

def test_gold_discount():
    assert discount(100, "gold") < 100   # ← too weak

Both 0.8 and 0.9 give something < 100. The test is imprecise.

Fix — strengthen the assertion:

def test_gold_discount():
    assert discount(100, "gold") == 80   # 20% off, exactly

Now * 0.9 → 90 != 80 → test fails → mutant killed.

Another survivor:

SURVIVED: discount.py:2 - `tier == "gold"` → `True`

The only test uses a gold customer, so the non-gold path (return price) never runs. With the condition forced to True, gold still gets 80 and the test passes. The condition needs a second input that should take the other branch:

def test_non_gold_no_discount():
    assert discount(100, "silver") == 100

Now the mutant gives silver 0.8 * 100 = 80, and 80 != 100 → test fails → mutant killed.
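The checks from this example can be run directly. The sketch below is an illustration with hand-written mutants rather than tool output. `mutant_rate`, `mutant_always_discount`, and `survives` are names introduced here, and the second mutant (a condition that always takes the discount branch) is an assumed stand-in for a mutation of the `tier` check.

```python
def discount(price, tier):
    if tier == "gold":
        return price * 0.8
    return price

def mutant_rate(price, tier):
    """Literal mutation: 0.8 -> 0.9."""
    if tier == "gold":
        return price * 0.9
    return price

def mutant_always_discount(price, tier):
    """Condition mutation (assumed for illustration): the check is always true."""
    return price * 0.8

# Each check returns True when the assertion it models would pass.
def weak_gold(fn):
    return fn(100, "gold") < 100

def exact_gold(fn):
    return fn(100, "gold") == 80

def non_gold(fn):
    return fn(100, "silver") == 100

def survives(mutant, checks):
    """A mutant survives when every check still passes on it."""
    return all(check(mutant) for check in checks)

print(survives(mutant_rate, [weak_gold]))                        # weak assertion
print(survives(mutant_rate, [exact_gold]))                       # exact assertion
print(survives(mutant_always_discount, [exact_gold]))            # gold input only
print(survives(mutant_always_discount, [exact_gold, non_gold]))  # second input added
```

The first and third lines print `True` (the mutant survives); the second and fourth print `False` (killed).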

Equivalent mutants — don't fight them

Some mutants can't be killed because they're behaviorally identical:

for i in range(len(xs)):   # mutant: range(len(xs)) → range(0, len(xs))

range(n) and range(0, n) are the same. No test can distinguish them. Mark as equivalent and move on.

Detecting equivalence is undecidable in general. A practical heuristic: if you've spent 5 minutes trying to kill a mutant and every test you write passes on both versions, it's probably equivalent.
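A cheap way to back up that judgment is to compare the original and the mutant over a batch of small inputs. This is a sketch of a heuristic, not a proof of equivalence: finding no distinguishing input among the candidates only suggests the mutant is equivalent. `sum_items`, its two mutants, and `find_distinguishing_input` are hypothetical names.

```python
from itertools import product

def sum_items(xs):
    total = 0
    for i in range(len(xs)):
        total += xs[i]
    return total

def sum_items_equivalent(xs):
    """Mutant: range(len(xs)) -> range(0, len(xs))."""
    total = 0
    for i in range(0, len(xs)):
        total += xs[i]
    return total

def sum_items_skip_first(xs):
    """Mutant: range(len(xs)) -> range(1, len(xs)). Not equivalent."""
    total = 0
    for i in range(1, len(xs)):
        total += xs[i]
    return total

def find_distinguishing_input(original, mutant, candidates):
    """Return the first input where the outputs differ, or None."""
    for candidate in candidates:
        if original(candidate) != mutant(candidate):
            return candidate
    return None

# All lists of length 0 to 3 drawn from a few small values.
candidates = [list(t) for n in range(4) for t in product([-1, 0, 2], repeat=n)]

print(find_distinguishing_input(sum_items, sum_items_equivalent, candidates))
print(find_distinguishing_input(sum_items, sum_items_skip_first, candidates))
```

The first call prints `None`. The second prints `[-1]`, an input you can turn straight into a killing test.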

Budget — mutation testing is slow

Full mutation on a big codebase: hours to days. Scope it:

- Changed files only on every PR.
- Weekly full run on the whole codebase.
- Sample: mutate a random 10% of operators to get a statistical estimate of the mutation score.
- Timeout per mutant: if a mutant makes tests hang, kill it. It counts as killed because it changed behavior, even if the change is "infinite loop".
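The sampling and timeout ideas can be sketched with the standard library alone. Real mutation tools have their own mechanisms for both, so treat this as an illustration under assumptions: `sample_mutants` and `run_with_timeout` are hypothetical helpers, and the test command is whatever your project uses.

```python
import random
import subprocess
import sys

def sample_mutants(mutant_ids, fraction=0.1, seed=0):
    """Pick a reproducible random subset of mutants to run."""
    ids = list(mutant_ids)
    if not ids:
        return []
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return random.Random(seed).sample(ids, k)

def run_with_timeout(test_cmd, timeout_s):
    """Run a test command against one mutant and classify the result."""
    try:
        result = subprocess.run(test_cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child process is killed; report it separately, score it as killed.
        return "timeout"
    return "survived" if result.returncode == 0 else "killed"

print(len(sample_mutants(range(200))))

# Stand-in for a test run that hangs on a mutant.
hanging_run = [sys.executable, "-c", "while True: pass"]
print(run_with_timeout(hanging_run, timeout_s=0.5))
```

Fixing the seed keeps the sample stable between runs, so a score change reflects the tests rather than a different draw.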

Do not

- Do not chase 100% mutation score. 20–30% of survivors are equivalent mutants. 80% is excellent.
- Do not kill mutants with assertions that test the implementation. `assert discount.__code__.co_consts[1] == 0.8` kills the mutant and is a terrible test.
- Do not mutate test code. Mutate source; run tests. Mutating tests is circular.
- Do not treat SDL (statement delete) survivors on logging/metrics as bugs. Yes, deleting `logger.info(...)` doesn't fail any test. No, you probably don't want to assert on every log line.

Output format

## Mutation run
Tool: <mutmut/pitest/stryker>  Scope: <files>
Mutants: <total>  Killed: <N>  Survived: <M>  Timeout: <T>  Score: <%>

## Survivors — triaged
### Weak assertions (fix the test)
| Mutant | Location | Existing test | Why it survived | Fixed assertion |
| ------ | -------- | ------------- | --------------- | --------------- |

### Missing tests (add a test)
| Mutant | Location | Missing case | New test |
| ------ | -------- | ------------ | -------- |

### Equivalent (ignore)
| Mutant | Why equivalent |
| ------ | -------------- |

### Dead code (remove)
| Mutant | Evidence |
| ------ | -------- |

## After fixes
Projected score: <%>  (killed +<N>, marked equivalent +<M>)
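The projected score in the last line of the template can be computed as follows. The formula is an assumption: this sketch removes mutants marked equivalent from the denominator, which is a common convention, but the skill itself only defines the score as killed / total. `projected_score` is a hypothetical helper.

```python
def projected_score(total, killed, newly_killed=0, equivalent=0):
    """Score after fixes, excluding mutants marked equivalent (assumed convention)."""
    scorable = total - equivalent
    if scorable <= 0:
        raise ValueError("no scorable mutants left")
    return (killed + newly_killed) / scorable

before = projected_score(total=50, killed=38)
after = projected_score(total=50, killed=38, newly_killed=6, equivalent=4)
print(f"{before:.0%} -> {after:.0%}")  # 76% -> 96%
```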

Uses mutation testing to find weak assertions and missing tests — injects small bugs and checks if the suite catches them, then generates tests targeting the surviving mutants. Use when coverage is high but bugs still ship, when auditing test quality, or when deciding if the suite is good enough.

development

Updated Apr 13, 2026

npx skillsauth add santosomar/general-secure-coding-agent-skills mutation-test-suite-optimizer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 13, 2026, 4:37 AM (97.8s, 1 file scanned)

SKILL.md

name: mutation-test-suite-optimizer
description: Uses mutation testing to find weak assertions and missing tests: injects small bugs and checks if the suite catches them, then generates tests targeting the surviving mutants. Use when coverage is high but bugs still ship, when auditing test quality, or when deciding if the suite is good enough.
license: Apache-2.0
category: testing
suite: general-secure-coding-agent-skills
version: 0.3.0
related: smart-mutation-operator-generator, coverage-enhancer, test-oracle-generator

Related Skills

santosomar/verified-pseudocode-extractor

development

Extracts human-readable pseudocode from a verified formal artifact (Dafny, Lean, TLA+) while preserving the verified properties as annotations, so the proof-carrying logic can be reimplemented in a production language. Use when porting verified code to an unverified target, when documenting what a formal spec actually does, or when handing a verified algorithm to an implementer.

SKILL.md, updated Apr 13, 2026

santosomar/tlaplus-spec-generator

development

Translates natural-language or pseudocode descriptions of concurrent and distributed systems into TLA+ specifications ready for the TLC model checker. Identifies state variables, actions, type invariants, safety properties, and liveness properties from the description. Use when formalizing a protocol, when the user describes a distributed algorithm to verify, when designing a consensus or locking scheme, or when starting formal verification of a concurrent system.

SKILL.md, updated Apr 13, 2026

santosomar/tlaplus-model-reduction

testing

Reduces a TLA+ model so TLC can actually check it — shrinks constants, adds state constraints, abstracts data, or applies symmetry — when the state space is too large to enumerate. Use when TLC runs out of memory, when checking takes hours, or when a spec works at N=2 and you need confidence at larger scale.

SKILL.md, updated Apr 13, 2026

santosomar/tlaplus-guided-code-repair

development

TLA+-specific instance of model-guided repair — reads a TLC error trace, identifies the enabling condition that should have been false, strengthens the corresponding action, and maps the fix to source code. Use when TLC reports an invariant violation or deadlock and you have the code-to-TLA+ mapping from extraction.

SKILL.md, updated Apr 13, 2026

Install manually

# Clone the repo
git clone https://github.com/santosomar/general-secure-coding-agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r general-secure-coding-agent-skills/skills/testing/mutation-test-suite-optimizer ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

santosomar/general-secure-coding-agent-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT