roasbeef/mutation-testing

Name: mutation-testing
Author: roasbeef

skills/mutation-testing/SKILL.md

npx skillsauth add roasbeef/claude-files mutation-testing

Mutation Testing

Mutation testing evaluates test quality by introducing small, deliberate bugs into production code (mutants) and checking whether the test suite fails. A test that passes on a mutant did not actually verify the behavior the mutant changed.

This skill is a thin orchestrator over go-gremlins/gremlins — a maintained Go mutation testing tool. The skill provides install, run, and analysis wrappers that produce machine-readable JSON for downstream tooling (notably the test-refine skill).

Why Mutation Testing

A test suite can hit 100% line coverage and still be useless: tests can execute code without asserting on its results, or assert only on fields that are irrelevant to the behavior under test. Mutation testing closes this gap by checking whether the test suite distinguishes the original code from a mutant. See references/coverage-pitfalls.md (in the test-refine skill) for the broader context.
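To make the gap concrete, here is a minimal sketch in Go. The function `isAdult`, its hand-written mutant, and the two check helpers are hypothetical names invented for illustration; the mutant shows the kind of change a boundary mutation might make (`>=` becomes `>`), not actual gremlins output.

```go
package main

import "fmt"

// isAdult is a hypothetical production function used only for illustration.
func isAdult(age int) bool { return age >= 18 }

// isAdultMutant is a hand-written stand-in for a boundary mutation:
// ">=" has been changed to ">".
func isAdultMutant(age int) bool { return age > 18 }

// weakCheck executes the function but only asserts on a value far from
// the boundary, so it cannot tell the original from the mutant.
func weakCheck(f func(int) bool) bool { return f(30) }

// strongCheck also asserts on the boundary value and the value just below it.
func strongCheck(f func(int) bool) bool {
	return f(30) && f(18) && !f(17)
}

func main() {
	fmt.Println("weak check, original:  ", weakCheck(isAdult))
	fmt.Println("weak check, mutant:    ", weakCheck(isAdultMutant)) // passes: the mutant lives
	fmt.Println("strong check, original:", strongCheck(isAdult))
	fmt.Println("strong check, mutant:  ", strongCheck(isAdultMutant)) // fails: the mutant is killed
}
```

Both checks reach full line coverage of `isAdult`; only the second one distinguishes the original from the mutant.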

When to Use

After generating tests with test-forge or by hand — verify they have real assertions.
Before merging consensus / payment / crypto code — quality gate on critical paths.
During code review — surface weak tests in the diff.
As a signal source for test-refine — survivors map to weak-assertion findings.

Target efficacy (gremlins terminology: test_efficacy = killed / (killed + lived)):

| Code class | Target |
|---|---|
| Mission-critical (consensus, wallet, channel, crypto) | 90%+ |
| Core business logic | 80–90% |
| General code | 70–80% |
| Trivial/glue code | run only if cheap |

Workflow

1. Install gremlins (once)

~/.claude/skills/mutation-testing/scripts/install-gremlins.sh

The script pins to a known-good version (override with GREMLINS_VERSION=...). Requires go on PATH and $(go env GOPATH)/bin on PATH.

2. Run mutations

# Default: cwd, JSON to .reviews/mutations/<slug>.json
~/.claude/skills/mutation-testing/scripts/unleash.sh

# Targeted package
~/.claude/skills/mutation-testing/scripts/unleash.sh \
    --pkg ./internal/wallet \
    --output .reviews/mutations/wallet.json

# With integration tests and a config file
~/.claude/skills/mutation-testing/scripts/unleash.sh \
    --pkg ./internal/channel \
    --integration \
    --config .gremlins.yaml \
    --silent

3. Analyze survivors

~/.claude/skills/mutation-testing/scripts/analyze-survivors.sh \
    --input .reviews/mutations/wallet.json \
    --output .reviews/mutations/wallet.md

Produces a markdown report with: efficacy/coverage summary, survivors ranked by file (consensus/channel/wallet paths bubble to the top), and mutator-type breakdown.
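The ranking described above could look roughly like the following Go sketch. The logic of analyze-survivors.sh itself is not shown in this document, so the priority list, the tie-breaking rules, and the names `critical`, `isCritical`, and `rank` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// critical lists path fragments treated as high priority. The three
// fragments come from the text above; matching by substring is an assumption.
var critical = []string{"consensus", "channel", "wallet"}

// isCritical reports whether path contains any high-priority fragment.
func isCritical(path string) bool {
	for _, c := range critical {
		if strings.Contains(path, c) {
			return true
		}
	}
	return false
}

// rank orders files so that critical paths come first, then by descending
// survivor count, then by name so the order is deterministic.
func rank(survivors map[string]int) []string {
	files := make([]string, 0, len(survivors))
	for f := range survivors {
		files = append(files, f)
	}
	sort.Slice(files, func(i, j int) bool {
		a, b := files[i], files[j]
		if ca, cb := isCritical(a), isCritical(b); ca != cb {
			return ca
		}
		if survivors[a] != survivors[b] {
			return survivors[a] > survivors[b]
		}
		return a < b
	})
	return files
}

func main() {
	// Hypothetical survivor counts per file.
	survivors := map[string]int{
		"internal/util/strings.go":  5,
		"internal/wallet/sign.go":   2,
		"internal/channel/state.go": 3,
	}
	for _, f := range rank(survivors) {
		fmt.Printf("%d\t%s\n", survivors[f], f)
	}
}
```

With these inputs the two critical files are listed before the utility file even though the utility file has more survivors.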

Gremlins JSON Schema

gremlins unleash --output <file> emits a single JSON document:

{
  "go_module": "github.com/example/foo",
  "test_efficacy": 91.11,
  "mutations_coverage": 90.00,
  "mutants_total": 102,
  "mutants_killed": 82,
  "mutants_lived": 8,
  "mutants_not_viable": 2,
  "mutants_not_covered": 10,
  "elapsed_time": 123.456,
  "files": [
    {
      "file_name": "wallet.go",
      "mutations": [
        { "line": 42, "column": 8, "type": "CONDITIONALS_NEGATION", "status": "KILLED" }
      ]
    }
  ]
}

Mutation status values:

| Status | Meaning | Action |
|---|---|---|
| KILLED | Test suite caught the mutation | Good; no action |
| LIVED | Tests passed despite mutation | Survivor; strengthen tests |
| NOT COVERED | Mutation in code no test exercises | Add a test for that path |
| TIMED OUT | Tests timed out (implicit kill) | Investigate (might be perf bug) |
| NOT VIABLE | Mutation produced uncompilable code | Excluded from score |
| RUNNABLE | Dry-run only; would be tested | (only in --dry-run) |
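For downstream tooling, a report of this shape can be decoded with Go's standard `encoding/json` package. The sketch below models only fields that appear in the sample document; the Go type names, the `survivors` helper, and the embedded `sample` report are illustrative, and the type string `CONDITIONALS_BOUNDARY` is assumed by analogy with `CONDITIONALS_NEGATION`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Mutation, File, and Report mirror only fields shown in the sample
// document; the Go type names are illustrative.
type Mutation struct {
	Line   int    `json:"line"`
	Column int    `json:"column"`
	Type   string `json:"type"`
	Status string `json:"status"`
}

type File struct {
	FileName  string     `json:"file_name"`
	Mutations []Mutation `json:"mutations"`
}

type Report struct {
	GoModule     string  `json:"go_module"`
	TestEfficacy float64 `json:"test_efficacy"`
	Files        []File  `json:"files"`
}

// sample is a hand-written report fragment, not real tool output.
const sample = `{
  "go_module": "github.com/example/foo",
  "test_efficacy": 50.0,
  "files": [
    {
      "file_name": "wallet.go",
      "mutations": [
        {"line": 42, "column": 8, "type": "CONDITIONALS_NEGATION", "status": "KILLED"},
        {"line": 57, "column": 12, "type": "CONDITIONALS_BOUNDARY", "status": "LIVED"}
      ]
    }
  ]
}`

// survivors returns one "file:line:column TYPE" entry per LIVED mutant.
func survivors(data []byte) ([]string, error) {
	var r Report
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	var out []string
	for _, f := range r.Files {
		for _, m := range f.Mutations {
			if m.Status == "LIVED" {
				out = append(out, fmt.Sprintf("%s:%d:%d %s", f.FileName, m.Line, m.Column, m.Type))
			}
		}
	}
	return out, nil
}

func main() {
	out, err := survivors([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, s := range out {
		fmt.Println(s) // prints: wallet.go:57:12 CONDITIONALS_BOUNDARY
	}
}
```

Fields the structs do not declare are ignored by `json.Unmarshal`, so the same types should keep working if the report carries additional keys.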

Key metrics:

test_efficacy = killed / (killed + lived) — quality of assertions on covered code.
mutations_coverage = (killed + lived) / (killed + lived + not_covered) — how much code is exercised at all.

A high mutations_coverage with low test_efficacy means tests run code without verifying its behavior — the classic "100% line coverage, 0% real testing" failure mode.
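As a worked example, the two formulas can be evaluated on raw counts of 82 killed, 8 lived, and 10 not covered. The helper names `efficacy` and `coverage` are made up for this sketch, and returning 0 when the denominator is zero is an assumption, not documented gremlins behavior.

```go
package main

import "fmt"

// efficacy returns killed / (killed + lived) as a percentage.
// Returning 0 for an empty denominator is a convention chosen for this sketch.
func efficacy(killed, lived int) float64 {
	if killed+lived == 0 {
		return 0
	}
	return 100 * float64(killed) / float64(killed+lived)
}

// coverage returns (killed + lived) / (killed + lived + notCovered)
// as a percentage, with the same zero-denominator convention.
func coverage(killed, lived, notCovered int) float64 {
	total := killed + lived + notCovered
	if total == 0 {
		return 0
	}
	return 100 * float64(killed+lived) / float64(total)
}

func main() {
	fmt.Printf("test_efficacy      = %.2f%%\n", efficacy(82, 8))     // 91.11%
	fmt.Printf("mutations_coverage = %.2f%%\n", coverage(82, 8, 10)) // 90.00%
}
```

NOT VIABLE mutants appear in neither formula, which matches the status table's note that they are excluded from the score.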

Configuration

Gremlins is configured via .gremlins.yaml (or --config <path>). Mutators ship default-on for safe operators and default-off for aggressive ones.

Default-on mutators (enabled unless turned off in config):

arithmetic-base — + - * / %
conditionals-boundary — < <= > >=
conditionals-negation — == !=, boolean conditions
increment-decrement — ++ --
invert-negatives — -x ↔ +x

Default-off mutators — enable for critical packages:

invert-assignments — += -= *= /= etc. swaps
invert-bitwise — & | ^ swaps
invert-bwassign — &= |= ^= swaps
invert-logical — && ↔ || (security-critical: catches auth bypass mutations)
invert-loopctrl — break ↔ continue
remove-self-assignments — drop x = x op y updates

Recommended config for consensus/wallet/payment code:

silent: false
unleash:
  workers: 0          # use all CPUs
  test-cpu: 0         # no per-test CPU pinning
  threshold:
    efficacy: 90      # fail if below 90%
    mutant-coverage: 85
mutants:
  arithmetic-base:        { enabled: true }
  conditionals-boundary:  { enabled: true }
  conditionals-negation:  { enabled: true }
  increment-decrement:    { enabled: true }
  invert-negatives:       { enabled: true }
  invert-assignments:     { enabled: true }
  invert-bitwise:         { enabled: true }
  invert-bwassign:        { enabled: true }
  invert-logical:         { enabled: true }   # critical for && / || in auth
  invert-loopctrl:        { enabled: true }
  remove-self-assignments: { enabled: true }

See gremlins.dev configuration docs for the full schema.

Threshold Gating (CI)

For CI, use --silent and set thresholds in config or via env vars:

gremlins unleash --silent --output mutations.json ./...
# Exit nonzero if efficacy < threshold.

The unleash.threshold.efficacy and unleash.threshold.mutant-coverage keys cause gremlins to exit nonzero when the run falls below the configured percentages — wire this into your PR check.

Integration with Other Skills

`test-refine`

The test-refine skill consumes gremlins JSON to identify weak-assertion zones (smell S12: mutation-survivor). When invoked with --use-mutations, it calls unleash.sh and cross-references LIVED mutants with the AST smell scan.

`test-forge`

After test-forge generates tests, run mutation testing to validate them. LIVED mutants are direct evidence of weak assertions in the generated tests.

`code-review`

Include the test_efficacy delta in PR review — regression of >5% in covered code is a strong signal of weakening test quality.
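A reviewer-side check for this could be as small as the sketch below. It assumes ">5%" means a drop of more than five percentage points between the base branch and the PR head; the function name `regressed` and the sample values are hypothetical.

```go
package main

import "fmt"

// regressed reports whether efficacy dropped by more than maxDrop
// percentage points between the base run and the head run.
// Interpreting ">5%" as percentage points is an assumption.
func regressed(base, head, maxDrop float64) bool {
	return base-head > maxDrop
}

func main() {
	fmt.Println(regressed(91.0, 84.5, 5)) // true: a 6.5-point drop
	fmt.Println(regressed(91.0, 88.0, 5)) // false: a 3-point drop
}
```

The two inputs would typically be the `test_efficacy` values from a report generated on the base branch and one generated on the PR branch, restricted to the same packages.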

Interpreting Results

High efficacy (≥90%): Tests have strong assertions. Focus remaining work on NOT COVERED mutants (uncovered code paths).

Medium (75–90%): Tests cover main paths. Survivors usually indicate boundary or error-path gaps.

Low (<75%): Significant gaps — tests likely run code without checking outputs. Pair with test-refine to identify the specific smells.

Mutator breakdown tells you the kind of weakness:

conditionals-boundary LIVED → missing edge tests at thresholds.
invert-logical LIVED → missing truth-table coverage for &&/||.
arithmetic-base LIVED → tests don't verify calculation results.
remove-self-assignments LIVED → state mutations not asserted.
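The breakdown above can be produced by counting LIVED mutants per type. This is a sketch, not the analysis script: the `Mutation` struct is trimmed to two fields, and the upper-case type strings are assumed to follow the `CONDITIONALS_NEGATION` spelling used in the sample report.

```go
package main

import (
	"fmt"
	"sort"
)

// Mutation holds only the two report fields this sketch needs.
type Mutation struct {
	Type   string
	Status string
}

// hints restates the mapping listed above. The key spellings are assumed
// by analogy with the sample report's "CONDITIONALS_NEGATION".
var hints = map[string]string{
	"CONDITIONALS_BOUNDARY":   "missing edge tests at thresholds",
	"INVERT_LOGICAL":          "missing truth-table coverage for &&/||",
	"ARITHMETIC_BASE":         "tests don't verify calculation results",
	"REMOVE_SELF_ASSIGNMENTS": "state mutations not asserted",
}

// livedByType counts surviving mutants per mutator type.
func livedByType(ms []Mutation) map[string]int {
	counts := make(map[string]int)
	for _, m := range ms {
		if m.Status == "LIVED" {
			counts[m.Type]++
		}
	}
	return counts
}

func main() {
	// Hypothetical mutants, as they might be flattened from a report.
	ms := []Mutation{
		{"CONDITIONALS_BOUNDARY", "LIVED"},
		{"CONDITIONALS_BOUNDARY", "LIVED"},
		{"ARITHMETIC_BASE", "LIVED"},
		{"INVERT_LOGICAL", "KILLED"},
	}
	counts := livedByType(ms)

	types := make([]string, 0, len(counts))
	for t := range counts {
		types = append(types, t)
	}
	// Most frequent survivor type first; ties broken by name for stable output.
	sort.Slice(types, func(i, j int) bool {
		if counts[types[i]] != counts[types[j]] {
			return counts[types[i]] > counts[types[j]]
		}
		return types[i] < types[j]
	})
	for _, t := range types {
		fmt.Printf("%-24s %d  %s\n", t, counts[t], hints[t])
	}
}
```

Killed mutants are ignored, so `INVERT_LOGICAL` does not appear in the output for this input.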

Equivalent Mutants

Some LIVED mutants are semantically equivalent to the original — no test could kill them. Common cases:

Mutated value immediately overwritten before being read.
Mutation in unreachable code.
Operator swap in associative/commutative context with no observable difference.

When you identify an equivalent mutant, document it (e.g., a comment near the mutation site, or a project-level EQUIVALENT_MUTANTS.md) so reviewers don't waste time on it. Gremlins doesn't filter equivalents automatically.
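A small illustration of an equivalent mutant, using a hypothetical `maxOf` function and a hand-written boundary mutant (`>` changed to `>=`). When `x` equals the current maximum, assigning it again changes nothing, so no input can separate the two versions.

```go
package main

import "fmt"

// maxOf returns the largest element of xs. It assumes xs is non-empty.
func maxOf(xs []int) int {
	m := xs[0]
	for _, x := range xs[1:] {
		if x > m {
			m = x
		}
	}
	return m
}

// maxOfMutant is a hand-written boundary mutant of maxOf: ">" became ">=".
// The extra assignment when x == m stores the value m already holds.
func maxOfMutant(xs []int) int {
	m := xs[0]
	for _, x := range xs[1:] {
		if x >= m {
			m = x
		}
	}
	return m
}

func main() {
	inputs := [][]int{{1}, {1, 1}, {3, 1, 3}, {-2, -2, -5}, {0, 7, 7, 2}}
	for _, xs := range inputs {
		// The comparison is true for every input, including those with ties.
		fmt.Println(xs, maxOf(xs) == maxOfMutant(xs))
	}
}
```

A survivor like this is worth recording as equivalent rather than chasing with new tests.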

Gremlins Limitations

From the upstream README: gremlins targets smallish Go modules (microservices). On very large modules, runs can take hours. Mitigations:

Per-package runs via --pkg ./internal/wallet. Don't pass ./... on a 500k-LOC monorepo.
Skip generated code by using build tags or running on hand-written packages only.
Use --workers to bound parallelism if memory is tight.
Use --dry-run first to preview the mutation count and skip if it's too large.

roasbeef/mutation-testing

skills/mutation-testing/SKILL.md

Validates Go test suite quality through mutation testing using go-gremlins/gremlins. Mutates production code, runs the test suite against each mutant, and reports which mutants the tests fail to kill — exposing weak assertions that line coverage cannot detect. Use when evaluating test effectiveness, validating newly written tests, or improving test quality for mission-critical code (consensus, channel state, payment flows, crypto). Triggers: "mutation test", "are these tests strong", "validate test quality", "/mutation-testing".

19 stars

development

Updated May 20, 2026

$ install --global

skillsauth

npx skillsauth add roasbeef/claude-files mutation-testing

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: May 20, 2026, 6:32 AM (54.9s, 7 files scanned)

Related Skills

roasbeef/technical-writing

development

Clear-writing guide distilled from Steven Pinker's "The Sense of Style." Use when writing or revising prose that must be clear to a reader (documentation, design docs, specs, explanations, essays, emails, reports, RFCs, release notes) or when asked to make writing clearer, tighter, less academic, or less jargon-laden. Activate for "make this clearer", "tighten this", "why is this hard to read", "edit this for clarity", or any prose-quality pass.

19 stars, SKILL.md, updated Jun 2, 2026

roasbeef/go-debug

development

Interactively debug Go programs in a single context using Delve (dlv) driven through tmux. Use when a bug requires runtime inspection (stepping through code, examining variables, walking goroutines, attaching to a live process, or debugging a hanging integration test) rather than just reading the source. Triggers include "step through this", "set a breakpoint", "attach to the running server", "why is this goroutine stuck", "debug this failing test".

19 stars, SKILL.md, updated May 20, 2026

roasbeef/variant-analysis

development

Find similar vulnerabilities and bugs across codebases using pattern-based analysis. Use when hunting bug variants, building CodeQL/Semgrep queries, analyzing security vulnerabilities, or performing systematic code audits after finding an initial issue.

19 stars, SKILL.md, updated May 15, 2026

roasbeef/test-refine

development

Refines an existing Go test suite: removes trivial/duplicate tests, strengthens weak assertions, reshapes tests around invariants, and closes branch-coverage gaps. Uses code-guided coverage and (when available) gremlins mutation-testing survivor data rather than relying on line coverage alone. Use when test quality is uneven, after a test-generation pass, before opening a PR, or as a quality gate on critical paths (consensus, channel state, payment flows). Triggers: "refine these tests", "tests are bloated", "tighten assertions", "remove trivial tests", "audit test quality", "/test-refine".

19 stars, SKILL.md, updated May 15, 2026

Install manually

# Clone the repo
git clone https://github.com/roasbeef/claude-files.git

# Copy into Claude Code skills folder (global)
cp -r claude-files/skills/mutation-testing ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

roasbeef/claude-files

19 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
