agentic/code/addons/testing-quality/skills/mutation-test/SKILL.md
Run mutation testing to validate test quality beyond code coverage. Use when assessing test effectiveness, finding weak tests, or validating test suite quality.
Run mutation testing to measure test suite effectiveness. Mutation testing introduces small changes (mutants) to code and checks if tests catch them. High coverage with low mutation score indicates weak tests.

| Concept | Source | Reference |
|---------|--------|-----------|
| Mutation Testing Theory | IEEE TSE (2019) | Papadakis et al. "Mutation Testing Advances" |
| ICST Mutation Workshop | IEEE Annual | Mutation 2024 |
| Stryker Mutator | Industry Tool | stryker-mutator.io |
| PITest | Java Tool | pitest.org |
| mutmut | Python Tool | github.com/boxed/mutmut |


| Natural Language | Action |
|------------------|--------|
| "Run mutation testing" | Execute mutation analysis |
| "Check if my tests are effective" | Run mutation + analyze |
| "Validate test quality" | Mutation score report |
| "Are my tests catching real bugs?" | Mutation analysis |
| "Find weak tests" | Identify low-score tests |
| "Why did this bug escape tests?" | Mutation analysis on module |

A mutant is a small code change that should cause tests to fail:
```javascript
// Original
if (age >= 18) { return "adult"; }

// Mutant 1: Changed >= to >
if (age > 18) { return "adult"; }

// Mutant 2: Changed >= to ==
if (age == 18) { return "adult"; }

// Mutant 3: Changed "adult" to ""
if (age >= 18) { return ""; }
```
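
To make "killed" and "survived" concrete, here is a minimal Python sketch of the three mutants above run against two hypothetical test suites. The function names, the `"minor"` fallback value, and the suites themselves are assumptions made for illustration. They are not part of this skill or of any mutation tool.

```python
def classify(age):
    """Original logic; the "minor" fallback is assumed for illustration."""
    return "adult" if age >= 18 else "minor"

def mutant_gt(age):       # Mutant 1: >= changed to >
    return "adult" if age > 18 else "minor"

def mutant_eq(age):       # Mutant 2: >= changed to ==
    return "adult" if age == 18 else "minor"

def mutant_literal(age):  # Mutant 3: "adult" changed to ""
    return "" if age >= 18 else "minor"

MUTANTS = {"gt": mutant_gt, "eq": mutant_eq, "literal": mutant_literal}

# Each suite is a list of (input, expected output) pairs.
weak_suite = [(30, "adult")]
strong_suite = [(17, "minor"), (18, "adult"), (30, "adult")]

def survivors(suite):
    """Return the names of mutants that pass every case in the suite."""
    return [name for name, fn in MUTANTS.items()
            if all(fn(age) == expected for age, expected in suite)]

print(survivors(weak_suite))    # ['gt']
print(survivors(strong_suite))  # []
```

The weak suite executes the comparison line, so line coverage looks complete, yet the `>` mutant survives because no case exercises `age = 18`.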
| Operator | Example | Tests |
|----------|---------|-------|
| Arithmetic | + → - | Math operations |
| Relational | >= → > | Boundary conditions |
| Logical | && → \|\| | Boolean logic |
| Literal | true → false | Constant handling |
| Return | return x → return null | Return value handling |
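
As a rough illustration of how a relational operator mutation can be applied mechanically, the sketch below uses Python's standard `ast` module to rewrite `>=` into `>`. The class name `RelationalMutator` and the `is_adult` function are hypothetical. Real tools such as mutmut or Stryker use their own machinery and typically generate one mutant per location rather than rewriting every match at once.

```python
import ast

class RelationalMutator(ast.NodeTransformer):
    """Replace every >= comparison with > (one relational mutation)."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op
                    for op in node.ops]
        return node

source = "def is_adult(age):\n    return age >= 18\n"

tree = RelationalMutator().visit(ast.parse(source))
ast.fix_missing_locations(tree)
mutated_source = ast.unparse(tree)  # ast.unparse requires Python 3.9+

# Load both versions into separate namespaces so they can be compared.
original_ns, mutant_ns = {}, {}
exec(source, original_ns)
exec(mutated_source, mutant_ns)

# The two versions disagree only at the boundary value.
print(original_ns["is_adult"](18), mutant_ns["is_adult"](18))  # True False
```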
Mutation Score = (Killed Mutants / Total Mutants) × 100

| Score | Quality | Interpretation |
|-------|---------|----------------|
| 90%+ | Excellent | Tests are highly effective |
| 80-89% | Good | Target for production |
| 60-79% | Adequate | Room for improvement |
| <60% | Poor | Tests need significant work |

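
The formula and rating bands above can be sketched in a few lines of Python. The helper names `mutation_score` and `rate` are hypothetical and exist only for this example.

```python
def mutation_score(killed: int, total: int) -> float:
    """Mutation Score = (Killed Mutants / Total Mutants) x 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * killed / total

def rate(score: float) -> str:
    """Map a score to the quality bands from the table above."""
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 60:
        return "Adequate"
    return "Poor"

score = mutation_score(killed=120, total=150)
print(score, rate(score))  # 80.0 Good
```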
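```python
def setup_mutation_tool(project_type):
    """Return the setup step for the mutation tool matching the project type."""
    if project_type == "javascript":
        # Install Stryker
        return "npx stryker init"
    elif project_type == "python":
        # Install mutmut
        return "pip install mutmut"
    elif project_type == "java":
        # PITest via Maven/Gradle
        return "Add pitest plugin to pom.xml"
    else:
        raise ValueError(f"Unsupported project type: {project_type}")
```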
Stryker (JavaScript), in `stryker.config.json`:
```json
{
  "mutate": ["src/**/*.ts", "!src/**/*.test.ts"],
  "testRunner": "vitest",
  "reporters": ["html", "progress"],
  "coverageAnalysis": "perTest",
  "thresholds": {
    "high": 80,
    "low": 60,
    "break": 50
  }
}
```
mutmut (Python):
```ini
# setup.cfg
[mutmut]
paths_to_mutate=src/
tests_dir=tests/
runner=pytest
```
PITest (Java):
```xml
<!-- pom.xml -->
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <version>1.15.0</version>
  <configuration>
    <targetClasses>
      <param>com.example.*</param>
    </targetClasses>
    <mutationThreshold>80</mutationThreshold>
  </configuration>
</plugin>
```
```bash
# JavaScript
npx stryker run

# Python
mutmut run

# Java
mvn org.pitest:pitest-maven:mutationCoverage
```
```python
def parse_mutation_results(report_path):
    """Parse a mutation testing report.

    The values below are illustrative; they show the shape of the result.
    """
    return {
        "total_mutants": 150,
        "killed": 120,
        "survived": 25,
        "timeout": 5,
        "mutation_score": 80.0,
        "survivors": [
            {
                "file": "src/auth/validate.ts",
                "line": 45,
                "mutator": "RelationalOperator",
                "original": "age >= 18",
                "mutant": "age > 18",
                "status": "survived"
            }
            # ... more survivors
        ]
    }
```
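
The summary shape above could be derived from a report rather than hard-coded. The sketch below assumes a report laid out as `files -> <path> -> mutants[]`, where each mutant carries `status`, `mutatorName`, and `location.start.line`. This is my understanding of the JSON that Stryker's `json` reporter emits, but treat the layout as an assumption and check it against your own report. `summarize_report` and `sample_report` are hypothetical names, and the sample data is invented for the example.

```python
from collections import Counter

def summarize_report(report: dict) -> dict:
    """Summarize a mutation report shaped like the assumed layout."""
    counts = Counter()
    survivors = []
    for path, entry in report.get("files", {}).items():
        for mutant in entry.get("mutants", []):
            status = mutant["status"]
            counts[status] += 1
            if status == "Survived":
                survivors.append({
                    "file": path,
                    "line": mutant["location"]["start"]["line"],
                    "mutator": mutant["mutatorName"],
                    "status": "survived",
                })
    total = sum(counts.values())
    # Same formula as this document: killed / total * 100.
    score = 100 * counts["Killed"] / total if total else 0.0
    return {
        "total_mutants": total,
        "killed": counts["Killed"],
        "survived": counts["Survived"],
        "timeout": counts["Timeout"],
        "mutation_score": round(score, 1),
        "survivors": survivors,
    }

def _mutant(status, line, mutator="RelationalOperator"):
    """Build one invented mutant entry for the sample report."""
    return {"status": status, "mutatorName": mutator,
            "location": {"start": {"line": line, "column": 1}}}

sample_report = {
    "files": {
        "src/auth/validate.ts": {
            "mutants": [
                _mutant("Killed", 12),
                _mutant("Killed", 30),
                _mutant("Killed", 31),
                _mutant("Survived", 45),
            ]
        }
    }
}

summary = summarize_report(sample_report)
print(summary["mutation_score"], summary["survivors"][0]["line"])  # 75.0 45
```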
## Mutation Testing Report
**Module**: src/auth/
**Test Suite**: test/auth/
### Summary
| Metric | Value |
|--------|-------|
| Total Mutants | 150 |
| Killed | 120 (80%) |
| Survived | 25 (17%) |
| Timeout | 5 (3%) |
| **Mutation Score** | **80%** |
### Status: PASSED (threshold: 80%)
### Survived Mutants (Highest Priority)
#### 1. `src/auth/validate.ts:45`
```diff
- if (age >= 18) { return "adult"; }
+ if (age > 18) { return "adult"; }
```
**Problem**: Boundary condition not tested

**Fix**: Add test case for age = 18

#### 2. `src/auth/login.ts:23`
```diff
- if (attempts < maxAttempts) { allow(); }
+ if (attempts <= maxAttempts) { allow(); }
```
**Problem**: Off-by-one boundary not tested

**Fix**: Add test for attempts = maxAttempts

Survivors by file:

- validate.ts (3 survivors)
- login.ts (2 survivors)
- session.ts (1 survivor)

| File | Line Coverage | Mutation Score | Gap |
|------|---------------|----------------|-----|
| validate.ts | 95% | 72% | 23% |
| login.ts | 88% | 85% | 3% |
| session.ts | 100% | 91% | 9% |

High coverage with a low mutation score indicates weak assertions.
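
One way to act on the coverage-versus-mutation comparison is to rank files by the gap between the two numbers. The sketch below uses the three rows from the report. The `rank_by_gap` helper is a hypothetical name, not part of this skill's scripts.

```python
# (file, line coverage %, mutation score %) taken from the report table
files = [
    ("validate.ts", 95, 72),
    ("login.ts", 88, 85),
    ("session.ts", 100, 91),
]

def rank_by_gap(rows):
    """Sort files by coverage minus mutation score, largest gap first."""
    gaps = ((name, coverage - score) for name, coverage, score in rows)
    return sorted(gaps, key=lambda item: item[1], reverse=True)

ranked = rank_by_gap(files)
for name, gap in ranked:
    print(f"{name}: {gap}% gap")
# validate.ts: 23% gap
# session.ts: 9% gap
# login.ts: 3% gap
```

Files at the top of this ranking are where tests run the code without asserting much about it, so they are the natural place to start adding assertions.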
## Integration with CI
### GitHub Actions Integration
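```yaml
- name: Run mutation testing
  run: npx stryker run --reporters json
- name: Check mutation threshold
  run: |
    # The json reporter writes reports/mutation/mutation.json by default.
    # The score is computed from mutant statuses as killed / total * 100.
    SCORE=$(jq '[.files[].mutants[]] | (map(select(.status == "Killed")) | length) * 100 / length' reports/mutation/mutation.json)
    if (( $(echo "$SCORE < 80" | bc -l) )); then
      echo "::error::Mutation score $SCORE% below 80% threshold"
      exit 1
    fi
```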
Only test changed code:
```bash
# Stryker incremental
npx stryker run --incremental

# PITest history
mvn org.pitest:pitest-maven:mutationCoverage -DwithHistory
```
Restrict mutation to critical modules:
```json
{
  "mutate": [
    "src/auth/**/*.ts",
    "src/payment/**/*.ts",
    "src/validation/**/*.ts"
  ]
}
```
Related skills:

- `tdd-enforce` - Enforce test-first development
- `flaky-detect` - Identify unreliable tests
- `test-sync` - Maintain test-code alignment

Run mutation testing for a project:
```bash
python scripts/mutation_runner.py --module src/auth
```
Analyze and prioritize survivors:
```bash
python scripts/mutation_analyzer.py --report stryker-report.json
```