skills/hardening/SKILL.md
Agent-driven test hardening for functions, specs, and acceptance surfaces: property testing, mutation testing, acceptance mutation, CRAP/SCRAP risk, and structural DRY analysis. Use when: "property test this", "mutation test", "CRAP analysis", "DRY analysis", "harden tests", "find weak tests", "kill surviving mutants", "acceptance mutation", "uncle bob hardening". Trigger: /hardening.
npx skillsauth add phrazzld/spellbook hardeningInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Find places where the code or tests can lie, then add the smallest hardening loop that makes the lie executable.
This skill absorbs the Uncle Bob pattern: agents are good at selecting a bounded target, deriving domains and invariants, generating focused hardening checks, running them, and repairing the bugs or weak tests they expose. The output is not a generic "more tests" report. It is a short list of hardened surfaces with commands, survivors, fixes, and residual risk.
You are the lead hardening engineer.
delegate on judgment per the shared Roster contract: native subagents
by default; add cross-model critics, roster providers, or sprite lanes
(/sprites) only when they answer a distinct question. See
harnesses/shared/AGENTS.md (Roster).
Local lane guidance: Use one lane to identify hardening candidates and one lane to challenge the chosen oracle, domains, mutants, and residual risk.
| Mode | Use When | Primary Output |
|---|---|---|
| property | Pure or mostly pure behavior has a describable domain, range, invariant, round trip, monotonicity, idempotence, parser law, or conservation rule. | Property tests plus counterexample fixes. |
| mutation | Existing tests may be shallow, branch logic changed, or a file has enough coverage to justify mutant execution. | Mutation run, survivor fixes, and killed-mutant evidence. |
| acceptance | Gherkin, examples, fixtures, API contracts, CLI transcripts, or golden paths claim user behavior. | Mutated examples/contracts prove acceptance checks are connected. |
| risk | You need to find risky code before choosing targets. | CRAP/SCRAP/complexity/coverage-ranked target list. |
| dry | Duplication may hide copy-paste bugs or force scattered fixes. | Structural duplicate candidates and refactor/no-refactor calls. |
| full | User asks for a hardening pass without naming a mode. | Risk-ranked sequence: risk -> property/mutation/acceptance -> dry if relevant. |
Other workflow skills may route here without making hardening part of the default fast path:
| Signal | Route |
|---|---|
| /implement names broad-domain or invariant-heavy behavior | /hardening property |
| /code-review finds branch-heavy changed code with shallow tests | /hardening mutation or /hardening risk |
| /qa relies on examples, fixtures, Gherkin, contracts, or golden files whose values should matter | /hardening acceptance |
| /ci reports missing hardening visibility for a repo that needs it | /hardening risk |
| /deliver carries a blocking phase verdict naming a test-strength gap | the mode named by that phase |
| A context packet declares Formal Spec Required: yes | run the formal-spec ladder in references/formal-spec-ladder.md |
Start narrow. Pick a function, module, spec file, feature, route, or CLI surface that satisfies at least two of:
Reject targets that are mostly I/O glue unless you can isolate a deterministic core or test the boundary with recorded fixtures. Do not property-test framework wiring just because the mode was requested.
For each candidate, write the oracle before code changes:
## Property Oracle
- Target:
- Domain:
- Exclusions / invalid inputs:
- Generator strategy:
- Invariant or relation:
- Shrinking / counterexample capture:
- Exact command:
- Why examples alone are insufficient:
Good property targets usually have one of these shapes:
If you cannot name the property in one sentence, do not invent a vague random test. Fall back to example tests or mutation testing.
Mutation is a deliberate hardening workflow, not the default fast gate.
Acceptance mutation checks whether user-facing examples are connected to real assertions.
Risk analysis chooses where to spend hardening effort.
Every hardening report includes:
See harnesses/shared/AGENTS.md (Completion Evidence) for the shared evidence
core; this phase keeps hardening-specific local fields.
## Completion Gate
- Target hardened: function, module, fixture, contract, or acceptance surface hardened.
- Hardening mode: property, mutation, acceptance, risk, dry, or full.
- Oracle / property / mutant / metric used: invariant, mutant class, fixture mutation, CRAP/SCRAP, or duplicate metric.
- Evidence that proves it: counterexample, survivor, weak-test finding, killed mutant, or risk metric observed.
- Exact command/path/route exercised: command, fixture path, contract path, or route exercised.
- Fixes made: production fix, test strengthening, equivalent-mutant filter, or no-code result.
- Rerun evidence: rerun command output or artifact proving the hardening now holds.
- Residual risk: surviving mutant, uncovered domain, equivalent mutant, or none with reason.
- Roster receipt ids: provider receipt ids or explicit roster waiver.
No "hardened" claim without the exact command and rerun evidence.
Formal Spec Required: yes and the
triggering criteria.unclebob/Acceptance-Pipeline-Specification: portable Gherkin-to-IR
acceptance mutation with persistent runner workers.unclebob/mutate4go, mutate4java, and clj-mutate: bounded file-level
mutation with coverage, scans, differential manifests, and survivor loops.unclebob/crap4go, crap4java, and crap4clj: complexity plus coverage as
a target-selection metric.unclebob/scrap: structural test/spec smell scoring for refactor decisions.unclebob/dry4go and dry4java: structural duplicate discovery as evidence,
not an automatic refactor.Run cargo run --locked -p harness-kit-checks -- check-evidence-blocks skills to prove /hardening
keeps its local Completion Gate. Semantic proof is the run's counterexample,
survivor, killed mutant, risk metric, or explicit equivalent-mutant waiver.
tools
Enumerates the peer AI agent CLIs installed on this machine (codex, claude, pi, opencode, cursor-agent, grok, agy, hermes, thinktank) and how to invoke each headlessly. A capability map, not a quota: useful for fresh-context adversarial review on a different model family, second opinions, competing attempts, and wide benches. Use when: "ask codex", "ask another model", "second opinion", "cross-model review", "what AI tools do I have", "other agents", "different model family", "adversarial critique from another provider". Trigger: /roster.
development
Run lane cards on Fly Sprites: remote, isolated, scale-to-zero sandboxes for heavy or parallel agent work. Golden-checkpoint provisioning so lanes start on a ready sprite with zero setup tokens. Use when: "run this on a sprite", "remote lane", "offload to a sandbox", "dispatch to sprites", "bake a sprite", "sprite fleet", heavy/long-running/parallel sub-agent work that should not run on this machine. Trigger: /sprites, /sprite-lane.
testing
Compose and launch roster-backed specialist lanes with prompt-native lane cards and receipts. Use when: "dispatch agents", "use subagents", "compose a team", "run provider lanes", "make lane cards". Trigger: /dispatch, /subagents, /lanes.
tools
Fast session-start repository orientation from live local evidence. Use when: "orient yourself", "start of session", "new session", "where are we", "catch me up before acting", "what should I do next", after compaction, after switching worktrees, or before choosing a Harness Kit workflow. Trigger: /orient, /ground, /session-start.