skills/hardening/SKILL.md
Agent-driven test hardening for functions, specs, and acceptance surfaces: property testing, mutation testing, acceptance mutation, CRAP/SCRAP risk, and structural DRY analysis. Use when: "property test this", "mutation test", "CRAP analysis", "DRY analysis", "harden tests", "find weak tests", "kill surviving mutants", "acceptance mutation", "uncle bob hardening". Trigger: /hardening.
npx skillsauth add phrazzld/agent-skills hardeningInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Find places where the code or tests can lie, then add the smallest hardening loop that makes the lie executable.
This skill absorbs the Uncle Bob pattern: agents are good at selecting a bounded target, deriving domains and invariants, generating focused hardening checks, running them, and repairing the bugs or weak tests they expose. The output is not a generic "more tests" report. It is a short list of hardened surfaces with commands, survivors, fixes, and residual risk.
You are the lead hardening engineer.
Delegation floor applies: probe the roster first; dispatch two or more
providers for substantive work; direct solo only for mechanical, emergency,
user-forbidden, or fewer-than-two-providers cases. See
harnesses/shared/AGENTS.md (Roster).
Local lane guidance: Use one lane to identify hardening candidates and one lane to challenge the chosen oracle, domains, mutants, and residual risk.
| Mode | Use When | Primary Output |
|---|---|---|
| property | Pure or mostly pure behavior has a describable domain, range, invariant, round trip, monotonicity, idempotence, parser law, or conservation rule. | Property tests plus counterexample fixes. |
| mutation | Existing tests may be shallow, branch logic changed, or a file has enough coverage to justify mutant execution. | Mutation run, survivor fixes, and killed-mutant evidence. |
| acceptance | Gherkin, examples, fixtures, API contracts, CLI transcripts, or golden paths claim user behavior. | Mutated examples/contracts prove acceptance checks are connected. |
| risk | You need to find risky code before choosing targets. | CRAP/SCRAP/complexity/coverage-ranked target list. |
| dry | Duplication may hide copy-paste bugs or force scattered fixes. | Structural duplicate candidates and refactor/no-refactor calls. |
| full | User asks for a hardening pass without naming a mode. | Risk-ranked sequence: risk -> property/mutation/acceptance -> dry if relevant. |
Other workflow skills may route here without making hardening part of the default fast path:
| Signal | Route |
|---|---|
| /implement names broad-domain or invariant-heavy behavior | /hardening property |
| /code-review finds branch-heavy changed code with shallow tests | /hardening mutation or /hardening risk |
| /qa relies on examples, fixtures, Gherkin, contracts, or golden files whose values should matter | /hardening acceptance |
| /ci reports missing hardening visibility for a repo that needs it | /hardening risk |
| /deliver carries a blocking phase verdict naming a test-strength gap | the mode named by that phase |
| A context packet declares Formal Spec Required: yes | run the formal-spec ladder in references/formal-spec-ladder.md |
Start narrow. Pick a function, module, spec file, feature, route, or CLI surface that satisfies at least two of:
Reject targets that are mostly I/O glue unless you can isolate a deterministic core or test the boundary with recorded fixtures. Do not property-test framework wiring just because the mode was requested.
For each candidate, write the oracle before code changes:
## Property Oracle
- Target:
- Domain:
- Exclusions / invalid inputs:
- Generator strategy:
- Invariant or relation:
- Shrinking / counterexample capture:
- Exact command:
- Why examples alone are insufficient:
Good property targets usually have one of these shapes:
If you cannot name the property in one sentence, do not invent a vague random test. Fall back to example tests or mutation testing.
Mutation is a deliberate hardening workflow, not the default fast gate.
Acceptance mutation checks whether user-facing examples are connected to real assertions.
Risk analysis chooses where to spend hardening effort.
Every hardening report includes:
See harnesses/shared/AGENTS.md (Completion Evidence) for the shared evidence
core; this phase keeps hardening-specific local fields.
## Completion Gate
- Target hardened: function, module, fixture, contract, or acceptance surface hardened.
- Hardening mode: property, mutation, acceptance, risk, dry, or full.
- Oracle / property / mutant / metric used: invariant, mutant class, fixture mutation, CRAP/SCRAP, or duplicate metric.
- Evidence that proves it: counterexample, survivor, weak-test finding, killed mutant, or risk metric observed.
- Exact command/path/route exercised: command, fixture path, contract path, or route exercised.
- Fixes made: production fix, test strengthening, equivalent-mutant filter, or no-code result.
- Rerun evidence: rerun command output or artifact proving the hardening now holds.
- Residual risk: surviving mutant, uncovered domain, equivalent mutant, or none with reason.
- Roster receipt ids: provider receipt ids or explicit roster waiver.
No "hardened" claim without the exact command and rerun evidence.
Formal Spec Required: yes and the
triggering criteria.unclebob/Acceptance-Pipeline-Specification: portable Gherkin-to-IR
acceptance mutation with persistent runner workers.unclebob/mutate4go, mutate4java, and clj-mutate: bounded file-level
mutation with coverage, scans, differential manifests, and survivor loops.unclebob/crap4go, crap4java, and crap4clj: complexity plus coverage as
a target-selection metric.unclebob/scrap: structural test/spec smell scoring for refactor decisions.unclebob/dry4go and dry4java: structural duplicate discovery as evidence,
not an automatic refactor.development
Lightweight evidence-backed retro and catch-up reports for a current repo, branch, PR, backlog slice, or recent agent session. Use when the user asks for a debrief, catch me up, what changed, why it matters, product implications, end-user implications, developer experience implications, current app state, backlog state, workspace state, alternatives considered, or context rebuild after losing the thread. Trigger: /debrief.
testing
Capture agent-session work records as local JSONL audit evidence. Links a backlog/spec, branch, commits, review verdicts, QA/demo evidence, transcript refs, and shipped ref without storing raw private transcripts. Use when: "trace this work", "write work record", "agent session trace", "journal this delivery", "link transcript evidence". Trigger: /trace, /journal.
data-ai
Turn proven agent-session patterns into first-party Harness Kit skills. Use when: "skillify this conversation", "make this into a skill", "generate a skill from current transcript", "extract reusable workflow". Trigger: /skillify.
testing
Run one targeted, read-only architecture or quality critique through a named lens from the shared rubric. Use when: "critique this module", "run an Ousterhout pass", "lens critique", "architecture critique". Trigger: /critique.