skills/adversarial-tester/SKILL.md
Use after completing implementation to find unknown failure modes. Reads implementation diff and writes up to 5 tests designed to make it break. Triggers on 'break it', 'adversarial test', 'stress test implementation', 'find weaknesses', or any task seeking to expose unknown failure modes.
`npx skillsauth add raddue/crucible adversarial-tester`

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Read completed implementation and write up to 5 tests designed to make it break. Targets edge cases, boundary conditions, and failure modes the implementer didn't anticipate.
Announce at start: "I'm using the adversarial-tester skill to find weaknesses in this implementation."
Skill type: Rigid -- follow exactly, no shortcuts.
Model: Opus (adversarial reasoning about failure modes requires creative analytical thinking)
<!-- CANONICAL: shared/dispatch-convention.md -->
All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.
| Agent | Question | Output | Scope |
|-------|----------|--------|-------|
| Red-team | "What's wrong with this artifact?" | Written findings (Fatal/Significant/Minor) | Attacks designs, plans, code quality |
| Test Gap Writer | "What known gaps need filling?" | Executable tests (expected to PASS) | Fills reviewer-identified holes |
| Adversarial Tester | "What runtime behavior will break?" | Executable tests (may PASS or FAIL) | Finds unknown weaknesses in behavior |
Read the full diff of the implementation changes. Identify:
Brainstorm 8-10 ways the implementation could break at runtime. Think like an attacker:
Rank each candidate by:
Select the top 5. If fewer than 5 candidates are meaningful, write fewer -- don't pad with trivial tests.
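The selection step can be sketched roughly as follows. This is a minimal illustration, not the skill's prescribed method: it assumes candidates are ranked by the Likelihood and Impact ratings that appear in the report format, and the `Candidate` type, the numeric `LEVEL` weights, the `meaningful` flag, and `select_failure_modes` are all hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical numeric weights for the High/Medium/Low labels (an assumption).
LEVEL = {"High": 3, "Medium": 2, "Low": 1}


@dataclass
class Candidate:
    title: str
    likelihood: str  # "High", "Medium", or "Low"
    impact: str      # "High", "Medium", or "Low"
    meaningful: bool = True  # False for trivial candidates that would only pad the list


def select_failure_modes(candidates, limit=5):
    """Return at most `limit` meaningful candidates, highest likelihood x impact first."""
    meaningful = [c for c in candidates if c.meaningful]
    ranked = sorted(
        meaningful,
        key=lambda c: LEVEL[c.likelihood] * LEVEL[c.impact],
        reverse=True,
    )
    # Slicing never pads: fewer than `limit` meaningful candidates yields fewer tests.
    return ranked[:limit]
```

Multiplying the two ratings is only one plausible scoring rule; the point is that trivial candidates are filtered out before the cut, so the result may contain fewer than 5 entries.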
For each selected failure mode, write one focused test that:
Run each test and record the result:
Output the ADVERSARIAL TEST REPORT (see Report Format below).
## ADVERSARIAL TEST REPORT
### Summary
- Failure modes identified: N
- Tests written: N
- Tests PASSING (implementation robust): N
- Tests FAILING (weaknesses found): N
- Tests ERROR (discarded): N
### Failure Mode 1: [Title]
- **Attack vector:** [how this breaks]
- **Likelihood:** High/Medium/Low
- **Impact:** High/Medium/Low
- **Test:** `TestClassName.TestMethodName`
- **Result:** PASS/FAIL
- **If FAIL -- fix guidance:** [what the implementer should change]
[repeat for each failure mode]
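As a rough sketch, the Summary counts could be derived from the recorded results like this. The `render_summary` helper is hypothetical, and it assumes that tests ending in ERROR still count toward "Tests written" even though they are discarded; the template itself does not say either way.

```python
from collections import Counter


def render_summary(identified, results):
    """Build the Summary section of the report.

    `identified` is the number of failure modes brainstormed; `results` holds
    one of "PASS", "FAIL", or "ERROR" for each test that was written.
    """
    counts = Counter(results)  # missing keys count as 0
    lines = [
        "### Summary",
        f"- Failure modes identified: {identified}",
        f"- Tests written: {len(results)}",
        f"- Tests PASSING (implementation robust): {counts['PASS']}",
        f"- Tests FAILING (weaknesses found): {counts['FAIL']}",
        f"- Tests ERROR (discarded): {counts['ERROR']}",
    ]
    return "\n".join(lines)
```

Calling `render_summary(9, ["PASS", "FAIL", "PASS", "ERROR", "FAIL"])` would produce a Summary block reporting 5 tests written, 2 passing, 2 failing, and 1 error.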
Must NOT do:
When used standalone, after running the tests:
When used within the build pipeline, the orchestrator handles outcome routing (see build skill Phase 3).
The orchestrator (not this skill) decides whether to skip. When used standalone, use your judgment:
When dispatched by the build pipeline:
Fix loop mechanics:
Orchestrator skip conditions:
.md, .json, .yaml, .uss, .uxml)

This skill produces adversarial tests. When used standalone, the tests themselves are the quality mechanism -- no additional quality gate needed. When used within the build pipeline, the orchestrator handles outcome routing.
- crucible:build (Phase 3, after test gap writer)
- crucible:test-driven-development patterns for test writing
- crucible:temper (lightweight review if fix touches 3+ files)
- break-it-prompt.md (for subagent dispatch)