.claude/skills/invariant-test-review/SKILL.md
Use when writing or reviewing state-machine tests, simulation tests, oracle tests, or regression tests to verify they actually prove the claimed invariant. Catches hidden weaknesses like missing negative paths and order-sensitive comparisons.
Review tests with one question: does this test actually prove the property it claims to prove?
This skill is for subtle tests where the risk is not "missing coverage" in the abstract, but "the test passes without isolating the intended invariant."
/invariant-test-review [<test-file-or-function>]
`*_tests.rs`, `tests/*.rs`); if none changed, inspect changed Rust files that contain inline tests or simulation/oracle helpers.

A test only proves what its setup, observation surface, and assertions uniquely force. If the test can pass because of unrelated setup, missing negative cases, or a lossy comparator, it is weaker than it looks.
Classify each finding with one of these severities:
| Severity | Meaning |
|----------|---------|
| BLOCK | The test does not isolate the claimed invariant, can pass for the wrong reason, or relies on an invalid oracle/comparator. |
| WARN | The test points at the right behavior but is weaker than it looks because of missing twins, proxy observations, or confounding setup. |
| INFO | Improves clarity, discoverability, or explanation without materially changing proof strength. |
Rewrite the test's purpose as one precise sentence.
Good:
- stale lease checkpoints are rejected
- terminal failed runs never become active again
- oracle comparison ignores spawned order but preserves child identity

Weak:

- covers eviction
- tests failure handling
- checks drift

If you cannot name the invariant in one sentence, the test is underspecified.
Ask:
Delete or inline anything that does not participate in the invariant.
The assertion must observe the property directly.
If the assertion only checks a proxy, call that out.
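The difference between a proxy and a direct observation can be sketched as follows. `RunStore`, `RunState`, and the `rejected` counter are hypothetical names invented for this illustration, not part of any real codebase. The first test only sees a side effect of the rejection; the second observes the state that the invariant "terminal failed runs never become active again" is actually about.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunState {
    Pending,
    Active,
    Failed,
}

/// Hypothetical run store, invented for this example.
#[derive(Default)]
struct RunStore {
    runs: HashMap<u64, RunState>,
    rejected: usize,
}

impl RunStore {
    fn insert(&mut self, id: u64, state: RunState) {
        self.runs.insert(id, state);
    }

    /// Activates a known, non-failed run; failed or unknown runs are rejected.
    fn try_activate(&mut self, id: u64) -> bool {
        match self.runs.get_mut(&id) {
            Some(state) if *state != RunState::Failed => {
                *state = RunState::Active;
                true
            }
            _ => {
                self.rejected += 1;
                false
            }
        }
    }

    fn state(&self, id: u64) -> Option<RunState> {
        self.runs.get(&id).copied()
    }
}

#[test]
fn failed_run_rejection_proxy_only() {
    let mut store = RunStore::default();
    store.insert(7, RunState::Failed);
    store.try_activate(7);
    // Proxy: a counter moved. This would still pass if the run had also
    // been flipped to Active before the counter was incremented.
    assert_eq!(store.rejected, 1);
}

#[test]
fn failed_run_never_becomes_active() {
    let mut store = RunStore::default();
    store.insert(7, RunState::Failed);
    assert!(!store.try_activate(7));
    // Direct: observe the run's state, which is what the invariant names.
    assert_eq!(store.state(7), Some(RunState::Failed));
}
```

In a review, the first test would typically earn a WARN for a proxy observation, since nothing in it pins the run's final state.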
When the real question is "is this assertion strong enough?", add the smallest paired case that distinguishes the competing claims:
For terminal, rejection, replay, and idempotency rules, the negative or boundary twin is usually mandatory.
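As a minimal sketch of a paired case, assume a hypothetical `Shard` that rejects checkpoints presented under an older lease epoch. All names here are illustrative assumptions. The rejection test alone would also pass against an implementation that rejects every checkpoint, so the boundary twin (same epoch, accepted) is what separates "rejects stale leases" from "rejects everything".

```rust
#[derive(Debug, PartialEq, Eq)]
enum CheckpointError {
    StaleLease { current: u64, presented: u64 },
}

/// Hypothetical shard with a lease epoch and a checkpoint cursor.
struct Shard {
    lease_epoch: u64,
    cursor: u64,
}

impl Shard {
    /// Accepts a checkpoint only if the presented epoch is not older
    /// than the shard's current lease epoch.
    fn checkpoint(&mut self, presented: u64, cursor: u64) -> Result<(), CheckpointError> {
        if presented < self.lease_epoch {
            return Err(CheckpointError::StaleLease {
                current: self.lease_epoch,
                presented,
            });
        }
        self.cursor = cursor;
        Ok(())
    }
}

#[test]
fn stale_lease_checkpoint_is_rejected() {
    let mut shard = Shard { lease_epoch: 5, cursor: 10 };
    let result = shard.checkpoint(4, 99);
    assert_eq!(
        result,
        Err(CheckpointError::StaleLease { current: 5, presented: 4 })
    );
    // The rejected checkpoint must not have moved the cursor.
    assert_eq!(shard.cursor, 10);
}

#[test]
fn current_lease_checkpoint_is_accepted() {
    // Boundary twin: identical setup, epoch exactly at the boundary.
    let mut shard = Shard { lease_epoch: 5, cursor: 10 };
    assert_eq!(shard.checkpoint(5, 20), Ok(()));
    assert_eq!(shard.cursor, 20);
}
```

Keeping the two tests identical except for the epoch makes the epoch the only thing that can explain the different outcomes.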
Many misleading tests come from a comparator that is wrong, not the system under test.
Check for:
- `Vec` equality over logically unordered state

If order is irrelevant, compare setwise or sort explicitly before asserting.
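A sketch of an order-insensitive comparison is shown here, using a hypothetical `ChildId` type. Sorting copies of both sides ignores order but keeps multiplicity. A plain set comparison is included for contrast because it silently collapses duplicates, which makes it a lossy comparator whenever identity or count matters.

```rust
use std::collections::BTreeSet;

/// Hypothetical child identifier, invented for this example.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct ChildId(u32);

/// Order-insensitive, multiplicity-preserving comparison:
/// sort copies of both sides, then compare element by element.
fn same_children(actual: &[ChildId], expected: &[ChildId]) -> bool {
    let mut a = actual.to_vec();
    let mut e = expected.to_vec();
    a.sort();
    e.sort();
    a == e
}

/// Lossy variant: a set comparison ignores order but also hides duplicates.
fn same_child_set_lossy(actual: &[ChildId], expected: &[ChildId]) -> bool {
    let a: BTreeSet<&ChildId> = actual.iter().collect();
    let e: BTreeSet<&ChildId> = expected.iter().collect();
    a == e
}

#[test]
fn comparison_ignores_order_but_preserves_identity() {
    let expected = vec![ChildId(1), ChildId(2), ChildId(3)];
    let reordered = vec![ChildId(3), ChildId(1), ChildId(2)];
    let duplicated = vec![ChildId(1), ChildId(2), ChildId(2), ChildId(3)];

    // Raw Vec equality is order-sensitive and would fail on a reorder.
    assert_ne!(reordered, expected);
    assert!(same_children(&reordered, &expected));

    // A doubly spawned child is caught by the sorted comparison...
    assert!(!same_children(&duplicated, &expected));
    // ...but slips through the set comparison.
    assert!(same_child_set_lossy(&duplicated, &expected));
}
```

Whether the set form is acceptable depends on whether duplicates are part of the claimed invariant; if they are, the sorted form is the safer default.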
Ask the final question:
If the code were wrong in exactly the way we care about, would this test fail for that reason?
If the answer is "not sure" or "only indirectly," the test needs revision.
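One way to answer that question concretely is to run the same check against a deliberately broken implementation. The sketch below assumes a hypothetical `Ledger` whose event application is meant to be idempotent; both `apply_*` functions and both checks are illustrative, not taken from a real codebase. A check that cannot tell the correct version from the buggy one does not prove idempotency.

```rust
use std::collections::HashSet;

/// Hypothetical ledger: each event id should affect the total at most once.
#[derive(Default)]
struct Ledger {
    seen: HashSet<u64>,
    total: i64,
}

/// Correct: a replayed event id is ignored.
fn apply_idempotent(ledger: &mut Ledger, event_id: u64, amount: i64) {
    if ledger.seen.insert(event_id) {
        ledger.total += amount;
    }
}

/// Deliberately wrong: records the id but applies the amount every time.
fn apply_buggy(ledger: &mut Ledger, event_id: u64, amount: i64) {
    ledger.seen.insert(event_id);
    ledger.total += amount;
}

type Apply = fn(&mut Ledger, u64, i64);

/// Weak check: applies the event once, so a replay bug is invisible.
fn weak_check(apply: Apply) -> bool {
    let mut ledger = Ledger::default();
    apply(&mut ledger, 1, 10);
    ledger.total == 10
}

/// Strong check: replays the same event and observes the total directly.
fn strong_check(apply: Apply) -> bool {
    let mut ledger = Ledger::default();
    apply(&mut ledger, 1, 10);
    apply(&mut ledger, 1, 10);
    ledger.total == 10
}

#[test]
fn only_the_strong_check_detects_the_replay_bug() {
    // Both implementations satisfy the weak check, so it proves nothing
    // about idempotency.
    assert!(weak_check(apply_idempotent));
    assert!(weak_check(apply_buggy));
    // The strong check fails for exactly the bug we care about.
    assert!(strong_check(apply_idempotent));
    assert!(!strong_check(apply_buggy));
}
```

The broken implementation does not need to live in the test suite permanently; temporarily breaking the code by hand and watching which tests fail gives the same signal.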
Typical classifications:
Return a short report in this format:
## Invariant Test Review: [test name or file]
- **Claimed invariant**: [one sentence]
- **Minimal trigger**: [smallest state/input that matters]
- **Observation surface**: [what the assertion actually observes]
### Findings
- [BLOCK|WARN|INFO] [Issue 1]: [why the current test is weaker than it looks]
- [BLOCK|WARN|INFO] [Issue 2]: [missing twin, confounder, or comparator problem]
- [BLOCK|WARN|INFO] [Issue N]: [additional issues — add as many as needed]
### Recommended Rewrite
- Remove (if applicable): [vestigial setup]
- Add (if applicable): [negative-path or boundary twin]
- Normalize (if applicable): [unordered state before comparison]
- Assert: [the direct property instead of a proxy]
- `/test-strategy`: choose the right test form once the invariant is clear
- `/sim-review`: review DST compatibility and simulation-specific constraints
- `/pr-comment-response`: verify reviewer bug claims with the smallest proof