.claude/skills/test-consolidate/SKILL.md
Use when a test module has many similar unit tests, when repetitive assertions could be replaced by property-based or parameterized tests, or when test maintenance cost is high. Consolidates verbose suites into rstest, proptest, or fuzz tests.
Install this skill globally with one command: `npx skillsauth add ahrav/gossip-rs test-consolidate`. Works with Claude Code, Cursor, and Windsurf.
Analyze test modules for opportunities to replace many similar unit tests with a single, more powerful testing construct — property-based tests, parameterized tests, or fuzz tests. The goal is comprehensive coverage with minimal code to maintain, review, and keep updated.
Complements `/test-dedup`.
Tests should scale with behaviors, not with inputs.
If a function has 20 valid inputs worth testing, the answer is NOT 20 test functions. The answer is one construct that covers all 20 — and ideally the infinite space between them.
The hierarchy of consolidation (prefer higher):
1. Property-based test (proptest): When you can state an invariant that holds for ALL valid inputs. One `proptest!` replaces an unbounded number of unit tests. Maximum coverage, minimum code.
2. Parameterized test (rstest): The default choice for any finite set of (input, expected) pairs. One `#[rstest]` with `#[case]` attributes replaces N identical test bodies. Each case runs as a named sub-test, giving clear CI output on failure. Always prefer rstest over table-driven loops: rstest gives you per-case names, per-case failures, and the ability to `#[ignore]` individual cases.
3. Fuzz test: When exploring adversarial or untrusted input spaces. One fuzz target can subsume hundreds of hand-crafted "weird input" tests.
4. Table-driven test: A distant fallback. Only use when rstest is truly inappropriate (e.g., dynamically generated case lists, cases loaded from a file, or >50 cases where `#[case]` attributes become unwieldy). If you're reaching for `let cases = [...]; for case in cases { ... }`, stop and use rstest instead.
5. Individual unit tests: The last resort. Only when each test truly exercises unique setup, distinct error paths, or documents a specific bug.
Never consolidate for the sake of shorter code. The goal is less code to maintain — meaning fewer places to update when the function signature changes, fewer tests to rename when behavior evolves, and fewer copy-paste errors.
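A test cluster is a group of tests that call the same function and share the same assertion structure, differing mainly in their input values.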
Scan the module and group tests into clusters. A single test can belong to multiple clusters if it tests more than one function.
Cluster: parse_duration()
- test_parse_seconds → parse_duration("5s") == Duration::from_secs(5)
- test_parse_minutes → parse_duration("3m") == Duration::from_secs(180)
- test_parse_hours → parse_duration("2h") == Duration::from_secs(7200)
- test_parse_zero → parse_duration("0s") == Duration::ZERO
- test_parse_large → parse_duration("999h") == Duration::from_secs(3596400)
- test_parse_invalid_unit → parse_duration("5x") == Err(...)
- test_parse_empty → parse_duration("") == Err(...)
- test_parse_negative → parse_duration("-1s") == Err(...)
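The grouping step can be sketched as a map from each function under test to the tests that call it. This is a minimal illustration under stated assumptions, not a tool the skill ships: `TestInfo`, `cluster_tests`, `test_parse_then_format`, and `format_duration` are hypothetical names, and the list of functions each test calls is assumed to have been extracted already.

```rust
use std::collections::BTreeMap;

/// One observed test: its name and the functions it calls.
/// (Hypothetical shape; a real scan would derive this from the source.)
struct TestInfo<'a> {
    name: &'a str,
    calls: &'a [&'a str],
}

/// Group test names by function under test.
/// A test that calls two functions is listed in both clusters.
fn cluster_tests<'a>(tests: &[TestInfo<'a>]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut clusters: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for test in tests {
        for func in test.calls {
            clusters.entry(*func).or_default().push(test.name);
        }
    }
    clusters
}
```

With this shape, a test such as the hypothetical `test_parse_then_format` that calls both `parse_duration` and `format_duration` appears in both clusters, which matches the rule that a single test can belong to more than one cluster.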
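For each cluster, determine the best consolidation strategy:
1. Can you state a universal property? → Property-based test. Examples: roundtrip properties (encode/decode, serialize/deserialize) and order, sorting, or containment invariants.
2. Is it a finite set of (input, expected) with no general property? → Parameterized test (rstest). Always rstest, not table-driven. Examples: arbitrary mappings with exact expected outputs, such as status code to status text.
3. Are the tests exploring adversarial/edge inputs? → Fuzz test. Examples: parsers fed null bytes, huge inputs, or truncated data.
4. Does each test have genuinely unique setup or assertions? → Keep as individual tests. Don't force consolidation.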
For each cluster, answer:
| Question | If Yes | If No |
|----------|--------|-------|
| Can I state one invariant covering all cases? | proptest | Next question |
| Are all test bodies structurally identical? | rstest | Partial consolidation |
| Do >3 tests share the same assertion pattern? | rstest | Probably keep individual |
| Would adding a new case require a new function? | Consolidate with rstest (adding a #[case] should be trivial) | Fine as-is |
| Do tests differ only in values, not in logic? | Strong rstest candidate | Keep separate |
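One way to read the matrix is as a small decision function. The sketch below is illustrative only: `ClusterTraits`, `Strategy`, and `recommend` are hypothetical names, the question order follows the strategy list above, and the "more than 3 tests" threshold is taken from the table.

```rust
/// Answers to the decision-matrix questions for one cluster.
/// Field names are illustrative, not part of any tool.
#[derive(Debug, Clone, Copy)]
struct ClusterTraits {
    test_count: usize,
    has_universal_invariant: bool,
    bodies_structurally_identical: bool,
    explores_adversarial_input: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    Proptest,
    Rstest,
    Fuzz,
    KeepIndividual,
}

/// Walk the questions in order and return the first strategy that applies.
fn recommend(c: ClusterTraits) -> Strategy {
    if c.has_universal_invariant {
        // One invariant covers every case, regardless of cluster size.
        Strategy::Proptest
    } else if c.bodies_structurally_identical && c.test_count > 3 {
        // Same body, different values, and enough cases to be worth it.
        Strategy::Rstest
    } else if c.explores_adversarial_input {
        Strategy::Fuzz
    } else {
        Strategy::KeepIndividual
    }
}
```

A real review still needs judgment (regression tests and unique setup override the mechanical answer), so treat the function as a summary of the table, not a replacement for it.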
Use when: The assertion is about a property that holds regardless of input.
// BEFORE: 6 unit tests
#[test] fn encode_ascii() { assert_eq!(encode("hello"), "aGVsbG8="); }
#[test] fn encode_empty() { assert_eq!(encode(""), ""); }
#[test] fn encode_unicode() { assert_eq!(encode("café"), "Y2Fmw6k="); }
// decode returns a Result (see the property test below), so unwrap before comparing
#[test] fn decode_ascii() { assert_eq!(decode("aGVsbG8=").unwrap(), "hello"); }
#[test] fn decode_empty() { assert_eq!(decode("").unwrap(), ""); }
#[test] fn decode_unicode() { assert_eq!(decode("Y2Fmw6k=").unwrap(), "café"); }
// AFTER: 1 property test (covers infinite inputs)
use proptest::prelude::*;

proptest! {
    #[test]
    fn roundtrip(input in "\\PC*") {
        let encoded = encode(&input);
        let decoded = decode(&encoded).unwrap();
        prop_assert_eq!(input, decoded);
    }
}
// KEEP: encode_empty as anchor (documents empty-input behavior)
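What the roundtrip property asserts can be written as a plain predicate, independent of proptest. The sketch below uses a toy hex codec as a stand-in for `encode`/`decode` (an assumption made so the block compiles with std only; the functions in the example above appear to be base64) and a hypothetical `roundtrips` helper. proptest would evaluate the same predicate on generated inputs and shrink any counterexample.

```rust
/// Toy stand-in codec: lowercase hex of the UTF-8 bytes.
fn encode(input: &str) -> String {
    input.bytes().map(|b| format!("{b:02x}")).collect()
}

/// Inverse of `encode`. Fails on odd length, non-hex digits, or invalid UTF-8.
fn decode(encoded: &str) -> Result<String, String> {
    if encoded.len() % 2 != 0 || !encoded.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("not valid hex: {encoded:?}"));
    }
    let bytes: Vec<u8> = encoded
        .as_bytes()
        .chunks(2)
        .map(|pair| {
            let digits = std::str::from_utf8(pair).expect("ASCII checked above");
            u8::from_str_radix(digits, 16).expect("hex digits checked above")
        })
        .collect();
    String::from_utf8(bytes).map_err(|e| e.to_string())
}

/// The invariant: decoding an encoded string yields the original string.
fn roundtrips(input: &str) -> bool {
    decode(&encode(input)) == Ok(input.to_string())
}
```

Stating the invariant once is what lets a single property test stand in for the per-input decode tests; exact-output anchors such as `encode_empty` still document specific values.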
Use when: You have specific (input, expected) pairs with no general property.
// BEFORE: 7 unit tests
#[test] fn status_200() { assert_eq!(status_text(200), "OK"); }
#[test] fn status_201() { assert_eq!(status_text(201), "Created"); }
#[test] fn status_400() { assert_eq!(status_text(400), "Bad Request"); }
#[test] fn status_404() { assert_eq!(status_text(404), "Not Found"); }
#[test] fn status_500() { assert_eq!(status_text(500), "Internal Server Error"); }
#[test] fn status_unknown() { assert_eq!(status_text(999), "Unknown"); }
#[test] fn status_zero() { assert_eq!(status_text(0), "Unknown"); }
// AFTER: 1 parameterized test
use rstest::rstest;

#[rstest]
#[case(200, "OK")]
#[case(201, "Created")]
#[case(400, "Bad Request")]
#[case(404, "Not Found")]
#[case(500, "Internal Server Error")]
#[case(999, "Unknown")]
#[case(0, "Unknown")]
fn status_text_mapping(#[case] code: u16, #[case] expected: &str) {
    assert_eq!(status_text(code), expected);
}
rstest handles complex types well. Use #[case] even when cases involve
structs, enums, booleans, or multi-field assertions.
// BEFORE: 5 individual tests with complex setup
#[test] fn gate_short_window() {
    let spec = CharClassSpec { max_lower_pct: 95, min_window_len: 32 };
    assert!(gate_passes(&[b'a'; 31], spec));
}
#[test] fn gate_all_lowercase() {
    let spec = CharClassSpec { max_lower_pct: 95, min_window_len: 32 };
    assert!(!gate_passes(&[b'a'; 40], spec));
}
#[test] fn gate_mixed_case() {
    let spec = CharClassSpec { max_lower_pct: 95, min_window_len: 32 };
    let mut window = vec![b'a'; 38];
    window.extend_from_slice(b"AB");
    assert!(gate_passes(&window, spec));
}
// ... 2 more ...
// AFTER: 1 rstest with descriptive case names
// Cases own their bytes as Vec<u8>; a repeat expression such as [b'a'; 35]
// cannot be mixed with extra elements in one array literal, so mixed
// windows are built with concat().
#[rstest]
#[case::short_window_passes(vec![b'a'; 31], 95, 32, true)]
#[case::all_lowercase_rejected(vec![b'a'; 40], 95, 32, false)]
#[case::mixed_case_passes([vec![b'a'; 35], b"ABCDE".to_vec()].concat(), 95, 32, true)]
#[case::at_threshold_rejected([vec![b'a'; 39], b"A".to_vec()].concat(), 95, 32, false)]
#[case::high_threshold_passes(vec![b'a'; 40], 100, 32, true)]
fn char_class_gate(
    #[case] window: Vec<u8>,
    #[case] max_lower_pct: u8,
    #[case] min_window_len: usize,
    #[case] expected: bool,
) {
    let spec = CharClassSpec { max_lower_pct, min_window_len };
    assert_eq!(gate_passes(&window, spec), expected);
}
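Table-driven tests (`let cases = [...]; for case in cases`) are an anti-pattern in this codebase. They have real downsides compared to rstest:
- No per-case test names in `cargo test` output
- The first failing case hides the cases after it
- No way to `#[ignore]` or filter individual cases
- Failure messages need manual `"case={label}"` formatting

The only situations where table-driven is acceptable:
- Dynamically generated case lists
- Cases loaded from a file
- More than 50 cases, where `#[case]` attribute lines become genuinely unwieldy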
// ANTI-PATTERN — do NOT write this:
#[test]
fn status_text_cases() {
    let cases = [
        (200, "OK"),
        (201, "Created"),
        (400, "Bad Request"),
    ];
    for (code, expected) in cases {
        assert_eq!(status_text(code), expected, "code={code}");
    }
}
// CORRECT — use rstest instead:
#[rstest]
#[case(200, "OK")]
#[case(201, "Created")]
#[case(400, "Bad Request")]
fn status_text_mapping(#[case] code: u16, #[case] expected: &str) {
    assert_eq!(status_text(code), expected);
}
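The "one failure hides the rest" downside can be modeled without any test framework. In the sketch below, `status_text` is a deliberately buggy stand-in (two wrong arms, purely for illustration), `first_failure` models a table-driven loop whose `assert_eq!` panics at the first mismatch, and `all_failures` models independent per-case execution. All three names and the `CASES` constant are hypothetical.

```rust
/// Stand-in for the function under test. The 201 and 400 arms are
/// deliberately wrong so that two cases fail.
fn status_text(code: u16) -> &'static str {
    match code {
        200 => "OK",
        201 => "Create",
        400 => "Bad",
        _ => "Unknown",
    }
}

const CASES: [(u16, &str); 3] = [(200, "OK"), (201, "Created"), (400, "Bad Request")];

/// Models a table-driven loop: execution stops at the first mismatch,
/// so only that one code is ever reported.
fn first_failure(cases: &[(u16, &str)]) -> Option<u16> {
    cases
        .iter()
        .find(|(code, expected)| status_text(*code) != *expected)
        .map(|(code, _)| *code)
}

/// Models per-case execution: every case is checked independently,
/// so every mismatch is reported.
fn all_failures(cases: &[(u16, &str)]) -> Vec<u16> {
    cases
        .iter()
        .filter(|(code, expected)| status_text(*code) != *expected)
        .map(|(code, _)| *code)
        .collect()
}
```

With these inputs the loop model reports only 201, while the per-case model reports both 201 and 400. A developer fixing the table-driven test would discover the second bug only after fixing the first and re-running.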
Use when: Tests are exploring "weird inputs" to find crashes.
// BEFORE: 12 unit tests trying to crash the parser
#[test] fn parse_null_bytes() { let _ = parse(b"\x00\x00"); }
#[test] fn parse_huge_input() { let _ = parse(&vec![0xFF; 10_000]); }
#[test] fn parse_truncated() { let _ = parse(b"\x01\x02"); }
// ... 9 more ...
// AFTER: 1 fuzz target (explores billions of inputs)
// Assumes a cargo-fuzz target file, where fuzz_target! comes from libfuzzer-sys.
fuzz_target!(|data: &[u8]| {
    let _ = parse(data);
});
// KEEP: 1-2 specific regression tests if they document known bugs
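Before recommending consolidation, verify:
- Feature gates: property tests may rely on `Arbitrary` impls behind `test-support`. If moving tests behind a feature gate, keep at least one ungated unit test as baseline.
- Regression tests: tests that reference a specific bug (e.g., `// Regression: GH-123`) should stay as individual tests even if technically consolidatable.
- rstest availability: rstest is a workspace dependency (`rstest = "0.25"`). If not yet in a crate's `[dev-dependencies]`, add `rstest.workspace = true`. This is expected and should not be treated as a blocker.

For each cluster, specify the recommended strategy, the rationale, which tests to REPLACE and which to KEEP, and the proposed code.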
Where tests live in this workspace:
- `#[cfg(test)] mod tests { ... }` at bottom of source file
- `crates/*/src/*_tests.rs` (e.g., `ring_buffer_tests.rs`, `error_tests.rs`)
- `#[cfg(test)]` modules using proptest (direct dev-dep, no feature gate)
- `#[cfg(kani)]` blocks in gossip-stdx
- `crates/gossip-coordination/src/sim/` (CoordinationSim harness + proptest state machine)
- `crates/gossip-contracts/fuzz/` and `crates/gossip-stdx/fuzz/`
- `crates/*/tests/` (e.g., `identity_smoke.rs`)
- `crates/scanner-engine/src/engine/tests.rs` and `*_tests.rs` companions
- `crates/scanner-engine-integration-tests/tests/chunked_file_scans.rs`

Feature flags:
- `test-support`: Enables `Arbitrary` impls in gossip-contracts and sim infrastructure in gossip-coordination
- `kani`: Enables Kani model checking proofs in gossip-stdx
- `tiger-harness`: Enables tiger harness in scanner-engine
- `scheduler-sim`: Enables scheduler simulation in scanner-scheduler
- `sim-harness`: Enables simulation harness in scanner-engine and scanner-scheduler
- `bench`: Enables benchmark scaffolding in scanner-engine and scanner-scheduler

Dependencies:
- rstest is a workspace dependency (`rstest = "0.25"`). Add `rstest.workspace = true` to crate-level `[dev-dependencies]` if not already present. No feature gate needed.
- `Arbitrary` impls for shared types are gated behind `test-support` in gossip-contracts.

## Test Consolidation Report: [module/file]
### Cluster Analysis
#### Cluster 1: `function_name()` — N tests, M lines
**Tests in cluster:**
| # | Test | Input | Assertion |
|---|------|-------|-----------|
| 1 | `test_foo_basic` | "hello" | returns "HELLO" |
| 2 | `test_foo_empty` | "" | returns "" |
| 3 | `test_foo_unicode` | "café" | returns "CAFÉ" |
| ... | ... | ... | ... |
**Pattern detected:** All tests call `foo(input)` and assert exact output.
Inputs vary, assertion structure is identical.
**Recommended strategy:** Property-based test
**Rationale:** The invariant `foo(x).to_lowercase() == x.to_lowercase()` holds
for all inputs. A single proptest replaces 6 of the 8 unit tests.
**Consolidation:**
- REPLACE the 6 value-only tests with proptest `prop_foo_case_invariant`
- KEEP `test_foo_empty` as anchor (documents empty-input behavior)
- KEEP `test_foo_regression_gh_42` (regression test, bug reference)
**Proposed code:**
```rust
proptest! {
    #[test]
    fn prop_foo_case_invariant(input in "\\PC{0,100}") {
        prop_assert_eq!(foo(&input).to_lowercase(), input.to_lowercase());
    }
}
```
#### Cluster 2: `bar()` - N tests, M lines
...
### Summary
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Total tests | 34 | 12 | -22 (65%) |
| Test lines | 280 | 95 | -185 (66%) |
| Behaviors covered | 8 | 8 | No change |
| Input space covered | ~34 points | Continuous | Vastly improved |

- Add `rstest.workspace = true` to `[dev-dependencies]` (if rstest recommended)
## Decision Heuristics
### When to prefer proptest over rstest
- You can state a property about the output without knowing the exact value
- The input space is continuous or very large
- Roundtrip properties exist (encode/decode, serialize/deserialize)
- Order/sorting/containment invariants exist
### When to prefer rstest over proptest
- Each case has a specific expected output that must be exact
- The set of important cases is finite and known
- The mapping is arbitrary (no mathematical relationship)
- You want each case to appear as a named sub-test in output
### When to prefer rstest over table-driven (almost always)
- **Default to rstest for ALL finite case sets.** rstest gives you:
  - Per-case test names in `cargo test` output (e.g., `status_text_mapping::case_3`)
  - Independent execution: one failing case doesn't hide others
  - `#[ignore]` on individual cases
  - Per-case parallelism under `cargo test -- --test-threads=N`
- Table-driven is only acceptable for dynamically generated cases or >50 entries
### When to prefer fuzz over all else
- The function processes untrusted/external input
- The goal is "never panic" rather than "correct output"
- You've been writing tests that try to "trick" the parser
### When NOT to consolidate
- Each test has genuinely different setup logic (not just different values)
- Tests verify different error paths with different error types
- Tests are regression tests for specific bugs (keep the history)
- The "consolidated" version would be harder to understand than the originals
- There are only 2-3 tests — consolidation overhead isn't worth it
## Judgment Calls
- **Threshold**: Don't consolidate clusters of fewer than 4 tests unless the
consolidation is obviously cleaner (e.g., perfect roundtrip property).
- **Mixed clusters**: If 6 of 8 tests consolidate but 2 are genuinely unique,
consolidate the 6 and keep the 2. Don't force everything into one construct.
- **Error tests**: Tests for error cases often belong in a separate rstest,
not mixed into the happy-path property test. Group by success/failure.
- **Naming**: Consolidated tests should have clear names describing the
invariant or case set, not generic names like `test_all_cases`.
## Related Skills
- `/test-dedup` — Remove tests subsumed by higher-level tests (complementary)
- `/test-strategy` — Decide what kind of test to write for new code