.claude/skills/test-strategy/SKILL.md
Use when writing tests for new code and unsure which test type fits, when choosing between unit/rstest/proptest/fuzz/kani/sim, or when coordination or unsafe code changes need test coverage guidance. Recommends the optimal testing approach per code characteristic.
Analyze code and recommend the optimal testing approach from this project's testing toolkit.
| Type | Tool | Feature Flag | Best For |
|------|------|--------------|----------|
| Unit Tests | #[test] | None | Specific behavior, edge cases, regression tests |
| Parameterized Tests | rstest | Dev-dep | Finite case sets with specific expected outputs, enum variants, error codes |
| Property Tests | proptest | Dev-dep (some crates use test-support for Arbitrary impls) | Invariants over input domains, mathematical properties |
| Fuzz Tests | cargo-fuzz | External | Security-critical parsing, untrusted input handling |
| Model Checking | Kani | kani | Memory safety proofs, absence of panics, formal verification |
| Simulation Tests | CoordinationSim | test-support | Coordination protocol invariants, fault tolerance, state machine correctness |
This project has a deterministic simulation harness for the coordination subsystem. Always consider whether new or changed code should be covered by simulation tests.
| Harness | Location | Feature | Scope | When to Add Cases |
|---------|----------|---------|-------|-------------------|
| CoordinationSim | crates/gossip-coordination/src/sim/ | test-support | Coordination protocol invariants (S1–S9), lease management, shard lifecycle, fault injection (SunnyDay/Stormy/Radioactive), deterministic replay | Any change to coordination logic, shard state machines, lease acquisition, run lifecycle, or split handling |
| TigerHarness | crates/scanner-engine/src/tiger_harness.rs | tiger-harness | Scanner engine deterministic test harness for detection pipeline validation | Any change to detection rules, transform pipeline, or scanner engine core |
| SchedulerSim | crates/scanner-scheduler/src/scheduler/sim.rs | scheduler-sim | Scheduler simulation for work-stealing, chunking, and I/O orchestration validation | Any change to scheduler logic, parallel scan, task graph, or affinity |
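Deterministic replay, listed above for CoordinationSim, generally depends on drawing all randomness from a seed and all time from a logical clock. The following is a minimal std-only sketch of that idea. `DetContext`, `tick`, and `trace` are hypothetical names chosen for illustration and are not the project's actual SimContext API.

```rust
/// Hypothetical deterministic context: a seeded PRNG plus a logical clock.
/// Illustration only; this is not the project's `SimContext`.
pub struct DetContext {
    state: u64, // PRNG state, fully determined by the seed
    now: u64,   // logical time in ticks; never reads the wall clock
}

impl DetContext {
    pub fn new(seed: u64) -> Self {
        Self { state: seed, now: 0 }
    }

    /// SplitMix64 step: a small, well-known generator used here for brevity.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Advance logical time by a pseudo-random step in `1..=max_step`.
    pub fn tick(&mut self, max_step: u64) -> u64 {
        assert!(max_step > 0, "max_step must be positive");
        let step = 1 + self.next_u64() % max_step;
        self.now += step;
        self.now
    }
}

/// Record the logical timestamps produced by `steps` ticks from a given seed.
pub fn trace(seed: u64, steps: usize) -> Vec<u64> {
    let mut ctx = DetContext::new(seed);
    (0..steps).map(|_| ctx.tick(10)).collect()
}
```

Because the trace is a pure function of the seed, a failing run can be reproduced by re-running with the same seed, which is what makes seed-based bug reports useful.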
Architecture: The sim module has five layers:
- mod.rs: SimContext (seeded PRNG + logical clock) and FaultConfig/FaultLevel
- worker: SimWorker per-worker bookkeeping (lease claims, op-ID generation, cursor progress)
- invariants: InvariantChecker verifying 9 safety properties (S1–S9) externally against coordinator ground truth
- overload: Scripted overload scenarios for targeted stress validation
- harness: CoordinationSim top-level driver (zombie preamble + safety phase + liveness phase)

Additional simulation-adjacent tests in crates/gossip-coordination/src/sim/:

- proptest_state_machine_tests.rs: Proptest state machine model checking
- mega_sim_tests.rs: Large-scale simulation runs
- sim_behavioral_tests.rs: Behavioral scenario tests
- overload_tests.rs: Overload scenario validation

```rust
#[cfg(test)]
mod tests {
    #[test]
    fn specific_edge_case() {
        assert_eq!(function(edge_input), expected_output);
    }
}
```
… cargo test output

```rust
use rstest::rstest;
use std::time::Duration;

#[rstest]
#[case("5s", Duration::from_secs(5))]
#[case("3m", Duration::from_secs(180))]
#[case("2h", Duration::from_secs(7200))]
#[case("0s", Duration::ZERO)]
fn parse_duration_valid(#[case] input: &str, #[case] expected: Duration) {
    assert_eq!(parse_duration(input).unwrap(), expected);
}

#[rstest]
#[case("5x", ParseError::InvalidUnit)]
#[case("", ParseError::Empty)]
#[case("-1s", ParseError::Negative)]
fn parse_duration_errors(#[case] input: &str, #[case] expected: ParseError) {
    assert_eq!(parse_duration(input).unwrap_err(), expected);
}
```
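The cases above reference `parse_duration` and `ParseError` without defining them. To make the example concrete, here is one possible std-only implementation that would satisfy those cases. It is a hypothetical sketch: the `InvalidNumber` variant and the exact parsing rules are assumptions, not code from this project.

```rust
use std::time::Duration;

/// Hypothetical error type matching the variants used in the cases above.
/// `InvalidNumber` is an extra variant assumed for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    Empty,
    Negative,
    InvalidUnit,
    InvalidNumber,
}

/// Hypothetical parser for strings such as "5s", "3m", or "2h".
pub fn parse_duration(input: &str) -> Result<Duration, ParseError> {
    if input.is_empty() {
        return Err(ParseError::Empty);
    }
    if input.starts_with('-') {
        return Err(ParseError::Negative);
    }
    // Split the numeric prefix from the unit suffix.
    let split = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, unit) = input.split_at(split);
    let value: u64 = digits.parse().map_err(|_| ParseError::InvalidNumber)?;
    let seconds_per_unit = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3600,
        _ => return Err(ParseError::InvalidUnit),
    };
    // Reject values whose second count would overflow u64.
    value
        .checked_mul(seconds_per_unit)
        .map(Duration::from_secs)
        .ok_or(ParseError::InvalidNumber)
}
```

Deriving `Debug` and `PartialEq` on the error type is what allows `assert_eq!` to compare error variants directly in the parameterized cases.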
Dependency: rstest is declared in workspace [workspace.dependencies] as rstest = "0.25". Add rstest.workspace = true to crate-level [dev-dependencies].
Fixtures — shared setup across tests without boilerplate:
```rust
use rstest::*;

#[fixture]
fn config() -> Config {
    Config::builder().timeout(Duration::from_secs(30)).build()
}

#[rstest]
fn test_with_default_config(config: Config) {
    assert!(config.timeout().as_secs() > 0);
}
```
Matrix testing — combinatorial cases via multiple #[values] parameters:
```rust
#[rstest]
fn protocol_version_compat(
    #[values(ProtocolVersion::V1, ProtocolVersion::V2)] version: ProtocolVersion,
    #[values(true, false)] compressed: bool,
    #[values(0, 1, 100)] payload_size: usize,
) {
    let msg = Message::new(version, compressed, payload_size);
    assert!(msg.is_valid());
}
// Generates 2 × 2 × 3 = 12 individual test cases
```
… parse and format are inverses)

```rust
#[cfg(test)]
mod tests {
    use proptest::prelude::*;

    proptest! {
        #[test]
        fn roundtrip_property(input in any::<ValidInput>()) {
            let encoded = encode(&input);
            let decoded = decode(&encoded).unwrap();
            prop_assert_eq!(input, decoded);
        }
    }
}
```
Note: proptest is a direct dev-dependency — no feature gate needed for tests.
Some crates gate Arbitrary impls behind the test-support feature for use by
downstream test code (e.g., gossip-contracts exposes Arbitrary impls via
features = ["test-support"]).
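To show what a roundtrip property checks without pulling in any dependency, the sketch below hand-rolls a randomized loop using only the standard library. It is a stand-in for illustration, not proptest: it has no shrinking and no strategies, and the hex codec, `xorshift`, and `check_roundtrip` are all hypothetical names.

```rust
/// Hypothetical codec under test: lowercase hex encoding of a byte string.
pub fn encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Inverse of `encode`; returns `None` for odd lengths or non-hex characters.
pub fn decode(text: &str) -> Option<Vec<u8>> {
    if text.len() % 2 != 0 || !text.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..text.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&text[i..i + 2], 16).ok())
        .collect()
}

/// Xorshift64 step, used only to generate pseudo-random inputs from a seed.
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Run `cases` randomized roundtrip checks; returns the first failing input, if any.
pub fn check_roundtrip(seed: u64, cases: usize) -> Option<Vec<u8>> {
    let mut state = seed | 1; // xorshift state must be nonzero
    for _ in 0..cases {
        let len = (xorshift(&mut state) % 32) as usize;
        let input: Vec<u8> = (0..len).map(|_| xorshift(&mut state) as u8).collect();
        if decode(&encode(&input)).as_deref() != Some(input.as_slice()) {
            return Some(input);
        }
    }
    None
}
```

In real test code proptest remains the better choice, since it shrinks failing inputs to a minimal counterexample and persists regression seeds.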
```rust
// In fuzz/fuzz_targets/
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    let _ = parse_untrusted(data);
});
```
Run with: cargo +nightly fuzz run <target>
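A target like the one above discards the return value, so it only flags panics and crashes. That works best when the function under test reports malformed input as an error instead of panicking. The sketch below shows one hypothetical shape for `parse_untrusted`; the frame format (one length byte followed by the payload) and `FrameError` are assumptions made for illustration.

```rust
/// Hypothetical error type for malformed frames.
#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    TooShort,
    LengthMismatch,
}

/// Hypothetical parser: one length byte followed by exactly that many payload bytes.
/// Malformed input maps to an `Err` rather than a panic.
pub fn parse_untrusted(data: &[u8]) -> Result<&[u8], FrameError> {
    // `split_first` avoids the panic that indexing `data[0]` would cause on empty input.
    let (&len, payload) = data.split_first().ok_or(FrameError::TooShort)?;
    if payload.len() != usize::from(len) {
        return Err(FrameError::LengthMismatch);
    }
    Ok(payload)
}
```

With this shape, any crash the fuzzer finds points at a real defect rather than at an expected rejection of bad input.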
```rust
#[cfg(kani)]
mod verification {
    use super::*;

    #[kani::proof]
    fn verify_no_panic() {
        let x: u32 = kani::any();
        kani::assume(x < 1000);
        let result = critical_function(x);
        // Kani proves this never panics
    }

    #[kani::proof]
    #[kani::unwind(10)]
    fn verify_loop_bounds() {
        let arr: [u8; 8] = kani::any();
        process_array(&arr); // Prove no out-of-bounds
    }
}
```
Run with: cargo kani --features kani
When to add simulation coverage:
Does it change coordination logic (acquire, complete, checkpoint, split)?
→ Add CoordinationSim test or extend existing mega_sim_tests
Does it change shard state transitions or lifecycle?
→ Ensure invariant checker (S1–S9) covers the new states
Does it change lease handling or fence epochs?
→ Test under Stormy/Radioactive fault levels
Does it change run lifecycle or session management?
→ Add behavioral scenario in sim_behavioral_tests
Adding a simulation test:
```rust
// In crates/gossip-coordination/src/sim/mega_sim_tests.rs or a new *_tests.rs
use crate::sim::{CoordinationSim, FaultLevel};

#[test]
fn my_new_coordination_scenario() {
    let report = CoordinationSim::new(42, FaultLevel::Stormy)
        .with_workers_and_shards(3, 5)
        .run(500, 200);
    assert!(report.violations.is_empty(), "{report:#?}");
}
```
Adding a proptest state machine test:
```rust
// In crates/gossip-coordination/src/sim/proptest_state_machine_tests.rs
// Use proptest to generate random operation sequences and verify invariants
```
Run with:
cargo test -p gossip-coordination --features test-support # All coordination tests incl. sim
cargo test -p gossip-coordination sim # Just sim-related tests
When recommending or reviewing tests that claim to prove an invariant, apply these checks before choosing the test type:
The Test Shape Hygiene checklist below operationalizes these same questions
during broader strategy reviews.
If an existing test fails these checks, use /invariant-test-review to produce
a structured diagnosis of what the test actually proves and what needs to
change.
When analyzing code for test strategy, consider:

Input Domain

- #[values] matrix

Properties to Verify

Code Characteristics

- #[case]
- #[fixture]
- gossip-contracts/fuzz/)

Test Shape Hygiene

Simulation Harness Checklist (always evaluate for coordination changes)

- sim/mega_sim_tests.rs

Existing Patterns in This Codebase

- #[cfg(test)] mod tests inline, or sibling *_tests.rs files
- #[rstest] with #[case] (workspace dep rstest = "0.25")
- Arbitrary impls gated behind test-support feature in gossip-contracts
- #[cfg(kani)] blocks in gossip-stdx
- crates/gossip-contracts/fuzz/ and crates/gossip-stdx/fuzz/
- crates/gossip-coordination/src/sim/ (CoordinationSim, proptest state machine, behavioral, overload)
- crates/scanner-engine/src/tiger_harness.rs (tiger-harness feature), crates/scanner-scheduler/src/scheduler/sim.rs (scheduler-sim feature)
- crates/scanner-engine-integration-tests/tests/
- crates/*/benches/
- crates/*/tests/identity_smoke.rs pattern

## Test Strategy for `InlineVec<T, N>`
### Recommended Approach: Property Tests + Kani + Fuzz
**Rationale:**
- Generic data structure with large input space (push/pop/insert/remove sequences)
- Has invariants: length <= capacity, no out-of-bounds access
- Contains unsafe code for stack-allocated storage
- Already has fuzz targets in `crates/gossip-stdx/fuzz/`
**Specific Tests:**
1. **Property Test**: Collection invariants
- Property: `vec.len() <= N` after any sequence of operations
- Property: push-then-pop roundtrip preserves values
- Property: iteration yields exactly `len()` elements
2. **Kani Proof**: Memory safety of unsafe storage
- Prove: No out-of-bounds access in `unsafe` array ops
- Bound: Unwind factor based on max capacity N
3. **Fuzz Test**: Extend existing `fuzz_inline_vec` target
- Random operation sequences (push, pop, insert, remove, clear)
4. **Unit Tests**: Known edge cases
- Empty vec operations
- Full capacity behavior
- Single-element edge cases
## Test Strategy for `ShardSpec` Validation
### Recommended Approach: Parameterized Tests (rstest) + Property Tests
**Rationale:**
- Finite set of valid/invalid shard spec configurations
- Validation rules have specific expected error variants
- Split logic has mathematical properties (coverage, non-overlap)
**Specific Tests:**
1. **rstest Parameterized**: Known valid/invalid configurations
```rust
#[rstest]
#[case::valid_basic(spec(1, 100), true)]
#[case::zero_range(spec(0, 0), false)]
#[case::inverted_range(spec(100, 1), false)]
fn spec_validity(#[case] spec: ShardSpec, #[case] valid: bool) {
    assert_eq!(spec.validate().is_ok(), valid);
}
```
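The coverage and non-overlap properties named in the rationale can be stated as a single check: the pieces produced by a split must tile the original range exactly. The sketch below illustrates that check over plain half-open integer ranges. `split_range` and `tiles_exactly` are hypothetical helpers and do not reflect how `ShardSpec` actually represents or splits its key space.

```rust
use std::ops::Range;

/// Hypothetical split: divide a half-open range into up to `parts` contiguous pieces.
pub fn split_range(range: Range<u64>, parts: u64) -> Vec<Range<u64>> {
    assert!(
        parts > 0 && range.start < range.end,
        "need a non-empty range and parts > 0"
    );
    let len = range.end - range.start;
    let parts = parts.min(len); // never produce empty pieces
    let (base, extra) = (len / parts, len % parts);
    let mut pieces = Vec::new();
    let mut start = range.start;
    for i in 0..parts {
        // The first `extra` pieces absorb the remainder, one element each.
        let size = base + u64::from(i < extra);
        pieces.push(start..start + size);
        start += size;
    }
    pieces
}

/// Coverage and non-overlap: the pieces tile `whole` exactly, in order, with no gaps.
pub fn tiles_exactly(whole: &Range<u64>, pieces: &[Range<u64>]) -> bool {
    let mut cursor = whole.start;
    for piece in pieces {
        if piece.start != cursor || piece.end <= piece.start {
            return false; // gap, overlap, or empty piece
        }
        cursor = piece.end;
    }
    cursor == whole.end
}
```

A property test would generate arbitrary ranges and part counts and assert `tiles_exactly` on every result, which covers far more configurations than a fixed `#[case]` list.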
## Test Strategy for `LeaseManager` Changes
### Recommended Approach: CoordinationSim + Property Tests
**Rationale:**
- Lease behavior emerges from coordination protocol interactions
- Safety invariants (S1 mutual exclusion, S2 fence monotonicity) require external checking
- Must hold under fault injection (lease expiry, clock jumps)
**Specific Tests:**
1. **CoordinationSim**: Run under all fault levels
- SunnyDay: basic correctness
- Stormy: moderate fault tolerance
- Radioactive: aggressive fault tolerance
2. **Property Test**: Fence epoch monotonicity for random operation sequences
3. **Unit Tests**: Specific lease edge cases (expiry at exact boundary, zero-duration lease)
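External checking of an invariant such as fence monotonicity can be pictured as a pass over a recorded trace, independent of the code that produced it. The sketch below is a minimal illustration under stated assumptions: the `Acquire` record and `first_fence_violation` are hypothetical, and it assumes the invariant means each shard's fence epoch strictly increases on every acquisition, which may differ from the project's precise definition of S2.

```rust
use std::collections::HashMap;

/// Hypothetical trace record: a worker acquired `shard` with the given fence epoch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Acquire {
    pub shard: u32,
    pub worker: u32,
    pub fence_epoch: u64,
}

/// Return the index of the first event whose fence epoch does not strictly
/// exceed the previous epoch observed for the same shard.
pub fn first_fence_violation(trace: &[Acquire]) -> Option<usize> {
    let mut last_epoch: HashMap<u32, u64> = HashMap::new();
    for (index, event) in trace.iter().enumerate() {
        if let Some(&previous) = last_epoch.get(&event.shard) {
            if event.fence_epoch <= previous {
                return Some(index);
            }
        }
        last_epoch.insert(event.shard, event.fence_epoch);
    }
    None
}
```

Returning the index of the offending event, instead of a bare boolean, makes a failure easier to line up with the simulation step that caused it.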
| Scenario | Primary | Secondary |
|----------|---------|-----------|
| New data structure | Property tests | Unit tests for edges |
| Enum/status mappings | rstest #[case] | Unit tests for edge cases |
| State transition tables | rstest #[case] | Property tests if transitions have invariants |
| Error code/message mapping | rstest #[case] | — |
| Config defaults/lookups | rstest #[case] | — |
| Combinatorial input validation | rstest #[values] matrix | Property tests for general invariants |
| Shared test fixtures | rstest #[fixture] | — |
| Parser/decoder | Fuzz tests | Property tests for roundtrip |
| Unsafe code | Kani proofs | Property tests for API |
| Algorithm correctness | Property tests | Unit tests for examples |
| Bug fix | Unit test (regression) | Sim test if coordination-related |
| Performance-critical loop | Kani (bounds) | Property tests |
| Coordination protocol change | CoordinationSim | Proptest state machine |
| Scanner engine / detection rules | TigerHarness + unit tests | Integration tests |
| Scheduler / parallel scan logic | SchedulerSim | Unit tests for edge cases |
| Git pack parsing / delta decode | Fuzz + Property tests | Unit tests |
| Shard lifecycle / state machine | CoordinationSim (all fault levels) | Unit tests for edge cases |
| Lease management | CoordinationSim (Stormy+) | Property tests for monotonicity |
| Identity types / derivation | Fuzz tests | Property tests for roundtrip |
| Data structures (stdx) | Fuzz + Property tests | Kani for unsafe |
| Connector / persistence | Conformance tests | Unit tests for edge cases |
- /invariant-test-review: Deep-dive review when an existing test fails the hygiene checks or a reviewer says the test does not prove its invariant
- /sim-review: Review DST compatibility and simulation-specific test risks
- /run-fuzz: Execute fuzz targets when adversarial input coverage is the better fit