.codex/skills/sim-run/SKILL.md
Run deterministic simulation tests with progressive difficulty levels (sunny/stormy/radioactive) inspired by TigerBeetle VOPR — orchestrates seed management, workload selection, and invariant verification
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add ahrav/gossip-rs sim-run
Execute simulation tests with progressive difficulty levels, inspired by TigerBeetle's VOPR three-level system. Manages seeds, selects workloads, injects faults, and reports invariant violations.
| Source | Principle |
|--------|-----------|
| TigerBeetle VOPR | 3 levels: clean → faults → catastrophic; 3.3 sec sim = 39 min real-world |
| FoundationDB (SIGMOD 2021) | Tens of thousands of sims nightly, seeded determinism |
| sled (simulation.html) | Discrete-event simulation with priority queue |
| Eaton (practical DST guide) | Seeds break on code changes; use continuous exploration |
| CockroachDB (Jepsen lessons) | Workload design determines simulation effectiveness |
| Yuan et al. (OSDI 2014) | Error handling paths are highest-value targets |
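The discrete-event principle in the table can be illustrated with a small priority queue. This is a minimal sketch, not the project's scheduler: `Event`, `EventQueue`, and the `seq` tie-breaker are hypothetical names introduced here. It shows how simulated time jumps straight to the next scheduled event, which is where the time compression comes from.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A scheduled event. Field order matters: the derived `Ord` compares
/// `at` first, then `seq`, so ties are broken by insertion order.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Event {
    pub at: u64,       // simulated tick at which the event fires
    pub seq: u64,      // hypothetical tie-breaker to keep ordering deterministic
    pub label: String, // what happens (illustrative only)
}

#[derive(Default)]
pub struct EventQueue {
    heap: BinaryHeap<Reverse<Event>>, // `Reverse` turns the max-heap into a min-heap
    next_seq: u64,
    now: u64,
}

impl EventQueue {
    /// Schedule `label` to fire `delay` ticks after the current simulated time.
    pub fn schedule(&mut self, delay: u64, label: &str) {
        let ev = Event {
            at: self.now + delay,
            seq: self.next_seq,
            label: label.to_string(),
        };
        self.next_seq += 1;
        self.heap.push(Reverse(ev));
    }

    /// Pop the earliest event and advance simulated time to it.
    /// No wall-clock sleeping happens; time simply jumps.
    pub fn pop(&mut self) -> Option<Event> {
        let Reverse(ev) = self.heap.pop()?;
        self.now = ev.at;
        Some(ev)
    }

    pub fn now(&self) -> u64 {
        self.now
    }
}
```

Because ordering depends only on `(at, seq)`, two runs that schedule the same events in the same order pop them in the same order.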
- … /run-fuzz instead)
- … /sim-scaffold first)

Before running simulations, verify:

- SimContext exists in gossip-coordination/src/sim/ (P1 prerequisite)
- … /sim-scaffold)
- test-support feature gate is configured
- proptest is available as a dev-dependency

If prerequisites are missing, report what needs to be created and suggest running /sim-scaffold first.
| Level | Name | Faults | Seeds | Context |
|-------|------|--------|-------|---------|
| 1 | Sunny Day | None (perfect conditions) | 10 | Per-PR, pre-commit, quick sanity |
| 2 | Stormy | Network partitions, lease expiry, message loss/reorder, clock skew | 100 | Nightly CI, pre-merge |
| 3 | Radioactive | Storage corruption, cascading failures, Byzantine faults, all Level 2 combined | 1000+ | Pre-release, deep exploration |
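One way to encode the level table in a harness is a small enum. The sketch below is illustrative: the `Level` type and its methods are hypothetical names, and treating 1000 as the Level 3 default is an assumption (the table only says "1000+").

```rust
/// Difficulty levels from the table above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    SunnyDay = 1,
    Stormy = 2,
    Radioactive = 3,
}

impl Level {
    /// Parse a level number such as the value of SIM_LEVEL ("1", "2", or "3").
    pub fn parse(raw: &str) -> Option<Level> {
        match raw.trim() {
            "1" => Some(Level::SunnyDay),
            "2" => Some(Level::Stormy),
            "3" => Some(Level::Radioactive),
            _ => None,
        }
    }

    /// Seed count from the table. Level 3 is "1000+", so 1000 is
    /// treated here as a floor (assumption).
    pub fn default_seeds(self) -> u32 {
        match self {
            Level::SunnyDay => 10,
            Level::Stormy => 100,
            Level::Radioactive => 1000,
        }
    }

    /// Level 1 runs under perfect conditions; Levels 2 and 3 inject faults.
    pub fn injects_faults(self) -> bool {
        self != Level::SunnyDay
    }
}
```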
If the user doesn't specify a level:
Each workload template exercises a specific coordination scenario. Select all that are relevant to the changed code, or run all for comprehensive coverage.
| Template | Exercises | Priority |
|----------|-----------|----------|
| Full scan lifecycle | Acquire → checkpoint x N → complete | ALWAYS |
| Lease expiry during checkpoint | Acquire → delay → checkpoint with stale lease | HIGH (etcd Jepsen 3.4.3 pattern) |
| Split during active scan | Acquire → checkpoint → split → verify children | HIGH (shard coverage invariant) |
| Concurrent acquisition | 2+ workers acquire same shard | HIGH (fence monotonicity) |
| Ambiguous failure | Network drop after mutation succeeds | MED (Jepsen Redis-Raft pattern) |
| Park and unpark | Active → Parked → Active (reacquire) | MED (state machine completeness) |
| Mass restart | All workers restart simultaneously | MED (Serf/memberlist edge case) |
| Asymmetric partition | A→B works, B→A drops | LOW (Lifeguard false positives) |
Execute tests with the selected level and seeds:
# Level 1 (Sunny Day) — quick sanity check
cargo test --features test-support -- --test-threads=1 sim::level1
# Level 2 (Stormy) — fault injection
SIM_SEEDS=100 SIM_LEVEL=2 cargo test --features test-support -- --test-threads=1 sim::level2
# Level 3 (Radioactive) — maximum chaos
SIM_SEEDS=1000 SIM_LEVEL=3 cargo test --features test-support -- --test-threads=1 sim::level3
# Reproduce a specific failure
SIM_SEED=12345 cargo test --features test-support -- --test-threads=1 sim::specific_test
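The environment variables in the commands above could be interpreted by a harness roughly as follows. This is a sketch under assumptions: `RunPlan` and `run_plan` are hypothetical names, and two behaviours are guesses rather than documented facts: SIM_SEED taking precedence over the other variables, and Level 1 being the default when SIM_LEVEL is unset. The lookup is passed in as a closure so the logic can be tested without touching the process environment.

```rust
/// What the harness should do for this invocation (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum RunPlan {
    /// SIM_SEED is set: replay exactly one seed.
    Reproduce { seed: u64 },
    /// Explore seeds at `level`; `count` is None when SIM_SEEDS is unset,
    /// leaving the choice to the level's default.
    Explore { level: u8, count: Option<u32> },
}

/// Build a plan from an environment lookup function.
pub fn run_plan(get: impl Fn(&str) -> Option<String>) -> Result<RunPlan, String> {
    // Assumption: a specific seed overrides level and seed-count settings.
    if let Some(raw) = get("SIM_SEED") {
        let seed = raw
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("SIM_SEED: {}", e))?;
        return Ok(RunPlan::Reproduce { seed });
    }

    // Assumption: Level 1 when SIM_LEVEL is not set.
    let level = match get("SIM_LEVEL") {
        Some(raw) => raw
            .trim()
            .parse::<u8>()
            .map_err(|e| format!("SIM_LEVEL: {}", e))?,
        None => 1,
    };
    if !(1..=3).contains(&level) {
        return Err(format!("SIM_LEVEL must be 1, 2, or 3, got {}", level));
    }

    let count = match get("SIM_SEEDS") {
        Some(raw) => Some(
            raw.trim()
                .parse::<u32>()
                .map_err(|e| format!("SIM_SEEDS: {}", e))?,
        ),
        None => None,
    };
    Ok(RunPlan::Explore { level, count })
}
```

In a real test binary the call would be `run_plan(|key| std::env::var(key).ok())`.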
Key flags:
- --test-threads=1: Required for deterministic execution (no thread interleaving)
- SIM_SEEDS: Number of random seeds to explore
- SIM_LEVEL: Fault injection intensity
- SIM_SEED: Specific seed for reproduction

On success (all seeds pass):
SIMULATION REPORT — Level {N}
═════════════════════════════
Result: PASS
Seeds tested: {count}
Wall time: {duration}
Sim time: {total simulated time}
Time compression: {sim_time / wall_time}x
Workload coverage:
[x] Full scan lifecycle — {N} occurrences
[x] Lease expiry during checkpoint — {N} occurrences
[x] Split during active scan — {N} occurrences
[x] Concurrent acquisition — {N} occurrences
[ ] Asymmetric partition — 0 occurrences (not exercised)
Invariants verified:
- Mutual exclusion (lease) — checked {N} times
- Fence monotonicity — checked {N} times
- Shard coverage (no gaps) — checked {N} times
- Terminal irreversibility — checked {N} times
On failure:
SIMULATION REPORT — Level {N}
═════════════════════════════
Result: FAIL
Seeds tested: {count} ({failures} failures)
Unique bugs: {count}
Invariant violations:
┌─────────────────────────────┬───────┬─────────────────────┐
│ Invariant │ Count │ Seeds │
├─────────────────────────────┼───────┼─────────────────────┤
│ Mutual exclusion (lease) │ 3 │ 42, 1337, 99999 │
│ Fence monotonicity │ 1 │ 42 │
└─────────────────────────────┴───────┴─────────────────────┘
Reproduction commands:
SIM_SEED=42 cargo test --features test-support -- sim::level2::lease_expiry_during_checkpoint
SIM_SEED=1337 cargo test --features test-support -- sim::level2::concurrent_acquisition
Failure trace (seed=42):
t=0 Worker-1: acquire_shard(shard-A) → Ok(epoch=1)
t=5 Worker-2: acquire_shard(shard-A) → Ok(epoch=2)
t=7 FAULT: pause Worker-2 for 100 ticks
t=10 Worker-1: checkpoint(shard-A, epoch=1) → Ok ← VIOLATION: stale epoch accepted
t=107 Worker-2: resume
...
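The violation in the trace above is a checkpoint accepted under a superseded epoch. A minimal sketch of the fencing check that would reject it is shown below. `ShardState` and `CheckpointError` are hypothetical names, not the project's types, and bumping the epoch by one on each acquisition is an assumption that matches the trace.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CheckpointError {
    /// The caller's lease epoch has been superseded by a later acquisition.
    StaleEpoch { presented: u64, current: u64 },
}

/// Per-shard coordination state (hypothetical, illustration only).
#[derive(Debug, Default)]
pub struct ShardState {
    epoch: u64,  // fencing token of the current lease holder
    cursor: u64, // last accepted checkpoint position
}

impl ShardState {
    /// Grant a lease. Each acquisition bumps the epoch, which
    /// invalidates every previously issued token.
    pub fn acquire(&mut self) -> u64 {
        self.epoch += 1;
        self.epoch
    }

    /// Accept a checkpoint only from the current lease holder.
    pub fn checkpoint(&mut self, epoch: u64, cursor: u64) -> Result<(), CheckpointError> {
        if epoch != self.epoch {
            return Err(CheckpointError::StaleEpoch {
                presented: epoch,
                current: self.epoch,
            });
        }
        self.cursor = cursor;
        Ok(())
    }

    pub fn cursor(&self) -> u64 {
        self.cursor
    }
}
```

Replaying the trace against this sketch, Worker-1's checkpoint at t=10 presents epoch 1 while the shard is at epoch 2, so it is rejected instead of accepted.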
When a simulation failure is found:
After the run, assess workload coverage:
If critical workloads are under-covered, recommend:
| Fault | Description | Implementation |
|-------|-------------|----------------|
| Network partition | Messages between node subsets are dropped | NetworkFaults.partitions |
| Message loss | Random message drops (10-30%) | NetworkFaults.drop_rate |
| Message reorder | Messages arrive out of order | Random delay in event queue |
| Lease expiry | Force lease timeout mid-operation | Advance clock past lease deadline |
| Process pause | Freeze a worker for N ticks | Skip node in simulation loop |
| Clock skew | Nodes disagree on current time | Per-node clock offset |
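The partition and message-loss rows can be sketched as a single delivery decision driven by a seeded generator. The names `NetworkFaults`, `partitions`, and `drop_rate` come from the table, but the field types, the `delivers` method, and the use of a hand-rolled SplitMix64 generator are assumptions made so the example has no external dependencies.

```rust
use std::collections::HashSet;

pub type NodeId = u32;

/// SplitMix64: a tiny deterministic PRNG, used here only so the
/// sketch is self-contained. The same seed yields the same sequence.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Field names follow the table; the types are assumptions.
pub struct NetworkFaults {
    /// Directed (from, to) pairs whose messages are always dropped.
    /// Directed pairs allow asymmetric partitions (A→B works, B→A drops).
    pub partitions: HashSet<(NodeId, NodeId)>,
    /// Probability in [0, 1] that any other message is dropped.
    pub drop_rate: f64,
}

impl NetworkFaults {
    /// Decide whether a message from `from` to `to` is delivered.
    pub fn delivers(&self, from: NodeId, to: NodeId, rng: &mut SplitMix64) -> bool {
        if self.partitions.contains(&(from, to)) {
            return false;
        }
        rng.next_f64() >= self.drop_rate
    }
}
```

All randomness flows through the seeded generator, so a failing seed replays the same sequence of drops.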
All Level 2 faults plus:
| Fault | Description | Implementation |
|-------|-------------|----------------|
| Storage corruption | Backend returns garbage data | Mock backend returns corrupted state |
| Cascading failure | Failure of one node triggers failures in others | Fault propagation rules |
| Byzantine | Node sends conflicting messages to different peers | Duplicate + modify messages |
| Split brain | Two partitions each believe they're the majority | Symmetric partition |
| Repeated crash-restart | Node crashes and restarts multiple times | Reset node state periodically |
These are the core invariants that every simulation run must verify. They map to the phase 2 spec invariant catalog.
| ID | Invariant | Check |
|----|-----------|-------|
| S1 | Mutual exclusion: at most one active lease per shard | Count active leases per shard ≤ 1 |
| S2 | Fence monotonicity: epochs never decrease | Track max epoch per shard, assert non-decreasing |
| S3 | Terminal irreversibility: Done/Failed shards never transition | Assert no transitions from terminal states |
| S4 | Shard coverage: split children cover parent range exactly | Verify range algebra on split |
| S5 | Idempotency: duplicate OpId returns same result | Replay operations, assert identical results |
| S6 | Progress: under fair scheduling, work eventually completes | Assert completion within bounded time |
| L1 | Lease exclusivity: stale-epoch checkpoints are rejected | Attempt checkpoint with old epoch, assert rejection |
| L2 | Zombie rejection: restarted worker cannot use old lease | Kill and restart worker, assert old operations fail |
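As an illustration of the "Check" column, here is a minimal sketch of S1 and S2 as checks a harness could call after every simulation step. `InvariantChecker` and `Violation` are hypothetical names, and identifying shards by string is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Violation {
    /// S1: more than one active lease was observed on a shard.
    MutualExclusion { shard: String, active: usize },
    /// S2: an epoch lower than the running maximum was observed.
    FenceRegression { shard: String, seen: u64, max: u64 },
}

/// Hypothetical checker holding the state needed across steps.
#[derive(Default)]
pub struct InvariantChecker {
    max_epoch: HashMap<String, u64>, // highest epoch seen per shard
}

impl InvariantChecker {
    /// S1: count of active leases per shard must be ≤ 1.
    pub fn check_mutual_exclusion(&self, shard: &str, active: usize) -> Result<(), Violation> {
        if active > 1 {
            return Err(Violation::MutualExclusion {
                shard: shard.to_string(),
                active,
            });
        }
        Ok(())
    }

    /// S2: track the max epoch per shard and assert it never decreases.
    pub fn observe_epoch(&mut self, shard: &str, epoch: u64) -> Result<(), Violation> {
        let max = self.max_epoch.entry(shard.to_string()).or_insert(epoch);
        if epoch < *max {
            return Err(Violation::FenceRegression {
                shard: shard.to_string(),
                seen: epoch,
                max: *max,
            });
        }
        *max = epoch;
        Ok(())
    }
}
```

Returning a `Violation` value rather than panicking lets the harness record the seed and keep exploring, which fits the per-invariant counts in the failure report.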
Do NOT save specific seeds as regression tests.
Seeds break whenever code changes (Eaton survey, Antithesis blog). Instead:
This is counterintuitive but well-established: FoundationDB, TigerBeetle, and Antithesis all use continuous exploration over saved seeds.
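A continuous-exploration loop might look like the sketch below: each run starts from a fresh base seed, and a failing seed is reported with a reproduction command instead of being checked in. All function names are hypothetical, and deriving seeds as `base + i` and taking the base from the system clock are assumptions. The command format mirrors the reproduction command shown earlier.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// A new base seed per run (assumption: wall-clock nanoseconds are
/// good enough; a CI run ID would work as well).
pub fn fresh_base_seed() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}

/// Run `count` seeds derived from `base_seed` and collect the failures.
/// `run_one` stands in for a single simulation run.
pub fn explore<F>(base_seed: u64, count: u32, run_one: F) -> Vec<(u64, String)>
where
    F: Fn(u64) -> Result<(), String>,
{
    let mut failures = Vec::new();
    for i in 0..count {
        let seed = base_seed.wrapping_add(u64::from(i));
        if let Err(reason) = run_one(seed) {
            failures.push((seed, reason));
        }
    }
    failures
}

/// Command for replaying one failing seed while the code is unchanged.
pub fn repro_command(seed: u64, test_filter: &str) -> String {
    format!(
        "SIM_SEED={} cargo test --features test-support -- --test-threads=1 {}",
        seed, test_filter
    )
}
```

The reported seed is only useful for debugging against the same commit; once the code changes, the same seed may drive a different execution.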
- /sim-review: Verify code is DST-compatible before running simulations
- /sim-scaffold: Generate simulation harnesses
- /jepsen-test: Complement DST with real-network cluster tests
- /test-strategy: Choose between simulation and other test approaches