plugins/axiom-determinism-and-replay/skills/using-determinism-and-replay/SKILL.md
Use when designing a system whose past behaviour must be recoverable as a fact — RL training substrates, multi-agent simulations, deterministic game engines, replay-debuggable services, multiplayer lockstep, or any pipeline where "I cannot reproduce that bug" is unacceptable. Use when teams disagree about what "snapshot", "seed", or "replay" mean across modules. Use when cross-machine or cross-process determinism is required, or a regulator will ask "what was the input at tick T?". Architecture-level — how to design a deterministic system. For verifying an existing simulation against known patterns, use `/check-determinism` from yzmir-simulation-foundations instead.
npx skillsauth add tachyon-beep/skillpacks using-determinism-and-replayInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
What happened is recoverable as a fact, or it is not. There is no "mostly recoverable."
This pack treats replay — the ability to re-run a system from recorded inputs and observe identical behaviour — as a property of architecture, not a property of effort. A system either has a named determinism class, governed seeds, isolated RNGs, a snapshot strategy that matches its tick rate, a divergence-detection protocol that points at the first differing operation, and replay infrastructure that distinguishes read-only investigation from branching counterfactuals — or it has a stack of fragile assumptions that survive only as long as nobody touches them.
This is the architectural counterpart to verification-of-an-existing-simulation:
yzmir-simulation-foundations:check-determinism answers does this specific simulation hold under repeated runs? — a verification command that scans for known violation patterns (unseeded RNGs, dict iteration, wall-clock leaks).axiom-determinism-and-replay (this pack) answers how do I design a system whose determinism is a load-bearing property? — seed-propagation rules, RNG ownership, snapshot cadence, divergence localisation protocol, replay-loop architecture, the distinction between "logically equivalent" and "bit-exact" recovery.check-determinism finds violations in a system that already shipped without this pack, this pack tells you what to design into the next version. If you are designing a new substrate, run this pack first; check-determinism then becomes a regression check.Use this pack when:
Do not use this pack when:
/check-determinism (yzmir-simulation-foundations)./using-audit-pipelines (axiom-audit-pipelines). That pack handles canonical encoding, fingerprint chains, and signed exports for evidence; this pack handles re-runnable execution for debugging.yzmir-simulation-foundations covers integrator selection and stability.If your input is a system being designed (or significantly redesigned) and you have not run this pack before:
determinism-vs-reproducibility.md — fix the vocabulary before fighting about implementation. Pick a determinism class, emit 01-determinism-class.md.seed-governance.md — seeds are inputs, not implementation details. Decide where they live and how they propagate, emit 02-seed-governance-spec.md.rng-isolation-patterns.md — one big RNG is the most common mistake in this whole pack. Decide hierarchy, emit 03-rng-isolation-spec.md.99-determinism-and-replay-specification.md ready.Steps 1–3 are the spike. If those three artifacts hold together — vocabulary fixed, seeds governed, RNGs isolated — the rest is well-defined work. If they don't, no later sheet can save you. Most replay-determinism failures in shipped systems trace back to one of these three.
All reference sheets are in the same directory as this SKILL.md. When you see a link like [seed-governance.md](seed-governance.md), read the file from the same directory.
yzmir-simulation-foundations axiom-determinism-and-replay
/check-determinism (verify) ←-cross-ref-→ architecture (design)
scans an existing sim for designs seed governance,
unseeded RNGs, dict iteration, RNG isolation, snapshot
wall-clock leaks strategy, divergence
protocol, replay loop
─────────────────────────────────────────────────────────────────────
↓
Use this pack on a NEW system; use /check-determinism
as a regression check on the system you built with this pack.
axiom-audit-pipelines (evidence) axiom-determinism-and-replay (re-execution)
decision-log, fingerprint chain, ←-cross-ref-→ seeds, RNGs, snapshots,
signed exports, retention divergence, replay loop
→ "prove this decision happened" → "re-run this episode"
─────────────────────────────────────────────────────────────────────
Different problems. Audit logs are the *artefact* a verifier
checks; replay infrastructure is the *machine* that re-runs the
episode. They share canonical-encoding hygiene; the rest diverges.
The pack produces a numbered artifact set in a determinism-and-replay/ workspace:
| # | Artifact | Producer skill | Required |
|---|----------|----------------|----------|
| 00 | scope-and-goals.md | router (this SKILL.md) | Always |
| 01 | determinism-class.md | determinism-vs-reproducibility | Always |
| 02 | seed-governance-spec.md | seed-governance | Always |
| 03 | rng-isolation-spec.md | rng-isolation-patterns | Always |
| 04 | snapshot-strategy.md | snapshot-strategy | S+ |
| 05 | divergence-protocol.md | divergence-detection-and-localisation | S+ |
| 06 | replay-infrastructure-spec.md | replay-infrastructure-design | S+ |
| 07 | concurrency-determinism-spec.md | determinism-under-concurrency | M+ if concurrent |
| 08 | floating-point-policy.md | floating-point-determinism | L+ if FP-heavy |
| 09 | gpu-determinism-config.md | gpu-determinism | L+ if GPU |
| 10 | external-effects-substitution.md | external-effects-substitution | L+ if external IO |
| 11 | canonical-state-encoding.md | canonical-state-encoding-for-replay | XL (also L if cross-pack with audit) |
| 12 | property-test-suite.md | property-tests-as-determinism-checks | M+ |
| 13 | cost-of-determinism.md | cost-of-determinism | Always |
| 99 | determinism-and-replay-specification.md | router-owned consolidation | Always |
The numbered artifacts are not independent — changes propagate. Read this before editing any spec.
01-determinism-class.md (the class — bit-exact, logical, statistical)
│
▼
02-seed-governance-spec.md (seeds as inputs, propagation, audit)
│
▼
03-rng-isolation-spec.md (per-component RNGs derived from governed seeds)
│
▼
04-snapshot-strategy.md (state capture cadence, full vs delta vs event-sourced)
│ │
▼ ▼
05-divergence-protocol.md 06-replay-infrastructure-spec.md
(compare-points, hashes) (read-only, branching, lifecycle)
Channel-specific deltas hang off the spine; each refines a leak the spine alone
does not close. Their re-emission is governed by the per-sheet "class-breaking
events" sections, summarised below.
01- ─┬─ 07-concurrency-determinism-spec.md (lockstep / recorded-schedule / schedule-independent)
├─ 08-floating-point-policy.md (BLAS pinning, FMA, reduction order, ε)
├─ 09-gpu-determinism-config.md (cuDNN, cuBLAS, NCCL, atomic-float audit)
├─ 10-external-effects-substitution.md (Effects layer; record-and-replay)
└─ 11-canonical-state-encoding.md (snapshot bytes; per-tick hashing pattern)
↑ cross-links axiom-audit-pipelines
02-, 03-, 04-, 05-, 06-, 07- ──► 12-property-test-suite.md
(replay equivalence, seed isolation,
snapshot round-trip, restore idempotence,
fork-and-converge, schedule-independence)
01-, 02-, ..., 12- ──► 13-cost-of-determinism.md
(per-rule cost; tier justification; trade record;
response on perf-budget breach)
Coordinated re-emission rules — spine (01–06):
| If you change | You also re-emit | Class-breaking? |
|---------------|------------------|-----------------|
| 01- determinism class (e.g., logical → bit-exact) | All downstream — every other artifact's tolerance changes | Yes (full re-design) |
| 02- seed propagation rule (e.g., add a new RNG-bearing component) | 03- (new RNG slot), 05- (new compare-point), 06- (replay must re-derive new seed), 12- (Property 2 test vector) | Yes |
| 03- RNG hierarchy (e.g., per-actor → per-actor-per-decision) | 02- if seed derivation rule changes, 05- (granularity of compare), 06-, 12- | Yes |
| 04- snapshot cadence | 05- (compare-point density), 06- (rewind granularity) | No (replay still works, performance changes) |
| 04- snapshot encoding (full → delta) | 05- (hash strategy must adapt), 06- (rehydration logic), 11- (delta canonicalisation) | Maybe (depends on whether old snapshots are kept) |
| 05- compare-point set | 06- (replay loop emits compare-points), 11- (hashed bytes) | No |
| 06- rewind/branch capability | 04- (full snapshots required at branches if branching added), 12- (Property 5) | No-to-yes (depends on whether branching is new) |
Coordinated re-emission rules — channel-specific (07–11):
| If you change | You also re-emit | Class-breaking? |
|---------------|------------------|-----------------|
| 07- concurrency strategy (A/B/C swap, or new thread / actor / async task) | 12- (Property 6 if A/C; record-trace tests if B), 13- (cost change) | Yes (strategy swap); No (additional component) |
| 07- schedule-trace format (Strategy B) | 06- (replayer reads new format), 13- | Yes |
| 08- library or thread-count pin (BLAS swap, FMA flag, OMP_NUM_THREADS, denormal mode) | 11- (hash policy if precision changes), 12- (Property 1 vector), 13- | Yes |
| 08- tolerance ε change | 01- (class refines), 05- (comparison threshold), 12- (assertion ε) | Yes |
| 09- GPU framework knob, driver, NCCL algo, or atomic-float kernel introduction | 11- (if precision impact), 12- (Property 1 multi-SKU vector), 13- | Yes |
| 10- substitution pattern for an effect (record→deterministic-function, etc.) | 06- (Effects-layer wiring), 12- (Property 1 includes external cursor), 13- | Yes |
| 10- new external effect introduced | 06- (Effects-layer extension), 12- (new test vector) | Yes |
| 11- encoding library upgrade or canonicalisation rule change | 04- (snapshot bytes change), 05- (hashes recompute), 12- (test vectors); cross-pack: re-cite axiom-audit-pipelines:canonical-encoding-for-fingerprinting | Yes |
| 11- projection / Merkle / incremental hashing pattern | 05- (compare-point hash function), 06- (replay emits matching hashes) | Yes |
Coordinated re-emission rules — verification and trade-off (12–13):
| If you change | You also re-emit | Class-breaking? |
|---------------|------------------|-----------------|
| 12- property added or generator broadened | None upstream; the property suite is the consumer; CI absorbs | No |
| 12- property fails with a shrunk counterexample | The responsible upstream artifact (named by replay-debugger channel attribution); re-gate at affected checks | Yes (the fail localises a class-breaking finding) |
| 13- rule relaxed under perf pressure | The affected upstream artifact (knob change → 08-/09-; library swap → 08-/09-/11-; strategy downgrade → 07-); re-gate Check 17 + the technical check | Yes (relaxations are class-breaking) |
| 13- tier promotion | Add the tier-required sheets (e.g., S→M adds 07-/12-; M→L adds 08-/09-/10-); re-gate at new tier | Yes |
A change not listed above is not exempt; it is evaluated against the consistency gate's affected checks. Default for ambiguity: treat as class-breaking unless 01- explicitly tolerates it. Per-sheet "class-breaking events" sections are authoritative for their channel; this table is the cross-channel index.
Every workflow is classified during determinism-vs-reproducibility and recorded in 00-scope-and-goals.md and 01-determinism-class.md. The tier determines which artifacts are required and how strict the consistency gate runs.
| Tier | Trigger | Required artifacts |
|------|---------|--------------------|
| XS | Single-process simulation, debug-only replay, single developer machine | 00, 01, 02, 03, 13; 04, 05, 06 may be one-page memos |
| S | Replay-from-seed for debugging across team machines, single OS | XS set + full 04, 05, 06 |
| M | Replay infrastructure consumed by other systems (RL training loop, regression suite, multi-process workers) | S set + 07- if concurrent + 12- property tests + per-process snapshot ownership rules in 04-, cross-process compare-point sync in 05- |
| L | Cross-machine reproducibility (CI fingerprints match dev fingerprints; bit-exact across architectures) | M set + 08-floating-point-policy.md, 09-gpu-determinism-config.md, 10-external-effects-substitution.md; 11- if sharing canonical encoding with audit pack |
| XL | Bit-exact reproducibility across heterogeneous hardware, regulatory or contractual replay obligation | L set + 11-canonical-state-encoding.md (cross-links axiom-audit-pipelines:canonical-encoding-for-fingerprinting), full 12-property-test-suite.md including stateful tests, named cryptographic hash on every snapshot |
Tier is authoritative. If any sheet's guidance forces an artifact above your declared tier, that artifact becomes required — this is a tier promotion, not a waiver.
determinism-vs-reproducibility → 01- (pick the class — logical equivalence is enough for most RL; bit-exact only if you have a specific reason)seed-governance → 02- (seeds live in the run config, propagate by derivation, are recorded as inputs)rng-isolation-patterns → 03- (per-environment RNG, per-policy RNG, per-replay-buffer RNG; derive each from a master seed)snapshot-strategy → 04- (full snapshots at episode boundaries; deltas mid-episode if the per-tick state is large)divergence-detection-and-localisation → 05- (per-tick state hash for replay verification; binary-search for the first differing tick)replay-infrastructure-design → 06- (read-only replay for debugging; branching replay for league play)determinism-under-concurrency → 07- (lockstep at vec-env layer; per-rank seed derivation)gpu-determinism → 09- and floating-point-determinism → 08- (cuDNN deterministic mode, atomic-float audit, BLAS pinning)external-effects-substitution → 10- (Effects layer; recorded clock; no live network in replay)property-tests-as-determinism-checks → 12- (replay equivalence + seed isolation + snapshot round-trip)cost-of-determinism → 13- (record what each rule costs; tier justification)99-determinism-and-replay-specification.md and run the consistency gate. /scaffold-replay-system emits the boilerplate.divergence-detection-and-localisation → start with the divergence protocol; localise to the first differing operation before changing anything./diagnose-divergence (dispatches replay-debugger) to bisect to T₀ and attribute the channel.02-), RNG ownership (03-), snapshot/state-capture (04-), concurrency (07-), FP (08-), GPU (09-), external effects (10-), or canonical encoding (11-).12- to catch the regression./check-determinism (yzmir-simulation-foundations) first to catch the common patterns (unseeded RNG, dict iteration, wall-clock leaks)./check-determinism returns clean and the system still desyncs, you have an architectural problem — start at divergence-detection-and-localisation.determinism-vs-reproducibility if the team disagrees on what determinism even means here.snapshot-strategy (04-) — branching replay needs full-state snapshots, not deltas, at every branch point.replay-infrastructure-design (06-) — read-only and branching replay are different machines. Don't conflate.rng-isolation-patterns (03-) — branching from tick T must re-derive every RNG state from the snapshot, not from re-seeding the master.Read determinism-vs-reproducibility first. If the answer is "we just want bugs to reproduce when re-run on the same machine," you may need only XS-tier work (seeds + RNG isolation) without snapshot or replay infrastructure. The pack is scaled to what you actually need.
agent: determinism-reviewer — Reviews a system design for sources of non-determinism. Reads design artifacts, walks the ten channels (seeds, RNG, snapshot, divergence, replay, concurrency, FP, GPU, external effects, canonical encoding), reports gaps with severity, and cites the resolving sheet. Invoked via /scaffold-replay-system's gap-analysis option or directly via the Task tool.agent: replay-debugger — Given a divergence between two runs (or a recorded run and a replay), walks the divergence protocol from 05- back to the first differing operation, attributes the divergence to a channel, and produces a localisation report. Invoked via /diagnose-divergence or /verify-replay --diagnose-on-fail./scaffold-replay-system <component> — Drop in record/replay loop boilerplate aligned to declared tier; consumes the numbered spec set, optionally runs gap-analysis via determinism-reviewer, and emits Effects substitution layer, snapshot envelope, divergence-detection compare-points, and CI hooks./verify-replay <run_record_path> — Re-run a recorded trace and assert class-appropriate equivalence; emits a replay verification statement; chains into replay-debugger on FAIL via the --diagnose-on-fail flag./diagnose-divergence <run_a> <run_b> — Given two divergent runs, localise the first differing operation by dispatching replay-debugger; emits a diagnosis report with channel attribution and next-action suggestions.Run before emitting 99-determinism-and-replay-specification.md. Each check produces a pass/fail line in the gate report. Failures must be addressed or recorded as explicit waivers (with reactivation conditions); silent drops are the failure mode this pack exists to prevent.
| # | Check | Question |
|---|-------|----------|
| 1 | Tier coverage | Every artifact required by the declared tier exists. Tier-promotion required artifacts (07- for concurrent, 08-/09-/10- for L+, 11- for XL or audit-pack-overlap) are present or explicitly N/A with the precondition stated. |
| 2 | Class clarity | 01- names the determinism class with a single sentence a new team member can apply: "two runs are equivalent iff [precise predicate]." Vague claims like "deterministic" without a class fail. |
| 3 | Seed lineage | 02- traces every RNG-bearing component back to the run config's seed. No "uses default seed" entries. No code-path-dependent seed derivation (the same component must derive the same sub-seed regardless of execution order). |
| 4 | RNG isolation | 03- shows no shared RNG across logically-independent components. The "one big RNG" pattern is not present, or is recorded as an explicit accepted-risk with the divergence behaviour it permits. |
| 5 | Snapshot honesty | 04- states what is and is not captured by a snapshot. State outside the snapshot (caches, lazy initialisation, JIT compilation effects, OS scheduler state, GPU memory) is enumerated and addressed (re-derived on rehydration, accepted as non-replayable, or pinned out of band). |
| 6 | Divergence localisation | 05- defines compare-points (tick boundary, decision boundary, snapshot boundary), the hash function over state at each, and the localisation procedure (binary search, bisection, structured comparison). "We just diff the logs" fails. |
| 7 | Replay scope honesty | 06- separates read-only replay (no branching, observation only) from branching replay (rewind to checkpoint, re-execute with new inputs). Capabilities are scoped per workload — "fully replayable" without proof of rewind capability fails. |
| 8 | Cross-pack handoff | If axiom-audit-pipelines artifacts exist, 04- snapshot encoding cross-references the audit pack's canonical-encoding sheet rather than duplicating it. If yzmir-simulation-foundations is in play, 99- cites which /check-determinism violations are covered by which artifact. |
| 9 | Versioning rule | 99- declares its semver and the re-emission rules from the dependency graph. Any change to 01- is class-breaking and forces a major-version bump. |
| 10 | Test-vector strategy | At minimum, the spec names one recorded run-with-known-state-hash that all future runs of the same code at the same seed must reproduce. Without a test vector, "deterministic" is an assertion, not a property. |
| 11 | Concurrency strategy named | If the system has any concurrency (threads, processes, async tasks, actors), 07- names the strategy (A lockstep / B recorded schedule / C schedule-independent) per subsystem and enumerates schedule-sensitive operations. "Mostly deterministic on N cores" fails. |
| 12 | FP policy | If 01- is bit-exact and the system has FP arithmetic, 08- pins libraries (BLAS, NumPy, libm), reduction order (sequential / Kahan / parallel-with-fixed-thread-count), FMA flag, denormal mode, thread-count caps, and tolerance ε for non-bit-exact classes. |
| 13 | GPU determinism | If 01- is bit-exact-or-tighter and the system uses GPU, 09- declares framework knobs (use_deterministic_algorithms, cudnn.deterministic, cudnn.benchmark=False, CUBLAS_WORKSPACE_CONFIG, TF32 policy, NCCL algo+proto), pins driver and library versions, audits atomic-float kernels, and asserts knobs at run start. |
| 14 | External-effects substitution | If the system reads time, network, file system, environment, or any external, 10- enumerates each occurrence, names the substitution pattern (deterministic-function / record-and-replay), and confirms a single Effects layer is the only access path. Mocks-in-production fails. |
| 15 | Canonical state encoding | If 04- snapshots are hashed in 05-, 11- declares the encoding library (JCS or equivalent), pins library version, names the per-tick hashing pattern (incremental / Merkle / projection), forbids pickle in the snapshot path, and pins tensor canonicalisation (endianness, contiguity, dtype encoding). |
| 16 | Property tests | At tier M+, 12- declares the property suite (replay equivalence, seed isolation, snapshot round-trip; +restore idempotence at M+; +fork-and-converge for branching replay; +schedule independence for non-Strategy-B concurrency). The test database is checked in; failing seeds are reproducible across the team. |
| 17 | Cost record | 13- names the per-rule cost (perf, library, refactoring, ops, cognitive), the tier justification, the library trade record, the partial-determinism boundary, and the response-on-budget-breach. A determinism spec without a cost spec is a spec that gets violated quietly under deadline. |
A 99-determinism-and-replay-specification.md whose gate report is older than its latest numbered artifact is stale and must be re-gated before downstream citation.
Checks 11–17 are evaluated only if their precondition holds (system has concurrency / FP / GPU / external effects / hashing / tier ≥ M / always-for-cost). A check whose precondition is absent is recorded as N/A; the consistency gate does not fail on N/A but does require the absence to be explicit, not implicit.
| Change shape | Re-run | Re-gate |
|--------------|--------|---------|
| New RNG-bearing component added | 02-, 03-, 05- (new compare-point), 06- (replay re-derives new RNG), 12- (Property 2 vector) | Checks 3, 4, 6, 16 |
| Determinism class promoted (logical → bit-exact) | 01- (major version bump), all downstream | Re-gate at promoted tier |
| Tier promoted (e.g., S → M for cross-process) | 04- per-process ownership, 05- cross-process compare, 07- cross-process strategy, 12- add property tests | Re-gate at new tier |
| Snapshot encoding change (full → delta) | 04-, 05- hash strategy, 06- rehydration, 11- delta canonicalisation | Checks 5, 6, 7, 15 |
| New compare-point added | 05-, 06- emit-on-replay | Check 6 |
| Branching replay added to a system that previously had read-only only | 04- (full snapshots required at branches), 06-, 12- (Property 5) | Checks 5, 7, 16 |
| External effect (clock, IO, network) introduced | 10- (substitution pattern), 06- (Effects layer wiring), 12- (replay-equivalence test for external) | Checks 14, 16 |
| New thread / process / actor / async task introduced | 07- (concurrency strategy), 12- (schedule-independence test if Strategy A/C) | Check 11 |
| Library upgrade affecting FP, GPU, or canonical encoding | 08-, 09-, 11- as applicable; re-run pinned test vectors | Checks 12, 13, 15 |
| Determinism rule relaxed for perf | 13- cost record updated; affected sheets re-emit | Check 17 + affected technical checks |
Bump the 99- semver on every re-emission. Re-gate before downstream citation.
| Condition | Response |
|-----------|----------|
| The system makes no sequential decisions and has no internal state worth re-running | Stop. The system is stateless or near-stateless; replay is the wrong primitive. Record the determination in a one-page memo per 13-cost-of-determinism.md's "honest no" pattern. |
| The team disagrees on what "deterministic" means and the disagreement is not vocabulary but values (one party will accept "logically equivalent"; another wants "bit-exact across architectures") | Stop at 01-. Resolve the disagreement before any other artifact is written. The class drives the entire downstream design. |
| Required cross-machine bit-exactness conflicts with required GPU performance | Read 09-gpu-determinism-config.md and 13-cost-of-determinism.md together. Typical resolution: lock cuDNN to deterministic mode, accept the perf hit, document the tradeoff in 13-. If the perf budget cannot absorb the cost, re-classify to logical-equivalence with ε in 01-. Do not improvise. |
| Snapshot frequency required for the divergence protocol is impossible at the system's tick rate | Return to 04- and consider per-decision snapshots instead of per-tick, or delta encoding with periodic full snapshots, or accept a coarser localisation. Cross-link 11-canonical-state-encoding.md for hashing patterns that scale. Do not silently drop snapshots. |
| External effects (time, network, third-party calls) cannot be substituted and the system is not closed | Read 10-external-effects-substitution.md. If a closed-world model is impossible (e.g., the system's value depends on calling a third-party API whose responses cannot be recorded), the tier cannot exceed S. Record the limitation in 01- and 13-. |
| Property tests fail with a shrunk counterexample | Class-breaking finding. Treat as a real divergence: dispatch /diagnose-divergence against the failing run, attribute the channel, fix the responsible artifact, re-emit, re-gate. The shrunk seed becomes a permanent regression test via @example(...). |
Is the system supposed to behave the same way when re-run on the same inputs?
├─ No → wrong pack; this is for systems where re-run-equivalence is required
└─ Yes → Continue
Do you need to re-run the system AT ALL, or just have bugs reproduce locally?
├─ Just local reproduction → XS tier (seeds + RNG isolation only)
└─ Need a real replay machine → Continue
Read-only investigation, or branching counterfactual?
├─ Read-only only → snapshot strategy may be sparse; replay infrastructure is single-machine
└─ Branching → full snapshots at branch points; replay infrastructure has rewind primitives
Concurrent (threads, processes, async, actors)?
├─ No → skip 07-
└─ Yes → 07-; pick lockstep / recorded-schedule / schedule-independent
Single machine, or cross-machine reproducibility required?
├─ Single machine → S or M tier
└─ Cross-machine → L tier; 08- (FP), 09- (GPU if used), 10- (external effects), 11- (canonical encoding if hashing)
External effects (clock, network, file system, third-party APIs)?
├─ No → skip 10-
└─ Yes → 10-; the Effects layer is mandatory; record-and-replay or deterministic-function
Is there an existing simulation to investigate (not design)?
├─ Yes → /check-determinism (yzmir-simulation-foundations) first
└─ No → /determinism-vs-reproducibility (this pack) first
Live divergence (two runs that disagree)?
├─ Yes → /diagnose-divergence (this pack) — localises and attributes; do not redesign first
└─ No → continue with design pass
yzmir-simulation-foundations: /check-determinism scans an existing system
→ finds violations of common patterns (unseeded RNG, dict iteration, wall-clock leaks)
→ output is a list of code-level fixes
axiom-determinism-and-replay (this pack): designs a determinism-bearing architecture
→ output is a numbered artifact set defining seed governance, RNG ownership,
snapshot strategy, divergence protocol, replay loop
→ /check-determinism becomes the regression check on the system you built
The boundary: /check-determinism is audit of an existing system against known patterns. This pack is architecture of a new system to make those patterns impossible.
axiom-audit-pipelines: decisions are evidence; canonical bytes, fingerprint chains,
signed exports, retention, threat model OF the log
axiom-determinism-and-replay (this pack): execution is replayable; seeds, RNGs,
snapshots, compare-points, replay loop
The two packs share canonical-encoding hygiene: 04-snapshot-strategy.md and 11-canonical-state-encoding.md cross-reference axiom-audit-pipelines:canonical-encoding-for-fingerprinting rather than duplicating the RFC 8785 / JCS gotcha catalog. The boundary stays clean: audit is for proving a decision happened; replay is for re-running the execution that produced it.
If a system needs both — replay-debuggable AND audit-of-decisions — both packs apply. The audit pack's 08-replay-capability.md (partial replay from trail) is a different thing from this pack's 06-replay-infrastructure-spec.md (full execution machine); the former replays log entries, the latter replays the system that emitted them.
solution-architect's 04-solution-overview.md cites this pack's 99- when the system
has replay obligations
solution-architect's adrs/ cite specific choices (determinism class, snapshot encoding,
RNG hierarchy, compare-point granularity)
solution-architect's 17-risk-register.md cites this pack's 99- for class-breaking-change risk
If solution-architect is in play and the system has replay or determinism obligations, this pack's 99- is a normal input to 04- and to ADRs.
RL substrates are the canonical use case for this pack. An RL training run that cannot be replayed cannot be debugged, cannot be reproduced for a paper, cannot be audited for an evaluation, cannot be branched for league play. If yzmir-deep-rl is the algorithm pack, axiom-determinism-and-replay is the substrate pack underneath it.
| Need | Use This |
|------|----------|
| Define what "deterministic" actually means here | determinism-vs-reproducibility |
| Decide where seeds live and how they propagate | seed-governance |
| Partition RNGs across components | rng-isolation-patterns |
| Decide snapshot cadence and encoding | snapshot-strategy |
| Detect and localise a divergence | divergence-detection-and-localisation |
| Design the replay loop (read-only vs branching) | replay-infrastructure-design |
| Pick a concurrency strategy (lockstep / recorded schedule / schedule-independent) | determinism-under-concurrency |
| Lock floating-point arithmetic across libraries / threads / architectures | floating-point-determinism |
| Lock GPU determinism (cuDNN, cuBLAS, NCCL, atomic-float kernels) | gpu-determinism |
| Substitute external effects (time, network, file system) for replay | external-effects-substitution |
| Choose canonical encoding for snapshot bytes and per-tick hashing | canonical-state-encoding-for-replay |
| Express determinism as testable invariants (Hypothesis, proptest) | property-tests-as-determinism-checks |
| Record what determinism costs and when not to pay | cost-of-determinism |
| Verify an existing simulation against known patterns | /check-determinism (yzmir-simulation-foundations) |
| Review a system design for determinism leaks (severity-rated) | determinism-reviewer agent |
| Localise a divergence between two runs to first-differing op | replay-debugger agent |
| Drop in record/replay loop boilerplate aligned to declared tier | /scaffold-replay-system |
| Re-run a recorded trace and assert class-appropriate equivalence | /verify-replay |
| Given two divergent runs, localise the first differing operation | /diagnose-divergence |
Replay is an architectural property, not an effort. A system has a named determinism class, governed seeds, isolated RNGs, an honest snapshot strategy, a divergence-detection protocol that points at first-differing operations, and a replay loop scoped to the workloads it serves — or it has assumptions that survive only as long as nobody touches them. Design the spec before writing the code; gate the spec for consistency before downstream citation.
After routing, load the appropriate specialist sheet for detailed guidance.
Foundational (always required for any tier):
time.time() anti-patternReplay machinery (S+):
Channel-specific discipline (M/L/XL as triggered):
axiom-audit-pipelines:canonical-encoding-for-fingerprinting; per-tick hashing patterns; tensor canonicalisation; snapshot envelope schemaVerification and trade-off accounting (M+ / always):
development
Use when **managing the delivery of work** rather than building it — running a project or a program, not writing its code. Use when a team is busy but outcomes are not landing, when "when will it be done" has no defensible answer, when status is green every week until it is suddenly red, when dependencies surprise you, when a RAID log is a graveyard, or when several projects must be coordinated toward one outcome (a program). Lean/agile-leaning, honest about where program scale needs predictive structure. Pairs with `/axiom-planning` (turning one workstream into an implementation plan) and `/axiom-sdlc-engineering` (process maturity, requirements traceability, formal governance). Do not load for writing code, picking an architecture, or designing a single feature.
tools
--- name: using-product-management description: Use when a Claude is taking **standing ownership** of a software product and driving it end-to-end across many sessions — discovery, strategy, specs, delivery orchestration, and value validation — deciding *what to build, why, for whom,* and *whether it worked*, with continuity, decision provenance, and an authority boundary that escalates anything irreversible or outward-facing to the human owner. Owns the product disciplines: opportunity assessme
tools
Use when designing, implementing, or auditing an MCP (Model Context Protocol) server — tool API design, idempotency under agent retry, structured error envelopes agents can recover from, schema versioning across model drift, transport reliability (stdio / HTTP), output-shape and pagination discipline, and choosing between tools / resources / prompts / sampling. Also use when an MCP server's tools confuse agents, return unstructured errors, deadlock under concurrent calls, double-execute under retry, or lose state across reconnects. Do not use for general REST/GraphQL API design (use `/web-backend`), for client-side prompt engineering or tool-loop design (use `/llm-specialist`), for general in-process plugin architecture (use `/system-architect`), or for cryptographic-provenance audit trails (use `/audit-pipelines`).
development
Use when running **SQLite or DuckDB inside an application process** as the durable store — not as a development convenience but as the production database. Use when scaling an SQLite layer that worked at low concurrency and is now hitting SQLITE_BUSY, WAL bloat, lock contention, schema-migration ceremony, or correctness gaps under multi-process writers. Use when introducing DuckDB as an OLAP complement to an OLTP SQLite store, or when picking between the two for a new component. Pairs with `/web-backend` (the API surface above the DB) and `/audit-pipelines` (when the DB is also the audit trail). Do not load for server databases (Postgres, MySQL), key-value stores, or ORM choice in isolation.