skills/ledger/SKILL.md
Render the Crucible calibration ledger weekly report — the honest "Crucible caught N silent bugs" headline, verdict breakdown, per-skill severity rates, and the inflation detector. Triggers on "/ledger", "weekly report", "weekly ledger", "caught N", "quality ledger", "calibration report", "render the ledger".
npx skillsauth add raddue/crucible ledgerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Renders docs/ledger/weekly-YYYY-Www.md from the central ledger
~/.claude/crucible/ledger/runs.jsonl (override CRUCIBLE_LEDGER_DIR; plus
falsification.jsonl beside it for the cross-link). The SKILL.md is a thin
prompt wrapper; the source of truth is the testable Python core at
scripts/render_ledger.py. Reports are written to docs/ledger/ relative to
your cwd — run from the crucible repo to update the committed reports.
Skill type: Utility — direct execution, no subagent dispatch.
Manual /ledger [--weeks N] (default N=1, the most recent ISO week).
Invoke the renderer directly. Resolve scripts/render_ledger.py by absolute
path from the plugin root (it self-locates its own modules via __file__, so
no PYTHONPATH is needed and it runs from any cwd):
python3 <plugin_root>/scripts/render_ledger.py --weeks N
That command:
~/.claude/crucible/ledger/runs.jsonl (the
--ledger default; CRUCIBLE_LEDGER_DIR overrides). First-time-ever: if
the file is MISSING, print "no ledger data yet" and exit 0 — do NOT write an
empty report.YYYY-Www).N weeks, writes docs/ledger/weekly-YYYY-Www.md.load_runs): skips blank / malformed / partial-trailing
lines, dedups defensively by (run_id, skill) latest-position-wins.caught_count): counts entries with
would_have_shipped_without_gate == true, EXCLUDING backfilled == true.
Backfilled entries carry severity_histogram: null ⇒ WHS null (the
mechanical L-3 rule) and are reported in a SEPARATE "Backfilled historical
context" section — never in the headline (L-5). This is what test T-5 asserts.significant_rate / fatal_rate (week_summary): computed
from forward-captured entries only (backfilled == false AND
severity_histogram != null). Raw rates are printed from week 1.inflation_alert): alerts when a skill's
significant_rate or fatal_rate exceeds 3× its 4-week rolling median.
Silent for a skill until 4 weeks of forward data exist (the v1 bootstrap
— with no forward history yet, the detector is silent, but raw rates still
print so a human can eyeball drift).falsified_count): the all-time count of falsified
verdicts via the L-9 latest-entry-wins reduction over
falsification.jsonl (scripts.ledger_reduce.reduce, the canonical helper
cited above). Graceful degradation: if falsification.jsonl is absent
(Phase 4 reconciler not built yet), the reduction returns {} ⇒ count 0;
the renderer never crashes./ledger run of a calendar month,
the report appends an advisory checklist prompting a spot-check of 5 random
would_have_shipped_without_gate: true entries against
skills/shared/severity-rubric.md.Design §4 calls for "findings with commit citations". The v1 schema has no
commit field (artifact_hash is null for backfill). So:
backfill-<PR>-quality-gate) cite PR #<PR>,
extracted from the deterministic run_id.run_id) have no commit in v1 ⇒ the citation is
omitted gracefully. No SHAs are invented. A future schema rev capturing the
gating commit can replace this with a real SHA.The headline is would_have_shipped_without_gate == true minus backfilled.
Backfilled entries seed the corpus but never inflate the caught-N number; they
appear only in the separate historical-context section. Inflation drift is
defended structurally (the 3× detector) and culturally (the monthly spot-check).
testing
Standalone instance-bug reviewer — runs a parallel finder fan-out + verify gate over a diff or a path and prints ranked, verified findings. Use when the user says "delve", "find bugs in this diff", "review this for bugs", "scan this file/subsystem for defects", "instance-bug sweep", or wants concrete reproducible defects (not a merge verdict, not systemic health). Works on a PR id, a base..head range, or a path, on any forge (GitHub, GitLab, Bitbucket, self-hosted).
development
The Book of Grudges — cross-session bug graveyard. Every fixed bug is recorded as a structured "grudge"; before touching code, skills query the grudgebook for the files in scope and surface past regressions as forced "DO NOT REPEAT" context. Read mode (pre-flight) and write mode (on bug resolution / fix(*) PR). Machine-local, per-repo, never committed. Triggers on /grudge, "check grudges", "record a grudge", "any past bugs here", "regression oracle", "bug graveyard".
testing
Reconcile the Crucible calibration ledger — walk merged fix/hotfix branches to falsify the originating gating-verdicts, compute per-skill Brier calibration scores, and append a falsification log. Triggers on "/calibration-reconcile", "reconcile ledger", "reconcile calibration", "falsify verdicts", "brier score", "calibration reconcile", "compute brier".
tools
Live-dispatch phase of temper eval harness. Reads stage-manifest.json from a pre-staged dispatch dir; fans Task-tool reviewer dispatches in parallel (max 6); writes per-seq result files; exits. Single bounded session. Pairs with `python -m skills.temper.evals.run_evals stage` and `score`.