skills/debugging/SKILL.md
Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
Between every agent dispatch and every agent completion, output a status update to the user. This is NOT optional — the user cannot see agent activity without your narration.
Every status update must include:
After compaction: Re-read the session state from the scratch directory (see Session State below) and output current status before continuing.
This requirement exists because: Debugging sessions can involve multiple investigation rounds and fix attempts. Without narration, the user has no visibility into which hypotheses have been tried, what evidence was found, or why the orchestrator is pursuing a particular path.
Execution model: The orchestrator dispatches all investigation and implementation to subagents. The orchestrator NEVER reads code, edits files, or runs tests directly. It forms hypotheses, dispatches work, and makes decisions based on subagent reports.
Depth principle: When in doubt, dispatch MORE investigation agents, not fewer. A bug that looks simple from the surface often has a complex root cause. Spinning up 4-6 focused investigators in parallel costs minutes; missing the root cause costs hours.
Write a status file to ~/.claude/projects/<hash>/memory/pipeline-status.md at every narration point. This file is overwritten (not appended) and provides ambient awareness for the user in a second terminal.
Write the status file at every point where the Communication Requirement mandates narration: before dispatch, after completion, phase transitions, health changes, escalations, and after compaction recovery.
The status file uses this structure (overwritten in full each time):
# Pipeline Status
**Updated:** <current timestamp>
**Started:** <timestamp from first write — persisted across compaction>
**Skill:** debugging
**Phase:** <current phase, e.g. "Synthesis", "4 — Implementation">
**Health:** <GREEN|YELLOW|RED>
**Suggested Action:** <omit when GREEN; concrete one-sentence action when YELLOW/RED>
**Elapsed:** <computed from Started>
## Recent Events
- [HH:MM] <most recent event>
- [HH:MM] <previous event>
(last 5 events, newest first)
Append after the shared header:
## Investigation
- Hypothesis: "Missing null check in event handler dispatch chain"
- Cycle: 2 of 4 max
- Phase 4 fix attempts: 1 (WIP commit pending verification)
## Compression State
Goal: [original user request / bug report]
Key Decisions:
- [accumulated decisions, max 10]
Active Constraints:
- [constraints affecting remaining investigation]
Next Steps:
1. [immediate next action]
2. [subsequent actions]
Health transitions are one-directional within a phase: GREEN -> YELLOW -> RED. Phase boundaries reset to GREEN.
When health is YELLOW or RED, include **Suggested Action:** with a concrete, context-specific sentence (e.g., "Third hypothesis attempted. Consider narrowing the search space or providing additional context.").
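The health rule and the shared header can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function names `next_health`, `render_status`, and `write_status` are hypothetical, the elapsed time is shown in whole minutes as an assumption, and only the shared header is rendered (the Investigation and Compression State sections are omitted).

```python
from datetime import datetime, timezone
from pathlib import Path

HEALTH_ORDER = ["GREEN", "YELLOW", "RED"]


def next_health(current, proposed, phase_changed):
    """Health only worsens within a phase; a phase boundary resets it to GREEN."""
    if phase_changed:
        return "GREEN"
    # Keep whichever value is further along GREEN -> YELLOW -> RED.
    return max(current, proposed, key=HEALTH_ORDER.index)


def render_status(phase, health, started, events, suggested_action=None, now=None):
    """Render the shared header; `events` is expected newest first."""
    now = now or datetime.now(timezone.utc)
    lines = [
        "# Pipeline Status",
        f"**Updated:** {now.isoformat(timespec='seconds')}",
        f"**Started:** {started.isoformat(timespec='seconds')}",
        "**Skill:** debugging",
        f"**Phase:** {phase}",
        f"**Health:** {health}",
    ]
    # Suggested Action is omitted when GREEN.
    if health != "GREEN" and suggested_action:
        lines.append(f"**Suggested Action:** {suggested_action}")
    elapsed_min = int((now - started).total_seconds() // 60)
    lines.append(f"**Elapsed:** {elapsed_min}m")
    lines.append("## Recent Events")
    lines += [f"- {event}" for event in events[:5]]  # last 5 events only
    return "\n".join(lines) + "\n"


def write_status(path: Path, text: str) -> None:
    """Overwrite the status file in full (never append)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
```

Because `Started` is passed in rather than regenerated, a caller recovering from compaction can re-read it from the previous file and keep the elapsed time continuous.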
Output concise inline status alongside the status file write:
Phase 1 [cycle 2] Investigating: null check hypothesis | GREEN | 34m
--- separators
After compaction, before re-writing the status file:
0. Read the ## Compression State section from pipeline-status.md — recover Goal, Key Decisions, Active Constraints, and Next Steps. If absent, skip to step 1.
1. Read pipeline-status.md to recover Started timestamp and Recent Events buffer.
2. If there is a Session Index: path, or if globbing ~/.claude/projects/<hash>/memory/session-index/*/summary.md finds a recent file, read summary.md. Include the Activity Timeline, Files Modified, and Key Decisions sections in the post-compaction narration. If no session index exists, skip silently; this step is purely additive. If summary.md lacks detail for a specific event type (e.g., errors, hypothesis changes, file modifications), use /recall to query events.jsonl with filters for targeted recovery.

The debugging skill writes session state to disk at every phase transition, not just on failure. This ensures compaction recovery works regardless of when it occurs.
Scratch directory: /tmp/crucible-debug-<session-id>/ where <session-id> is a timestamp generated at the start of the debugging session.
Write at each phase transition:
- phase-state.md: current phase, cycle count, current hypothesis (if formed)
- hypothesis-log.md: running hypothesis log (updated at Phase 3, after Phase 4 results)
- synthesis-report.md: latest synthesis report (written after Synthesis completes)
- implementation-details.md: cumulative record of implementation attempts -- what was tried, which files changed, regressions encountered, why it failed (appended after each Phase 4)
- where-else-state.md: Phase 4.5 state -- pre-Phase-4.5 SHA, generalized pattern, siblings found/fixed/remaining (written during Phase 4.5, read during compaction recovery)

At each phase transition, in addition to writing session state files, emit a Compression State Block into the conversation. The block captures the reasoning layer (goal, decisions, constraints, next steps) that the session state files do not.
Emit a Compression State Block at:
Context hygiene: After synthesis completes, raw Phase 1 investigation reports are superseded by the synthesis report. The orchestrator should rely on the synthesis report going forward, not the raw reports. After Phase 4 completes (success or failure), the Phase 2 pattern analysis report is superseded by the implementation results. This keeps the orchestrator lean across long sessions.
Compaction recovery:
0. Check for handoff manifests (handoff-*-to-*.md) in the scratch directory. If the most recent manifest exists, use its Inputs, Decisions, and Constraints to reconstruct state for the current phase — this supersedes session state files for phase-boundary recovery. If no manifest exists, continue with standard recovery.
1. Read phase-state.md to determine current phase and cycle.
2. Read hypothesis-log.md for hypothesis history.
3. Read synthesis-report.md for latest investigation findings.
4. Read implementation-details.md for prior fix attempts.
5. Read where-else-state.md (if it exists) for Phase 4.5 progress -- which siblings have been fixed, which remain.

Cleanup: Delete scratch directory and .pipeline-active marker after debugging completes (Phase 5 passes clean or escalation to user).
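The recovery order (handoff manifest first, session state files otherwise) might look like the following sketch. The function name `recover_state` and the returned dictionary shape are hypothetical, and picking the "most recent" manifest by file modification time is an assumption; the skill may determine recency differently.

```python
from pathlib import Path

# Session state files, in the order the recovery steps read them.
STATE_FILES = [
    "phase-state.md",
    "hypothesis-log.md",
    "synthesis-report.md",
    "implementation-details.md",
    "where-else-state.md",
]


def recover_state(scratch: Path) -> dict:
    """Prefer the newest handoff manifest; fall back to session state files."""
    manifests = sorted(
        scratch.glob("handoff-*-to-*.md"),
        key=lambda p: p.stat().st_mtime,  # assumption: recency == mtime
    )
    if manifests:
        latest = manifests[-1]
        return {
            "source": "manifest",
            "files": {latest.name: latest.read_text(encoding="utf-8")},
        }
    files = {}
    for name in STATE_FILES:
        path = scratch / name
        if path.exists():  # where-else-state.md only exists after Phase 4.5
            files[name] = path.read_text(encoding="utf-8")
    return {"source": "session-state", "files": files}
```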
At major phase boundaries (0→1, Synthesis→3, 3.5→4, 4.5→5), write a handoff manifest to the scratch directory instead of emitting a Compression State Block. The manifest defines exactly what the next phase needs — an allowlist. Everything not on the manifest is shed.
Format:
# Phase Handoff: N → M
**Timestamp:** ISO-8601
**Goal:** [original bug report, verbatim]
## Inputs for Phase M
- **[Input name]:** [disk path or inline value]
## Decisions Carried Forward
- [DEC-N] [decision]: [reasoning, one line]
## Active Constraints
- [constraint affecting remaining work]
## Shed Receipt
- [what was shed] → [where it lives on disk]
Rules:
- Update ## Compression State in pipeline-status.md with manifest contents.

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't dispatched Phase 1 investigation and received findings back, you cannot propose fixes. If you haven't received a synthesis report, you cannot form a hypothesis. If you haven't formed a hypothesis, you cannot dispatch implementation.
Use for ANY technical issue:
Use this ESPECIALLY when:
Don't skip when:
All investigation and implementation is delegated to subagents via the Agent tool. The orchestrator handles hypothesis formation, dispatch decisions, and escalation -- nothing else.
| Phase | Agent | Model | Rationale |
|-------|-------|-------|-----------|
| Phase 1 | Error Analysis | Opus | Deep code reading and call-chain tracing |
| Phase 1 | Change Analysis | Opus | Cross-file diff analysis |
| Phase 1 | Evidence Gathering | Opus | Multi-component data flow tracing |
| Phase 1 | Reproduction | Opus | Complex reproduction requires reasoning |
| Phase 1 | Deep Dive (any) | Opus | Specialized investigation |
| Synthesis | Consolidation | Opus | Cross-referencing, contradiction detection, and causal reasoning -- not just summarization |
| Phase 2 | Pattern Analysis | Opus | Exhaustive comparison requires depth |
| Phase 4 | Implementation | Opus | TDD + root cause fix |
| Phase 4.5 | "Where Else?" scan | Opus | Cross-codebase pattern matching and sibling fixing |
| Phase 5 | Red-team | Opus | Adversarial analysis |
| Phase 5 | Code review | Opus or Sonnet | Lead decides by fix complexity |
| Phase 5 | Test gap writer | Opus | Test authoring requires reasoning |
Bug reported / test failure / unexpected behavior
|
v
Orchestrator: Parse initial context (error message, failing test, user description)
|
v
Phase 0: Load codebase context (crucible:cartographer-skill)
|
v
Phase 1: Dispatch 3-6 parallel investigation subagents
| +-- Error Analysis agent (always)
| +-- Change Analysis agent (always)
| +-- Evidence Gathering agent (conditional -- multi-component systems)
| +-- Reproduction agent (conditional -- intermittent/unclear bugs)
| +-- Deep Dive agents (conditional -- 1-2 focused on specific subsystems)
|
v
Synthesis agent: Consolidate all Phase 1 findings -> concise root-cause analysis
|
v
Phase 2: Pattern Analysis agent (skipped if synthesis identified obvious root cause)
|
v
Phase 3: Orchestrator forms hypothesis (no subagent -- lightweight decision-making)
|
v
Phase 3.5: Hypothesis Red-Team (crucible:quality-gate on hypothesis)
| -> Survives? Proceed to Phase 4.
| -> Torn apart? Reform hypothesis or loop back to Phase 1.
|
v
Phase 4: Implementation agent (TDD: failing test, fix, verify)
|
v
Orchestrator: Verify fix -> Success? Phase 4.5. Failed? Cleanup, log, loop back.
-> 3 failures? Escalate to user. If checkpoints exist: "Checkpoints available from prior fix cycles. Restore to a known-good state before manual investigation?"
|
v
Phase 4.5: "Where Else?" scan — find and fix sibling locations
|
v
Phase 5: Quality-gate the fix (crucible:quality-gate) + Code review (crucible:temper)
|
v
Test Gap Writer (if reviews flagged missing coverage)
|
v
Done.
Before any dispatch work, check for a crashed prior debugging session:
1. Check for <scratch>/.pipeline-active (where <scratch> is ~/.claude/projects/<hash>/memory/).
2. No marker: Write a fresh marker (pipeline_id set to current session ID, skill set to "debugging", phase set to "0", start_time set to current ISO-8601 timestamp, scratch_dir and dispatch_dir paths, branch from git branch --show-current, baseline_sha from git rev-parse HEAD). Proceed to Phase 0.
3. Marker with matching pipeline_id: Compaction recovery (existing behavior). Do not re-write the marker.
4. Marker with a different pipeline_id: Previous debugging session crashed. Check marker's branch against current branch -- if mismatched, warn the user which branch the crashed session was on. Present to user:
"Previous debugging session on branch [marker.branch] crashed. Start fresh? [yes]" Delete the stale marker. Write a fresh marker. Proceed to Phase 0. (Full replay orchestration for debugging is deferred -- detection and cleanup only for now.)
Marker cleanup: Delete .pipeline-active after debugging completes (Phase 5 passes clean or escalation to user), alongside the existing scratch directory cleanup.
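The marker check reduces to a three-way classification. The sketch below assumes the marker is stored as JSON, which the text does not state, and assumes that a missing marker means a fresh session; `classify_marker` and `branch_warning` are hypothetical names introduced for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def classify_marker(marker_path: Path, session_id: str) -> str:
    """Return 'fresh', 'compaction', or 'crashed' for the .pipeline-active marker."""
    if not marker_path.exists():
        return "fresh"  # no prior session: write a new marker
    marker = json.loads(marker_path.read_text(encoding="utf-8"))  # assumption: JSON
    if marker.get("pipeline_id") == session_id:
        return "compaction"  # same session: do not re-write the marker
    return "crashed"  # different session: ask the user, then replace the marker


def branch_warning(marker: dict, current_branch: str) -> Optional[str]:
    """Return a warning when the crashed session ran on another branch."""
    if marker.get("branch") and marker["branch"] != current_branch:
        return f"Crashed session was on branch {marker['branch']}, not {current_branch}."
    return None
```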
Before any investigation dispatch, use crucible:cartographer-skill (load mode) to pull module context for the area being investigated. If module files exist, include them in every investigator's dispatch file so agents start with structural knowledge instead of wasting turns rediscovering the codebase.
Defect signature loading (for investigators):
- Load defect-signatures/*.md (excluding *.non-matches.md) from the cartographer storage directory
- Read each signature's Modules field and match against the investigation area's modules:
  - Path: field
  - Path: value
  - Modules directory prefixes
- [DEFECT_SIGNATURES] section of investigator-prompt.md:

Last loaded update: Loading is pure-read. After all investigator dispatches complete, batch-update the Last loaded field to today on all signatures that were loaded.

Grudge pre-flight (regression-oracle, #271): Also query the Book of Grudges for the files under investigation and include any matches in the investigators' dispatch files -- a past regression on these files is a prime hypothesis. Resolve the helper by absolute path from the plugin root -- plugin_root="$(realpath "<this-skill-base-dir>/../..")" -- and run python3 "$plugin_root/scripts/grudge_query.py" <files under investigation…>. Best-effort: if unresolved, emit a one-line stderr warning and continue -- never block the investigation. See skills/grudge/SKILL.md.
If cartographer data doesn't exist for the relevant area, dispatch a quick Explore agent (subagent_type="Explore", model: haiku) to map the relevant directories and note key files. Include its findings in investigator prompts.
Check the project's CLAUDE.md for a ## Debugging Domains table:
| Signal | Domain | Skills | Context |
|--------|--------|--------|---------|
| file paths contain `/UI/`, `USS`, `VisualElement` | ui | mockup-builder, mock-to-unity, ui-verify | docs/mockups/ |
| error mentions `GridWorld`, `Tile`, `hex` | grid | - | grid system architecture |
Signal types: File path patterns (regex against paths in error/stack trace), error message patterns (regex against error text), user description keywords. Evaluate signals in order; load context for all matching domains.
When domain is detected:
When no domain table exists: Proceed normally. Domain detection is opt-in.
When a referenced skill doesn't exist: Log a warning and proceed without domain enrichment. Never fail on missing config.
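Signal evaluation can be sketched as a regex scan over the combined error text, paths, and user description. The `DOMAINS` structure below is a hypothetical in-memory form of the two example table rows, and `detect_domains` is an illustrative name; how the table is actually parsed out of CLAUDE.md is not shown here.

```python
import re

# Hypothetical parsed form of the "## Debugging Domains" example rows.
DOMAINS = [
    {
        "signals": [r"/UI/", r"USS", r"VisualElement"],
        "domain": "ui",
        "skills": ["mockup-builder", "mock-to-unity", "ui-verify"],
    },
    {"signals": [r"GridWorld", r"Tile", r"hex"], "domain": "grid", "skills": []},
]


def detect_domains(text, table=DOMAINS):
    """Evaluate rows in order and return every domain with a matching signal."""
    matched = []
    for row in table:
        if any(re.search(signal, text) for signal in row["signals"]):
            matched.append(row["domain"])
    return matched
```

An empty table (or no table at all) yields an empty list, which matches the opt-in behavior: with no match, investigation proceeds without domain enrichment.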
Before dispatching investigation agents:
- Write handoff-0-to-1.md with:
- Update ## Compression State in pipeline-status.md.

Prompt template: ./investigator-prompt.md
Dispatch 3-6 investigation subagents in parallel using the Agent tool in a single message. All subagents use subagent_type="general-purpose", model: opus. Pass all known context (error messages, stack traces, file paths, user description, and cartographer module context from Phase 0) verbatim to each agent -- do not make them search for context you already have.
Bias toward MORE agents, not fewer. Each investigator is cheap. Missing a root cause is expensive. When in doubt about whether to dispatch an additional agent, dispatch it.
Always dispatch:
Error Analysis Agent -- Read error messages, stack traces, and logs. Identify the exact failure point, error codes, and what the error is telling us. Trace the call chain backward to the originating bad value.
Change Analysis Agent -- Check recent changes via git diff, recent commits, new dependencies, config changes, and environmental differences. Identify what changed that could cause this.
Conditionally dispatch (lean toward dispatching):
Evidence Gathering Agent -- For multi-component systems (CI pipelines, API chains, layered architectures). Add diagnostic instrumentation at component boundaries. Log what enters and exits each component. Run once, report where the data flow breaks.
Reproduction Agent -- For intermittent, timing-dependent, or unclear bugs. Attempt to reproduce consistently. Document exact steps, frequency, and conditions. If not reproducible, gather more data rather than guessing.
Deep Dive Agent(s) -- For bugs touching multiple subsystems, dispatch 1-2 additional agents each focused on a specific subsystem or code path. Give each a narrow scope: "Investigate how [specific subsystem] handles [specific scenario]." These agents read deeply into a single area rather than scanning broadly.
Dependency/Environment Agent -- For bugs that might be caused by version mismatches, missing registrations, configuration drift, or framework behavior changes. Check DI registrations, package versions, framework release notes, and environment state.
| Bug Characteristics | Agents to Dispatch |
|--------------------|--------------------|
| Test failure with clear stack trace | Error + Change + Deep Dive (on the failing subsystem) |
| Vague "something broke" across multiple systems | All six agent types |
| Intermittent / timing-dependent issue | Error + Change + Reproduction + Deep Dive |
| Multi-layer system failure (CI, API chain) | Error + Change + Evidence Gathering + Deep Dive per layer |
| Performance regression | Error + Change + Evidence Gathering + Deep Dive (hot path) |
| "It worked yesterday" | Error + Change + Dependency/Environment |
| Framework/library update broke things | Error + Change + Dependency/Environment + Deep Dive |
Every investigation subagent prompt MUST include the context self-monitoring block from ./investigator-prompt.md. Investigators reading large codebases are prime candidates for context exhaustion. If an agent hits 50%+ utilization with significant investigation remaining, it must report partial findings immediately rather than silently degrading.
Prompt template: ./synthesis-prompt.md
After all Phase 1 agents report back, dispatch a single Synthesis agent (model: opus) that receives all Phase 1 reports verbatim.
Trust-but-verify: The synthesis agent does NOT take investigator claims at face value. It cross-references findings between agents, flags contradictions, and identifies claims that lack concrete evidence (file paths, line numbers, stack traces). Speculative findings are downgraded. Concrete artifacts outrank plausible theories.
The Synthesis agent produces:
Skip-ahead rule: If all Phase 1 agents converge on the same root cause with concrete evidence (not just speculation) and the Synthesis agent confirms it as obvious, the orchestrator may skip Phase 2 and proceed directly to Phase 3 (hypothesis formation).
Prompt template: ./pattern-analyst-prompt.md
Dispatch a single Pattern Analysis agent that receives the synthesis report.
The Pattern Analysis agent:
When to skip: The orchestrator skips Phase 2 when the synthesis report identifies an obvious root cause with high confidence (all investigation agents agree, clear evidence chain).
Before the orchestrator forms a hypothesis (whether Phase 2 ran or was skipped):
- Write handoff-synthesis-to-3.md with:
- Update ## Compression State in pipeline-status.md.

This phase stays local to the orchestrator. No subagent dispatch.
The orchestrator:
Hypothesis discipline:
Maintain a running log across cycles:
## Cycle 1
- Hypothesis: "[specific hypothesis]"
- Based on: [which reports informed this]
- Result: [filled in after Phase 4 completes]
## Cycle 2
- Hypothesis: "[specific hypothesis]"
- Based on: [which reports informed this]
- Result: [filled in after Phase 4 completes]
Before dispatching the Phase 4 implementer, invoke crucible:quality-gate on the hypothesis with artifact type "hypothesis".
The quality gate challenges:
dead_ends)

If hypothesis survives: Proceed to Phase 4.
If hypothesis is torn apart: Reform the hypothesis or dispatch additional investigation (back to Phase 1) without wasting a full TDD cycle.
Before dispatching the implementation agent:
- Write handoff-3.5-to-4.md with:
- Update ## Compression State in pipeline-status.md.

RECOMMENDED SUB-SKILL: Use crucible:checkpoint -- create checkpoint with reason "pre-debug-fix-cycle-N" (where N is the hypothesis cycle count) before dispatching the implementation agent. If the fix attempt fails or introduces regressions, this checkpoint allows clean rollback without relying on WIP commit revert mechanics.
Prompt template: ./implementer-prompt.md
Dispatch a single Implementation agent that receives:
implementation-details.md in the scratch directory)

The Implementation agent follows strict TDD:
Implementation discipline:
crucible:test-driven-development for proper TDD workflow.

After Phase 4 completes, if the implementer modified any files, create a WIP commit regardless of outcome:
git commit -m "fix(wip): [hypothesis summary]" # on success
git commit -m "fix(wip-failed): [hypothesis summary]" # on failure or regressions
This gives every outcome path a clean revert target (git revert <sha>), gives Phase 5 code review a real diff, and isolates Phase 5 test modifications from the core fix. If the full pipeline succeeds, the final commit message is amended to drop the (wip) prefix. If Phase 5 requires changes (test audit updates, gap test additions), those are committed as separate follow-up commits.
On loop-back (failed fix or user-requested revert), git revert <wip-sha> cleanly undoes all Phase 4 changes including new files.
If Phase 4.5 ran (sibling commits exist): use git revert <pre-4.5-sha>^..HEAD instead of git revert <wip-sha>. A range A..B excludes A, and the pre-4.5 SHA is the WIP commit itself, so the trailing ^ is needed for the range to cover the original WIP commit as well as every sibling commit. See Phase 4.5 below.
Phase 4.5 sibling commits: Each sibling fix uses the prefix fix(sibling): with a descriptive message. Example: fix(sibling): add icon initialization to StashScreen.OnEnable
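Choosing the revert target can be reduced to one small helper. This is a sketch with a hypothetical function name, `revert_command`. It assumes the recorded pre-4.5 SHA is the WIP commit itself, so it appends `^` to start the range at that commit's parent, since a `A..B` range excludes `A`. The `--no-edit` flag is added only to keep the command non-interactive.

```python
from typing import List, Optional


def revert_command(wip_sha: str, pre_45_sha: Optional[str] = None) -> List[str]:
    """Build the git revert invocation for a loop-back."""
    if pre_45_sha:
        # Phase 4.5 ran: revert the WIP commit and every sibling commit after it.
        # "<sha>^..HEAD" starts at the parent so the WIP commit is included.
        return ["git", "revert", "--no-edit", f"{pre_45_sha}^..HEAD"]
    # No sibling commits: reverting the single WIP commit is enough.
    return ["git", "revert", "--no-edit", wip_sha]
```

Returning an argument list rather than a shell string lets the caller pass it to a process runner without quoting concerns.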
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-where-else" before dispatching the scan agent. This replaces the need to manually track the pre-Phase-4.5 SHA for revert mechanics — the checkpoint captures the full working directory state.
After Phase 4 succeeds and the WIP commit is created, dispatch the "Where Else?" scan agent to find and fix analogous locations in the codebase that have the same bug pattern. Phase 4.5 does NOT run on loop-back paths (fix failed, regressions found).
Prompt template: ./where-else-prompt.md
Pre-Phase-4.5 SHA: Immediately after the Phase 4 WIP commit succeeds, record its SHA. This is the last commit before any sibling work begins. Store it in where-else-state.md in the scratch directory.
The scan agent receives three sources of information:
git diff <pre-fix-sha>..HEAD. The structural pattern of what changed: what was missing, what was added, and what makes a location "analogous."

When dispatching the Phase 4.5 scan agent, load matching defect signatures from cartographer as additional context:
- Last loaded: never as oldest. Sort by Last loaded descending. Tiebreak by Date descending.
- Path: field; match if any target file path starts with the module's Path: value. Fall back to directory prefix matching if no cartographer modules exist.
- fix(sibling): <description>
- where-else-state.md in the scratch directory

## Where Else? Scan Report
### Generalized Pattern
[2-3 sentence pattern description]
### Candidates Evaluated: N
### Siblings Fixed: N
- [file:path] — [commit SHA] — Justification: [why this matches]
- ...
### Siblings Skipped: N
- [file:path] — Reason: [why this doesn't match]
- ...
### Siblings Reverted: N
- [file:path] — Test failure: [summary of what failed]
- ...
Phase 4.5 maintains state in <scratch-dir>/where-else-state.md to survive session compaction. The file is updated after each sibling fix. On compaction recovery, the agent reads this file to:
After the Phase 4.5 scan agent reports back, persist the scan results as a cartographer defect signature. This is orchestrator-managed — the recorder only writes files.
Skip condition: Do not write a signature when Phase 4.5 reports "No analogous locations found" (0 candidates evaluated). A pattern with no siblings and no non-matches has no evaluation ledger worth persisting.
Step 1: Dedup check (orchestrator)
- Glob ~/.claude/projects/<hash>/memory/cartographer/defect-signatures/*.md (excluding *.non-matches.md)
- Read each file's ## Generalized Pattern section only (each is 2-3 sentences; 20 patterns is ~1000-1500 tokens)
- On a match, set update_path to the existing file path (the recorder will merge into it rather than creating new)

Step 2: Pre-write pruning (orchestrator)
When count of existing signatures would exceed 20 after writing:
- Age-protected (skip): Last loaded: never AND Date is less than 30 days old
- Order the rest by Last loaded date ascending; Last loaded: never sorts oldest
- Tiebreak by Date field

Step 3: Dispatch recorder (orchestrator)
Dispatch a Sonnet cartographer recorder agent using crucible:cartographer-skill recorder-prompt.md with the "Record defect signature" directive. Provide:
- update_path if dedup found a match (from Step 1)

Step 4: Post-recorder validation (orchestrator)
After the recorder returns, validate:
- Modules field contains valid cartographer module names or directory prefixes

If validation fails, log the failure -- the signature is not surfaced to consumers.

Step 5: Rename on merge (orchestrator)
If update_path was provided (merge case): rename the file to use today's date prefix while keeping the original slug. YYYY-MM-DD-<slug>.md becomes <today>-<slug>.md. If a companion non-match file exists, rename it to match. This ensures merged signatures do not lose age protection.
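The rename rule is a pure filename transformation, sketched below. The function name `renamed_signature` is hypothetical. Because the pattern treats everything between the date prefix and `.md` as the slug, the same helper also handles a companion file ending in `.non-matches.md`.

```python
import re
from datetime import date
from pathlib import Path

# YYYY-MM-DD-<slug>.md, where <slug> may itself end in ".non-matches".
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-(?P<slug>.+)\.md$")


def renamed_signature(path: Path, today: date) -> Path:
    """Swap the date prefix for today's date while keeping the original slug."""
    match = DATE_PREFIX.match(path.name)
    if not match:
        raise ValueError(f"not a dated signature file: {path.name}")
    return path.with_name(f"{today.isoformat()}-{match.group('slug')}.md")
```

The helper only computes the new path; the caller would still perform the actual rename (for example with `Path.rename`) for both the signature and its companion file.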
Ordering: Dispatch the recorder and wait for completion, then perform the Last loaded batch update (see below). This prevents write races between the recorder and the batch update targeting the same file.
Last loaded batch update: After all subagent dispatches for the current phase complete (including the recorder), batch-update the Last loaded field on all defect signatures that were loaded during Phase 0 or Phase 4.5. The recorder sets Last loaded to today on update_path writes, so skip those files during the batch update.
Over-count recovery: If count exceeds 20 after a failed prune (e.g., all signatures are age-protected), the next invocation's pre-recorder pruning pass cleans up before writing.
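One reading of the pruning rules can be sketched as follows. The `Signature` tuple and `select_for_pruning` are hypothetical names. The sketch assumes that a signature never loaded and younger than 30 days is age-protected, that the remaining signatures are evicted oldest `Last loaded` first (with `never` as oldest), and that ties break on the older `Date`. It returns the names to prune and leaves deletion to the caller.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Optional

MAX_SIGNATURES = 20


class Signature(NamedTuple):
    name: str
    created: date                # the Date field
    last_loaded: Optional[date]  # None stands for "Last loaded: never"


def select_for_pruning(sigs: List[Signature], today: date, incoming: int = 1) -> List[str]:
    """Pick signatures to evict so the count stays within the cap after writing."""
    excess = len(sigs) + incoming - MAX_SIGNATURES
    if excess <= 0:
        return []

    def protected(sig: Signature) -> bool:
        # Age protection: never loaded AND less than 30 days old.
        return sig.last_loaded is None and (today - sig.created) < timedelta(days=30)

    candidates = [sig for sig in sigs if not protected(sig)]
    # Oldest Last loaded first ("never" sorts oldest), then oldest Date.
    candidates.sort(key=lambda sig: (sig.last_loaded or date.min, sig.created))
    return [sig.name for sig in candidates[:excess]]
```

When every signature is age-protected the function returns an empty list, which corresponds to the over-count case that the next invocation's pruning pass is expected to clean up. For a merge into an existing file, `incoming=0` would apply.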
Before invoking the quality gate on the fix:
- Write handoff-4.5-to-5.md with:
  - git diff <pre-fix-sha>..HEAD), conventions path, test file paths, Where Else report path (or "no siblings found"), defect signature path (if written)
- Update ## Compression State in pipeline-status.md.

RECOMMENDED SUB-SKILL: Use crucible:checkpoint -- create checkpoint with reason "pre-debug-gate" before invoking the quality gate. If gate fix rounds degrade the fix, this is the rollback target.
After Phase 4.5 completes (or Phase 4 succeeds if no Phase 4.5 ran), the orchestrator runs quality gates before declaring done:
Step 1: Quality-gate the fix — Invoke crucible:quality-gate with artifact type "code" against the changed code. Quality-gate dispatches fresh red-team reviewers to adversarially review the fix for:
Quality-gate handles iteration tracking, stagnation detection, compaction recovery, and user checkpoints. Do NOT invoke crucible:red-team directly — always go through quality-gate for iteration management.
Step 2: Code review — After red-teaming passes clean, invoke crucible:temper against the full diff (from before debugging started to HEAD). The code reviewer checks implementation quality, test coverage, and adherence to project conventions.
If code review finds Critical or Important issues, fix them and re-review per the standard code review loop.
Step 2.5: Test suite audit — Invoke crucible:test-coverage (if available) against the changed code and affected test files. This audits whether existing tests need updating, removal, or modification after the fix. Three categories:
If crucible:test-coverage is not available, skip this step. The test gap writer (Step 3) handles missing coverage but NOT stale/misleading existing tests — this step fills that gap.
The test-coverage skill handles its own fix dispatch and revert-on-failure logic internally. It returns a structured report with actions taken.
Step 3: Test gap writer — If the code reviewer or red-teamer identified missing test coverage for the fix, dispatch a Test Gap Writer agent (Opus) using ./test-gap-writer-prompt.md. Input: reviewer gap findings + fix diff + test-coverage audit report (if available from Step 2.5). The agent writes tests only for gaps specifically flagged in the review — no scope creep. Before writing a new test for a flagged gap, verify no existing test already covers this path (it may have been updated by the test-coverage audit). Tests should PASS immediately since the behavior already exists from the fix. The agent reports per-test PASS/FAIL results. Skipped when reviews report zero coverage gaps.
If all tests PASS: Debugging workflow is complete.
Record the grudge (regression-oracle, #271). Once the root cause is confirmed and the fix is verified, record a grudge so this bug can never silently re-ship. Best-effort (a failed record logs to stderr and never fails the workflow): resolve the helper by absolute path from the plugin root — plugin_root="$(realpath "<this-skill-base-dir>/../..")" — and run python3 "$plugin_root/scripts/grudge_append.py" --symptom "<observable failure>" --root-cause "<confirmed cause>" --files "<comma-separated files_touched>" --signature "<optional regex/snippet fingerprint>" --commit "$(git rev-parse HEAD)" --repro "<minimal repro>" --why "<why it kept happening>". See skills/grudge/SKILL.md.
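The best-effort contract (warn on stderr, never fail the workflow) could be wrapped like this. The flag names come from the command above; the wrapper name `record_grudge`, the 30-second timeout, and the use of `sys.executable` in place of `python3` are assumptions made for the sketch.

```python
import subprocess
import sys
from pathlib import Path


def record_grudge(plugin_root: Path, fields: dict) -> bool:
    """Run grudge_append.py best-effort; return False instead of raising."""
    script = plugin_root / "scripts" / "grudge_append.py"
    cmd = [sys.executable, str(script)]
    for flag, value in fields.items():  # e.g. {"symptom": "...", "root-cause": "..."}
        cmd += [f"--{flag}", value]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"warning: grudge record failed: {exc}", file=sys.stderr)
        return False
    if result.returncode != 0:
        print(f"warning: grudge record failed: {result.stderr.strip()}", file=sys.stderr)
        return False
    return True
```

The boolean return lets the orchestrator note the outcome in its narration without branching the workflow on it.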
If some tests FAIL (gaps reveal incomplete fix coverage):
fix: address test gap failures for debugging fix), debugging workflow is complete

If Phase 5 quality-gate escalates (stagnation or round limit): Present the quality-gate findings to the user alongside the fix. The user decides:
<pre-4.5-sha>^..HEAD if Phase 4.5 ran) and loop back to Phase 1 with the quality-gate findings as new investigation context

This is user-gated, not automatic. The orchestrator does not decide whether to loop back from Phase 5 on its own.
Only after all gates pass clean (and any test gaps are filled or escalated) is the debugging workflow complete.
Throughout the debugging session, the orchestrator appends timestamped entries to /tmp/crucible-metrics-<session-id>.log.
Dispatch measurement protocol: On every subagent dispatch, the orchestrator follows the enriched manifest protocol from shared/dispatch-convention.md:
- Record input_chars and model_tier in the manifest entry.
- Record output_chars and tool_calls (if available) in the manifest completion entry.

At completion, read the metrics log and manifest, then compute and report:
-- Debugging Complete ---------------------------------------
Subagents dispatched: 12 (8 Opus, 4 Sonnet)
Active work time: 1h 15m
Wall clock time: 3h 42m
Hypothesis cycles: 3
Quality gate rounds: 2 (hypothesis: 1, fix: 1)
Est. input tokens: ~15,200 (60,800 chars)
Est. output tokens: ~9,800 (39,200 chars)
Token estimate note: Based on dispatch file sizes (chars/4). Actual consumption may vary +/-30%.
-------------------------------------------------------------
Additional debugging metric: hypothesis cycles (number of hypothesis → investigate → implement cycles before resolution).
Efficiency summary computation: Read manifest.jsonl from the dispatch directory. Sum input_chars and output_chars across all completed entries (skip nulls). Divide each by 4 for token estimates. Count dispatches grouped by model_tier. Include these in the debugging completion report alongside existing metrics.
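The computation above can be sketched over the lines of manifest.jsonl. The function name `summarize_manifest` and the result keys are hypothetical. Since the field that marks an entry as completed is not named here, the sketch simply skips null values rather than checking a completion flag.

```python
import json
from collections import Counter


def summarize_manifest(lines):
    """Sum character counts, estimate tokens (chars / 4), and count dispatches by tier."""
    totals = {"input_chars": 0, "output_chars": 0}
    tiers = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        tiers[entry.get("model_tier", "unknown")] += 1
        for key in totals:
            value = entry.get(key)
            if value is not None:  # skip nulls (e.g. dispatch not yet completed)
                totals[key] += value
    return {
        "est_input_tokens": totals["input_chars"] // 4,
        "est_output_tokens": totals["output_chars"] // 4,
        "dispatches_by_tier": dict(tiers),
    }
```

Usage would be along the lines of `summarize_manifest(open(dispatch_dir / "manifest.jsonl"))`, where `dispatch_dir` is whatever path the session recorded for its dispatch directory.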
Maintain a decision journal at /tmp/crucible-decisions-<session-id>.log:
[timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>
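A journal line in that format can be produced with a small formatter. This is a sketch only: `format_decision` is a hypothetical name, and the ISO-8601 UTC timestamp is an assumption, since the journal's timestamp format is not specified.

```python
from datetime import datetime, timezone


def format_decision(kind, choice, reason, alternatives, now=None):
    """Format one decision journal entry as a single line."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.isoformat(timespec="seconds")  # assumption: ISO-8601 in UTC
    return (
        f"[{timestamp}] DECISION: {kind} | choice={choice} "
        f"| reason={reason} | alternatives={alternatives}"
    )
```

The caller would append the returned line, plus a newline, to the decisions log for the session.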
Decision types:
- investigator-count -- why N investigators dispatched
- skip-phase-2 -- why Phase 2 was skipped (or not)
- gate-round -- hypothesis red-team results per round
- escalation -- why orchestrator escalated
- hypothesis-reform -- why hypothesis was reformed after red-team

After the Implementation agent reports back, the orchestrator evaluates four possible outcomes:
Fix works, no regressions -- Log the result in the hypothesis log. Proceed to Phase 4.5 ("Where Else?" blast radius scan). After Phase 4.5 completes, proceed to Phase 5. After Phase 5 passes clean:
v=1, ts=now, skill="debugging", outcome from fix verification, duration_m from session timing, branch from git, files_touched from git diff --name-only, metrics={hypotheses count, root_cause_category from fix, where_else_hits count}
~/.claude/projects/<hash>/memory/chronicle/signals.jsonl

Test passes immediately (no fix applied) -- The implementer's reproduction test passed before any fix was written. Two possibilities:
git revert <wip-sha>) to remove the non-reproducing test. Loop back to Phase 3 to reform the hypothesis, or Phase 1 if the root cause itself is in question.

Fix works but introduces regressions -- Start a new investigation cycle targeting the regressions. The original fix stays; the regressions are a new bug. Critical: Pass the original bug context (hypothesis, fix applied, original root cause) to the new investigation agents as background context, with the constraint: "The original fix must not be reverted. Investigate why the fix caused regressions and propose an additive solution."
Fix does not resolve the issue -- Before looping back:
git revert <wip-sha> (see Commit Strategy above). This cleanly undoes all Phase 4 changes including any new files created during refactoring. If Phase 4.5 ran (sibling commits exist): use git revert <pre-4.5-sha>..HEAD instead of git revert <wip-sha> to revert all sibling commits plus the original WIP commit.
git checkout <wip-sha> -- <test-file-path>) and commit it separately (test: preserve reproduction test from cycle N).
git status and report any remaining modifications or untracked files. If any remain, clean them up before proceeding.

Context Preservation: Session state is written to disk at every phase transition (see Session State and Compaction Recovery above). On failed cycles, additionally append implementation details (fix attempted, files changed, regressions, why it failed) to implementation-details.md in the scratch directory. This gives the next Phase 4 implementer actionable context about what was tried at the code level, not just the hypothesis level.
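A minimal sketch of the failed-cycle append described above. The function name record_failed_cycle and the Markdown layout of each entry are hypothetical; only the file name implementation-details.md and the four recorded fields come from the text.

```python
from pathlib import Path


def record_failed_cycle(scratch_dir, cycle, fix_attempted, files_changed,
                        regressions, why_failed):
    """Append one failed cycle's code-level details to implementation-details.md."""
    # ASSUMPTION: one Markdown section per cycle; the source does not fix a layout.
    lines = [
        f"## Cycle {cycle}",
        f"- Fix attempted: {fix_attempted}",
        f"- Files changed: {', '.join(files_changed) or 'none'}",
        f"- Regressions: {', '.join(regressions) or 'none'}",
        f"- Why it failed: {why_failed}",
        "",
    ]
    path = Path(scratch_dir) / "implementation-details.md"
    # Append rather than overwrite so earlier cycles stay visible.
    with path.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

Appending keeps every prior attempt in the file, which is what lets the next implementer see the full history of code-level attempts.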
Stagnation ownership: The debugging skill's stagnation detector owns cycle-to-cycle decisions (loop back vs escalate). Quality-gate's stagnation detector owns within-gate decisions (round-to-round within a single Phase 3.5 or Phase 5 invocation). When quality-gate escalates within a gate (e.g., hypothesis keeps getting torn apart), the debugging orchestrator counts that as a failed cycle and updates the hypothesis log accordingly before deciding whether to loop back. Quality-gate history from prior invocations does not carry over — each gate invocation starts fresh.
Track a stagnation metric across cycles — the hypothesis specificity score:
| Metric | What to Track |
|--------|--------------|
| Root causes identified | How many distinct root causes were surfaced across all investigators |
| Evidence strength | How many findings had concrete evidence (file:line, stack trace, git blame) vs speculation |
| New information | Did this cycle surface information that was NOT available in prior cycles? |
Stagnation rule: If Cycle N+1 surfaces no new information compared to Cycle N (same root causes, same evidence, same gaps), the orchestrator STOPS and escalates immediately. Do not dispatch Cycle N+2 — the investigation is stuck, not progressing.
Confirm before auto-escalating on judgment-call stagnation. The "no new information" test is unambiguous when both sides are identical (zero new evidence, zero new files touched, identical hypothesis list) — auto-escalate in that case. But when Cycle N+1 surfaced some new content that the orchestrator judges immaterial ("technically a new file but same module", "another instance of the same null check"), surface the side-by-side comparison to the user before escalating:
[debugging] Stagnation candidate — Cycle N vs N+1:
- Cycle N hypotheses: <list>
- Cycle N+1 hypotheses: <list>
- New evidence in N+1: <delta or "none material in my judgment">
Escalate to user, or dispatch Cycle N+2 with a different angle?
The user has context the model lacks about which "new" findings are actually load-bearing for their mental model. Default-on-away: escalate (matches the conservative-on-stuck instinct). The zero-delta auto-escalate path stays silent.
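The three stagnation paths described above (silent auto-escalate on zero delta, user confirmation when the delta is judged immaterial, otherwise continue) can be sketched as below. CycleFindings, stagnation_decision, and the returned labels are hypothetical names. The materiality judgment is passed in as a flag because the text leaves it to the orchestrator; this sketch does not try to compute it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CycleFindings:
    """What one investigation cycle surfaced (hypothetical structure)."""
    evidence: frozenset = field(default_factory=frozenset)
    files_touched: frozenset = field(default_factory=frozenset)
    hypotheses: frozenset = field(default_factory=frozenset)


def stagnation_decision(prev, curr, judged_material):
    """Classify Cycle N+1 (curr) against Cycle N (prev)."""
    new_evidence = curr.evidence - prev.evidence
    new_files = curr.files_touched - prev.files_touched
    same_hypotheses = curr.hypotheses == prev.hypotheses
    # Unambiguous case: nothing new at all, so escalate without asking.
    if not new_evidence and not new_files and same_hypotheses:
        return "auto_escalate"
    # Something is new, but the orchestrator judged it immaterial:
    # show the side-by-side comparison to the user first.
    if not judged_material:
        return "confirm_with_user"
    return "continue"
```

If the user is away when "confirm_with_user" is returned, the default described above is to escalate.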
| Cycle | Action |
|-------|--------|
| 1 | Normal flow -- dispatch 3-6 investigators |
| 2 | Loop back with learnings. Dispatch MORE agents than Cycle 1. Explicitly exclude paths already ruled out. |
| 3 | Final attempt -- investigation agents are instructed to look for something fundamentally different from previous hypotheses. Add Deep Dive agents targeting areas not yet investigated. |
| 4 | No dispatch. Present the full hypothesis log to the user. Flag as likely architectural problem. Discuss fundamentals before attempting more fixes. |
Stagnation overrides cycle count: If stagnation is detected at any cycle (even Cycle 2), escalate immediately rather than waiting for Cycle 4.
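The cycle policy and its stagnation override reduce to a small decision function. The sketch below is illustrative only: next_action and the action labels are hypothetical names, and the labels are shorthand for the per-cycle actions described above.

```python
def next_action(cycle, stagnated):
    """Map a cycle number (1-based) and stagnation flag to the orchestrator's move."""
    if cycle < 1:
        raise ValueError("cycle numbers start at 1")
    # Stagnation overrides the cycle count; Cycle 4 never dispatches.
    if stagnated or cycle >= 4:
        return "escalate_to_user"
    if cycle == 1:
        return "dispatch_normal"
    if cycle == 2:
        return "dispatch_more_excluding_ruled_out"
    return "dispatch_final_different_angle"
```

Checking the stagnation flag first is what makes a stuck Cycle 2 escalate instead of proceeding to Cycle 3.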
Pattern indicating architectural problem (Cycle 4 or stagnation escalation):
This is NOT a failed hypothesis -- this is a wrong architecture. Discuss with your human partner before attempting more fixes.
| Phase | Agent(s) | Key Activities | Success Criteria |
|-------|----------|---------------|------------------|
| 0. Context | Cartographer + optional Explore | Load module context for investigators | Codebase context ready for prompts |
| 1. Investigation | 3-6 parallel subagents (Opus) | Read errors, check changes, gather evidence, deep dive, reproduce | Raw findings collected |
| Synthesis | 1 subagent (Opus) | Consolidate, cross-reference, rank by evidence quality | Concise root-cause analysis |
| 2. Pattern | 1 subagent (Opus, skippable) | Find working examples, compare exhaustively | Differences identified |
| 3. Hypothesis | Orchestrator (no subagent) | Form hypothesis, check log | Specific testable hypothesis |
| 3.5 Red-Team | Quality gate (on hypothesis) | Challenge hypothesis completeness | Hypothesis survives or is reformed |
| 4. Implementation | 1 subagent (Opus) | TDD fix cycle with evidence log | Bug resolved, tests pass, TDD log |
| 4.5. Where Else? | 1 subagent (Opus) + 1 recorder (Sonnet) | Find and fix sibling locations; persist defect signature | Siblings fixed, signature written (if 1+ candidates) |
| 5. Quality Gate | Red-team + code review | Adversarial review, quality check | Both pass clean |
| 5b. Test Audit | Test coverage skill (conditional) | Audit existing tests for staleness after fix | Stale tests updated/removed |
| 5c. Test Gaps | Test gap writer (Opus, conditional) | Write tests for reviewer-flagged gaps | All gap tests pass |
This skill produces hypotheses (Phase 3.5) and fixes (Phase 5).
When used standalone: Debugging is the outermost orchestrator and MUST invoke quality-gate at Phase 3.5 (on hypotheses) and Phase 5 (on fixes). These gates are non-negotiable regardless of fix size — a "one-liner" fix is not exempt.
When used as a sub-skill: The parent orchestrator is responsible for dispatching gates (per the Invocation Convention: "Skills NEVER self-invoke quality-gate"). If you are unsure whether you are standalone or a sub-skill, invoke the gate — double-gating is preferable to no gating.
The only legitimate skip is at Phase 5 when there is no code change (bug was already resolved). Do not extrapolate from this — it applies only to the specific "no code change" scenario, not to "small" or "trivial" changes.
Gate tracking: Before declaring done, verify that Phase 3.5 (hypothesis gate) and Phase 5 (fix gate, unless legitimately skipped for no-code-change) each show round count >= 1 with clean final rounds. If any gate was skipped with explicit user approval, record it as USER_SKIP.
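The gate-tracking check above could look roughly like this. GateRecord and gates_satisfied are hypothetical names, and the record fields are assumptions about how round counts and outcomes might be stored. The rules themselves (round count >= 1, clean final round, the no-code-change skip at Phase 5 only, USER_SKIP) follow the text.

```python
from dataclasses import dataclass


@dataclass
class GateRecord:
    """Tracking state for one quality gate (hypothetical structure)."""
    rounds: int = 0
    final_round_clean: bool = False
    user_skip: bool = False  # recorded as USER_SKIP after explicit user approval


def gates_satisfied(hypothesis_gate, fix_gate, code_changed=True):
    """Return (ok, problems) for the Phase 3.5 and Phase 5 gates."""
    problems = []
    checks = [
        ("Phase 3.5", hypothesis_gate, False),
        # The only legitimate skip: Phase 5 when there was no code change.
        ("Phase 5", fix_gate, not code_changed),
    ]
    for name, gate, legitimately_skipped in checks:
        if legitimately_skipped or gate.user_skip:
            continue
        if gate.rounds < 1:
            problems.append(f"{name}: no gate rounds recorded")
        elif not gate.final_round_clean:
            problems.append(f"{name}: final round not clean")
    return (not problems, problems)
```

Note that the no-code-change exemption is wired to Phase 5 alone, so a skipped hypothesis gate is always reported unless the user approved the skip.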
If you catch yourself thinking:
Orchestrator discipline violations:
Communication violations:
Classic debugging traps (still apply):
Quality gate violations:
Compression State violations:
ALL of these mean: STOP. Return to the correct phase.
If 3+ cycles failed: Escalate to user. Question the architecture. Do not dispatch Cycle 4 agents.
Watch for these redirections:
When you see these: STOP. Return to Phase 1. Dispatch fresh investigation.
| Excuse | Reality |
|--------|---------|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms does not equal understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
| "Let me just peek at the code real quick" | Orchestrators dispatch, they don't investigate. Send a subagent. |
| "I'll dispatch implementation without a hypothesis" | No hypothesis = no direction. The agent will guess. Form the hypothesis first. |
The rationalizations above are self-originated — you talking yourself out of process. The traps below are user-originated — the user (reasonably, in good faith) providing a diagnosis that tempts you to skip investigation. These are harder to resist because accommodating the user feels helpful.
| User says | You think | Reality |
|-----------|-----------|---------|
| "I found the bug, here's the fix" | "Their analysis looks right, I'll verify quickly and apply" | User diagnoses are hypotheses, not conclusions. Verify independently -- plausible ≠ confirmed. |
| "The stack trace points right to it" | "The evidence is clear, skip to fix" | Stack traces show where errors surface, not where they originate. The throw site is often a symptom, not the cause. |
| "This is blocking N engineers" | "Speed matters more than process right now" | A wrong fix under time pressure creates two problems: the original bug plus the bad fix. Systematic debugging is faster than guess-and-check, especially under pressure. |
| "Same bug as last sprint" | "Apply the same fix pattern" | Same symptom ≠ same cause. The prior fix may have been incomplete, or a different root cause produces identical symptoms. |
| "Just add error handling around it" | "That'll prevent the crash at least" | Error handling that silences the symptom is camouflage, not a fix. The root cause persists and will surface elsewhere. |
| "It's obviously X, I just need you to fix it" | "The user has more context, trust their judgment" | Trust their observations (symptoms, timeline, reproduction steps). Verify their conclusions (root cause, fix). These are different things. |
| "We can investigate later, just patch it now" | "Pragmatism -- ship the fix, investigate in a follow-up" | 'Investigate later' means 'never investigate.' The patch becomes the permanent fix. Do it right now. |
If systematic investigation reveals the issue is truly environmental, timing-dependent, or external:
But: 95% of "no root cause" cases are incomplete investigation. Dispatch more agents before concluding this.
Prompt templates (used when dispatching subagents):
- ./investigator-prompt.md -- Phase 1 investigation agent prompt
- ./synthesis-prompt.md -- Synthesis agent prompt
- ./pattern-analyst-prompt.md -- Phase 2 pattern analysis agent prompt
- ./implementer-prompt.md -- Phase 4 implementation agent prompt
- ./where-else-prompt.md -- Phase 4.5 "Where Else?" scan agent prompt
- ./test-gap-writer-prompt.md -- Phase 5 test gap writer prompt (when reviews flag missing coverage)

Supporting techniques (available in this directory):
- root-cause-tracing.md -- Trace bugs backward through call stack to find original trigger
- defense-in-depth.md -- Add validation at multiple layers after finding root cause
- condition-based-waiting.md -- Replace arbitrary timeouts with condition polling

Related skills:
- crucible:test-driven-development -- Implementation agent follows TDD for Phase 4
- crucible:verify -- Verify fix worked before claiming success
- crucible:parallel -- Phase 1 parallel dispatch pattern
- crucible:quality-gate -- Adversarial review in Phase 5 (iteration tracking, stagnation detection, compaction recovery)
- crucible:red-team -- Invoked indirectly via quality-gate (stagnation detection pattern also used in loop-back)
- crucible:test-coverage -- Phase 5 Step 2.5: audit existing tests for staleness, needed updates, or removal after the fix (if available)

Does not dispatch /recon or /assay -- uses specialized investigation agents (Error Analysis, Change Analysis, Evidence Gathering, Reproduction) that are categorically different from structural investigation. Hypothesis evaluation uses quality-gate, not assay. See #147 for rationale.

Required skills:
- crucible:cartographer-skill -- Phase 0: load module context for investigators and defect signatures. Phase 4 completion: record discoveries. Phase 4.5 completion: persist defect signature via recorder dispatch.

Recommended skills:
- crucible:forge -- Retrospective after fix verified (captures debugging lessons)
- crucible:checkpoint -- Shadow git checkpoints before implementation, sibling fixes, and quality gate (pre-debug-fix-cycle-N, pre-where-else, pre-debug-gate). Provides structured rollback for fix attempts and sibling work.

From debugging sessions: