Systematic Debugging

Overview

All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.

Violating the letter of this process is violating the spirit of debugging.

Communication Requirement (Non-Negotiable)

Between every agent dispatch and every agent completion, output a status update to the user. This is NOT optional — the user cannot see agent activity without your narration.

Every status update must include:

Current phase — Which debugging phase you're in
Hypothesis status — Current hypothesis being tested (or "forming hypothesis")
What just completed — What the last agent reported (investigation findings, fix results)
What's being dispatched next — What you're about to do and why
Cycle count — Which hypothesis cycle you're on (cycle 1, cycle 2, etc.)

After compaction: Re-read the session state from the scratch directory (see Session State below) and output current status before continuing.

This requirement exists because: Debugging sessions can involve multiple investigation rounds and fix attempts. Without narration, the user has no visibility into which hypotheses have been tried, what evidence was found, or why the orchestrator is pursuing a particular path.

Execution model: The orchestrator dispatches all investigation and implementation to subagents. The orchestrator NEVER reads code, edits files, or runs tests directly. It forms hypotheses, dispatches work, and makes decisions based on subagent reports.

Depth principle: When in doubt, dispatch MORE investigation agents, not fewer. A bug that looks simple from the surface often has a complex root cause. Spinning up 4-6 focused investigators in parallel costs minutes; missing the root cause costs hours.

Pipeline Status

Write a status file to ~/.claude/projects/<hash>/memory/pipeline-status.md at every narration point. This file is overwritten (not appended) and provides ambient awareness for the user in a second terminal.

Write Triggers

Write the status file at every point where the Communication Requirement mandates narration: before dispatch, after completion, phase transitions, health changes, escalations, and after compaction recovery.

Status File Format

The status file uses this structure (overwritten in full each time):

# Pipeline Status
**Updated:** <current timestamp>
**Started:** <timestamp from first write — persisted across compaction>
**Skill:** debugging
**Phase:** <current phase, e.g. "Synthesis", "4 — Implementation">
**Health:** <GREEN|YELLOW|RED>
**Suggested Action:** <omit when GREEN; concrete one-sentence action when YELLOW/RED>
**Elapsed:** <computed from Started>

## Recent Events
- [HH:MM] <most recent event>
- [HH:MM] <previous event>
(last 5 events, newest first)

Skill-Specific Body

Append after the shared header:

## Investigation
- Hypothesis: "Missing null check in event handler dispatch chain"
- Cycle: 2 of 4 max
- Phase 4 fix attempts: 1 (WIP commit pending verification)

## Compression State
Goal: [original user request / bug report]
Key Decisions:
- [accumulated decisions, max 10]
Active Constraints:
- [constraints affecting remaining investigation]
Next Steps:
1. [immediate next action]
2. [subsequent actions]

Health State Machine

Health transitions are one-directional within a phase: GREEN -> YELLOW -> RED. Phase boundaries reset to GREEN.

Phase boundaries (reset to GREEN): Phase 0->1, 1->Synthesis, Synthesis->2, 2->3, 3->4, 4->4.5, 4.5->5. Sub-phases (3.5, within Phase 3) do NOT reset.
YELLOW: hypothesis cycle 3+, quality gate round 5+, fix retry in progress
RED: escalation pending, stagnation detected, 3 fix failures reached

When health is YELLOW or RED, include **Suggested Action:** with a concrete, context-specific sentence (e.g., "Third hypothesis attempted. Consider narrowing the search space or providing additional context.").

Inline CLI Format

Output concise inline status alongside the status file write:

Minor transitions (dispatch, completion): one-liner, e.g. Phase 1 [cycle 2] Investigating: null check hypothesis | GREEN | 34m
Phase changes and escalations: expanded block with --- separators
Health transitions: always expanded with old -> new health

Compaction Recovery

After compaction, before re-writing the status file: 0. Read the ## Compression State section from pipeline-status.md — recover Goal, Key Decisions, Active Constraints, and Next Steps. If absent, skip to step 1.

Read the rest of pipeline-status.md to recover Started timestamp and Recent Events buffer
Reconstruct phase, health, and skill-specific body from internal state files (see Session State below)
Emit a Compression State Block into the conversation to seed the new context window 3.5. Read session index summary (supplementary): If the CSB Scratch State contains a Session Index: path, or if globbing ~/.claude/projects/<hash>/memory/session-index/*/summary.md finds a recent file, read summary.md. Include the Activity Timeline, Files Modified, and Key Decisions sections in the post-compaction narration. If no session index exists, skip silently — this step is purely additive. If summary.md lacks detail for a specific event type (e.g., errors, hypothesis changes, file modifications), use /recall to query events.jsonl with filters for targeted recovery.
Write the updated status file
Output inline status to CLI

Session State and Compaction Recovery

The debugging skill writes session state to disk at every phase transition, not just on failure. This ensures compaction recovery works regardless of when it occurs.

Scratch directory: /tmp/crucible-debug-<session-id>/ where <session-id> is a timestamp generated at the start of the debugging session.

Write at each phase transition:

phase-state.md: current phase, cycle count, current hypothesis (if formed)
hypothesis-log.md: running hypothesis log (updated at Phase 3, after Phase 4 results)
synthesis-report.md: latest synthesis report (written after Synthesis completes)
implementation-details.md: cumulative record of implementation attempts — what was tried, which files changed, regressions encountered, why it failed (appended after each Phase 4)
where-else-state.md: Phase 4.5 state — pre-Phase-4.5 SHA, generalized pattern, siblings found/fixed/remaining (written during Phase 4.5, read during compaction recovery)

At each phase transition, in addition to writing session state files, emit a Compression State Block into the conversation. The block captures the reasoning layer (goal, decisions, constraints, next steps) that the session state files do not.

Checkpoint Timing

Emit a Compression State Block at:

Phase transitions: 0→1, Synthesis→3, 3.5→4, 4.5→5 — emit a Phase Handoff Manifest (see below) instead of a Compression State Block at these major boundaries. Other transitions (1→Synthesis, Synthesis→2, 2→3, 3→3.5, 4→4.5) continue to use CSBs.
Hypothesis cycles: After each hypothesis is formed or invalidated
Fix attempts: After each Phase 4 implementation attempt completes (success or failure)
Escalations: Before any escalation to user
Health transitions: On any GREEN->YELLOW or YELLOW->RED transition

Context hygiene: After synthesis completes, raw Phase 1 investigation reports are superseded by the synthesis report. The orchestrator should rely on the synthesis report going forward, not the raw reports. After Phase 4 completes (success or failure), the Phase 2 pattern analysis report is superseded by the implementation results. This keeps the orchestrator lean across long sessions.

Compaction recovery: 0. Check for handoff manifests (handoff-*-to-*.md) in the scratch directory. If the most recent manifest exists, use its Inputs, Decisions, and Constraints to reconstruct state for the current phase — this supersedes session state files for phase-boundary recovery. If no manifest exists, continue with standard recovery.

Read phase-state.md to determine current phase and cycle.
Read hypothesis-log.md for hypothesis history.
Read synthesis-report.md for latest investigation findings.
Read implementation-details.md for prior fix attempts.
Read where-else-state.md (if exists) for Phase 4.5 progress — which siblings have been fixed, which remain.
Output status to user and continue from the current phase.

Cleanup: Delete scratch directory and .pipeline-active marker after debugging completes (Phase 5 passes clean or escalation to user).

Phase Handoff Manifest

At major phase boundaries (0→1, Synthesis→3, 3.5→4, 4.5→5), write a handoff manifest to the scratch directory instead of emitting a Compression State Block. The manifest defines exactly what the next phase needs — an allowlist. Everything not on the manifest is shed.

Format:

# Phase Handoff: N → M
**Timestamp:** ISO-8601
**Goal:** [original bug report, verbatim]

## Inputs for Phase M
- **[Input name]:** [disk path or inline value]

## Decisions Carried Forward
- [DEC-N] [decision]: [reasoning, one line]

## Active Constraints
- [constraint affecting remaining work]

## Shed Receipt
- [what was shed] → [where it lives on disk]

Rules:

After writing the manifest, emit an explicit shed statement.
After writing the manifest, update ## Compression State in pipeline-status.md with manifest contents.
CSBs continue at non-major-boundary checkpoint triggers (1→Synthesis, Synthesis→2, 2→3, 3→3.5, 4→4.5, hypothesis cycles, fix attempts, escalations, health transitions).
Backward compatibility: If no manifest exists at a recovery point, fall back to CSB-based recovery.

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

If you haven't dispatched Phase 1 investigation and received findings back, you cannot propose fixes. If you haven't received a synthesis report, you cannot form a hypothesis. If you haven't formed a hypothesis, you cannot dispatch implementation.

When to Use

Use for ANY technical issue:

Test failures
Bugs in production
Unexpected behavior
Performance problems
Build failures
Integration issues

Use this ESPECIALLY when:

Under time pressure (emergencies make guessing tempting)
"Just one quick fix" seems obvious
You've already tried multiple fixes
Previous fix didn't work
You don't fully understand the issue

Don't skip when:

Issue seems simple (simple bugs have root causes too)
You're in a hurry (rushing guarantees rework)
Manager wants it fixed NOW (systematic is faster than thrashing)

The Orchestrator-Subagent Debugging Workflow

All investigation and implementation is delegated to subagents via the Agent tool. The orchestrator handles hypothesis formation, dispatch decisions, and escalation -- nothing else.

Subagent Model Selection

| Phase | Agent | Model | Rationale | |-------|-------|-------|-----------| | Phase 1 | Error Analysis | Opus | Deep code reading and call-chain tracing | | Phase 1 | Change Analysis | Opus | Cross-file diff analysis | | Phase 1 | Evidence Gathering | Opus | Multi-component data flow tracing | | Phase 1 | Reproduction | Opus | Complex reproduction requires reasoning | | Phase 1 | Deep Dive (any) | Opus | Specialized investigation | | Synthesis | Consolidation | Opus | Cross-referencing, contradiction detection, and causal reasoning — not just summarization | | Phase 2 | Pattern Analysis | Opus | Exhaustive comparison requires depth | | Phase 4 | Implementation | Opus | TDD + root cause fix | | Phase 4.5 | "Where Else?" scan | Opus | Cross-codebase pattern matching and sibling fixing | | Phase 5 | Red-team | Opus | Adversarial analysis | | Phase 5 | Code review | Opus or Sonnet | Lead decides by fix complexity | | Phase 5 | Test gap writer | Opus | Test authoring requires reasoning |

Workflow Overview

Bug reported / test failure / unexpected behavior
    |
    v
Orchestrator: Parse initial context (error message, failing test, user description)
    |
    v
Phase 0: Load codebase context (crucible:cartographer-skill)
    |
    v
Phase 1: Dispatch 3-6 parallel investigation subagents
    |  +-- Error Analysis agent (always)
    |  +-- Change Analysis agent (always)
    |  +-- Evidence Gathering agent (conditional -- multi-component systems)
    |  +-- Reproduction agent (conditional -- intermittent/unclear bugs)
    |  +-- Deep Dive agents (conditional -- 1-2 focused on specific subsystems)
    |
    v
Synthesis agent: Consolidate all Phase 1 findings -> concise root-cause analysis
    |
    v
Phase 2: Pattern Analysis agent (skipped if synthesis identified obvious root cause)
    |
    v
Phase 3: Orchestrator forms hypothesis (no subagent -- lightweight decision-making)
    |
    v
Phase 3.5: Hypothesis Red-Team (crucible:quality-gate on hypothesis)
    |  -> Survives? Proceed to Phase 4.
    |  -> Torn apart? Reform hypothesis or loop back to Phase 1.
    |
    v
Phase 4: Implementation agent (TDD: failing test, fix, verify)
    |
    v
Orchestrator: Verify fix -> Success? Phase 4.5. Failed? Cleanup, log, loop back.
    -> 3 failures? Escalate to user. If checkpoints exist: "Checkpoints available from prior fix cycles. Restore to a known-good state before manual investigation?"
    |
    v
Phase 4.5: "Where Else?" scan — find and fix sibling locations
    |
    v
Phase 5: Quality-gate the fix (crucible:quality-gate) + Code review (crucible:temper)
    |
    v
Test Gap Writer (if reviews flagged missing coverage)
    |
    v
Done.

Phase -1: Pipeline-Active Marker Check

Before any dispatch work, check for a crashed prior debugging session:

Check <scratch>/.pipeline-active (where <scratch> is ~/.claude/projects/<hash>/memory/)
Not found: Write the pipeline-active marker (JSON with pipeline_id set to current session ID, skill set to "debugging", phase set to "0", start_time set to current ISO-8601 timestamp, scratch_dir and dispatch_dir paths, branch from git branch --show-current, baseline_sha from git rev-parse HEAD). Proceed to Phase 0.
Found, same pipeline_id: Compaction recovery (existing behavior). Do not re-write the marker.
Found, different pipeline_id: Previous debugging session crashed. Check marker's branch against current branch — if mismatched, warn the user which branch the crashed session was on. Present to user:

"Previous debugging session on branch [marker.branch] crashed. Start fresh? [yes]" Delete the stale marker. Write a fresh marker. Proceed to Phase 0. (Full replay orchestration for debugging is deferred -- detection and cleanup only for now.)

Marker cleanup: Delete .pipeline-active after debugging completes (Phase 5 passes clean or escalation to user), alongside the existing scratch directory cleanup.

Phase 0: Load Codebase Context

Before any investigation dispatch, use crucible:cartographer-skill (load mode) to pull module context for the area being investigated. If module files exist, include them in every investigator's dispatch file so agents start with structural knowledge instead of wasting turns rediscovering the codebase.

Defect signature loading (for investigators):

Glob defect-signatures/*.md (excluding *.non-matches.md) from the cartographer storage directory
For each signature, read its Modules field and match against the investigation area's modules:
- Read each cartographer module file's Path: field
- A file is in a module if the file path starts with the module's Path: value
- When the investigation spans multiple modules, load signatures for all matched modules
- Directory prefix fallback: When no cartographer modules exist, match if any target file path starts with any of the signature's Modules directory prefixes
For matching signatures, validate all file paths still exist on disk — drop stale entries silently
Inject into the [DEFECT_SIGNATURES] section of investigator-prompt.md:
- Generalized pattern (always)
- Confirmed siblings list (always)
- Unresolved siblings list (always)
- Non-match companion file IS loaded for investigators, truncated to 50 entries at load time with note "(list truncated to 50 most recent entries)"
Last loaded update: Loading is pure-read. After all investigator dispatches complete, batch-update the Last loaded field to today on all signatures that were loaded.

Grudge pre-flight (regression-oracle, #271): Also query the Book of Grudges for the files under investigation and include any matches in the investigators' dispatch files — a past regression on these files is a prime hypothesis. Resolve the helper by absolute path from the plugin root — plugin_root="$(realpath "<this-skill-base-dir>/../..")" — and run python3 "$plugin_root/scripts/grudge_query.py" <files under investigation…>. Best-effort: if unresolved, emit a one-line stderr warning and continue — never block the investigation. See skills/grudge/SKILL.md.

If cartographer data doesn't exist for the relevant area, dispatch a quick Explore agent (subagent_type="Explore", model: haiku) to map the relevant directories and note key files. Include its findings in investigator prompts.

Domain Detection

Check the project's CLAUDE.md for a ## Debugging Domains table:

| Signal | Domain | Skills | Context |
|--------|--------|--------|---------|
| file paths contain `/UI/`, `USS`, `VisualElement` | ui | mockup-builder, mock-to-unity, ui-verify | docs/mockups/ |
| error mentions `GridWorld`, `Tile`, `hex` | grid | - | grid system architecture |

Signal types: File path patterns (regex against paths in error/stack trace), error message patterns (regex against error text), user description keywords. Evaluate signals in order; load context for all matching domains.

When domain is detected:

Auto-load referenced skills' SKILL.md into investigator prompts (see Domain Context section in investigator-prompt.md)
Add a domain-specific investigator to Phase 1
Give Phase 4 implementer domain skill context
Load files from the Context column

When no domain table exists: Proceed normally. Domain detection is opt-in.

When a referenced skill doesn't exist: Log a warning and proceed without domain enrichment. Never fail on missing config.

Phase Handoff: 0 → 1

Before dispatching investigation agents:

Write handoff-0-to-1.md with:
- Goal: bug report / user description, verbatim
- Inputs for Phase 1: cartographer module paths, defect signature paths (with match notes), domain context (if any), error messages / stack traces (verbatim), reproduction steps (if any)
- Decisions Carried Forward: (typically empty at this point)
- Active Constraints: user-stated constraints
- Shed Receipt: raw cartographer exploration output → module files on disk
Emit shed statement: "Phase 0 context shed. Module files, defect signatures, and error context captured in manifest. Raw exploration output is not carried forward."
Update ## Compression State in pipeline-status.md.
Do NOT emit a Compression State Block.

Phase 1: Investigation (Parallel Subagent Dispatch)

Prompt template: ./investigator-prompt.md

Dispatch 3-6 investigation subagents in parallel using the Agent tool in a single message. All subagents use subagent_type="general-purpose", model: opus. Pass all known context (error messages, stack traces, file paths, user description, and cartographer module context from Phase 0) verbatim to each agent -- do not make them search for context you already have.

Bias toward MORE agents, not fewer. Each investigator is cheap. Missing a root cause is expensive. When in doubt about whether to dispatch an additional agent, dispatch it.

Always dispatch:

Error Analysis Agent -- Read error messages, stack traces, and logs. Identify the exact failure point, error codes, and what the error is telling us. Trace the call chain backward to the originating bad value.
Change Analysis Agent -- Check recent changes via git diff, recent commits, new dependencies, config changes, and environmental differences. Identify what changed that could cause this.

Conditionally dispatch (lean toward dispatching):

Evidence Gathering Agent -- For multi-component systems (CI pipelines, API chains, layered architectures). Add diagnostic instrumentation at component boundaries. Log what enters and exits each component. Run once, report where the data flow breaks.
Reproduction Agent -- For intermittent, timing-dependent, or unclear bugs. Attempt to reproduce consistently. Document exact steps, frequency, and conditions. If not reproducible, gather more data rather than guessing.
Deep Dive Agent(s) -- For bugs touching multiple subsystems, dispatch 1-2 additional agents each focused on a specific subsystem or code path. Give each a narrow scope: "Investigate how [specific subsystem] handles [specific scenario]." These agents read deeply into a single area rather than scanning broadly.
Dependency/Environment Agent -- For bugs that might be caused by version mismatches, missing registrations, configuration drift, or framework behavior changes. Check DI registrations, package versions, framework release notes, and environment state.

Phase 1 Dispatch Heuristics

| Bug Characteristics | Agents to Dispatch | |--------------------|--------------------| | Test failure with clear stack trace | Error + Change + Deep Dive (on the failing subsystem) | | Vague "something broke" across multiple systems | All six agent types | | Intermittent / timing-dependent issue | Error + Change + Reproduction + Deep Dive | | Multi-layer system failure (CI, API chain) | Error + Change + Evidence Gathering + Deep Dive per layer | | Performance regression | Error + Change + Evidence Gathering + Deep Dive (hot path) | | "It worked yesterday" | Error + Change + Dependency/Environment | | Framework/library update broke things | Error + Change + Dependency/Environment + Deep Dive |

Context Self-Monitoring (All Phase 1 Agents)

Every investigation subagent prompt MUST include the context self-monitoring block from ./investigator-prompt.md. Investigators reading large codebases are prime candidates for context exhaustion. If an agent hits 50%+ utilization with significant investigation remaining, it must report partial findings immediately rather than silently degrading.

Synthesis: Consolidate Findings

Prompt template: ./synthesis-prompt.md

After all Phase 1 agents report back, dispatch a single Synthesis agent (model: opus) that receives all Phase 1 reports verbatim.

Trust-but-verify: The synthesis agent does NOT take investigator claims at face value. It cross-references findings between agents, flags contradictions, and identifies claims that lack concrete evidence (file paths, line numbers, stack traces). Speculative findings are downgraded. Concrete artifacts outrank plausible theories.

The Synthesis agent produces:

A 200-400 word root-cause analysis
Ranked list of likely causes (most to least probable), each with evidence strength rating
Cross-references between agent findings (where they agree, where they contradict)
Identified unknowns or gaps in evidence
Recommendation: is the root cause obvious, or is pattern analysis needed?

Skip-ahead rule: If all Phase 1 agents converge on the same root cause with concrete evidence (not just speculation) and the Synthesis agent confirms it as obvious, the orchestrator may skip Phase 2 and proceed directly to Phase 3 (hypothesis formation).

Phase 2: Pattern Analysis (Skippable)

Prompt template: ./pattern-analyst-prompt.md

Dispatch a single Pattern Analysis agent that receives the synthesis report.

The Pattern Analysis agent:

Finds working examples of similar code/patterns in the same codebase
Compares working examples against the broken code exhaustively
Lists every difference, however small -- does not assume "that can't matter"
Identifies dependencies, config, environment, and assumptions
Reports back with specific differences and their likely relevance

When to skip: The orchestrator skips Phase 2 when the synthesis report identifies an obvious root cause with high confidence (all investigation agents agree, clear evidence chain).

Phase Handoff: Synthesis → 3

Before the orchestrator forms a hypothesis (whether Phase 2 ran or was skipped):

Write handoff-synthesis-to-3.md with:
- Goal: original bug report, verbatim
- Inputs for Phase 3: synthesis report path, pattern analysis report path (or "skipped" with reason), hypothesis log path (empty if cycle 1), key file paths surfaced during investigation
- Decisions Carried Forward: skip-phase-2 decision (if applicable), investigator-count decision
- Active Constraints: constraints from investigation findings
- Shed Receipt: raw Phase 1 investigation reports (3-6 agent reports) → synthesis report captures consolidated findings
Emit shed statement: "Investigation context shed. Synthesis report captures consolidated findings. Raw investigator reports are not carried forward."
Update ## Compression State in pipeline-status.md.
Do NOT emit a Compression State Block.

Phase 3: Hypothesis Formation (Orchestrator Only -- No Subagent)

This phase stays local to the orchestrator. No subagent dispatch.

The orchestrator:

Reads the synthesis report (and Phase 2 report if it was dispatched)
Forms a single, specific, testable hypothesis: "I think X is the root cause because Y"
Checks the hypothesis log -- do not repeat a hypothesis that already failed
Logs the hypothesis before dispatching Phase 4

Hypothesis discipline:

Be specific, not vague. "The null reference is caused by X not being initialized before Y calls it" -- not "something with initialization."
One hypothesis at a time. Do not bundle multiple theories.
If you cannot form a hypothesis from the reports, dispatch more investigation -- do not guess.

Hypothesis Log Format

Maintain a running log across cycles:

## Cycle 1
- Hypothesis: "[specific hypothesis]"
- Based on: [which reports informed this]
- Result: [filled in after Phase 4 completes]

## Cycle 2
- Hypothesis: "[specific hypothesis]"
- Based on: [which reports informed this]
- Result: [filled in after Phase 4 completes]

Phase 3.5: Hypothesis Red-Team

Before dispatching the Phase 4 implementer, invoke crucible:quality-gate on the hypothesis with artifact type "hypothesis".

The quality gate challenges:

Does the hypothesis explain ALL symptoms, or just some?
Could the root cause be upstream of what the hypothesis targets?
If this hypothesis is correct, what other symptoms should we expect? Do we see them?
Has this pattern been tried and failed before? (check hypothesis log and cartographer landmines for dead_ends)

If hypothesis survives: Proceed to Phase 4. If hypothesis is torn apart: Reform the hypothesis or dispatch additional investigation (back to Phase 1) without wasting a full TDD cycle.

Phase Handoff: 3.5 → 4

Before dispatching the implementation agent:

Write handoff-3.5-to-4.md with:
- Goal: original bug report, verbatim
- Inputs for Phase 4: validated hypothesis (verbatim), target file paths, conventions path, hypothesis log path, implementation details log path (if prior cycles exist)
- Decisions Carried Forward: hypothesis rationale, red-team survival notes
- Active Constraints: constraints on the fix approach
- Shed Receipt: red-team round details, hypothesis formation reasoning → hypothesis log captures outcomes; gate round details are shed
Emit shed statement: "Hypothesis formation context shed. Validated hypothesis, target files, and hypothesis log on disk. Red-team round details are not carried forward."
Update ## Compression State in pipeline-status.md.
Do NOT emit a Compression State Block.

Phase 4: Implementation (Single Subagent -- TDD)

RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-debug-fix-cycle-N" (where N is the hypothesis cycle count) before dispatching the implementation agent. If the fix attempt fails or introduces regressions, this checkpoint allows clean rollback without relying on WIP commit revert mechanics.

Prompt template: ./implementer-prompt.md

Dispatch a single Implementation agent that receives:

The hypothesis (verbatim)
Relevant file paths identified during investigation
Project conventions and test standards
The hypothesis log (so it knows what hypotheses were already tried)
The implementation details log from prior cycles (so it knows what code-level approaches were tried, which files were changed, and why they failed — see implementation-details.md in the scratch directory)

The Implementation agent follows strict TDD:

Write a failing test that reproduces the bug per the hypothesis
Run the test -- verify it fails for the expected reason
Implement the minimal fix addressing the root cause
Run the test -- verify it passes
Run the broader test suite -- verify no regressions
Report back with a structured Implementation Report

Implementation discipline:

ONE change at a time. No "while I'm here" improvements.
No bundled refactoring.
Fix the root cause, not the symptom.
Uses crucible:test-driven-development for proper TDD workflow.

Commit Strategy

After Phase 4 completes, if the implementer modified any files, create a WIP commit regardless of outcome:

git commit -m "fix(wip): [hypothesis summary]"        # on success
git commit -m "fix(wip-failed): [hypothesis summary]"  # on failure or regressions

This gives every outcome path a clean revert target (git revert <sha>), gives Phase 5 code review a real diff, and isolates Phase 5 test modifications from the core fix. If the full pipeline succeeds, the final commit message is amended to drop the (wip) prefix. If Phase 5 requires changes (test audit updates, gap test additions), those are committed as separate follow-up commits.

On loop-back (failed fix or user-requested revert), git revert <wip-sha> cleanly undoes all Phase 4 changes including new files.

If Phase 4.5 ran (sibling commits exist): use git revert <pre-4.5-sha>..HEAD instead of git revert <wip-sha>. This reverts all sibling commits plus the original WIP commit in one operation. See Phase 4.5 below.

Phase 4.5 sibling commits: Each sibling fix uses the prefix fix(sibling): with a descriptive message. Example: fix(sibling): add icon initialization to StashScreen.OnEnable

Phase 4.5: "Where Else?" Blast Radius Scan

RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-where-else" before dispatching the scan agent. This replaces the need to manually track the pre-Phase-4.5 SHA for revert mechanics — the checkpoint captures the full working directory state.

After Phase 4 succeeds and the WIP commit is created, dispatch the "Where Else?" scan agent to find and fix analogous locations in the codebase that have the same bug pattern. Phase 4.5 does NOT run on loop-back paths (fix failed, regressions found).

Prompt template: ./where-else-prompt.md

Pre-Phase-4.5 SHA: Immediately after the Phase 4 WIP commit succeeds, record its SHA. This is the last commit before any sibling work begins. Store it in where-else-state.md in the scratch directory.

Three Input Signals

The scan agent receives three sources of information:

The fix diff — git diff <pre-fix-sha>..HEAD. The structural pattern of what changed: what was missing, what was added, and what makes a location "analogous."
Cartographer module context — If cartographer data is available from Phase 0, structurally similar modules (other screens, panels, managers, handlers that follow the same architectural pattern as the fixed code).
Implementer's "Analogous Locations" report — The Phase 4 implementer's observations about locations they noticed during the fix that may have the same pattern. This is the highest-quality signal — the implementer was already reading the code.

Existing Defect Signatures (Fourth Input Signal)

When dispatching the Phase 4.5 scan agent, load matching defect signatures from cartographer as additional context:

Selection: Load at most 3 matching signatures. Treat Last loaded: never as oldest. Sort by Last loaded descending. Tiebreak by Date descending.
Module matching: Same as build/debug load — read each cartographer module file's Path: field; match if any target file path starts with the module's Path: value. Fall back to directory prefix matching if no cartographer modules exist.
What to load: Signature file (generalized pattern, confirmed siblings, unresolved siblings) plus non-match companion file paths only (without full reason text). The scan agent can read individual companion files on demand if it needs the reasoning.
Guidance to scan agent: Prioritize evaluating candidates NOT listed as confirmed non-matches in existing signatures, but may still evaluate non-match locations if context budget allows. If the new scan confirms the same pattern still exists in a previously-cleared location, note this as a "stale non-match" in the report.

Scan Agent Behavior

Analyze the fix pattern. Read the diff. Extract the structural pattern. Produce a 2-3 sentence generalized pattern description.
Build candidate list. Combine all three input signals, deduplicate candidates.
Evaluate each candidate. Read the code at that location. Determine if the same pattern/omission exists. If yes: write a justification explaining WHY this location matches semantically (not just structurally). If no: record as skipped with reason.
Fix confirmed siblings. For each confirmed sibling:
- Apply the same fix pattern
- Run tests
- If tests pass: commit with fix(sibling): <description>
- If tests fail: revert that single commit, record as "reverted — test failure", continue with remaining candidates
- After each fix (or revert), update where-else-state.md in the scratch directory
Report back. Structured report:

## Where Else? Scan Report

### Generalized Pattern
[2-3 sentence pattern description]

### Candidates Evaluated: N

### Siblings Fixed: N
- [file:path] — [commit SHA] — Justification: [why this matches]
- ...

### Siblings Skipped: N
- [file:path] — Reason: [why this doesn't match]
- ...

### Siblings Reverted: N
- [file:path] — Test failure: [summary of what failed]
- ...

Compaction Recovery

Phase 4.5 maintains state in <scratch-dir>/where-else-state.md to survive session compaction. The file is updated after each sibling fix. On compaction recovery, the agent reads this file to:

Skip siblings already fixed
Resume from where it left off with remaining siblings
Recover the pre-Phase-4.5 SHA (needed for revert mechanics)
Recover the generalized pattern (avoids re-deriving from the diff)

Rules

Apply the SAME fix pattern — do not invent new approaches for siblings
One commit per sibling — clean revert granularity
If tests fail for a sibling, revert and continue — do not debug the sibling
Do not modify the original fix
If no candidates are found, report "No analogous locations found" and proceed to Phase 5
No cap on sibling count — fix all confirmed siblings

Defect Signature Persistence

After the Phase 4.5 scan agent reports back, persist the scan results as a cartographer defect signature. This is orchestrator-managed — the recorder only writes files.

Skip condition: Do not write a signature when Phase 4.5 reports "No analogous locations found" (0 candidates evaluated). A pattern with no siblings and no non-matches has no evaluation ledger worth persisting.

Step 1: Dedup check (orchestrator)

Glob ~/.claude/projects/<hash>/memory/cartographer/defect-signatures/*.md (excluding *.non-matches.md)
Read each file's ## Generalized Pattern section only (each is 2-3 sentences; 20 patterns is ~1000-1500 tokens)
Compare semantically with the new scan report's generalized pattern
Dedup rubric: Two patterns are duplicates if they describe the same root cause in the same codebase area. Different areas or different root causes = distinct signatures. When in doubt, treat as distinct — a near-duplicate is recoverable via pruning, a false merge is not.
If a match is found, set update_path to the existing file path (the recorder will merge into it rather than creating new)
If no match, the recorder creates a new file using the content hash slug

Step 2: Pre-write pruning (orchestrator) When count of existing signatures would exceed 20 after writing:

Build the pruning-eligible set: exclude signatures where Last loaded: never AND Date is less than 30 days old
Sort by Last loaded date ascending; Last loaded: never sorts oldest
Among ties, prune the oldest by Date field
Write-before-delete ordering: dispatch the recorder first, then delete pruned files after the recorder succeeds

Step 3: Dispatch recorder (orchestrator) Dispatch a Sonnet cartographer recorder agent using crucible:cartographer-skill recorder-prompt.md with the "Record defect signature" directive. Provide:

Phase 4.5 scan report (generalized pattern, confirmed siblings, reverted siblings, skipped siblings)
Original fix metadata (file path, commit SHA, commit message summary, issue number)
Cartographer module names from Phase 0 (or directory prefix fallbacks)
update_path if dedup found a match (from Step 1)

Step 4: Post-recorder validation (orchestrator) After the recorder returns, validate:

The signature file exists on disk
Sibling entries are within the 30-entry cap
The Modules field contains valid cartographer module names or directory prefixes If validation fails, log the failure — the signature is not surfaced to consumers.

Step 5: Rename on merge (orchestrator) If update_path was provided (merge case): rename the file to use today's date prefix while keeping the original slug. YYYY-MM-DD-<slug>.md becomes <today>-<slug>.md. If a companion non-match file exists, rename it to match. This ensures merged signatures do not lose age protection.

Ordering: Dispatch the recorder and wait for completion, then perform the Last loaded batch update (see below). This prevents write races between the recorder and the batch update targeting the same file.

Last loaded batch update: After all subagent dispatches for the current phase complete (including the recorder), batch-update the Last loaded field on all defect signatures that were loaded during Phase 0 or Phase 4.5. The recorder sets Last loaded to today on update_path writes, so skip those files during the batch update.

Over-count recovery: If count exceeds 20 after a failed prune (e.g., all signatures are age-protected), the next invocation's pre-recorder pruning pass cleans up before writing.

Phase Handoff: 4.5 → 5

Before invoking the quality gate on the fix:

Write handoff-4.5-to-5.md with:
- Goal: original bug report, verbatim
- Inputs for Phase 5: full diff (git diff <pre-fix-sha>..HEAD), conventions path, test file paths, Where Else report path (or "no siblings found"), defect signature path (if written)
- Decisions Carried Forward: root cause, fix approach, sibling fix decisions
- Active Constraints: constraints on review scope
- Shed Receipt: Phase 4 TDD cycle details, Phase 4.5 candidate evaluation reasoning → fix is in the diff, sibling results in the report
Emit shed statement: "Implementation context shed. Fix diff, test files, and sibling report on disk. TDD cycle details and candidate evaluation reasoning are not carried forward."
Update ## Compression State in pipeline-status.md.
Do NOT emit a Compression State Block.

Phase 5: Red-Team and Code Review (Post-Fix Quality Gate)

RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-debug-gate" before invoking the quality gate. If gate fix rounds degrade the fix, this is the rollback target.

After Phase 4.5 completes (or Phase 4 succeeds if no Phase 4.5 ran), the orchestrator runs quality gates before declaring done:

Step 1: Quality-gate the fix — Invoke crucible:quality-gate with artifact type "code" against the changed code. Quality-gate dispatches fresh red-team reviewers to adversarially review the fix for:

Edge cases the fix doesn't handle
New failure modes introduced by the fix
Assumptions that could break under different conditions
Regression risks not covered by the test

Quality-gate handles iteration tracking, stagnation detection, compaction recovery, and user checkpoints. Do NOT invoke crucible:red-team directly — always go through quality-gate for iteration management.

Step 2: Code review — After red-teaming passes clean, invoke crucible:temper against the full diff (from before debugging started to HEAD). The code reviewer checks implementation quality, test coverage, and adherence to project conventions.

If code review finds Critical or Important issues, fix them and re-review per the standard code review loop.

Step 2.5: Test suite audit — Invoke crucible:test-coverage (if available) against the changed code and affected test files. This audits whether existing tests need updating, removal, or modification after the fix. Three categories:

Tests to update — assertions, descriptions, or setup now wrong or misleading given the fix (includes stale assertions expecting old values)
Tests to delete — tests for removed code paths
Coincidence tests — tests whose setup exercises changed code but assertions verify unrelated properties (flagged for judgment, not auto-fixed)

If crucible:test-coverage is not available, skip this step. The test gap writer (Step 3) handles missing coverage but NOT stale/misleading existing tests — this step fills that gap.

The test-coverage skill handles its own fix dispatch and revert-on-failure logic internally. It returns a structured report with actions taken.

Step 3: Test gap writer — If the code reviewer or red-teamer identified missing test coverage for the fix, dispatch a Test Gap Writer agent (Opus) using ./test-gap-writer-prompt.md. Input: reviewer gap findings + fix diff + test-coverage audit report (if available from Step 2.5). The agent writes tests only for gaps specifically flagged in the review — no scope creep. Before writing a new test for a flagged gap, verify no existing test already covers this path (it may have been updated by the test-coverage audit). Tests should PASS immediately since the behavior already exists from the fix. The agent reports per-test PASS/FAIL results. Skipped when reviews report zero coverage gaps.

If all tests PASS: Debugging workflow is complete.

Record the grudge (regression-oracle, #271). Once the root cause is confirmed and the fix is verified, record a grudge so this bug can never silently re-ship. Best-effort (a failed record logs to stderr and never fails the workflow): resolve the helper by absolute path from the plugin root — plugin_root="$(realpath "<this-skill-base-dir>/../..")" — and run python3 "$plugin_root/scripts/grudge_append.py" --symptom "<observable failure>" --root-cause "<confirmed cause>" --files "<comma-separated files_touched>" --signature "<optional regex/snippet fingerprint>" --commit "$(git rev-parse HEAD)" --repro "<minimal repro>" --why "<why it kept happening>". See skills/grudge/SKILL.md.

If some tests FAIL (gaps reveal incomplete fix coverage):

Dispatch a fresh implementer (Opus) with the failing test(s), their failure messages, gap descriptions, and the original bug context (hypothesis, root cause)
Implementer fixes the incomplete coverage, re-runs ALL test gap writer tests (not just failures — catches regressions from the fix)
If all tests pass after fix: commit (fix: address test gap failures for debugging fix), debugging workflow is complete
If tests still fail after one fix attempt: escalate to user with:
- The original bug and confirmed root cause
- What was fixed in Phase 4
- Which test gaps were detected by reviewers
- What the retry implementer attempted
- Which tests still fail and their current failure messages

If Phase 5 quality-gate escalates (stagnation or round limit): Present the quality-gate findings to the user alongside the fix. The user decides:

(a) Accept the fix with known issues noted
(b) Revert and loop back — revert the WIP commit (or range revert <pre-4.5-sha>..HEAD if Phase 4.5 ran) and loop back to Phase 1 with the quality-gate findings as new investigation context
(c) Stop debugging — end the session with the current state

This is user-gated, not automatic. The orchestrator does not decide whether to loop back from Phase 5 on its own.

Only after all gates pass clean (and any test gaps are filled or escalated) is the debugging workflow complete.

Session Metrics

Throughout the debugging session, the orchestrator appends timestamped entries to /tmp/crucible-metrics-<session-id>.log.

Dispatch measurement protocol: On every subagent dispatch, the orchestrator follows the enriched manifest protocol from shared/dispatch-convention.md:

Before dispatching: Measure the dispatch file size in characters. Record input_chars and model_tier in the manifest entry.
After dispatch returns: Measure the subagent response length in characters. Record output_chars and tool_calls (if available) in the manifest completion entry.

At completion, read the metrics log and manifest, then compute and report:

-- Debugging Complete ---------------------------------------
  Subagents dispatched:  12 (8 Opus, 4 Sonnet)
  Active work time:      1h 15m
  Wall clock time:       3h 42m
  Hypothesis cycles:     3
  Quality gate rounds:   2 (hypothesis: 1, fix: 1)
  Est. input tokens:    ~15,200 (60,800 chars)
  Est. output tokens:   ~9,800 (39,200 chars)
  Token estimate note:  Based on dispatch file sizes (chars/4). Actual consumption may vary +/-30%.
-------------------------------------------------------------

Additional debugging metric: hypothesis cycles (number of hypothesis → investigate → implement cycles before resolution).

Efficiency summary computation: Read manifest.jsonl from the dispatch directory. Sum input_chars and output_chars across all completed entries (skip nulls). Divide each by 4 for token estimates. Count dispatches grouped by model_tier. Include these in the debugging completion report alongside existing metrics.

Pipeline Decision Journal

Maintain a decision journal at /tmp/crucible-decisions-<session-id>.log:

[timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>

Decision types:

investigator-count — why N investigators dispatched
skip-phase-2 — why Phase 2 was skipped (or not)
gate-round — hypothesis red-team results per round
escalation — why orchestrator escalated
hypothesis-reform — why hypothesis was reformed after red-team

Loop-back, Cleanup, and Escalation

After the Implementation agent reports back, the orchestrator evaluates four possible outcomes:

Fix works, no regressions -- Log the result in the hypothesis log. Proceed to Phase 4.5 ("Where Else?" blast radius scan). After Phase 4.5 completes, proceed to Phase 5. After Phase 5 passes clean:

RECOMMENDED: Use crucible:forge (retrospective mode) — capture the debugging journey and lessons learned
Chronicle signal fallback: If forge retrospective will not run (user declined, session ending), append a minimal chronicle signal directly:
- Construct signal: v=1, ts=now, skill="debugging", outcome from fix verification, duration_m from session timing, branch from git, files_touched from git diff --name-only, metrics={hypotheses count, root_cause_category from fix, where_else_hits count}
- Append as a single JSON line to ~/.claude/projects/<hash>/memory/chronicle/signals.jsonl
- If forge retrospective WILL run, skip this step (forge Step 8.5 handles it)
RECOMMENDED: Use crucible:cartographer-skill (record mode) — persist any new codebase knowledge discovered during investigation

Test passes immediately (no fix applied) -- The implementer's reproduction test passed before any fix was written. Two possibilities:

Bug was already resolved (by investigation side effects, environment change, or prior cycle). Verify by running the original reproduction steps. If the original bug is gone: proceed to Phase 5 but skip Step 1 (quality-gate on code) since there is no code change. Go directly to Step 2 (code review) scoped to the reproduction test file only.
Test doesn't reproduce the bug (hypothesis was wrong about the reproduction). Log the hypothesis as "wrong — test did not reproduce" in the hypothesis log. Revert the WIP commit (git revert <wip-sha>) to remove the non-reproducing test. Loop back to Phase 3 to reform the hypothesis, or Phase 1 if the root cause itself is in question.

Fix works but introduces regressions -- Start a new investigation cycle targeting the regressions. The original fix stays; the regressions are a new bug. Critical: Pass the original bug context (hypothesis, fix applied, original root cause) to the new investigation agents as background context, with the constraint: "The original fix must not be reverted. Investigate why the fix caused regressions and propose an additive solution."

Fix does not resolve the issue -- Before looping back:

Log the failure in the hypothesis log with metrics (see Stagnation Detection below)
Test triage: Dispatch a quick Opus subagent to read the test and the hypothesis log, then decide: keep the test (if it validly reproduces the bug regardless of the failed fix) or remove it (if it was hypothesis-specific and doesn't reproduce the actual bug). The orchestrator does not make this judgment directly — it requires reading code.
Revert the WIP commit using git revert <wip-sha> (see Commit Strategy above). This cleanly undoes all Phase 4 changes including any new files created during refactoring. If Phase 4.5 ran (sibling commits exist): use git revert <pre-4.5-sha>..HEAD instead of git revert <wip-sha> to revert all sibling commits plus the original WIP commit.
- If triage decided "keep the test": dispatch a subagent to recover the test file from the reverted commit (git checkout <wip-sha> -- <test-file-path>) and commit it separately (test: preserve reproduction test from cycle N).
- If triage decided "remove the test": no further action — the revert already removed it.
Verify the working tree is clean: dispatch the cleanup agent to run git status and report any remaining modifications or untracked files. If any remain, clean them up before proceeding.
Loop back to Phase 1 with the new information from the failed attempt. On loop-back, dispatch MORE agents than the prior cycle, not fewer — widen the investigation.

Context Preservation: Session state is written to disk at every phase transition (see Session State and Compaction Recovery above). On failed cycles, additionally append implementation details (fix attempted, files changed, regressions, why it failed) to implementation-details.md in the scratch directory. This gives the next Phase 4 implementer actionable context about what was tried at the code level, not just the hypothesis level.

Stagnation Detection (from red-team pattern)

Stagnation ownership: The debugging skill's stagnation detector owns cycle-to-cycle decisions (loop back vs escalate). Quality-gate's stagnation detector owns within-gate decisions (round-to-round within a single Phase 3.5 or Phase 5 invocation). When quality-gate escalates within a gate (e.g., hypothesis keeps getting torn apart), the debugging orchestrator counts that as a failed cycle and updates the hypothesis log accordingly before deciding whether to loop back. Quality-gate history from prior invocations does not carry over — each gate invocation starts fresh.

Track a stagnation metric across cycles — the hypothesis specificity score:

| Metric | What to Track | |--------|--------------| | Root causes identified | How many distinct root causes were surfaced across all investigators | | Evidence strength | How many findings had concrete evidence (file:line, stack trace, git blame) vs speculation | | New information | Did this cycle surface information that was NOT available in prior cycles? |

Stagnation rule: If Cycle N+1 surfaces no new information compared to Cycle N (same root causes, same evidence, same gaps), the orchestrator STOPS and escalates immediately. Do not dispatch Cycle N+2 — the investigation is stuck, not progressing.

Confirm before auto-escalating on judgment-call stagnation. The "no new information" test is unambiguous when both sides are identical (zero new evidence, zero new files touched, identical hypothesis list) — auto-escalate in that case. But when Cycle N+1 surfaced some new content that the orchestrator judges immaterial ("technically a new file but same module", "another instance of the same null check"), surface the side-by-side comparison to the user before escalating:

[debugging] Stagnation candidate — Cycle N vs N+1:
  - Cycle N hypotheses: <list>
  - Cycle N+1 hypotheses: <list>
  - New evidence in N+1: <delta or "none material in my judgment">
Escalate to user, or dispatch Cycle N+2 with a different angle?

The user has context the model lacks about which "new" findings are actually load-bearing for their mental model. Default-on-away: escalate (matches the conservative-on-stuck instinct). The zero-delta auto-escalate path stays silent.

Escalation Tiers

| Cycle | Action | |-------|--------| | 1 | Normal flow — dispatch 3-6 investigators | | 2 | Loop back with learnings. Dispatch MORE agents than Cycle 1. Explicitly exclude paths already ruled out. | | 3 | Final attempt — investigation agents are instructed to look for something fundamentally different from previous hypotheses. Add Deep Dive agents targeting areas not yet investigated. | | 4 | No dispatch. Present the full hypothesis log to the user. Flag as likely architectural problem. Discuss fundamentals before attempting more fixes. |

Stagnation overrides cycle count: If stagnation is detected at any cycle (even Cycle 2), escalate immediately rather than waiting for Cycle 4.

Pattern indicating architectural problem (Cycle 4 or stagnation escalation):

Each fix reveals new shared state, coupling, or problems in different places
Fixes require massive refactoring to implement
Each fix creates new symptoms elsewhere
Investigation keeps finding the same root causes but fixes don't resolve them

This is NOT a failed hypothesis -- this is a wrong architecture. Discuss with your human partner before attempting more fixes.

Quick Reference

| Phase | Agent(s) | Key Activities | Success Criteria | |-------|----------|---------------|------------------| | 0. Context | Cartographer + optional Explore | Load module context for investigators | Codebase context ready for prompts | | 1. Investigation | 3-6 parallel subagents (Opus) | Read errors, check changes, gather evidence, deep dive, reproduce | Raw findings collected | | Synthesis | 1 subagent (Opus) | Consolidate, cross-reference, rank by evidence quality | Concise root-cause analysis | | 2. Pattern | 1 subagent (Opus, skippable) | Find working examples, compare exhaustively | Differences identified | | 3. Hypothesis | Orchestrator (no subagent) | Form hypothesis, check log | Specific testable hypothesis | | 3.5 Red-Team | Quality gate (on hypothesis) | Challenge hypothesis completeness | Hypothesis survives or is reformed | | 4. Implementation | 1 subagent (Opus) | TDD fix cycle with evidence log | Bug resolved, tests pass, TDD log | | 4.5. Where Else? | 1 subagent (Opus) + 1 recorder (Sonnet) | Find and fix sibling locations; persist defect signature | Siblings fixed, signature written (if 1+ candidates) | | 5. Quality Gate | Red-team + code review | Adversarial review, quality check | Both pass clean | | 5b. Test Audit | Test coverage skill (conditional) | Audit existing tests for staleness after fix | Stale tests updated/removed | | 5c. Test Gaps | Test gap writer (Opus, conditional) | Write tests for reviewer-flagged gaps | All gap tests pass |

Quality Gate

This skill produces hypotheses (Phase 3.5) and fixes (Phase 5).

When used standalone: Debugging is the outermost orchestrator and MUST invoke quality-gate at Phase 3.5 (on hypotheses) and Phase 5 (on fixes). These gates are non-negotiable regardless of fix size — a "one-liner" fix is not exempt.

When used as a sub-skill: The parent orchestrator is responsible for dispatching gates (per the Invocation Convention: "Skills NEVER self-invoke quality-gate"). If you are unsure whether you are standalone or a sub-skill, invoke the gate — double-gating is preferable to no gating.

The only legitimate skip is at Phase 5 when there is no code change (bug was already resolved). Do not extrapolate from this — it applies only to the specific "no code change" scenario, not to "small" or "trivial" changes.

Gate tracking: Before declaring done, verify that Phase 3.5 (hypothesis gate) and Phase 5 (fix gate, unless legitimately skipped for no-code-change) each show round count >= 1 with clean final rounds. If any gate was skipped with explicit user approval, record it as USER_SKIP.

Red Flags -- STOP and Follow Process

If you catch yourself thinking:

Orchestrator discipline violations:

"Let me just read this one file quickly"
"I'll fix this inline instead of dispatching"
"I already know what's wrong, I'll skip investigation"
"Let me just run the tests myself to check"
"I'll look at the code to confirm before dispatching"

Communication violations:

"Dispatching agents without narrating what you're doing and why"

Classic debugging traps (still apply):

"Quick fix for now, investigate later"
"Just try changing X and see if it works"
"Skip the test, I'll manually verify"
"It's probably X, let me fix that"
"I don't fully understand but this might work"
"Here are the main problems: [lists fixes without investigation]"
Proposing solutions before dispatching Phase 1
Forming hypotheses before receiving synthesis report
"One more fix attempt" (when already at Cycle 3+)
Each fix reveals new problem in different place

Quality gate violations:

"This fix is too small to need a quality gate"
"It's just a one-liner, the gate won't find anything"
Skipping Phase 3.5 or Phase 5 quality gate without explicit user approval
Declaring a quality gate "done" after fixing findings without a clean verification round (fixing is not passing)
Extrapolating from the "no code change" skip to justify skipping on small changes
Interpreting general user feedback as approval to skip a quality gate that has not yet run

Compression State violations:

Skipping Compression State Block emission at checkpoint boundaries
Emitting a Compression State Block at a major phase boundary (0→1, Synthesis→3, 3.5→4, 4.5→5) instead of writing a handoff manifest
Skipping the shed statement after a manifest write
Emitting a Compression State Block with stale or missing Key Decisions (decisions must be cumulative across all prior blocks)
Allowing the Goal field to drift across successive Compression State Blocks (must match original user request)
Exceeding 10 entries in the Key Decisions list without overflow-compressing the oldest
Treating session index summary as authoritative over CSB state (session index is supplementary narrative, CSB is authoritative state)

ALL of these mean: STOP. Return to the correct phase.

If 3+ cycles failed: Escalate to user. Question the architecture. Do not dispatch Cycle 4 agents.

Your Human Partner's Signals You're Doing It Wrong

Watch for these redirections:

"Is that not happening?" - You assumed without dispatching verification
"Will it show us...?" - You should have dispatched evidence gathering
"Stop guessing" - You're proposing fixes without investigation reports
"Ultrathink this" - Question fundamentals, not just symptoms
"We're stuck?" (frustrated) - Your dispatched approach isn't working

When you see these: STOP. Return to Phase 1. Dispatch fresh investigation.

Common Rationalizations

| Excuse | Reality | |--------|---------| | "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. | | "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. | | "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. | | "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. | | "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. | | "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. | | "I see the problem, let me fix it" | Seeing symptoms does not equal understanding root cause. | | "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. | | "Let me just peek at the code real quick" | Orchestrators dispatch, they don't investigate. Send a subagent. | | "I'll dispatch implementation without a hypothesis" | No hypothesis = no direction. The agent will guess. Form the hypothesis first. |

User-Provided Diagnosis Traps

The rationalizations above are self-originated — you talking yourself out of process. The traps below are user-originated — the user (reasonably, in good faith) providing a diagnosis that tempts you to skip investigation. These are harder to resist because accommodating the user feels helpful.

| User says | You think | Reality | |-----------|-----------|---------| | "I found the bug, here's the fix" | "Their analysis looks right, I'll verify quickly and apply" | User diagnoses are hypotheses, not conclusions. Verify independently — plausible ≠ confirmed. | | "The stack trace points right to it" | "The evidence is clear, skip to fix" | Stack traces show where errors surface, not where they originate. The throw site is often a symptom, not the cause. | | "This is blocking N engineers" | "Speed matters more than process right now" | A wrong fix under time pressure creates two problems: the original bug plus the bad fix. Systematic debugging is faster than guess-and-check, especially under pressure. | | "Same bug as last sprint" | "Apply the same fix pattern" | Same symptom ≠ same cause. The prior fix may have been incomplete, or a different root cause produces identical symptoms. | | "Just add error handling around it" | "That'll prevent the crash at least" | Error handling that silences the symptom is camouflage, not a fix. The root cause persists and will surface elsewhere. | | "It's obviously X, I just need you to fix it" | "The user has more context, trust their judgment" | Trust their observations (symptoms, timeline, reproduction steps). Verify their conclusions (root cause, fix). These are different things. | | "We can investigate later, just patch it now" | "Pragmatism — ship the fix, investigate in a follow-up" | 'Investigate later' means 'never investigate.' The patch becomes the permanent fix. Do it right now. |

When Process Reveals "No Root Cause"

If systematic investigation reveals issue is truly environmental, timing-dependent, or external:

You've completed the process
Document what you investigated (the hypothesis log serves as this record)
Dispatch an implementation agent to add appropriate handling (retry, timeout, error message)
Add monitoring/logging for future investigation

But: 95% of "no root cause" cases are incomplete investigation. Dispatch more agents before concluding this.

Supporting Techniques and Prompt Templates

Prompt templates (used when dispatching subagents):

./investigator-prompt.md -- Phase 1 investigation agent prompt
./synthesis-prompt.md -- Synthesis agent prompt
./pattern-analyst-prompt.md -- Phase 2 pattern analysis agent prompt
./implementer-prompt.md -- Phase 4 implementation agent prompt
./where-else-prompt.md -- Phase 4.5 "Where Else?" scan agent prompt
./test-gap-writer-prompt.md -- Phase 5 test gap writer prompt (when reviews flag missing coverage)

Supporting techniques (available in this directory):

root-cause-tracing.md -- Trace bugs backward through call stack to find original trigger
defense-in-depth.md -- Add validation at multiple layers after finding root cause
condition-based-waiting.md -- Replace arbitrary timeouts with condition polling

Related skills:

crucible:test-driven-development -- Implementation agent follows TDD for Phase 4
crucible:verify -- Verify fix worked before claiming success
crucible:parallel -- Phase 1 parallel dispatch pattern
crucible:quality-gate -- Adversarial review in Phase 5 (iteration tracking, stagnation detection, compaction recovery)
crucible:red-team -- Invoked indirectly via quality-gate (stagnation detection pattern also used in loop-back)
crucible:test-coverage -- Phase 5 Step 2.5: audit existing tests for staleness, needed updates, or removal after the fix (if available)

Does not dispatch /recon or /assay -- uses specialized investigation agents (Error Analysis, Change Analysis, Evidence Gathering, Reproduction) that are categorically different from structural investigation. Hypothesis evaluation uses quality-gate, not assay. See #147 for rationale.

Required skills:

crucible:cartographer-skill -- Phase 0: load module context for investigators and defect signatures. Phase 4 completion: record discoveries. Phase 4.5 completion: persist defect signature via recorder dispatch.

Recommended skills:

crucible:forge -- Retrospective after fix verified (captures debugging lessons)
crucible:checkpoint -- Shadow git checkpoints before implementation, sibling fixes, and quality gate (pre-debug-fix-cycle-N, pre-where-else, pre-debug-gate). Provides structured rollback for fix attempts and sibling work.

Real-World Impact

From debugging sessions:

Systematic approach: 15-30 minutes to fix
Random fixes approach: 2-3 hours of thrashing
First-time fix rate: 95% vs 40%
New bugs introduced: Near zero vs common

Systematic Debugging

Overview

All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.

Violating the letter of this process is violating the spirit of debugging.

Communication Requirement (Non-Negotiable)

Between every agent dispatch and every agent completion, output a status update to the user. This is NOT optional — the user cannot see agent activity without your narration.

Every status update must include:

Current phase — Which debugging phase you're in
Hypothesis status — Current hypothesis being tested (or "forming hypothesis")
What just completed — What the last agent reported (investigation findings, fix results)
What's being dispatched next — What you're about to do and why
Cycle count — Which hypothesis cycle you're on (cycle 1, cycle 2, etc.)

After compaction: Re-read the session state from the scratch directory (see Session State below) and output current status before continuing.

Pipeline Status

Write Triggers

Status File Format

The status file uses this structure (overwritten in full each time):

# Pipeline Status
**Updated:** <current timestamp>
**Started:** <timestamp from first write — persisted across compaction>
**Skill:** debugging
**Phase:** <current phase, e.g. "Synthesis", "4 — Implementation">
**Health:** <GREEN|YELLOW|RED>
**Suggested Action:** <omit when GREEN; concrete one-sentence action when YELLOW/RED>
**Elapsed:** <computed from Started>

## Recent Events
- [HH:MM] <most recent event>
- [HH:MM] <previous event>
(last 5 events, newest first)

Skill-Specific Body

Append after the shared header:

## Investigation
- Hypothesis: "Missing null check in event handler dispatch chain"
- Cycle: 2 of 4 max
- Phase 4 fix attempts: 1 (WIP commit pending verification)

## Compression State
Goal: [original user request / bug report]
Key Decisions:
- [accumulated decisions, max 10]
Active Constraints:
- [constraints affecting remaining investigation]
Next Steps:
1. [immediate next action]
2. [subsequent actions]

Health State Machine

Health transitions are one-directional within a phase: GREEN -> YELLOW -> RED. Phase boundaries reset to GREEN.

Phase boundaries (reset to GREEN): Phase 0->1, 1->Synthesis, Synthesis->2, 2->3, 3->4, 4->4.5, 4.5->5. Sub-phases (3.5, within Phase 3) do NOT reset.
YELLOW: hypothesis cycle 3+, quality gate round 5+, fix retry in progress
RED: escalation pending, stagnation detected, 3 fix failures reached

Inline CLI Format

Output concise inline status alongside the status file write:

Minor transitions (dispatch, completion): one-liner, e.g. Phase 1 [cycle 2] Investigating: null check hypothesis | GREEN | 34m
Phase changes and escalations: expanded block with --- separators
Health transitions: always expanded with old -> new health

Compaction Recovery

Read the rest of pipeline-status.md to recover Started timestamp and Recent Events buffer
Reconstruct phase, health, and skill-specific body from internal state files (see Session State below)
Emit a Compression State Block into the conversation to seed the new context window 3.5. Read session index summary (supplementary): If the CSB Scratch State contains a Session Index: path, or if globbing ~/.claude/projects/<hash>/memory/session-index/*/summary.md finds a recent file, read summary.md. Include the Activity Timeline, Files Modified, and Key Decisions sections in the post-compaction narration. If no session index exists, skip silently — this step is purely additive. If summary.md lacks detail for a specific event type (e.g., errors, hypothesis changes, file modifications), use /recall to query events.jsonl with filters for targeted recovery.
Write the updated status file
Output inline status to CLI

Session State and Compaction Recovery

The debugging skill writes session state to disk at every phase transition, not just on failure. This ensures compaction recovery works regardless of when it occurs.

Scratch directory: /tmp/crucible-debug-<session-id>/ where <session-id> is a timestamp generated at the start of the debugging session.

Write at each phase transition:

phase-state.md: current phase, cycle count, current hypothesis (if formed)
hypothesis-log.md: running hypothesis log (updated at Phase 3, after Phase 4 results)
synthesis-report.md: latest synthesis report (written after Synthesis completes)
implementation-details.md: cumulative record of implementation attempts — what was tried, which files changed, regressions encountered, why it failed (appended after each Phase 4)
where-else-state.md: Phase 4.5 state — pre-Phase-4.5 SHA, generalized pattern, siblings found/fixed/remaining (written during Phase 4.5, read during compaction recovery)

Checkpoint Timing

Emit a Compression State Block at:

Phase transitions: 0→1, Synthesis→3, 3.5→4, 4.5→5 — emit a Phase Handoff Manifest (see below) instead of a Compression State Block at these major boundaries. Other transitions (1→Synthesis, Synthesis→2, 2→3, 3→3.5, 4→4.5) continue to use CSBs.
Hypothesis cycles: After each hypothesis is formed or invalidated
Fix attempts: After each Phase 4 implementation attempt completes (success or failure)
Escalations: Before any escalation to user
Health transitions: On any GREEN->YELLOW or YELLOW->RED transition

Read phase-state.md to determine current phase and cycle.
Read hypothesis-log.md for hypothesis history.
Read synthesis-report.md for latest investigation findings.
Read implementation-details.md for prior fix attempts.
Read where-else-state.md (if exists) for Phase 4.5 progress — which siblings have been fixed, which remain.
Output status to user and continue from the current phase.

Cleanup: Delete scratch directory and .pipeline-active marker after debugging completes (Phase 5 passes clean or escalation to user).

Phase Handoff Manifest

Format:

# Phase Handoff: N → M
**Timestamp:** ISO-8601
**Goal:** [original bug report, verbatim]

## Inputs for Phase M
- **[Input name]:** [disk path or inline value]

## Decisions Carried Forward
- [DEC-N] [decision]: [reasoning, one line]

## Active Constraints
- [constraint affecting remaining work]

## Shed Receipt
- [what was shed] → [where it lives on disk]

Rules:

After writing the manifest, emit an explicit shed statement.
After writing the manifest, update ## Compression State in pipeline-status.md with manifest contents.
CSBs continue at non-major-boundary checkpoint triggers (1→Synthesis, Synthesis→2, 2→3, 3→3.5, 4→4.5, hypothesis cycles, fix attempts, escalations, health transitions).
Backward compatibility: If no manifest exists at a recovery point, fall back to CSB-based recovery.

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

When to Use

Use for ANY technical issue:

Test failures
Bugs in production
Unexpected behavior
Performance problems
Build failures
Integration issues

Use this ESPECIALLY when:

Under time pressure (emergencies make guessing tempting)
"Just one quick fix" seems obvious
You've already tried multiple fixes
Previous fix didn't work
You don't fully understand the issue

Don't skip when:

Issue seems simple (simple bugs have root causes too)
You're in a hurry (rushing guarantees rework)
Manager wants it fixed NOW (systematic is faster than thrashing)

The Orchestrator-Subagent Debugging Workflow

All investigation and implementation is delegated to subagents via the Agent tool. The orchestrator handles hypothesis formation, dispatch decisions, and escalation -- nothing else.

Subagent Model Selection

Workflow Overview

Bug reported / test failure / unexpected behavior
    |
    v
Orchestrator: Parse initial context (error message, failing test, user description)
    |
    v
Phase 0: Load codebase context (crucible:cartographer-skill)
    |
    v
Phase 1: Dispatch 3-6 parallel investigation subagents
    |  +-- Error Analysis agent (always)
    |  +-- Change Analysis agent (always)
    |  +-- Evidence Gathering agent (conditional -- multi-component systems)
    |  +-- Reproduction agent (conditional -- intermittent/unclear bugs)
    |  +-- Deep Dive agents (conditional -- 1-2 focused on specific subsystems)
    |
    v
Synthesis agent: Consolidate all Phase 1 findings -> concise root-cause analysis
    |
    v
Phase 2: Pattern Analysis agent (skipped if synthesis identified obvious root cause)
    |
    v
Phase 3: Orchestrator forms hypothesis (no subagent -- lightweight decision-making)
    |
    v
Phase 3.5: Hypothesis Red-Team (crucible:quality-gate on hypothesis)
    |  -> Survives? Proceed to Phase 4.
    |  -> Torn apart? Reform hypothesis or loop back to Phase 1.
    |
    v
Phase 4: Implementation agent (TDD: failing test, fix, verify)
    |
    v
Orchestrator: Verify fix -> Success? Phase 4.5. Failed? Cleanup, log, loop back.
    -> 3 failures? Escalate to user. If checkpoints exist: "Checkpoints available from prior fix cycles. Restore to a known-good state before manual investigation?"
    |
    v
Phase 4.5: "Where Else?" scan — find and fix sibling locations
    |
    v
Phase 5: Quality-gate the fix (crucible:quality-gate) + Code review (crucible:temper)
    |
    v
Test Gap Writer (if reviews flagged missing coverage)
    |
    v
Done.

Phase -1: Pipeline-Active Marker Check

Before any dispatch work, check for a crashed prior debugging session:

Check <scratch>/.pipeline-active (where <scratch> is ~/.claude/projects/<hash>/memory/)
Not found: Write the pipeline-active marker (JSON with pipeline_id set to current session ID, skill set to "debugging", phase set to "0", start_time set to current ISO-8601 timestamp, scratch_dir and dispatch_dir paths, branch from git branch --show-current, baseline_sha from git rev-parse HEAD). Proceed to Phase 0.
Found, same pipeline_id: Compaction recovery (existing behavior). Do not re-write the marker.
Found, different pipeline_id: Previous debugging session crashed. Check marker's branch against current branch — if mismatched, warn the user which branch the crashed session was on. Present to user:

"Previous debugging session on branch [marker.branch] crashed. Start fresh? [yes]" Delete the stale marker. Write a fresh marker. Proceed to Phase 0. (Full replay orchestration for debugging is deferred -- detection and cleanup only for now.)

Marker cleanup: Delete .pipeline-active after debugging completes (Phase 5 passes clean or escalation to user), alongside the existing scratch directory cleanup.

Phase 0: Load Codebase Context

Defect signature loading (for investigators):

Glob defect-signatures/*.md (excluding *.non-matches.md) from the cartographer storage directory
For each signature, read its Modules field and match against the investigation area's modules:
- Read each cartographer module file's Path: field
- A file is in a module if the file path starts with the module's Path: value
- When the investigation spans multiple modules, load signatures for all matched modules
- Directory prefix fallback: When no cartographer modules exist, match if any target file path starts with any of the signature's Modules directory prefixes
For matching signatures, validate all file paths still exist on disk — drop stale entries silently
Inject into the [DEFECT_SIGNATURES] section of investigator-prompt.md:
- Generalized pattern (always)
- Confirmed siblings list (always)
- Unresolved siblings list (always)
- Non-match companion file IS loaded for investigators, truncated to 50 entries at load time with note "(list truncated to 50 most recent entries)"
Last loaded update: Loading is pure-read. After all investigator dispatches complete, batch-update the Last loaded field to today on all signatures that were loaded.

Domain Detection

Check the project's CLAUDE.md for a ## Debugging Domains table:

| Signal | Domain | Skills | Context |
|--------|--------|--------|---------|
| file paths contain `/UI/`, `USS`, `VisualElement` | ui | mockup-builder, mock-to-unity, ui-verify | docs/mockups/ |
| error mentions `GridWorld`, `Tile`, `hex` | grid | - | grid system architecture |

When domain is detected:

Auto-load referenced skills' SKILL.md into investigator prompts (see Domain Context section in investigator-prompt.md)
Add a domain-specific investigator to Phase 1
Give Phase 4 implementer domain skill context
Load files from the Context column

When no domain table exists: Proceed normally. Domain detection is opt-in.

When a referenced skill doesn't exist: Log a warning and proceed without domain enrichment. Never fail on missing config.

Phase Handoff: 0 → 1

Before dispatching investigation agents:

Write handoff-0-to-1.md with:
- Goal: bug report / user description, verbatim
- Inputs for Phase 1: cartographer module paths, defect signature paths (with match notes), domain context (if any), error messages / stack traces (verbatim), reproduction steps (if any)
- Decisions Carried Forward: (typically empty at this point)
- Active Constraints: user-stated constraints
- Shed Receipt: raw cartographer exploration output → module files on disk
Emit shed statement: "Phase 0 context shed. Module files, defect signatures, and error context captured in manifest. Raw exploration output is not carried forward."
Update ## Compression State in pipeline-status.md.
Do NOT emit a Compression State Block.

Phase 1: Investigation (Parallel Subagent Dispatch)

Prompt template: ./investigator-prompt.md

Bias toward MORE agents, not fewer. Each investigator is cheap. Missing a root cause is expensive. When in doubt about whether to dispatch an additional agent, dispatch it.

Always dispatch:

Error Analysis Agent -- Read error messages, stack traces, and logs. Identify the exact failure point, error codes, and what the error is telling us. Trace the call chain backward to the originating bad value.
Change Analysis Agent -- Check recent changes via git diff, recent commits, new dependencies, config changes, and environmental differences. Identify what changed that could cause this.

Conditionally dispatch (lean toward dispatching):

Evidence Gathering Agent -- For multi-component systems (CI pipelines, API chains, layered architectures). Add diagnostic instrumentation at component boundaries. Log what enters and exits each component. Run once, report where the data flow breaks.
Reproduction Agent -- For intermittent, timing-dependent, or unclear bugs. Attempt to reproduce consistently. Document exact steps, frequency, and conditions. If not reproducible, gather more data rather than guessing.
Deep Dive Agent(s) -- For bugs touching multiple subsystems, dispatch 1-2 additional agents each focused on a specific subsystem or code path. Give each a narrow scope: "Investigate how [specific subsystem] handles [specific scenario]." These agents read deeply into a single area rather than scanning broadly.
Dependency/Environment Agent -- For bugs that might be caused by version mismatches, missing registrations, configuration drift, or framework behavior changes. Check DI registrations, package versions, framework release notes, and environment state.

Phase 1 Dispatch Heuristics

Context Self-Monitoring (All Phase 1 Agents)

Synthesis: Consolidate Findings

Prompt template: ./synthesis-prompt.md

After all Phase 1 agents report back, dispatch a single Synthesis agent (model: opus) that receives all Phase 1 reports verbatim.

The Synthesis agent produces:

A 200-400 word root-cause analysis
Ranked list of likely causes (most to least probable), each with evidence strength rating
Cross-references between agent findings (where they agree, where they contradict)
Identified unknowns or gaps in evidence
Recommendation: is the root cause obvious, or is pattern analysis needed?

Phase 2: Pattern Analysis (Skippable)

Prompt template: ./pattern-analyst-prompt.md

Dispatch a single Pattern Analysis agent that receives the synthesis report.

The Pattern Analysis agent:

Finds working examples of similar code/patterns in the same codebase
Compares working examples against the broken code exhaustively
Lists every difference, however small -- does not assume "that can't matter"
Identifies dependencies, config, environment, and assumptions
Reports back with specific differences and their likely relevance

When to skip: The orchestrator skips Phase 2 when the synthesis report identifies an obvious root cause with high confidence (all investigation agents agree, clear evidence chain).

Phase Handoff: Synthesis → 3

Before the orchestrator forms a hypothesis (whether Phase 2 ran or was skipped):

Write handoff-synthesis-to-3.md with:
- Goal: original bug report, verbatim
- Inputs for Phase 3: synthesis report path, pattern analysis report path (or "skipped" with reason), hypothesis log path (empty if cycle 1), key file paths surfaced during investigation
- Decisions Carried Forward: skip-phase-2 decision (if applicable), investigator-count decision
- Active Constraints: constraints from investigation findings
- Shed Receipt: raw Phase 1 investigation reports (3-6 agent reports) → synthesis report captures consolidated findings
Emit shed statement: "Investigation context shed. Synthesis report captures consolidated findings. Raw investigator reports are not carried forward."
Update ## Compression State in pipeline-status.md.
Do NOT emit a Compression State Block.

Phase 3: Hypothesis Formation (Orchestrator Only -- No Subagent)

This phase stays local to the orchestrator. No subagent dispatch.

The orchestrator:

Reads the synthesis report (and Phase 2 report if it was dispatched)
Forms a single, specific, testable hypothesis: "I think X is the root cause because Y"
Checks the hypothesis log -- do not repeat a hypothesis that already failed
Logs the hypothesis before dispatching Phase 4

Hypothesis discipline:

Be specific, not vague. "The null reference is caused by X not being initialized before Y calls it" -- not "something with initialization."
One hypothesis at a time. Do not bundle multiple theories.
If you cannot form a hypothesis from the reports, dispatch more investigation -- do not guess.

Hypothesis Log Format

Maintain a running log across cycles:

## Cycle 1
- Hypothesis: "[specific hypothesis]"
- Based on: [which reports informed this]
- Result: [filled in after Phase 4 completes]

## Cycle 2
- Hypothesis: "[specific hypothesis]"
- Based on: [which reports informed this]
- Result: [filled in after Phase 4 completes]

Phase 3.5: Hypothesis Red-Team

Before dispatching the Phase 4 implementer, invoke crucible:quality-gate on the hypothesis with artifact type "hypothesis".

The quality gate challenges:

Does the hypothesis explain ALL symptoms, or just some?
Could the root cause be upstream of what the hypothesis targets?
If this hypothesis is correct, what other symptoms should we expect? Do we see them?
Has this pattern been tried and failed before? (check hypothesis log and cartographer landmines for dead_ends)

If hypothesis survives: Proceed to Phase 4. If hypothesis is torn apart: Reform the hypothesis or dispatch additional investigation (back to Phase 1) without wasting a full TDD cycle.

Phase Handoff: 3.5 → 4

Before dispatching the implementation agent:

Write handoff-3.5-to-4.md with:
- Goal: original bug report, verbatim
- Inputs for Phase 4: validated hypothesis (verbatim), target file paths, conventions path, hypothesis log path, implementation details log path (if prior cycles exist)
- Decisions Carried Forward: hypothesis rationale, red-team survival notes
- Active Constraints: constraints on the fix approach
- Shed Receipt: red-team round details, hypothesis formation reasoning → hypothesis log captures outcomes; gate round details are shed
Emit shed statement: "Hypothesis formation context shed. Validated hypothesis, target files, and hypothesis log on disk. Red-team round details are not carried forward."
Update ## Compression State in pipeline-status.md.
Do NOT emit a Compression State Block.

Phase 4: Implementation (Single Subagent -- TDD)

Prompt template: ./implementer-prompt.md

Dispatch a single Implementation agent that receives:

The hypothesis (verbatim)
Relevant file paths identified during investigation
Project conventions and test standards
The hypothesis log (so it knows what hypotheses were already tried)
The implementation details log from prior cycles (so it knows what code-level approaches were tried, which files were changed, and why they failed — see implementation-details.md in the scratch directory)

The Implementation agent follows strict TDD:

Write a failing test that reproduces the bug per the hypothesis
Run the test -- verify it fails for the expected reason
Implement the minimal fix addressing the root cause
Run the test -- verify it passes
Run the broader test suite -- verify no regressions
Report back with a structured Implementation Report

Implementation discipline:

ONE change at a time. No "while I'm here" improvements.
No bundled refactoring.
Fix the root cause, not the symptom.
Uses crucible:test-driven-development for proper TDD workflow.

Commit Strategy

After Phase 4 completes, if the implementer modified any files, create a WIP commit regardless of outcome:

git commit -m "fix(wip): [hypothesis summary]"        # on success
git commit -m "fix(wip-failed): [hypothesis summary]"  # on failure or regressions

On loop-back (failed fix or user-requested revert), git revert <wip-sha> cleanly undoes all Phase 4 changes including new files.

Phase 4.5 sibling commits: Each sibling fix uses the prefix fix(sibling): with a descriptive message. Example: fix(sibling): add icon initialization to StashScreen.OnEnable

Phase 4.5: "Where Else?" Blast Radius Scan

Prompt template: ./where-else-prompt.md

Three Input Signals

The scan agent receives three sources of information:

The fix diff — git diff <pre-fix-sha>..HEAD. The structural pattern of what changed: what was missing, what was added, and what makes a location "analogous."
Cartographer module context — If cartographer data is available from Phase 0, structurally similar modules (other screens, panels, managers, handlers that follow the same architectural pattern as the fixed code).
Implementer's "Analogous Locations" report — The Phase 4 implementer's observations about locations they noticed during the fix that may have the same pattern. This is the highest-quality signal — the implementer was already reading the code.

Existing Defect Signatures (Fourth Input Signal)

When dispatching the Phase 4.5 scan agent, load matching defect signatures from cartographer as additional context:

Selection: Load at most 3 matching signatures. Treat Last loaded: never as oldest. Sort by Last loaded descending. Tiebreak by Date descending.
Module matching: Same as build/debug load — read each cartographer module file's Path: field; match if any target file path starts with the module's Path: value. Fall back to directory prefix matching if no cartographer modules exist.
What to load: Signature file (generalized pattern, confirmed siblings, unresolved siblings) plus non-match companion file paths only (without full reason text). The scan agent can read individual companion files on demand if it needs the reasoning.
Guidance to scan agent: Prioritize evaluating candidates NOT listed as confirmed non-matches in existing signatures, but may still evaluate non-match locations if context budget allows. If the new scan confirms the same pattern still exists in a previously-cleared location, note this as a "stale non-match" in the report.

Scan Agent Behavior

Analyze the fix pattern. Read the diff. Extract the structural pattern. Produce a 2-3 sentence generalized pattern description.
Build candidate list. Combine all three input signals, deduplicate candidates.
Evaluate each candidate. Read the code at that location. Determine if the same pattern/omission exists. If yes: write a justification explaining WHY this location matches semantically (not just structurally). If no: record as skipped with reason.
Fix confirmed siblings. For each confirmed sibling:
- Apply the same fix pattern
- Run tests
- If tests pass: commit with fix(sibling): <description>
- If tests fail: revert that single commit, record as "reverted — test failure", continue with remaining candidates
- After each fix (or revert), update where-else-state.md in the scratch directory
Report back. Structured report:

## Where Else? Scan Report

### Generalized Pattern
[2-3 sentence pattern description]

### Candidates Evaluated: N

### Siblings Fixed: N
- [file:path] — [commit SHA] — Justification: [why this matches]
- ...

### Siblings Skipped: N
- [file:path] — Reason: [why this doesn't match]
- ...

### Siblings Reverted: N
- [file:path] — Test failure: [summary of what failed]
- ...

Compaction Recovery

Phase 4.5 maintains state in <scratch-dir>/where-else-state.md to survive session compaction. The file is updated after each sibling fix. On compaction recovery, the agent reads this file to:

Skip siblings already fixed
Resume from where it left off with remaining siblings
Recover the pre-Phase-4.5 SHA (needed for revert mechanics)
Recover the generalized pattern (avoids re-deriving from the diff)

Rules

Apply the SAME fix pattern — do not invent new approaches for siblings
One commit per sibling — clean revert granularity
If tests fail for a sibling, revert and continue — do not debug the sibling
Do not modify the original fix
If no candidates are found, report "No analogous locations found" and proceed to Phase 5
No cap on sibling count — fix all confirmed siblings

Defect Signature Persistence

After the Phase 4.5 scan agent reports back, persist the scan results as a cartographer defect signature. This is orchestrator-managed — the recorder only writes files.

Step 1: Dedup check (orchestrator)

Glob ~/.claude/projects/<hash>/memory/cartographer/defect-signatures/*.md (excluding *.non-matches.md)
Read each file's ## Generalized Pattern section only (each is 2-3 sentences; 20 patterns is ~1000-1500 tokens)
Compare semantically with the new scan report's generalized pattern
Dedup rubric: Two patterns are duplicates if they describe the same root cause in the same codebase area. Different areas or different root causes = distinct signatures. When in doubt, treat as distinct — a near-duplicate is recoverable via pruning, a false merge is not.
If a match is found, set update_path to the existing file path (the recorder will merge into it rather than creating new)
If no match, the recorder creates a new file using the content hash slug

Step 2: Pre-write pruning (orchestrator) When count of existing signatures would exceed 20 after writing:

Build the pruning-eligible set: exclude signatures where Last loaded: never AND Date is less than 30 days old
Sort by Last loaded date ascending; Last loaded: never sorts oldest
Among ties, prune the oldest by Date field
Write-before-delete ordering: dispatch the recorder first, then delete pruned files after the recorder succeeds

Phase 4.5 scan report (generalized pattern, confirmed siblings, reverted siblings, skipped siblings)
Original fix metadata (file path, commit SHA, commit message summary, issue number)
Cartographer module names from Phase 0 (or directory prefix fallbacks)
update_path if dedup found a match (from Step 1)

Step 4: Post-recorder validation (orchestrator) After the recorder returns, validate:

The signature file exists on disk
Sibling entries are within the 30-entry cap
The Modules field contains valid cartographer module names or directory prefixes If validation fails, log the failure — the signature is not surfaced to consumers.

Over-count recovery: If count exceeds 20 after a failed prune (e.g., all signatures are age-protected), the next invocation's pre-recorder pruning pass cleans up before writing.

Phase Handoff: 4.5 → 5

Before invoking the quality gate on the fix:

Write handoff-4.5-to-5.md with:
- Goal: original bug report, verbatim
- Inputs for Phase 5: full diff (git diff <pre-fix-sha>..HEAD), conventions path, test file paths, Where Else report path (or "no siblings found"), defect signature path (if written)
- Decisions Carried Forward: root cause, fix approach, sibling fix decisions
- Active Constraints: constraints on review scope
- Shed Receipt: Phase 4 TDD cycle details, Phase 4.5 candidate evaluation reasoning → fix is in the diff, sibling results in the report
Emit shed statement: "Implementation context shed. Fix diff, test files, and sibling report on disk. TDD cycle details and candidate evaluation reasoning are not carried forward."
Update ## Compression State in pipeline-status.md.
Do NOT emit a Compression State Block.

Phase 5: Red-Team and Code Review (Post-Fix Quality Gate)

After Phase 4.5 completes (or Phase 4 succeeds if no Phase 4.5 ran), the orchestrator runs quality gates before declaring done:

Edge cases the fix doesn't handle
New failure modes introduced by the fix
Assumptions that could break under different conditions
Regression risks not covered by the test

If code review finds Critical or Important issues, fix them and re-review per the standard code review loop.

Tests to update — assertions, descriptions, or setup now wrong or misleading given the fix (includes stale assertions expecting old values)
Tests to delete — tests for removed code paths
Coincidence tests — tests whose setup exercises changed code but assertions verify unrelated properties (flagged for judgment, not auto-fixed)

If crucible:test-coverage is not available, skip this step. The test gap writer (Step 3) handles missing coverage but NOT stale/misleading existing tests — this step fills that gap.

The test-coverage skill handles its own fix dispatch and revert-on-failure logic internally. It returns a structured report with actions taken.

If all tests PASS: Debugging workflow is complete.

If some tests FAIL (gaps reveal incomplete fix coverage):

Dispatch a fresh implementer (Opus) with the failing test(s), their failure messages, gap descriptions, and the original bug context (hypothesis, root cause)
Implementer fixes the incomplete coverage, re-runs ALL test gap writer tests (not just failures — catches regressions from the fix)
If all tests pass after fix: commit (fix: address test gap failures for debugging fix), debugging workflow is complete
If tests still fail after one fix attempt: escalate to user with:
- The original bug and confirmed root cause
- What was fixed in Phase 4
- Which test gaps were detected by reviewers
- What the retry implementer attempted
- Which tests still fail and their current failure messages

If Phase 5 quality-gate escalates (stagnation or round limit): Present the quality-gate findings to the user alongside the fix. The user decides:

(a) Accept the fix with known issues noted
(b) Revert and loop back — revert the WIP commit (or range revert <pre-4.5-sha>..HEAD if Phase 4.5 ran) and loop back to Phase 1 with the quality-gate findings as new investigation context
(c) Stop debugging — end the session with the current state

This is user-gated, not automatic. The orchestrator does not decide whether to loop back from Phase 5 on its own.

Only after all gates pass clean (and any test gaps are filled or escalated) is the debugging workflow complete.

Session Metrics

Throughout the debugging session, the orchestrator appends timestamped entries to /tmp/crucible-metrics-<session-id>.log.

Dispatch measurement protocol: On every subagent dispatch, the orchestrator follows the enriched manifest protocol from shared/dispatch-convention.md:

Before dispatching: Measure the dispatch file size in characters. Record input_chars and model_tier in the manifest entry.
After dispatch returns: Measure the subagent response length in characters. Record output_chars and tool_calls (if available) in the manifest completion entry.

At completion, read the metrics log and manifest, then compute and report:

-- Debugging Complete ---------------------------------------
  Subagents dispatched:  12 (8 Opus, 4 Sonnet)
  Active work time:      1h 15m
  Wall clock time:       3h 42m
  Hypothesis cycles:     3
  Quality gate rounds:   2 (hypothesis: 1, fix: 1)
  Est. input tokens:    ~15,200 (60,800 chars)
  Est. output tokens:   ~9,800 (39,200 chars)
  Token estimate note:  Based on dispatch file sizes (chars/4). Actual consumption may vary +/-30%.
-------------------------------------------------------------

Additional debugging metric: hypothesis cycles (number of hypothesis → investigate → implement cycles before resolution).

Pipeline Decision Journal

Maintain a decision journal at /tmp/crucible-decisions-<session-id>.log:

[timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>

Decision types:

investigator-count — why N investigators dispatched
skip-phase-2 — why Phase 2 was skipped (or not)
gate-round — hypothesis red-team results per round
escalation — why orchestrator escalated
hypothesis-reform — why hypothesis was reformed after red-team

Loop-back, Cleanup, and Escalation

After the Implementation agent reports back, the orchestrator evaluates four possible outcomes:

RECOMMENDED: Use crucible:forge (retrospective mode) — capture the debugging journey and lessons learned
Chronicle signal fallback: If forge retrospective will not run (user declined, session ending), append a minimal chronicle signal directly:
- Construct signal: v=1, ts=now, skill="debugging", outcome from fix verification, duration_m from session timing, branch from git, files_touched from git diff --name-only, metrics={hypotheses count, root_cause_category from fix, where_else_hits count}
- Append as a single JSON line to ~/.claude/projects/<hash>/memory/chronicle/signals.jsonl
- If forge retrospective WILL run, skip this step (forge Step 8.5 handles it)
RECOMMENDED: Use crucible:cartographer-skill (record mode) — persist any new codebase knowledge discovered during investigation

Test passes immediately (no fix applied) -- The implementer's reproduction test passed before any fix was written. Two possibilities:

Bug was already resolved (by investigation side effects, environment change, or prior cycle). Verify by running the original reproduction steps. If the original bug is gone: proceed to Phase 5 but skip Step 1 (quality-gate on code) since there is no code change. Go directly to Step 2 (code review) scoped to the reproduction test file only.
Test doesn't reproduce the bug (hypothesis was wrong about the reproduction). Log the hypothesis as "wrong — test did not reproduce" in the hypothesis log. Revert the WIP commit (git revert <wip-sha>) to remove the non-reproducing test. Loop back to Phase 3 to reform the hypothesis, or Phase 1 if the root cause itself is in question.

Fix does not resolve the issue -- Before looping back:

Log the failure in the hypothesis log with metrics (see Stagnation Detection below)
Test triage: Dispatch a quick Opus subagent to read the test and the hypothesis log, then decide: keep the test (if it validly reproduces the bug regardless of the failed fix) or remove it (if it was hypothesis-specific and doesn't reproduce the actual bug). The orchestrator does not make this judgment directly — it requires reading code.
Revert the WIP commit using git revert <wip-sha> (see Commit Strategy above). This cleanly undoes all Phase 4 changes including any new files created during refactoring. If Phase 4.5 ran (sibling commits exist): use git revert <pre-4.5-sha>..HEAD instead of git revert <wip-sha> to revert all sibling commits plus the original WIP commit.
- If triage decided "keep the test": dispatch a subagent to recover the test file from the reverted commit (git checkout <wip-sha> -- <test-file-path>) and commit it separately (test: preserve reproduction test from cycle N).
- If triage decided "remove the test": no further action — the revert already removed it.
Verify the working tree is clean: dispatch the cleanup agent to run git status and report any remaining modifications or untracked files. If any remain, clean them up before proceeding.
Loop back to Phase 1 with the new information from the failed attempt. On loop-back, dispatch MORE agents than the prior cycle, not fewer — widen the investigation.

Stagnation Detection (from red-team pattern)

Track a stagnation metric across cycles — the hypothesis specificity score:

[debugging] Stagnation candidate — Cycle N vs N+1:
  - Cycle N hypotheses: <list>
  - Cycle N+1 hypotheses: <list>
  - New evidence in N+1: <delta or "none material in my judgment">
Escalate to user, or dispatch Cycle N+2 with a different angle?

Escalation Tiers

Stagnation overrides cycle count: If stagnation is detected at any cycle (even Cycle 2), escalate immediately rather than waiting for Cycle 4.

Pattern indicating architectural problem (Cycle 4 or stagnation escalation):

Each fix reveals new shared state, coupling, or problems in different places
Fixes require massive refactoring to implement
Each fix creates new symptoms elsewhere
Investigation keeps finding the same root causes but fixes don't resolve them

This is NOT a failed hypothesis -- this is a wrong architecture. Discuss with your human partner before attempting more fixes.

Quick Reference

Quality Gate

This skill produces hypotheses (Phase 3.5) and fixes (Phase 5).

Red Flags -- STOP and Follow Process

If you catch yourself thinking:

Orchestrator discipline violations:

"Let me just read this one file quickly"
"I'll fix this inline instead of dispatching"
"I already know what's wrong, I'll skip investigation"
"Let me just run the tests myself to check"
"I'll look at the code to confirm before dispatching"

Communication violations:

"Dispatching agents without narrating what you're doing and why"

Classic debugging traps (still apply):

"Quick fix for now, investigate later"
"Just try changing X and see if it works"
"Skip the test, I'll manually verify"
"It's probably X, let me fix that"
"I don't fully understand but this might work"
"Here are the main problems: [lists fixes without investigation]"
Proposing solutions before dispatching Phase 1
Forming hypotheses before receiving synthesis report
"One more fix attempt" (when already at Cycle 3+)
Each fix reveals new problem in different place

Quality gate violations:

"This fix is too small to need a quality gate"
"It's just a one-liner, the gate won't find anything"
Skipping Phase 3.5 or Phase 5 quality gate without explicit user approval
Declaring a quality gate "done" after fixing findings without a clean verification round (fixing is not passing)
Extrapolating from the "no code change" skip to justify skipping on small changes
Interpreting general user feedback as approval to skip a quality gate that has not yet run

Compression State violations:

Skipping Compression State Block emission at checkpoint boundaries
Emitting a Compression State Block at a major phase boundary (0→1, Synthesis→3, 3.5→4, 4.5→5) instead of writing a handoff manifest
Skipping the shed statement after a manifest write
Emitting a Compression State Block with stale or missing Key Decisions (decisions must be cumulative across all prior blocks)
Allowing the Goal field to drift across successive Compression State Blocks (must match original user request)
Exceeding 10 entries in the Key Decisions list without overflow-compressing the oldest
Treating session index summary as authoritative over CSB state (session index is supplementary narrative, CSB is authoritative state)

ALL of these mean: STOP. Return to the correct phase.

If 3+ cycles failed: Escalate to user. Question the architecture. Do not dispatch Cycle 4 agents.

Your Human Partner's Signals You're Doing It Wrong

Watch for these redirections:

"Is that not happening?" - You assumed without dispatching verification
"Will it show us...?" - You should have dispatched evidence gathering
"Stop guessing" - You're proposing fixes without investigation reports
"Ultrathink this" - Question fundamentals, not just symptoms
"We're stuck?" (frustrated) - Your dispatched approach isn't working

When you see these: STOP. Return to Phase 1. Dispatch fresh investigation.

Common Rationalizations

User-Provided Diagnosis Traps

When Process Reveals "No Root Cause"

If systematic investigation reveals issue is truly environmental, timing-dependent, or external:

You've completed the process
Document what you investigated (the hypothesis log serves as this record)
Dispatch an implementation agent to add appropriate handling (retry, timeout, error message)
Add monitoring/logging for future investigation

But: 95% of "no root cause" cases are incomplete investigation. Dispatch more agents before concluding this.

Supporting Techniques and Prompt Templates

Prompt templates (used when dispatching subagents):

./investigator-prompt.md -- Phase 1 investigation agent prompt
./synthesis-prompt.md -- Synthesis agent prompt
./pattern-analyst-prompt.md -- Phase 2 pattern analysis agent prompt
./implementer-prompt.md -- Phase 4 implementation agent prompt
./where-else-prompt.md -- Phase 4.5 "Where Else?" scan agent prompt
./test-gap-writer-prompt.md -- Phase 5 test gap writer prompt (when reviews flag missing coverage)

Supporting techniques (available in this directory):

root-cause-tracing.md -- Trace bugs backward through call stack to find original trigger
defense-in-depth.md -- Add validation at multiple layers after finding root cause
condition-based-waiting.md -- Replace arbitrary timeouts with condition polling

Related skills:

crucible:test-driven-development -- Implementation agent follows TDD for Phase 4
crucible:verify -- Verify fix worked before claiming success
crucible:parallel -- Phase 1 parallel dispatch pattern
crucible:quality-gate -- Adversarial review in Phase 5 (iteration tracking, stagnation detection, compaction recovery)
crucible:red-team -- Invoked indirectly via quality-gate (stagnation detection pattern also used in loop-back)
crucible:test-coverage -- Phase 5 Step 2.5: audit existing tests for staleness, needed updates, or removal after the fix (if available)

Required skills:

crucible:cartographer-skill -- Phase 0: load module context for investigators and defect signatures. Phase 4 completion: record discoveries. Phase 4.5 completion: persist defect signature via recorder dispatch.

Recommended skills:

crucible:forge -- Retrospective after fix verified (captures debugging lessons)
crucible:checkpoint -- Shadow git checkpoints before implementation, sibling fixes, and quality gate (pre-debug-fix-cycle-N, pre-where-else, pre-debug-gate). Provides structured rollback for fix attempts and sibling work.

Real-World Impact

From debugging sessions:

Systematic approach: 15-30 minutes to fix
Random fixes approach: 2-3 hours of thrashing
First-time fix rate: 95% vs 40%
New bugs introduced: Near zero vs common

Adoption

raddue/debugging

$ install --global

Security Scan Results

SKILL.md

Systematic Debugging

Overview

Communication Requirement (Non-Negotiable)

Pipeline Status

Write Triggers

Status File Format

Skill-Specific Body

Health State Machine

Inline CLI Format

Compaction Recovery

Session State and Compaction Recovery

Checkpoint Timing

Phase Handoff Manifest

The Iron Law

When to Use

The Orchestrator-Subagent Debugging Workflow

Subagent Model Selection

Workflow Overview

Phase -1: Pipeline-Active Marker Check

Phase 0: Load Codebase Context

Domain Detection

Phase Handoff: 0 → 1

Phase 1: Investigation (Parallel Subagent Dispatch)

Phase 1 Dispatch Heuristics

Context Self-Monitoring (All Phase 1 Agents)

Synthesis: Consolidate Findings

Phase 2: Pattern Analysis (Skippable)

Phase Handoff: Synthesis → 3

Phase 3: Hypothesis Formation (Orchestrator Only -- No Subagent)

Hypothesis Log Format

Phase 3.5: Hypothesis Red-Team

Phase Handoff: 3.5 → 4

Phase 4: Implementation (Single Subagent -- TDD)

Commit Strategy

Phase 4.5: "Where Else?" Blast Radius Scan

Three Input Signals

Existing Defect Signatures (Fourth Input Signal)

Scan Agent Behavior

Compaction Recovery

Rules

Defect Signature Persistence

Phase Handoff: 4.5 → 5

Phase 5: Red-Team and Code Review (Post-Fix Quality Gate)

Session Metrics

Pipeline Decision Journal

Loop-back, Cleanup, and Escalation

Stagnation Detection (from red-team pattern)

Escalation Tiers

Quick Reference

Quality Gate

Red Flags -- STOP and Follow Process

Your Human Partner's Signals You're Doing It Wrong

Common Rationalizations

User-Provided Diagnosis Traps

When Process Reveals "No Root Cause"

Supporting Techniques and Prompt Templates

Real-World Impact

Related Skills

raddue/delve

raddue/ledger

raddue/grudge

raddue/calibration-reconcile

raddue/debugging

$ install --global

Security Scan Results

SKILL.md

Systematic Debugging

Overview

Communication Requirement (Non-Negotiable)

Pipeline Status

Write Triggers

Status File Format

Skill-Specific Body

Health State Machine

Inline CLI Format