Build

Overview

All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.

All subagent returns use the Ledger Return Protocol. Every subagent returns exactly one Evidence Receipt per shared/return-convention.md; the orchestrator applies the two-tier receipt linter (Tier 1 structural + Tier 2 witness verification — full grammar in the shared convention) to every Task return before acting on the declared VERDICT. A lint failure is treated as structurally BLOCKED.

The orchestrator maintains a per-run Invariant Cairn at ~/.claude/projects/<project-hash>/memory/cairn/cairn-<run-id>.md per shared/cairn-convention.md. See the ## Cairn (Layer 3) section below for build-specific phase definitions, terminal condition, and mandatory-invariant categories.

End-to-end development pipeline: interactive design, autonomous planning with adversarial review, team-based execution with per-task code and test review. One command, idea to completion.

Announce at start: "I'm using the build skill to run the full development pipeline."

Session index event: At startup, if session indexing is active (session index path discoverable via glob), emit a skill_start event to the outbox: {"ts":"<now>","seq":0,"type":"skill_start","summary":"Starting /build for <user goal>","detail":{"skill":"build","goal":"<user goal>"}}. See skills/shared/session-index-convention.md for the outbox pattern.

Guiding principle: Quality over velocity. This pipeline produces correct, well-integrated, maintainable output — even if slower. Parallel execution is available for independent work, but sequential with quality gates is the default.

Mock Dispatch Mode (eval-gate)

This mode exists for the skills/build/evals/ eval-gate harness. It is enabled iff CRUCIBLE_BUILD_EVAL_MOCK_DIR is set in the environment. Production runs MUST leave this variable unset, in which case this mode is a no-op and the orchestrator behaves exactly as if this section were not present.

Env-var contract. Three variables, all consumed only when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set:

CRUCIBLE_BUILD_EVAL_MOCK_DIR=<path> — directory of canned subagent return receipts. Filenames follow <seq>-<template-name>.md with fallback <template-name>.md (e.g. 1-plan-writer.md, then plan-writer.md). Missing mock → halt immediately with a clear error; no silent fallthrough.
CRUCIBLE_BUILD_EVAL_MODE=feature|refactor — pre-set answer to the Mode Detection prompt. When present, the orchestrator skips the AskUserQuestion call in Mode Detection and uses this value.
CRUCIBLE_BUILD_EVAL_USER_INPUT_DIR=<path> — directory of canned user-input turns, named turn-<N>.md. Each AskUserQuestion call (other than Mode Detection, which uses CRUCIBLE_BUILD_EVAL_MODE) consumes the next sequential turn. If the next turn-file is missing, halt before proceeding — this is the b4 fixture's design: build correctly stops when it needs input it does not have.

Substitution rule (defined ONCE here; referenced from intercept sites below): at every dispatch site, the dispatch file is STILL written to the normal dispatch dir (trace integrity preserved). Only the Task/Agent tool invocation is replaced — instead of invoking the tool, read $CRUCIBLE_BUILD_EVAL_MOCK_DIR/<seq>-<template-name>.md (or <template-name>.md) and treat its contents as the subagent's return receipt. Apply the normal receipt linter and manifest sweep as if the receipt had come from a live agent.

Boundary behavior. MockNotFound / MockUserInputMissing errors raised by the mock loader halt the build run with a clear stderr message. They do not silently fall through. The eval-gate harness detects these halts via on-disk artifacts (absent phase-handoff manifests, pipeline-active marker still at the original phase) — the harness does NOT catch these exceptions across the build-runtime boundary.
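
The filename lookup and halt-on-missing behavior described above can be sketched as follows. This is a minimal illustration, not the harness's actual loader: `resolve_mock` and its parameters are hypothetical names, and only the `MockNotFound` name and the `<seq>-<template-name>.md` / `<template-name>.md` order come from this section.

```python
from pathlib import Path


class MockNotFound(Exception):
    """No canned receipt exists for a dispatch (error name taken from this section)."""


def resolve_mock(mock_dir: Path, seq: int, template_name: str) -> Path:
    """Return the canned receipt path, preferring the sequenced filename.

    Hypothetical helper: the real loader's signature is not given in this document.
    """
    candidates = [
        mock_dir / f"{seq}-{template_name}.md",  # e.g. 1-plan-writer.md
        mock_dir / f"{template_name}.md",        # fallback: plan-writer.md
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # No silent fallthrough: the caller is expected to halt the run here.
    tried = ", ".join(c.name for c in candidates)
    raise MockNotFound(f"no mock receipt for dispatch {seq} ({template_name}); tried: {tried}")
```

Raising instead of returning `None` keeps the "halt immediately" contract visible at the call site, since a caller cannot accidentally continue with an empty receipt.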

Pointer reminders. Sections that reference these env vars (Mode Detection, Phase 1 Step 2 / 2.5 / 3, Phase 2 Step 1 / 2, Phase 3 Step 3) contain short inline pointers back to this section. The substitution rule is defined here only; the pointers exist so a reader scanning a dispatch site doesn't need to re-derive the contract.

See also: skills/build/evals/README.md for harness usage; skills/shared/dispatch-convention.md for the dispatch-file protocol the substitution rule preserves.

Cairn (Layer 3)

The orchestrator maintains an Invariant Cairn per shared/cairn-convention.md. Build-specific bindings:

Phase mapping. The build pipeline's four phases (1 Design, 2 Plan, 3 Execute, 4 Completion) map 1:1 to cairn phases. Mid-phase sub-stages (e.g. Phase 3 Wave N, Phase 4 gate rounds) do NOT get their own cairn phase counter — they are internal to the owning phase and contribute a single LEDGER line when the owning phase completes.
Phase transitions. At every 1→2, 2→3, 3→4 transition, the orchestrator: (a) writes any correctness-critical phase-N invariants, (b) appends the phase-N LEDGER line with dispatches=/receipts=/verdict=, and (c) performs a single atomic Write advancing PHASE to phase N+1. The phase handoff manifest (handoff-N-to-M.md) serves as input evidence for the invariants.
Terminal phase. The cairn is sealed in Phase 4, after finish-skill completes and the pipeline-active marker is deleted. At terminal sealing, delete active-run.md; leave cairn-<run-id>.md in place.
Mandatory-invariant categories for build. Each phase-exit MUST capture:
- Design exit: the one-sentence architectural commitment, plus any RED-flag constraint surfaced by red-team that later phases must preserve.
- Plan exit: the task list's load-bearing dependencies (e.g. "Task 3 unblocks Tasks 5-7; T2 review tier"); any non-obvious refactoring risk.
- Execute exit: every noticed-not-touching entry that is correctness-critical for a future task or for post-merge review; every test-gap finding that the run chose to leave uncovered.
- Completion exit: acceptance-test outcome; siege dispatch decision + outcome; any skipped gate with Acknowledged: true.
Reconciliation on phase entry. On entering each phase, the orchestrator runs the full Reconciliation Pass (5 rules) against receipt-ledger.jsonl and the in-context Tripwire Manifest. Rule 1 local-repair is authorized for trailing-receipts-in-current-phase LEDGER under-count only.
Composition with Phase Handoff Manifest. The cairn and the existing Phase Handoff Manifest overlap in intent but not in scope: the handoff manifest is a per-transition snapshot of inputs for the next phase; the cairn is the cumulative load-bearing state across the whole run. Both are maintained; neither replaces the other. On Recovery Protocol invocation, the orchestrator reads the cairn first (authoritative for load-bearing state) then the most recent handoff manifest (authoritative for current-phase inputs).

Tripwire Manifest Sweep (Layer 2)

Starting with convention v1.1, every subagent returns a receipt carrying TRIPWIRE:, SUPERSEDES:, and (if the subagent dispatched children) TRIPWIRE-CHILD: lines. The full grammar and predicate vocabulary live in shared/return-convention.md. This section defines how the orchestrator uses them.

Manifest: After each Task return (post-lint), append one line to the in-context manifest:

<rcpt-sha256-prefix-12>  <skill>/<dispatch-id>  <verdict>  TRIPWIRE: <predicates>  [SUPERSEDED_BY=<prefix>]  [keys=<skill>:<k>:<v>,…]  [files=<path>:<h6>,…]

Extract keys= and files= discriminators at insertion time (severity-max, *-count CLAIM keys namespaced by skill; EDIT/WROTE paths with first 6 hex of post-edit hash). Truncate each list at 8; overflow becomes more=<N> and forces mandatory fire on peer-dispatch-disagrees.
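
As a rough sketch of the truncation rule above, the helper below renders a `keys=` or `files=` field and caps it at 8 entries. Two points are assumptions rather than part of the convention: that N in `more=<N>` counts the dropped entries, and that the marker is emitted as a trailing list element. The function names are hypothetical.

```python
def format_discriminators(label: str, items: list[str], limit: int = 8) -> str:
    """Render a keys=/files= field, truncating at `limit` entries.

    Assumption: N in more=<N> is the number of entries that were dropped,
    and the marker is appended as the last comma-separated element.
    """
    kept = items[:limit]
    field = f"{label}=" + ",".join(kept)
    overflow = len(items) - len(kept)
    if overflow > 0:
        field += f",more={overflow}"
    return field


def has_overflow(field: str) -> bool:
    """True when the field carries a more=<N> marker (forces mandatory fire)."""
    return any(part.startswith("more=") for part in field.split(","))
```

A caller evaluating peer-dispatch-disagrees could check `has_overflow` first and fire unconditionally when it returns true, before comparing individual discriminators.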

Sweep (the dispatch-loop clause): The orchestrator MAY NOT dispatch the next subagent until it has:

Applied Layer 1 two-tier linter to the just-returned receipt. Lint failure → re-dispatch, DO NOT sweep.
Appended the manifest entry.
Processed SUPERSEDES: — marked each cited predecessor SUPERSEDED_BY=<new-prefix>.
Evaluated self-checks (verdict=FAIL, exec-exit!=0, suspicion>=N-self) on the new receipt — no Read needed.
Evaluated forward-checks against every active (not SUPERSEDED_BY=*) prior manifest entry, over the union of that entry's TRIPWIRE and TRIPWIRE-CHILD predicate sets:
- claims-touch(glob) / wrote(glob) / read(glob) — path-glob match against the new receipt's TRACE or CLAIMS citations.
- suspicion>=N — new receipt's SUSPICION ≥ N.
- peer-dispatch-disagrees(<dim>) — same-skill, same-target, discriminator mismatch (evaluated via manifest keys=/files=; more= overflow → mandatory fire).
- always — fires unconditionally.
For each firing predicate on manifest entry M, Read M's full receipt from disk and narrate the re-read: "tripwire <predicate> on <M-prefix> fired from <new-prefix>; re-read M."
Only then dispatch the next subagent.
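
The forward-check step of the sweep can be sketched as below. This is an illustration under stated assumptions, not the convention's grammar: the manifest is modelled as a list of dicts with hypothetical keys (`prefix`, `tripwire`, `tripwire_child`, `superseded_by`), glob matching uses Python's `fnmatchcase` (whose `*` also matches `/`), and peer-dispatch-disagrees is left out because it needs the keys=/files= comparison.

```python
import re
from fnmatch import fnmatchcase

GLOB_PREDICATE = re.compile(r"^(claims-touch|wrote|read)\((.+)\)$")
SUSPICION_PREDICATE = re.compile(r"^suspicion>=(\d+)$")


def predicate_fires(predicate: str, cited_paths: list[str], suspicion: int) -> bool:
    """Evaluate one forward-check predicate against a newly returned receipt.

    cited_paths: paths cited in the new receipt's TRACE or CLAIMS lines.
    suspicion:   the new receipt's SUSPICION value.
    """
    if predicate == "always":
        return True
    match = SUSPICION_PREDICATE.match(predicate)
    if match:
        return suspicion >= int(match.group(1))
    match = GLOB_PREDICATE.match(predicate)
    if match:
        glob = match.group(2)
        return any(fnmatchcase(path, glob) for path in cited_paths)
    raise ValueError(f"unsupported predicate in this sketch: {predicate}")


def entries_to_reread(manifest: list[dict], cited_paths: list[str], suspicion: int) -> list[tuple[str, str]]:
    """Return (prefix, predicate) pairs for active entries whose tripwires fire."""
    fired = []
    for entry in manifest:
        if entry.get("superseded_by"):
            continue  # superseded entries are no longer active
        # Union of the entry's TRIPWIRE and TRIPWIRE-CHILD predicates.
        for predicate in entry["tripwire"] + entry.get("tripwire_child", []):
            if predicate_fires(predicate, cited_paths, suspicion):
                fired.append((entry["prefix"], predicate))
    return fired
```

Each returned pair corresponds to one re-read and one narration line of the form given in the sweep steps.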

Supersession fix-flow. A fix-agent dispatched after a FAIL receipt normally supersedes that FAIL. Its receipt MUST cite the FAIL's hash-prefix in SUPERSEDES: and in at least one CLAIM from=<prefix>#…, AND its WITNESS must be kind ∈ {exec, grep} with ran=TRACE#N (not SKIPPED/UNRUNNABLE). Tier-2 then verifies the witness — supersession only survives if the original failure no longer reproduces.

Mandatory-work declarations for build's subagent types (add to each dispatch template's ## Return Format section):

Implementer (feature): run-tests, apply-edits.
Implementer (refactor, atomic): run-blast-radius-tests, apply-edits.
Reviewer (code / test): read-artifact, emit-findings.
Cleanup agent: read-diff, emit-recommendation.
Plan writer / plan reviewer: read-design, emit-artifact.
Acceptance-test writer / test-gap writer / adversarial tester: run-tests, emit-tests.

Communication Requirement (Non-Negotiable)

Before every agent dispatch and after every agent completion, output a status update to the user. This is NOT optional: the user cannot see agent activity without your narration.

Every status update must include:

Current phase — Which pipeline phase you're in
What just completed — What the last agent reported
What's being dispatched next — What you're about to do and why
Task checklist — Current status of all tasks (pending/in-progress/complete)

After compaction: If you just experienced context compaction, re-read the task list from disk and output current status before continuing. Do NOT proceed silently.

Examples of GOOD narration:

"Phase 3, Task 4 complete. Reviewer found 2 Important issues — dispatching implementer to fix. Tasks: [1] ✓ [2] ✓ [3] ✓ [4] fixing [5-8] pending"

"Phase 2 complete. Plan passed review with 0 issues on round 2. Dispatching innovate on the plan."

This requirement exists because: Long-running autonomous pipelines can run for hours. Without narration, the user sees nothing but a spinner. They can't assess progress, can't decide whether to intervene, and can't learn from the pipeline's decisions.

Pipeline Discipline (Non-Negotiable)

NEVER skip quality gate steps. Every artifact must pass its quality gate before proceeding to the next phase. No exceptions, no shortcuts.

BLOCK semantics: Phase transitions are gated. You CANNOT proceed from Phase 1→2, 2→3, or 3→4 without the gate for that phase passing. If a gate fails, fix the issues and re-run the gate. Do not silently skip a gate because "it looks fine" or "we already reviewed it."

If you find yourself about to skip a gate: STOP. Re-read this section. The gate exists because skipping it has caused real production incidents and hours of wasted time. Run the gate.

Anti-Rationalization Table — build

| Rationalization | Rebuttal | Rule |
|---|---|---|
| "This task is small/simple/trivial, the quality gate would just find nits." | Small changes have the same bug density per line as large ones. QG has never run on a Crucible artifact without finding at least one real issue. | Run the quality gate on every phase artifact, regardless of size. |
| "Phase N looks fine, I can skip the gate and move on." | Self-assessment of artifact quality is exactly the bias the gate exists to counter. "Looks fine" is the failure mode, not a pass criterion. | Phase transitions are BLOCKED without a verified PASS verdict marker for the prior phase. |
| "The fix agent addressed the findings, so the gate is done." | Fixing is not passing. Fix rounds routinely introduce new issues or incompletely resolve old ones. A clean verification round is required. | The gate is only complete after a fresh red-team round returns 0 Fatal, 0 Significant. |
| "The user said 'looks good' / 'move on'; that's approval to skip the gate." | General feedback is not skip approval. Only an unambiguous instruction that explicitly references the gate counts. | Require literal SKIP GATE (or equivalent explicit phrase) before recording Status: SKIPPED. |
| "I can fix this one finding myself instead of dispatching a fix agent." | Orchestrator-applied fixes conflate coordination with remediation and bypass the fix journal. Every fix (even trivial) goes through a fix agent. | Orchestrator never edits the artifact directly; always dispatch the fix agent. |
| "Innovate/red-team seem redundant on top of the quality gate, I'll skip them." | They are not redundant. Innovate is divergent; red-team is adversarial; QG is iterative remediation. Skipping any one of them is a documented regression (feedback_never_skip_gates). | Run innovate and red-team on every artifact, every time. |
| "I'll just finish the task list and narrate at the end." | Long-running autonomous pipelines are invisible without narration. Silent runs prevent the user from intervening or learning. | Narrate before every dispatch and after every completion; this is non-negotiable. |

Gate Ledger Protocol

Tamper-evident audit trail for phase transitions and gate verdicts. This is defense-in-depth — it raises the cost of gate-skipping from zero to nonzero by requiring structured state to be maintained and verified. An external enforcement hook (gate-ledger-guard.sh) provides mechanical enforcement by blocking unauthorized PASS writes.

File location: ~/.claude/projects/<project-hash>/memory/build-gate-ledger.md

Relationship to pipeline-status.md: pipeline-status.md is ambient user awareness (overwritten at every narration point). build-gate-ledger.md is the gate verdict audit trail (updated per phase as gates pass). Both are needed; neither replaces the other.

PipelineID Generation

At pipeline start, generate a PipelineID via date -u +build-%Y%m%d-%H%M%S. This ID:

Is persisted in the ledger header
Is passed to quality-gate invocations as pipeline_id
Is used by the enforcement hook to cross-check verdict markers
Is unique per build run (timestamp-based)
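
For reference, the same ID shape can be produced and validated outside the shell. The sketch below mirrors the `date -u +build-%Y%m%d-%H%M%S` format in Python; `generate_pipeline_id` and `PIPELINE_ID_PATTERN` are hypothetical names used only for illustration.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Shape of an ID produced by `date -u +build-%Y%m%d-%H%M%S`.
PIPELINE_ID_PATTERN = re.compile(r"^build-\d{8}-\d{6}$")


def generate_pipeline_id(now: Optional[datetime] = None) -> str:
    """Format a UTC timestamp as build-YYYYMMDD-HHMMSS (current time by default)."""
    moment = now if now is not None else datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).strftime("build-%Y%m%d-%H%M%S")
```

Because the ID has one-second resolution, uniqueness holds only as long as two build runs do not start within the same second.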

Ledger Format

# Build Gate Ledger
Run: <ISO-8601 timestamp>
PipelineID: <build-YYYYMMDD-HHMMSS>
Goal: <user request>
Mode: <feature | refactor>

## Phase 1: Design
Status: NOT_STARTED

## Phase 2: Plan
Status: NOT_STARTED

## Phase 3: Execute
Status: NOT_STARTED

## Phase 4: Completion
Status: NOT_STARTED

Format constraints:

One key-value pair per line: Key: value
Fixed key names: Status, Gate, Artifact, Tasks, Reason, Acknowledged, PipelineID
Status values: NOT_STARTED, IN_PROGRESS, PASS, COMPLETE, FAIL, SKIPPED, INFERRED
Phase headers are ## Phase N: Name — always 4 phases, always in order
No prose, no paragraphs, no nested structure
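
The flat `Key: value` layout makes the ledger easy to check mechanically. The following is a minimal parser sketch under the format constraints above; `parse_ledger` is a hypothetical helper, and the `Name` field it stores per phase is an addition for illustration rather than one of the fixed ledger keys.

```python
import re

PHASE_HEADER = re.compile(r"^## Phase (\d): (.+)$")
ALLOWED_STATUS = {"NOT_STARTED", "IN_PROGRESS", "PASS", "COMPLETE", "FAIL", "SKIPPED", "INFERRED"}


def parse_ledger(text: str) -> tuple[dict, dict]:
    """Split ledger text into (header fields, {phase number: fields})."""
    header: dict = {}
    phases: dict = {}
    current = header
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "# Build Gate Ledger":
            continue
        match = PHASE_HEADER.match(line)
        if match:
            # "Name" is a hypothetical convenience field, not a ledger key.
            current = phases.setdefault(int(match.group(1)), {"Name": match.group(2)})
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"not a 'Key: value' line: {line!r}")
        current[key] = value
    # Dicts preserve insertion order, so this also checks phase ordering.
    if list(phases) != [1, 2, 3, 4]:
        raise ValueError("ledger must contain exactly phases 1-4, in order")
    for number, fields in phases.items():
        if fields.get("Status") not in ALLOWED_STATUS:
            raise ValueError(f"phase {number} has an invalid Status")
    return header, phases
```

Splitting on the first `": "` keeps ISO-8601 timestamps intact, since their colons are not followed by a space.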

Ledger Initialization

Runs during build startup, after mode detection but before Phase 1 begins:

Check for existing ledger at canonical path
If found: run Run Isolation checks (see below)
If not found (or user chose "start fresh"): write new ledger including Run, PipelineID, Goal, and Mode header fields, then all four phases with Status: NOT_STARTED
The ledger MUST exist before Phase 1 transitions to IN_PROGRESS

After writing any ledger (fresh or reconstructed), immediately re-read the ledger header to extract the PipelineID into the active in-memory state. This is a defensive consistency practice — ensures the in-memory value always matches the persisted value.

Run Isolation

Stale detection prevents cross-run contamination:

Compaction recovery (same run): If pipeline-status.md Started timestamp matches the ledger's Run timestamp, this is the same build run recovering from compaction. Auto-resume without prompting.
New session with existing ledger: If the ledger exists but pipeline-status.md is missing or its Started timestamp doesn't match the ledger's Run, prompt: "Found existing ledger for '[goal]' (started [timestamp], Phase N [status]). Resume this run? [y/n]". On "no", archive the old ledger via Bash mv to build-gate-ledger-<old-timestamp>.md. If the target filename already exists, append a counter suffix (-2, -3, etc.).
No existing ledger: Create fresh.
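
The three cases above reduce to a small decision function, sketched here together with the counter-suffix rule for archive names. Both helpers are hypothetical and only compute values; the actual rename would still go through Bash mv as described under Timestamps and File Operations.

```python
from pathlib import Path
from typing import Optional


def isolation_action(ledger_run: Optional[str], status_started: Optional[str]) -> str:
    """Classify startup state per the three Run Isolation cases.

    ledger_run:     Run timestamp from the ledger, or None if no ledger exists.
    status_started: Started timestamp from pipeline-status.md, or None if missing.
    """
    if ledger_run is None:
        return "create-fresh"
    if status_started is not None and status_started == ledger_run:
        return "auto-resume"   # same run recovering from compaction
    return "prompt-user"       # new session with an existing ledger


def archive_name(directory: Path, old_timestamp: str) -> Path:
    """Pick the archive filename, adding -2, -3, ... if the target exists."""
    candidate = directory / f"build-gate-ledger-{old_timestamp}.md"
    counter = 2
    while candidate.exists():
        candidate = directory / f"build-gate-ledger-{old_timestamp}-{counter}.md"
        counter += 1
    return candidate
```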

Orphan Cleanup

Requires: Active PipelineID established (from Ledger Initialization + Run Isolation). This step runs AFTER the resume/fresh decision is resolved.

Scan ~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-*.md for verdict markers. Delete any whose PipelineID does not match the active PipelineID. If resuming: use the resumed build's PipelineID (from the existing ledger). If starting fresh: use the newly generated PipelineID.

Note: If the session is recovering via INFERRED reconstruction (new PipelineID generated), markers from the old run will be cleaned up. This is intentional — the design requires a fresh QG run for INFERRED→PASS upgrade, not reuse of old markers.

Timestamps and File Operations

Timestamps: Obtained via Bash date -u +%Y-%m-%dT%H:%M:%S (Bash is allowed for date commands that don't reference .claude/ paths)
Ledger archival (rename): Uses Bash mv since Write/Read/Edit/Glob have no rename capability
All other ledger operations (create, read, update): MUST use Write and Read tools, NOT Bash. This is a hard constraint due to .claude/ path restrictions.

Enforcement Rules

Before each phase transition, read build-gate-ledger.md and check the previous phase's status:

Gate check: If the previous phase's Status is NOT in {PASS, COMPLETE (Phase 3 only), SKIPPED with Acknowledged: true}, output:
```
PHASE GATE BLOCKED: Cannot start Phase N — Phase N-1 gate has not passed.
Current state: [status]
Run the quality gate on Phase N-1's artifact before proceeding.
```
This means INFERRED, IN_PROGRESS, FAIL, and NOT_STARTED all trigger BLOCKED.
Phase 1 exception: Phase 1 (Design) has no predecessor gate — it always starts.
Phase 3 exception: Phase 3 transitions to COMPLETE (not PASS) when all tasks are done and per-task code reviews pass. COMPLETE satisfies the gate requirement for Phase 4. No verdict marker is required for Phase 3.
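
The gate check and its two exceptions can be expressed as a single predicate. This is a sketch of the rule as written above, with a hypothetical function name; it takes the previous phase's Status and Acknowledged fields as already-parsed values.

```python
def gate_allows_start(phase: int, prev_status: str = "", prev_acknowledged: bool = False) -> bool:
    """Return True if Phase N may start given Phase N-1's ledger fields."""
    if phase == 1:
        return True  # Design has no predecessor gate
    if prev_status == "PASS":
        return True
    if prev_status == "COMPLETE":
        # COMPLETE is only valid for Phase 3, the predecessor of Phase 4.
        return phase == 4
    if prev_status == "SKIPPED":
        return prev_acknowledged
    # INFERRED, IN_PROGRESS, FAIL, and NOT_STARTED all trigger BLOCKED.
    return False
```

When this returns false, the orchestrator would emit the PHASE GATE BLOCKED message with the current status filled in.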

Verdict Marker Verification

After quality-gate returns with a verdict, verify the verdict marker before writing to the ledger:

Glob for verdict markers: ~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-*.md
Filter by PipelineID match — only markers with the current build's PipelineID
Sort by the Timestamp field value inside the marker file (parsed as ISO-8601), take the most recent
Verify: marker exists, Verdict is PASS, PipelineID matches current build's PipelineID
If verification passes: write PASS to the ledger with Gate timestamp and Artifact path
If verification fails:
- Normal flow (marker missing/mismatched after a just-run gate): do NOT write PASS. Output warning and re-invoke quality-gate on the same artifact.
- INFERRED recovery (PipelineID mismatch or missing marker on an INFERRED phase): prompt the user for the artifact path, then offer to run the gate or type SKIP GATE.
After writing the ledger entry, delete the verdict marker (it has served its purpose). This applies to all verdict outcomes — PASS, FAIL, STAGNATION, and ESCALATED markers are all deleted after the corresponding ledger entry is written. [PLAN ADDITION — extends the design doc's PASS-only deletion to all verdict outcomes for cleanliness.]
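
The filter, sort, and verify steps above can be sketched as follows. The on-disk layout of a verdict marker is not specified in this section, so the sketch assumes each marker has already been parsed into a dict with `PipelineID`, `Timestamp`, and `Verdict` keys, and that timestamps carry no zone suffix (matching the `date -u +%Y-%m-%dT%H:%M:%S` format used elsewhere). The function names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def latest_marker(markers: list[dict], pipeline_id: str) -> Optional[dict]:
    """Return the most recent marker for this build, or None if there is none."""
    mine = [m for m in markers if m.get("PipelineID") == pipeline_id]
    if not mine:
        return None
    # Sort by the Timestamp field inside the marker, parsed as ISO-8601.
    return max(mine, key=lambda m: datetime.fromisoformat(m["Timestamp"]))


def may_write_pass(markers: list[dict], pipeline_id: str) -> bool:
    """True only when the newest matching marker carries Verdict PASS."""
    marker = latest_marker(markers, pipeline_id)
    return marker is not None and marker.get("Verdict") == "PASS"
```

Sorting on the parsed field rather than on file modification time follows the rule above and avoids depending on filesystem metadata.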

Skip Escape Hatch

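If the user explicitly wants to bypass a gate, the orchestrator records the phase as SKIPPED in the ledger once the confirmation protocol below completes.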

Example of a SKIPPED phase in the ledger:

## Phase 2: Plan
Status: SKIPPED
Gate: 2026-04-13T15:00:00
Reason: User requested skip
Acknowledged: true

Confirmation protocol: [Default: option (a) — separate-turn required, matching the design doc's two-step flow. User may override to option (b) before implementation.]

The orchestrator outputs: "Gate skip requested. Type SKIP GATE to confirm. This will be logged."
The orchestrator halts execution and waits. The user's NEXT message must contain exactly SKIP GATE. A SKIP GATE token in the same message as the skip request does NOT satisfy the confirmation requirement.
The orchestrator writes Status: SKIPPED with Reason field to the ledger.

Per-phase acknowledgment: SKIPPED requires one acknowledgment per phase, not per boundary. Before starting Phase N, the orchestrator checks all prior phases. Any prior phase with Status: SKIPPED that has not yet been Acknowledged: true triggers the BLOCKED message. The user types SKIP GATE once per skipped phase, and the ledger records Acknowledged: true. Subsequent boundaries do not re-prompt for already-acknowledged skips.

Missing artifact handling: If a phase was SKIPPED because no artifact was produced, retroactive gating requires the user to supply the artifact path: "To run the gate on Phase N, provide the artifact path." If no artifact exists, retroactive gating is not possible — the phase remains SKIPPED.

Recovery from SKIPPED: If the user later wants to properly gate a skipped phase, they can ask to "run the gate on Phase N." The orchestrator transitions SKIPPED → IN_PROGRESS, runs the quality gate on the phase's artifact, and writes the result normally.

Phase 4 completion warning: If ANY prior phase has Status: SKIPPED, Phase 4 outputs a prominent warning listing all skipped gates before presenting finish options.

State Machine

Phase 1: Design
  NOT_STARTED → IN_PROGRESS (design skill starts)
  IN_PROGRESS → PASS (quality gate verdict marker verified)
  IN_PROGRESS → FAIL (quality gate escalates — stagnation/regression)
  FAIL → IN_PROGRESS (user directs re-work)
  * → SKIPPED (user types SKIP GATE — does NOT unlock next phase without acknowledgment)
  SKIPPED → IN_PROGRESS (user asks to run the gate retroactively)
  INFERRED → IN_PROGRESS (user runs gate after compaction recovery)
  INFERRED → SKIPPED (user types SKIP GATE after compaction recovery)

Phase 2: Plan
  NOT_STARTED → IN_PROGRESS (requires Phase 1 Status = PASS or SKIPPED+Acknowledged)
  [same transitions as Phase 1]

Phase 3: Execute (no quality gate — uses COMPLETE instead of PASS)
  NOT_STARTED → IN_PROGRESS (requires Phase 2 Status = PASS or SKIPPED+Acknowledged)
  IN_PROGRESS → COMPLETE (all tasks done, per-task reviews passed, verification gates green)
  IN_PROGRESS → FAIL (task failures, user escalation)
  FAIL → IN_PROGRESS (user directs re-work)
  * → SKIPPED (user types SKIP GATE)
  SKIPPED → IN_PROGRESS (user asks to run retroactively)
  Note: Phase 3 has no QG invocation. COMPLETE satisfies Phase 4's gate requirement.

Phase 4: Completion
  NOT_STARTED → IN_PROGRESS (requires Phase 3 Status = COMPLETE or SKIPPED+Acknowledged. PASS is unreachable for Phase 3.)
  IN_PROGRESS → PASS (quality gate verdict marker verified)
  IN_PROGRESS → FAIL (quality gate escalates)
  FAIL → IN_PROGRESS (user directs re-work)
  * → SKIPPED (user types SKIP GATE)
  SKIPPED → IN_PROGRESS (user asks to run retroactively)
  IN_PROGRESS includes: emit skip warnings if any prior phase SKIPPED
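
The transitions listed above can be encoded as a lookup table, which makes illegal ledger writes easy to reject. The sketch below covers only the status edges; the "requires Phase N-1 Status" preconditions and the acknowledgment rule are not modelled, and the names are hypothetical.

```python
# Edges shared by all four phases.
COMMON = {
    ("NOT_STARTED", "IN_PROGRESS"),
    ("IN_PROGRESS", "FAIL"),
    ("FAIL", "IN_PROGRESS"),
    ("SKIPPED", "IN_PROGRESS"),
}
GATED_WITH_RECOVERY = COMMON | {("IN_PROGRESS", "PASS"), ("INFERRED", "IN_PROGRESS")}

PHASE_TRANSITIONS = {
    1: GATED_WITH_RECOVERY,
    2: GATED_WITH_RECOVERY,                     # "same transitions as Phase 1"
    3: COMMON | {("IN_PROGRESS", "COMPLETE")},  # no quality gate, no PASS
    4: COMMON | {("IN_PROGRESS", "PASS")},
}


def is_valid_transition(phase: int, old: str, new: str) -> bool:
    """Check a status change against the per-phase transition table."""
    if new == "SKIPPED":
        return old != new  # "* → SKIPPED" from any other state
    return (old, new) in PHASE_TRANSITIONS[phase]
```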

Compaction Recovery (Ledger)

build-gate-ledger.md is on disk and survives compaction. Recovery precedence when state is partial:

Ledger exists, handoff manifest missing: Use ledger to determine which phase to resume from. Prompt: "Gate ledger shows Phase N passed, but the phase handoff context was lost. Confirm resume from Phase N+1?" If the ledger shows PASS but no handoff manifest exists, also prompt for Phase N inputs (design doc path, plan path, etc.) before proceeding.
Handoff manifest exists, ledger missing: Reconstruct ledger from manifests. Mark the current phase as INFERRED (not PASS). Mark predecessor phases as PASS (handoff existence proves the boundary was crossed). Generate a new PipelineID and write it to the reconstructed ledger header. After writing, re-read the ledger header to extract the PipelineID into active state. INFERRED phases trigger the gate-blocked check — the orchestrator must run a fresh quality gate (with matching PipelineID) or the user must type SKIP GATE.
Both missing: Fresh start. Prompt user.

Quality Gate Requirement (Non-Negotiable)

Every quality gate in this pipeline MUST run to completion. This is NOT optional — you may NOT self-assess whether a quality gate is "needed" based on task size, complexity, or scope.

Quality gates are unconditional at all three gate points:

Phase 1, Step 2 — Design doc gate
Phase 2, Step 3 — Plan gate
Phase 4, Step 6 — Implementation gate

Common rationalizations that are NEVER valid reasons to skip:

"This is a small change"
"This is trivial / simple / straightforward"
"This is just a config change / documentation update / one-liner"
"The quality gate won't find anything on something this simple"
"I fixed the findings, so the gate is done" — fixing findings is NOT the same as passing the gate. The iteration loop must complete with a clean verification round (0 Fatal, 0 Significant on a fresh review). Fix agents introduce new issues or incompletely resolve old ones — that is why fresh-eyes re-review exists.

This requirement exists because: Quality gates consistently find issues the pipeline misses regardless of task size. There is no category of task that is immune. In observed runs, tasks self-assessed as "trivial" had the same defect rate as complex tasks. The only way to skip a quality gate is with explicit user approval — an unambiguous instruction specifically referencing the gate, not general feedback like "looks good" or "move on."

Pipeline Status

Write a status file to ~/.claude/projects/<hash>/memory/pipeline-status.md at every narration point. This file is overwritten (not appended) and provides ambient awareness for the user in a second terminal.

Write Triggers

Write the status file at every point where the Communication Requirement mandates narration: before dispatch, after completion, phase transitions, health changes, escalations, and after compaction recovery.

Status File Format

The status file uses this structure (overwritten in full each time):

# Pipeline Status
**Updated:** <current timestamp>
**Started:** <timestamp from first write — persisted across compaction>
**Skill:** build
**Phase:** <current phase, e.g. "3 — Execute (Autonomous)">
**Health:** <GREEN|YELLOW|RED>
**Suggested Action:** <omit when GREEN; concrete one-sentence action when YELLOW/RED>
**Elapsed:** <computed from Started>

## Recent Events
- [HH:MM] <most recent event>
- [HH:MM] <previous event>
(last 5 events, newest first)

Skill-Specific Body

Append after the shared header:

## Task Progress
| # | Task | Tier | Status | Duration |
|---|------|------|--------|----------|
| 1 | Auth middleware | T3 | DONE | 12m |
| 2 | Route handlers | T2 | IN REVIEW (code, pass 1) | 18m+ |
| 3 | Database layer | T1 | PENDING | — |

## Quality Gates
- Design: PASSED (2 rounds)
- Plan: PASSED (1 round)
- Task tiers: 1x T1, 1x T2, 1x T3
- Code: not yet reached

## Checkpoints
- Last checkpoint: pre-wave-3 (12:45:30)
- Total checkpoints: 7
- Shadow repo: healthy

## Compression State
Goal: [original user request]
Key Decisions:
- [accumulated decisions, max 10]
Active Constraints:
- [constraints affecting remaining work]
Next Steps:
1. [immediate next action]
2. [subsequent actions]

The Compression State section is a semantic subset of the full Compression State Block emitted into the conversation. It omits Files Modified (recoverable from git) and Scratch State (fixed per skill). It is the first section read during compaction recovery.

Health State Machine

Health transitions are one-directional within a phase: GREEN -> YELLOW -> RED. Phase boundaries reset to GREEN.

Phase boundaries (reset to GREEN): Phase 1->2, 2->3, 3->4
YELLOW: review loop round 3+, quality gate round 5+, retry in progress
RED: escalation pending, stagnation detected, test suite failure unresolved

When health is YELLOW or RED, include **Suggested Action:** with a concrete, context-specific sentence (e.g., "Code review looping on Task 4. Check recent events for recurring patterns.").
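
The one-directional rule can be sketched as a small function that never lets health improve within a phase. It assumes a direct GREEN -> RED jump is allowed when a RED trigger occurs, which the text does not state explicitly; `next_health` is a hypothetical name.

```python
ORDER = {"GREEN": 0, "YELLOW": 1, "RED": 2}


def next_health(current: str, observed: str, phase_boundary: bool = False) -> str:
    """Return the health to record after observing a new condition.

    Within a phase, health only moves toward RED; a phase boundary resets it.
    Assumption: GREEN may jump straight to RED if a RED trigger is observed.
    """
    if phase_boundary:
        return "GREEN"
    return observed if ORDER[observed] > ORDER[current] else current
```

For example, a review loop that drops back below round 3 would still be reported as YELLOW until the next phase boundary.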

Inline CLI Format

Output concise inline status alongside the status file write:

Minor transitions (dispatch, completion): one-liner, e.g. Phase 3 [4/8] Task 4 IN REVIEW (pass 1) | GREEN | 1h 12m
Phase changes and escalations: expanded block with --- separators
Health transitions: always expanded with old -> new health

Compaction Recovery

After compaction, before re-writing the status file:

0. Read the ## Compression State section from pipeline-status.md and recover Goal, Key Decisions, Active Constraints, and Next Steps. If the section is absent (pre-update pipeline), skip to step 1.
0.5. Check for handoff manifests (handoff-*-to-*.md) in the scratch directory. If the most recent manifest exists, use its Inputs, Decisions, and Constraints to reconstruct state for the current phase; this supersedes the Compression State section for phase-boundary recovery. If no manifest exists, continue with CSB-based recovery.
1. Read the rest of pipeline-status.md to recover Started timestamp and Recent Events buffer
2. Reconstruct phase, health, and skill-specific body from internal state files
3. If crucible:checkpoint was used: verify checkpoint availability by checking for the shadow repo at the computed path. Log available checkpoint count. Do not restore; just confirm checkpoints are recoverable.
4. Emit a Compression State Block into the conversation to seed the new context window with recovered state
4.5. Read session index summary (supplementary): If the CSB Scratch State contains a Session Index: path, or if globbing ~/.claude/projects/<hash>/memory/session-index/*/summary.md finds a recent file, read summary.md. Include the Activity Timeline, Files Modified, and Key Decisions sections in the post-compaction narration. If no session index exists, skip silently; this step is purely additive. If summary.md lacks detail for a specific event type (e.g., errors, decisions, file changes), use /recall to query events.jsonl with filters for targeted recovery.
5. Write the updated status file
6. Output inline status to CLI

Compression State Block

At checkpoint boundaries (see Checkpoint Timing below), emit the following structured block into the conversation. This block signals to the auto-compactor which state is critical to preserve. Also persist the semantic subset (Goal, Key Decisions, Active Constraints, Next Steps) to the ## Compression State section of pipeline-status.md.

===COMPRESSION_STATE===
Goal: [original user request, one sentence]
Skill: [skill name]
Phase: [current phase identifier]
Health: [GREEN|YELLOW|RED]
Mode: [skill-specific mode if applicable, omit otherwise]

Progress:
- [completed milestone 1]
- [completed milestone 2]
- [current work in progress]

Key Decisions (this session):
- [DEC-1] [decision]: [reasoning, one line]
- [DEC-2] [decision]: [reasoning, one line]

Active Constraints:
- [constraint that affects remaining work]
- [constraint from prior phase that still applies]

Files Modified:
- [file path]: [what changed, one line]

Scratch State:
- Location: [scratch directory path]
- Session Index: [~/.claude/projects/<hash>/memory/session-index/<session-id>/ if active, omit if not]
- Recovery: [which files to read first, in order]

Next Steps:
1. [immediate next action]
2. [action after that]
3. [remaining work summary]
===END_COMPRESSION_STATE===

Rules:

Key Decisions list is capped at 10. When adding an 11th, compress the oldest low-impact decision into a single-line Progress entry annotated "[compressed from decisions]".
Each Compression State Block includes the FULL accumulated decision list, not just new decisions since the last block. Decisions accumulate across compressions.
Progress entries are cumulative — include all completed milestones, not just since the last block.
Files Modified lists only files changed since the last block emission. On first block of a session, list all files changed so far.
Goal must be the original user request verbatim or a faithful one-sentence paraphrase. Do not let it drift across compressions.
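
The decision cap can be sketched as follows. This is a minimal illustration rather than the skill's actual implementation: the `Decision` dataclass, its `impact` field, and the `add_decision` helper are hypothetical names, and "low-impact" is modeled as an explicit integer score because the rules above do not define how impact is judged.

```python
from dataclasses import dataclass

MAX_DECISIONS = 10  # cap stated in the rules above


@dataclass
class Decision:
    ident: str   # e.g. "DEC-3"
    text: str    # decision plus its one-line reasoning
    impact: int  # hypothetical score; lower means lower impact


def add_decision(decisions, progress, new):
    """Return new (decisions, progress) lists with `new` appended.

    If the cap is exceeded, the oldest lowest-impact existing decision is
    folded into a single-line Progress entry.
    """
    decisions = decisions + [new]
    progress = list(progress)
    if len(decisions) > MAX_DECISIONS:
        # Only pre-existing decisions are candidates. min() returns the first
        # minimal element, which is the oldest one among ties.
        victim = min(decisions[:-1], key=lambda d: d.impact)
        decisions.remove(victim)
        progress.append(f"{victim.ident} {victim.text} [compressed from decisions]")
    return decisions, progress
```

The inputs are not mutated, so each Compression State Block can be built from the returned lists while the previous block's state stays intact.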

Checkpoint Timing

Emit a Compression State Block into the conversation AND update the ## Compression State section in pipeline-status.md at these points:

Phase transitions: 1→2, 2→3, 3→4 — emit a Phase Handoff Manifest (see below) instead of a Compression State Block at these points
Phase 3 progress: After every 3 task completions
Quality gate entry/exit: Before first quality gate round dispatch and after gate completes (pass or escalation)
Escalations: Before any escalation to user
Health transitions: On any GREEN->YELLOW or YELLOW->RED transition

These triggers are a superset of the existing pipeline-status.md write triggers. The Compression State Block is emitted alongside (not instead of) the normal narration and status file write.
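
A rough sketch of the trigger logic above, under stated assumptions: the event names (`phase_transition`, `task_complete`, and so on) and both helper functions are hypothetical labels for this example only, and are not identifiers defined by the skill.

```python
def checkpoint_action(event, tasks_completed=0):
    """Map a pipeline event to "manifest", "csb", or None (no checkpoint)."""
    if event == "phase_transition":
        # Phase boundaries emit a Phase Handoff Manifest instead of a CSB.
        return "manifest"
    if event == "task_complete":
        # Phase 3 progress: checkpoint after every 3 task completions.
        if tasks_completed > 0 and tasks_completed % 3 == 0:
            return "csb"
        return None
    if event in {"gate_entry", "gate_exit", "escalation"}:
        return "csb"
    return None


def is_health_checkpoint(old, new):
    """Only the degradations GREEN->YELLOW and YELLOW->RED trigger a checkpoint."""
    return (old, new) in {("GREEN", "YELLOW"), ("YELLOW", "RED")}
```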

Phase Handoff Manifest

At phase boundaries (1→2, 2→3, 3→4), write a handoff manifest to the scratch directory instead of emitting a Compression State Block. The manifest defines exactly what the next phase needs — an allowlist, not a denylist. Everything not on the manifest is shed.

Format:

# Phase Handoff: N → M
**Timestamp:** ISO-8601
**Goal:** [original user request, verbatim]
**Mode:** feature | refactor

## Inputs for Phase M
- **[Input name]:** [disk path or inline value]

## Decisions Carried Forward
- [DEC-N] [decision]: [reasoning, one line]

## Active Constraints
- [constraint affecting remaining work]

## Shed Receipt
- [what was shed] → [where it lives on disk]

Rules:

After writing the manifest, emit an explicit shed statement: list what context is no longer needed, where it lives on disk, and that the orchestrator operates from manifest inputs only going forward.
After writing the manifest, update the ## Compression State section in pipeline-status.md with the manifest contents (Goal, Decisions, Constraints, and the Inputs as Next Steps). This ensures compaction recovery can reconstruct state even if the manifest is lost.
CSBs continue at all non-boundary checkpoint triggers (intra-phase progress, quality gate entry/exit, escalations, health transitions).
Backward compatibility: If a handoff manifest does not exist at a recovery point, fall back to CSB-based recovery (existing behavior).
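
For illustration, the manifest format above could be rendered by a helper like the one below. The function name `render_manifest` and its parameters are assumptions made for this sketch; only the output layout follows the format shown in this section.

```python
from datetime import datetime, timezone


def render_manifest(src, dst, goal, mode, inputs, decisions, constraints, shed, now=None):
    """Render a phase handoff manifest as markdown text.

    inputs: dict of input name -> disk path or inline value
    shed:   list of (what was shed, where it lives on disk) pairs
    """
    if mode not in ("feature", "refactor"):
        raise ValueError(f"unknown mode: {mode!r}")
    now = now or datetime.now(timezone.utc)
    lines = [
        f"# Phase Handoff: {src} → {dst}",
        f"**Timestamp:** {now.isoformat()}",
        f"**Goal:** {goal}",
        f"**Mode:** {mode}",
        "",
        f"## Inputs for Phase {dst}",
        *[f"- **{name}:** {value}" for name, value in inputs.items()],
        "",
        "## Decisions Carried Forward",
        *[f"- {d}" for d in decisions],
        "",
        "## Active Constraints",
        *[f"- {c}" for c in constraints],
        "",
        "## Shed Receipt",
        *[f"- {what} → {where}" for what, where in shed],
    ]
    return "\n".join(lines) + "\n"
```

Because the manifest is an allowlist, anything not passed in `inputs` simply does not appear in the output and is treated as shed.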

Mode Detection

Before dispatching the design skill, determine whether this build is:

Feature mode (default) — adding new capability. Success = new acceptance tests pass.
Refactor mode — restructuring existing code. Success = existing behavior preserved + structural goals met.

Detection: If the user's intent is ambiguous, ask directly before proceeding:

"Is this adding new behavior, or restructuring existing code without changing what it does?"

The user's answer sets the mode for the entire pipeline. No special syntax needed.

Eval-gate pointer (Mock Dispatch Mode): if CRUCIBLE_BUILD_EVAL_MODE is set, use its value (feature or refactor) as the mode-detection answer and skip the AskUserQuestion call. The substitution rule lives in the ## Mock Dispatch Mode (eval-gate) section near the top of this file.

Mode Propagation

Propagate refactor mode to subagents through:

New refactor-specific prompt templates — contract-test-writer-prompt.md and refactor-implementer-addendum.md are standalone files used only in refactor mode. Select these instead of (or in addition to) the feature-mode equivalents.
Appended context blocks — For existing prompts that serve both modes (plan-writer-prompt.md, build-implementer-prompt.md), append a "Refactor Mode Context" section when composing the dispatch file. The templates remain flat markdown — the orchestrator decides what to include.
Scratch file for compaction recovery — Persist the current mode in /tmp/crucible-build-mode.md containing mode: refactor or mode: feature plus the baseline commit SHA. Only one build runs per session, so a well-known filename is sufficient.

Compaction Recovery

Build's existing compaction step must read the Compression State FIRST (step 0 from Pipeline Status Compaction Recovery), then the mode file, before re-reading the task list or any other state. On resumption after compaction:

Read ## Compression State from pipeline-status.md: recover goal, decisions, constraints, next steps.
0.5. Check for handoff manifests (handoff-*-to-*.md) in the scratch directory. If any exist, use the Inputs and Mode of the most recent manifest to bootstrap recovery; this supersedes the mode file for phase-boundary state.
Read /tmp/crucible-build-mode.md — recover mode and baseline commit SHA.
If file is missing: Default to feature mode and warn.
If mode is refactor: Verify baseline commit SHA exists.
Read build-gate-ledger.md — if it exists, apply Gate Ledger Compaction Recovery (see Compaction Recovery subsection under Gate Ledger Protocol). Use the ledger's phase statuses to determine the resume point. If the ledger is missing but handoff manifests exist, reconstruct with INFERRED status.
After mode and ledger are recovered: Proceed with general state reconstruction (task list, phase, health).
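
The mode-file step of this recovery sequence might look like the sketch below. The file is described only as containing `mode: refactor` or `mode: feature` plus the baseline commit SHA, so the `baseline_sha` key name is an assumption, as is the `recover_mode` helper itself. Checking that the SHA actually exists in the repository is left out of this sketch.

```python
def recover_mode(text):
    """Parse mode-file contents. Pass None when the file is missing.

    Returns (mode, baseline_sha, warnings).
    """
    if text is None:
        return "feature", None, ["mode file missing; defaulting to feature mode"]

    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "key: value"
            fields[key.strip()] = value.strip()

    warnings = []
    mode = fields.get("mode", "feature")
    if mode not in ("feature", "refactor"):
        warnings.append(f"unknown mode {mode!r}; defaulting to feature mode")
        mode = "feature"

    sha = fields.get("baseline_sha")  # hypothetical key name
    if mode == "refactor" and not sha:
        warnings.append("refactor mode without a baseline SHA")
    return mode, sha, warnings
```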

Phase 1: Design (Interactive)

Step -1: Resume Detection and Pipeline-Active Marker

Before any design or dispatch work, check for a crashed prior pipeline:

Check <scratch>/.pipeline-active (where <scratch> is ~/.claude/projects/<hash>/memory/)
Not found: Write the pipeline-active marker (JSON with pipeline_id set to current session ID, skill set to "build", phase set to "1", start_time set to current ISO-8601 timestamp, scratch_dir set to the scratch directory path, dispatch_dir set to the dispatch directory path, branch from git branch --show-current, baseline_sha from git rev-parse HEAD). Proceed to Step 0.
Found, same pipeline_id as current session: This is a compaction recovery scenario. Follow existing compaction recovery procedures. Do not re-write the marker.
Found, different pipeline_id:
a. Branch guard: Compare marker's branch field against current git branch --show-current. If they differ, warn: "Previous build on branch [marker.branch] crashed at Phase [phase]. You are currently on [current-branch]. Switch to [marker.branch] before resuming? [switch+resume / start fresh / abort]". Do NOT offer resume on the wrong branch.
b. Read manifest.jsonl from the marker's dispatch_dir (or from the scratch directory copy if /tmp was lost)
c. Identify the last successful phase boundary by scanning manifest entries grouped by phase. A phase boundary is verified when all dispatches in that phase have status: "completed".
d. Present resume option to the user:

"Previous build on branch [marker.branch] crashed at Phase [N], [context]. Resume from [last good boundary] ([checkpoint reason], [estimated time preserved] of work preserved)? [yes / no / fresh]"

e. User accepts: Invoke crucible:replay in resume mode, passing the scratch directory path. The replay skill handles checkpoint restore, state reconstruction, and re-dispatch. The build pipeline does not continue -- replay takes over.
f. User declines (fresh): Delete the stale .pipeline-active marker. Write a fresh marker with the current session. Proceed to Step 0 as a new pipeline run.

Marker updates during pipeline: Update the phase field in .pipeline-active at each phase boundary (1->2, 2->3, 3->4) to track progress for crash detection.

Marker cleanup: Delete .pipeline-active at Phase 4 step 12 (after finish skill completes).
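
The marker lifecycle in Step -1 can be sketched as below. The marker keys follow the field list given above; the helper names, the returned labels, and the `phase` key assumed on each manifest.jsonl entry are hypothetical. Stopping at the first phase with an incomplete dispatch is one reading of "last successful phase boundary", not something the source spells out.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def build_marker(session_id, scratch_dir, dispatch_dir, branch, baseline_sha, now=None):
    """Build the .pipeline-active payload for a fresh pipeline start."""
    now = now or datetime.now(timezone.utc)
    return {
        "pipeline_id": session_id,
        "skill": "build",
        "phase": "1",
        "start_time": now.isoformat(),
        "scratch_dir": scratch_dir,
        "dispatch_dir": dispatch_dir,
        "branch": branch,              # from: git branch --show-current
        "baseline_sha": baseline_sha,  # from: git rev-parse HEAD
    }


def classify_marker(marker, session_id):
    """Classify the startup situation; `marker` is None when no file exists."""
    if marker is None:
        return "fresh"
    if marker.get("pipeline_id") == session_id:
        return "compaction-recovery"
    return "crashed-prior-run"


def last_verified_phase(manifest_lines):
    """Return the highest phase whose dispatches are all "completed", else None.

    Assumes each JSONL entry has an integer-like "phase" and a "status" field.
    """
    by_phase = defaultdict(list)
    for line in manifest_lines:
        if line.strip():
            entry = json.loads(line)
            by_phase[int(entry["phase"])].append(entry.get("status"))
    verified = None
    for phase in sorted(by_phase):
        if all(status == "completed" for status in by_phase[phase]):
            verified = phase
        else:
            break  # later phases cannot be a safe resume point
    return verified
```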

Gate Ledger Initialization: After the pipeline-active marker is written (or recovered) and mode detection is complete, run the Gate Ledger Protocol's Ledger Initialization and Orphan Cleanup steps. The ledger must exist before Phase 1 transitions to IN_PROGRESS.

Compass Arc Emit (build orchestrator only — D14):

After Gate Ledger Initialization completes AND the resume decision at Step -1 has resolved, emit the current arc to docs/compass.md — but ONLY on a fresh-start or fresh-restart path. Skip this emit if the user accepted the resume path (Step -1e: replay took over), because the prior arc's current_arc is already correct. Do NOT place this emit inside crash-recovery branches (Step -1, items 3 or 4e), as those fire mid-resume-detection and can clobber current_arc before replay restores the prior arc.

RESUME_DECISION is set by Step -1 to one of fresh / resume / fresh-restart. Default fresh if unset.

if [ "${RESUME_DECISION:-fresh}" != "resume" ]; then
  python scripts/compass.py update --field current_arc --value "#<ticket>: <user-goal-one-liner>" \
    || echo '[compass] emit failed at arc start; continuing build' >&2
fi

Replace <ticket> with the GitHub issue number (e.g. 273) and <user-goal-one-liner> with a short, precise description of the task at hand (e.g. Compass arc-state skill). The leading # is required — compass update raises ValueError on values missing the #NNN: prefix.

Error policy (best-effort): Compass is an optimization, not a correctness layer. A failed emit MUST NOT fail the build pipeline — log to stderr and continue. Never tighten this error handling.

D14 invariant: Sub-agents spawned inside build do NOT emit compass updates. This emit fires from the build orchestrator only, exactly once per fresh pipeline start.

Step 0: Pre-Existing Doc Detection

Before running interactive design, check whether /spec (or a prior /build run) already produced design artifacts for this ticket.

Scan for pre-existing spec docs: Search docs/plans/ for design docs (*-design.md) with a matching ticket field in YAML frontmatter. Also check for corresponding *-implementation-plan.md and *-contract.yaml files with the same ticket field.
Conflict detection: If multiple design docs match the same ticket field, escalate to user: "Found multiple design docs for ticket #NNN: [list files]. Which should I use?" Do not proceed until the user resolves the conflict.
Full match (design doc + implementation plan + contract all present):
- Skip interactive design (the Phase 1 design sub-skill below) — design doc already exists
- Security review check: If the contract contains security_review field, note it in the Phase 1→2 handoff manifest under Active Constraints: "Contract requires security review (security_review.status: [required|recommended]) — siege will be evaluated in Phase 4 Step 5.5." This ensures the directive survives phase handoffs and compaction recovery.
- Quality-gate the existing design doc with staleness context: "This design doc is pre-existing from /spec and may be stale — verify against current codebase state before proceeding"
- Staleness rejection: If the quality gate finds that the design doc references files, interfaces, or modules that no longer exist in the codebase, reject the doc as fundamentally stale. Fall back to running Phase 1 interactively. Inform user: "Pre-existing design doc for #NNN is fundamentally stale (references [specific items] that no longer exist). Running interactive design instead."
- If quality gate passes: Run Phase 2 on the pre-existing implementation plan — skip Plan Writer (plan already exists), but run Plan Reviewer + innovate + quality-gate on the existing plan. This ensures the plan gets the same review rigor as a freshly written plan.
- If quality gate fails (non-staleness issues): fix or escalate
- Proceed to Phase 3 when the plan passes review
Partial match (design doc present but implementation plan or contract missing):
- Use the existing design doc (quality-gate it as above, including staleness rejection)
- Run the missing phases normally: if no implementation plan, run Plan Writer in Phase 2; if no contract, proceed without contract awareness for this ticket
- Inform user which artifacts were found and which are being generated fresh: "Found pre-existing design doc for #NNN. Implementation plan is missing — will generate in Phase 2." (or similar)
Not found: Proceed with normal Phase 1 (interactive design below).
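
The four outcomes of Step 0 reduce to a small decision function. This is a sketch only: `classify_artifacts` and its return labels are hypothetical, and it assumes the caller has already filtered docs/plans/ down to the files whose frontmatter ticket matches.

```python
def classify_artifacts(design_docs, has_plan, has_contract):
    """Classify pre-existing spec artifacts for one ticket.

    design_docs: paths of *-design.md files matching the ticket
    has_plan / has_contract: whether matching plan and contract files exist
    """
    if len(design_docs) > 1:
        return "conflict"       # escalate; the user picks which doc to use
    if not design_docs:
        return "not-found"      # run normal interactive Phase 1
    if has_plan and has_contract:
        return "full-match"     # skip design, review the existing plan
    return "partial-match"      # reuse design doc, generate what is missing
```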

Model: Opus (creative/architectural work needs the best model)
Mode: Interactive with the user
RECOMMENDED SUB-SKILL: Use crucible:forge (feed-forward mode) — consult past lessons before starting
RECOMMENDED SUB-SKILL: Use crucible:cartographer-skill (consult mode) — review codebase map for structural awareness
REQUIRED SUB-SKILL: Use crucible:design
Follow design skill for design refinement, section-by-section validation, and saving the design doc
OVERRIDE: When design completes and the design doc is saved, do NOT follow design's "Implementation" section (do not chain into planning or worktree from there). Return control to this build skill — Phase 2 handles planning with its own subagent-based approach.
Phase ends when user approves the design (says "go", "looks good", "proceed", etc.)
Everything after this point is autonomous — tell the user: "Design approved. Starting autonomous pipeline — I'll only interrupt for escalations."

Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, all Use crucible:<skill> and Dispatch a <kind> subagent invocations in Phase 1 (design, innovate, quality-gate, PRD writer, acceptance test writer, contract test writer) substitute a disk-read from the mock dir for the Task tool invocation. Each substitution follows the substitution rule in the ## Mock Dispatch Mode (eval-gate) section. AskUserQuestion calls in Phase 1 use CRUCIBLE_BUILD_EVAL_USER_INPUT_DIR per that same section.

Step 2: Innovate and Red-Team the Design

After the user approves the design and before starting Phase 2:

RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-design-gate" before dispatching innovate and quality-gate on the design doc.

Innovate: Dispatch crucible:innovate on the design doc. Plan Writer incorporates the proposal.
Write Phase 1 IN_PROGRESS to the gate ledger (after ledger initialization).
REQUIRED SUB-SKILL: Use crucible:quality-gate on the (potentially updated) design doc with artifact type "design". Include in the dispatch context: Phase: design and PipelineID: <current PipelineID>. Iterates until clean or stagnation. (Non-negotiable — see Quality Gate Requirement.)
If the quality gate requires changes, the Plan Writer updates the design doc and re-commits.
Verify verdict marker and write Phase 1 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
Design doc is now finalized — proceed to acceptance tests.

Step 2.5: Generate PRD

After the design doc is finalized (Step 2 complete), generate a stakeholder-facing PRD:

Dispatch a PRD Writer subagent (Sonnet) using ./prd-writer-prompt.md
- Input: finalized design doc
- Output: PRD in standard format (problem statement, user stories, requirements, scope, out-of-scope, success metrics, technical notes, dependencies)
Save to docs/prds/YYYY-MM-DD-<topic>-prd.md
Commit: docs: add PRD for [feature]

This step runs by default. The PRD is a reformatting of the design doc for non-technical stakeholders — it does not introduce new decisions or requirements. Skip only in refactor mode (refactoring has no stakeholder-facing PRD).

Step 3: Generate Acceptance Tests (RED)

Before planning, define "done" with executable tests:

Dispatch an Acceptance Test Writer subagent (Opus) using ./acceptance-test-writer-prompt.md
- Input: finalized design doc (especially acceptance criteria)
- Output: integration-level test file(s) that verify feature behavior end-to-end
Run the acceptance tests — verify they FAIL (the feature doesn't exist yet)
- If tests pass: something is wrong — investigate before proceeding
- If tests error (won't compile): this is expected in typed languages — note which tests exist and what they verify. They become the first implementation task.
Commit: test: add acceptance tests for [feature] (RED)

These tests define the feature-level RED-GREEN cycle that wraps the entire pipeline. The pipeline is done when these tests pass.

Refactor Mode: Phase 1 Changes

When in refactor mode, Phase 1 shifts from "what should we build?" to "what are we changing and what could break?"

Blast Radius Analysis

After the user describes the refactoring intent, the design phase proceeds as follows:

Identify the target — What code is being restructured? (module, interface, data representation, file organization, etc.)
Trace the blast radius using cartographer (if available) or fallback exploration:
- Direct consumers — code that imports/calls/references the target
- Indirect dependents — code that depends on consumers (transitive)
- Test coverage — which tests exercise the target behavior
- Configuration/wiring — DI registrations, config files, build scripts that reference the target
- Fallback when cartographer is unavailable: Use language-aware symbol search via agent exploration. Grep for symbol references (imports, type annotations, function calls) using language-specific patterns. The impact manifest's confidence field reflects reduced precision.
Present an impact manifest to the user:

### Impact Manifest

**Target:** [what's being restructured]
**Structural goal:** [what the code should look like after]

**Direct consumers:** N files
- path/to/consumer1.py (calls TargetClass.method)
- path/to/consumer2.py (imports TargetClass)

**Indirect dependents:** N files
- path/to/dependent.py (depends on consumer1)

**Test coverage:**
- N tests directly exercise target behavior
- N tests exercise consumers
- Gap: no tests cover [specific seam]

**Risk assessment:** [Low/Medium/High] based on consumer count and coverage gaps
**Confidence:** [High/Medium/Low] — High if cartographer used, Medium/Low if fallback

When confidence is Low, require explicit user confirmation before proceeding. The user must review the impact manifest and confirm the blast radius is complete.

Design the structural goal — what should the code look like after the refactoring? User validates the target state.

Acceptance Tests (Refactor Mode)

Instead of writing NEW acceptance tests (Step 3 above), the pipeline does the following:

Dispatch the contract test writer using ./contract-test-writer-prompt.md — a single agent handles gap identification AND gap filling. Input: impact manifest + blast radius file list. The agent maps existing tests to behavioral seams, identifies untested seams, and writes contract tests for each gap.
Run all contract tests GREEN — contract tests must pass before any refactoring begins.
If a contract test FAILS: The contract test writer investigates:
- Test defect (wrong assertion, bad setup) — fix the test and re-run
- Latent codebase bug — report to user with options: (a) fix the bug first, (b) exclude this seam and accept the risk, (c) abort the refactoring. Never silently drop a failing contract test.
Commit: test: add contract tests for [target] refactoring (GREEN — locking existing behavior)

Proportionality Escape Valve

Contract test writing must remain proportional to the refactoring scope. Trigger a scope check when any of these thresholds are hit:

Count threshold: More than 15 contract tests needed
Effort threshold: Contract test writer reports context pressure, or estimated total contract test LOC exceeds ~2x the estimated refactoring scope LOC

When triggered:

Present the full gap list to the user with estimated effort per gap
User selects which gaps to fill and which to accept as uncovered risk
Proceed with only user-selected contract tests

The impact manifest records which gaps the user chose to leave uncovered.
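
As a sketch, the scope-check triggers can be evaluated like this. The function name and its parameters are assumptions for this example, and the approximate "~2x" effort ratio is treated here as a strict factor of 2.

```python
COUNT_THRESHOLD = 15  # more than 15 contract tests triggers a scope check
LOC_RATIO = 2         # "~2x" in the text; treated as exactly 2 in this sketch


def scope_check_reasons(test_count, est_test_loc, est_refactor_loc, context_pressure=False):
    """Return the list of thresholds that were hit (empty means proceed)."""
    reasons = []
    if test_count > COUNT_THRESHOLD:
        reasons.append("count")
    if context_pressure or est_test_loc > LOC_RATIO * est_refactor_loc:
        reasons.append("effort")
    return reasons
```

A non-empty result means the full gap list goes to the user before any further contract tests are written.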

Phase Handoff: 1 → 2

Before dispatching the Plan Writer, verify the gate ledger and write a handoff manifest:

Gate ledger check: Read build-gate-ledger.md and verify Phase 1 Status is PASS. If not, follow Enforcement Rules.
Write handoff-1-to-2.md with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 2: design doc path, acceptance test paths (or contract tests in refactor mode), PRD path (if generated), conventions path (from cartographer, if loaded)
- Decisions Carried Forward: accumulated decisions from Phase 1
- Active Constraints: constraints affecting planning
- Shed Receipt: design iteration history, innovate proposals, quality gate round details → design doc on disk captures the outcome
Emit shed statement: "Phase 1 context shed. Design doc, acceptance tests, and PRD are on disk. Design iteration history, innovate proposals, and gate round details are not carried forward."
Update ## Compression State in pipeline-status.md with manifest contents.
Do NOT emit a Compression State Block (manifest replaces it at this boundary).
Session index event: Emit a phase_change event to the outbox: {"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 1 -> Phase 2 (Plan)","detail":{"skill":"build","from":"1","to":"2"}}.
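
A minimal sketch of building the phase_change event shown above as one JSON line. The helper name is hypothetical, the phase-name table contains only the names visible in this document's examples, and `seq` defaults to 0 because that is the value the examples use; how the outbox assigns real sequence numbers is defined in the session index convention, not here.

```python
import json
from datetime import datetime, timezone

# Only the phase names that appear in this document's event examples.
PHASE_NAMES = {"2": "Plan", "3": "Execute"}


def phase_change_event(src, dst, seq=0, now=None):
    """Return a phase_change event serialized as a single JSON line."""
    now = now or datetime.now(timezone.utc)
    summary = f"Build: Phase {src} -> Phase {dst}"
    if dst in PHASE_NAMES:
        summary += f" ({PHASE_NAMES[dst]})"
    event = {
        "ts": now.isoformat(),
        "seq": seq,
        "type": "phase_change",
        "summary": summary,
        "detail": {"skill": "build", "from": src, "to": dst},
    }
    return json.dumps(event, separators=(",", ":"))
```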

Phase 2: Plan (Autonomous)

Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, the Plan Writer, Plan Reviewer, innovate, and quality-gate dispatches in this phase use the mock-dir substitution rule defined in the ## Mock Dispatch Mode (eval-gate) section. The substitution does not change Phase 2's structure or the gate-ledger writes.

Step 1: Write the Plan

Dispatch a Plan Writer subagent (Opus):

Read the design doc produced in Phase 1 and the acceptance tests from Step 3
Write an implementation plan following the crucible:planning format
If acceptance tests couldn't compile (typed language), Task 1 should create the interfaces/stubs needed for them to compile and fail correctly
Include per-task metadata: Files (with count), Complexity (Low/Medium/High), Dependencies
Save to docs/plans/YYYY-MM-DD-<topic>-implementation-plan.md
Plan tasks should be scoped to 2-3 per subagent, ~10 files max (context budget awareness)

Use ./plan-writer-prompt.md template for the dispatch prompt.

Step 2: Review the Plan

Dispatch a Plan Reviewer subagent:

Reviewer model selection:

Plan touches 4+ systems or has 10+ tasks → Opus
Plan touches 1-3 systems with <10 tasks → Sonnet
When in doubt → Opus

Review protocol (iterative):

Dispatch Plan Reviewer to check plan against design doc
If issues found: record issue count, dispatch Plan Writer to revise
Dispatch NEW fresh Plan Reviewer on revised plan (no anchoring)
Compare issue count to prior round:
- Strictly fewer issues → progress, loop again
- Same or more issues → stagnation, escalate to user with findings from both rounds
Loop until plan passes with no issues
Architectural concerns bypass the loop — immediate escalation regardless of round
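
The reviewer selection and stagnation rules above can be sketched as two small functions. Both names and the returned action labels are hypothetical and exist only for this example.

```python
def plan_reviewer_model(systems_touched, task_count, unsure=False):
    """Pick the Plan Reviewer model from plan size."""
    if unsure or systems_touched >= 4 or task_count >= 10:
        return "Opus"
    return "Sonnet"


def next_review_action(issue_counts, architectural_concern=False):
    """Decide the next step of the iterative plan review.

    issue_counts: number of issues found in each completed round, oldest first.
    """
    if architectural_concern:
        return "escalate"  # bypasses the loop regardless of round
    if not issue_counts:
        return "review"    # no round has run yet
    latest = issue_counts[-1]
    if latest == 0:
        return "pass"
    if len(issue_counts) >= 2 and latest >= issue_counts[-2]:
        return "escalate"  # same or more issues than the prior round
    return "revise"        # first round with issues, or strictly fewer issues
```

After a "revise" result, the next round is run by a fresh Plan Reviewer so that earlier findings do not anchor the new review.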

Use ./plan-reviewer-prompt.md template for the dispatch prompt.

Step 3: Innovate and Red-Team the Plan

After the plan passes review:

RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-plan-gate" before dispatching innovate and quality-gate on the plan.

Write Phase 2 IN_PROGRESS to the gate ledger.
Innovate: Dispatch crucible:innovate on the approved plan. Plan Writer incorporates the proposal into the plan.
REQUIRED SUB-SKILL: Use crucible:quality-gate on the (potentially updated) plan with artifact type "plan". Include in the dispatch context: Phase: plan and PipelineID: <current PipelineID>. Provides the plan and design doc as context. (Non-negotiable — see Quality Gate Requirement.)
Verify verdict marker and write Phase 2 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.

The quality gate handles the iterative red-team loop — fresh review each round, weighted stagnation detection, 15-round safety limit, escalation. See crucible:quality-gate for details.

Phase Handoff: 2 → 3

Before creating the team and task list, write a handoff manifest. Step 3.4 above already verified the verdict marker, wrote PASS to the ledger, and deleted the marker. The handoff manifest is written AFTER the ledger PASS — this sequencing ensures compaction recovery finds a consistent state (ledger shows PASS, handoff exists).

Write a handoff manifest:

Write handoff-2-to-3.md with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 3: plan path, design doc path, acceptance test paths (or contract tests), contract YAML path (if exists), baseline SHA (current HEAD), cartographer context paths (module files, conventions.md, landmines.md)
- Decisions Carried Forward: accumulated decisions from Phases 1-2
- Active Constraints: constraints affecting execution
- Shed Receipt: plan review iterations, innovate proposals, quality gate round history → plan on disk captures the outcome
Emit shed statement: "Phase 2 context shed. Plan, design doc, and acceptance tests are on disk. Plan review rounds, innovate proposals, and gate details are not carried forward."
Update ## Compression State in pipeline-status.md with manifest contents.
Do NOT emit a Compression State Block.
Session index event: Emit a phase_change event to the outbox: {"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 2 -> Phase 3 (Execute)","detail":{"skill":"build","from":"2","to":"3"}}.

Phase 3: Execute (Autonomous, Team-Based)

Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, all per-task dispatches in this phase (implementer, reviewer, cleanup, test-coverage, test-gap-writer, adversarial-tester, architecture-reviewer) use the mock-dir substitution rule defined in the ## Mock Dispatch Mode (eval-gate) section. TeamCreate and TaskCreate calls run normally — only the Task/Agent tool invocations on teammates are substituted.

Step 0: Load Module Context for Subagents

RECOMMENDED SUB-SKILL: Use crucible:cartographer-skill (load mode) — when dispatching implementers and reviewers, include relevant module files, conventions.md, and landmines.md in their dispatch files
Defect signature loading (for implementers only):
1. Glob defect-signatures/*.md (excluding *.non-matches.md) from the cartographer storage directory
2. For each signature, read its Modules field and match against the task's target modules:
  - Read each cartographer module file's Path: field
  - A task's file is in a module if the file path starts with the module's Path: value
  - When a task spans multiple modules, load signatures for all matched modules
  - Directory prefix fallback: When no cartographer modules exist, match if any target file path starts with any of the signature's Modules directory prefixes
3. For matching signatures, validate all file paths still exist on disk — drop stale entries silently
4. Inject into the [DEFECT_SIGNATURES] section of build-implementer-prompt.md:
  - Generalized pattern (always)
  - Confirmed siblings list (always)
  - Unresolved siblings list (always — these are known live defects; produces a stronger warning)
  - Non-match companion files are NOT loaded for implementers
5. Last loaded update: Loading is pure-read. After all implementer dispatches for the current phase complete, batch-update the Last loaded field to today on all signatures that were loaded. Do NOT update during dispatch — defer to after all subagents are dispatched.
Grudge pre-flight (regression-oracle, #271): Before dispatching implementers, query the Book of Grudges for each task's in-scope files and inject any matches into that implementer's dispatch file as a hard DO NOT REPEAT constraint (sibling to defect-signature loading). Resolve the helper by absolute path from the plugin root — plugin_root="$(realpath "<this-skill-base-dir>/../..")" — and run python3 "$plugin_root/scripts/grudge_query.py" <task files…>; non-empty output lists past regressions held against those files. Best-effort: if the helper is unresolved, emit a one-line stderr warning and continue — a missing pre-flight must NEVER block the build. See skills/grudge/SKILL.md.
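
The module-matching rule for defect signatures (item 2 of the loading procedure above) could be sketched as follows. The function names and the dict shapes are assumptions. Note one deliberate deviation: the text says a file belongs to a module if its path "starts with" the module's Path: value, while this sketch compares whole path components, so that `src/auth` does not accidentally match `src/authz/...`.

```python
from pathlib import PurePosixPath


def in_module(file_path, module_path):
    """True if file_path lies under module_path, compared per path component."""
    f = PurePosixPath(file_path).parts
    m = PurePosixPath(module_path).parts
    return len(m) <= len(f) and f[:len(m)] == m


def matching_signatures(task_files, modules, signatures):
    """Return sorted names of signatures relevant to a task.

    modules:    {module name: module Path: value}; empty if no cartographer data
    signatures: {signature name: list from its Modules field}
    """
    if modules:
        hit = {
            name for name, path in modules.items()
            if any(in_module(f, path) for f in task_files)
        }
        return sorted(s for s, mods in signatures.items() if hit & set(mods))
    # Directory prefix fallback: treat the Modules entries as directory prefixes.
    return sorted(
        s for s, prefixes in signatures.items()
        if any(in_module(f, p) for f in task_files for p in prefixes)
    )
```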

Step 0.5: Gate Ledger — Phase 3 Start

Write Phase 3 IN_PROGRESS to the gate ledger (after Phase 2 PASS verification).

Step 1: Create Team and Task List

Create a team using TeamCreate:

team_name: "build-<feature-name>"
description: "Building <feature description>"

Read the approved plan. Create tasks via TaskCreate for each plan task, including:

Subject from plan task title
Description with full plan task text (subagents should never read the plan file)
Dependencies via TaskUpdate with addBlockedBy

Agent Teams Fallback

If TeamCreate fails (agent teams not available), output a clear one-time warning:

⚠️ Agent teams are not available. Recommended: set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
Falling back to sequential subagent dispatch via Agent tool.

Then fall back to sequential subagent dispatch via the regular Task tool (without team_name). Everything still works — independent tasks run sequentially instead of in parallel via teammates.

What changes in fallback mode:

Tasks are dispatched via Agent tool instead of as teammates
Independent tasks that would run in parallel now run sequentially
Task tracking still uses TaskCreate/TaskUpdate for state management
All other pipeline behavior (TDD, review, de-sloppify, quality gates) is unchanged

Step 2: Analyze Dependencies and Execution Order

Before dispatching:

Map the dependency graph from plan task metadata
Identify independent tasks (no shared files, no sequential dependencies)
Group into execution waves — independent tasks parallel, dependent tasks sequential
Assess complexity per task for reviewer model selection
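
Grouping tasks into execution waves is essentially a layered topological sort with an extra file-overlap check. The sketch below shows one way to do it; the `execution_waves` name and the task dict shape (`deps`, `files`) are assumptions, not the plan metadata format itself.

```python
def execution_waves(tasks):
    """Group tasks into waves that can each run in parallel.

    tasks: {task id: {"deps": set of task ids, "files": set of file paths}}
    A task joins a wave only when all its dependencies finished in earlier
    waves and it shares no files with tasks already placed in that wave.
    """
    remaining = dict(tasks)
    done = set()
    waves = []
    while remaining:
        ready = sorted(t for t, info in remaining.items() if info["deps"] <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        wave, used_files = [], set()
        for t in ready:
            files = remaining[t]["files"]
            if used_files & files:
                continue  # shares a file with this wave; defer to a later wave
            wave.append(t)
            used_files |= files
        for t in wave:
            del remaining[t]
        done |= set(wave)
        waves.append(wave)
    return waves
```

The first ready task always enters the wave, so each iteration makes progress. Tasks inside a wave are independent, while the waves themselves run sequentially.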

Step 3: Execute Tasks

For each task (or wave of parallel tasks):

RECOMMENDED SUB-SKILL: Before dispatching each execution wave, use crucible:checkpoint — create checkpoint with reason "pre-wave-N" (where N is the wave number). This captures the working directory state after the prior wave's verification gate passed.

Mark task in_progress via TaskUpdate
Spawn Implementer teammate (Opus) via Task tool with team_name and subagent_type="general-purpose"
- Use ./build-implementer-prompt.md template
- Pass full task text, file paths, project conventions
- Contract-aware dispatch (when a contract exists for this ticket): Include the contract YAML alongside the design doc and task description. See "Contract-Aware Implementer Guidance" below.
- Implementer follows TDD, writes tests, runs tests, commits, self-reviews
When Implementer reports completion, run De-Sloppify Cleanup (see below)
After cleanup completes, spawn Reviewer teammate
- Use ./build-reviewer-prompt.md template
Tier-aware review routing: Read the task's Review-Tier from plan metadata.
- Tier 1: Dispatch single-pass code reviewer (Sonnet). If Clean or Minor-only: task complete. If Critical/Important: dispatch implementer fix, then task complete. If Architectural Concern: escalate.
- Tier 2: Dispatch iterative code review (per existing loop). Then dispatch single-pass test reviewer. If test review surfaces Critical findings, escalate to Tier 3. Then dispatch adversarial tester (per existing logic). Task complete.
- Tier 3: Follow current full pipeline (no changes to existing flow).

Contract-Aware Implementer Guidance

When a contract YAML exists for the current ticket (detected during Step 0 or produced by /spec), the implementer receives the contract alongside the design doc and task description. The contract uses the schema defined in crucible:spec/contract-schema.md (version 1.0). Implementers must treat contract elements as follows:

api_surface declarations are binding. The implementer must match the declared function signatures, class interfaces, endpoint shapes, parameter names, types, and return types exactly. Deviations from the contract's API surface are implementation errors.
checkable invariants are binding. The implementer must satisfy all declared constraints (e.g., "must not import X", "must be idempotent"). The check_method field (grep, code-inspection, file-structure) indicates how the quality gate will verify compliance — the implementer should self-check against these before committing.
testable invariants require tagged tests. For each testable invariant, the implementer must write a test tagged with the declared test_tag (pattern: contract:<category>:<id>) that validates the invariant. These tests are checked by the quality gate and reviewers — they must exist and pass.
integration_points are informational. These indicate which other components and contracts this ticket interacts with. The implementer should be aware of referenced components and ensure compatibility, but integration points are not binding constraints — they provide context for making good implementation decisions.

De-Sloppify Cleanup

After the implementer reports completion and before dispatching the reviewer:

RECOMMENDED: Use crucible:checkpoint — create checkpoint with reason "pre-cleanup-task-N" before dispatching the cleanup agent. If cleanup removes something needed, restore to this checkpoint.

Record the pre-cleanup commit SHA
Dispatch a fresh Cleanup Agent (Opus) using ./cleanup-prompt.md
- Input: git diff <pre-task-sha>..HEAD (the implementer's committed changes)
- The orchestrator provides the pre-task commit SHA to the cleanup agent
Cleanup agent reviews changes, removes unnecessary code (see allowlist), runs tests
If cleanup made changes, commits separately: refactor: cleanup task N implementation
If cleanup found nothing to remove, reports "No cleanup needed" and proceeds

Reviewer Model Selection (Lead Decides Per-Task)

| Task Complexity | Reviewer Model |
|----------------|----------------|
| Low (1-3 files, straightforward) | Sonnet |
| Medium (3-6 files, some cross-system) | Lead decides (default Opus) |
| High (6+ files, refactoring, deep chains) | Opus |
| When in doubt | Opus |

Two-Pass Review Cycle

Each task gets TWO review passes before completion:

digraph review {
  "Implementer builds + tests" -> "De-sloppify cleanup";
  "De-sloppify cleanup" -> "Pass 1: Code Review";
  "Pass 1: Code Review" -> "Implementer fixes code findings";
  "Implementer fixes code findings" -> "Pass 2: Test Quality Review";
  "Pass 2: Test Quality Review" -> "Implementer fixes test findings";
  "Implementer fixes test findings" -> "Test Alignment Audit (crucible:test-coverage)";
  "Test Alignment Audit (crucible:test-coverage)" -> "Test Gap Writer";
  "Test Gap Writer" -> "Adversarial Tester";
  "Adversarial Tester" -> "Task complete";
}

Pass 1 — Code Review: Architecture, patterns, correctness, wiring (actually connected, not just existing?)

Pass 2 — Test Quality Review: Test independence? Determinism? Edge cases? Integration tests where mocks are masking real behavior? AAA pattern? Correct test level? (Staleness and alignment checks are handled by the test-coverage dispatch below.)

Review Tier Routing

Each task's Review-Tier (from the plan) determines which review steps execute. Phase 4 full-implementation gates are NOT affected by per-task tiers.

| Step | Tier 1 | Tier 2 | Tier 3 |
|------|--------|--------|--------|
| Implementer | Yes | Yes | Yes |
| De-sloppify cleanup | Yes | Yes | Yes |
| Pass 1: Code review | Single pass | Iterative | Iterative |
| Implementer fixes (code) | If findings | If findings | If findings |
| Pass 2: Test quality review | SKIP | Single pass (non-iterative) | Iterative |
| Implementer fixes (test) | SKIP | If critical findings only | If findings |
| Test alignment audit | SKIP | SKIP | Yes |
| Test gap writer | SKIP | SKIP | Yes |
| Adversarial tester | SKIP | Yes | Yes |

Tier 1 "single pass" code review: Dispatch one reviewer. If findings are Clean or Minor-only, the task is complete. If findings include Critical or Important issues, dispatch the implementer to fix them, then the task is complete (no re-review). If findings include an Architectural Concern, escalate as normal.

Tier 2 "single pass" test review: Dispatch one test quality reviewer. Report findings but do NOT enter the iterative review loop. If the single pass surfaces Critical findings, escalate the task to Tier 3 for full iterative treatment.

Tier 2 "iterative" code review: Same as current behavior -- fresh reviewer each round, track issue count, loop until clean or stagnation.
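The routing table can be read as a lookup from tier to the steps that run. A minimal sketch, assuming hypothetical step identifiers (the names below are illustrative and not defined by the build skill); the conditional "Implementer fixes" rows are left out because they depend on review findings rather than on the tier alone:

```python
# Hypothetical step identifiers, in pipeline order (illustrative names only).
PIPELINE_STEPS = [
    "implementer",
    "cleanup",
    "code_review",
    "test_review",
    "test_alignment_audit",
    "test_gap_writer",
    "adversarial_tester",
]

# Steps marked SKIP for each tier in the routing table.
SKIPPED = {
    1: {"test_review", "test_alignment_audit", "test_gap_writer", "adversarial_tester"},
    2: {"test_alignment_audit", "test_gap_writer"},
    3: set(),
}


def steps_for_tier(tier: int) -> list[str]:
    """Return the steps that execute for a task at the given review tier."""
    if tier not in SKIPPED:
        raise ValueError(f"unknown review tier: {tier}")
    return [step for step in PIPELINE_STEPS if step not in SKIPPED[tier]]
```

Keeping the skip sets in one table makes a runtime escalation a matter of calling `steps_for_tier` again with the higher tier and running whatever has not yet executed.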

Runtime Tier Escalation

The orchestrator may escalate a task's review tier during execution. Escalation is one-directional (up only).

Triggers:

Implementer reports unexpected complexity or cross-system interaction not anticipated in the plan
Single-pass reviewer (Tier 1 code review or Tier 2 test review) reports Critical findings
Implementer touches significantly more files than the plan specified

Process:

Log escalation to decision journal: [timestamp] DECISION: review-tier | choice=escalate T1->T2 | reason=<trigger> | alternatives=none
Execute the additional review steps for the new tier (from the point where the current tier's pipeline diverges)
Update the task status display to show the escalated tier

Contract-Aware Reviewer Guidance

When a contract YAML exists for the current ticket, reviewers receive the contract alongside the implementation and must add the following checks to both review passes:

API surface compliance: Do the implemented public interfaces match the api_surface declarations in the contract? Check function signatures, class interfaces, endpoint shapes, parameter names/types, and return types. Any deviation from the contract's declared API surface is a blocking finding.
Checkable invariant satisfaction: Are all checkable invariants satisfied per their declared check_method?
- grep: verify the pattern match (or absence) in production code
- code-inspection: read and reason about code to confirm the invariant holds
- file-structure: check file existence/organization matches the constraint
Any unsatisfied checkable invariant is a blocking finding.
Testable invariant test existence: Does a test exist for each testable invariant, tagged with the correct test_tag (pattern: contract:<category>:<id>)? A missing tagged test is a blocking finding.
Test correctness: Do the tagged tests actually validate the invariant they claim to cover? A test that exists but does not meaningfully exercise the invariant (e.g., a trivially passing assertion, a test that tests something unrelated despite having the right tag) is a blocking finding.

Severity: All contract-related review findings are classified as blocking — the same severity as contract violations in the quality gate. Contract findings must be resolved before the task is marked complete.

Test Alignment Audit

After the implementer addresses Pass 2 findings, invoke crucible:test-coverage against the task's changes:

Code diff: git diff <pre-task-sha>..HEAD
Affected test files: test files touched or related to the task
Context: "Build task N: [task description]"

The test-coverage skill audits existing tests for staleness (wrong assertions, misleading descriptions, dead tests, coincidence tests) and handles its own fix dispatch and revert-on-failure logic. It returns a structured report. Note: the diff includes review fix commits — the audit agent should focus on behavioral changes to source files, not changes that only touch test files.

Skip this step if the task made no behavioral source changes (only .md, .json, config files).

Test Gap Writer

After test-coverage completes (or is skipped), dispatch a Test Gap Writer (Opus) using ./test-gap-writer-prompt.md:

Input: Pass 2 test reviewer's missing coverage findings + implementer's changes + test-coverage audit report (if available)
The test gap writer writes tests ONLY for gaps the reviewer identified — no scope creep. Before writing a new test for a flagged gap, verify no existing test already covers this path (it may have been updated by the test-coverage audit).
Tests should pass immediately (the behavior already exists from implementation)
The test gap writer reports per-test PASS/FAIL results (see prompt template for report format)
Commits new tests: test: fill coverage gaps for task N

If all tests PASS: Continue to adversarial tester.

If some tests FAIL (gaps reveal genuinely missing implementation):

Dispatch a fresh implementer (Opus) with the failing test(s), their failure messages, and the gap descriptions from the reviewer
Implementer fixes the missing behavior, then re-runs ALL test gap writer tests (not just the failures — catches regressions from the fix)
If all tests pass after fix: commit (fix: address test gap failures for task N), continue to adversarial tester
If tests still fail after one fix attempt: escalate to user with:
- Which coverage gaps the reviewer identified
- Which tests the gap writer wrote (per-test PASS/FAIL)
- What the implementer attempted to fix
- Which tests still fail and their current failure messages

Skip this step if the Pass 2 test reviewer reported zero missing coverage gaps.

Adversarial Tester

After the test gap writer completes (or is skipped), dispatch an Adversarial Tester (Opus) using skills/adversarial-tester/break-it-prompt.md:

Input: Full diff of the task's changes (git diff <pre-task-sha>..HEAD), project test conventions, cartographer module context (if available)
The adversarial tester identifies the top 5 most likely failure modes, writes one test per mode, and runs them
Outcome handling:
- All tests PASS: Implementation is robust. Log results and proceed to task complete.
- Some tests FAIL: Real weaknesses found. Dispatch implementer to fix. Re-run all tests (including adversarial). If pass → task complete. If fail → one more fix attempt, then escalate to user.
- Tests ERROR (won't compile): Adversarial tester mistake. Discard broken tests, log, proceed to task complete.
Quality bypass prevention: If the implementer's fix touches more than 3 files, route through a lightweight code review before completing.
Commit adversarial tests: test: adversarial tests for task N

Skip this step when:

The task diff contains no behavioral source files (only .md, .json, .yaml, .uss, .uxml)
No tests were written during implementation (pure scaffolding)

Iterative Review Loop

Each review pass (code and test) uses the iterative loop:

After fixes, dispatch a NEW fresh Reviewer (no anchoring to prior findings)
Track issue count between rounds
Strictly fewer issues → progress, loop again
Same or more issues → stagnation, escalate to user
Loop until clean
Architectural concerns → immediate escalation regardless of round
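The loop rules above can be sketched as a small driver. This is an illustration under stated assumptions: `review` and `fix` are hypothetical callables standing in for reviewer and implementer dispatches, and a review round is assumed to report an issue count plus an architectural-concern flag.

```python
from typing import Callable, Optional, Tuple

# A review round returns (issue_count, has_architectural_concern).
ReviewRound = Callable[[], Tuple[int, bool]]


def run_review_loop(review: ReviewRound, fix: Callable[[], None]) -> str:
    """Loop until clean; escalate on stagnation or any architectural concern."""
    previous: Optional[int] = None
    while True:
        # The caller is expected to dispatch a fresh reviewer on every call.
        issues, architectural = review()
        if architectural:
            return "escalate: architectural concern"
        if issues == 0:
            return "clean"
        if previous is not None and issues >= previous:
            # Same or more issues than the prior round counts as stagnation.
            return "escalate: stagnation"
        previous = issues
        fix()
```

Because each round must show strictly fewer issues than the last, the loop always terminates: the issue count either reaches zero or stops decreasing.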

Verification Gates

After each wave completes:

Run full test suite (not just current wave's tests)
Check compilation
Failures → identify which task caused regression before fixing
Clean → proceed to next wave

Refactor Mode: Phase 3 Changes

When in refactor mode, Phase 3 execution differs from feature mode in several ways.

Pre-Execution Coverage Check

Before the first task executes:

Run all contract tests from Phase 1 — confirm GREEN
Run the full test suite — confirm GREEN (pre-execution baseline)
Record the "baseline commit" SHA in /tmp/crucible-build-mode.md — this is the rollback target

Tiered Test Strategy

Running the full test suite after every atomic step is prohibitively expensive. Instead:

(a) After each atomic task: Run blast-radius tests + direct consumer tests only (tests identified in the impact manifest)
(b) After each execution wave: Run the full test suite (matches existing verification gate between waves)
(c) Full suite checkpoints: Pre-execution baseline and Phase 4 final verification always run the full suite

Coordinated-Atomic Execution

When the executor encounters a task marked atomic: true:

Record pre-task commit SHA
Implementer makes ALL changes (multiple files) — dispatch with ./refactor-implementer-addendum.md appended
Run blast-radius tests + direct consumer tests (per tiered strategy)
If GREEN: Commit all files together in a single commit
If FAIL: Revert ALL files to pre-task SHA. Dispatch one retry with a fresh implementer that receives the failure context and test output. If second attempt also fails, revert to pre-task SHA and escalate to user (see Rollback Policy below).
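The commit-or-revert flow for an atomic task might look like the sketch below. The four callables are hypothetical stand-ins for the implementer dispatch, the blast-radius test run, the single commit, and the revert to the pre-task SHA; none of these names come from the build skill itself.

```python
from typing import Callable, Optional


def run_atomic_task(
    implement: Callable[[Optional[str]], None],
    run_tests: Callable[[], Optional[str]],
    commit: Callable[[], None],
    revert: Callable[[], None],
) -> str:
    """At most two attempts; every failed attempt is reverted before moving on."""
    failure: Optional[str] = None
    for _attempt in range(2):
        implement(failure)     # the retry receives the previous failure output
        failure = run_tests()  # None means GREEN, otherwise the test output
        if failure is None:
            commit()           # one commit covering all changed files
            return "committed"
        revert()               # restore the pre-task SHA
    return "escalate"
```

Reverting inside the loop means the working tree is at the pre-task SHA both before the retry and at escalation time.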

Key difference from feature mode: Feature mode does RED-GREEN-REFACTOR. Refactor mode for atomic steps does GREEN-GREEN — tests are green before, tests must be green after. No RED phase because no new behavior is being added.

After a successful atomic commit (step 4), the rest of the per-task pipeline continues as normal: de-sloppify cleanup, two-pass review cycle, test alignment audit, test gap writer, and adversarial tester (unless skipped per restructuring-only annotation below).

Non-atomic refactoring tasks follow normal execution — structural changes that don't break intermediate states (e.g., extracting a private method, adding a module nothing imports yet). These use standard TDD if they introduce new abstractions, or GREEN-GREEN if they are pure restructuring.

Phase 3 Adaptations for Existing Steps

Adversarial tester: The planner annotates each task with restructuring-only: true/false. If restructuring-only: true, adversarial testing is skipped. Tasks with restructuring-only: false still get adversarial testing. When in doubt, default to false.
- restructuring-only: true examples: renames where all call sites are mechanically updated, file moves with updated paths, extract-method where the extracted method is private and preserves the original call signature
- restructuring-only: false examples: extract-class where callers must change call targets, splitting a module where consumers must update imports, any change where the consumer-facing API surface shifts
De-sloppify cleanup: Gains a new removal category: dead compatibility shims. After a refactoring task, look for leftover adapter code, re-export aliases, or compatibility layers introduced during migration but no longer referenced. Detection scope: code added after the baseline commit SHA that re-exports, aliases, or wraps symbols under old names, AND where no code outside the refactoring's changed files references the old names. String-based references: When the target was registered by name in a configuration system, flag the shim as UNCERTAIN and defer to the reviewer rather than removing it.

Refactoring Rollback Policy

Baseline Commit

The orchestrator records the baseline commit SHA before the first refactoring task executes (during pre-execution coverage check). Persisted in /tmp/crucible-build-mode.md.

Per-Task Rollback

When a single task fails after the executor's retry attempt:

Revert that task's changes to the pre-task commit SHA
Escalate to user with failure context and test output
User chooses: skip this task and continue (orchestrator also skips all tasks that depend on the skipped task, and informs the user which tasks were transitively skipped), retry with guidance, or revert all tasks to baseline
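When the user chooses to skip a task, the set of transitively skipped tasks is the set of everything reachable along reversed dependency edges. A minimal sketch, assuming the plan's dependencies are available as a mapping from task ID to the IDs it depends on (a hypothetical representation):

```python
from collections import deque


def transitively_skipped(skipped: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return every task that directly or indirectly depends on `skipped`."""
    # Invert the edges: for each task, which tasks depend on it?
    dependents: dict[str, set[str]] = {}
    for task, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(task)

    result: set[str] = set()
    queue = deque([skipped])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in result:
                result.add(child)
                queue.append(child)
    return result
```

The returned set is what the orchestrator would report back to the user alongside the skipped task.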

Full Rollback to Baseline

When the user chooses full rollback (or cascading failures make forward progress impossible):

Perform git reset --hard <baseline-SHA> to restore pre-refactoring state
Re-run all contract tests to confirm known-good state
Report what was reverted and why

Safe Partial States

The planner annotates tasks with safe-partial: true/false. A task is safe-partial: true if the codebase is in a valid, shippable state after that task completes (all tests green, no dangling references). When a later task fails, the orchestrator can offer to keep changes through the last safe-partial task.
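Finding the keep-point after a failure is a backwards scan over the completed tasks. A small sketch, assuming the safe-partial flags are held in execution order (the list representation is an assumption for illustration):

```python
from typing import Optional


def last_safe_partial(safe_flags: list[bool], failed_index: int) -> Optional[int]:
    """Index of the latest safe-partial task completed before the failure, or None."""
    for index in range(failed_index - 1, -1, -1):
        if safe_flags[index]:
            return index
    return None
```

A `None` result means no intermediate state is shippable, so the only offers left are retry or full rollback to baseline.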

Architectural Checkpoint

For plans with 10+ tasks, at ~50% completion or after a major subsystem:

Dispatch architecture reviewer using ./architecture-reviewer-prompt.md
Design drift → escalate to user
Minor concerns → adjust prompts for remaining tasks
All clear → continue

Noticed Reconciliation

After all implementers in Phase 3 report back and before writing the Phase 3 COMPLETE ledger entry, aggregate their ### Noticed But Not Touching sections into a single docs/plans/<YYYY-MM-DD>-<ticket-slug>-noticed.md artifact.

Scope discipline: Notice, do not act. If an implementer sees an out-of-scope issue during implementation, it must be logged under ### Noticed But Not Touching in their report — NOT fixed in their diff. Acting on noticed items in the same task is a scope-discipline failure. The orchestrator enforces this via reconciliation: noticed entries are surfaced here and converted to follow-up tickets later (see /finish).

7-step reconciliation process:

Collect each implementer's ### Noticed But Not Touching section from every Phase 3 implementer report.
Skip any section whose body is *(none)*.
Dedupe entries using the canonical dedupe key: sha256( normalize(file_path) + "|" + line_range + "|" + noticed[:40] ), where normalize(file_path) is the repo-relative POSIX path lowercased.
Sort the deduped entries by file path, then line range.
If any entries remain, write docs/plans/<YYYY-MM-DD>-<ticket-slug>-noticed.md matching the canonical filename regex ^docs/plans/\d{4}-\d{2}-\d{2}-[a-z0-9-]+-noticed\.md$. Use the date embedded in the sibling plan filename (not wall-clock date) so all sibling artifacts share a date; slug matches the ticket being built. Frontmatter and body must follow the Canonical Constants template exactly:
```
---
pipeline_id: "<build-YYYYMMDD-HHMMSS>"
date: "YYYY-MM-DD"
ticket: "#NNN"
---

# Noticed But Not Touching — <ticket-slug>

- **file:** `path:L<start>-L<end>`
  **noticed:** <desc>
  **why it matters:** <risk/opportunity>
  **suggested follow-up:** <optional>
```
Idempotent overwrite: If the target -noticed.md already exists (same-ticket re-run on the same date), merge the existing entries with the newly collected entries, run the full dedupe (same key), sort, and overwrite the file in one write. No append-mode; the on-disk file is always the full deduped set for that date+ticket.
Stage the -noticed.md file so it lands in the PR commit.

Skip the write entirely if zero entries remain after dedupe — do not produce an empty -noticed.md.
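The dedupe key, sort order, and filename check described above can be sketched as follows. Assumptions are labeled in the comments: entries are represented as dicts with hypothetical `file`, `lines`, and `noticed` keys, line ranges use the `L<start>-L<end>` form from the template, and the first occurrence of a duplicate is the one kept.

```python
import hashlib
import re
from pathlib import PurePosixPath

# Canonical filename regex, copied from the reconciliation step.
FILENAME_RE = re.compile(r"^docs/plans/\d{4}-\d{2}-\d{2}-[a-z0-9-]+-noticed\.md$")


def normalize(file_path: str) -> str:
    """Lowercased POSIX path. Assumes the input is already repo-relative."""
    return str(PurePosixPath(file_path.replace("\\", "/"))).lower()


def dedupe_key(file_path: str, line_range: str, noticed: str) -> str:
    """sha256(normalize(file_path) + "|" + line_range + "|" + noticed[:40])."""
    raw = normalize(file_path) + "|" + line_range + "|" + noticed[:40]
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def line_sort_key(line_range: str) -> tuple[int, int]:
    """Parse "L<start>-L<end>" so that L2 sorts before L10."""
    match = re.fullmatch(r"L(\d+)-L(\d+)", line_range)
    if match is None:
        raise ValueError(f"unexpected line range: {line_range!r}")
    return int(match.group(1)), int(match.group(2))


def reconcile(entries: list[dict]) -> list[dict]:
    """Dedupe by canonical key (first occurrence wins), then sort by path and range."""
    unique: dict[str, dict] = {}
    for entry in entries:
        key = dedupe_key(entry["file"], entry["lines"], entry["noticed"])
        unique.setdefault(key, entry)
    return sorted(
        unique.values(),
        key=lambda e: (normalize(e["file"]), line_sort_key(e["lines"])),
    )
```

For the idempotent-overwrite case, the same `reconcile` call can be fed the existing on-disk entries followed by the newly collected ones, and the result written in a single pass.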

Gate Ledger — Phase 3 Complete

After the last task wave's verification gate passes and all tasks are marked complete — but BEFORE the Phase 3→4 handoff — write Status: COMPLETE and Tasks: N/N complete to the Phase 3 ledger entry. If any task is in a retry/re-dispatch loop, COMPLETE is NOT written until retries resolve.

Phase Handoff: 3 → 4

Before running acceptance tests and code review, verify the gate ledger and write a handoff manifest:

Gate ledger check: Read build-gate-ledger.md and verify Phase 3 Status is COMPLETE. If not, follow Enforcement Rules.

Write the handoff manifest:

Write handoff-3-to-4.md with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 4: HEAD SHA (all tasks committed), design doc path, acceptance test paths (or contract tests), baseline SHA (for git diff scope), task summary (completed count, escalation outcomes)
- Decisions Carried Forward: accumulated decisions from Phases 1-3
- Active Constraints: constraints affecting completion review
- Shed Receipt: per-task review rounds, implementer context, wave verification details → task completion status in task list; per-task review details are shed
Emit shed statement: "Phase 3 context shed. Working code at HEAD, design doc, and acceptance tests on disk. Per-task implementation context, review rounds, and verification details are not carried forward."
Update ## Compression State in pipeline-status.md with manifest contents.
Do NOT emit a Compression State Block.
Session index event: Emit a phase_change event to the outbox: {"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 3 -> Phase 4 (Completion)","detail":{"skill":"build","from":"3","to":"4"}}.

Phase 4: Completion

Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, the temper, inquisitor, optional siege, quality-gate, forge, cartographer, and finish dispatches in this phase use the mock-dir substitution rule defined in the ## Mock Dispatch Mode (eval-gate) section. Local test-suite execution (pytest, etc.) runs normally — substitution applies only to subagent dispatches.

After all tasks complete:

Write Phase 4 IN_PROGRESS to the gate ledger (after Phase 3 COMPLETE verification).
Feature mode: Run acceptance tests from Phase 1 Step 3 — verify they PASS (GREEN). Refactor mode: Run all contract tests from Phase 1 — verify they PASS (GREEN).
- If any fail: implementation is incomplete. Identify what's missing, dispatch implementer to fix, re-run.
- If all pass: feature is verifiably done. Proceed.
Run full test suite (unit + integration)
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-temper" before dispatching code review. If the iterative review fix cycle introduces regressions, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:temper on full implementation (iterative until clean)
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-inquisitor" before dispatching inquisitor. If the inquisitor's fix cycle produces regressions, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:inquisitor on full implementation (dispatches 5 parallel dimensions against full feature diff)
- Input: git diff <base-sha>..HEAD where base-sha is the commit before Phase 3 execution began
- Runs after code review (obvious issues already fixed) and before quality gate (gate reviews final state)
- The inquisitor manages its own fix cycle internally — do not intervene unless it escalates
- See crucible:inquisitor for full process
Conditional: If the inquisitor's fix cycle produced any code changes, re-run crucible:temper scoped to the inquisitor fix commits only (git diff <pre-inquisitor-sha>..HEAD)
- This is NOT a full implementation re-review — scope it to only the fixer's changes
- Iterative until clean, same as step 3
- Skip if the inquisitor reported all PASS (no fixes were needed)
5.5. CONDITIONAL: Security review via crucible:siege

a. Contract check: If a contract YAML exists for this ticket with security_review.status: "required", siege is mandatory; skip to step (d).
b. Code scan: If no contract directive (or contract has security_review.status: "recommended" or field absent), scan for siege activation signals:
- Scan targets: design doc content + git diff <base-sha>..HEAD (changed file contents)
- Method: Case-insensitive keyword matching using the 7-category keyword lists from shared/security-signals.md
- Count distinct categories matched (one hit per category is sufficient)
c. Threshold evaluation:
- 0 signals: Skip siege silently. No narration needed.
- 1 signal: Log in narration: "1 security signal detected ([category]) — skipping siege. Invoke /siege --force manually if needed." Record in manifest and decision journal: security-review | choice=skip | reason=1 signal ([category]).
- 2+ signals: Proceed to step (d).
d. Dispatch siege:
- RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-siege" before dispatching siege. If siege's fix cycle produces regressions, this is the rollback target.
- Dispatch crucible:siege with:
  - Target: design doc + full implementation diff (artifact type: mixed)
  - deployment_context: from contract security_review.deployment_context if present, else unset (siege defaults to public)
- Narration: "Security signals detected: [list categories]. Dispatching siege."
- Decision journal: security-review | choice=dispatch | reason=[N] signals ([categories]) [or contract-required]
- Session index event: Emit to outbox: {"ts":"<now>","seq":0,"type":"security_review","summary":"Siege dispatched: [N] signals detected","detail":{"skill":"build","signals":[categories]}}
e. Blocking behavior: Siege iterates internally until zero Critical + zero High.
- If siege completes clean: continue to step 6 (quality-gate)
- If siege escalates (stagnation, user input needed): escalate to user with siege context
- If siege's fix cycle produced code changes: re-run crucible:temper scoped to siege fix commits only (git diff <pre-siege-sha>..HEAD). Same pattern as post-inquisitor conditional review at step 5.
f. Escape hatches: User can override automatic siege behavior:
- --force-siege — Dispatch siege regardless of signal count. Maps to siege's --force flag. Decision journal: security-review | choice=force-dispatch | reason=user --force-siege flag
- --skip-siege — Suppress siege even when signals/contract require it. Maps to siege's --skip flag. Decision journal: security-review | choice=force-skip | reason=user --skip-siege flag
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-impl-gate" before dispatching the implementation quality gate. If gate fix rounds degrade the code, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:quality-gate on full implementation (artifact type: "code"). Include in the dispatch context: Phase: code and PipelineID: <current PipelineID>. Iterates until clean or stagnation. (Non-negotiable; see Quality Gate Requirement.)
6b. Verify verdict marker and write Phase 4 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
RECOMMENDED SUB-SKILL: Use crucible:forge (retrospective mode) to capture what happened vs what was planned
7.5. Chronicle signal fallback: If forge retrospective was skipped (user declined, session ending), append a minimal chronicle signal directly:
- Read the metrics log at /tmp/crucible-metrics-<session-id>.log for duration and subagent counts
- Construct signal: v=1, ts=now, skill="build", outcome from acceptance test results, duration_m from metrics log, branch from git, files_touched from git diff <base-sha>..HEAD --name-only, metrics={mode, tasks count, tasks_passed count from task list, stagnation=false}
- Append as a single JSON line to ~/.claude/projects/<hash>/memory/chronicle/signals.jsonl
- If forge retrospective DID run, skip this step (forge Step 8.5 already emitted the signal)
RECOMMENDED SUB-SKILL: Use crucible:cartographer-skill (record mode) — persist any new codebase knowledge discovered during build
Compile summary: what was built, acceptance tests passing, review findings addressed, inquisitor findings, concerns
Report to user
10.5. Session index event: Emit a skill_end event to the outbox: {"ts":"<now>","seq":0,"type":"skill_end","summary":"/build complete: <outcome summary>","detail":{"skill":"build","outcome":"success|failure|escalated"}}.
REQUIRED SUB-SKILL: Use crucible:finish — skip finish's Step 2.5 (test-coverage) since test-coverage ran per-task in Phase 3, and skip finish's Step 3 (red-team) since quality-gate already ran at step 6. Tell finish to skip both.
Delete pipeline-active marker: Remove <scratch>/.pipeline-active. This signals that the pipeline completed successfully. If deletion fails (permissions, missing file), log a warning but do not fail the pipeline.
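The siege activation logic in step 5.5 (contract check, category scan, threshold) can be sketched as below. The real keyword lists live in shared/security-signals.md; the dictionary passed in here is a hypothetical stand-in, and plain substring matching is a simplification that may over-match (for example, a keyword embedded in a longer word). The `--force-siege` and `--skip-siege` overrides are deliberately left out.

```python
from typing import Optional


def count_signal_categories(text: str, keywords_by_category: dict[str, list[str]]) -> set[str]:
    """Return the categories with at least one case-insensitive keyword hit."""
    lowered = text.lower()
    return {
        category
        for category, keywords in keywords_by_category.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    }


def siege_decision(matched: set[str], contract_status: Optional[str] = None) -> str:
    """Contract 'required' wins; otherwise dispatch only at two or more categories."""
    if contract_status == "required":
        return "dispatch"
    if len(matched) >= 2:
        return "dispatch"
    return "skip"
```

Counting distinct categories rather than raw keyword hits is what keeps a single noisy topic from triggering a dispatch on its own.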

Session Metrics

Throughout the pipeline, the orchestrator appends timestamped entries to /tmp/crucible-metrics-<session-id>.log on each subagent dispatch and completion.

Dispatch measurement protocol: On every subagent dispatch, the orchestrator follows the enriched manifest protocol from shared/dispatch-convention.md:

Before dispatching: Measure the dispatch file size in characters. Record input_chars and model_tier in the manifest entry.
After dispatch returns: Measure the subagent response length in characters. Record output_chars and tool_calls (if available) in the manifest completion entry.

At completion (during Phase 4 step 9, before reporting to the user), read the metrics log and manifest, then compute:

-- Pipeline Complete ----------------------------------------
  Subagents dispatched:  23 (14 Opus, 7 Sonnet, 2 Haiku)
  Active work time:      2h 47m
  Wall clock time:       11h 13m
  Quality gate rounds:   4 (design: 2, plan: 1, impl: 1)
  Siege:                 dispatched (3 agents, 2 rounds, 0 Critical, 0 High) | skipped (0 signals) | skipped (1 signal: auth)
  Task tiers:            3 Tier 1, 3 Tier 2, 2 Tier 3
  Subagent savings:      ~21 dispatches skipped vs all-Tier-3
  Est. input tokens:     ~32,100 (128,400 chars)
  Est. output tokens:    ~20,500 (82,000 chars)
  Token estimate note:   Based on dispatch file sizes (chars/4). Actual consumption may vary +/-30%.
-------------------------------------------------------------

Metrics tracked:

Total subagents dispatched (by type and model tier: Opus/Sonnet/Haiku)
Active work time (merge overlapping parallel intervals — NOT naive sum)
Wall clock time (first dispatch to final completion)
Quality gate rounds (per gate: design, plan, implementation)
Siege status (dispatched with agent count, rounds, and final severity counts — or skipped with signal count and reason)
Estimated input tokens (sum of input_chars from manifest / 4)
Estimated output tokens (sum of output_chars from manifest / 4)
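Active work time requires merging overlapping dispatch intervals before summing, otherwise parallel subagents are double-counted. A minimal sketch, assuming each dispatch is reduced to a `(start, end)` pair in seconds (the metrics log's actual layout is not specified here):

```python
def active_work_seconds(intervals: list[tuple[float, float]]) -> float:
    """Total covered time after merging overlapping (start, end) intervals."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close out the previous merged span.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged span if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```

Two dispatches running over `(0, 10)` and `(5, 15)` contribute 15 seconds of active time, where a naive sum would report 20.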

Efficiency summary computation: Read manifest.jsonl from the dispatch directory. Sum input_chars and output_chars across all completed entries (skip nulls). Divide each by 4 for token estimates. Count dispatches grouped by model_tier. Include these in the pipeline completion report alongside existing metrics.
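The efficiency summary could be computed along these lines. The field names `input_chars`, `output_chars`, and `model_tier` come from the text above; treating every non-empty line of manifest.jsonl as one completed entry is an assumption of this sketch, as are the output key names.

```python
import json
from collections import Counter


def efficiency_summary(manifest_lines: list[str]) -> dict:
    """Sum character counts (nulls skipped), estimate tokens as chars / 4, count tiers."""
    input_chars = 0
    output_chars = 0
    tiers: Counter = Counter()
    for line in manifest_lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        input_chars += entry.get("input_chars") or 0    # null or missing counts as 0
        output_chars += entry.get("output_chars") or 0
        if entry.get("model_tier"):
            tiers[entry["model_tier"]] += 1
    return {
        "est_input_tokens": input_chars // 4,
        "est_output_tokens": output_chars // 4,
        "dispatches_by_tier": dict(tiers),
    }
```

With the sample report's figures, 128,400 input characters yield an estimate of 32,100 tokens and 82,000 output characters yield 20,500.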

Gate tracking verification: Before compiling the pipeline summary (Phase 4 Step 9), verify that all three gate categories (design, plan, implementation) show round count >= 1 with clean final rounds (0 Fatal, 0 Significant). If any gate was skipped with explicit user approval, record it as USER_SKIP in the metrics. A zero without user approval indicates a gate was dropped — report this in the summary.

Pipeline Decision Journal

Alongside the metrics log, maintain a decision journal at /tmp/crucible-decisions-<session-id>.log. Append a structured entry for every non-trivial routing decision:

[timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>

Decision types to capture:

reviewer-model — why Opus vs Sonnet for this reviewer
review-tier -- tier assignment read from plan, runtime escalation reason if applicable
gate-round — issue count, severity shifts, progress/stagnation per round
escalation — why the orchestrator escalated to user (and user's decision)
task-grouping — parallelism decisions for wave execution
cleanup-removal — what de-sloppify removed and accept/reject decision
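A journal entry in the format above can be produced with a small formatter. This is a sketch: the function name is hypothetical, and the ISO 8601 UTC timestamp is an assumption, since the journal format only says `[timestamp]`.

```python
from datetime import datetime, timezone
from typing import Optional


def format_decision(
    decision_type: str,
    choice: str,
    reason: str,
    alternatives: str = "none",
    timestamp: Optional[str] = None,
) -> str:
    """Render one decision-journal line."""
    if timestamp is None:
        # Assumed timestamp format: ISO 8601, UTC, second precision.
        timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return (
        f"[{timestamp}] DECISION: {decision_type} | choice={choice} "
        f"| reason={reason} | alternatives={alternatives}"
    )
```

Each returned line would then be appended to /tmp/crucible-decisions-<session-id>.log, one entry per routing decision.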

Escalation Triggers (Any Phase)

STOP and ask the user when:

Architectural concerns in plan or code review
Review loop stagnation (same or more issues after fixes — any phase)
Test suite failures not obviously fixable
Multiple teammates fail on different tasks
Teammate reports context pressure at 50%+ with significant work remaining
When escalating for regression or stagnation AND a checkpoint exists for the current phase boundary: include "A checkpoint from [reason] is available. Restore to pre-regression state?" in the escalation message.

Minor issues: Log, work around, include in final report.

What the Lead Should NOT Do

Implement code (dispatch implementers)
Read large files (spawn Haiku researcher)
Debug failing tests (dispatch implementer)
Make architectural decisions (escalate to user)

Context Management

One task per agent — always spawn a fresh implementer for each task. Never send a second task to a running agent via SendMessage. Reusing agents accumulates context and causes exhaustion.
"2-3 per subagent, ~10 files max" refers to plan design — group small steps into one task at planning time, not sequential dispatch to a running agent
Lead stays thin — coordination only
All important state on disk (plan files, task list)
Teammates report at 50%+ context usage
Lead compaction acceptable — task list is source of truth
Agent teams unavailable: If agent teams are not enabled, the lead dispatches tasks sequentially via the Agent tool. Task tracking still uses TaskCreate/TaskUpdate. The pipeline is slower but functionally identical.

Prompt Templates

./acceptance-test-writer-prompt.md — Phase 1 acceptance test generation
./prd-writer-prompt.md — Phase 1 PRD generation from design doc
./plan-writer-prompt.md — Phase 2 plan writer dispatch
./plan-reviewer-prompt.md — Phase 2 plan reviewer dispatch
./build-implementer-prompt.md — Phase 3 implementer dispatch
./build-reviewer-prompt.md — Phase 3 reviewer dispatch
./cleanup-prompt.md — Phase 3 de-sloppify cleanup dispatch
./test-gap-writer-prompt.md — Phase 3 test gap writer dispatch
./architecture-reviewer-prompt.md — Mid-plan checkpoint
./contract-test-writer-prompt.md — Phase 1 refactor-mode contract test generation
./refactor-implementer-addendum.md — Phase 3 refactor-mode implementer addendum (appended to build-implementer-prompt)

Red-team, innovate, adversarial tester, and inquisitor prompts live in their respective skills:

crucible:red-team — skills/red-team/red-team-prompt.md
crucible:innovate — skills/innovate/innovate-prompt.md
crucible:adversarial-tester — skills/adversarial-tester/break-it-prompt.md
crucible:inquisitor — skills/inquisitor/inquisitor-prompt.md

Quality Gate Orchestration

Build is the outermost orchestrator and controls all quality gates via crucible:quality-gate. Quality gate wraps crucible:red-team internally — do NOT invoke red-team separately at these points.

Gate points in the pipeline:

| Pipeline Stage | Artifact Type | Replaces |
|---------------|---------------|----------|
| Phase 1, Step 2 (after design) | design | Existing crucible:red-team on design |
| Phase 2, Step 3 (after plan review) | plan | Existing crucible:red-team on plan |
| Phase 4, Step 6 (after inquisitor + conditional re-review) | code | Existing crucible:red-team on implementation |

Code review (crucible:temper) and inquisitor (crucible:inquisitor) remain separate from the quality gate — temper does structured quality checks, inquisitor writes cross-component adversarial tests, and the quality gate does adversarial artifact review. All three serve distinct purposes.

Contract-Aware Quality Gate

When a contract YAML exists for the current ticket, the quality gate adds contract verification to its checks. This applies at all gate points (design, plan, and code), though most contract checks are only meaningful at the code gate (Phase 4, Step 6).

Version check: Before processing a contract, verify the version field is "1.0". If the version is missing or unrecognized, reject the contract with a clear error: "Contract version [X] is not supported. Expected version 1.0." Do not proceed with contract-aware checks — fall back to standard quality gate behavior without contract awareness.
Checkable invariant verification: For each checkable invariant in the contract, verify satisfaction using the declared check_method:
- grep — pattern match (or absence) in production code. Run the grep and confirm the result matches the invariant's verification description.
- code-inspection — read and reason about the relevant code to confirm the invariant holds (e.g., idempotency, no side effects).
- file-structure — check that file existence, location, or organization matches the constraint.
Testable invariant verification: For each testable invariant in the contract:
- Verify that a test tagged with the declared test_tag (pattern: contract:<category>:<id>) exists in the test suite.
- Verify that the tagged test passes when run.
- A missing or failing tagged test is a contract violation.
Contract violations are blocking issues. Contract violations are NOT warnings — they have the same severity as architectural concerns and must be resolved before the gate passes. The quality gate's iterative fix loop applies: dispatch fixes, re-check, track progress/stagnation as normal.
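As a rough illustration of the version check and the test-tag pattern above, the following sketch assumes the contract YAML has already been parsed into a dict whose version field is a string. The function names and the exact regular expression are hypothetical, not part of the contract schema.

```python
import re

SUPPORTED_VERSION = "1.0"
# Assumed concrete reading of the tag pattern contract:<category>:<id>:
# exactly three colon-separated parts, the last two non-empty.
TEST_TAG_RE = re.compile(r"^contract:[^:\s]+:[^:\s]+$")


def check_contract_version(contract: dict) -> tuple[bool, str]:
    """Return (ok, error_message) for an already-parsed contract."""
    version = contract.get("version")
    if version == SUPPORTED_VERSION:
        return True, ""
    shown = "<missing>" if version is None else version
    return False, f"Contract version {shown} is not supported. Expected version 1.0."


def is_valid_test_tag(tag: str) -> bool:
    """True when the tag has the contract:<category>:<id> shape."""
    return TEST_TAG_RE.match(tag) is not None
```

When check_contract_version returns False, the caller would surface the message and continue with the standard quality gate, without contract-aware checks.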

Red Flags

Skipping Compression State Block emission at checkpoint boundaries
Emitting a Compression State Block at a phase boundary (1→2, 2→3, 3→4) instead of writing a handoff manifest
Skipping the shed statement after a manifest write
Emitting a Compression State Block with stale or missing Key Decisions (decisions must be cumulative across all prior blocks)
Allowing the Goal field to drift across successive Compression State Blocks (must match original user request)
Exceeding 10 entries in the Key Decisions list without overflow-compressing the oldest
Skipping a REQUIRED quality gate because the task seems "small", "simple", or "trivial"
Self-assessing that a quality gate is unnecessary based on perceived task complexity
Rationalizing that quality-gate findings would be "minor" as justification to skip
Declaring a quality gate "done" after fixing findings without a clean verification round (fixing is not passing)
Short-circuiting the quality-gate iteration loop by assuming fixes are self-evidently correct
Interpreting general user feedback as approval to skip a quality gate that has not yet run — once a gate has run and presented findings to the user, the user's decision to proceed is authoritative. Pre-gate skip approval must be an unambiguous instruction specifically referencing the gate.
Treating session index summary as authoritative over CSB state (session index is supplementary narrative, CSB is authoritative state)

Integration

Required sub-skills:

crucible:design — Phase 1
crucible:finish — Phase 4
crucible:quality-gate — Iterative red-teaming at each quality gate point
crucible:red-team — Adversarial review engine (invoked by quality-gate)
crucible:innovate — Creative enhancement before quality gates
crucible:inquisitor — Full-feature cross-component adversarial testing (Phase 4, after temper, before quality-gate)

Recommended sub-skills:

crucible:forge — Feed-forward at Phase 1 start, retrospective at Phase 4 completion
crucible:cartographer-skill — Consult at Phase 1 start, load at Phase 3 dispatches, record at Phase 4
crucible:checkpoint — Shadow git checkpoints at pipeline boundaries (pre-design-gate, pre-plan-gate, pre-wave-N, pre-cleanup-task-N, pre-temper, pre-inquisitor, pre-impl-gate)

Recon/assay context: Inherits recon/assay context through /design (Phase 1). No direct dispatch. When design integrates recon, build benefits automatically. See #147 for rationale.

Phase 3 sub-skills (dispatched per-task):

crucible:test-coverage — Test alignment audit after each task's test quality review (staleness, dead tests, coincidence tests)

Implementer sub-skills:

crucible:test-driven-development — TDD within each task
crucible:source-driven-development — Detect → Fetch → Implement → Cite loop for non-trivial external API usage (≥ 5 LOC touching a detected framework); invoked by the implementer prompt's Source Consultation block. Recommended — skipped for pure internal refactors or trivial edits.

Contract consumption:

crucible:spec — Consumes contract YAML files produced by /spec (schema version 1.0). Contracts are read from docs/plans/*-contract.yaml and feed into pre-existing doc detection (Phase 1 Step 0), implementer dispatch (Phase 3), reviewer checks (Phase 3), and quality gate verification (all gate points). See crucible:spec/contract-schema.md for field definitions.

Build

Overview

All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.

End-to-end development pipeline: interactive design, autonomous planning with adversarial review, team-based execution with per-task code and test review. One command, idea to completion.

Announce at start: "I'm using the build skill to run the full development pipeline."

Mock Dispatch Mode (eval-gate)

Env-var contract. Three variables, all consumed only when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set:

CRUCIBLE_BUILD_EVAL_MOCK_DIR=<path> — directory of canned subagent return receipts. Filenames follow <seq>-<template-name>.md with fallback <template-name>.md (e.g. 1-plan-writer.md, then plan-writer.md). Missing mock → halt immediately with a clear error; no silent fallthrough.
CRUCIBLE_BUILD_EVAL_MODE=feature|refactor — pre-set answer to the Mode Detection prompt. When present, the orchestrator skips the AskUserQuestion call in Mode Detection and uses this value.
CRUCIBLE_BUILD_EVAL_USER_INPUT_DIR=<path> — directory of canned user-input turns, named turn-<N>.md. Each AskUserQuestion call (other than Mode Detection, which uses CRUCIBLE_BUILD_EVAL_MODE) consumes the next sequential turn. If the next turn-file is missing, halt before proceeding — this is the b4 fixture's design: build correctly stops when it needs input it does not have.

See also: skills/build/evals/README.md for harness usage; skills/shared/dispatch-convention.md for the dispatch-file protocol the substitution rule preserves.

Cairn (Layer 3)

The orchestrator maintains an Invariant Cairn per shared/cairn-convention.md. Build-specific bindings:

Phase mapping. The build pipeline's four phases (1 Design, 2 Plan, 3 Execute, 4 Completion) map 1:1 to cairn phases. Mid-phase sub-stages (e.g. Phase 3 Wave N, Phase 4 gate rounds) do NOT get their own cairn phase counter — they are internal to the owning phase and contribute a single LEDGER line when the owning phase completes.
Phase transitions. At every 1→2, 2→3, 3→4 transition, the orchestrator: (a) writes any correctness-critical phase-N invariants, (b) appends the phase-N LEDGER line with dispatches=/receipts=/verdict=, and (c) advances PHASE to phase N+1 in a single atomic Write. The phase handoff manifest (handoff-N-to-M.md) serves as input evidence for the invariants.
Terminal phase. Phase 4 sealing — after finish-skill completes and the pipeline-active marker is deleted. At terminal sealing, delete active-run.md; leave cairn-<run-id>.md in place.
Mandatory-invariant categories for build. Each phase-exit MUST capture:
- Design exit: the one-sentence architectural commitment, plus any RED-flag constraint surfaced by red-team that later phases must preserve.
- Plan exit: the task list's load-bearing dependencies (e.g. "Task 3 unblocks Tasks 5-7; T2 review tier"); any non-obvious refactoring risk.
- Execute exit: every noticed-not-touching entry that is correctness-critical for a future task or for post-merge review; every test-gap finding that the run chose to leave uncovered.
- Completion exit: acceptance-test outcome; siege dispatch decision + outcome; any skipped gate with Acknowledged: true.
Reconciliation on phase entry. The orchestrator runs the full Reconciliation Pass (5 rules) against receipt-ledger.jsonl and the in-context Tripwire Manifest. Rule 1 local-repair is authorized only for a trailing-receipts-in-current-phase LEDGER under-count.
Composition with Phase Handoff Manifest. The cairn and the existing Phase Handoff Manifest overlap in intent but not in scope: the handoff manifest is a per-transition snapshot of inputs for the next phase; the cairn is the cumulative load-bearing state across the whole run. Both are maintained; neither replaces the other. On Recovery Protocol invocation, the orchestrator reads the cairn first (authoritative for load-bearing state) then the most recent handoff manifest (authoritative for current-phase inputs).

Tripwire Manifest Sweep (Layer 2)

Manifest: After each Task return (post-lint), append one line to the in-context manifest:

<rcpt-sha256-prefix-12>  <skill>/<dispatch-id>  <verdict>  TRIPWIRE: <predicates>  [SUPERSEDED_BY=<prefix>]  [keys=<skill>:<k>:<v>,…]  [files=<path>:<h6>,…]

Sweep (the dispatch-loop clause): The orchestrator MAY NOT dispatch the next subagent until it has:

Applied Layer 1 two-tier linter to the just-returned receipt. Lint failure → re-dispatch, DO NOT sweep.
Appended the manifest entry.
Processed SUPERSEDES: — marked each cited predecessor SUPERSEDED_BY=<new-prefix>.
Evaluated self-checks (verdict=FAIL, exec-exit!=0, suspicion>=N-self) on the new receipt — no Read needed.
Evaluated forward-checks against every active (not SUPERSEDED_BY=*) prior manifest entry, over the union of that entry's TRIPWIRE and TRIPWIRE-CHILD predicate sets:
- claims-touch(glob) / wrote(glob) / read(glob) — path-glob match against the new receipt's TRACE or CLAIMS citations.
- suspicion>=N — new receipt's SUSPICION ≥ N.
- peer-dispatch-disagrees(<dim>) — same-skill, same-target, discriminator mismatch (evaluated via manifest keys=/files=; more= overflow → mandatory fire).
- always — fires unconditionally.
For each firing predicate on manifest entry M, Read M's full receipt from disk and narrate the re-read: "tripwire <predicate> on <M-prefix> fired from <new-prefix>; re-read M."
Only then dispatch the next subagent.
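The forward-check step can be sketched roughly as follows. This is a simplified illustration only: it collapses the receipt's TRACE and CLAIMS citations into one list of paths, omits peer-dispatch-disagrees, and represents manifest entries as hypothetical dicts (prefix, predicates, superseded_by) rather than parsing the manifest line format. Note that Python's fnmatch lets * match path separators, which may differ from the glob semantics the convention intends.

```python
import re
from fnmatch import fnmatch

_GLOB_PRED = re.compile(r"^(claims-touch|wrote|read)\((.+)\)$")
_SUSPICION_PRED = re.compile(r"^suspicion>=(\d+)$")


def predicate_fires(predicate: str, cited_paths: list[str], suspicion: int) -> bool:
    """Evaluate one forward-check predicate against the new receipt."""
    if predicate == "always":
        return True
    m = _SUSPICION_PRED.match(predicate)
    if m:
        return suspicion >= int(m.group(1))
    m = _GLOB_PRED.match(predicate)
    if m:
        return any(fnmatch(path, m.group(2)) for path in cited_paths)
    raise ValueError(f"unsupported predicate in this sketch: {predicate}")


def entries_to_reread(manifest: list[dict], cited_paths: list[str], suspicion: int):
    """Return (prefix, predicate) pairs for every firing predicate on active entries."""
    fired = []
    for entry in manifest:
        if entry.get("superseded_by"):
            continue  # superseded entries are not active and are skipped
        for predicate in entry["predicates"]:
            if predicate_fires(predicate, cited_paths, suspicion):
                fired.append((entry["prefix"], predicate))
    return fired
```

Each returned pair corresponds to one mandatory re-read of that entry's full receipt before the next dispatch.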

Mandatory-work declarations for build's subagent types (add to each dispatch template's ## Return Format section):

Implementer (feature): run-tests, apply-edits.
Implementer (refactor, atomic): run-blast-radius-tests, apply-edits.
Reviewer (code / test): read-artifact, emit-findings.
Cleanup agent: read-diff, emit-recommendation.
Plan writer / plan reviewer: read-design, emit-artifact.
Acceptance-test writer / test-gap writer / adversarial tester: run-tests, emit-tests.

Communication Requirement (Non-Negotiable)

Between every agent dispatch and every agent completion, output a status update to the user. This is NOT optional — the user cannot see agent activity without your narration.

Every status update must include:

Current phase — Which pipeline phase you're in
What just completed — What the last agent reported
What's being dispatched next — What you're about to do and why
Task checklist — Current status of all tasks (pending/in-progress/complete)

After compaction: If you just experienced context compaction, re-read the task list from disk and output current status before continuing. Do NOT proceed silently.

Examples of GOOD narration:

"Phase 3, Task 4 complete. Reviewer found 2 Important issues — dispatching implementer to fix. Tasks: [1] ✓ [2] ✓ [3] ✓ [4] fixing [5-8] pending"

"Phase 2 complete. Plan passed review with 0 issues on round 2. Dispatching innovate on the plan."

Pipeline Discipline (Non-Negotiable)

NEVER skip quality gate steps. Every artifact must pass its quality gate before proceeding to the next phase. No exceptions, no shortcuts.

If you find yourself about to skip a gate: STOP. Re-read this section. The gate exists because skipping it has caused real production incidents and hours of wasted time. Run the gate.

Anti-Rationalization Table — build

Gate Ledger Protocol

File location: ~/.claude/projects/<project-hash>/memory/build-gate-ledger.md

PipelineID Generation

At pipeline start, generate a PipelineID via date -u +build-%Y%m%d-%H%M%S. This ID:

Is persisted in the ledger header
Is passed to quality-gate invocations as pipeline_id
Is used by the enforcement hook to cross-check verdict markers
Is unique per build run (timestamp-based)
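For readers who want to reproduce the ID outside a shell, here is a minimal Python equivalent of the date -u +build-%Y%m%d-%H%M%S command. The function name make_pipeline_id is hypothetical, and the optional now argument exists only to make the sketch testable.

```python
from datetime import datetime, timezone
from typing import Optional


def make_pipeline_id(now: Optional[datetime] = None) -> str:
    """Format a timezone-aware datetime as build-YYYYMMDD-HHMMSS in UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    # astimezone normalizes any aware datetime to UTC before formatting.
    return now.astimezone(timezone.utc).strftime("build-%Y%m%d-%H%M%S")
```

Because the format has one-second resolution, uniqueness relies on no two build runs starting within the same second.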

Ledger Format

# Build Gate Ledger
Run: <ISO-8601 timestamp>
PipelineID: <build-YYYYMMDD-HHMMSS>
Goal: <user request>
Mode: <feature | refactor>

## Phase 1: Design
Status: NOT_STARTED

## Phase 2: Plan
Status: NOT_STARTED

## Phase 3: Execute
Status: NOT_STARTED

## Phase 4: Completion
Status: NOT_STARTED

Format constraints:

One key-value pair per line: Key: value
Fixed key names: Status, Gate, Artifact, Tasks, Reason, Acknowledged, PipelineID
Status values: NOT_STARTED, IN_PROGRESS, PASS, COMPLETE, FAIL, SKIPPED, INFERRED
Phase headers are ## Phase N: Name — always 4 phases, always in order
No prose, no paragraphs, no nested structure
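Because the format is one Key: value pair per line under fixed phase headers, a reader for it can stay very small. The sketch below is one possible parser, not the orchestrator's actual implementation; parse_ledger and the returned dict layout are assumptions made for illustration.

```python
import re

PHASE_HEADER = re.compile(r"^## Phase (\d): (.+)$")


def parse_ledger(text: str) -> dict:
    """Split ledger text into header fields and per-phase fields."""
    header: dict[str, str] = {}
    phases: dict[int, dict[str, str]] = {}
    current = header  # lines before the first phase header belong to the header
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "# Build Gate Ledger":
            continue
        match = PHASE_HEADER.match(line)
        if match:
            current = phases.setdefault(int(match.group(1)), {"Name": match.group(2)})
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            # The format allows no prose, so anything else is malformed.
            raise ValueError(f"not a 'Key: value' line: {line!r}")
        current[key] = value
    return {"header": header, "phases": phases}
```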

Ledger Initialization

Runs during build startup, after mode detection but before Phase 1 begins:

Check for existing ledger at canonical path
If found: run Run Isolation checks (see below)
If not found (or user chose "start fresh"): write new ledger including Run, PipelineID, Goal, and Mode header fields, then all four phases with Status: NOT_STARTED
The ledger MUST exist before Phase 1 transitions to IN_PROGRESS

Run Isolation

Stale detection prevents cross-run contamination:

Compaction recovery (same run): If pipeline-status.md Started timestamp matches the ledger's Run timestamp, this is the same build run recovering from compaction. Auto-resume without prompting.
New session with existing ledger: If the ledger exists but pipeline-status.md is missing or its Started timestamp doesn't match the ledger's Run, prompt: "Found existing ledger for '[goal]' (started [timestamp], Phase N [status]). Resume this run? [y/n]". On "no", archive the old ledger via Bash mv to build-gate-ledger-<old-timestamp>.md. If the target filename already exists, append a counter suffix (-2, -3, etc.).
No existing ledger: Create fresh.
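The archive naming with a counter suffix might look like the sketch below. Placing the suffix before the .md extension is an assumption, as is the function name archive_name; the section only states that -2, -3, etc. are appended when the target already exists.

```python
def archive_name(old_timestamp: str, existing: set[str]) -> str:
    """Pick an unused archive filename for the old ledger."""
    base = f"build-gate-ledger-{old_timestamp}"
    candidate = f"{base}.md"
    counter = 2  # the first collision gets the suffix -2
    while candidate in existing:
        candidate = f"{base}-{counter}.md"
        counter += 1
    return candidate
```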

Orphan Cleanup

Requires: Active PipelineID established (from Ledger Initialization + Run Isolation). This step runs AFTER the resume/fresh decision is resolved.

Timestamps and File Operations

Timestamps: Obtained via Bash date -u +%Y-%m-%dT%H:%M:%S (Bash is allowed for date commands that don't reference .claude/ paths)
Ledger archival (rename): Uses Bash mv since Write/Read/Edit/Glob have no rename capability
All other ledger operations (create, read, update): MUST use Write and Read tools, NOT Bash. This is a hard constraint due to .claude/ path restrictions.

Enforcement Rules

Before each phase transition, read build-gate-ledger.md and check the previous phase's status:

Gate check: If the previous phase's Status is NOT in {PASS, COMPLETE (Phase 3 only), SKIPPED with Acknowledged: true}, output:
```
PHASE GATE BLOCKED: Cannot start Phase N — Phase N-1 gate has not passed.
Current state: [status]
Run the quality gate on Phase N-1's artifact before proceeding.
```
This means INFERRED, IN_PROGRESS, FAIL, and NOT_STARTED all trigger BLOCKED.
Phase 1 exception: Phase 1 (Design) has no predecessor gate — it always starts.
Phase 3 exception: Phase 3 transitions to COMPLETE (not PASS) when all tasks are done and per-task code reviews pass. COMPLETE satisfies the gate requirement for Phase 4. No verdict marker is required for Phase 3.
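The gate check and its two exceptions reduce to a small predicate. The sketch below assumes the ledger has been read into a hypothetical mapping from phase number to its Key: value fields; the function names are illustrative.

```python
def previous_phase_unlocks(prev_phase: int, status: str, acknowledged: bool = False) -> bool:
    """True when the previous phase's ledger state allows the next phase to start."""
    if status == "PASS":
        return True
    if status == "COMPLETE":
        return prev_phase == 3  # COMPLETE only counts for Phase 3
    if status == "SKIPPED":
        return acknowledged  # a skip unlocks only with Acknowledged: true
    return False  # INFERRED, IN_PROGRESS, FAIL, NOT_STARTED all block


def can_start_phase(n: int, ledger: dict) -> bool:
    """ledger maps phase number -> dict of that phase's fields (assumed shape)."""
    if n == 1:
        return True  # Phase 1 has no predecessor gate
    prev = ledger.get(n - 1, {})
    return previous_phase_unlocks(
        n - 1,
        prev.get("Status", "NOT_STARTED"),
        prev.get("Acknowledged") == "true",
    )
```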

Verdict Marker Verification

After quality-gate returns with a verdict, verify the verdict marker before writing to the ledger:

Glob for verdict markers: ~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-*.md
Filter by PipelineID match — only markers with the current build's PipelineID
Sort by the Timestamp field value inside the marker file (parsed as ISO-8601), take the most recent
Verify: marker exists, Verdict is PASS, PipelineID matches current build's PipelineID
If verification passes: write PASS to the ledger with Gate timestamp and Artifact path
If verification fails:
- Normal flow (marker missing/mismatched after a just-run gate): do NOT write PASS. Output warning and re-invoke quality-gate on the same artifact.
- INFERRED recovery (PipelineID mismatch or missing marker on an INFERRED phase): prompt the user for the artifact path, then offer to run the gate or type SKIP GATE.
After writing the ledger entry, delete the verdict marker (it has served its purpose). This applies to all verdict outcomes — PASS, FAIL, STAGNATION, and ESCALATED markers are all deleted after the corresponding ledger entry is written. [PLAN ADDITION — extends the design doc's PASS-only deletion to all verdict outcomes for cleanliness.]
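The filter, sort, and verify steps can be sketched as below, assuming each marker file has already been read into a dict with the Timestamp, Verdict, and PipelineID fields named above. The function name is hypothetical, and the sketch assumes all timestamps share one ISO-8601 shape that datetime.fromisoformat accepts (for example 2026-04-13T15:00:00).

```python
from datetime import datetime


def latest_marker_passes(markers: list[dict], pipeline_id: str) -> bool:
    """True only if the most recent marker for this pipeline has Verdict PASS."""
    mine = [m for m in markers if m.get("PipelineID") == pipeline_id]
    if not mine:
        return False  # missing marker: do NOT write PASS
    # Sort key is the Timestamp field inside the marker, not the file's mtime.
    latest = max(mine, key=lambda m: datetime.fromisoformat(m["Timestamp"]))
    return latest.get("Verdict") == "PASS"
```

A False result corresponds to the verification-failed branch: in normal flow the gate is re-invoked on the same artifact rather than recording PASS.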

Skip Escape Hatch

If the user explicitly wants to bypass a gate:

Example of a SKIPPED phase in the ledger:

## Phase 2: Plan
Status: SKIPPED
Gate: 2026-04-13T15:00:00
Reason: User requested skip
Acknowledged: true

Confirmation protocol: [Default: option (a) — separate-turn required, matching the design doc's two-step flow. User may override to option (b) before implementation.]

The orchestrator outputs: "Gate skip requested. Type SKIP GATE to confirm. This will be logged."
The orchestrator halts execution and waits. The user's NEXT message must contain exactly SKIP GATE. A SKIP GATE token in the same message as the skip request does NOT satisfy the confirmation requirement.
The orchestrator writes Status: SKIPPED with Reason field to the ledger.

Phase 4 completion warning: If ANY prior phase has Status: SKIPPED, Phase 4 outputs a prominent warning listing all skipped gates before presenting finish options.

State Machine

Phase 1: Design
  NOT_STARTED → IN_PROGRESS (design skill starts)
  IN_PROGRESS → PASS (quality gate verdict marker verified)
  IN_PROGRESS → FAIL (quality gate escalates — stagnation/regression)
  FAIL → IN_PROGRESS (user directs re-work)
  * → SKIPPED (user types SKIP GATE — does NOT unlock next phase without acknowledgment)
  SKIPPED → IN_PROGRESS (user asks to run the gate retroactively)
  INFERRED → IN_PROGRESS (user runs gate after compaction recovery)
  INFERRED → SKIPPED (user types SKIP GATE after compaction recovery)

Phase 2: Plan
  NOT_STARTED → IN_PROGRESS (requires Phase 1 Status = PASS or SKIPPED+Acknowledged)
  [same transitions as Phase 1]

Phase 3: Execute (no quality gate — uses COMPLETE instead of PASS)
  NOT_STARTED → IN_PROGRESS (requires Phase 2 Status = PASS or SKIPPED+Acknowledged)
  IN_PROGRESS → COMPLETE (all tasks done, per-task reviews passed, verification gates green)
  IN_PROGRESS → FAIL (task failures, user escalation)
  FAIL → IN_PROGRESS (user directs re-work)
  * → SKIPPED (user types SKIP GATE)
  SKIPPED → IN_PROGRESS (user asks to run retroactively)
  Note: Phase 3 has no QG invocation. COMPLETE satisfies Phase 4's gate requirement.

Phase 4: Completion
  NOT_STARTED → IN_PROGRESS (requires Phase 3 Status = COMPLETE or SKIPPED+Acknowledged. PASS is unreachable for Phase 3.)
  IN_PROGRESS → PASS (quality gate verdict marker verified)
  IN_PROGRESS → FAIL (quality gate escalates)
  FAIL → IN_PROGRESS (user directs re-work)
  * → SKIPPED (user types SKIP GATE)
  SKIPPED → IN_PROGRESS (user asks to run retroactively)
  IN_PROGRESS includes: emit skip warnings if any prior phase SKIPPED
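The transitions listed above can be encoded as a lookup table. This is a sketch under two assumptions: the "* → SKIPPED" rule is read as "any state may move to SKIPPED", and the INFERRED transitions apply only to Phases 1 and 2, where they are listed. The names TRANSITIONS and is_legal are hypothetical.

```python
_COMMON = {
    ("NOT_STARTED", "IN_PROGRESS"),
    ("IN_PROGRESS", "FAIL"),
    ("FAIL", "IN_PROGRESS"),
    ("SKIPPED", "IN_PROGRESS"),
}
_INFERRED = {("INFERRED", "IN_PROGRESS"), ("INFERRED", "SKIPPED")}

TRANSITIONS = {
    1: _COMMON | _INFERRED | {("IN_PROGRESS", "PASS")},
    2: _COMMON | _INFERRED | {("IN_PROGRESS", "PASS")},
    3: _COMMON | {("IN_PROGRESS", "COMPLETE")},  # no quality gate, so no PASS
    4: _COMMON | {("IN_PROGRESS", "PASS")},
}


def is_legal(phase: int, old: str, new: str) -> bool:
    """Check a status change against the per-phase transition table."""
    if new == "SKIPPED":
        return True  # "* -> SKIPPED": allowed from any state
    return (old, new) in TRANSITIONS[phase]
```

The table checks only the shape of a transition. Preconditions such as the predecessor phase's status or a verified verdict marker still have to be enforced separately.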

Compaction Recovery (Ledger)

build-gate-ledger.md is on disk and survives compaction. Recovery precedence when state is partial:

Ledger exists, handoff manifest missing: Use ledger to determine which phase to resume from. Prompt: "Gate ledger shows Phase N passed, but the phase handoff context was lost. Confirm resume from Phase N+1?" If PASS but no handoff, also prompt for Phase N inputs (design doc path, plan path, etc.) before proceeding.
Handoff manifest exists, ledger missing: Reconstruct ledger from manifests. Mark the current phase as INFERRED (not PASS). Mark predecessor phases as PASS (handoff existence proves the boundary was crossed). Generate a new PipelineID and write it to the reconstructed ledger header. After writing, re-read the ledger header to extract the PipelineID into active state. INFERRED phases trigger the gate-blocked check — the orchestrator must run a fresh quality gate (with matching PipelineID) or the user must type SKIP GATE.
Both missing: Fresh start. Prompt user.

Quality Gate Requirement (Non-Negotiable)

Every quality gate in this pipeline MUST run to completion. This is NOT optional — you may NOT self-assess whether a quality gate is "needed" based on task size, complexity, or scope.

Quality gates are unconditional at all three gate points:

Phase 1, Step 2 — Design doc gate
Phase 2, Step 3 — Plan gate
Phase 4, Step 6 — Implementation gate

Common rationalizations that are NEVER valid reasons to skip:

"This is a small change"
"This is trivial / simple / straightforward"
"This is just a config change / documentation update / one-liner"
"The quality gate won't find anything on something this simple"
"I fixed the findings, so the gate is done" — fixing findings is NOT the same as passing the gate. The iteration loop must complete with a clean verification round (0 Fatal, 0 Significant on a fresh review). Fix agents introduce new issues or incompletely resolve old ones — that is why fresh-eyes re-review exists.

Pipeline Status

Write Triggers

Status File Format

The status file uses this structure (overwritten in full each time):

# Pipeline Status
**Updated:** <current timestamp>
**Started:** <timestamp from first write — persisted across compaction>
**Skill:** build
**Phase:** <current phase, e.g. "3 — Execute (Autonomous)">
**Health:** <GREEN|YELLOW|RED>
**Suggested Action:** <omit when GREEN; concrete one-sentence action when YELLOW/RED>
**Elapsed:** <computed from Started>

## Recent Events
- [HH:MM] <most recent event>
- [HH:MM] <previous event>
(last 5 events, newest first)

Skill-Specific Body

Append after the shared header:

## Task Progress
| # | Task | Tier | Status | Duration |
|---|------|------|--------|----------|
| 1 | Auth middleware | T3 | DONE | 12m |
| 2 | Route handlers | T2 | IN REVIEW (code, pass 1) | 18m+ |
| 3 | Database layer | T1 | PENDING | — |

## Quality Gates
- Design: PASSED (2 rounds)
- Plan: PASSED (1 round)
- Task tiers: 1x T1, 1x T2, 1x T3
- Code: not yet reached

## Checkpoints
- Last checkpoint: pre-wave-3 (12:45:30)
- Total checkpoints: 7
- Shadow repo: healthy

## Compression State
Goal: [original user request]
Key Decisions:
- [accumulated decisions, max 10]
Active Constraints:
- [constraints affecting remaining work]
Next Steps:
1. [immediate next action]
2. [subsequent actions]

Health State Machine

Health transitions are one-directional within a phase: GREEN -> YELLOW -> RED. Phase boundaries reset to GREEN.

Phase boundaries (reset to GREEN): Phase 1->2, 2->3, 3->4
YELLOW: review loop round 3+, quality gate round 5+, retry in progress
RED: escalation pending, stagnation detected, test suite failure unresolved

When health is YELLOW or RED, include **Suggested Action:** with a concrete, context-specific sentence (e.g., "Code review looping on Task 4. Check recent events for recurring patterns.").
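The one-directional rule can be expressed as taking the more severe of the current and proposed health, with a reset at phase boundaries. The function name next_health is a hypothetical helper for illustration.

```python
_ORDER = ("GREEN", "YELLOW", "RED")  # increasing severity


def next_health(current: str, proposed: str, at_phase_boundary: bool = False) -> str:
    """Health never improves within a phase; phase boundaries reset to GREEN."""
    if at_phase_boundary:
        return "GREEN"
    return max(current, proposed, key=_ORDER.index)
```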

Inline CLI Format

Output concise inline status alongside the status file write:

Minor transitions (dispatch, completion): one-liner, e.g. Phase 3 [4/8] Task 4 IN REVIEW (pass 1) | GREEN | 1h 12m
Phase changes and escalations: expanded block with --- separators
Health transitions: always expanded with old -> new health

Compaction Recovery

Read the rest of pipeline-status.md to recover Started timestamp and Recent Events buffer
Reconstruct phase, health, and skill-specific body from internal state files
If crucible:checkpoint was used: verify checkpoint availability by checking for the shadow repo at the computed path. Log available checkpoint count. Do not restore — just confirm checkpoints are recoverable.
Emit a Compression State Block into the conversation to seed the new context window with recovered state
4.5. Read session index summary (supplementary): If the CSB Scratch State contains a Session Index: path, or if globbing ~/.claude/projects/<hash>/memory/session-index/*/summary.md finds a recent file, read summary.md. Include the Activity Timeline, Files Modified, and Key Decisions sections in the post-compaction narration. If no session index exists, skip silently; this step is purely additive. If summary.md lacks detail for a specific event type (e.g., errors, decisions, file changes), use /recall to query events.jsonl with filters for targeted recovery.
Write the updated status file
Output inline status to CLI

Compression State Block

===COMPRESSION_STATE===
Goal: [original user request, one sentence]
Skill: [skill name]
Phase: [current phase identifier]
Health: [GREEN|YELLOW|RED]
Mode: [skill-specific mode if applicable, omit otherwise]

Progress:
- [completed milestone 1]
- [completed milestone 2]
- [current work in progress]

Key Decisions (this session):
- [DEC-1] [decision]: [reasoning, one line]
- [DEC-2] [decision]: [reasoning, one line]

Active Constraints:
- [constraint that affects remaining work]
- [constraint from prior phase that still applies]

Files Modified:
- [file path]: [what changed, one line]

Scratch State:
- Location: [scratch directory path]
- Session Index: [~/.claude/projects/<hash>/memory/session-index/<session-id>/ if active, omit if not]
- Recovery: [which files to read first, in order]

Next Steps:
1. [immediate next action]
2. [action after that]
3. [remaining work summary]
===END_COMPRESSION_STATE===

Rules:

Key Decisions list is capped at 10. When adding an 11th, compress the oldest low-impact decision into a single-line Progress entry annotated "[compressed from decisions]".
Each Compression State Block includes the FULL accumulated decision list, not just new decisions since the last block. Decisions accumulate across compressions.
Progress entries are cumulative — include all completed milestones, not just since the last block.
Files Modified lists only files changed since the last block emission. On first block of a session, list all files changed so far.
Goal must be the original user request verbatim or a faithful one-sentence paraphrase. Do not let it drift across compressions.
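The overflow rule for Key Decisions can be sketched as follows. Which decisions count as low-impact is a judgment the orchestrator makes; here it is modeled as a hypothetical low_impact flag on each decision dict, and falling back to the oldest decision when none is flagged is an assumption the rules do not spell out.

```python
def add_decision(decisions: list[dict], progress: list[str], new: dict, cap: int = 10):
    """Return new (decisions, progress) lists after adding one decision."""
    decisions = decisions + [new]  # copy, so the caller's list is untouched
    progress = list(progress)
    if len(decisions) > cap:
        # Oldest low-impact decision among the existing ones (never the new one).
        idx = next(
            (i for i, d in enumerate(decisions[:-1]) if d.get("low_impact")),
            0,  # assumed fallback: compress the oldest decision
        )
        victim = decisions.pop(idx)
        progress.append(f"{victim['text']} [compressed from decisions]")
    return decisions, progress
```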

Checkpoint Timing

Emit a Compression State Block into the conversation AND update the ## Compression State section in pipeline-status.md at these points:

Phase transitions: 1→2, 2→3, 3→4 — emit a Phase Handoff Manifest (see below) instead of a Compression State Block at these points
Phase 3 progress: After every 3 task completions
Quality gate entry/exit: Before first quality gate round dispatch and after gate completes (pass or escalation)
Escalations: Before any escalation to user
Health transitions: On any GREEN->YELLOW or YELLOW->RED transition

These triggers are a superset of the existing pipeline-status.md write triggers. The Compression State Block is emitted alongside (not instead of) the normal narration and status file write.

Phase Handoff Manifest

Format:

# Phase Handoff: N → M
**Timestamp:** ISO-8601
**Goal:** [original user request, verbatim]
**Mode:** feature | refactor

## Inputs for Phase M
- **[Input name]:** [disk path or inline value]

## Decisions Carried Forward
- [DEC-N] [decision]: [reasoning, one line]

## Active Constraints
- [constraint affecting remaining work]

## Shed Receipt
- [what was shed] → [where it lives on disk]

Rules:

After writing the manifest, emit an explicit shed statement: list what context is no longer needed, where it lives on disk, and that the orchestrator operates from manifest inputs only going forward.
After writing the manifest, update the ## Compression State section in pipeline-status.md with the manifest contents (Goal, Decisions, Constraints, and the Inputs as Next Steps). This ensures compaction recovery can reconstruct state even if the manifest is lost.
CSBs continue at all non-boundary checkpoint triggers (intra-phase progress, quality gate entry/exit, escalations, health transitions).
Backward compatibility: If a handoff manifest does not exist at a recovery point, fall back to CSB-based recovery (existing behavior).

Mode Detection

Before dispatching the design skill, determine whether this build is:

Feature mode (default) — adding new capability. Success = new acceptance tests pass.
Refactor mode — restructuring existing code. Success = existing behavior preserved + structural goals met.

Detection: If the user's intent is ambiguous, ask directly before proceeding:

"Is this adding new behavior, or restructuring existing code without changing what it does?"

The user's answer sets the mode for the entire pipeline. No special syntax needed.

Eval-gate pointer (Mock Dispatch Mode): if CRUCIBLE_BUILD_EVAL_MODE is set, use its value (feature or refactor) as the mode-detection answer and skip the AskUserQuestion call. The substitution rule lives in the ## Mock Dispatch Mode (eval-gate) section near the top of this file.

Mode Propagation

Propagate refactor mode to subagents through:

New refactor-specific prompt templates — contract-test-writer-prompt.md and refactor-implementer-addendum.md are standalone files used only in refactor mode. Select these instead of (or in addition to) the feature-mode equivalents.
Appended context blocks — For existing prompts that serve both modes (plan-writer-prompt.md, build-implementer-prompt.md), append a "Refactor Mode Context" section when composing the dispatch file. The templates remain flat markdown — the orchestrator decides what to include.
Scratch file for compaction recovery — Persist the current mode in /tmp/crucible-build-mode.md containing mode: refactor or mode: feature plus the baseline commit SHA. Only one build runs per session, so a well-known filename is sufficient.
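One way the mode scratch file could be written and read back is sketched below. The mode: line comes from the description above; the baseline_sha: key name and the helper names are assumptions, since the exact layout of the SHA line is not specified. The reader also applies the recovery rule that a missing file defaults to feature mode with a warning.

```python
from typing import Optional


def render_mode_file(mode: str, baseline_sha: str) -> str:
    """Produce the scratch file contents for /tmp/crucible-build-mode.md."""
    if mode not in ("feature", "refactor"):
        raise ValueError(f"unknown mode: {mode}")
    return f"mode: {mode}\nbaseline_sha: {baseline_sha}\n"  # key name is assumed


def parse_mode_file(text: Optional[str]):
    """Return (state, warnings); text is None when the file is missing."""
    if text is None:
        return (
            {"mode": "feature", "baseline_sha": None},
            ["mode file missing; defaulting to feature mode"],
        )
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    state = {"mode": fields.get("mode", "feature"), "baseline_sha": fields.get("baseline_sha")}
    return state, []
```

In refactor mode the caller would still need to verify that the recovered baseline commit SHA exists in the repository.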

Compaction Recovery

Read ## Compression State from pipeline-status.md to recover goal, decisions, constraints, next steps.
0.5. Check for handoff manifests (handoff-*-to-*.md) in the scratch directory. If the most recent manifest exists, use its Inputs and Mode to bootstrap recovery; this supersedes the mode file for phase-boundary state.
Read /tmp/crucible-build-mode.md — recover mode and baseline commit SHA.
If file is missing: Default to feature mode and warn.
If mode is refactor: Verify baseline commit SHA exists.
Read build-gate-ledger.md — if it exists, apply Gate Ledger Compaction Recovery (see Compaction Recovery subsection under Gate Ledger Protocol). Use the ledger's phase statuses to determine the resume point. If the ledger is missing but handoff manifests exist, reconstruct with INFERRED status.
After mode and ledger are recovered: Proceed with general state reconstruction (task list, phase, health).
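
The mode-file step of the recovery sequence above could be sketched as follows. This is a minimal sketch under stated assumptions, not the orchestrator's actual code: the source only says the file holds `mode: refactor` or `mode: feature` plus the baseline commit SHA, so the `baseline_sha:` key and the `recover_mode` helper are hypothetical names.

```python
from typing import List, Optional, Tuple


def recover_mode(text: Optional[str]) -> Tuple[str, Optional[str], List[str]]:
    """Return (mode, baseline_sha, warnings) from the mode file's contents.

    `text` is None when /tmp/crucible-build-mode.md is missing.
    The `baseline_sha` key name is an assumption made for this sketch.
    """
    warnings: List[str] = []
    if text is None:
        # Missing file: default to feature mode and warn.
        warnings.append("mode file missing; defaulting to feature mode")
        return "feature", None, warnings

    # Parse simple "key: value" lines; anything else is ignored.
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    mode = fields.get("mode", "feature")
    if mode not in ("feature", "refactor"):
        warnings.append(f"unknown mode {mode!r}; defaulting to feature mode")
        mode = "feature"

    sha = fields.get("baseline_sha") or None
    if mode == "refactor" and sha is None:
        # Refactor mode needs a rollback target; the caller must still
        # confirm the SHA exists in the repository.
        warnings.append("refactor mode without a baseline SHA")
    return mode, sha, warnings
```

The sketch only parses the file. Confirming that the recovered SHA still exists in the repository is left to the caller.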

Phase 1: Design (Interactive)

Step -1: Resume Detection and Pipeline-Active Marker

Before any design or dispatch work, check for a crashed prior pipeline:

Check <scratch>/.pipeline-active (where <scratch> is ~/.claude/projects/<hash>/memory/)
Not found: Write the pipeline-active marker (JSON with pipeline_id set to current session ID, skill set to "build", phase set to "1", start_time set to current ISO-8601 timestamp, scratch_dir set to the scratch directory path, dispatch_dir set to the dispatch directory path, branch from git branch --show-current, baseline_sha from git rev-parse HEAD). Proceed to Step 0.
Found, same pipeline_id as current session: This is a compaction recovery scenario. Follow existing compaction recovery procedures. Do not re-write the marker.
Found, different pipeline_id:
a. Branch guard: Compare the marker's branch field against the current git branch --show-current. If they differ, warn: "Previous build on branch [marker.branch] crashed at Phase [phase]. You are currently on [current-branch]. Switch to [marker.branch] before resuming? [switch+resume / start fresh / abort]". Do NOT offer resume on the wrong branch.
b. Read manifest.jsonl from the marker's dispatch_dir (or from the scratch directory copy if /tmp was lost).
c. Identify the last successful phase boundary by scanning manifest entries grouped by phase. A phase boundary is verified when all dispatches in that phase have status: "completed".
d. Present the resume option to the user:

"Previous build on branch [marker.branch] crashed at Phase [N], [context]. Resume from [last good boundary] ([checkpoint reason], [estimated time preserved] of work preserved)? [yes / no / fresh]"

e. User accepts: Invoke crucible:replay in resume mode, passing the scratch directory path. The replay skill handles checkpoint restore, state reconstruction, and re-dispatch. The build pipeline does not continue -- replay takes over.
f. User declines (fresh): Delete the stale .pipeline-active marker. Write a fresh marker with the current session. Proceed to Step 0 as a new pipeline run.

Marker updates during pipeline: Update the phase field in .pipeline-active at each phase boundary (1->2, 2->3, 3->4) to track progress for crash detection.

Marker cleanup: Delete .pipeline-active at Phase 4 step 12 (after finish skill completes).
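
The Step -1 branching and the phase-boundary scan could look roughly like this. It is a sketch under assumptions: the marker field names (`pipeline_id`, `branch`) come from the text above, but the manifest entry keys `phase` and `status`, the integer-like phase values, and both helper names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Iterable, Optional


def classify_marker(marker: Optional[dict], session_id: str, current_branch: str) -> str:
    """Map the .pipeline-active marker state to the Step -1 branch to take."""
    if marker is None:
        return "write-marker"          # Not found: write marker, go to Step 0
    if marker["pipeline_id"] == session_id:
        return "compaction-recovery"   # Same session: do not re-write the marker
    if marker["branch"] != current_branch:
        return "branch-mismatch"       # Warn; never offer resume on the wrong branch
    return "offer-resume"


def last_verified_phase(manifest_lines: Iterable[str]) -> Optional[str]:
    """Return the last phase, in order, whose dispatches are all "completed"."""
    by_phase = defaultdict(list)
    for line in manifest_lines:
        if line.strip():
            entry = json.loads(line)
            by_phase[str(entry["phase"])].append(entry["status"])

    last = None
    # Walk phases in numeric order and stop at the first unverified one.
    for phase in sorted(by_phase, key=int):
        if all(status == "completed" for status in by_phase[phase]):
            last = phase
        else:
            break
    return last
```

Stopping at the first unverified phase keeps the resume point conservative: a later phase is never treated as verified when an earlier one has incomplete dispatches.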

Compass Arc Emit (build orchestrator only — D14):

RESUME_DECISION is set by Step -1 to one of fresh / resume / fresh-restart. Default fresh if unset.

if [ "${RESUME_DECISION:-fresh}" != "resume" ]; then
  python scripts/compass.py update --field current_arc --value "#<ticket>: <user-goal-one-liner>" \
    || echo '[compass] emit failed at arc start; continuing build' >&2
fi

Error policy (best-effort): Compass is an optimization, not a correctness layer. A failed emit MUST NOT fail the build pipeline — log to stderr and continue. Never tighten this error handling.

D14 invariant: Sub-agents spawned inside build do NOT emit compass updates. This emit fires from the build orchestrator only, exactly once per fresh pipeline start.

Step 0: Pre-Existing Doc Detection

Before running interactive design, check whether /spec (or a prior /build run) already produced design artifacts for this ticket.

Scan for pre-existing spec docs: Search docs/plans/ for design docs (*-design.md) with a matching ticket field in YAML frontmatter. Also check for corresponding *-implementation-plan.md and *-contract.yaml files with the same ticket field.
Conflict detection: If multiple design docs match the same ticket field, escalate to user: "Found multiple design docs for ticket #NNN: [list files]. Which should I use?" Do not proceed until the user resolves the conflict.
Full match (design doc + implementation plan + contract all present):
- Skip interactive design (the Phase 1 design sub-skill below) — design doc already exists
- Security review check: If the contract contains security_review field, note it in the Phase 1→2 handoff manifest under Active Constraints: "Contract requires security review (security_review.status: [required|recommended]) — siege will be evaluated in Phase 4 Step 5.5." This ensures the directive survives phase handoffs and compaction recovery.
- Quality-gate the existing design doc with staleness context: "This design doc is pre-existing from /spec and may be stale — verify against current codebase state before proceeding"
- Staleness rejection: If the quality gate finds that the design doc references files, interfaces, or modules that no longer exist in the codebase, reject the doc as fundamentally stale. Fall back to running Phase 1 interactively. Inform user: "Pre-existing design doc for #NNN is fundamentally stale (references [specific items] that no longer exist). Running interactive design instead."
- If quality gate passes: Run Phase 2 on the pre-existing implementation plan — skip Plan Writer (plan already exists), but run Plan Reviewer + innovate + quality-gate on the existing plan. This ensures the plan gets the same review rigor as a freshly written plan.
- If quality gate fails (non-staleness issues): fix or escalate
- Proceed to Phase 3 when the plan passes review
Partial match (design doc present but implementation plan or contract missing):
- Use the existing design doc (quality-gate it as above, including staleness rejection)
- Run the missing phases normally: if no implementation plan, run Plan Writer in Phase 2; if no contract, proceed without contract awareness for this ticket
- Inform user which artifacts were found and which are being generated fresh: "Found pre-existing design doc for #NNN. Implementation plan is missing — will generate in Phase 2." (or similar)
Not found: Proceed with normal Phase 1 (interactive design below).
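
The four Step 0 outcomes can be summarized as a small classifier. This is an illustrative sketch: it assumes the frontmatter scan has already happened and produced a mapping from artifact kind to matching paths. The kind names and `classify_spec_artifacts` are hypothetical.

```python
from typing import Dict, List


def classify_spec_artifacts(found: Dict[str, List[str]]) -> str:
    """Classify pre-existing artifacts for one ticket.

    `found` maps an artifact kind ("design", "plan", "contract") to the
    paths whose frontmatter `ticket` field matched. The shape of this
    mapping is an assumption made for the sketch.
    """
    designs = found.get("design", [])
    if len(designs) > 1:
        return "conflict"        # escalate: user must pick one design doc
    if not designs:
        return "not-found"       # run normal interactive Phase 1
    if found.get("plan") and found.get("contract"):
        return "full-match"      # skip interactive design, review existing plan
    return "partial-match"       # reuse design doc, generate what is missing
```

Conflict detection runs first so that an ambiguous ticket never falls through to a full or partial match.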

Model: Opus (creative/architectural work needs the best model)
Mode: Interactive with the user
RECOMMENDED SUB-SKILL: Use crucible:forge (feed-forward mode) — consult past lessons before starting
RECOMMENDED SUB-SKILL: Use crucible:cartographer-skill (consult mode) — review codebase map for structural awareness
REQUIRED SUB-SKILL: Use crucible:design
Follow design skill for design refinement, section-by-section validation, and saving the design doc
OVERRIDE: When design completes and the design doc is saved, do NOT follow design's "Implementation" section (do not chain into planning or worktree from there). Return control to this build skill — Phase 2 handles planning with its own subagent-based approach.
Phase ends when user approves the design (says "go", "looks good", "proceed", etc.)
Everything after this point is autonomous — tell the user: "Design approved. Starting autonomous pipeline — I'll only interrupt for escalations."

Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, all Use crucible:<skill> and Dispatch a <kind> subagent invocations in Phase 1 (design, innovate, quality-gate, PRD writer, acceptance test writer, contract test writer) substitute a disk-read from the mock dir for the Task tool invocation. Each substitution follows the substitution rule in the ## Mock Dispatch Mode (eval-gate) section. AskUserQuestion calls in Phase 1 use CRUCIBLE_BUILD_EVAL_USER_INPUT_DIR per that same section.

Step 2: Innovate and Red-Team the Design

After the user approves the design and before starting Phase 2:

RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-design-gate" before dispatching innovate and quality-gate on the design doc.

Innovate: Dispatch crucible:innovate on the design doc. Plan Writer incorporates the proposal.
Write Phase 1 IN_PROGRESS to the gate ledger (after ledger initialization).
REQUIRED SUB-SKILL: Use crucible:quality-gate on the (potentially updated) design doc with artifact type "design". Include in the dispatch context: Phase: design and PipelineID: <current PipelineID>. Iterates until clean or stagnation. (Non-negotiable — see Quality Gate Requirement.)
If the quality gate requires changes, the Plan Writer updates the design doc and re-commits.
Verify verdict marker and write Phase 1 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
Design doc is now finalized — proceed to acceptance tests.

Step 2.5: Generate PRD

After the design doc is finalized (Step 2 complete), generate a stakeholder-facing PRD:

Dispatch a PRD Writer subagent (Sonnet) using ./prd-writer-prompt.md
- Input: finalized design doc
- Output: PRD in standard format (problem statement, user stories, requirements, scope, out-of-scope, success metrics, technical notes, dependencies)
Save to docs/prds/YYYY-MM-DD-<topic>-prd.md
Commit: docs: add PRD for [feature]

Step 3: Generate Acceptance Tests (RED)

Before planning, define "done" with executable tests:

Dispatch an Acceptance Test Writer subagent (Opus) using ./acceptance-test-writer-prompt.md
- Input: finalized design doc (especially acceptance criteria)
- Output: integration-level test file(s) that verify feature behavior end-to-end
Run the acceptance tests — verify they FAIL (the feature doesn't exist yet)
- If tests pass: something is wrong — investigate before proceeding
- If tests error (won't compile): this is expected in typed languages — note which tests exist and what they verify. They become the first implementation task.
Commit: test: add acceptance tests for [feature] (RED)

These tests define the feature-level RED-GREEN cycle that wraps the entire pipeline. The pipeline is done when these tests pass.

Refactor Mode: Phase 1 Changes

When in refactor mode, Phase 1 shifts from "what should we build?" to "what are we changing and what could break?"

Blast Radius Analysis

After the user describes the refactoring intent, the design phase runs these steps:

Identify the target — What code is being restructured? (module, interface, data representation, file organization, etc.)
Trace the blast radius using cartographer (if available) or fallback exploration:
- Direct consumers — code that imports/calls/references the target
- Indirect dependents — code that depends on consumers (transitive)
- Test coverage — which tests exercise the target behavior
- Configuration/wiring — DI registrations, config files, build scripts that reference the target
- Fallback when cartographer is unavailable: Use language-aware symbol search via agent exploration. Grep for symbol references (imports, type annotations, function calls) using language-specific patterns. The impact manifest's confidence field reflects reduced precision.
Present an impact manifest to the user:

### Impact Manifest

**Target:** [what's being restructured]
**Structural goal:** [what the code should look like after]

**Direct consumers:** N files
- path/to/consumer1.py (calls TargetClass.method)
- path/to/consumer2.py (imports TargetClass)

**Indirect dependents:** N files
- path/to/dependent.py (depends on consumer1)

**Test coverage:**
- N tests directly exercise target behavior
- N tests exercise consumers
- Gap: no tests cover [specific seam]

**Risk assessment:** [Low/Medium/High] based on consumer count and coverage gaps
**Confidence:** [High/Medium/Low] — High if cartographer used, Medium/Low if fallback

When confidence is Low, require explicit user confirmation before proceeding. The user must review the impact manifest and confirm the blast radius is complete.

Design the structural goal — what should the code look like after the refactoring? User validates the target state.

Acceptance Tests (Refactor Mode)

Instead of writing NEW acceptance tests (Step 3 above), the pipeline runs these steps:

Dispatch the contract test writer using ./contract-test-writer-prompt.md — a single agent handles gap identification AND gap filling. Input: impact manifest + blast radius file list. The agent maps existing tests to behavioral seams, identifies untested seams, and writes contract tests for each gap.
Run all contract tests GREEN — contract tests must pass before any refactoring begins.
If a contract test FAILS: The contract test writer investigates:
- Test defect (wrong assertion, bad setup) — fix the test and re-run
- Latent codebase bug — report to user with options: (a) fix the bug first, (b) exclude this seam and accept the risk, (c) abort the refactoring. Never silently drop a failing contract test.
Commit: test: add contract tests for [target] refactoring (GREEN — locking existing behavior)

Proportionality Escape Valve

Contract test writing must remain proportional to the refactoring scope. Trigger a scope check when any of these thresholds are hit:

Count threshold: More than 15 contract tests needed
Effort threshold: Contract test writer reports context pressure, or estimated total contract test LOC exceeds ~2x the estimated refactoring scope LOC

When triggered:

Present the full gap list to the user with estimated effort per gap
User selects which gaps to fill and which to accept as uncovered risk
Proceed with only user-selected contract tests

The impact manifest records which gaps the user chose to leave uncovered.
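
The two thresholds above reduce to a short predicate. This is a sketch, assuming the LOC estimates are available as plain integers. The source gives the effort ratio as approximate (~2x), so the hard `2 *` comparison is a simplification, and `needs_scope_check` is a hypothetical name.

```python
def needs_scope_check(
    contract_test_count: int,
    est_test_loc: int,
    est_refactor_loc: int,
    context_pressure: bool = False,
) -> bool:
    """Return True when contract test writing should pause for a scope check."""
    if contract_test_count > 15:
        return True                      # count threshold
    if context_pressure:
        return True                      # writer reported context pressure
    # Effort threshold: test LOC exceeds roughly twice the refactoring LOC.
    return est_test_loc > 2 * est_refactor_loc
```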

Phase Handoff: 1 → 2

Before dispatching the Plan Writer, verify the gate ledger and write a handoff manifest:

Gate ledger check: Read build-gate-ledger.md and verify Phase 1 Status is PASS. If not, follow Enforcement Rules.
Write handoff-1-to-2.md with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 2: design doc path, acceptance test paths (or contract tests in refactor mode), PRD path (if generated), conventions path (from cartographer, if loaded)
- Decisions Carried Forward: accumulated decisions from Phase 1
- Active Constraints: constraints affecting planning
- Shed Receipt: design iteration history, innovate proposals, quality gate round details → design doc on disk captures the outcome
Emit shed statement: "Phase 1 context shed. Design doc, acceptance tests, and PRD are on disk. Design iteration history, innovate proposals, and gate round details are not carried forward."
Update ## Compression State in pipeline-status.md with manifest contents.
Do NOT emit a Compression State Block (manifest replaces it at this boundary).
Session index event: Emit a phase_change event to the outbox: {"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 1 -> Phase 2 (Plan)","detail":{"skill":"build","from":"1","to":"2"}}.
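
For illustration, the phase_change event could be assembled like this. The JSON shape is taken from the text above. The ISO-8601 timestamp format for `<now>`, the `PHASE_NAMES` table, and the helper name are assumptions, and writing the line to the outbox is left to the session-index convention.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Phase labels as they appear in the summaries quoted in this document.
PHASE_NAMES = {"2": "Plan", "3": "Execute"}


def phase_change_event(src: str, dst: str, now: Optional[datetime] = None) -> str:
    """Build one JSON line describing a build phase transition."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    event = {
        "ts": ts,
        "seq": 0,
        "type": "phase_change",
        "summary": f"Build: Phase {src} -> Phase {dst} ({PHASE_NAMES[dst]})",
        "detail": {"skill": "build", "from": src, "to": dst},
    }
    return json.dumps(event)
```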

Phase 2: Plan (Autonomous)

Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, the Plan Writer, Plan Reviewer, innovate, and quality-gate dispatches in this phase use the mock-dir substitution rule defined in the ## Mock Dispatch Mode (eval-gate) section. The substitution does not change Phase 2's structure or the gate-ledger writes.

Step 1: Write the Plan

Dispatch a Plan Writer subagent (Opus):

Read the design doc produced in Phase 1 and the acceptance tests from Step 3
Write an implementation plan following the crucible:planning format
If acceptance tests couldn't compile (typed language), Task 1 should create the interfaces/stubs needed for them to compile and fail correctly
Include per-task metadata: Files (with count), Complexity (Low/Medium/High), Dependencies
Save to docs/plans/YYYY-MM-DD-<topic>-implementation-plan.md
Plan tasks should be scoped to 2-3 per subagent, ~10 files max (context budget awareness)

Use ./plan-writer-prompt.md template for the dispatch prompt.

Step 2: Review the Plan

Dispatch a Plan Reviewer subagent:

Reviewer model selection:

Plan touches 4+ systems or has 10+ tasks → Opus
Plan touches 1-3 systems with <10 tasks → Sonnet
When in doubt → Opus
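
As a sketch, the selection rule above maps to a few lines. The function name and the use of `None` to represent "in doubt" are assumptions.

```python
from typing import Optional


def select_plan_reviewer_model(systems: Optional[int], tasks: Optional[int]) -> str:
    """Pick the Plan Reviewer model from plan size; unknown inputs mean Opus."""
    if systems is None or tasks is None:
        return "opus"                    # when in doubt
    if systems >= 4 or tasks >= 10:
        return "opus"
    return "sonnet"                      # 1-3 systems with fewer than 10 tasks
```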

Review protocol (iterative):

Dispatch Plan Reviewer to check plan against design doc
If issues found: record issue count, dispatch Plan Writer to revise
Dispatch NEW fresh Plan Reviewer on revised plan (no anchoring)
Compare issue count to prior round:
- Strictly fewer issues → progress, loop again
- Same or more issues → stagnation, escalate to user with findings from both rounds
Loop until plan passes with no issues
Architectural concerns bypass the loop — immediate escalation regardless of round
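
The progress/stagnation rule in the review protocol can be sketched as a loop over issue counts. This is a minimal sketch: `review` and `revise` are hypothetical callables standing in for the fresh Plan Reviewer and Plan Writer dispatches, and the architectural-concern bypass is not modeled.

```python
from typing import Callable, List, Optional


def run_review_loop(
    review: Callable[[], List[str]],
    revise: Callable[[List[str]], None],
) -> str:
    """Loop review -> revise until clean ("pass") or no progress ("stagnation")."""
    previous: Optional[int] = None
    while True:
        issues = review()                # a fresh reviewer every round
        if not issues:
            return "pass"
        if previous is not None and len(issues) >= previous:
            return "stagnation"          # same or more issues: escalate to user
        previous = len(issues)
        revise(issues)                   # strictly fewer (or first round): revise
```

Because each continuing round must report strictly fewer issues than the last, the loop always terminates without a separate round limit.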

Use ./plan-reviewer-prompt.md template for the dispatch prompt.

Step 3: Innovate and Red-Team the Plan

After the plan passes review:

RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-plan-gate" before dispatching innovate and quality-gate on the plan.

Write Phase 2 IN_PROGRESS to the gate ledger.
Innovate: Dispatch crucible:innovate on the approved plan. Plan Writer incorporates the proposal into the plan.
REQUIRED SUB-SKILL: Use crucible:quality-gate on the (potentially updated) plan with artifact type "plan". Include in the dispatch context: Phase: plan and PipelineID: <current PipelineID>. Provide the plan and design doc as context. (Non-negotiable; see Quality Gate Requirement.)
Verify verdict marker and write Phase 2 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.

The quality gate handles the iterative red-team loop — fresh review each round, weighted stagnation detection, 15-round safety limit, escalation. See crucible:quality-gate for details.

Phase Handoff: 2 → 3

Write a handoff manifest:

Write handoff-2-to-3.md with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 3: plan path, design doc path, acceptance test paths (or contract tests), contract YAML path (if exists), baseline SHA (current HEAD), cartographer context paths (module files, conventions.md, landmines.md)
- Decisions Carried Forward: accumulated decisions from Phases 1-2
- Active Constraints: constraints affecting execution
- Shed Receipt: plan review iterations, innovate proposals, quality gate round history → plan on disk captures the outcome
Emit shed statement: "Phase 2 context shed. Plan, design doc, and acceptance tests are on disk. Plan review rounds, innovate proposals, and gate details are not carried forward."
Update ## Compression State in pipeline-status.md with manifest contents.
Do NOT emit a Compression State Block.
Session index event: Emit a phase_change event to the outbox: {"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 2 -> Phase 3 (Execute)","detail":{"skill":"build","from":"2","to":"3"}}.

Phase 3: Execute (Autonomous, Team-Based)

Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, all per-task dispatches in this phase (implementer, reviewer, cleanup, test-coverage, test-gap-writer, adversarial-tester, architecture-reviewer) use the mock-dir substitution rule defined in the ## Mock Dispatch Mode (eval-gate) section. TeamCreate and TaskCreate calls run normally — only the Task/Agent tool invocations on teammates are substituted.

Step 0: Load Module Context for Subagents

RECOMMENDED SUB-SKILL: Use crucible:cartographer-skill (load mode) — when dispatching implementers and reviewers, include relevant module files, conventions.md, and landmines.md in their dispatch files
Defect signature loading (for implementers only):
1. Glob defect-signatures/*.md (excluding *.non-matches.md) from the cartographer storage directory
2. For each signature, read its Modules field and match against the task's target modules:
  - Read each cartographer module file's Path: field
  - A task's file is in a module if the file path starts with the module's Path: value
  - When a task spans multiple modules, load signatures for all matched modules
  - Directory prefix fallback: When no cartographer modules exist, match if any target file path starts with any of the signature's Modules directory prefixes
3. For matching signatures, validate all file paths still exist on disk — drop stale entries silently
4. Inject into the [DEFECT_SIGNATURES] section of build-implementer-prompt.md:
  - Generalized pattern (always)
  - Confirmed siblings list (always)
  - Unresolved siblings list (always — these are known live defects; produces a stronger warning)
  - Non-match companion files are NOT loaded for implementers
5. Last loaded update: Loading is pure-read. After all implementer dispatches for the current phase complete, batch-update the Last loaded field to today on all signatures that were loaded. Do NOT update during dispatch — defer to after all subagents are dispatched.
Grudge pre-flight (regression-oracle, #271): Before dispatching implementers, query the Book of Grudges for each task's in-scope files and inject any matches into that implementer's dispatch file as a hard DO NOT REPEAT constraint (sibling to defect-signature loading). Resolve the helper by absolute path from the plugin root — plugin_root="$(realpath "<this-skill-base-dir>/../..")" — and run python3 "$plugin_root/scripts/grudge_query.py" <task files…>; non-empty output lists past regressions held against those files. Best-effort: if the helper is unresolved, emit a one-line stderr warning and continue — a missing pre-flight must NEVER block the build. See skills/grudge/SKILL.md.
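
The module-matching rule for defect signatures (path-prefix match, with the directory-prefix fallback) might be sketched as below. The input shapes are assumptions: `module_paths` maps a cartographer module name to its `Path:` value, and `signatures` maps a signature name to the entries in its Modules field. Stale-path validation and prompt injection are not shown.

```python
from typing import Dict, List


def matched_signatures(
    task_files: List[str],
    module_paths: Dict[str, str],
    signatures: Dict[str, List[str]],
) -> List[str]:
    """Return the names of signatures relevant to a task's files."""
    if module_paths:
        # A file belongs to a module when its path starts with the module's Path.
        hit_modules = {
            name
            for name, prefix in module_paths.items()
            if any(f.startswith(prefix) for f in task_files)
        }
        return sorted(
            sig for sig, mods in signatures.items() if hit_modules & set(mods)
        )
    # Fallback without cartographer modules: treat Modules entries as
    # directory prefixes and match them against the target files directly.
    return sorted(
        sig
        for sig, mods in signatures.items()
        if any(f.startswith(m) for f in task_files for m in mods)
    )
```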

Step 0.5: Gate Ledger — Phase 3 Start

Write Phase 3 IN_PROGRESS to the gate ledger (after Phase 2 PASS verification).

Step 1: Create Team and Task List

Create a team using TeamCreate:

team_name: "build-<feature-name>"
description: "Building <feature description>"

Read the approved plan. Create tasks via TaskCreate for each plan task, including:

Subject from plan task title
Description with full plan task text (subagents should never read the plan file)
Dependencies via TaskUpdate with addBlockedBy

Agent Teams Fallback

If TeamCreate fails (agent teams not available), output a clear one-time warning:

⚠️ Agent teams are not available.
Recommended: set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
Falling back to sequential subagent dispatch via Agent tool.

Then fall back to sequential subagent dispatch via the regular Task tool (without team_name). Everything still works — independent tasks run sequentially instead of in parallel via teammates.

What changes in fallback mode:

Tasks are dispatched via Agent tool instead of as teammates
Independent tasks that would run in parallel now run sequentially
Task tracking still uses TaskCreate/TaskUpdate for state management
All other pipeline behavior (TDD, review, de-sloppify, quality gates) is unchanged

Step 2: Analyze Dependencies and Execution Order

Before dispatching:

Map the dependency graph from plan task metadata
Identify independent tasks (no shared files, no sequential dependencies)
Group into execution waves — independent tasks parallel, dependent tasks sequential
Assess complexity per task for reviewer model selection
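
One way to group tasks into execution waves is sketched below: a task joins a wave only when its dependencies are finished and it shares no files with tasks already in that wave. The task representation (`deps` and `files` sets) and the `plan_waves` name are assumptions.

```python
from typing import Dict, List, Set


def plan_waves(tasks: Dict[str, Dict[str, Set[str]]]) -> List[List[str]]:
    """Group tasks into waves of mutually independent work.

    `tasks` maps a task name to {"deps": set of task names,
    "files": set of file paths}.
    """
    remaining = dict(tasks)
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        wave: List[str] = []
        wave_files: Set[str] = set()
        for name in sorted(remaining):
            task = remaining[name]
            deps_met = task["deps"] <= done
            shares_files = bool(task["files"] & wave_files)
            if deps_met and not shares_files:
                wave.append(name)
                wave_files |= task["files"]
        if not wave:
            # Nothing is runnable: a cycle or a dependency on an unknown task.
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(wave)
        done.update(wave)
        for name in wave:
            del remaining[name]
    return waves
```

Tasks that touch the same file are pushed to a later wave even without a declared dependency, which matches the "no shared files" condition for independence.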

Step 3: Execute Tasks

For each task (or wave of parallel tasks):

Mark task in_progress via TaskUpdate
Spawn Implementer teammate (Opus) via Task tool with team_name and subagent_type="general-purpose"
- Use ./build-implementer-prompt.md template
- Pass full task text, file paths, project conventions
- Contract-aware dispatch (when a contract exists for this ticket): Include the contract YAML alongside the design doc and task description. See "Contract-Aware Implementer Guidance" below.
- Implementer follows TDD, writes tests, runs tests, commits, self-reviews
When Implementer reports completion, run De-Sloppify Cleanup (see below)
After cleanup completes, spawn Reviewer teammate
- Use ./build-reviewer-prompt.md template
Tier-aware review routing: Read the task's Review-Tier from plan metadata.
- Tier 1: Dispatch single-pass code reviewer (Sonnet). If Clean or Minor-only: task complete. If Critical/Important: dispatch implementer fix, then task complete. If Architectural Concern: escalate.
- Tier 2: Dispatch iterative code review (per existing loop). Then dispatch single-pass test reviewer. If test review surfaces Critical findings, escalate to Tier 3. Then dispatch adversarial tester (per existing logic). Task complete.
- Tier 3: Follow current full pipeline (no changes to existing flow).

Contract-Aware Implementer Guidance

api_surface declarations are binding. The implementer must match the declared function signatures, class interfaces, endpoint shapes, parameter names, types, and return types exactly. Deviations from the contract's API surface are implementation errors.
checkable invariants are binding. The implementer must satisfy all declared constraints (e.g., "must not import X", "must be idempotent"). The check_method field (grep, code-inspection, file-structure) indicates how the quality gate will verify compliance — the implementer should self-check against these before committing.
testable invariants require tagged tests. For each testable invariant, the implementer must write a test tagged with the declared test_tag (pattern: contract:<category>:<id>) that validates the invariant. These tests are checked by the quality gate and reviewers — they must exist and pass.
integration_points are informational. These indicate which other components and contracts this ticket interacts with. The implementer should be aware of referenced components and ensure compatibility, but integration points are not binding constraints — they provide context for making good implementation decisions.

De-Sloppify Cleanup

After the implementer reports completion and before dispatching the reviewer:

RECOMMENDED: Use crucible:checkpoint — create checkpoint with reason "pre-cleanup-task-N" before dispatching the cleanup agent. If cleanup removes something needed, restore to this checkpoint.

Record the pre-cleanup commit SHA
Dispatch a fresh Cleanup Agent (Opus) using ./cleanup-prompt.md
- Input: git diff <pre-task-sha>..HEAD (the implementer's committed changes)
- The orchestrator provides the pre-task commit SHA to the cleanup agent
Cleanup agent reviews changes, removes unnecessary code (see allowlist), runs tests
If cleanup made changes, commits separately: refactor: cleanup task N implementation
If cleanup found nothing to remove, reports "No cleanup needed" and proceeds

Reviewer Model Selection (Lead Decides Per-Task)

Two-Pass Review Cycle

Each task gets TWO review passes before completion:

digraph review {
  "Implementer builds + tests" -> "De-sloppify cleanup";
  "De-sloppify cleanup" -> "Pass 1: Code Review";
  "Pass 1: Code Review" -> "Implementer fixes code findings";
  "Implementer fixes code findings" -> "Pass 2: Test Quality Review";
  "Pass 2: Test Quality Review" -> "Implementer fixes test findings";
  "Implementer fixes test findings" -> "Test Alignment Audit (crucible:test-coverage)";
  "Test Alignment Audit (crucible:test-coverage)" -> "Test Gap Writer";
  "Test Gap Writer" -> "Adversarial Tester";
  "Adversarial Tester" -> "Task complete";
}

Pass 1: Code Review. Checks architecture, patterns, correctness, and wiring (is the code actually connected, not merely present?).

Review Tier Routing

Each task's Review-Tier (from the plan) determines which review steps execute. Phase 4 full-implementation gates are NOT affected by per-task tiers.

Tier 2 "iterative" code review: Same as current behavior -- fresh reviewer each round, track issue count, loop until clean or stagnation.

Runtime Tier Escalation

The orchestrator may escalate a task's review tier during execution. Escalation is one-directional (up only).

Triggers:

Implementer reports unexpected complexity or cross-system interaction not anticipated in the plan
Single-pass reviewer (Tier 1 code review or Tier 2 test review) reports Critical findings
Implementer touches significantly more files than the plan specified

Process:

Log escalation to decision journal: [timestamp] DECISION: review-tier | choice=escalate T1->T2 | reason=<trigger> | alternatives=none
Execute the additional review steps for the new tier (from the point where the current tier's pipeline diverges)
Update the task status display to show the escalated tier
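
The up-only rule and the journal line format could be captured as follows. The log format is copied from the process step above. The ISO-8601 timestamp, the helper name, and representing tiers as integers 1 to 3 are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def escalate_review_tier(
    current: int, target: int, reason: str, now: Optional[datetime] = None
) -> Tuple[int, str]:
    """Return the new tier and the decision-journal line for an escalation."""
    if not (1 <= current < target <= 3):
        # Escalation is one-directional: tiers only move up, and stop at 3.
        raise ValueError("review tier can only be escalated upward, to at most 3")
    ts = (now or datetime.now(timezone.utc)).isoformat()
    entry = (
        f"[{ts}] DECISION: review-tier | choice=escalate T{current}->T{target} "
        f"| reason={reason} | alternatives=none"
    )
    return target, entry
```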

Contract-Aware Reviewer Guidance

When a contract YAML exists for the current ticket, reviewers receive the contract alongside the implementation and must add the following checks to both review passes:

API surface compliance: Do the implemented public interfaces match the api_surface declarations in the contract? Check function signatures, class interfaces, endpoint shapes, parameter names/types, and return types. Any deviation from the contract's declared API surface is a blocking finding.
Checkable invariant satisfaction: Are all checkable invariants satisfied per their declared check_method?
- grep: verify the pattern match (or absence) in production code
- code-inspection: read and reason about code to confirm the invariant holds
- file-structure: check file existence/organization matches the constraint
Any unsatisfied checkable invariant is a blocking finding.
Testable invariant test existence: Does a test exist for each testable invariant, tagged with the correct test_tag (pattern: contract:<category>:<id>)? A missing tagged test is a blocking finding.
Test correctness: Do the tagged tests actually validate the invariant they claim to cover? A test that exists but does not meaningfully exercise the invariant (e.g., a trivially passing assertion, a test that tests something unrelated despite having the right tag) is a blocking finding.
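
The tagged-test existence check can be sketched as a set comparison. This is illustrative only: it assumes each testable invariant is available as a dict with a `test_tag` key and that the tags found in the test suite were collected beforehand. How tags are attached to tests is not specified here, and the regex is one reading of the `contract:<category>:<id>` pattern.

```python
import re
from typing import Dict, Iterable, List

# One interpretation of contract:<category>:<id>; both parts must be non-empty.
TAG_RE = re.compile(r"^contract:[^:\s]+:[^:\s]+$")


def missing_tagged_tests(
    invariants: Iterable[Dict[str, str]], tags_in_tests: Iterable[str]
) -> List[str]:
    """Return the test tags declared by the contract that no test carries."""
    present = set(tags_in_tests)
    missing: List[str] = []
    for invariant in invariants:
        tag = invariant["test_tag"]
        if not TAG_RE.match(tag):
            raise ValueError(f"malformed test_tag: {tag!r}")
        if tag not in present:
            missing.append(tag)          # each one is a blocking finding
    return missing
```

This only covers existence. Whether a tagged test meaningfully exercises its invariant still needs a reviewer's judgment.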

Test Alignment Audit

After the implementer addresses Pass 2 findings, invoke crucible:test-coverage against the task's changes:

Code diff: git diff <pre-task-sha>..HEAD
Affected test files: test files touched or related to the task
Context: "Build task N: [task description]"

Skip this step if the task made no behavioral source changes (only .md, .json, config files).

Test Gap Writer

After test-coverage completes (or is skipped), dispatch a Test Gap Writer (Opus) using ./test-gap-writer-prompt.md:

Input: Pass 2 test reviewer's missing coverage findings + implementer's changes + test-coverage audit report (if available)
The test gap writer writes tests ONLY for gaps the reviewer identified — no scope creep. Before writing a new test for a flagged gap, verify no existing test already covers this path (it may have been updated by the test-coverage audit).
Tests should pass immediately (the behavior already exists from implementation)
The test gap writer reports per-test PASS/FAIL results (see prompt template for report format)
Commits new tests: test: fill coverage gaps for task N

If all tests PASS: Continue to adversarial tester.

If some tests FAIL (gaps reveal genuinely missing implementation):

Dispatch a fresh implementer (Opus) with the failing test(s), their failure messages, and the gap descriptions from the reviewer
Implementer fixes the missing behavior, then re-runs ALL test gap writer tests (not just the failures — catches regressions from the fix)
If all tests pass after fix: commit (fix: address test gap failures for task N), continue to adversarial tester
If tests still fail after one fix attempt: escalate to user with:
- Which coverage gaps the reviewer identified
- Which tests the gap writer wrote (per-test PASS/FAIL)
- What the implementer attempted to fix
- Which tests still fail and their current failure messages

Skip this step if the Pass 2 test reviewer reported zero missing coverage gaps.

Adversarial Tester

After the test gap writer completes (or is skipped), dispatch an Adversarial Tester (Opus) using skills/adversarial-tester/break-it-prompt.md:

Input: Full diff of the task's changes (git diff <pre-task-sha>..HEAD), project test conventions, cartographer module context (if available)
The adversarial tester identifies the top 5 most likely failure modes, writes one test per mode, and runs them
Outcome handling:
- All tests PASS: Implementation is robust. Log results and proceed to task complete.
- Some tests FAIL: Real weaknesses found. Dispatch implementer to fix. Re-run all tests (including adversarial). If pass → task complete. If fail → one more fix attempt, then escalate to user.
- Tests ERROR (won't compile): Adversarial tester mistake. Discard broken tests, log, proceed to task complete.
Quality bypass prevention: If the implementer's fix touches more than 3 files, route through a lightweight code review before completing.
Commit adversarial tests: test: adversarial tests for task N

Skip this step when:

The task diff contains no behavioral source files (only .md, .json, .yaml, .uss, .uxml)
No tests were written during implementation (pure scaffolding)
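
The skip rule can be sketched as a small predicate. This is a minimal illustration, assuming either condition alone is enough to skip; the function name and the input shape (a list of repo-relative paths plus a boolean) are hypothetical and not part of the pipeline.

```python
from pathlib import PurePosixPath

# Extensions the skip rule above lists as non-behavioral.
NON_BEHAVIORAL_SUFFIXES = {".md", ".json", ".yaml", ".uss", ".uxml"}


def should_skip_adversarial(changed_files, tests_written):
    """Return True when either skip condition for the adversarial tester holds.

    changed_files: repo-relative paths from the task diff (hypothetical input shape).
    tests_written: whether any tests were written during implementation.
    """
    only_non_behavioral = all(
        PurePosixPath(path).suffix.lower() in NON_BEHAVIORAL_SUFFIXES
        for path in changed_files
    )
    # Assumption: the two conditions are alternatives, so either one triggers a skip.
    return only_non_behavioral or not tests_written
```

Note that an empty diff counts as "no behavioral source files" in this sketch, so it also skips.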

Iterative Review Loop

Each review pass (code and test) uses the iterative loop:

After fixes, dispatch a NEW fresh Reviewer (no anchoring to prior findings)
Track issue count between rounds
Strictly fewer issues → progress, loop again
Same or more issues → stagnation, escalate to user
Loop until clean
Architectural concerns → immediate escalation regardless of round
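
The progress/stagnation rule above can be expressed as one decision function. This is a sketch only: the function name, the return labels, and the use of `None` for "no previous round" are assumptions for illustration.

```python
def next_review_action(prev_issues, curr_issues, architectural_concern=False):
    """Decide what the orchestrator does after a review round.

    prev_issues: issue count from the previous round, or None on the first round.
    curr_issues: issue count reported by the fresh reviewer in this round.
    """
    if architectural_concern:
        return "escalate"  # immediate escalation regardless of round
    if curr_issues == 0:
        return "clean"  # loop terminates
    if prev_issues is not None and curr_issues >= prev_issues:
        return "escalate"  # same or more issues: stagnation
    return "loop"  # first round with findings, or strictly fewer issues
```

Keeping the rule in one place makes it harder to treat "fixed the findings" as "passed": only a round with zero issues returns `clean`.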

Verification Gates

After each wave completes:

Run full test suite (not just current wave's tests)
Check compilation
Failures → identify which task caused regression before fixing
Clean → proceed to next wave

Refactor Mode: Phase 3 Changes

When in refactor mode, Phase 3 execution differs from feature mode in several ways.

Pre-Execution Coverage Check

Before the first task executes:

Run all contract tests from Phase 1 — confirm GREEN
Run the full test suite — confirm GREEN (pre-execution baseline)
Record the "baseline commit" SHA in /tmp/crucible-build-mode.md — this is the rollback target

Tiered Test Strategy

Running the full test suite after every atomic step is prohibitively expensive. Instead:

(a) After each atomic task: Run blast-radius tests + direct consumer tests only (tests identified in the impact manifest)
(b) After each execution wave: Run the full test suite (matches existing verification gate between waves)
(c) Full suite checkpoints: Pre-execution baseline and Phase 4 final verification always run the full suite

Coordinated-Atomic Execution

When the executor encounters a task marked atomic: true:

Record pre-task commit SHA
Implementer makes ALL changes (multiple files) — dispatch with ./refactor-implementer-addendum.md appended
Run blast-radius tests + direct consumer tests (per tiered strategy)
If GREEN: Commit all files together in a single commit
If FAIL: Revert ALL files to pre-task SHA. Dispatch one retry with a fresh implementer that receives the failure context and test output. If second attempt also fails, revert to pre-task SHA and escalate to user (see Rollback Policy below).
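
The commit-or-revert flow for an atomic task might look like the sketch below. All four callables are hypothetical hooks standing in for the real dispatches and git operations; only the control flow (one retry, revert on every failure, escalate after the second failure) comes from the steps above.

```python
def run_atomic_task(implement, run_scoped_tests, commit, revert, max_attempts=2):
    """Run one atomic task with a single retry.

    implement(failure_context): dispatches a fresh implementer (hypothetical hook).
    run_scoped_tests(): returns (passed, output) for blast-radius + consumer tests.
    commit(): commits all changed files together in a single commit.
    revert(): restores every file to the pre-task commit SHA.
    """
    failure_context = None
    for _attempt in range(max_attempts):
        implement(failure_context)
        passed, output = run_scoped_tests()
        if passed:
            commit()
            return "committed"
        revert()  # never leave a half-applied atomic change in the tree
        failure_context = output  # the retry receives the failing test output
    return "escalate"
```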

Phase 3 Adaptations for Existing Steps

Adversarial tester: The planner annotates each task with restructuring-only: true/false. If restructuring-only: true, adversarial testing is skipped. Tasks with restructuring-only: false still get adversarial testing. When in doubt, default to false.
- restructuring-only: true examples: renames where all call sites are mechanically updated, file moves with updated paths, extract-method where the extracted method is private and preserves the original call signature
- restructuring-only: false examples: extract-class where callers must change call targets, splitting a module where consumers must update imports, any change where the consumer-facing API surface shifts
De-sloppify cleanup: Gains a new removal category: dead compatibility shims. After a refactoring task, look for leftover adapter code, re-export aliases, or compatibility layers introduced during migration but no longer referenced. Detection scope: code added after the baseline commit SHA that re-exports, aliases, or wraps symbols under old names, AND where no code outside the refactoring's changed files references the old names. String-based references: When the target was registered by name in a configuration system, flag the shim as UNCERTAIN and defer to the reviewer rather than removing it.

Refactoring Rollback Policy

Baseline Commit

The orchestrator records the baseline commit SHA before the first refactoring task executes (during pre-execution coverage check). Persisted in /tmp/crucible-build-mode.md.

Per-Task Rollback

When a single task fails after the executor's retry attempt:

Revert that task's changes to the pre-task commit SHA
Escalate to user with failure context and test output
User chooses: skip this task and continue (orchestrator also skips all tasks that depend on the skipped task, and informs the user which tasks were transitively skipped), retry with guidance, or revert all tasks to baseline
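
When the user chooses to skip a task, the set of transitively skipped tasks can be computed with a breadth-first walk over the dependency graph. The sketch below assumes the plan exposes dependencies as a mapping from task id to the ids it depends on; that shape and the function name are illustrative assumptions.

```python
from collections import deque


def transitively_skipped(skipped_task, depends_on):
    """Return the tasks that must also be skipped when skipped_task is skipped.

    depends_on: mapping of task id -> iterable of task ids it depends on
    (hypothetical representation of the plan's dependency graph).
    """
    # Invert the graph: for each task, which tasks depend on it?
    dependents = {}
    for task, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(task)

    seen = set()
    queue = deque([skipped_task])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(skipped_task)  # guard against accidental cycles in the plan
    return sorted(seen)
```

The returned list is what the orchestrator would report to the user as transitively skipped.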

Full Rollback to Baseline

When the user chooses full rollback (or cascading failures make forward progress impossible):

Perform git reset --hard <baseline-SHA> to restore pre-refactoring state
Re-run all contract tests to confirm known-good state
Report what was reverted and why

Safe Partial States

Architectural Checkpoint

For plans with 10+ tasks, at ~50% completion or after a major subsystem completes:

Dispatch architecture reviewer using ./architecture-reviewer-prompt.md
Design drift → escalate to user
Minor concerns → adjust prompts for remaining tasks
All clear → continue

Noticed Reconciliation

7-step reconciliation process:

Collect each implementer's ### Noticed But Not Touching section from every Phase 3 implementer report.
Skip any section whose body is *(none)*.
Dedupe entries using the canonical dedupe key: sha256( normalize(file_path) + "|" + line_range + "|" + noticed[:40] ), where normalize(file_path) is the repo-relative POSIX path lowercased.
Sort the deduped entries by file path, then line range.
If any entries remain, write docs/plans/<YYYY-MM-DD>-<ticket-slug>-noticed.md matching the canonical filename regex ^docs/plans/\d{4}-\d{2}-\d{2}-[a-z0-9-]+-noticed\.md$. Use the date embedded in the sibling plan filename (not wall-clock date) so all sibling artifacts share a date; slug matches the ticket being built. Frontmatter and body must follow the Canonical Constants template exactly:
```
---
pipeline_id: "<build-YYYYMMDD-HHMMSS>"
date: "YYYY-MM-DD"
ticket: "#NNN"
---

# Noticed But Not Touching — <ticket-slug>

- **file:** `path:L<start>-L<end>`
  **noticed:** <desc>
  **why it matters:** <risk/opportunity>
  **suggested follow-up:** <optional>
```
Idempotent overwrite: If the target -noticed.md already exists (same-ticket re-run on the same date), merge the existing entries with the newly collected entries, run the full dedupe (same key), sort, and overwrite the file in one write. No append-mode; the on-disk file is always the full deduped set for that date+ticket.
Stage the -noticed.md file so it lands in the PR commit.

Skip the write entirely if zero entries remain after dedupe — do not produce an empty -noticed.md.
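
The dedupe, sort, and filename checks above can be sketched as follows. The key formula and the filename regex are taken from the steps; the entry shape (a dict with `file`, `lines`, and `noticed` keys), the backslash handling in `normalize`, and sorting by the numeric start line are assumptions made for illustration.

```python
import hashlib
import re

# Canonical filename regex quoted in the reconciliation steps.
NOTICED_PATH_RE = re.compile(r"^docs/plans/\d{4}-\d{2}-\d{2}-[a-z0-9-]+-noticed\.md$")


def normalize(file_path):
    """Repo-relative POSIX path, lowercased (backslash conversion is an assumption)."""
    return file_path.replace("\\", "/").lower()


def dedupe_key(entry):
    """sha256(normalize(file_path) + "|" + line_range + "|" + noticed[:40])."""
    raw = normalize(entry["file"]) + "|" + entry["lines"] + "|" + entry["noticed"][:40]
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def _start_line(line_range):
    # Assumes ranges look like "L10-L12"; anything else sorts first.
    match = re.match(r"L(\d+)", line_range)
    return int(match.group(1)) if match else 0


def reconcile(existing, collected):
    """Merge existing and new entries, dedupe by key, sort by path then line range."""
    merged = {}
    for entry in list(existing) + list(collected):
        merged.setdefault(dedupe_key(entry), entry)  # first occurrence wins
    return sorted(
        merged.values(),
        key=lambda e: (normalize(e["file"]), _start_line(e["lines"]), e["lines"]),
    )
```

Because `reconcile` always returns the full deduped set, writing its result in one pass gives the idempotent overwrite described above; an empty result means no file is written.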

Gate Ledger — Phase 3 Complete

Phase Handoff: 3 → 4

Before running acceptance tests and code review, verify the gate ledger and write a handoff manifest:

Gate ledger check: Read build-gate-ledger.md and verify Phase 3 Status is COMPLETE. If not, follow Enforcement Rules.

Write the handoff manifest:

Write handoff-3-to-4.md with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 4: HEAD SHA (all tasks committed), design doc path, acceptance test paths (or contract tests), baseline SHA (for git diff scope), task summary (completed count, escalation outcomes)
- Decisions Carried Forward: accumulated decisions from Phases 1-3
- Active Constraints: constraints affecting completion review
- Shed Receipt: per-task review rounds, implementer context, wave verification details → task completion status in task list; per-task review details are shed
Emit shed statement: "Phase 3 context shed. Working code at HEAD, design doc, and acceptance tests on disk. Per-task implementation context, review rounds, and verification details are not carried forward."
Update ## Compression State in pipeline-status.md with manifest contents.
Do NOT emit a Compression State Block.
Session index event: Emit a phase_change event to the outbox: {"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 3 -> Phase 4 (Completion)","detail":{"skill":"build","from":"3","to":"4"}}.
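
As an illustration of the event shape, the sketch below builds the `phase_change` JSON line with the fields shown above. The function name and the ISO-8601 timestamp format are assumptions; how the line reaches the outbox is defined in skills/shared/session-index-convention.md and is not shown here.

```python
import json
from datetime import datetime, timezone


def phase_change_event(from_phase, to_phase, label, now=None):
    """Build one JSON line for a phase_change event, mirroring the fields above."""
    # Assumption: "<now>" is rendered as an ISO-8601 UTC timestamp.
    ts = (now or datetime.now(timezone.utc)).isoformat()
    event = {
        "ts": ts,
        "seq": 0,
        "type": "phase_change",
        "summary": f"Build: Phase {from_phase} -> Phase {to_phase} ({label})",
        "detail": {"skill": "build", "from": str(from_phase), "to": str(to_phase)},
    }
    return json.dumps(event, separators=(",", ":"))
```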

Phase 4: Completion

Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, the temper, inquisitor, optional siege, quality-gate, forge, cartographer, and finish dispatches in this phase use the mock-dir substitution rule defined in the ## Mock Dispatch Mode (eval-gate) section. Local test-suite execution (pytest, etc.) runs normally — substitution applies only to subagent dispatches.

After all tasks complete:

Write Phase 4 IN_PROGRESS to the gate ledger (after Phase 3 COMPLETE verification).
Feature mode: Run acceptance tests from Phase 1 Step 3 — verify they PASS (GREEN). Refactor mode: Run all contract tests from Phase 1 — verify they PASS (GREEN).
- If any fail: implementation is incomplete. Identify what's missing, dispatch implementer to fix, re-run.
- If all pass: feature is verifiably done. Proceed.
Run full test suite (unit + integration)
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-temper" before dispatching code review. If the iterative review fix cycle introduces regressions, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:temper on full implementation (iterative until clean)
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-inquisitor" before dispatching inquisitor. If the inquisitor's fix cycle produces regressions, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:inquisitor on full implementation (dispatches 5 parallel dimensions against full feature diff)
- Input: git diff <base-sha>..HEAD where base-sha is the commit before Phase 3 execution began
- Runs after code review (obvious issues already fixed) and before quality gate (gate reviews final state)
- The inquisitor manages its own fix cycle internally — do not intervene unless it escalates
- See crucible:inquisitor for full process
Conditional: If the inquisitor's fix cycle produced any code changes, re-run crucible:temper scoped to the inquisitor fix commits only (git diff <pre-inquisitor-sha>..HEAD)
- This is NOT a full implementation re-review — scope it to only the fixer's changes
- Iterative until clean, same as step 3
- Skip if the inquisitor reported all PASS (no fixes were needed)
5.5. CONDITIONAL: Security review via crucible:siege

a. Contract check: If a contract YAML exists for this ticket with security_review.status: "required", siege is mandatory — skip to step (d).
b. Code scan: If no contract directive (or contract has security_review.status: "recommended" or field absent), scan for siege activation signals:
- Scan targets: design doc content + git diff <base-sha>..HEAD (changed file contents)
- Method: Case-insensitive keyword matching using the 7-category keyword lists from shared/security-signals.md
- Count distinct categories matched (one hit per category is sufficient)
c. Threshold evaluation:
- 0 signals: Skip siege silently. No narration needed.
- 1 signal: Log in narration: "1 security signal detected ([category]) — skipping siege. Invoke /siege --force manually if needed." Record in manifest and decision journal: security-review | choice=skip | reason=1 signal ([category]).
- 2+ signals: Proceed to step (d).
d. Dispatch siege:
- RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-siege" before dispatching siege. If siege's fix cycle produces regressions, this is the rollback target.
- Dispatch crucible:siege with:
  - Target: design doc + full implementation diff (artifact type: mixed)
  - deployment_context: from contract security_review.deployment_context if present, else unset (siege defaults to public)
- Narration: "Security signals detected: [list categories]. Dispatching siege."
- Decision journal: security-review | choice=dispatch | reason=[N] signals ([categories]) [or contract-required]
- Session index event: Emit to outbox: {"ts":"<now>","seq":0,"type":"security_review","summary":"Siege dispatched: [N] signals detected","detail":{"skill":"build","signals":[categories]}}
e. Blocking behavior: Siege iterates internally until zero Critical + zero High.
- If siege completes clean: continue to step 6 (quality-gate)
- If siege escalates (stagnation, user input needed): escalate to user with siege context
- If siege's fix cycle produced code changes: re-run crucible:temper scoped to siege fix commits only (git diff <pre-siege-sha>..HEAD). Same pattern as post-inquisitor conditional review at step 5.
f. Escape hatches: User can override automatic siege behavior:
- --force-siege — Dispatch siege regardless of signal count. Maps to siege's --force flag. Decision journal: security-review | choice=force-dispatch | reason=user --force-siege flag
- --skip-siege — Suppress siege even when signals/contract require it. Maps to siege's --skip flag. Decision journal: security-review | choice=force-skip | reason=user --skip-siege flag
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-impl-gate" before dispatching the implementation quality gate. If gate fix rounds degrade the code, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:quality-gate on full implementation (artifact type: "code"). Include in the dispatch context: Phase: code and PipelineID: <current PipelineID>. Iterates until clean or stagnation. (Non-negotiable; see Quality Gate Requirement.)
6b. Verify verdict marker and write Phase 4 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
RECOMMENDED SUB-SKILL: Use crucible:forge (retrospective mode) to capture what happened vs what was planned
7.5. Chronicle signal fallback: If forge retrospective was skipped (user declined, session ending), append a minimal chronicle signal directly:
- Read the metrics log at /tmp/crucible-metrics-<session-id>.log for duration and subagent counts
- Construct signal: v=1, ts=now, skill="build", outcome from acceptance test results, duration_m from metrics log, branch from git, files_touched from git diff <base-sha>..HEAD --name-only, metrics={mode, tasks count, tasks_passed count from task list, stagnation=false}
- Append as a single JSON line to ~/.claude/projects/<hash>/memory/chronicle/signals.jsonl
- If forge retrospective DID run, skip this step (forge Step 8.5 already emitted the signal)
RECOMMENDED SUB-SKILL: Use crucible:cartographer-skill (record mode) — persist any new codebase knowledge discovered during build
Compile summary: what was built, acceptance tests passing, review findings addressed, inquisitor findings, concerns
Report to user
10.5. Session index event: Emit a skill_end event to the outbox: {"ts":"<now>","seq":0,"type":"skill_end","summary":"/build complete: <outcome summary>","detail":{"skill":"build","outcome":"success|failure|escalated"}}.
REQUIRED SUB-SKILL: Use crucible:finish — skip finish's Step 2.5 (test-coverage) since test-coverage ran per-task in Phase 3, and skip finish's Step 3 (red-team) since quality-gate already ran at step 6. Tell finish to skip both.
Delete pipeline-active marker: Remove <scratch>/.pipeline-active. This signals that the pipeline completed successfully. If deletion fails (permissions, missing file), log a warning but do not fail the pipeline.
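
The siege activation logic in step 5.5 (contract check, signal threshold, escape hatches) can be summarized in one function. This is a sketch under stated assumptions: the function name is hypothetical, `--skip-siege` is assumed to win if both flags are passed, and the category names in the usage note are placeholders rather than the real lists in shared/security-signals.md.

```python
def siege_decision(contract_status, matched_categories, force=False, skip=False):
    """Return (choice, reason) in the style of the security-review journal entries.

    contract_status: value of security_review.status, or None if absent.
    matched_categories: categories with at least one keyword hit.
    """
    # Assumption: when both flags are given, the skip flag takes precedence.
    if skip:
        return "force-skip", "user --skip-siege flag"
    if force:
        return "force-dispatch", "user --force-siege flag"
    if contract_status == "required":
        return "dispatch", "contract-required"

    categories = sorted(set(matched_categories))  # count distinct categories only
    count = len(categories)
    if count == 0:
        return "skip", "0 signals"  # skipped silently, no narration
    if count == 1:
        return "skip", f"1 signal ({categories[0]})"
    return "dispatch", f"{count} signals ({', '.join(categories)})"
```

For example, `siege_decision(None, ["auth", "crypto", "auth"])` yields a dispatch with two signals, because repeated hits in one category count once.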

Session Metrics

Throughout the pipeline, the orchestrator appends timestamped entries to /tmp/crucible-metrics-<session-id>.log on each subagent dispatch and completion.

Dispatch measurement protocol: On every subagent dispatch, the orchestrator follows the enriched manifest protocol from shared/dispatch-convention.md:

Before dispatching: Measure the dispatch file size in characters. Record input_chars and model_tier in the manifest entry.
After dispatch returns: Measure the subagent response length in characters. Record output_chars and tool_calls (if available) in the manifest completion entry.

At completion (step 9, before reporting to the user), read the metrics log and manifest, then compute:

-- Pipeline Complete ----------------------------------------
  Subagents dispatched:  23 (14 Opus, 7 Sonnet, 2 Haiku)
  Active work time:      2h 47m
  Wall clock time:       11h 13m
  Quality gate rounds:   4 (design: 2, plan: 1, impl: 1)
  Siege:                 dispatched (3 agents, 2 rounds, 0 Critical, 0 High) | skipped (0 signals) | skipped (1 signal: auth)
  Task tiers:            3 Tier 1, 3 Tier 2, 2 Tier 3
  Subagent savings:      ~21 dispatches skipped vs all-Tier-3
  Est. input tokens:     ~32,100 (128,400 chars)
  Est. output tokens:    ~20,500 (82,000 chars)
  Token estimate note:   Based on dispatch file sizes (chars/4). Actual consumption may vary +/-30%.
-------------------------------------------------------------

Metrics tracked:

Total subagents dispatched (by type and model tier: Opus/Sonnet/Haiku)
Active work time (merge overlapping parallel intervals — NOT naive sum)
Wall clock time (first dispatch to final completion)
Quality gate rounds (per gate: design, plan, implementation)
Siege status (dispatched with agent count, rounds, and final severity counts — or skipped with signal count and reason)
Estimated input tokens (sum of input_chars from manifest / 4)
Estimated output tokens (sum of output_chars from manifest / 4)
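
Two of these metrics involve a small computation: active work time merges overlapping intervals instead of summing them, and token estimates divide character counts by 4. A minimal sketch follows, assuming intervals are `(start, end)` pairs in seconds; the function names are hypothetical.

```python
def active_work_seconds(intervals):
    """Total time covered by the union of (start, end) intervals.

    Parallel subagents overlap in time, so a naive sum would double-count.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap found: close the previous merged interval.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def estimate_tokens(char_counts):
    """Rough token estimate: total characters divided by 4."""
    return sum(char_counts) // 4
```

With the sample figures above, `estimate_tokens([128400])` gives 32100 and `estimate_tokens([82000])` gives 20500.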

Pipeline Decision Journal

Alongside the metrics log, maintain a decision journal at /tmp/crucible-decisions-<session-id>.log. Append a structured entry for every non-trivial routing decision:

[timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>

Decision types to capture:

reviewer-model — why Opus vs Sonnet for this reviewer
review-tier -- tier assignment read from plan, runtime escalation reason if applicable
gate-round — issue count, severity shifts, progress/stagnation per round
escalation — why the orchestrator escalated to user (and user's decision)
task-grouping — parallelism decisions for wave execution
cleanup-removal — what de-sloppify removed and accept/reject decision
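
A journal entry in the format above could be produced by a small formatter like this one. The function name, the ISO-8601 timestamp, and the `none` default for alternatives are assumptions; only the field layout comes from the format line.

```python
from datetime import datetime, timezone


def format_decision(decision_type, choice, reason, alternatives="none", now=None):
    """Render one decision journal line.

    Layout: [timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>
    """
    # Assumption: timestamps are ISO-8601 in UTC; the source does not fix a format.
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return (
        f"[{ts}] DECISION: {decision_type} | choice={choice} "
        f"| reason={reason} | alternatives={alternatives}"
    )
```

Appending the returned string plus a newline to /tmp/crucible-decisions-<session-id>.log keeps the journal one entry per line.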

Escalation Triggers (Any Phase)

STOP and ask the user when:

Architectural concerns in plan or code review
Review loop stagnation (same or more issues after fixes — any phase)
Test suite failures not obviously fixable
Multiple teammates fail on different tasks
Teammate reports context pressure at 50%+ with significant work remaining
When escalating for regression or stagnation AND a checkpoint exists for the current phase boundary: include "A checkpoint from [reason] is available. Restore to pre-regression state?" in the escalation message.

Minor issues: Log, work around, include in final report.

What the Lead Should NOT Do

Implement code (dispatch implementers)
Read large files (spawn Haiku researcher)
Debug failing tests (dispatch implementer)
Make architectural decisions (escalate to user)

Context Management

One task per agent — always spawn a fresh implementer for each task. Never send a second task to a running agent via SendMessage. Reusing agents accumulates context and causes exhaustion.
"2-3 per subagent, ~10 files max" refers to plan design — group small steps into one task at planning time, not sequential dispatch to a running agent
Lead stays thin — coordination only
All important state on disk (plan files, task list)
Teammates report at 50%+ context usage
Lead compaction acceptable — task list is source of truth
Agent teams unavailable: If agent teams are not enabled, the lead dispatches tasks sequentially via the Agent tool. Task tracking still uses TaskCreate/TaskUpdate. The pipeline is slower but functionally identical.

Prompt Templates

./acceptance-test-writer-prompt.md — Phase 1 acceptance test generation
./prd-writer-prompt.md — Phase 1 PRD generation from design doc
./plan-writer-prompt.md — Phase 2 plan writer dispatch
./plan-reviewer-prompt.md — Phase 2 plan reviewer dispatch
./build-implementer-prompt.md — Phase 3 implementer dispatch
./build-reviewer-prompt.md — Phase 3 reviewer dispatch
./cleanup-prompt.md — Phase 3 de-sloppify cleanup dispatch
./test-gap-writer-prompt.md — Phase 3 test gap writer dispatch
./architecture-reviewer-prompt.md — Mid-plan checkpoint
./contract-test-writer-prompt.md — Phase 1 refactor-mode contract test generation
./refactor-implementer-addendum.md — Phase 3 refactor-mode implementer addendum (appended to build-implementer-prompt)

Red-team, innovate, adversarial tester, and inquisitor prompts live in their respective skills:

crucible:red-team — skills/red-team/red-team-prompt.md
crucible:innovate — skills/innovate/innovate-prompt.md
crucible:adversarial-tester — skills/adversarial-tester/break-it-prompt.md
crucible:inquisitor — skills/inquisitor/inquisitor-prompt.md

Quality Gate Orchestration

Gate points in the pipeline:

Contract-Aware Quality Gate

Version check: Before processing a contract, verify the version field is "1.0". If the version is missing or unrecognized, reject the contract with a clear error: "Contract version [X] is not supported. Expected version 1.0." Do not proceed with contract-aware checks — fall back to standard quality gate behavior without contract awareness.
Checkable invariant verification: For each checkable invariant in the contract, verify satisfaction using the declared check_method:
- grep — pattern match (or absence) in production code. Run the grep and confirm the result matches the invariant's verification description.
- code-inspection — read and reason about the relevant code to confirm the invariant holds (e.g., idempotency, no side effects).
- file-structure — check that file existence, location, or organization matches the constraint.
Testable invariant verification: For each testable invariant in the contract:
- Verify that a test tagged with the declared test_tag (pattern: contract:<category>:<id>) exists in the test suite.
- Verify that the tagged test passes when run.
- A missing or failing tagged test is a contract violation.
Contract violations are blocking issues. Contract violations are NOT warnings — they have the same severity as architectural concerns and must be resolved before the gate passes. The quality gate's iterative fix loop applies: dispatch fixes, re-check, track progress/stagnation as normal.

Red Flags

Skipping Compression State Block emission at checkpoint boundaries
Emitting a Compression State Block at a phase boundary (1→2, 2→3, 3→4) instead of writing a handoff manifest
Skipping the shed statement after a manifest write
Emitting a Compression State Block with stale or missing Key Decisions (decisions must be cumulative across all prior blocks)
Allowing the Goal field to drift across successive Compression State Blocks (must match original user request)
Exceeding 10 entries in the Key Decisions list without overflow-compressing the oldest
Skipping a REQUIRED quality gate because the task seems "small", "simple", or "trivial"
Self-assessing that a quality gate is unnecessary based on perceived task complexity
Rationalizing that quality-gate findings would be "minor" as justification to skip
Declaring a quality gate "done" after fixing findings without a clean verification round (fixing is not passing)
Short-circuiting the quality-gate iteration loop by assuming fixes are self-evidently correct
Interpreting general user feedback as approval to skip a quality gate that has not yet run — once a gate has run and presented findings to the user, the user's decision to proceed is authoritative. Pre-gate skip approval must be an unambiguous instruction specifically referencing the gate.
Treating the session index summary as authoritative over Compression State Block (CSB) state (the session index is supplementary narrative; the CSB is authoritative state)

Integration

Required sub-skills:

crucible:design — Phase 1
crucible:finish — Phase 4
crucible:quality-gate — Iterative red-teaming at each quality gate point
crucible:red-team — Adversarial review engine (invoked by quality-gate)
crucible:innovate — Creative enhancement before quality gates
crucible:inquisitor — Full-feature cross-component adversarial testing (Phase 4, after temper, before quality-gate)

Recommended sub-skills:

crucible:forge — Feed-forward at Phase 1 start, retrospective at Phase 4 completion
crucible:cartographer-skill — Consult at Phase 1 start, load at Phase 3 dispatches, record at Phase 4
crucible:checkpoint — Shadow git checkpoints at pipeline boundaries (pre-design-gate, pre-plan-gate, pre-wave-N, pre-cleanup-task-N, pre-temper, pre-inquisitor, pre-impl-gate)

Recon/assay context: Inherits recon/assay context through /design (Phase 1). No direct dispatch. When design integrates recon, build benefits automatically. See #147 for rationale.

Phase 3 sub-skills (dispatched per-task):

crucible:test-coverage — Test alignment audit after each task's test quality review (staleness, dead tests, coincidence tests)

Implementer sub-skills:

crucible:test-driven-development — TDD within each task
crucible:source-driven-development — Detect → Fetch → Implement → Cite loop for non-trivial external API usage (≥ 5 LOC touching a detected framework); invoked by the implementer prompt's Source Consultation block. Recommended — skipped for pure internal refactors or trivial edits.

Contract consumption:

crucible:spec — Consumes contract YAML files produced by /spec (schema version 1.0). Contracts are read from docs/plans/*-contract.yaml and feed into pre-existing doc detection (Phase 1 Step 0), implementer dispatch (Phase 3), reviewer checks (Phase 3), and quality gate verification (all gate points). See crucible:spec/contract-schema.md for field definitions.
