skills/build/SKILL.md
Use when starting any feature development, building new functionality, implementing a design, or going from idea to working code. Triggers on "build", "implement", "add feature", or any task requiring design-through-execution.
All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.
All subagent returns use the Ledger Return Protocol. Every subagent returns exactly one Evidence Receipt per shared/return-convention.md; the orchestrator applies the two-tier receipt linter (Tier 1 structural + Tier 2 witness verification — full grammar in the shared convention) to every Task return before acting on the declared VERDICT. A lint failure is treated as structurally BLOCKED.
The orchestrator maintains a per-run Invariant Cairn at ~/.claude/projects/<project-hash>/memory/cairn/cairn-<run-id>.md per shared/cairn-convention.md. See the ## Cairn (Layer 3) section below for build-specific phase definitions, terminal condition, and mandatory-invariant categories.
End-to-end development pipeline: interactive design, autonomous planning with adversarial review, team-based execution with per-task code and test review. One command, idea to completion.
Announce at start: "I'm using the build skill to run the full development pipeline."
Session index event: At startup, if session indexing is active (session index path discoverable via glob), emit a skill_start event to the outbox: {"ts":"<now>","seq":0,"type":"skill_start","summary":"Starting /build for <user goal>","detail":{"skill":"build","goal":"<user goal>"}}. See skills/shared/session-index-convention.md for the outbox pattern.
Guiding principle: Quality over velocity. This pipeline produces correct, well-integrated, maintainable output, even if slower. Parallel execution is available for independent work, but sequential with quality gates is the default.

<!-- Trust framework: see [skills/getting-started/trust-hierarchy.md](../getting-started/trust-hierarchy.md). -->

## Mock Dispatch Mode (eval-gate)

This mode exists for the skills/build/evals/ eval-gate harness. It is enabled iff CRUCIBLE_BUILD_EVAL_MOCK_DIR is set in the environment. Production runs MUST leave this variable unset, in which case this mode is a no-op and the orchestrator behaves exactly as if this section were not present.
Env-var contract. Three variables, all consumed only when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set:
- CRUCIBLE_BUILD_EVAL_MOCK_DIR=<path>: directory of canned subagent return receipts. Filenames follow <seq>-<template-name>.md with fallback <template-name>.md (e.g. 1-plan-writer.md, then plan-writer.md). Missing mock → halt immediately with a clear error; no silent fallthrough.
- CRUCIBLE_BUILD_EVAL_MODE=feature|refactor: pre-set answer to the Mode Detection prompt. When present, the orchestrator skips the AskUserQuestion call in Mode Detection and uses this value.
- CRUCIBLE_BUILD_EVAL_USER_INPUT_DIR=<path>: directory of canned user-input turns, named turn-<N>.md. Each AskUserQuestion call (other than Mode Detection, which uses CRUCIBLE_BUILD_EVAL_MODE) consumes the next sequential turn. If the next turn-file is missing, halt before proceeding. This is the b4 fixture's design: build correctly stops when it needs input it does not have.

Substitution rule (defined ONCE here; referenced from intercept sites below): at every dispatch site, the dispatch file is STILL written to the normal dispatch dir (trace integrity preserved). Only the Task/Agent tool invocation is replaced: instead of invoking the tool, read $CRUCIBLE_BUILD_EVAL_MOCK_DIR/<seq>-<template-name>.md (or <template-name>.md) and treat its contents as the subagent's return receipt. Apply the normal receipt linter and manifest sweep as if the receipt had come from a live agent.
Boundary behavior. MockNotFound / MockUserInputMissing errors raised by the mock loader halt the build run with a clear stderr message. They do not silently fall through. The eval-gate harness detects these halts via on-disk artifacts (absent phase-handoff manifests, pipeline-active marker still at the original phase) — the harness does NOT catch these exceptions across the build-runtime boundary.
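The lookup order and halt-on-missing behavior described above can be sketched in Python as follows. This is a minimal illustration, not the harness's actual loader: the function name `resolve_mock` and the definition of the `MockNotFound` class are hypothetical, and only the filename pattern, the environment variable, and the no-fallthrough rule come from this section.

```python
import os
from pathlib import Path
from typing import Optional


class MockNotFound(Exception):
    """Hypothetical definition; the section names this error but not its shape."""


def resolve_mock(seq: int, template_name: str, mock_dir: Optional[str] = None) -> str:
    """Return the canned receipt text for one dispatch, or raise MockNotFound."""
    mock_dir = mock_dir or os.environ.get("CRUCIBLE_BUILD_EVAL_MOCK_DIR")
    if not mock_dir:
        raise RuntimeError("mock mode is disabled: CRUCIBLE_BUILD_EVAL_MOCK_DIR is unset")
    candidates = [
        Path(mock_dir) / f"{seq}-{template_name}.md",  # sequenced file first
        Path(mock_dir) / f"{template_name}.md",        # then the unsequenced fallback
    ]
    for path in candidates:
        if path.is_file():
            return path.read_text(encoding="utf-8")
    # No silent fallthrough: a missing mock halts the run.
    raise MockNotFound(f"no mock for dispatch {seq} ({template_name}) in {mock_dir}")
```

The returned text would then go through the normal receipt linter and manifest sweep, exactly as a live subagent return would.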
Pointer reminders. Sections that reference these env vars (Mode Detection, Phase 1 Step 2 / 2.5 / 3, Phase 2 Step 1 / 2, Phase 3 Step 3) contain short inline pointers back to this section. The substitution rule is defined here only; the pointers exist so a reader scanning a dispatch site doesn't need to re-derive the contract.
See also: skills/build/evals/README.md for harness usage; skills/shared/dispatch-convention.md for the dispatch-file protocol the substitution rule preserves.
The orchestrator maintains an Invariant Cairn per shared/cairn-convention.md. Build-specific bindings:
- dispatches=/receipts=/verdict=, (c) single atomic Write advancing PHASE to phase N+1. Uses the phase handoff manifest (handoff-N-to-M.md) as input evidence for the invariants.
- active-run.md; leave cairn-<run-id>.md in place.
- noticed-not-touching entry that is correctness-critical for a future task or for post-merge review; every test-gap finding that the run chose to leave uncovered.
- Acknowledged: true.
- receipt-ledger.jsonl and the in-context Tripwire Manifest. Rule 1 local-repair is authorized for trailing-receipts-in-current-phase LEDGER under-count only.

Starting with convention v1.1, every subagent returns a receipt carrying TRIPWIRE:, SUPERSEDES:, and (if the subagent dispatched children) TRIPWIRE-CHILD: lines. The full grammar and predicate vocabulary live in shared/return-convention.md. This section defines how the orchestrator uses them.
Manifest: After each Task return (post-lint), append one line to the in-context manifest:
<rcpt-sha256-prefix-12> <skill>/<dispatch-id> <verdict> TRIPWIRE: <predicates> [SUPERSEDED_BY=<prefix>] [keys=<skill>:<k>:<v>,…] [files=<path>:<h6>,…]
Extract keys= and files= discriminators at insertion time (severity-max, *-count CLAIM keys namespaced by skill; EDIT/WROTE paths with first 6 hex of post-edit hash). Truncate each list at 8; overflow becomes more=<N> and forces mandatory fire on peer-dispatch-disagrees.
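A minimal Python sketch of the truncation rule above, under stated assumptions: the helper name `format_discriminators` is hypothetical, and placing `more=<N>` as the final entry of the list is a guess at the layout, which the text does not specify.

```python
def format_discriminators(label: str, items: list[str], limit: int = 8) -> tuple[str, bool]:
    """Render a keys= or files= field, truncated at `limit` entries.

    Returns the rendered field and a flag that is True when entries were
    dropped (overflow forces a mandatory fire on peer-dispatch-disagrees).
    """
    kept = items[:limit]
    overflow = len(items) - len(kept)
    parts = list(kept)
    if overflow > 0:
        parts.append(f"more={overflow}")  # assumed position: last entry
    return f"{label}=" + ",".join(parts), overflow > 0
```

Returning the overflow flag alongside the string lets the sweep decide on the mandatory fire without re-parsing the manifest line.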
Sweep (the dispatch-loop clause): The orchestrator MAY NOT dispatch the next subagent until it has:
- SUPERSEDES: marked each cited predecessor SUPERSEDED_BY=<new-prefix>.
- SUPERSEDED_BY=*) prior manifest entry, over the union of that entry's TRIPWIRE and TRIPWIRE-CHILD predicate sets:
  - claims-touch(glob) / wrote(glob) / read(glob): path-glob match against the new receipt's TRACE or CLAIMS citations.
  - suspicion>=N: new receipt's SUSPICION ≥ N.
  - peer-dispatch-disagrees(<dim>): same-skill, same-target, discriminator mismatch (evaluated via manifest keys=/files=; more= overflow → mandatory fire).
  - always: fires unconditionally.
- Read M's full receipt from disk and narrate the re-read: "tripwire <predicate> on <M-prefix> fired from <new-prefix>; re-read M."

Supersession fix-flow. A fix-agent dispatched after a FAIL receipt normally supersedes that FAIL. Its receipt MUST cite the FAIL's hash-prefix in SUPERSEDES: and in at least one CLAIM from=<prefix>#…, AND its WITNESS must be kind ∈ {exec, grep} with ran=TRACE#N (not SKIPPED/UNRUNNABLE). Tier-2 then verifies the witness: supersession only survives if the original failure no longer reproduces.
Mandatory-work declarations for build's subagent types (add to each dispatch template's ## Return Format section):
- run-tests, apply-edits.
- run-blast-radius-tests, apply-edits.
- read-artifact, emit-findings.
- read-diff, emit-recommendation.
- read-design, emit-artifact.
- run-tests, emit-tests.

Between every agent dispatch and every agent completion, output a status update to the user. This is NOT optional: the user cannot see agent activity without your narration.
Every status update must include:
After compaction: If you just experienced context compaction, re-read the task list from disk and output current status before continuing. Do NOT proceed silently.
Examples of GOOD narration:
"Phase 3, Task 4 complete. Reviewer found 2 Important issues — dispatching implementer to fix. Tasks: [1] ✓ [2] ✓ [3] ✓ [4] fixing [5-8] pending"
"Phase 2 complete. Plan passed review with 0 issues on round 2. Dispatching innovate on the plan."
This requirement exists because: Long-running autonomous pipelines can run for hours. Without narration, the user sees nothing but a spinner. They can't assess progress, can't decide whether to intervene, and can't learn from the pipeline's decisions.
NEVER skip quality gate steps. Every artifact must pass its quality gate before proceeding to the next phase. No exceptions, no shortcuts.
BLOCK semantics: Phase transitions are gated. You CANNOT proceed from Phase 1→2, 2→3, or 3→4 without the gate for that phase passing. If a gate fails, fix the issues and re-run the gate. Do not silently skip a gate because "it looks fine" or "we already reviewed it."
If you find yourself about to skip a gate: STOP. Re-read this section. The gate exists because skipping it has caused real production incidents and hours of wasted time. Run the gate.
| Rationalization | Rebuttal | Rule |
|---|---|---|
| "This task is small/simple/trivial, the quality gate would just find nits." | Small changes have the same bug density per line as large ones. QG has never run on a Crucible artifact without finding at least one real issue. | Run the quality gate on every phase artifact, regardless of size. |
| "Phase N looks fine, I can skip the gate and move on." | Self-assessment of artifact quality is exactly the bias the gate exists to counter. "Looks fine" is the failure mode, not a pass criterion. | Phase transitions are BLOCKED without a verified PASS verdict marker for the prior phase. |
| "The fix agent addressed the findings, so the gate is done." | Fixing is not passing. Fix rounds routinely introduce new issues or incompletely resolve old ones. A clean verification round is required. | The gate is only complete after a fresh red-team round returns 0 Fatal, 0 Significant. |
| "The user said 'looks good' / 'move on' — that's approval to skip the gate." | General feedback is not skip approval. Only an unambiguous instruction that explicitly references the gate counts. | Require literal SKIP GATE (or equivalent explicit phrase) before recording Status: SKIPPED. |
| "I can fix this one finding myself instead of dispatching a fix agent." | Orchestrator-applied fixes conflate coordination with remediation and bypass the fix journal. Every fix — even trivial — goes through a fix agent. | Orchestrator never edits the artifact directly; always dispatch the fix agent. |
| "Innovate/red-team seem redundant on top of the quality gate, I'll skip them." | They are not redundant. Innovate is divergent; red-team is adversarial; QG is iterative remediation. Skipping any one of them is a documented regression (feedback_never_skip_gates). | Run innovate and red-team on every artifact, every time. |
| "I'll just finish the task list and narrate at the end." | Long-running autonomous pipelines are invisible without narration. Silent runs prevent the user from intervening or learning. | Narrate before every dispatch and after every completion — non-negotiable. |
Tamper-evident audit trail for phase transitions and gate verdicts. This is defense-in-depth — it raises the cost of gate-skipping from zero to nonzero by requiring structured state to be maintained and verified. An external enforcement hook (gate-ledger-guard.sh) provides mechanical enforcement by blocking unauthorized PASS writes.
File location: ~/.claude/projects/<project-hash>/memory/build-gate-ledger.md
Relationship to pipeline-status.md: pipeline-status.md is ambient user awareness (overwritten at every narration point). build-gate-ledger.md is the gate verdict audit trail (updated per phase as gates pass). Both are needed; neither replaces the other.
At pipeline start, generate a PipelineID via date -u +build-%Y%m%d-%H%M%S. This ID:
pipeline_id

# Build Gate Ledger
Run: <ISO-8601 timestamp>
PipelineID: <build-YYYYMMDD-HHMMSS>
Goal: <user request>
Mode: <feature | refactor>
## Phase 1: Design
Status: NOT_STARTED
## Phase 2: Plan
Status: NOT_STARTED
## Phase 3: Execute
Status: NOT_STARTED
## Phase 4: Completion
Status: NOT_STARTED
Format constraints:
- Key: value
- Status, Gate, Artifact, Tasks, Reason, Acknowledged, PipelineID
- NOT_STARTED, IN_PROGRESS, PASS, COMPLETE, FAIL, SKIPPED, INFERRED
- `## Phase N: Name` (always 4 phases, always in order)

Runs during build startup, after mode detection but before Phase 1 begins:

- Run, PipelineID, Goal, and Mode header fields, then all four phases with Status: NOT_STARTED
- IN_PROGRESS

After writing any ledger (fresh or reconstructed), immediately re-read the ledger header to extract the PipelineID into the active in-memory state. This is a defensive consistency practice: it ensures the in-memory value always matches the persisted value.
Stale detection prevents cross-run contamination:
- Started timestamp matches the ledger's Run timestamp, this is the same build run recovering from compaction. Auto-resume without prompting.
- Started timestamp doesn't match the ledger's Run, prompt: "Found existing ledger for '[goal]' (started [timestamp], Phase N [status]). Resume this run? [y/n]". On "no", archive the old ledger via Bash mv to build-gate-ledger-<old-timestamp>.md. If the target filename already exists, append a counter suffix (-2, -3, etc.).

Requires: Active PipelineID established (from Ledger Initialization + Run Isolation). This step runs AFTER the resume/fresh decision is resolved.
Scan ~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-*.md for verdict markers. Delete any whose PipelineID does not match the active PipelineID. If resuming: use the resumed build's PipelineID (from the existing ledger). If starting fresh: use the newly generated PipelineID.
Note: If the session is recovering via INFERRED reconstruction (new PipelineID generated), markers from the old run will be cleaned up. This is intentional — the design requires a fresh QG run for INFERRED→PASS upgrade, not reuse of old markers.
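The orphan scan above can be sketched in Python as a function that only lists the markers to delete. This is an illustration under assumptions: the function name `orphaned_markers` is hypothetical, the marker is assumed to carry a `PipelineID: <value>` line in the same Key: value style as the ledger, and treating a marker with no PipelineID as orphaned is a choice made here, not something the text states.

```python
import re
from pathlib import Path

# Assumed marker layout: one "PipelineID: <value>" line per marker file.
PIPELINE_ID_RE = re.compile(r"^PipelineID:\s*(\S+)\s*$", re.MULTILINE)


def orphaned_markers(marker_dir: Path, active_pipeline_id: str) -> list[Path]:
    """List verdict markers whose PipelineID differs from the active one."""
    orphans = []
    for marker in sorted(marker_dir.glob("gate-verdict-*.md")):
        match = PIPELINE_ID_RE.search(marker.read_text(encoding="utf-8"))
        marker_id = match.group(1) if match else None
        if marker_id != active_pipeline_id:
            orphans.append(marker)  # foreign or missing ID: treated as orphaned
    return orphans
```

Keeping the scan separate from the deletion makes it easy to narrate which markers are about to be removed before removing them.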
- date -u +%Y-%m-%dT%H:%M:%S (Bash is allowed for date commands that don't reference .claude/ paths)
- mv since Write/Read/Edit/Glob have no rename capability
- .claude/ path restrictions.

Before each phase transition, read build-gate-ledger.md and check the previous phase's status:
Gate check: If the previous phase's Status is NOT in {PASS, COMPLETE (Phase 3 only), SKIPPED with Acknowledged: true}, output:
PHASE GATE BLOCKED: Cannot start Phase N — Phase N-1 gate has not passed.
Current state: [status]
Run the quality gate on Phase N-1's artifact before proceeding.
This means INFERRED, IN_PROGRESS, FAIL, and NOT_STARTED all trigger BLOCKED.
Phase 1 exception: Phase 1 (Design) has no predecessor gate — it always starts.
Phase 3 exception: Phase 3 transitions to COMPLETE (not PASS) when all tasks are done and per-task code reviews pass. COMPLETE satisfies the gate requirement for Phase 4. No verdict marker is required for Phase 3.
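The gate check and its two exceptions can be written as a small predicate. The sketch below is illustrative only: the function name `can_start_phase` is hypothetical, and it models only the immediate predecessor's status, not a check across all prior phases.

```python
def can_start_phase(phase: int, prev_status: str, prev_acknowledged: bool = False) -> bool:
    """Return True when Phase `phase` may start, given Phase `phase - 1`'s ledger state."""
    if phase == 1:
        return True  # Phase 1 exception: no predecessor gate
    if prev_status == "PASS":
        return True
    if prev_status == "COMPLETE":
        return phase - 1 == 3  # COMPLETE is accepted for Phase 3 only
    if prev_status == "SKIPPED":
        return prev_acknowledged  # needs Acknowledged: true
    # INFERRED, IN_PROGRESS, FAIL, and NOT_STARTED all block.
    return False
```

A False result corresponds to emitting the PHASE GATE BLOCKED message rather than proceeding.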
After quality-gate returns with a verdict, verify the verdict marker before writing to the ledger:
- ~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-*.md
- PipelineID match: only markers with the current build's PipelineID
- Timestamp field value inside the marker file (parsed as ISO-8601), take the most recent
- Verdict is PASS, PipelineID matches current build's PipelineID
- PASS to the ledger with Gate timestamp and Artifact path

If the user explicitly wants to bypass a gate:
Example of a SKIPPED phase in the ledger:
## Phase 2: Plan
Status: SKIPPED
Gate: 2026-04-13T15:00:00
Reason: User requested skip
Acknowledged: true
Confirmation protocol: [Default: option (a) — separate-turn required, matching the design doc's two-step flow. User may override to option (b) before implementation.]
- SKIP GATE to confirm. This will be logged."
- SKIP GATE. A SKIP GATE token in the same message as the skip request does NOT satisfy the confirmation requirement.
- Status: SKIPPED with Reason field to the ledger.

Per-phase acknowledgment: SKIPPED requires one acknowledgment per phase, not per boundary. Before starting Phase N, the orchestrator checks all prior phases. Any prior phase with Status: SKIPPED that has not yet been Acknowledged: true triggers the BLOCKED message. The user types SKIP GATE once per skipped phase, and the ledger records Acknowledged: true. Subsequent boundaries do not re-prompt for already-acknowledged skips.
Missing artifact handling: If a phase was SKIPPED because no artifact was produced, retroactive gating requires the user to supply the artifact path: "To run the gate on Phase N, provide the artifact path." If no artifact exists, retroactive gating is not possible — the phase remains SKIPPED.
Recovery from SKIPPED: If the user later wants to properly gate a skipped phase, they can ask to "run the gate on Phase N." The orchestrator transitions SKIPPED → IN_PROGRESS, runs the quality gate on the phase's artifact, and writes the result normally.
Phase 4 completion warning: If ANY prior phase has Status: SKIPPED, Phase 4 outputs a prominent warning listing all skipped gates before presenting finish options.
Phase 1: Design
NOT_STARTED → IN_PROGRESS (design skill starts)
IN_PROGRESS → PASS (quality gate verdict marker verified)
IN_PROGRESS → FAIL (quality gate escalates — stagnation/regression)
FAIL → IN_PROGRESS (user directs re-work)
* → SKIPPED (user types SKIP GATE — does NOT unlock next phase without acknowledgment)
SKIPPED → IN_PROGRESS (user asks to run the gate retroactively)
INFERRED → IN_PROGRESS (user runs gate after compaction recovery)
INFERRED → SKIPPED (user types SKIP GATE after compaction recovery)
Phase 2: Plan
NOT_STARTED → IN_PROGRESS (requires Phase 1 Status = PASS or SKIPPED+Acknowledged)
[same transitions as Phase 1]
Phase 3: Execute (no quality gate — uses COMPLETE instead of PASS)
NOT_STARTED → IN_PROGRESS (requires Phase 2 Status = PASS or SKIPPED+Acknowledged)
IN_PROGRESS → COMPLETE (all tasks done, per-task reviews passed, verification gates green)
IN_PROGRESS → FAIL (task failures, user escalation)
FAIL → IN_PROGRESS (user directs re-work)
* → SKIPPED (user types SKIP GATE)
SKIPPED → IN_PROGRESS (user asks to run retroactively)
Note: Phase 3 has no QG invocation. COMPLETE satisfies Phase 4's gate requirement.
Phase 4: Completion
NOT_STARTED → IN_PROGRESS (requires Phase 3 Status = COMPLETE or SKIPPED+Acknowledged. PASS is unreachable for Phase 3.)
IN_PROGRESS → PASS (quality gate verdict marker verified)
IN_PROGRESS → FAIL (quality gate escalates)
FAIL → IN_PROGRESS (user directs re-work)
* → SKIPPED (user types SKIP GATE)
SKIPPED → IN_PROGRESS (user asks to run retroactively)
IN_PROGRESS includes: emit skip warnings if any prior phase SKIPPED
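The transitions listed above can be encoded as a lookup table and checked before every ledger write. This Python sketch is a reading of the list, not an authoritative state machine: the names `TRANSITIONS` and `is_valid_transition` are hypothetical, `* → SKIPPED` is interpreted as "from any status", INFERRED transitions are allowed only for the phases that list them, and the cross-phase preconditions (prior phase PASS, COMPLETE, or acknowledged SKIPPED) are not modeled.

```python
# Transitions shared by the quality-gated phases.
GATED = {
    ("NOT_STARTED", "IN_PROGRESS"),
    ("IN_PROGRESS", "PASS"),
    ("IN_PROGRESS", "FAIL"),
    ("FAIL", "IN_PROGRESS"),
    ("SKIPPED", "IN_PROGRESS"),
}

TRANSITIONS = {
    1: GATED | {("INFERRED", "IN_PROGRESS")},
    2: GATED | {("INFERRED", "IN_PROGRESS")},  # "same transitions as Phase 1"
    # Phase 3 has no quality gate: COMPLETE replaces PASS.
    3: (GATED - {("IN_PROGRESS", "PASS")}) | {("IN_PROGRESS", "COMPLETE")},
    4: GATED,
}


def is_valid_transition(phase: int, old: str, new: str) -> bool:
    """Check one status change for one phase against the table above."""
    if new == "SKIPPED":
        return True  # "* → SKIPPED", read here as: from any status
    return (old, new) in TRANSITIONS[phase]
```

Rejecting a write such as Phase 3 `IN_PROGRESS → PASS` at this level catches a malformed ledger update before it reaches disk.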
build-gate-ledger.md is on disk and survives compaction. Recovery precedence when state is partial:
- INFERRED (not PASS). Mark predecessor phases as PASS (handoff existence proves the boundary was crossed). Generate a new PipelineID and write it to the reconstructed ledger header. After writing, re-read the ledger header to extract the PipelineID into active state. INFERRED phases trigger the gate-blocked check: the orchestrator must run a fresh quality gate (with matching PipelineID) or the user must type SKIP GATE.

Every quality gate in this pipeline MUST run to completion. This is NOT optional: you may NOT self-assess whether a quality gate is "needed" based on task size, complexity, or scope.
Quality gates are unconditional at all three gate points:
Common rationalizations that are NEVER valid reasons to skip:
This requirement exists because: Quality gates consistently find issues the pipeline misses regardless of task size. There is no category of task that is immune. In observed runs, tasks self-assessed as "trivial" had the same defect rate as complex tasks. The only way to skip a quality gate is with explicit user approval — an unambiguous instruction specifically referencing the gate, not general feedback like "looks good" or "move on."
Write a status file to ~/.claude/projects/<hash>/memory/pipeline-status.md at every narration point. This file is overwritten (not appended) and provides ambient awareness for the user in a second terminal.
Write the status file at every point where the Communication Requirement mandates narration: before dispatch, after completion, phase transitions, health changes, escalations, and after compaction recovery.
The status file uses this structure (overwritten in full each time):
# Pipeline Status
**Updated:** <current timestamp>
**Started:** <timestamp from first write — persisted across compaction>
**Skill:** build
**Phase:** <current phase, e.g. "3 — Execute (Autonomous)">
**Health:** <GREEN|YELLOW|RED>
**Suggested Action:** <omit when GREEN; concrete one-sentence action when YELLOW/RED>
**Elapsed:** <computed from Started>
## Recent Events
- [HH:MM] <most recent event>
- [HH:MM] <previous event>
(last 5 events, newest first)
Append after the shared header:
## Task Progress
| # | Task | Tier | Status | Duration |
|---|------|------|--------|----------|
| 1 | Auth middleware | T3 | DONE | 12m |
| 2 | Route handlers | T2 | IN REVIEW (code, pass 1) | 18m+ |
| 3 | Database layer | T1 | PENDING | — |
## Quality Gates
- Design: PASSED (2 rounds)
- Plan: PASSED (1 round)
- Task tiers: 1x T1, 1x T2, 1x T3
- Code: not yet reached
## Checkpoints
- Last checkpoint: pre-wave-3 (12:45:30)
- Total checkpoints: 7
- Shadow repo: healthy
## Compression State
Goal: [original user request]
Key Decisions:
- [accumulated decisions, max 10]
Active Constraints:
- [constraints affecting remaining work]
Next Steps:
1. [immediate next action]
2. [subsequent actions]
The Compression State section is a semantic subset of the full Compression State Block emitted into the conversation. It omits Files Modified (recoverable from git) and Scratch State (fixed per skill). It is the first section read during compaction recovery.
Health transitions are one-directional within a phase: GREEN -> YELLOW -> RED. Phase boundaries reset to GREEN.
When health is YELLOW or RED, include **Suggested Action:** with a concrete, context-specific sentence (e.g., "Code review looping on Task 4. Check recent events for recurring patterns.").
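The one-directional health rule can be illustrated with a short Python helper. The name `next_health` is hypothetical, and allowing a direct GREEN to RED jump is an assumption; the text only fixes the direction and the reset at phase boundaries.

```python
HEALTH_ORDER = ["GREEN", "YELLOW", "RED"]


def next_health(current: str, proposed: str, phase_boundary: bool = False) -> str:
    """Apply the within-phase ratchet: health may worsen but never improve."""
    if phase_boundary:
        return "GREEN"  # phase boundaries reset health
    if HEALTH_ORDER.index(proposed) < HEALTH_ORDER.index(current):
        return current  # ignore an attempted improvement inside a phase
    return proposed
```

For example, a proposed move from YELLOW back to GREEN inside Phase 3 leaves the status at YELLOW until the Phase 3→4 boundary.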
Output concise inline status alongside the status file write:
- Phase 3 [4/8] Task 4 IN REVIEW (pass 1) | GREEN | 1h 12m
- --- separators

After compaction, before re-writing the status file:
0. Read the ## Compression State section from pipeline-status.md — recover Goal, Key Decisions, Active Constraints, and Next Steps. If the section is absent (pre-update pipeline), skip to step 1.
0.5. Check for handoff manifests (handoff-*-to-*.md) in the scratch directory. If the most recent manifest exists, use its Inputs, Decisions, and Constraints to reconstruct state for the current phase — this supersedes the Compression State section for phase-boundary recovery. If no manifest exists, continue with CSB-based recovery.
- pipeline-status.md to recover Started timestamp and Recent Events buffer
- Session Index: path, or if globbing ~/.claude/projects/<hash>/memory/session-index/*/summary.md finds a recent file, read summary.md. Include the Activity Timeline, Files Modified, and Key Decisions sections in the post-compaction narration. If no session index exists, skip silently; this step is purely additive. If summary.md lacks detail for a specific event type (e.g., errors, decisions, file changes), use /recall to query events.jsonl with filters for targeted recovery.

At checkpoint boundaries (see Checkpoint Timing below), emit the following structured block into the conversation. This block signals to the auto-compactor which state is critical to preserve. Also persist the semantic subset (Goal, Key Decisions, Active Constraints, Next Steps) to the ## Compression State section of pipeline-status.md.
===COMPRESSION_STATE===
Goal: [original user request, one sentence]
Skill: [skill name]
Phase: [current phase identifier]
Health: [GREEN|YELLOW|RED]
Mode: [skill-specific mode if applicable, omit otherwise]
Progress:
- [completed milestone 1]
- [completed milestone 2]
- [current work in progress]
Key Decisions (this session):
- [DEC-1] [decision]: [reasoning, one line]
- [DEC-2] [decision]: [reasoning, one line]
Active Constraints:
- [constraint that affects remaining work]
- [constraint from prior phase that still applies]
Files Modified:
- [file path]: [what changed, one line]
Scratch State:
- Location: [scratch directory path]
- Session Index: [~/.claude/projects/<hash>/memory/session-index/<session-id>/ if active, omit if not]
- Recovery: [which files to read first, in order]
Next Steps:
1. [immediate next action]
2. [action after that]
3. [remaining work summary]
===END_COMPRESSION_STATE===
Rules:
Emit a Compression State Block into the conversation AND update the ## Compression State section in pipeline-status.md at these points:
These triggers are a superset of the existing pipeline-status.md write triggers. The Compression State Block is emitted alongside (not instead of) the normal narration and status file write.
At phase boundaries (1→2, 2→3, 3→4), write a handoff manifest to the scratch directory instead of emitting a Compression State Block. The manifest defines exactly what the next phase needs — an allowlist, not a denylist. Everything not on the manifest is shed.
Format:
# Phase Handoff: N → M
**Timestamp:** ISO-8601
**Goal:** [original user request, verbatim]
**Mode:** feature | refactor
## Inputs for Phase M
- **[Input name]:** [disk path or inline value]
## Decisions Carried Forward
- [DEC-N] [decision]: [reasoning, one line]
## Active Constraints
- [constraint affecting remaining work]
## Shed Receipt
- [what was shed] → [where it lives on disk]
Rules:
- ## Compression State section in pipeline-status.md with the manifest contents (Goal, Decisions, Constraints, and the Inputs as Next Steps). This ensures compaction recovery can reconstruct state even if the manifest is lost.

Before dispatching the design skill, determine whether this build is:
Detection: If the user's intent is ambiguous, ask directly before proceeding:
"Is this adding new behavior, or restructuring existing code without changing what it does?"
The user's answer sets the mode for the entire pipeline. No special syntax needed.
Eval-gate pointer (Mock Dispatch Mode): if CRUCIBLE_BUILD_EVAL_MODE is set, use its value (feature or refactor) as the mode-detection answer and skip the AskUserQuestion call. The substitution rule lives in the ## Mock Dispatch Mode (eval-gate) section near the top of this file.
Propagate refactor mode to subagents through:
- contract-test-writer-prompt.md and refactor-implementer-addendum.md are standalone files used only in refactor mode. Select these instead of (or in addition to) the feature-mode equivalents.
- plan-writer-prompt.md, build-implementer-prompt.md), append a "Refactor Mode Context" section when composing the dispatch file. The templates remain flat markdown; the orchestrator decides what to include.
- /tmp/crucible-build-mode.md containing mode: refactor or mode: feature plus the baseline commit SHA. Only one build runs per session, so a well-known filename is sufficient.

Build's existing compaction step must read the Compression State FIRST (step 0 from Pipeline Status Compaction Recovery), then the mode file, before re-reading the task list or any other state. On resumption after compaction:
## Compression State from pipeline-status.md — recover goal, decisions, constraints, next steps.
0.5. Check for handoff manifests (handoff-*-to-*.md) in the scratch directory. If the most recent manifest exists, use its Inputs and Mode to bootstrap recovery; this supersedes the mode file for phase-boundary state.
- /tmp/crucible-build-mode.md: recover mode and baseline commit SHA.
- refactor: Verify baseline commit SHA exists.
- build-gate-ledger.md: if it exists, apply Gate Ledger Compaction Recovery (see Compaction Recovery subsection under Gate Ledger Protocol). Use the ledger's phase statuses to determine the resume point. If the ledger is missing but handoff manifests exist, reconstruct with INFERRED status.

Before any design or dispatch work, check for a crashed prior pipeline:

- <scratch>/.pipeline-active (where <scratch> is ~/.claude/projects/<hash>/memory/)
- pipeline_id set to current session ID, skill set to "build", phase set to "1", start_time set to current ISO-8601 timestamp, scratch_dir set to the scratch directory path, dispatch_dir set to the dispatch directory path, branch from git branch --show-current, baseline_sha from git rev-parse HEAD). Proceed to Step 0.
- pipeline_id as current session: This is a compaction recovery scenario. Follow existing compaction recovery procedures. Do not re-write the marker.
- pipeline_id:
a. Branch guard: Compare marker's branch field against current git branch --show-current. If they differ, warn: "Previous build on branch [marker.branch] crashed at Phase [phase]. You are currently on [current-branch]. Switch to [marker.branch] before resuming? [switch+resume / start fresh / abort]". Do NOT offer resume on the wrong branch.
b. Read manifest.jsonl from the marker's dispatch_dir (or from the scratch directory copy if /tmp was lost)
c. Identify the last successful phase boundary by scanning manifest entries grouped by phase. A phase boundary is verified when all dispatches in that phase have status: "completed".
d. Present resume option to the user:
"Previous build on branch [marker.branch] crashed at Phase [N], [context]. Resume from [last good boundary] ([checkpoint reason], [estimated time preserved] of work preserved)? [yes / no / fresh]"
e. User accepts: Invoke crucible:replay in resume mode, passing the scratch directory path. The replay skill handles checkpoint restore, state reconstruction, and re-dispatch. The build pipeline does not continue; replay takes over.
f. User declines (fresh): Delete the stale .pipeline-active marker. Write a fresh marker with the current session. Proceed to Step 0 as a new pipeline run.
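Step c above (finding the last verified phase boundary) can be sketched in Python. This is a hedged illustration: the function name `last_verified_phase` is hypothetical, and the manifest.jsonl entries are assumed to carry a `phase` field alongside the `status` field that the text mentions.

```python
import json
from typing import Iterable, Optional


def last_verified_phase(manifest_lines: Iterable[str]) -> Optional[int]:
    """Return the highest phase whose dispatches, and all earlier phases', are completed."""
    by_phase: dict[int, list] = {}
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        entry = json.loads(line)
        by_phase.setdefault(int(entry["phase"]), []).append(entry.get("status"))
    verified = None
    for phase in sorted(by_phase):
        if all(status == "completed" for status in by_phase[phase]):
            verified = phase
        else:
            break  # first phase with an unfinished dispatch ends the scan
    return verified
```

A result of None would mean no phase boundary can be verified, in which case offering a resume point is not meaningful.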
Marker updates during pipeline: Update the phase field in .pipeline-active at each phase boundary (1->2, 2->3, 3->4) to track progress for crash detection.
Marker cleanup: Delete .pipeline-active at Phase 4 step 12 (after finish skill completes).
Gate Ledger Initialization: After the pipeline-active marker is written (or recovered) and mode detection is complete, run the Gate Ledger Protocol's Ledger Initialization and Orphan Cleanup steps. The ledger must exist before Phase 1 transitions to IN_PROGRESS.
Compass Arc Emit (build orchestrator only, D14):

<!-- CANONICAL: shared/compass-protocol.md -->

After Gate Ledger Initialization completes AND the resume decision at Step -1 has resolved, emit the current arc to docs/compass.md, but ONLY on a fresh-start or fresh-restart path. Skip this emit if the user accepted the resume path (Step -1e: replay took over), because the prior arc's current_arc is already correct. Do NOT place this emit inside crash-recovery branches (Step -1, items 3 or 4e), as those fire mid-resume-detection and can clobber current_arc before replay restores the prior arc.
RESUME_DECISION is set by Step -1 to one of fresh / resume / fresh-restart. Default fresh if unset.
if [ "${RESUME_DECISION:-fresh}" != "resume" ]; then
  python scripts/compass.py update --field current_arc --value "#<ticket>: <user-goal-one-liner>" \
    || echo '[compass] emit failed at arc start; continuing build' >&2
fi
Replace <ticket> with the GitHub issue number (e.g. 273) and <user-goal-one-liner> with a short, precise description of the task at hand (e.g. Compass arc-state skill). The leading # is required — compass update raises ValueError on values missing the #NNN: prefix.
Error policy (best-effort): Compass is an optimization, not a correctness layer. A failed emit MUST NOT fail the build pipeline — log to stderr and continue. Never tighten this error handling.
D14 invariant: Sub-agents spawned inside build do NOT emit compass updates. This emit fires from the build orchestrator only, exactly once per fresh pipeline start.
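The emit guard and the value format described above can be sketched in Python. This is only an illustration under stated assumptions: `format_arc_value` and `should_emit_arc` are hypothetical helper names, and the prefix check mirrors the `#NNN:` rule stated above rather than reproducing the validation inside compass itself.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration; these names are not part of compass.py.
ARC_PREFIX = re.compile(r"^#\d+: ")


def format_arc_value(ticket: str, goal: str) -> str:
    """Build the string passed as --value for the current_arc field."""
    value = f"#{ticket}: {goal.strip()}"
    if not ARC_PREFIX.match(value):
        # Mirrors the documented rule: values must start with "#NNN: ".
        raise ValueError(f"arc value lacks the #NNN: prefix: {value!r}")
    return value


def should_emit_arc(resume_decision: Optional[str]) -> bool:
    """Emit on fresh and fresh-restart; skip only when replay took over."""
    return (resume_decision or "fresh") != "resume"
```

With these helpers, an unset decision behaves like `fresh`, matching the `${RESUME_DECISION:-fresh}` default in the shell snippet.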
Before running interactive design, check whether /spec (or a prior /build run) already produced design artifacts for this ticket.
Scan for pre-existing spec docs: Search docs/plans/ for design docs (*-design.md) with a matching ticket field in YAML frontmatter. Also check for corresponding *-implementation-plan.md and *-contract.yaml files with the same ticket field.
Conflict detection: If multiple design docs match the same ticket field, escalate to user: "Found multiple design docs for ticket #NNN: [list files]. Which should I use?" Do not proceed until the user resolves the conflict.
Full match (design doc + implementation plan + contract all present):
security_review field, note it in the Phase 1→2 handoff manifest under Active Constraints: "Contract requires security review (security_review.status: [required|recommended]); siege will be evaluated in Phase 4 Step 5.5." This ensures the directive survives phase handoffs and compaction recovery.
Partial match (design doc present but implementation plan or contract missing):
Not found: Proceed with normal Phase 1 (interactive design below).
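The match categories above can be expressed as a small classifier. This is a sketch under assumptions: it takes the already-filtered lists of files whose frontmatter `ticket` field matches, the function name and the dictionary keys are hypothetical, and it treats a missing design doc as "not found" even if other artifacts exist.

```python
from typing import Dict, List


def classify_spec_match(artifacts: Dict[str, List[str]]) -> str:
    """Classify pre-existing spec artifacts for one ticket.

    `artifacts` maps a hypothetical kind name ("design", "plan", "contract")
    to the file paths whose frontmatter ticket field matched.
    """
    designs = artifacts.get("design", [])
    if len(designs) > 1:
        return "conflict"   # escalate to the user before proceeding
    if not designs:
        return "not_found"  # run normal interactive design
    has_plan = bool(artifacts.get("plan"))
    has_contract = bool(artifacts.get("contract"))
    return "full" if has_plan and has_contract else "partial"
```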
Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, all `Use crucible:<skill>` and `Dispatch a <kind> subagent` invocations in Phase 1 (design, innovate, quality-gate, PRD writer, acceptance test writer, contract test writer) substitute a disk-read from the mock dir for the Task tool invocation. Each substitution follows the substitution rule in the `## Mock Dispatch Mode (eval-gate)` section. AskUserQuestion calls in Phase 1 use CRUCIBLE_BUILD_EVAL_USER_INPUT_DIR per that same section.
After the user approves the design and before starting Phase 2:
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-design-gate" before dispatching innovate and quality-gate on the design doc.
crucible:innovate on the design doc. Plan Writer incorporates the proposal.
Phase: design and PipelineID: <current PipelineID>. Iterates until clean or stagnation. (Non-negotiable; see Quality Gate Requirement.)
After the design doc is finalized (Step 2 complete), generate a stakeholder-facing PRD:
./prd-writer-prompt.md
docs/prds/YYYY-MM-DD-<topic>-prd.md
docs: add PRD for [feature]
This step runs by default. The PRD is a reformatting of the design doc for non-technical stakeholders; it does not introduce new decisions or requirements. Skip only in refactor mode (refactoring has no stakeholder-facing PRD).
Before planning, define "done" with executable tests:
./acceptance-test-writer-prompt.md
test: add acceptance tests for [feature] (RED)
These tests define the feature-level RED-GREEN cycle that wraps the entire pipeline. The pipeline is done when these tests pass.
When in refactor mode, Phase 1 shifts from "what should we build?" to "what are we changing and what could break?"
After the user describes the refactoring intent, the design phase:
### Impact Manifest
**Target:** [what's being restructured]
**Structural goal:** [what the code should look like after]
**Direct consumers:** N files
- path/to/consumer1.py (calls TargetClass.method)
- path/to/consumer2.py (imports TargetClass)
**Indirect dependents:** N files
- path/to/dependent.py (depends on consumer1)
**Test coverage:**
- N tests directly exercise target behavior
- N tests exercise consumers
- Gap: no tests cover [specific seam]
**Risk assessment:** [Low/Medium/High] based on consumer count and coverage gaps
**Confidence:** [High/Medium/Low] — High if cartographer used, Medium/Low if fallback
When confidence is Low, require explicit user confirmation before proceeding. The user must review the impact manifest and confirm the blast radius is complete.
Instead of writing NEW acceptance tests (Step 3 above), the pipeline:
./contract-test-writer-prompt.md: a single agent handles gap identification AND gap filling. Input: impact manifest + blast radius file list. The agent maps existing tests to behavioral seams, identifies untested seams, and writes contract tests for each gap.
test: add contract tests for [target] refactoring (GREEN, locking existing behavior)
Contract test writing must remain proportional to the refactoring scope. Trigger a scope check when any of these thresholds are hit:
When triggered:
The impact manifest records which gaps the user chose to leave uncovered.
Before dispatching the Plan Writer, verify the gate ledger and write a handoff manifest:
build-gate-ledger.md and verify Phase 1 Status is PASS. If not, follow Enforcement Rules.
handoff-1-to-2.md with:
`## Compression State` in pipeline-status.md with manifest contents.
phase_change event to the outbox: {"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 1 -> Phase 2 (Plan)","detail":{"skill":"build","from":"1","to":"2"}}.
Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, the Plan Writer, Plan Reviewer, innovate, and quality-gate dispatches in this phase use the mock-dir substitution rule defined in the `## Mock Dispatch Mode (eval-gate)` section. The substitution does not change Phase 2's structure or the gate-ledger writes.
Dispatch a Plan Writer subagent (Opus):
crucible:planning format
docs/plans/YYYY-MM-DD-<topic>-implementation-plan.md
Use ./plan-writer-prompt.md template for the dispatch prompt.
Dispatch a Plan Reviewer subagent:
Reviewer model selection:
Review protocol (iterative):
Use ./plan-reviewer-prompt.md template for the dispatch prompt.
After the plan passes review:
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-plan-gate" before dispatching innovate and quality-gate on the plan.
crucible:innovate on the approved plan. Plan Writer incorporates the proposal into the plan.
Phase: plan and PipelineID: <current PipelineID>. Provides the plan and design doc as context. (Non-negotiable; see Quality Gate Requirement.)
The quality gate handles the iterative red-team loop: fresh review each round, weighted stagnation detection, 15-round safety limit, escalation. See crucible:quality-gate for details.
Before creating the team and task list, write a handoff manifest. Step 3.4 above already verified the verdict marker, wrote PASS to the ledger, and deleted the marker. The handoff manifest is written AFTER the ledger PASS — this sequencing ensures compaction recovery finds a consistent state (ledger shows PASS, handoff exists).
Write a handoff manifest:
handoff-2-to-3.md with:
`## Compression State` in pipeline-status.md with manifest contents.
phase_change event to the outbox: {"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 2 -> Phase 3 (Execute)","detail":{"skill":"build","from":"2","to":"3"}}.
Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, all per-task dispatches in this phase (implementer, reviewer, cleanup, test-coverage, test-gap-writer, adversarial-tester, architecture-reviewer) use the mock-dir substitution rule defined in the `## Mock Dispatch Mode (eval-gate)` section. TeamCreate and TaskCreate calls run normally; only the Task/Agent tool invocations on teammates are substituted.
RECOMMENDED SUB-SKILL: Use crucible:cartographer-skill (load mode) — when dispatching implementers and reviewers, include relevant module files, conventions.md, and landmines.md in their dispatch files
Defect signature loading (for implementers only):
defect-signatures/*.md (excluding *.non-matches.md) from the cartographer storage directory
Modules field and match against the task's target modules:
Path: field
Path: value
Modules directory prefixes
[DEFECT_SIGNATURES] section of build-implementer-prompt.md:
Last loaded update: Loading is pure-read. After all implementer dispatches for the current phase complete, batch-update the Last loaded field to today on all signatures that were loaded. Do NOT update during dispatch; defer to after all subagents are dispatched.
Grudge pre-flight (regression-oracle, #271): Before dispatching implementers, query the Book of Grudges for each task's in-scope files and inject any matches into that implementer's dispatch file as a hard DO NOT REPEAT constraint (sibling to defect-signature loading). Resolve the helper by absolute path from the plugin root, plugin_root="$(realpath "<this-skill-base-dir>/../..")", and run python3 "$plugin_root/scripts/grudge_query.py" <task files…>; non-empty output lists past regressions held against those files. Best-effort: if the helper is unresolved, emit a one-line stderr warning and continue. A missing pre-flight must NEVER block the build. See skills/grudge/SKILL.md.
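The best-effort grudge pre-flight can be sketched as a small wrapper. The function name `grudge_preflight` is hypothetical, and this sketch launches the helper with the current interpreter (`sys.executable`) instead of a literal `python3`. It also makes no assumption about the helper's output format beyond "non-empty means matches".

```python
import os
import subprocess
import sys
from typing import List


def grudge_preflight(skill_base_dir: str, task_files: List[str]) -> str:
    """Return the helper's output for the task's files, or "" on any failure.

    Never raises for a missing or unlaunchable helper: the pre-flight is
    best-effort and must not block the build.
    """
    # Equivalent of: plugin_root="$(realpath "<this-skill-base-dir>/../..")"
    plugin_root = os.path.realpath(os.path.join(skill_base_dir, "..", ".."))
    helper = os.path.join(plugin_root, "scripts", "grudge_query.py")
    if not os.path.isfile(helper):
        print("[grudge] helper not found; skipping pre-flight", file=sys.stderr)
        return ""
    try:
        result = subprocess.run(
            [sys.executable, helper, *task_files],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError as exc:
        print(f"[grudge] pre-flight failed: {exc}", file=sys.stderr)
        return ""
    return result.stdout.strip()
```

A non-empty return value would then be injected into the implementer's dispatch file as the DO NOT REPEAT constraint.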
Write Phase 3 IN_PROGRESS to the gate ledger (after Phase 2 PASS verification).
Create a team using TeamCreate:
team_name: "build-<feature-name>"
description: "Building <feature description>"
Read the approved plan. Create tasks via TaskCreate for each plan task, including:
TaskUpdate with addBlockedBy
If TeamCreate fails (agent teams not available), output a clear one-time warning:
⚠️ Agent teams are not available. Recommended: set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
Falling back to sequential subagent dispatch via Agent tool.
Then fall back to sequential subagent dispatch via the regular Task tool (without team_name). Everything still works — independent tasks run sequentially instead of in parallel via teammates.
What changes in fallback mode:
Agent tool instead of as teammates
TaskCreate/TaskUpdate for state management
Before dispatching:
For each task (or wave of parallel tasks):
RECOMMENDED SUB-SKILL: Before dispatching each execution wave, use crucible:checkpoint — create checkpoint with reason "pre-wave-N" (where N is the wave number). This captures the working directory state after the prior wave's verification gate passed.
in_progress via TaskUpdate
team_name and subagent_type="general-purpose"
./build-implementer-prompt.md template
./build-reviewer-prompt.md template
Review-Tier from plan metadata.
When a contract YAML exists for the current ticket (detected during Step 0 or produced by /spec), the implementer receives the contract alongside the design doc and task description. The contract uses the schema defined in crucible:spec/contract-schema.md (version 1.0). Implementers must treat contract elements as follows:
api_surface declarations are binding. The implementer must match the declared function signatures, class interfaces, endpoint shapes, parameter names, types, and return types exactly. Deviations from the contract's API surface are implementation errors.
checkable invariants are binding. The implementer must satisfy all declared constraints (e.g., "must not import X", "must be idempotent"). The check_method field (grep, code-inspection, file-structure) indicates how the quality gate will verify compliance — the implementer should self-check against these before committing.
testable invariants require tagged tests. For each testable invariant, the implementer must write a test tagged with the declared test_tag (pattern: contract:<category>:<id>) that validates the invariant. These tests are checked by the quality gate and reviewers — they must exist and pass.
integration_points are informational. These indicate which other components and contracts this ticket interacts with. The implementer should be aware of referenced components and ensure compatibility, but integration points are not binding constraints — they provide context for making good implementation decisions.
After the implementer reports completion and before dispatching the reviewer:
RECOMMENDED: Use crucible:checkpoint — create checkpoint with reason "pre-cleanup-task-N" before dispatching the cleanup agent. If cleanup removes something needed, restore to this checkpoint.
./cleanup-prompt.md
git diff <pre-task-sha>..HEAD (the implementer's committed changes)
refactor: cleanup task N implementation

| Task Complexity | Reviewer Model |
|----------------|----------------|
| Low (1-3 files, straightforward) | Sonnet |
| Medium (3-6 files, some cross-system) | Lead decides (default Opus) |
| High (6+ files, refactoring, deep chains) | Opus |
| When in doubt | Opus |
Each task gets TWO review passes before completion:
digraph review {
"Implementer builds + tests" -> "De-sloppify cleanup";
"De-sloppify cleanup" -> "Pass 1: Code Review";
"Pass 1: Code Review" -> "Implementer fixes code findings";
"Implementer fixes code findings" -> "Pass 2: Test Quality Review";
"Pass 2: Test Quality Review" -> "Implementer fixes test findings";
"Implementer fixes test findings" -> "Test Alignment Audit (crucible:test-coverage)";
"Test Alignment Audit (crucible:test-coverage)" -> "Test Gap Writer";
"Test Gap Writer" -> "Adversarial Tester";
"Adversarial Tester" -> "Task complete";
}
Pass 1 — Code Review: Architecture, patterns, correctness, wiring (actually connected, not just existing?)
Pass 2 — Test Quality Review: Test independence? Determinism? Edge cases? Integration tests where mocks are masking real behavior? AAA pattern? Correct test level? (Staleness and alignment checks are handled by the test-coverage dispatch below.)
Each task's Review-Tier (from the plan) determines which review steps execute. Phase 4 full-implementation gates are NOT affected by per-task tiers.
| Step | Tier 1 | Tier 2 | Tier 3 |
|------|--------|--------|--------|
| Implementer | Yes | Yes | Yes |
| De-sloppify cleanup | Yes | Yes | Yes |
| Pass 1: Code review | Single pass | Iterative | Iterative |
| Implementer fixes (code) | If findings | If findings | If findings |
| Pass 2: Test quality review | SKIP | Single pass (non-iterative) | Iterative |
| Implementer fixes (test) | SKIP | If critical findings only | If findings |
| Test alignment audit | SKIP | SKIP | Yes |
| Test gap writer | SKIP | SKIP | Yes |
| Adversarial tester | SKIP | Yes | Yes |
Tier 1 "single pass" code review: Dispatch one reviewer. If findings are Clean, task is complete. If findings include Critical or Important issues, dispatch implementer to fix, then the task is complete (no re-review). If findings include an Architectural Concern, escalate as normal.
Tier 2 "single pass" test review: Dispatch one test quality reviewer. Report findings but do NOT enter the iterative review loop. If the single pass surfaces Critical findings, escalate the task to Tier 3 for full iterative treatment.
Tier 2 "iterative" code review: Same as current behavior -- fresh reviewer each round, track issue count, loop until clean or stagnation.
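The tier table above reads naturally as a lookup structure. The sketch below transcribes it and adds the Tier 2 to Tier 3 escalation on Critical test findings; the step keys and function names are hypothetical, and the two "Implementer fixes" rows are omitted because they depend on findings rather than on the tier alone.

```python
# Review mode per step and tier, transcribed from the table above.
# None means the step is skipped for that tier. Key names are illustrative.
REVIEW_STEPS = {
    "implementer":          {1: "run", 2: "run", 3: "run"},
    "cleanup":              {1: "run", 2: "run", 3: "run"},
    "code_review":          {1: "single pass", 2: "iterative", 3: "iterative"},
    "test_quality_review":  {1: None, 2: "single pass", 3: "iterative"},
    "test_alignment_audit": {1: None, 2: None, 3: "run"},
    "test_gap_writer":      {1: None, 2: None, 3: "run"},
    "adversarial_tester":   {1: None, 2: "run", 3: "run"},
}


def steps_for_tier(tier: int) -> list:
    """Return (step, mode) pairs that execute for the given Review-Tier."""
    if tier not in (1, 2, 3):
        raise ValueError(f"unknown review tier: {tier}")
    return [
        (step, modes[tier])
        for step, modes in REVIEW_STEPS.items()
        if modes[tier] is not None
    ]


def tier_after_test_review(tier: int, critical_findings: int) -> int:
    """A Tier 2 single-pass test review with Critical findings moves to Tier 3."""
    if tier == 2 and critical_findings > 0:
        return 3
    return tier
```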
The orchestrator may escalate a task's review tier during execution. Escalation is one-directional (up only).
Triggers:
Process:
[timestamp] DECISION: review-tier | choice=escalate T1->T2 | reason=<trigger> | alternatives=none
When a contract YAML exists for the current ticket, reviewers receive the contract alongside the implementation and must add the following checks to both review passes:
API surface compliance: Do the implemented public interfaces match the api_surface declarations in the contract? Check function signatures, class interfaces, endpoint shapes, parameter names/types, and return types. Any deviation from the contract's declared API surface is a blocking finding.
Checkable invariant satisfaction: Are all checkable invariants satisfied per their declared check_method?
- grep: verify the pattern match (or absence) in production code
- code-inspection: read and reason about code to confirm the invariant holds
- file-structure: check file existence/organization matches the constraint

Any unsatisfied checkable invariant is a blocking finding.
Testable invariant test existence: Does a test exist for each testable invariant, tagged with the correct test_tag (pattern: contract:<category>:<id>)? A missing tagged test is a blocking finding.
Test correctness: Do the tagged tests actually validate the invariant they claim to cover? A test that exists but does not meaningfully exercise the invariant (e.g., a trivially passing assertion, a test that tests something unrelated despite having the right tag) is a blocking finding.
Severity: All contract-related review findings are classified as blocking — the same severity as contract violations in the quality gate. Contract findings must be resolved before the task is marked complete.
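The "tagged test exists" check lends itself to a mechanical first pass. The sketch below assumes the `test_tag` string appears literally somewhere in the test source (for example in a marker or comment), which the text above does not specify. The function name is hypothetical, and it covers existence only, not whether the test meaningfully exercises the invariant.

```python
import re
from typing import Iterable, List

# Shape of a tag per the documented pattern contract:<category>:<id>.
TEST_TAG = re.compile(r"^contract:[^:\s]+:[^:\s]+$")


def missing_tagged_tests(declared_tags: Iterable[str], test_source: str) -> List[str]:
    """Return declared test_tags that never appear in the test source."""
    missing = []
    for tag in declared_tags:
        if not TEST_TAG.match(tag):
            raise ValueError(f"malformed test_tag: {tag!r}")
        # The lookahead keeps "contract:x:inv-1" from matching "contract:x:inv-10".
        if not re.search(re.escape(tag) + r"(?![\w-])", test_source):
            missing.append(tag)
    return missing
```

Each tag returned by this function would correspond to a blocking finding; an empty list still leaves the test-correctness check to the reviewer.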
After the implementer addresses Pass 2 findings, invoke crucible:test-coverage against the task's changes:
git diff <pre-task-sha>..HEAD
The test-coverage skill audits existing tests for staleness (wrong assertions, misleading descriptions, dead tests, coincidence tests) and handles its own fix dispatch and revert-on-failure logic. It returns a structured report. Note: the diff includes review fix commits, so the audit agent should focus on behavioral changes to source files, not changes that only touch test files.
Skip this step if the task made no behavioral source changes (only .md, .json, config files).
After test-coverage completes (or is skipped), dispatch a Test Gap Writer (Opus) using ./test-gap-writer-prompt.md:
test: fill coverage gaps for task N
If all tests PASS: Continue to adversarial tester.
If some tests FAIL (gaps reveal genuinely missing implementation):
fix: address test gap failures for task N), continue to adversarial tester
Skip this step if the Pass 2 test reviewer reported zero missing coverage gaps.
After the test gap writer completes (or is skipped), dispatch an Adversarial Tester (Opus) using skills/adversarial-tester/break-it-prompt.md:
git diff <pre-task-sha>..HEAD), project test conventions, cartographer module context (if available)
test: adversarial tests for task N
Skip this step when:
.md, .json, .yaml, .uss, .uxml)
Each review pass (code and test) uses the iterative loop:
After each wave completes:
When in refactor mode, Phase 3 execution differs from feature mode in several ways.
Before the first task executes:
/tmp/crucible-build-mode.md: this is the rollback target
Running the full test suite after every atomic step is prohibitively expensive. Instead:
When the executor encounters a task marked atomic: true:
./refactor-implementer-addendum.md appended
Key difference from feature mode: Feature mode does RED-GREEN-REFACTOR. Refactor mode for atomic steps does GREEN-GREEN: tests are green before, tests must be green after. No RED phase because no new behavior is being added.
After a successful atomic commit (step 4), the rest of the per-task pipeline continues as normal: de-sloppify cleanup, two-pass review cycle, test alignment audit, test gap writer, and adversarial tester (unless skipped per restructuring-only annotation below).
Non-atomic refactoring tasks follow normal execution — structural changes that don't break intermediate states (e.g., extracting a private method, adding a module nothing imports yet). These use standard TDD if they introduce new abstractions, or GREEN-GREEN if they are pure restructuring.
restructuring-only: true/false. If restructuring-only: true, adversarial testing is skipped. Tasks with restructuring-only: false still get adversarial testing. When in doubt, default to false.
- restructuring-only: true examples: renames where all call sites are mechanically updated, file moves with updated paths, extract-method where the extracted method is private and preserves the original call signature
- restructuring-only: false examples: extract-class where callers must change call targets, splitting a module where consumers must update imports, any change where the consumer-facing API surface shifts

The orchestrator records the baseline commit SHA before the first refactoring task executes (during pre-execution coverage check). Persisted in /tmp/crucible-build-mode.md.
When a single task fails after the executor's retry attempt:
When the user chooses full rollback (or cascading failures make forward progress impossible):
git reset --hard <baseline-SHA> to restore pre-refactoring state
The planner annotates tasks with safe-partial: true/false. A task is safe-partial: true if the codebase is in a valid, shippable state after that task completes (all tests green, no dangling references). When a later task fails, the orchestrator can offer to keep changes through the last safe-partial task.
For plans with 10+ tasks, at ~50% completion or after a major subsystem:
./architecture-reviewer-prompt.md
After all implementers in Phase 3 report back and before writing the Phase 3 COMPLETE ledger entry, aggregate their ### Noticed But Not Touching sections into a single docs/plans/<YYYY-MM-DD>-<ticket-slug>-noticed.md artifact.
Scope discipline: Notice, do not act. If an implementer sees an out-of-scope issue during implementation, it must be logged under ### Noticed But Not Touching in their report — NOT fixed in their diff. Acting on noticed items in the same task is a scope-discipline failure. The orchestrator enforces this via reconciliation: noticed entries are surfaced here and converted to follow-up tickets later (see /finish).
7-step reconciliation process:
Collect each implementer's ### Noticed But Not Touching section from every Phase 3 implementer report.
Skip any section whose body is *(none)*.
Dedupe entries using the canonical dedupe key: sha256( normalize(file_path) + "|" + line_range + "|" + noticed[:40] ), where normalize(file_path) is the repo-relative POSIX path lowercased.
Sort the deduped entries by file path, then line range.
If any entries remain, write docs/plans/<YYYY-MM-DD>-<ticket-slug>-noticed.md matching the canonical filename regex ^docs/plans/\d{4}-\d{2}-\d{2}-[a-z0-9-]+-noticed\.md$. Use the date embedded in the sibling plan filename (not wall-clock date) so all sibling artifacts share a date; slug matches the ticket being built. Frontmatter and body must follow the Canonical Constants template exactly:
---
pipeline_id: "<build-YYYYMMDD-HHMMSS>"
date: "YYYY-MM-DD"
ticket: "#NNN"
---
# Noticed But Not Touching — <ticket-slug>
- **file:** `path:L<start>-L<end>`
**noticed:** <desc>
**why it matters:** <risk/opportunity>
**suggested follow-up:** <optional>
Idempotent overwrite: If the target -noticed.md already exists (same-ticket re-run on the same date), merge the existing entries with the newly collected entries, run the full dedupe (same key), sort, and overwrite the file in one write. No append-mode; the on-disk file is always the full deduped set for that date+ticket.
Stage the -noticed.md file so it lands in the PR commit.
Skip the write entirely if zero entries remain after dedupe — do not produce an empty -noticed.md.
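The dedupe, sort, and filename rules of the reconciliation process can be sketched as follows. The dedupe key follows the formula given above. The entry shape (a dict with `file_path`, `line_range`, and `noticed` keys), the UTF-8 encoding of the hashed string, and the numeric ordering of line ranges are assumptions of this sketch.

```python
import hashlib
import re
from typing import Dict, List

# Canonical filename regex, as stated in the reconciliation steps.
NOTICED_FILENAME = re.compile(
    r"^docs/plans/\d{4}-\d{2}-\d{2}-[a-z0-9-]+-noticed\.md$"
)


def normalize(file_path: str) -> str:
    """Lowercased POSIX path; the input is assumed to be repo-relative already."""
    return file_path.replace("\\", "/").lower()


def dedupe_key(entry: Dict[str, str]) -> str:
    """sha256(normalize(file_path) + "|" + line_range + "|" + noticed[:40])."""
    raw = (
        normalize(entry["file_path"])
        + "|" + entry["line_range"]
        + "|" + entry["noticed"][:40]
    )
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def reconcile(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Dedupe by canonical key (first occurrence wins), then sort by path and range."""
    unique: Dict[str, Dict[str, str]] = {}
    for entry in entries:
        unique.setdefault(dedupe_key(entry), entry)

    def sort_key(entry: Dict[str, str]):
        # Assumption: compare "L10-L12" numerically so it sorts after "L2-L3".
        numbers = tuple(int(n) for n in re.findall(r"\d+", entry["line_range"]))
        return (normalize(entry["file_path"]), numbers)

    return sorted(unique.values(), key=sort_key)
```

Because the same function handles both fresh and merged entry sets, the idempotent overwrite amounts to calling `reconcile(existing + new)` and writing the result in a single pass, or skipping the write when the list is empty.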
After the last task wave's verification gate passes and all tasks are marked complete — but BEFORE the Phase 3→4 handoff — write Status: COMPLETE and Tasks: N/N complete to the Phase 3 ledger entry. If any task is in a retry/re-dispatch loop, COMPLETE is NOT written until retries resolve.
Before running acceptance tests and code review, verify the gate ledger and write a handoff manifest:
build-gate-ledger.md and verify Phase 3 Status is COMPLETE. If not, follow Enforcement Rules.
Write the handoff manifest:
handoff-3-to-4.md with:
git diff scope), task summary (completed count, escalation outcomes)
`## Compression State` in pipeline-status.md with manifest contents.
phase_change event to the outbox: {"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 3 -> Phase 4 (Completion)","detail":{"skill":"build","from":"3","to":"4"}}.
Eval-gate pointer (Mock Dispatch Mode): when CRUCIBLE_BUILD_EVAL_MOCK_DIR is set, the temper, inquisitor, optional siege, quality-gate, forge, cartographer, and finish dispatches in this phase use the mock-dir substitution rule defined in the `## Mock Dispatch Mode (eval-gate)` section. Local test-suite execution (pytest, etc.) runs normally; substitution applies only to subagent dispatches.
After all tasks complete:
Write Phase 4 IN_PROGRESS to the gate ledger (after Phase 3 COMPLETE verification).
Feature mode: Run acceptance tests from Phase 1 Step 3 — verify they PASS (GREEN). Refactor mode: Run all contract tests from Phase 1 — verify they PASS (GREEN).
Run full test suite (unit + integration)
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-temper" before dispatching code review. If the iterative review fix cycle introduces regressions, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:temper on full implementation (iterative until clean)
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-inquisitor" before dispatching inquisitor. If the inquisitor's fix cycle produces regressions, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:inquisitor on full implementation (dispatches 5 parallel dimensions against full feature diff)
git diff <base-sha>..HEAD where base-sha is the commit before Phase 3 execution began
crucible:inquisitor for full process
Conditional: If the inquisitor's fix cycle produced any code changes, re-run crucible:temper scoped to the inquisitor fix commits only (git diff <pre-inquisitor-sha>..HEAD)
a. Contract check: If a contract YAML exists for this ticket with security_review.status: "required", siege is mandatory — skip to step (d).
b. Code scan: If no contract directive (or contract has security_review.status: "recommended" or field absent), scan for siege activation signals:
git diff <base-sha>..HEAD (changed file contents)
shared/security-signals.md
/siege --force manually if needed." Record in manifest and decision journal: security-review | choice=skip | reason=1 signal ([category]).
crucible:siege with:
mixed)
deployment_context: from contract security_review.deployment_context if present, else unset (siege defaults to public)
security-review | choice=dispatch | reason=[N] signals ([categories]) [or contract-required]
{"ts":"<now>","seq":0,"type":"security_review","summary":"Siege dispatched: [N] signals detected","detail":{"skill":"build","signals":[categories]}}
e. Blocking behavior: Siege iterates internally until zero Critical + zero High.
git diff <pre-siege-sha>..HEAD). Same pattern as post-inquisitor conditional review at step 5.
f. Escape hatches: User can override automatic siege behavior:
- --force-siege: Dispatch siege regardless of signal count. Maps to siege's --force flag. Decision journal: security-review | choice=force-dispatch | reason=user --force-siege flag
- --skip-siege: Suppress siege even when signals/contract require it. Maps to siege's --skip flag. Decision journal: security-review | choice=force-skip | reason=user --skip-siege flag

RECOMMENDED SUB-SKILL: Use crucible:checkpoint to create a checkpoint with reason "pre-impl-gate" before dispatching the implementation quality gate. If gate fix rounds degrade the code, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:quality-gate on full implementation (artifact type: "code"). Include in the dispatch context: Phase: code and PipelineID: <current PipelineID>. Iterates until clean or stagnation. (Non-negotiable — see Quality Gate Requirement.)
6b. Verify verdict marker and write Phase 4 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
RECOMMENDED SUB-SKILL: Use crucible:forge (retrospective mode) to capture what happened vs what was planned
7.5. Chronicle signal fallback: If forge retrospective was skipped (user declined, session ending), append a minimal chronicle signal directly:
/tmp/crucible-metrics-<session-id>.log for duration and subagent counts
v=1, ts=now, skill="build", outcome from acceptance test results, duration_m from metrics log, branch from git, files_touched from git diff <base-sha>..HEAD --name-only, metrics={mode, tasks count, tasks_passed count from task list, stagnation=false}
~/.claude/projects/<hash>/memory/chronicle/signals.jsonl
RECOMMENDED SUB-SKILL: Use crucible:cartographer-skill (record mode) to persist any new codebase knowledge discovered during build
Compile summary: what was built, acceptance tests passing, review findings addressed, inquisitor findings, concerns
Report to user
10.5. Session index event: Emit a skill_end event to the outbox: {"ts":"<now>","seq":0,"type":"skill_end","summary":"/build complete: <outcome summary>","detail":{"skill":"build","outcome":"success|failure|escalated"}}.
REQUIRED SUB-SKILL: Use crucible:finish — skip finish's Step 2.5 (test-coverage) since test-coverage ran per-task in Phase 3, and skip finish's Step 3 (red-team) since quality-gate already ran at step 6. Tell finish to skip both.
Delete pipeline-active marker: Remove <scratch>/.pipeline-active. This signals that the pipeline completed successfully. If deletion fails (permissions, missing file), log a warning but do not fail the pipeline.
Throughout the pipeline, the orchestrator appends timestamped entries to /tmp/crucible-metrics-<session-id>.log on each subagent dispatch and completion.
Dispatch measurement protocol: On every subagent dispatch, the orchestrator follows the enriched manifest protocol from shared/dispatch-convention.md:
input_chars and model_tier in the manifest entry.
output_chars and tool_calls (if available) in the manifest completion entry.
At completion (before reporting to user, i.e. step 9), read the metrics log and manifest, then compute:
-- Pipeline Complete ----------------------------------------
Subagents dispatched: 23 (14 Opus, 7 Sonnet, 2 Haiku)
Active work time: 2h 47m
Wall clock time: 11h 13m
Quality gate rounds: 4 (design: 2, plan: 1, impl: 1)
Siege: dispatched (3 agents, 2 rounds, 0 Critical, 0 High) | skipped (0 signals) | skipped (1 signal: auth)
Task tiers: 3 Tier 1, 3 Tier 2, 2 Tier 3
Subagent savings: ~21 dispatches skipped vs all-Tier-3
Est. input tokens: ~32,100 (128,400 chars)
Est. output tokens: ~20,500 (82,000 chars)
Token estimate note: Based on dispatch file sizes (chars/4). Actual consumption may vary +/-30%.
-------------------------------------------------------------
Metrics tracked:
input_chars from manifest / 4)
output_chars from manifest / 4)
Efficiency summary computation: Read manifest.jsonl from the dispatch directory. Sum input_chars and output_chars across all completed entries (skip nulls). Divide each by 4 for token estimates. Count dispatches grouped by model_tier. Include these in the pipeline completion report alongside existing metrics.
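The efficiency summary computation can be sketched directly from that description. The sketch assumes one JSON object per dispatch, each carrying `input_chars`, `output_chars`, and `model_tier`; the real manifest layout is defined in shared/dispatch-convention.md and may differ, for instance by splitting dispatch and completion into separate lines. The function name and the result keys are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def efficiency_summary(manifest_lines: Iterable[str]) -> dict:
    """Sum char counts (skipping nulls), estimate tokens as chars/4, count tiers."""
    input_chars = 0
    output_chars = 0
    tiers = Counter()
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # "or 0" skips both missing keys and JSON nulls.
        input_chars += entry.get("input_chars") or 0
        output_chars += entry.get("output_chars") or 0
        if entry.get("model_tier"):
            tiers[entry["model_tier"]] += 1
    return {
        "input_chars": input_chars,
        "output_chars": output_chars,
        "est_input_tokens": input_chars // 4,
        "est_output_tokens": output_chars // 4,
        "dispatches_by_tier": dict(tiers),
    }
```

With 128,400 input characters and 82,000 output characters, this yields the ~32,100 and ~20,500 token estimates shown in the sample report above.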
Gate tracking verification: Before compiling the pipeline summary (Phase 4 Step 9), verify that all three gate categories (design, plan, implementation) show round count >= 1 with clean final rounds (0 Fatal, 0 Significant). If any gate was skipped with explicit user approval, record it as USER_SKIP in the metrics. A zero without user approval indicates a gate was dropped — report this in the summary.
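The gate tracking rule can be sketched as a per-gate status check. `USER_SKIP` comes from the text above; the other labels (`OK`, `NOT_CLEAN`, `DROPPED`), the function name, and the input shapes are hypothetical choices for this sketch.

```python
from typing import Dict, Iterable

GATES = ("design", "plan", "implementation")


def gate_status(
    rounds: Dict[str, int],
    final_round: Dict[str, Dict[str, int]],
    user_skips: Iterable[str] = (),
) -> Dict[str, str]:
    """Classify each gate from its round count and final-round findings.

    rounds:      gate name -> number of quality-gate rounds run
    final_round: gate name -> {"fatal": n, "significant": n} for the last round
    user_skips:  gates skipped with explicit user approval
    """
    skipped = set(user_skips)
    report = {}
    for gate in GATES:
        if rounds.get(gate, 0) == 0:
            # Zero rounds is only acceptable with explicit user approval.
            report[gate] = "USER_SKIP" if gate in skipped else "DROPPED"
            continue
        last = final_round.get(gate)
        clean = (
            last is not None
            and last.get("fatal", 0) == 0
            and last.get("significant", 0) == 0
        )
        report[gate] = "OK" if clean else "NOT_CLEAN"
    return report
```

Any gate that comes back `DROPPED` would be reported in the pipeline summary.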
Alongside the metrics log, maintain a decision journal at /tmp/crucible-decisions-<session-id>.log. Append a structured entry for every non-trivial routing decision:
[timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>
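A journal line in this format can be produced with a small formatter. The timestamp format is not specified above, so this sketch assumes ISO 8601 in UTC; `decision_entry` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Optional


def decision_entry(
    decision_type: str,
    choice: str,
    reason: str,
    alternatives: str = "none",
    now: Optional[datetime] = None,
) -> str:
    """Format one decision-journal line; the ISO 8601 timestamp is an assumption."""
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return (
        f"[{stamp}] DECISION: {decision_type} | choice={choice} "
        f"| reason={reason} | alternatives={alternatives}"
    )
```

Appending the returned string plus a newline to /tmp/crucible-decisions-<session-id>.log keeps the journal one entry per line.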
Decision types to capture:
- reviewer-model -- why Opus vs Sonnet for this reviewer
- review-tier -- tier assignment read from plan, runtime escalation reason if applicable
- gate-round -- issue count, severity shifts, progress/stagnation per round
- escalation -- why the orchestrator escalated to user (and user's decision)
- task-grouping -- parallelism decisions for wave execution
- cleanup-removal -- what de-sloppify removed and accept/reject decision

STOP and ask the user when:
Minor issues: Log, work around, include in final report.
./acceptance-test-writer-prompt.md — Phase 1 acceptance test generation./prd-writer-prompt.md — Phase 1 PRD generation from design doc./plan-writer-prompt.md — Phase 2 plan writer dispatch./plan-reviewer-prompt.md — Phase 2 plan reviewer dispatch./build-implementer-prompt.md — Phase 3 implementer dispatch./build-reviewer-prompt.md — Phase 3 reviewer dispatch./cleanup-prompt.md — Phase 3 de-sloppify cleanup dispatch./test-gap-writer-prompt.md — Phase 3 test gap writer dispatch./architecture-reviewer-prompt.md — Mid-plan checkpoint./contract-test-writer-prompt.md — Phase 1 refactor-mode contract test generation./refactor-implementer-addendum.md — Phase 3 refactor-mode implementer addendum (appended to build-implementer-prompt)Red-team, innovate, adversarial tester, and inquisitor prompts live in their respective skills:
- crucible:red-team: skills/red-team/red-team-prompt.md
- crucible:innovate: skills/innovate/innovate-prompt.md
- crucible:adversarial-tester: skills/adversarial-tester/break-it-prompt.md
- crucible:inquisitor: skills/inquisitor/inquisitor-prompt.md

Build is the outermost orchestrator and controls all quality gates via crucible:quality-gate. Quality gate wraps crucible:red-team internally; do NOT invoke red-team separately at these points.
Gate points in the pipeline:
| Pipeline Stage | Artifact Type | Replaces |
|---------------|---------------|----------|
| Phase 1, Step 2 (after design) | design | Existing crucible:red-team on design |
| Phase 2, Step 3 (after plan review) | plan | Existing crucible:red-team on plan |
| Phase 4, Step 6 (after inquisitor + conditional re-review) | code | Existing crucible:red-team on implementation |
Code review (crucible:temper) and inquisitor (crucible:inquisitor) remain separate from the quality gate — temper does structured quality checks, inquisitor writes cross-component adversarial tests, and the quality gate does adversarial artifact review. All three serve distinct purposes.
When a contract YAML exists for the current ticket, the quality gate adds contract verification to its checks. This applies at all gate points (design, plan, and code), though most contract checks are only meaningful at the code gate (Phase 4, Step 6).
Version check: Before processing a contract, verify the version field is "1.0". If the version is missing or unrecognized, reject the contract with a clear error: "Contract version [X] is not supported. Expected version 1.0." Do not proceed with contract-aware checks — fall back to standard quality gate behavior without contract awareness.
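The version check could look roughly like the sketch below. It assumes the contract has already been parsed into a dict; the function name and the `<missing>` placeholder are illustrative, while the error wording follows the text.

```python
from typing import Optional

SUPPORTED_VERSION = "1.0"


def check_contract_version(contract: dict) -> tuple[bool, Optional[str]]:
    """Return (ok, error). On failure the caller falls back to the
    standard quality gate without contract-aware checks."""
    version = contract.get("version")
    # Strict string comparison: an unquoted YAML 1.0 parsed as a float
    # is rejected here. That strictness is an assumption of this sketch.
    if version == SUPPORTED_VERSION:
        return True, None
    shown = version if version is not None else "<missing>"
    return False, f"Contract version {shown} is not supported. Expected version 1.0."
```

A caller would surface the error message and then continue with the standard gate rather than aborting the run.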
Checkable invariant verification: For each checkable invariant in the contract, verify satisfaction using the declared check_method:
- grep: pattern match (or absence) in production code. Run the grep and confirm the result matches the invariant's verification description.
- code-inspection: read and reason about the relevant code to confirm the invariant holds (e.g., idempotency, no side effects).
- file-structure: check that file existence, location, or organization matches the constraint.

Testable invariant verification: For each testable invariant in the contract:
- test_tag (pattern: contract:<category>:<id>) exists in the test suite.

Contract violations are blocking issues. Contract violations are NOT warnings; they have the same severity as architectural concerns and must be resolved before the gate passes. The quality gate's iterative fix loop applies: dispatch fixes, re-check, track progress/stagnation as normal.
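The grep-style and test-tag checks described above can be sketched over in-memory file contents. The helper names, and the characters allowed in `<category>` and `<id>`, are assumptions made for illustration; the source gives only the `contract:<category>:<id>` shape.

```python
import re

# Assumption: category and id consist of letters, digits, "_" and "-".
TEST_TAG_RE = re.compile(r"^contract:[A-Za-z0-9_-]+:[A-Za-z0-9_-]+$")


def is_valid_test_tag(tag: str) -> bool:
    """Check that a tag follows the contract:<category>:<id> shape."""
    return bool(TEST_TAG_RE.match(tag))


def tag_present(tag: str, test_sources: dict) -> bool:
    """True if any test file's text mentions the tag."""
    return any(tag in text for text in test_sources.values())


def grep_invariant(pattern: str, sources: dict, expect_match: bool) -> bool:
    """True if the pattern's presence (or absence) matches the invariant.

    `sources` maps file names to file contents.
    """
    matched = any(
        re.search(pattern, text, re.MULTILINE) for text in sources.values()
    )
    return matched == expect_match
```

For example, an invariant that forbids `eval(` in production code would be checked with `grep_invariant(r"eval\(", sources, expect_match=False)`.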
Required sub-skills:
Recommended sub-skills:
Recon/assay context: Inherits recon/assay context through /design (Phase 1). No direct dispatch. When design integrates recon, build benefits automatically. See #147 for rationale.
Phase 3 sub-skills (dispatched per-task):
Implementer sub-skills:
Contract consumption:
/spec (schema version 1.0). Contracts are read from docs/plans/*-contract.yaml and feed into pre-existing doc detection (Phase 1 Step 0), implementer dispatch (Phase 3), reviewer checks (Phase 3), and quality gate verification (all gate points). See crucible:spec/contract-schema.md for field definitions.