STARTER_CHARACTER = 🔬

Ralph Review (Deep)

Validate the beads graph and launch three independent reviews of the ralph artifacts in parallel using different models via opencode run. Each model produces one review file evaluating goal clarity, task decomposition, sequencing, plan completeness, risk and gaps, and feasibility.

For a faster single-model review using Claude opus as an internal subagent, use ralph-review instead. This deep variant trades wall time and cost for cross-model consensus.

Models

| Label | Model ID | | ------ | ----------------------------- | | openai | openai/gpt-5.3-codex | | gemini | google/gemini-3.1-pro-preview | | claude | az-anthropic/claude-opus-4-6 |

Procedure

Step 1: Validate Artifacts Exist

PROJECT_ROOT=$(git rev-parse --show-toplevel)

Verify all of these exist:

$PROJECT_ROOT/.llmtmp/ralph-plan.md
$PROJECT_ROOT/.llmtmp/ralph-prompt-insession.md
$PROJECT_ROOT/.llmtmp/ralph-prompt-external.md
$PROJECT_ROOT/.beads/ (directory; bd-aware probe via bd list >/dev/null 2>&1)

Abort if any are missing.

Step 2: Validate the bd Graph

Run these checks before launching opencode. Failures here are HIGH-severity findings and should abort or be flagged loudly.

The bd graph is organized under a per-run epic. The slug is derived from the current branch; the epic is found via the ralph:<slug> label. All scoped checks filter by --parent ${EPIC}.

cd "$PROJECT_ROOT"

SLUG=$(git branch --show-current | tr '/' '-')

# 1. Single open epic for this run
EPIC_COUNT=$(bd list -t epic -l "ralph:${SLUG}" --json 2>/dev/null | jq '[.[] | select(.status=="open")] | length')
if [[ "$EPIC_COUNT" -ne 1 ]]; then
  echo "HIGH: expected exactly one open epic with label ralph:${SLUG}, found ${EPIC_COUNT}"
fi
EPIC=$(bd list -t epic -l "ralph:${SLUG}" --json 2>/dev/null | jq -r '[.[] | select(.status=="open")] | .[0].id // empty')

# 2. Cycles
if ! bd dep cycles >/dev/null 2>&1; then
  echo "HIGH: bd graph has dependency cycles"
  bd dep cycles
fi

# 3. Minimum child count under the run epic
COUNT=$(bd list --parent "$EPIC" --json 2>/dev/null | jq 'length')
if [[ "$COUNT" -eq 0 ]]; then
  echo "HIGH: run epic ${EPIC} has no child tasks"
fi

# 4. At least one ready task under the run epic
READY_COUNT=$(bd ready --parent "$EPIC" --json 2>/dev/null | jq 'length')
if [[ "$READY_COUNT" -eq 0 && "$COUNT" -gt 0 ]]; then
  echo "HIGH: run epic ${EPIC} has tasks but no ready tasks (deadlock or all closed unexpectedly)"
fi

# 5. Isolated nodes within the run epic
if [[ "$COUNT" -gt 1 ]]; then
  bd list --parent "$EPIC" --json 2>/dev/null \
    | jq -r '.[] | select(.dependency_count == 0 and .dependent_count == 0) | "MEDIUM: isolated task " + .id + " " + .title'
fi

# 6. Description coverage: every child carries the worker's instructions
bd list --parent "$EPIC" --json 2>/dev/null \
  | jq -r '.[] | select(((.description // "") | length) == 0) | "HIGH: task " + .id + " has empty description: " + .title'

# 7. Parent set equals label set (defense-in-depth orphan detection)
PARENT_IDS=$(bd list --parent "$EPIC" --json 2>/dev/null | jq -r '.[].id' | sort)
LABEL_IDS=$(bd list --label "ralph:${SLUG}" --json 2>/dev/null | jq -r '.[] | select(.issue_type != "epic") | .id' | sort)
if [[ "$PARENT_IDS" != "$LABEL_IDS" ]]; then
  echo "HIGH: parent set (--parent ${EPIC}) and label set (--label ralph:${SLUG} excluding epic) differ"
  echo "Parent-only (label missing): $(comm -23 <(echo "$PARENT_IDS") <(echo "$LABEL_IDS"))"
  echo "Label-only (orphan, parent missing): $(comm -13 <(echo "$PARENT_IDS") <(echo "$LABEL_IDS"))"
fi

Persist these findings; the reviewers reference them in their final reports.

Step 3: Clean Up Previous Reviews

rm -f "$PROJECT_ROOT"/.llmtmp/ralph-review-openai.md \
      "$PROJECT_ROOT"/.llmtmp/ralph-review-gemini.md \
      "$PROJECT_ROOT"/.llmtmp/ralph-review-claude.md

Step 4: Read Files and Build Prompt Context

Concatenate the contents of these files into a single CONTEXT string:

.llmtmp/ralph-plan.md
.llmtmp/ralph-prompt-insession.md
.llmtmp/ralph-prompt-external.md
The slug and epic ID resolved in Step 2 (literal label SLUG=... and EPIC=...)
Output of bd list --parent "$EPIC" --json
Output of bd ready --parent "$EPIC" --json
Output of bd list --label "ralph:${SLUG}" --json
The bd graph validation findings from Step 2
Plan documents listed in the plan's plan documents: line. Parse the line, split on commas, trim whitespace, read each path. Wrap each as === plan-documents/<path> === so the reviewer can distinguish them from baseline. Empty value means no plan documents to load.
CLAUDE.md and README.md from project root (if they exist)
All .md files in .llmdocs/

Wrap each as:

=== <relative-path or label> ===
<contents>

Step 5: Construct the Review Prompt

Build the prompt that each opencode process executes. Replace <PROJECT_ROOT> and <CONTEXT>:

PROMPT="You are a plan reviewer running headless in a non-interactive session. There is no user present. Do not ask questions. Do not prompt for confirmation. Do not stop to wait for input.

YOUR PRIMARY OBJECTIVE IS TO WRITE A FILE. Everything else is preparation. After completing your review, your VERY NEXT action must be a tool call that writes the output file. Do not summarize, do not reflect, do not plan — write the file immediately.

OUTPUT RULES: You are running headless. Nobody reads your text responses. ALL review content goes exclusively into the output file via tool calls. Any text you generate outside of a tool call is wasted tokens and risks hitting the output token limit before you can write the file. Keep text responses to one short sentence at most.

PROJECT_ROOT: <PROJECT_ROOT>

## Context

<CONTEXT>

## How Ralph Loops Work

A ralph loop is an autonomous, iterative execution mechanism. Two execution modes share the same artifacts:

- **In-session (ralph-loop plugin)**: a Stop hook re-feeds the prompt to the same Claude session. Cheap on short runs; expensive on long runs because context grows quadratically.
- **External (scripts/ralph.sh)**: a bash while-loop spawns a fresh `claude --print` per iteration. Higher fixed overhead per iteration; stays in the cheap part of the cost curve indefinitely.

Authority sources in CONTEXT:

The CONTEXT contains two distinct authority categories. Treat them differently.

- **Plan documents** (under `=== plan-documents/...` blocks): forward-looking intent the planner consulted, e.g., a PRD, a Jira ticket, a design note. The plan must faithfully render these. Flag the plan if it omits, contradicts, or stale-references a requirement from a plan document.
- **Baseline docs** (`CLAUDE.md`, `README.md`, `.llmdocs/*`): the system's current state. The plan's job is to change this state. Do NOT flag the plan for deviating from baseline. Flag only if the plan changes baseline without including a corresponding docs-update bead (typically delegated to `/docs`).

Mechanics that apply to both modes:

- The bd graph is the state machine. The graph is shared across worktrees of the same repo. Each ralph run is organized under a per-run epic so concurrent runs in different worktrees stay isolated by `--parent` filtering. `bd ready --parent <epic> --json` finds the next unblocked task within this run.
- The run epic carries the label `ralph:<slug>` where `<slug>` is derived from the current branch. Every child task carries both `--parent <epic>` and the same `ralph:<slug>` label. The two signals are independent so the reviewer can detect orphans.
- The plan's "Run Identity" section records the literal slug and epic ID. The agent reads them directly from the plan each iteration, no derivation.
- One task per iteration. The agent reads `bd ready --parent <epic> --json`, takes the first result, runs the per-task workflow, then closes the task.
- When the ready queue is empty, the agent closes the epic and emits the sentinel as the terminal action. The epic is not closed by `bd` automatically.
- Resumable. A partially-completed graph resumes cleanly. Partial completion is normal; iterations are cheap.
- Every action is a bd task, including ralph lifecycle scaffolding. The branch creation, BEGIN tag, code work, docs, and END tag are all first-class beads, all parented to the run epic.
- Each bead carries its full per-task instructions in its description: files to touch, expected behavior, verification command, commit message. The worker reads `bd show <id>` and executes the description exactly.
- Run-wide conventions (commit style, language, build command, test command) live in `ralph-plan.md`, not in every description.
- Environment is a precondition. Credentials, network, toolchain, and runtime dependencies are the operator's responsibility. Do not flag missing environment setup as a gap.

Do NOT recommend: collapsing tasks into bundles, adding environment preflight checks, treating partial completion as a risk, modifying the prompt body, replacing the bd graph with a markdown checklist, or reintroducing a workflow-label taxonomy. The `ralph:<slug>` label is run-identity for partition and orphan detection, not workflow dispatch; do not flag it as a label taxonomy.

Flag as HIGH severity if: zero or more than one open epic exists for `ralph:<slug>`; any open child has an empty description; the bd graph contains cycles; the run epic has no children; the run epic has children but no ready tasks; the parent set under the epic differs from the label set for `ralph:<slug>` (orphans or planning typos); the per-task workflow does not explicitly stop after `bd close` (the agent will do multiple beads in one context, defeating the Ralph design); the per-task workflow does not close the epic before emitting the sentinel; the plan lacks a "Run Identity" section recording the literal slug and epic ID; the plan lacks an "Inputs" section with a `plan documents:` line; the plan omits or contradicts a material requirement explicit in any plan document; the plan's Approach prose declares an ordering ("X first", "after Y") that the bd dependency edges do not enforce. Flag as MEDIUM if: a description omits the verification command or commit message; an isolated task exists in a multi-task graph; the plan duplicates information that already lives in bead descriptions; a bead that touches README.md/CLAUDE.md/.llmdocs/ authors edits inline instead of running `/docs`.

## Your Task

Review the Ralph artifacts:

- ralph-plan.md (mode-neutral execution plan)
- ralph-prompt-insession.md (slash-command wrapper for the in-session driver)
- ralph-prompt-external.md (body for scripts/ralph.sh)
- The bd graph (issues, dependencies, ready/closed state)

Evaluate across these areas:

### 1. Goal Clarity

- Is the objective well-defined and unambiguous?
- Would a fresh-context iteration understand what success looks like from the plan alone?

### 2. Task Decomposition

- Are tasks the right granularity (one focused deliverable each)?
- Is each task independently completable?
- Are there missing tasks needed to achieve the goal?
- Are there unnecessary or redundant tasks?
- Do titles name deliverables rather than techniques?
- Do descriptions carry enough detail (files, verification, commit message) for a cold-start worker?

### 3. Sequencing and Dependencies

- Are dependencies correct in the bd graph?
- Would executing in `bd ready --json` order produce correct results?
- Are there implicit dependencies that should be explicit edges?
- Are there cycles or isolated nodes?

### 4. Plan Completeness

- Do the instructions cover repo-specific conventions, tools, and commands?
- Are build, test, and lint commands accurate?
- Is the per-task workflow clear (read description, execute it, close)?
- Does the plan have a "Run Identity" section that records the literal branch slug, the literal epic ID, and the epic label `ralph:<slug>`?
- Does the Per-Task Workflow read the epic ID from the "Run Identity" section and scope `bd ready` with `--parent <epic-id>`?
- Does the Per-Task Workflow close the epic via `bd close <epic-id>` before emitting the sentinel when the ready queue is empty?
- Does the Per-Task Workflow explicitly say STOP after `bd close` so the agent does not loop within a single context?
- Does the Git Workflow section require a `ralph/` feature branch and BEGIN/END tags?
- Is the Per-Task Workflow explicit that `bd close` for the current task runs BEFORE the sentinel?
- Do beads that touch `README.md`, `CLAUDE.md`, or `.llmdocs/` invoke the `/docs` skill rather than authoring edits inline?

### 5. Risk and Gaps

- What could go wrong during autonomous execution?
- Are there tasks that require human judgment or external access?
- Could any task leave the repo in a broken state?

### 6. Feasibility

- Can a single iteration realistically complete each task?
- Are there tasks that exceed what a fresh context can do autonomously?

For each area, list specific findings with severity (high/medium/low) and actionable recommendations.

## Output

Your VERY NEXT action after the review must be a tool call that writes the file. No intermediate steps.

Determine your model label:

- Claude variants: claude
- GPT variants: openai
- Gemini variants: gemini

Write to `<PROJECT_ROOT>/.llmtmp/ralph-review-$MODEL_LABEL.md` using this template:

\`\`\`markdown
# Ralph Loop Review
**Model**: <MODEL_LABEL>

## Goal Clarity

- **[medium]** Short title. Explanation...

## Task Decomposition

- **[high]** Short title. Explanation...

## Sequencing and Dependencies

- **[medium]** Short title. Explanation...

## Plan Completeness

- **[low]** Short title. Explanation...

## Risk and Gaps

- **[high]** Short title. Explanation...

## Feasibility

- **[medium]** Short title. Explanation...

## Summary

<overall assessment and top 3 recommendations>
\`\`\`

Finding format: \`- **[severity]** Title. Description.\`

Every section must be present. If no findings for an area, write 'No findings.' under its heading.

Then verify: \`ls -la '<PROJECT_ROOT>/.llmtmp/ralph-review-<MODEL_LABEL>.md'\`

If the file does not exist, write it again. Do not exit without the file."

Step 6: Write Prompt to File

Avoid shell interpolation issues with large prompts:

STATE_DIR="$PROJECT_ROOT/.llmtmp/ralph_review_state"
mkdir -p "$STATE_DIR"
OPENAI_DIR=$(mktemp -d)
GEMINI_DIR=$(mktemp -d)
CLAUDE_DIR=$(mktemp -d)
cat > /tmp/ralph-review-prompt.txt <<'PROMPT_EOF'
<the prompt from Step 5>
PROMPT_EOF

Step 7: Launch 3 Separate Background Bash Tasks

Each opencode process must run as its own background Bash task. Do NOT chain them in a single shell with &; child processes get killed when the parent exits.

OpenAI:

STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m openai/gpt-5.3-codex \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<OPENAI_DIR>" \
  --title "Ralph Review - OpenAI" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/openai.ndjson" 2>"$STATE_DIR/openai.log"

Gemini:

STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m google/gemini-3.1-pro-preview \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<GEMINI_DIR>" \
  --title "Ralph Review - Gemini" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/gemini.ndjson" 2>"$STATE_DIR/gemini.log"

Claude:

STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m az-anthropic/claude-opus-4-6 \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<CLAUDE_DIR>" \
  --title "Ralph Review - Claude" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/claude.ndjson" 2>"$STATE_DIR/claude.log"

All 3 launch as separate background tasks. Do NOT use & in a single shell.

Output Streams

Each opencode process produces two output files:

<label>.ndjson (stdout): Structured NDJSON events from --format json. Use for programmatic progress tracking.
<label>.log (stderr): Plain-text info-level logs. Use for diagnosing startup failures, permission issues, MCP server errors, plugin loading.

Log lines are structured text, one per line:

INFO  2026-03-13T00:54:25 +4ms service=default directory=/private/tmp creating instance

Step 8: Monitor and Wait

Poll each background task until all 3 complete.

NDJSON progress (replace $NDJSON with the actual path):

# Count completed steps
grep -c '"type":"step_finish"' "$NDJSON" 2>/dev/null || echo 0

# Check if done (last step_finish has reason "stop")
tail -1 "$NDJSON" 2>/dev/null | jq -r '.part.reason // empty'

# Check for errors
grep '"type":"error"' "$NDJSON" 2>/dev/null

Text logs (replace $LOGFILE with the actual path):

# Errors or warnings
grep -E "^(ERROR|WARN)" "$LOGFILE" 2>/dev/null

# Recent activity
tail -5 "$LOGFILE" 2>/dev/null

Step 9: Report Results

For each model, check whether .llmtmp/ralph-review-<label>.md exists.

Report per model:

Success or failure (file present or not)
Total cost: sum of cost from all step_finish events
Whether any errors occurred

Surface the bd graph validation findings from Step 2 separately at the top of the report.

Step 10: Cleanup

The state directory is preserved for post-run inspection (per-model NDJSON event streams and text logs are useful for diagnosing review failures, calibration regressions, and cost trends).

rm -f /tmp/ralph-review-prompt.txt

$STATE_DIR (.llmtmp/ralph_review_state/) is intentionally NOT removed. Each subsequent reviewer run overwrites the per-model files in place; if historical retention is needed, archive the directory before re-running.

Expected Output Files

3 files total, one per model:

.llmtmp/ralph-review-openai.md
.llmtmp/ralph-review-gemini.md
.llmtmp/ralph-review-claude.md

NDJSON Log Format Reference

Each opencode process writes NDJSON to $STATE_DIR/<label>.ndjson. One JSON object per line. Skip lines that fail to parse (partial writes).

Key event types:

step_start - New LLM turn begins.
text - Model emitted text: {"type":"text","part":{"text":"..."}}
tool_use - Tool call: {"type":"tool_use","part":{"tool":"bash","state":{"status":"completed","metadata":{"exit":0}}}}
step_finish - Turn completed: {"type":"step_finish","part":{"reason":"stop","cost":0,"tokens":{"total":13494}}}. reason: "stop" = done, reason: "tool-calls" = continuing.
error - Session error: {"type":"error","error":{"data":{"message":"..."}}}

Useful jq queries (replace $NDJSON with the actual path):

NDJSON="<PROJECT_ROOT>/.llmtmp/ralph_review_state/openai.ndjson"
# Is done?
tail -1 "$NDJSON" | jq -r 'select(.type=="step_finish") | .part.reason'
# Total cost
jq -s '[.[] | select(.type=="step_finish") | .part.cost] | add' "$NDJSON"
# All errors
jq -r 'select(.type=="error") | .error.data.message' "$NDJSON"

Replace $LOGFILE with the text log path:

grep -E "^(ERROR|WARN)" "$LOGFILE"

Rules

Claude Code is ONLY a launcher. All review work happens inside the opencode processes.
Do NOT perform any review analysis directly. The opencode processes handle reviewing.
Do NOT ask questions during execution. This is non-interactive.
Launch all 3 processes in parallel. Do not wait for one to finish before starting another.
Use plain message invocation, not --command. The --command flag has a known issue with the c7 MCP server.
Preserve the state directory after reporting results. NDJSON and log files stay for post-run inspection.
If a model fails, still wait for and report the others.
The bd graph validation in Step 2 runs in Claude Code, not in opencode. Findings are passed to opencode reviewers as context.

STARTER_CHARACTER = 🔬

Ralph Review (Deep)

For a faster single-model review using Claude opus as an internal subagent, use ralph-review instead. This deep variant trades wall time and cost for cross-model consensus.

Models

| Label | Model ID | | ------ | ----------------------------- | | openai | openai/gpt-5.3-codex | | gemini | google/gemini-3.1-pro-preview | | claude | az-anthropic/claude-opus-4-6 |

Procedure

Step 1: Validate Artifacts Exist

PROJECT_ROOT=$(git rev-parse --show-toplevel)

Verify all of these exist:

$PROJECT_ROOT/.llmtmp/ralph-plan.md
$PROJECT_ROOT/.llmtmp/ralph-prompt-insession.md
$PROJECT_ROOT/.llmtmp/ralph-prompt-external.md
$PROJECT_ROOT/.beads/ (directory; bd-aware probe via bd list >/dev/null 2>&1)

Abort if any are missing.

Step 2: Validate the bd Graph

Run these checks before launching opencode. Failures here are HIGH-severity findings and should abort or be flagged loudly.

The bd graph is organized under a per-run epic. The slug is derived from the current branch; the epic is found via the ralph:<slug> label. All scoped checks filter by --parent ${EPIC}.

cd "$PROJECT_ROOT"

SLUG=$(git branch --show-current | tr '/' '-')

# 1. Single open epic for this run
EPIC_COUNT=$(bd list -t epic -l "ralph:${SLUG}" --json 2>/dev/null | jq '[.[] | select(.status=="open")] | length')
if [[ "$EPIC_COUNT" -ne 1 ]]; then
  echo "HIGH: expected exactly one open epic with label ralph:${SLUG}, found ${EPIC_COUNT}"
fi
EPIC=$(bd list -t epic -l "ralph:${SLUG}" --json 2>/dev/null | jq -r '[.[] | select(.status=="open")] | .[0].id // empty')

# 2. Cycles
if ! bd dep cycles >/dev/null 2>&1; then
  echo "HIGH: bd graph has dependency cycles"
  bd dep cycles
fi

# 3. Minimum child count under the run epic
COUNT=$(bd list --parent "$EPIC" --json 2>/dev/null | jq 'length')
if [[ "$COUNT" -eq 0 ]]; then
  echo "HIGH: run epic ${EPIC} has no child tasks"
fi

# 4. At least one ready task under the run epic
READY_COUNT=$(bd ready --parent "$EPIC" --json 2>/dev/null | jq 'length')
if [[ "$READY_COUNT" -eq 0 && "$COUNT" -gt 0 ]]; then
  echo "HIGH: run epic ${EPIC} has tasks but no ready tasks (deadlock or all closed unexpectedly)"
fi

# 5. Isolated nodes within the run epic
if [[ "$COUNT" -gt 1 ]]; then
  bd list --parent "$EPIC" --json 2>/dev/null \
    | jq -r '.[] | select(.dependency_count == 0 and .dependent_count == 0) | "MEDIUM: isolated task " + .id + " " + .title'
fi

# 6. Description coverage: every child carries the worker's instructions
bd list --parent "$EPIC" --json 2>/dev/null \
  | jq -r '.[] | select(((.description // "") | length) == 0) | "HIGH: task " + .id + " has empty description: " + .title'

# 7. Parent set equals label set (defense-in-depth orphan detection)
PARENT_IDS=$(bd list --parent "$EPIC" --json 2>/dev/null | jq -r '.[].id' | sort)
LABEL_IDS=$(bd list --label "ralph:${SLUG}" --json 2>/dev/null | jq -r '.[] | select(.issue_type != "epic") | .id' | sort)
if [[ "$PARENT_IDS" != "$LABEL_IDS" ]]; then
  echo "HIGH: parent set (--parent ${EPIC}) and label set (--label ralph:${SLUG} excluding epic) differ"
  echo "Parent-only (label missing): $(comm -23 <(echo "$PARENT_IDS") <(echo "$LABEL_IDS"))"
  echo "Label-only (orphan, parent missing): $(comm -13 <(echo "$PARENT_IDS") <(echo "$LABEL_IDS"))"
fi

Persist these findings; the reviewers reference them in their final reports.

Step 3: Clean Up Previous Reviews

rm -f "$PROJECT_ROOT"/.llmtmp/ralph-review-openai.md \
      "$PROJECT_ROOT"/.llmtmp/ralph-review-gemini.md \
      "$PROJECT_ROOT"/.llmtmp/ralph-review-claude.md

Step 4: Read Files and Build Prompt Context

Concatenate the contents of these files into a single CONTEXT string:

.llmtmp/ralph-plan.md
.llmtmp/ralph-prompt-insession.md
.llmtmp/ralph-prompt-external.md
The slug and epic ID resolved in Step 2 (literal label SLUG=... and EPIC=...)
Output of bd list --parent "$EPIC" --json
Output of bd ready --parent "$EPIC" --json
Output of bd list --label "ralph:${SLUG}" --json
The bd graph validation findings from Step 2
Plan documents listed in the plan's plan documents: line. Parse the line, split on commas, trim whitespace, read each path. Wrap each as === plan-documents/<path> === so the reviewer can distinguish them from baseline. Empty value means no plan documents to load.
CLAUDE.md and README.md from project root (if they exist)
All .md files in .llmdocs/

Wrap each as:

=== <relative-path or label> ===
<contents>

Step 5: Construct the Review Prompt

Build the prompt that each opencode process executes. Replace <PROJECT_ROOT> and <CONTEXT>:

PROMPT="You are a plan reviewer running headless in a non-interactive session. There is no user present. Do not ask questions. Do not prompt for confirmation. Do not stop to wait for input.

YOUR PRIMARY OBJECTIVE IS TO WRITE A FILE. Everything else is preparation. After completing your review, your VERY NEXT action must be a tool call that writes the output file. Do not summarize, do not reflect, do not plan — write the file immediately.

OUTPUT RULES: You are running headless. Nobody reads your text responses. ALL review content goes exclusively into the output file via tool calls. Any text you generate outside of a tool call is wasted tokens and risks hitting the output token limit before you can write the file. Keep text responses to one short sentence at most.

PROJECT_ROOT: <PROJECT_ROOT>

## Context

<CONTEXT>

## How Ralph Loops Work

A ralph loop is an autonomous, iterative execution mechanism. Two execution modes share the same artifacts:

- **In-session (ralph-loop plugin)**: a Stop hook re-feeds the prompt to the same Claude session. Cheap on short runs; expensive on long runs because context grows quadratically.
- **External (scripts/ralph.sh)**: a bash while-loop spawns a fresh `claude --print` per iteration. Higher fixed overhead per iteration; stays in the cheap part of the cost curve indefinitely.

Authority sources in CONTEXT:

The CONTEXT contains two distinct authority categories. Treat them differently.

- **Plan documents** (under `=== plan-documents/...` blocks): forward-looking intent the planner consulted, e.g., a PRD, a Jira ticket, a design note. The plan must faithfully render these. Flag the plan if it omits, contradicts, or stale-references a requirement from a plan document.
- **Baseline docs** (`CLAUDE.md`, `README.md`, `.llmdocs/*`): the system's current state. The plan's job is to change this state. Do NOT flag the plan for deviating from baseline. Flag only if the plan changes baseline without including a corresponding docs-update bead (typically delegated to `/docs`).

Mechanics that apply to both modes:

- The bd graph is the state machine. The graph is shared across worktrees of the same repo. Each ralph run is organized under a per-run epic so concurrent runs in different worktrees stay isolated by `--parent` filtering. `bd ready --parent <epic> --json` finds the next unblocked task within this run.
- The run epic carries the label `ralph:<slug>` where `<slug>` is derived from the current branch. Every child task carries both `--parent <epic>` and the same `ralph:<slug>` label. The two signals are independent so the reviewer can detect orphans.
- The plan's "Run Identity" section records the literal slug and epic ID. The agent reads them directly from the plan each iteration, no derivation.
- One task per iteration. The agent reads `bd ready --parent <epic> --json`, takes the first result, runs the per-task workflow, then closes the task.
- When the ready queue is empty, the agent closes the epic and emits the sentinel as the terminal action. The epic is not closed by `bd` automatically.
- Resumable. A partially-completed graph resumes cleanly. Partial completion is normal; iterations are cheap.
- Every action is a bd task, including ralph lifecycle scaffolding. The branch creation, BEGIN tag, code work, docs, and END tag are all first-class beads, all parented to the run epic.
- Each bead carries its full per-task instructions in its description: files to touch, expected behavior, verification command, commit message. The worker reads `bd show <id>` and executes the description exactly.
- Run-wide conventions (commit style, language, build command, test command) live in `ralph-plan.md`, not in every description.
- Environment is a precondition. Credentials, network, toolchain, and runtime dependencies are the operator's responsibility. Do not flag missing environment setup as a gap.

Do NOT recommend: collapsing tasks into bundles, adding environment preflight checks, treating partial completion as a risk, modifying the prompt body, replacing the bd graph with a markdown checklist, or reintroducing a workflow-label taxonomy. The `ralph:<slug>` label is run-identity for partition and orphan detection, not workflow dispatch; do not flag it as a label taxonomy.

Flag as HIGH severity if: zero or more than one open epic exists for `ralph:<slug>`; any open child has an empty description; the bd graph contains cycles; the run epic has no children; the run epic has children but no ready tasks; the parent set under the epic differs from the label set for `ralph:<slug>` (orphans or planning typos); the per-task workflow does not explicitly stop after `bd close` (the agent will do multiple beads in one context, defeating the Ralph design); the per-task workflow does not close the epic before emitting the sentinel; the plan lacks a "Run Identity" section recording the literal slug and epic ID; the plan lacks an "Inputs" section with a `plan documents:` line; the plan omits or contradicts a material requirement explicit in any plan document; the plan's Approach prose declares an ordering ("X first", "after Y") that the bd dependency edges do not enforce. Flag as MEDIUM if: a description omits the verification command or commit message; an isolated task exists in a multi-task graph; the plan duplicates information that already lives in bead descriptions; a bead that touches README.md/CLAUDE.md/.llmdocs/ authors edits inline instead of running `/docs`.

## Your Task

Review the Ralph artifacts:

- ralph-plan.md (mode-neutral execution plan)
- ralph-prompt-insession.md (slash-command wrapper for the in-session driver)
- ralph-prompt-external.md (body for scripts/ralph.sh)
- The bd graph (issues, dependencies, ready/closed state)

Evaluate across these areas:

### 1. Goal Clarity

- Is the objective well-defined and unambiguous?
- Would a fresh-context iteration understand what success looks like from the plan alone?

### 2. Task Decomposition

- Are tasks the right granularity (one focused deliverable each)?
- Is each task independently completable?
- Are there missing tasks needed to achieve the goal?
- Are there unnecessary or redundant tasks?
- Do titles name deliverables rather than techniques?
- Do descriptions carry enough detail (files, verification, commit message) for a cold-start worker?

### 3. Sequencing and Dependencies

- Are dependencies correct in the bd graph?
- Would executing in `bd ready --json` order produce correct results?
- Are there implicit dependencies that should be explicit edges?
- Are there cycles or isolated nodes?

### 4. Plan Completeness

- Do the instructions cover repo-specific conventions, tools, and commands?
- Are build, test, and lint commands accurate?
- Is the per-task workflow clear (read description, execute it, close)?
- Does the plan have a "Run Identity" section that records the literal branch slug, the literal epic ID, and the epic label `ralph:<slug>`?
- Does the Per-Task Workflow read the epic ID from the "Run Identity" section and scope `bd ready` with `--parent <epic-id>`?
- Does the Per-Task Workflow close the epic via `bd close <epic-id>` before emitting the sentinel when the ready queue is empty?
- Does the Per-Task Workflow explicitly say STOP after `bd close` so the agent does not loop within a single context?
- Does the Git Workflow section require a `ralph/` feature branch and BEGIN/END tags?
- Is the Per-Task Workflow explicit that `bd close` for the current task runs BEFORE the sentinel?
- Do beads that touch `README.md`, `CLAUDE.md`, or `.llmdocs/` invoke the `/docs` skill rather than authoring edits inline?

### 5. Risk and Gaps

- What could go wrong during autonomous execution?
- Are there tasks that require human judgment or external access?
- Could any task leave the repo in a broken state?

### 6. Feasibility

- Can a single iteration realistically complete each task?
- Are there tasks that exceed what a fresh context can do autonomously?

For each area, list specific findings with severity (high/medium/low) and actionable recommendations.

## Output

Your VERY NEXT action after the review must be a tool call that writes the file. No intermediate steps.

Determine your model label:

- Claude variants: claude
- GPT variants: openai
- Gemini variants: gemini

Write to `<PROJECT_ROOT>/.llmtmp/ralph-review-$MODEL_LABEL.md` using this template:

\`\`\`markdown
# Ralph Loop Review
**Model**: <MODEL_LABEL>

## Goal Clarity

- **[medium]** Short title. Explanation...

## Task Decomposition

- **[high]** Short title. Explanation...

## Sequencing and Dependencies

- **[medium]** Short title. Explanation...

## Plan Completeness

- **[low]** Short title. Explanation...

## Risk and Gaps

- **[high]** Short title. Explanation...

## Feasibility

- **[medium]** Short title. Explanation...

## Summary

<overall assessment and top 3 recommendations>
\`\`\`

Finding format: \`- **[severity]** Title. Description.\`

Every section must be present. If no findings for an area, write 'No findings.' under its heading.

Then verify: \`ls -la '<PROJECT_ROOT>/.llmtmp/ralph-review-<MODEL_LABEL>.md'\`

If the file does not exist, write it again. Do not exit without the file."

Step 6: Write Prompt to File

Avoid shell interpolation issues with large prompts:

STATE_DIR="$PROJECT_ROOT/.llmtmp/ralph_review_state"
mkdir -p "$STATE_DIR"
OPENAI_DIR=$(mktemp -d)
GEMINI_DIR=$(mktemp -d)
CLAUDE_DIR=$(mktemp -d)
cat > /tmp/ralph-review-prompt.txt <<'PROMPT_EOF'
<the prompt from Step 5>
PROMPT_EOF

Step 7: Launch 3 Separate Background Bash Tasks

Each opencode process must run as its own background Bash task. Do NOT chain them in a single shell with &; child processes get killed when the parent exits.

OpenAI:

STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m openai/gpt-5.3-codex \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<OPENAI_DIR>" \
  --title "Ralph Review - OpenAI" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/openai.ndjson" 2>"$STATE_DIR/openai.log"

Gemini:

STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m google/gemini-3.1-pro-preview \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<GEMINI_DIR>" \
  --title "Ralph Review - Gemini" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/gemini.ndjson" 2>"$STATE_DIR/gemini.log"

Claude:

STATE_DIR="<PROJECT_ROOT>/.llmtmp/ralph_review_state" && \
opencode run \
  -m az-anthropic/claude-opus-4-6 \
  --format json \
  --print-logs \
  --log-level INFO \
  --dir "<CLAUDE_DIR>" \
  --title "Ralph Review - Claude" \
  "$(cat /tmp/ralph-review-prompt.txt)" \
  > "$STATE_DIR/claude.ndjson" 2>"$STATE_DIR/claude.log"

All 3 launch as separate background tasks. Do NOT use & in a single shell.

Output Streams

Each opencode process produces two output files:

<label>.ndjson (stdout): Structured NDJSON events from --format json. Use for programmatic progress tracking.
<label>.log (stderr): Plain-text info-level logs. Use for diagnosing startup failures, permission issues, MCP server errors, plugin loading.

Log lines are structured text, one per line:

INFO  2026-03-13T00:54:25 +4ms service=default directory=/private/tmp creating instance

Step 8: Monitor and Wait

Poll each background task until all 3 complete.

NDJSON progress (replace $NDJSON with the actual path):

# Count completed steps
grep -c '"type":"step_finish"' "$NDJSON" 2>/dev/null || echo 0

# Check if done (last step_finish has reason "stop")
tail -1 "$NDJSON" 2>/dev/null | jq -r '.part.reason // empty'

# Check for errors
grep '"type":"error"' "$NDJSON" 2>/dev/null

Text logs (replace $LOGFILE with the actual path):

# Errors or warnings
grep -E "^(ERROR|WARN)" "$LOGFILE" 2>/dev/null

# Recent activity
tail -5 "$LOGFILE" 2>/dev/null

Step 9: Report Results

For each model, check whether .llmtmp/ralph-review-<label>.md exists.

Report per model:

Success or failure (file present or not)
Total cost: sum of cost from all step_finish events
Whether any errors occurred

Surface the bd graph validation findings from Step 2 separately at the top of the report.

Step 10: Cleanup

The state directory is preserved for post-run inspection (per-model NDJSON event streams and text logs are useful for diagnosing review failures, calibration regressions, and cost trends).

rm -f /tmp/ralph-review-prompt.txt

Expected Output Files

3 files total, one per model:

.llmtmp/ralph-review-openai.md
.llmtmp/ralph-review-gemini.md
.llmtmp/ralph-review-claude.md

NDJSON Log Format Reference

Each opencode process writes NDJSON to $STATE_DIR/<label>.ndjson. One JSON object per line. Skip lines that fail to parse (partial writes).

Key event types:

step_start - New LLM turn begins.
text - Model emitted text: {"type":"text","part":{"text":"..."}}
tool_use - Tool call: {"type":"tool_use","part":{"tool":"bash","state":{"status":"completed","metadata":{"exit":0}}}}
step_finish - Turn completed: {"type":"step_finish","part":{"reason":"stop","cost":0,"tokens":{"total":13494}}}. reason: "stop" = done, reason: "tool-calls" = continuing.
error - Session error: {"type":"error","error":{"data":{"message":"..."}}}

Useful jq queries (replace $NDJSON with the actual path):

NDJSON="<PROJECT_ROOT>/.llmtmp/ralph_review_state/openai.ndjson"
# Is done?
tail -1 "$NDJSON" | jq -r 'select(.type=="step_finish") | .part.reason'
# Total cost
jq -s '[.[] | select(.type=="step_finish") | .part.cost] | add' "$NDJSON"
# All errors
jq -r 'select(.type=="error") | .error.data.message' "$NDJSON"

Replace $LOGFILE with the text log path:

grep -E "^(ERROR|WARN)" "$LOGFILE"

Rules

Claude Code is ONLY a launcher. All review work happens inside the opencode processes.
Do NOT perform any review analysis directly. The opencode processes handle reviewing.
Do NOT ask questions during execution. This is non-interactive.
Launch all 3 processes in parallel. Do not wait for one to finish before starting another.
Use plain message invocation, not --command. The --command flag has a known issue with the c7 MCP server.
Preserve the state directory after reporting results. NDJSON and log files stay for post-run inspection.
If a model fails, still wait for and report the others.
The bd graph validation in Step 2 runs in Claude Code, not in opencode. Findings are passed to opencode reviewers as context.

Adoption

jimweller/ralph-review-deep

$ install --global

Security Scan Results

SKILL.md

Ralph Review (Deep)

Models

Procedure

Step 1: Validate Artifacts Exist

Step 2: Validate the bd Graph

Step 3: Clean Up Previous Reviews

Step 4: Read Files and Build Prompt Context

Step 5: Construct the Review Prompt

Step 6: Write Prompt to File

Step 7: Launch 3 Separate Background Bash Tasks

Output Streams

Step 8: Monitor and Wait

Step 9: Report Results

Step 10: Cleanup

Expected Output Files

NDJSON Log Format Reference

Rules

Related Skills

jimweller/transcript-search

jimweller/prd-review

jimweller/md-syntax

jimweller/configs/claude-code/skills/md-style

jimweller/ralph-review-deep

$ install --global

Security Scan Results

SKILL.md

Ralph Review (Deep)

Models

Procedure

Step 1: Validate Artifacts Exist

Step 2: Validate the bd Graph

Step 3: Clean Up Previous Reviews

Step 4: Read Files and Build Prompt Context

Step 5: Construct the Review Prompt

Step 6: Write Prompt to File

Step 7: Launch 3 Separate Background Bash Tasks

Output Streams

Step 8: Monitor and Wait

Step 9: Report Results

Step 10: Cleanup

Expected Output Files

NDJSON Log Format Reference

Rules

Related Skills

jimweller/transcript-search

jimweller/prd-review

jimweller/md-syntax

jimweller/configs/claude-code/skills/md-style