plugins/dev/skills/orchestrate/SKILL.md
Coordinate multiple tickets in parallel across worktrees with wave-based execution, worker dispatch, and adversarial verification
Coordinate multiple Linear tickets in parallel across git worktrees. The orchestrator creates
worktrees, dispatches /oneshot workers, tracks progress via a dashboard, and enforces quality
gates through adversarial verification. The orchestrator NEVER writes application code — it only
coordinates, monitors, and verifies.
Two dispatch modes. With the default
catalyst.orchestration.dispatchMode: "oneshot-legacy", each ticket gets one long claude -p /catalyst-dev:oneshot worker that runs the full lifecycle. With dispatchMode: "phase-agents", the orchestrator dispatches nine short-lived claude --bg phase skills (phase-triage … phase-monitor-deploy) per ticket, advancing on phase.<name>.complete.<ticket> broker events. The phase-agents mode is the post-2026-06-15 path that keeps worker dispatch on the subscription pool. See Phase agents for the pipeline, model assignment, cost economics, and the end-to-end runbook.
# 1. Git (REQUIRED)
if ! command -v git &>/dev/null; then
  echo "ERROR: Git is required"
  exit 1
fi
# 2. Linearis CLI (REQUIRED for ticket reading)
# See /catalyst-dev:linearis for CLI syntax reference
if ! command -v linearis &>/dev/null; then
  echo "ERROR: Linearis CLI required for ticket intake"
  echo "Install: npm install -g linearis"
  exit 1
fi
# 3. GitHub CLI (REQUIRED for PR monitoring)
if ! command -v gh &>/dev/null; then
  echo "ERROR: GitHub CLI required for PR/CI monitoring"
  exit 1
fi
# 4. Claude CLI (REQUIRED for worker dispatch)
if ! command -v claude &>/dev/null; then
  echo "ERROR: Claude CLI required for worker dispatch"
  exit 1
fi
# 5. Project setup (REQUIRED: thoughts, config, workflow context)
if [[ -f "${CLAUDE_PLUGIN_ROOT}/scripts/check-project-setup.sh" ]]; then
  "${CLAUDE_PLUGIN_ROOT}/scripts/check-project-setup.sh" || exit 1
fi
/catalyst-dev:orchestrate PROJ-101 PROJ-102 PROJ-103 # explicit tickets
/catalyst-dev:orchestrate --project "Q2 API Redesign" # pull from Linear project
/catalyst-dev:orchestrate --cycle current # pull from current cycle
/catalyst-dev:orchestrate --file tickets.txt # read ticket IDs from file
/catalyst-dev:orchestrate --auto 5 # auto-pick top 5 Todo tickets
| Flag | Description |
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| --project <name> | Pull tickets from a Linear project |
| --cycle current | Pull tickets from the current Linear cycle |
| --file <path> | Read ticket IDs from a file (one per line) |
| --auto <N> | Auto-pick top N Todo tickets: urgent/high priority first, newer first. Default N=3. |
| --auto-merge | Workers auto-merge PRs when CI + verification pass |
| --max-parallel <n> | Override config maxParallel (default: 3) |
| --base-branch <branch> | Base branch for worktrees (default: main) |
| --interactive | Include PM intake phase before orchestration |
| --prd <path> | Run PRD review panel + ticket creation before orchestration |
| --dry-run | Show wave plan without executing |
| --state-on-merge <name> | Linear state to set on PR merge. Default: stateMap.done (typically "Done") |
| --stop | Print how to deregister an execution-core project (edit the central registry) and exit. Works regardless of the configured dispatchMode. |
Reads orchestration config from .catalyst/config.json (or .claude/config.json if .catalyst/
doesn't exist). Falls back to sensible defaults if no orchestration block exists.
{
  "catalyst": {
    "orchestration": {
      "worktreeDir": null,
      "maxParallel": 3,
      "hooks": {
        "setup": [],
        "teardown": []
      },
      "workerCommand": "/catalyst-dev:oneshot",
      "workerModel": "opus",
      "dispatchMode": "phase-agents",
      "thoughts": {
        "profile": null,
        "directory": null
      },
      "testRequirements": {
        "backend": ["unit"],
        "frontend": ["unit"],
        "fullstack": ["unit"]
      },
      "verifyBeforeMerge": true,
      "allowSelfReportedCompletion": false
    }
  }
}
dispatchMode (CTL-452):
- "phase-agents" (default after Phase 6 lands): the orchestrator dispatches 9 short-lived phase
  agents per ticket via claude --bg (subscription pool). The Phase 4 monitor subscribes a broker
  phase_lifecycle interest per ticket and advances on
  phase.<name>.{complete,skipped}.<TICKET> events via the orchestrate-phase-advance helper
  (CTL-512: skipped is a monitor-deploy terminal-no-deploy status routed the same as complete).
- "oneshot-legacy": the orchestrator dispatches one long claude -p oneshot worker per ticket.
  Kept for rollback safety; flipping a single config key reverts to the pre-CTL-452 behavior.
- "execution-core" (CTL-554, CTL-582): daemon-served. /orchestrate runs no wave loop and no
  Phase 4 session: it just ensures the single machine-level execution-core daemon is running and
  exits. Enrolled projects are listed in the central registry ~/catalyst/execution-core/registry.json
  (maintained by setup-execution-core-states.sh); the daemon watches that file and serves each
  team by composing the CTL-535 monitor, CTL-536 scheduler, and CTL-539 recovery modules.

The workerCommand field is still honored in legacy mode. In phase-agents mode it is unused (each
phase has its own canonical skill).
See config template for full schema documentation.
The orchestrator operates ONLY from its own worktree. It:
claude CLI with streaming JSON output)

It NEVER:
Resolve tickets: Based on invocation mode, use the Linearis CLI to fetch ticket data. For
exact CLI syntax, run linearis issues usage or linearis cycles usage; do not guess.
- --project: list issues filtered by project name
- --cycle current: list the active cycle, then list its issues
- --file: read IDs from file, then read each ticket's details
- --auto <N>: list status=Todo issues, then select the top N. Ranking: urgent/high priority
  first (Linear priority 1 = Urgent → 4 = Low, with 0 = "No priority" sorted LAST), then newest
  createdAt first. Example jq after linearis issues list --status Todo:
  sort_by((if .priority == 0 then 5 else .priority end), (-(.createdAt | fromdateiso8601))) | .[:N]
  Present the auto-picked tickets to the user as part of the wave plan before proceeding.

Read ticket details: For each ticket, extract:
Build dependency graph: Identify which tickets in the set depend on each other.
Group into waves:
Present wave plan for approval:
Orchestration Plan — "api-redesign"
Total: 6 tickets | 3 waves | Max parallel: 3
Wave 1 (parallel, 3 workers):
PROJ-101: Auth middleware rewrite [backend, 3pt]
PROJ-102: Rate limiting service [backend, 2pt]
PROJ-103: Email templates [frontend, 1pt]
Wave 2 (after Wave 1, 2 workers):
PROJ-104: OAuth integration [fullstack, 5pt] — depends on PROJ-101
PROJ-105: API usage dashboard [frontend, 3pt] — depends on PROJ-102
Wave 3 (after Wave 2, 1 worker):
PROJ-106: Self-service API keys [fullstack, 5pt] — depends on PROJ-104, PROJ-105
Estimated waves: 3 sequential rounds
Proceed? [Y/n]
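The grouping step can be pictured as a layered topological sort: a ticket lands in the first wave in which all of its in-set dependencies have already been placed. The sketch below is illustrative only, not the orchestrator's actual code. The function name plan_waves and the DEPS map are hypothetical, it needs bash 4+ for associative arrays, and it does not split a wave when it exceeds maxParallel.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: group tickets into waves by dependency depth.
# DEPS maps a ticket to a space-separated list of tickets it depends on.
declare -A DEPS=(
  [PROJ-104]="PROJ-101"
  [PROJ-105]="PROJ-102"
  [PROJ-106]="PROJ-104 PROJ-105"
)

plan_waves() {
  local -A in_set=() done_in=()   # membership map; ticket -> wave once placed
  local remaining=("$@") wave=1 t d ok
  for t in "$@"; do in_set[$t]=1; done
  while ((${#remaining[@]} > 0)); do
    local ready=() blocked=()
    for t in "${remaining[@]}"; do
      ok=1
      for d in ${DEPS[$t]:-}; do
        # Dependencies outside this run are treated as already satisfied.
        [[ -n "${in_set[$d]:-}" ]] || continue
        # A dependency counts only if it was placed in an earlier wave.
        [[ -n "${done_in[$d]:-}" ]] || { ok=0; break; }
      done
      if ((ok)); then ready+=("$t"); else blocked+=("$t"); fi
    done
    if ((${#ready[@]} == 0)); then
      echo "ERROR: dependency cycle among: ${blocked[*]}" >&2
      return 1
    fi
    echo "Wave ${wave}: ${ready[*]}"
    for t in "${ready[@]}"; do done_in[$t]=$wave; done
    remaining=("${blocked[@]}")
    wave=$((wave + 1))
  done
}

plan_waves PROJ-101 PROJ-102 PROJ-103 PROJ-104 PROJ-105 PROJ-106
```

With the dependencies from the example plan, this prints three waves that match the plan above: PROJ-101 to PROJ-103 first, then PROJ-104 and PROJ-105, then PROJ-106.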
--dry-run: Print wave plan and exit.

Determine worktree base directory (in priority order):
1. catalyst.orchestration.worktreeDir from config
2. ${GITHUB_SOURCE_ROOT}/<org>/<repo>-worktrees/ (from env var + git remote)
3. ~/wt/<repo>/ (fallback)

Read config for orchestration settings:
# Resolve config file (.catalyst/ first, then .claude/)
CONFIG_FILE=""
for CFG in ".catalyst/config.json" ".claude/config.json"; do
if [ -f "$CFG" ]; then CONFIG_FILE="$CFG"; break; fi
done
# Read orchestration config (all have defaults)
WORKTREE_DIR=$(jq -r '.catalyst.orchestration.worktreeDir // empty' "$CONFIG_FILE" 2>/dev/null)
MAX_PARALLEL=$(jq -r '.catalyst.orchestration.maxParallel // 3' "$CONFIG_FILE" 2>/dev/null)
SETUP_HOOKS=$(jq -c '.catalyst.orchestration.hooks.setup // []' "$CONFIG_FILE" 2>/dev/null)
TEARDOWN_HOOKS=$(jq -c '.catalyst.orchestration.hooks.teardown // []' "$CONFIG_FILE" 2>/dev/null)
WORKER_COMMAND=$(jq -r '.catalyst.orchestration.workerCommand // "/catalyst-dev:oneshot"' "$CONFIG_FILE" 2>/dev/null)
# CTL-208: workerCommand must be plugin-namespaced (/<plugin>:<skill>). A bare /oneshot
# becomes literal prompt text and the worker silently no-ops. Fail loudly here.
if [[ ! "$WORKER_COMMAND" =~ ^/[a-z][a-z0-9_-]*:[a-z][a-z0-9_-]*$ ]]; then
echo "ERROR: catalyst.orchestration.workerCommand=\"$WORKER_COMMAND\" must be plugin-namespaced (/<plugin>:<skill>), e.g. /catalyst-dev:oneshot. Update $CONFIG_FILE." >&2
exit 2
fi
WORKER_MODEL=$(jq -r '.catalyst.orchestration.workerModel // "opus"' "$CONFIG_FILE" 2>/dev/null)
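# verifyBeforeMerge defaults to true. jq's // operator treats false like null, so
# `.verifyBeforeMerge // "true"` would turn an explicit false back into "true".
VERIFY_BEFORE_MERGE=$(jq -r 'if .catalyst.orchestration.verifyBeforeMerge == false then "false" else "true" end' "$CONFIG_FILE" 2>/dev/null)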
ALLOW_SELF_REPORTED=$(jq -r '.catalyst.orchestration.allowSelfReportedCompletion // "false"' "$CONFIG_FILE" 2>/dev/null)
# CTL-554: dispatchMode drives the execution-core fork below (after Phase 2).
DISPATCH_MODE=$(jq -r '.catalyst.orchestration.dispatchMode // "oneshot-legacy"' "$CONFIG_FILE" 2>/dev/null)
Create ALL worktrees using create-worktree.sh — both orchestrator and workers go through the
same script so they all get .claude/, .catalyst/, dependency install, thoughts init, and custom
hooks:
# The create-worktree.sh script lives relative to this plugin
SCRIPT="${CLAUDE_PLUGIN_ROOT}/scripts/create-worktree.sh"
# Resolve --worktree-dir to pass through (omit the flag if not configured; the script uses its own defaults).
# Flags are built as arrays so values with spaces or quotes reach the script as single arguments.
WT_DIR_FLAG=()
if [ -n "$WORKTREE_DIR" ]; then
  WT_DIR_FLAG=(--worktree-dir "${WORKTREE_DIR}")
fi
# Resolve --hooks-json to pass custom setup hooks from config
HOOKS_FLAG=()
if [ -n "$SETUP_HOOKS" ] && [ "$SETUP_HOOKS" != "[]" ]; then
  HOOKS_FLAG=(--hooks-json "${SETUP_HOOKS}")
fi
# Pass --orchestration so all worktrees record which run they belong to
ORCH_FLAG=(--orchestration "${ORCH_NAME}")
# 1. Create orchestrator worktree (same script, same initialization)
"$SCRIPT" "${ORCH_NAME}" "${BASE_BRANCH}" "${WT_DIR_FLAG[@]}" "${HOOKS_FLAG[@]}" "${ORCH_FLAG[@]}"
ORCH_WORKTREE="${WORKTREES_BASE}/${ORCH_NAME}"
# Per-orchestrator state lives under ~/catalyst/runs/<id>/ (decoupled from the
# git worktree, see CTL-59). state.json, DASHBOARD.md, workers/, wave briefings,
# and SUMMARY.md all live here. Claude CLI output (streams/stderr) lands in
# workers/output/ so it sits alongside the signal files but does not pollute
# watchers that scan workers/*.json.
ORCH_DIR="$("${CLAUDE_PLUGIN_ROOT}/scripts/catalyst-state.sh" ensure-run-dir "${ORCH_NAME}")"
# 2. Create worker worktrees for current wave
for TICKET_ID in "${WAVE_TICKETS[@]}"; do
  "$SCRIPT" "${ORCH_NAME}-${TICKET_ID}" "${BASE_BRANCH}" "${WT_DIR_FLAG[@]}" "${HOOKS_FLAG[@]}" "${ORCH_FLAG[@]}"
done
Where worktrees actually land — the create-worktree.sh script resolves the base directory in
this priority order:
1. --worktree-dir <path> flag (from catalyst.orchestration.worktreeDir config)
2. ~/catalyst/wt/<projectKey>/ (default; reads catalyst.projectKey from config)
3. ~/catalyst/wt/<repo>/ (fallback if no config)

So for a project with projectKey: "acme" and no worktreeDir override, all worktrees land in:
~/catalyst/wt/acme/
├── api-redesign/ # orchestrator
├── api-redesign-ACME-101/ # worker
├── api-redesign-ACME-102/ # worker
└── api-redesign-ACME-103/ # worker
With worktreeDir: "~/catalyst/api" explicitly configured:
~/catalyst/api/
├── api-redesign/ # orchestrator
├── api-redesign-ACME-101/ # worker
├── api-redesign-ACME-102/ # worker
└── api-redesign-ACME-103/ # worker
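The priority order that create-worktree.sh is described as following can be sketched as a small function. This is an illustration under assumptions, not the script's real code: resolve_worktree_base is a hypothetical name, and the tilde expansion is an assumed convenience, since a path read from JSON is not tilde-expanded by the shell.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the base-directory priority order.
# Arguments: <worktree-dir flag value> <projectKey> <repo name>; any may be empty.
resolve_worktree_base() {
  local flag_dir="$1" project_key="$2" repo_name="$3"
  if [ -n "$flag_dir" ]; then
    # 1. Explicit --worktree-dir wins. Expand a leading "~" manually (assumption).
    printf '%s\n' "${flag_dir/#\~/$HOME}"
  elif [ -n "$project_key" ]; then
    # 2. Default: keyed by catalyst.projectKey.
    printf '%s\n' "$HOME/catalyst/wt/$project_key"
  else
    # 3. Fallback when no config provides a project key.
    printf '%s\n' "$HOME/catalyst/wt/$repo_name"
  fi
}

resolve_worktree_base "" "acme" "my-repo"                # default: projectKey directory
resolve_worktree_base "~/catalyst/api" "acme" "my-repo"  # explicit override wins
```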
Recommended: Add ~/catalyst to Claude Code's additionalDirectories in
~/.claude/settings.json so all worktrees across projects are automatically trusted:
{
  "permissions": {
    "additionalDirectories": ["/Users/you/catalyst"]
  }
}
What create-worktree.sh does for EACH worktree (orchestrator and workers alike):
1. git worktree add -b <name> <path> <base-branch>: creates the worktree
2. .claude/ directory (Claude Code native config, plugins, rules)
3. .catalyst/ directory (Catalyst workflow config, if it exists)
4. catalyst.worktree.setup commands from config: dependency install, thoughts init,
   permission grants, or any project-specific setup (like Conductor's conductor.json lifecycle
   hooks)
5. If catalyst.worktree.setup is not configured, falls back to auto-detected setup: make setup or
   bun/npm install, then humanlayer thoughts init + sync
6. --hooks-json (from catalyst.orchestration.hooks.setup)

Available variables in setup commands: ${WORKTREE_PATH}, ${BRANCH_NAME}, ${TICKET_ID},
${REPO_NAME}, ${DIRECTORY}, ${PROFILE}
After worktree creation, set up the orchestrator's status directory:
# ORCH_DIR is the per-orchestrator state dir under ~/catalyst/runs/<id>/ (created
# by `catalyst-state.sh ensure-run-dir` above, which already makes workers/output/).
# This mkdir is a no-op for fresh runs but keeps the skill robust when ORCH_DIR
# is reconstructed on resume.
mkdir -p "${ORCH_DIR}/workers/output"
Render the initial DASHBOARD.md (CTL-230) — the renderer reads state.json + workers/*.json
signals + the events log every cycle, so calling it now produces a real header + empty worker
table + waves outline rather than a template skeleton:
"${CLAUDE_PLUGIN_ROOT}/scripts/update-dashboard.sh" \
--orch "${ORCH_NAME}" --orch-dir "${ORCH_DIR}" --roll-usage \
>/dev/null 2>>"${ORCH_DIR}/.update-dashboard.log" || true
--roll-usage is the wire that fulfils the "Worker usage / cost is rolled in by the monitor pass"
contract below (CTL-487): the dashboard helper iterates ${ORCH_DIR}/workers/*.json and invokes
orchestrate-roll-usage.sh -v per worker before rendering, with stderr captured to
${ORCH_DIR}/.roll-usage.log. The per-worker call is bounded — already-rolled workers short-circuit
on signal.cost != null and cost a single jq read.
Create the orchestrator's status directory:
${ORCH_DIR}/ # ~/catalyst/runs/${ORCH_NAME}/
├── DASHBOARD.md # human-readable status (re-rendered each Phase 4 cycle)
├── state.json # machine-readable orchestration state
├── wave-1-briefing.md # per-wave briefings
├── SUMMARY.md # final run summary (post-Phase 5)
└── workers/
├── ${TICKET_1}.json # worker signal (schema: worker-signal.json)
├── ${TICKET_2}.json
└── output/ # claude CLI output (streams, stderr)
├── ${TICKET_1}-stream.jsonl # streaming JSON events from claude
├── ${TICKET_1}-stderr.log # worker stderr (silent exits diagnosable)
├── ${TICKET_2}-stream.jsonl
└── ${TICKET_2}-stderr.log
Note on the runs/ split (CTL-59): ORCH_DIR lives at ~/catalyst/runs/${ORCH_NAME}/ and is
decoupled from the git worktree at ${ORCH_WORKTREE} (e.g.
~/catalyst/wt/${PROJECT_KEY}/${ORCH_NAME}/). This lets state survive worktree cleanup and keeps
git status clean. Claude CLI output (stream + stderr) lands in workers/output/ to keep file
watchers that scan workers/*.json free of noise from large stream files.
Debugging silent worker exits: If workers/output/${TICKET_ID}-stream.jsonl is 0 bytes AND
workers/output/${TICKET_ID}-stderr.log is 0 bytes, the worker exited before emitting its first
event — check git -C ${WORKER_DIR} log --oneline -5 and the worktree's .claude/ directory for
setup issues. A non-empty stderr log will identify permission, path, or environment errors.
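The triage rule above can be turned into a quick check. The helper below is a hypothetical sketch (classify_worker_output is not part of the plugin) that relies only on the shell's -s test, which is true when a file exists and is non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a worker's two output files.
#   stderr       the stderr log has content; read it first
#   silent-exit  both files are missing or 0 bytes; the worker died before its first event
#   ok           the stream has events and stderr is empty
classify_worker_output() {
  local stream="$1" stderr_log="$2"
  if [ -s "$stderr_log" ]; then
    echo "stderr"
  elif [ ! -s "$stream" ]; then
    echo "silent-exit"
  else
    echo "ok"
  fi
}
```

A call would look like classify_worker_output "${ORCH_DIR}/workers/output/${TICKET_ID}-stream.jsonl" "${ORCH_DIR}/workers/output/${TICKET_ID}-stderr.log", using the paths from the directory layout above.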
Initialize state.json:
{
  "orchestrator": "<name>",
  "startedAt": "<ISO timestamp>",
  "baseBranch": "main",
  "totalTickets": 6,
  "totalWaves": 3,
  "currentWave": 1,
  "worktreeBase": "<path>",
  "waves": [
    {
      "wave": 1,
      "status": "provisioning",
      "tickets": ["PROJ-101", "PROJ-102", "PROJ-103"]
    },
    {
      "wave": 2,
      "status": "blocked",
      "tickets": ["PROJ-104", "PROJ-105"],
      "dependsOn": [1]
    }
  ],
  "workers": {}
}
Register with global state (immediately after local state initialization):
STATE_SCRIPT="${CLAUDE_PLUGIN_ROOT}/scripts/catalyst-state.sh"
# Resolve the catalyst-comms binary. Prefer the plugin-shipped copy so installs
# where `catalyst-comms` is only a shell alias (which doesn't propagate to
# subshells) still work. Fall back to PATH for users who have symlinked it.
COMMS_BIN="${CLAUDE_PLUGIN_ROOT:-}/scripts/catalyst-comms"
[ -x "$COMMS_BIN" ] || COMMS_BIN="$(command -v catalyst-comms 2>/dev/null || true)"
if [ -z "$COMMS_BIN" ] || [ ! -x "$COMMS_BIN" ]; then
echo "warn: catalyst-comms not found — comms disabled (install: plugins/dev/scripts/install-cli.sh)" >&2
COMMS_BIN=""
fi
# Build the registration JSON with all workers from all waves
# Use linearis CLI to read ticket titles (run `linearis issues usage` for syntax)
WORKERS_JSON="{}"
for TICKET_ID in "${ALL_TICKETS[@]}"; do
TITLE=$(linearis issues read "$TICKET_ID" | jq -r '.title') # see `linearis issues usage`
WORKERS_JSON=$(echo "$WORKERS_JSON" | jq \
--arg tid "$TICKET_ID" --arg title "$TITLE" \
'. + {($tid): {ticketId: $tid, title: $title, status: "dispatched", phase: 0, branch: null, pr: null, updatedAt: "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'", needsAttention: false, attentionReason: null}}')
done
# Detect repository from git remote
REPO=$(git remote get-url origin 2>/dev/null | sed 's|.*github.com[:/]||;s|\.git$||')
"$STATE_SCRIPT" register "${ORCH_NAME}" "$(jq -nc \
--arg id "${ORCH_NAME}" \
--arg pk "$(jq -r '.catalyst.projectKey // "unknown"' "$CONFIG_FILE")" \
--arg repo "$REPO" \
--arg bb "${BASE_BRANCH}" \
--arg wtd "${ORCH_DIR}" \
--arg sf "${ORCH_DIR}/state.json" \
--argjson total "${#ALL_TICKETS[@]}" \
--argjson waves "$TOTAL_WAVES" \
--argjson workers "$WORKERS_JSON" \
'{
id: $id, projectKey: $pk, repository: $repo, baseBranch: $bb,
status: "active", startedAt: (now | strftime("%Y-%m-%dT%H:%M:%SZ")),
worktreeDir: $wtd, stateFile: $sf,
progress: {totalTickets: $total, completedTickets: 0, failedTickets: 0, inProgressTickets: 0, currentWave: 1, totalWaves: $waves},
usage: {inputTokens: 0, outputTokens: 0, cacheReadTokens: 0, cacheCreationTokens: 0, costUSD: 0, numTurns: 0, durationMs: 0, durationApiMs: 0, model: null},
workers: $workers, attention: []
}')"
The CATALYST_ORCHESTRATOR_ID is set to ${ORCH_NAME} for use by workers (passed via environment
variable alongside CATALYST_ORCHESTRATOR_DIR).
Capture event-log baseline (CTL-491): Before any worker dispatch, snapshot the catalyst event
log's current line count and file path. The Phase 4 replay step uses this to find any
phase.*.{complete,failed,turn-cap-exhausted,skipped}.<TICKET> events that landed during the
dispatch-to-monitor handoff. The new state.json.race object is opaque to older state.json
consumers (they ignore unknown top-level fields via .field // default jq patterns).
EVENTS_DIR="${CATALYST_DIR:-$HOME/catalyst}/events"
mkdir -p "$EVENTS_DIR"
BASELINE_FILE="${EVENTS_DIR}/$(date -u +%Y-%m).jsonl"
[[ -f "$BASELINE_FILE" ]] || : > "$BASELINE_FILE"
BASELINE_LINE=$(wc -l < "$BASELINE_FILE" | tr -d ' ')
TMP="${ORCH_DIR}/state.json.tmp.$$"
jq --arg cursor "$BASELINE_LINE" --arg file "$BASELINE_FILE" --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
'.race = {startLineCursor: ($cursor | tonumber), startEventsFile: $file, capturedAt: $ts}' \
"${ORCH_DIR}/state.json" > "$TMP" \
&& mv "$TMP" "${ORCH_DIR}/state.json" || rm -f "$TMP"
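To show how a replay step might consume this baseline, here is a minimal sketch that lists one ticket's phase events appended after the recorded cursor. It is not the Phase 4 implementation: events_since_baseline is a hypothetical name, and it assumes each JSONL line contains the event name as plain text (for example phase.triage.complete.PROJ-101), which the source does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print phase events for <ticket> that landed after <cursor>.
# Assumption: the event name appears verbatim somewhere in each JSONL line.
events_since_baseline() {
  local events_file="$1" cursor="$2" ticket="$3"
  # tail -n +K starts output at line K, so cursor+1 skips the first <cursor> lines.
  # The trailing group stops PROJ-10 from also matching PROJ-101.
  tail -n "+$((cursor + 1))" "$events_file" \
    | grep -E "phase\.[a-z-]+\.(complete|failed|turn-cap-exhausted|skipped)\.${ticket}([^A-Za-z0-9-]|\$)" \
    || true   # no matches is not an error
}
```

In the orchestrator's terms, the cursor and file would come from state.json's .race.startLineCursor and .race.startEventsFile.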
Create the shared comms channel (CTL-111): the orchestrator creates a file-based channel that
every worker will auto-join via CATALYST_COMMS_CHANNEL in its dispatch env. Best-effort — the
orchestrator does not crash if catalyst-comms is missing.
# Shared channel for this run. Workers will join at dispatch time.
if [ -n "$COMMS_BIN" ]; then
"$COMMS_BIN" join "${ORCH_NAME}" \
--as orchestrator \
--capabilities "coordinates workers" \
--orch "${ORCH_NAME}" \
--ttl 7200 >/dev/null 2>&1 || true
fi
Start session tracking (alongside the global state registration above):
SESSION_SCRIPT="${CLAUDE_PLUGIN_ROOT}/scripts/catalyst-session.sh"
ORCH_STATUS_SCRIPT="${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-status.sh"
if [[ -x "$SESSION_SCRIPT" ]]; then
CATALYST_SESSION_ID=$("$SESSION_SCRIPT" start --skill "orchestrate" \
--label "${ORCH_NAME}" \
--workflow "${CATALYST_SESSION_ID:-}")
export CATALYST_SESSION_ID
"$SESSION_SCRIPT" phase "$CATALYST_SESSION_ID" "dispatching" --phase 3
fi
dispatchMode: "execution-core" and the --stop flag both short-circuit the wave loop. Evaluate
this before Phase 3 — when it applies, no workers are dispatched and no Phase 4 monitor session
starts.
CTL-582 (D4) made the central registry ~/catalyst/execution-core/registry.json the single source
of enrolled projects, maintained by setup-execution-core-states.sh. /orchestrate no longer
writes per-project enrollment records — there is nothing to enroll here.
Invoked with --stop — execution-core projects are deregistered by editing the central
registry, not by /orchestrate. Print the guidance and exit 0:
echo "execution-core: enrolled projects are the central registry" \
"~/catalyst/execution-core/registry.json — remove the team's entry there to deregister."
# Exit 0 — do not continue.
DISPATCH_MODE is execution-core (no --stop) — ensure the machine-level daemon is
running, then exit:
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-execution-core-route.sh"
# Ensures the daemon is running. No enrollment, no wave loop, no Phase 4
# session. Exit 0 — do not continue to Phase 3.
Any other DISPATCH_MODE — continue to Phase 3 below.
For each provisioned worker worktree, dispatch a /oneshot session.
Register broker filter interests BEFORE dispatch (CTL-491): Phase-agent workers can finish in
well under a second when their work is a no-op (e.g. an already-triaged ticket). Emitting
filter.register events AFTER dispatch opens a race window where the broker's interest map is still
empty when phase.<name>.complete.<TICKET> arrives — processEvent early-returns and the event is
dropped silently (broker/index.mjs:1782). Run the registration helper here so all four
deterministic + per-ticket interests are durable in the broker BEFORE any claude --bg invocation.
The helper is idempotent at the broker (it upserts by interest_id); the Phase 4 entry re-invokes
the same helper as a belt-and-suspenders measure.
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-register-interests.sh" \
--orch-dir "${ORCH_DIR}" \
--orch-id "${ORCH_NAME}" \
--config "${REPO_ROOT:-$(git rev-parse --show-toplevel 2>/dev/null || echo .)}/.catalyst/config.json"
# Soft-fail-safe: helper exits 0 when broker/filter daemon is down (no-op) and
# 2 only on argument errors. See plugins/dev/scripts/orchestrate-register-interests.sh.
Emit dispatching status (CTL-405): Before launching workers, announce the phase:
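# state.json as initialized above stores totalTickets at the top level; the
# progress.totalTickets shape belongs to the global registration, so try both.
TOTAL_WORKERS=$(jq -r '.progress.totalTickets // .totalTickets // 0' "${ORCH_DIR}/state.json" 2>/dev/null || echo 0)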
[[ -x "$ORCH_STATUS_SCRIPT" ]] && "$ORCH_STATUS_SCRIPT" emit \
--orch "${ORCH_NAME}" --phase dispatching --wave "${CURRENT_WAVE:-1}" \
--total "$TOTAL_WORKERS" \
--summary "wave ${CURRENT_WAVE:-1} dispatching" 2>/dev/null || true
Preferred entrypoint — orchestrate-dispatch-next (CTL-116):
The canonical dispatcher drains state.json's .queue.waveNPending for every N (dynamically, so
wave 1/2/3/…/N all work without code changes), respects maxParallel - currentlyRunning, writes
dispatched/phase-0 signal files, launches workers via nohup, updates global state, removes
dispatched tickets from whichever waveNPending list they lived in, and runs the post-dispatch
healthcheck:
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-dispatch-next" \
--orch-dir "${ORCH_DIR}" \
--orch-id "${ORCH_NAME}" \
--config "${REPO_ROOT}/.catalyst/config.json"
# Emits a one-line JSON summary: {"running":R,"slotsAfter":S,"dispatched":[...][,"queueEmpty":true]}
# Reads `orchestrator`, `worktreeBase`, `maxParallel` from state.json by default.
# Pass --session-id / --worker-command / --worker-args / --comms-channel to override.
# Pass --dry-run to preview without writing state or launching claude.
#
# CTL-452: --config gates the dispatch mode. With dispatchMode = "phase-agents",
# the dispatcher launches phase-triage agents on `claude --bg` (subscription pool)
# via `phase-agent-dispatch`, and the Phase 4 wake handler advances each
# worker through the 9-phase sequence. With dispatchMode = "oneshot-legacy" or
# no --config, the dispatcher uses the legacy `-p oneshot` worker (one long
# claude session per ticket). Phase advancement calls always pass explicit
# `--phase <name> --ticket <T>` and bypass the dispatchMode default.
Call this once when the current wave is ready to dispatch, and again whenever a worker slot frees
up. It supersedes the hand-rolled dispatch-next.sh pattern from pre-CTL-116 orchestration runs
(which hardcoded wave1Pending + wave2Pending + wave3Pending). The inline block below is preserved
as reference for the underlying machinery.
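The slot arithmetic the dispatcher is described as applying (maxParallel - currentlyRunning) is easy to see in isolation. The function below is a hypothetical sketch of that selection step only. It does not read state.json, write signal files, or launch anything, and pick_dispatchable is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose which pending tickets fit into the free slots.
# Usage: pick_dispatchable <maxParallel> <currentlyRunning> <pending ticket>...
# Prints the tickets to dispatch now, one per line, in queue order.
pick_dispatchable() {
  local max_parallel="$1" running="$2"
  shift 2
  local slots=$((max_parallel - running))
  ((slots > 0)) || return 0        # no free slots: dispatch nothing
  local t
  for t in "$@"; do
    ((slots > 0)) || break         # stop once every free slot is used
    echo "$t"
    slots=$((slots - 1))
  done
}

pick_dispatchable 3 1 PROJ-104 PROJ-105 PROJ-106   # two free slots: PROJ-104 and PROJ-105
```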
Phase advancement on phase.<name>.complete.<TICKET> wake (CTL-452):
# Inside the wake handler — called once per phase-complete event surfaced
# via the broker's phase_lifecycle interest.
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-phase-advance" \
--orch-dir "${ORCH_DIR}" \
--orch-id "${ORCH_NAME}" \
--ticket "${TICKET}" \
--completed-phase "${COMPLETED_PHASE}"
# Emits one-line JSON: {"advanced": true|false, "fromPhase": "<name>", "toPhase": "<next>|null", ...}
# The helper resolves the next phase via the canonical 9-phase sequence,
# refuses to double-dispatch (idempotent), and delegates to dispatch-next
# with --phase <next> --ticket <T>. If `completed-phase = monitor-deploy`,
# the ticket has reached the terminal phase and no further advance happens.
Phase failure on phase.<name>.failed.<TICKET> wake (CTL-452):
# Run orchestrate-revive once. The script already implements the
# one-retry-then-escalate policy via .reviveCount + --max-revives, and
# CTL-452 added the --bg --resume path for phase-mode workers (legacy
# oneshot workers continue to use -p --resume — backward compatible).
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-revive" \
--orch-dir "${ORCH_DIR}" \
--orch-id "${ORCH_NAME}"
# On the second consecutive failure (reviveCount ≥ MAX_REVIVES), revive
# marks the worker stalled, sets attentionReason = "revive-budget-exhausted",
# and emits attention via catalyst-state — matching the existing escalation
# pattern for legacy workers.
Phase turn-cap exhaustion on phase.<name>.turn-cap-exhausted.<TICKET> wake (CTL-484):
# Same script — orchestrate-revive's continuation branch (CTL-484) detects
# status=turn-cap-exhausted on the top-level signal + handoffPath on the
# per-phase signal (written by phase-agent-emit-complete --handoff-path),
# dispatches a continuation worker with CATALYST_IS_CONTINUATION=true +
# CATALYST_HANDOFF_PATH=<path> + CATALYST_CONTINUATION_COUNT=<n>, and bumps
# .continuationCount on a budget separate from .reviveCount (default 3 vs 10).
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-revive" \
--orch-dir "${ORCH_DIR}" \
--orch-id "${ORCH_NAME}"
# On budget exhaustion (continuationCount ≥ MAX_CONTINUATIONS), revive marks
# the worker stalled with attentionReason = "continuation-budget-exhausted"
# and emits worker-continuation-budget-exhausted. The separate budget stops
# cap-exhaustion runs from burning the error-revive budget, which is the bug
# that CTL-484 fixes.
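The two budgets are easier to reason about side by side. The sketch below models only the decision, with both limits passed in explicitly because this section does not pin down their defaults. decide_revive is a hypothetical name, not an orchestrate-revive entrypoint.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two independent budgets.
# Usage: decide_revive <status> <reviveCount> <continuationCount> <maxRevives> <maxContinuations>
# Prints "revive", "continue", or the attentionReason for a stalled worker.
decide_revive() {
  local status="$1" revive_count="$2" continuation_count="$3"
  local max_revives="$4" max_continuations="$5"
  if [ "$status" = "turn-cap-exhausted" ]; then
    # Turn-cap exhaustion draws only on the continuation budget.
    if ((continuation_count >= max_continuations)); then
      echo "continuation-budget-exhausted"
    else
      echo "continue"
    fi
  elif ((revive_count >= max_revives)); then
    # Any other failure draws on the error-revive budget.
    echo "revive-budget-exhausted"
  else
    echo "revive"
  fi
}
```

Because the branches never share a counter, a worker that has used up its continuations can still be revived after a genuine error, and the reverse also holds.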
Dispatch mechanism — claude CLI with streaming JSON:
WORKER_DIR="${WORKTREES_BASE}/${ORCH_NAME}-${TICKET_ID}"
WORKER_STREAM="${ORCH_DIR}/workers/output/${TICKET_ID}-stream.jsonl"
WORKER_STDERR="${ORCH_DIR}/workers/output/${TICKET_ID}-stderr.log"
SIGNAL_FILE="${ORCH_DIR}/workers/${TICKET_ID}.json"
# `claude -w` takes a *name* and creates a new worktree — it does NOT accept a
# path to an existing worktree. The worker worktree was already provisioned in
# Phase 2, so `cd` into it inside a backgrounded subshell and `exec` claude so
# its PID is reachable from the outer shell as `$!`.
# `--dangerously-skip-permissions` is required because headless workers have no
# TTY to answer permission prompts; the worktree is pre-trusted via Catalyst's
# setup hooks. `nohup` keeps the worker alive after the orchestrator shell
# exits. Stderr goes to a real file (not /dev/null) so a silent worker exit
# stays debuggable.
(
cd "${WORKER_DIR}" || exit 1
CATALYST_ORCHESTRATOR_DIR="${ORCH_DIR}" \
CATALYST_ORCHESTRATOR_ID="${ORCH_NAME}" \
CATALYST_COMMS_CHANNEL="${ORCH_NAME}" \
CATALYST_SESSION_ID="${CATALYST_SESSION_ID:-}" \
exec nohup claude \
-n "${ORCH_NAME}-${TICKET_ID}" \
--output-format stream-json \
--verbose \
--dangerously-skip-permissions \
-p "${WORKER_COMMAND} ${TICKET_ID} --auto-merge"
) > "$WORKER_STREAM" 2> "$WORKER_STDERR" &
WORKER_PID=$!
# Record the worker's PID + initial heartbeat into its signal file so the
# monitor can perform kill-0 liveness checks.
if [ -f "$SIGNAL_FILE" ]; then
jq --argjson pid "$WORKER_PID" '.pid = $pid | .lastHeartbeat = .updatedAt' \
"$SIGNAL_FILE" > "${SIGNAL_FILE}.tmp" && mv "${SIGNAL_FILE}.tmp" "$SIGNAL_FILE"
fi
Streaming JSON output (--output-format stream-json --verbose) emits NDJSON to stdout, one
event per line, in real-time as the worker runs. The monitor can tail the stream file to show live
worker activity. Key event types:
| Event | What it signals |
| ----------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| {"type":"system","subtype":"init"} | Worker session started; contains session_id |
| {"type":"stream_event","event":{"type":"content_block_start","content_block":{"type":"tool_use","name":"..."}}} | Worker is now invoking a specific tool (Bash, Read, Edit, etc.) |
| {"type":"stream_event","event":{"type":"content_block_delta","delta":{"type":"text_delta"}}} | Worker is generating reasoning/response text |
| {"type":"assistant"} | Complete assistant turn with all content blocks |
| {"type":"system","subtype":"api_retry"} | Worker hit rate limit / error; shows attempt and delay |
| {"type":"result"} | Worker finished; contains final answer and usage stats |
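As a rough illustration of reading these events from a stream file, the sketch below counts tool invocations and API retries and reports whether a result event has arrived. It is a grep-based approximation under stated assumptions: each line is compact JSON with no space after colons, "type" is the first key of the result event, and stream_summary is a hypothetical helper. A real monitor would more likely parse each line with jq.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: summarise a worker's stream-json file with grep.
# Assumes compact NDJSON with the key order shown in the table above.
stream_summary() {
  local stream="$1"
  local tools retries finished=no
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  tools=$(grep -c '"type":"content_block_start".*"type":"tool_use"' "$stream" || true)
  retries=$(grep -c '"subtype":"api_retry"' "$stream" || true)
  # Anchor at line start so a tool_result block is not mistaken for the final event.
  if grep -q '^{"type":"result"' "$stream"; then
    finished=yes
  fi
  echo "tool_calls=${tools} api_retries=${retries} finished=${finished}"
}
```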
Worker usage / cost is rolled in by the monitor pass (CTL-115, wired through
update-dashboard.sh --roll-usage since CTL-487). The dispatch shell backgrounds workers with &
and never waits on them, so usage/cost extraction cannot happen here. Phase 4's
update-dashboard.sh call passes --roll-usage, which iterates ${ORCH_DIR}/workers/*.json and
invokes plugins/dev/scripts/orchestrate-roll-usage.sh -v per worker before rendering. The helper
parses the final result event from the worker's stream file and:
- .cost = USAGE); the dashboard reads signal files (not global state) for per-worker cost columns.
- state.workers[ticket].usage.
- state.usage for the orchestrator-level aggregate.

The helper is idempotent (gated on signal.cost == null) and safe to call every cycle. Because
every dashboard render now sweeps every worker, update-dashboard.sh --roll-usage doubles as the
periodic safety net for any worker whose stream contains a result event but whose signal.cost is
still null. Audit trail lands at ${ORCH_DIR}/.roll-usage.log.
Emit dispatch event and update global state after each worker dispatch:
"$STATE_SCRIPT" worker "${ORCH_NAME}" "${TICKET_ID}" '.status = "dispatched" | .phase = 0'
"$STATE_SCRIPT" update "${ORCH_NAME}" '.progress.inProgressTickets += 1'
"$STATE_SCRIPT" event "$(jq -nc \
--arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--arg orch "${ORCH_NAME}" --arg w "${TICKET_ID}" \
'{ts: $ts, orchestrator: $orch, worker: $w, event: "worker-dispatched", detail: null}')"
Post-dispatch health check (CTL-87):
After the wave's per-worker dispatch loop has completed, run the batch health check. This is the
per-wave invocation (CTL-511 added a second call site — see the reactive scan in Phase 4). It
sleeps briefly (default 15s — configurable via --grace-seconds), then verifies that every worker
still sitting at status="dispatched"/phase=0 has a live PID. Any worker whose PID has already
died is transitioned to status="failed" with failureReason="launch-failure", an attention item
of type launch-failure is raised, and a worker-launch-failed event is emitted. This means
dead-on-arrival workers surface in under 30 seconds instead of after the 15-minute stalled-worker
timeout, and the orchestrator can re-dispatch them (via orchestrate-fixup or a manual redispatch)
in the same wave.
# Run ONCE, after all workers in this wave have been dispatched.
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-healthcheck" \
--orch-dir "${ORCH_DIR}" \
--orch-id "${ORCH_NAME}"
# Prints a JSON summary on stdout: {"checked":N,"dead":M,"deadTickets":[...]}.
# Launch failures also appear in the attention list and as `worker-launch-failed`
# events in the global state log.
Healthy workers are untouched. Workers that have already advanced past dispatched (e.g. into
researching) are skipped because reaching a later status is itself proof of life. This check
complements the 15-minute stalled-worker detection in Phase 4 — healthcheck catches launch failures,
the stalled-worker scan catches workers that die mid-run.
CTL-511: the same orchestrate-healthcheck script also runs inside the reactive scan (Phase 4)
on every wake-up and every 10-minute idle tick — it is no longer a once-per-wave-only check. That
second call site catches phase agents that die after launch (printing a job id, then exiting
before their terminal emit), bounding detection to one scan interval. The healthcheck is idempotent
— terminal and stalled signals are skipped — so the extra call site adds no duplicate work.
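As a rough illustration of consuming the healthcheck output, the sketch below parses the documented JSON summary shape and shows one common way to test PID liveness. The function names are hypothetical, and kill -0 is an assumption about how a liveness probe could work, not a statement about the script's internals.

```shell
#!/usr/bin/env bash
# pid_is_alive: true when a process with this PID exists and can be signalled
# by the current user (kill -0 sends no signal, it only checks).
pid_is_alive() {
  kill -0 "$1" 2>/dev/null
}

# dead_tickets: read a healthcheck summary on stdin, print one ticket per line.
dead_tickets() {
  jq -r '.deadTickets[]?'
}

# Sample summary in the documented shape {"checked":N,"dead":M,"deadTickets":[...]}.
SUMMARY='{"checked":3,"dead":1,"deadTickets":["PROJ-102"]}'
for TICKET in $(printf '%s' "$SUMMARY" | dead_tickets); do
  echo "launch failure: ${TICKET} (candidate for redispatch)"
done

pid_is_alive "$$" && echo "current shell is alive"
```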
Worker dispatch prompt includes mandatory testing AND lifecycle requirements:
MANDATORY: Before completing your contract:
1. TDD — write failing tests BEFORE implementation for every feature
2. Unit tests — required for all new functions/methods
3. Integration/API tests — required for every new/modified API endpoint
4. Security review — must pass /security-review or equivalent
5. Code review — must pass code-reviewer agent
6. All quality gates in config must pass
Your success contract ENDS at (CTL-252):
✓ PR open (gh pr create succeeded)
✓ Active listen loop resolved all blockers (CI, bot reviews, BEHIND) inline
✓ PR merged with gh pr merge --squash --delete-branch (no --auto)
✓ Optional deployment verified (if catalyst.deploy configured)
✓ pr.mergedAt, deployment.url (if applicable), status="done" written to signal file
✓ Worker process exits cleanly
Listen for PR events using this precedence ladder (matches oneshot Phase 5):
1. PREFERRED — Broker auto-correlation (Pattern 3, lowest cost):
The catalyst-broker daemon classifies events between Claude turns and
emits filter.wake.${CATALYST_SESSION_ID} only on semantic matches.
The agent.checkin event emitted by catalyst-session.sh start already
identifies you to the broker; emit a second agent.checkin carrying
claimed_pr after gh pr create (the oneshot helper broker_claim_pr does
exactly this). The broker auto-derives a pr_lifecycle interest covering
CI, reviews, comments, thread resolution, merge, BEHIND pushes, and
deployment status — all on a single wake event. Wait on:
catalyst-events wait-for \
--filter ".attributes.\"event.name\" == \"filter.wake.${CATALYST_SESSION_ID}\"" \
--timeout 600
Deterministic routes (pr_lifecycle / ticket_lifecycle / comms_lifecycle)
cost zero LLM tokens. See plugins/dev/skills/broker/SKILL.md for the
full protocol and plugins/dev/skills/catalyst-filter/SKILL.md for
prose-based registration when the deterministic routes aren't enough.
2. FALLBACK — catalyst-events wait-for with explicit jq (Pattern 2):
When catalyst-broker status returns non-running, use the two-phase
pattern from plugins/dev/skills/wait-for-github/SKILL.md. Blocking
subprocess call; works reliably in claude -p non-interactive sessions.
One jq predicate covers CI / reviews / push / merge for your PR.
3. FORBIDDEN for workers — Monitor over catalyst-events tail (Pattern 1):
That is the orchestrator's Phase 4 pattern, not a worker pattern.
A short-lived claude -p worker has no long-lived turn loop to consume
Monitor notifications, and tail filters wake on every matching line,
burning context.
NEVER use gh pr view --json in any loop — that burns GraphQL rate limits.
Use gh api REST endpoints only for the authoritative state check that
follows every wake.
WAKE NARRATION (MANDATORY, CTL-369): every Monitor wake and every wait-for
return must be acknowledged with a single short line of assistant text before
returning to the wait. A thinking-only end_turn after a Monitor wake makes the
harness's next <task-notification> XML leak as a confusing "Human:\n<task-id>"
phantom user message. The line shape is:
wake: <event.name> #<pr> [interest=<type>] — <action being taken>
wake: <event.name> — routine, staying in event loop
wake: <event.name> — already addressed, no-op
Include the matched filter clause / interest_id when the wake is from the
broker (.body.payload.interest_id, .body.payload.reason). See
plugins/dev/skills/monitor-events/SKILL.md § Narration for the full rule.
When you write a filter by hand, target the canonical event names listed in
[[event-name-allowlist]] — anything outside that allowlist is either
non-actionable or covered by a different lifecycle group. Do NOT use
`.attributes."catalyst.orchestrator.id"` as a bare clause: CTL-234 stamps it
on every github webhook tied to one of the orchestrator's PRs, so without an
event-type guard it wakes the worker on 60-70% of unrelated webhooks. See
[[wait-for-github]] § Known filter pitfalls.
Write these fields into your signal file as they become available:
pr.number
pr.url
pr.prOpenedAt (ISO timestamp when gh pr create returned)
pr.mergedAt (ISO timestamp when merge confirmed via REST)
pr.mergeCommitSha (from REST response after merge)
pr.ciStatus (pending → merged)
deployment.url (from deployment_status event, if configured)
status (pr-created → done)
On unrecoverable blockers (human changes-requested, DIRTY after attempts,
CI blocked after 3 fix attempts), write status="stalled" with details and
post a `comms attention` message — the orchestrator's Phase 4 dispatches
remediation for stalled workers.
Status transitions you do NOT write (orchestrator-owned fallback only):
done (written by orchestrator Phase 4 ONLY when worker stalled before merge)
COMMS DISCIPLINE: when posting to the shared comms channel, follow the rules in the
catalyst-comms skill (plugins/dev/skills/catalyst-comms/SKILL.md § Posting Discipline):
- info = phase transitions + PR-opened only (default heartbeat, ~5-7 per session)
- attention = orchestrator action required (0-2 per session, MANDATORY on: scope
conflict, missing access, ambiguous spec, 3+ repeated CI failures, status=stalled)
- done = exactly 1, only via the `done` subcommand at terminal success
- never use attention as a heartbeat — it triggers the orchestrator's NEEDS ATTENTION
banner
Write your status to the worker signal file at:
${ORCH_DIR}/workers/${TICKET_ID}.json
Update the signal file at each phase transition using the worker-signal.json schema.
Initialize worker signal file (orchestrator writes the initial state):
{
"ticket": "PROJ-101",
"orchestrator": "<name>",
"workerName": "<orch-name>-PROJ-101",
"label": "oneshot PROJ-101",
"status": "dispatched",
"phase": 0,
"startedAt": "<ISO timestamp>",
"updatedAt": "<ISO timestamp>",
"worktreePath": "<path>",
"pr": null,
"linearState": null,
"definitionOfDone": {
"testsWrittenFirst": false,
"unitTests": { "exists": false, "count": 0 },
"apiTests": { "exists": false, "count": 0 },
"functionalTests": { "exists": false, "count": 0 },
"typeCheck": { "passed": false },
"securityReview": { "passed": false },
"codeReview": { "passed": false },
"rewardHackingScan": { "passed": false }
}
}
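One way the initial state above could be generated and later updated is sketched here. The function names init_signal and update_signal are hypothetical, the definitionOfDone block is omitted for brevity, and the temp-file-plus-rename step mirrors the pattern the merge confirmation scan uses for signal writes.

```shell
#!/usr/bin/env bash
# init_signal: print an initial worker signal (a subset of the fields above).
init_signal() {
  local ticket="$1" orch="$2" worktree="$3" now
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  jq -n --arg t "$ticket" --arg o "$orch" --arg w "$worktree" --arg ts "$now" '{
    ticket: $t,
    orchestrator: $o,
    workerName: "\($o)-\($t)",
    label: "oneshot \($t)",
    status: "dispatched",
    phase: 0,
    startedAt: $ts,
    updatedAt: $ts,
    worktreePath: $w,
    pr: null,
    linearState: null
  }'
}

# update_signal: apply a jq filter atomically (temp file + rename) so a reader
# never observes a half-written signal file. Also refreshes updatedAt.
update_signal() {
  local file="$1" filter="$2" now
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  jq --arg ts "$now" "($filter) | .updatedAt = \$ts" "$file" > "$file.tmp" \
    && mv "$file.tmp" "$file"
}

SIGNAL="$(mktemp)"
init_signal "PROJ-101" "demo-orch" "/tmp/wt/demo-orch-PROJ-101" > "$SIGNAL"
update_signal "$SIGNAL" '.status = "researching" | .phase = 1'
jq -r '"\(.workerName) \(.status) phase=\(.phase)"' "$SIGNAL"
```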
if [[ -n "${CATALYST_SESSION_ID:-}" && -x "$SESSION_SCRIPT" ]]; then
"$SESSION_SCRIPT" phase "$CATALYST_SESSION_ID" "monitoring" --phase 4
fi
ACTIVE_COUNT=$(jq -rs '[.[] | select(.status != "done" and .status != "failed")] | length' \
"${ORCH_DIR}/workers/"*.json 2>/dev/null || echo 0)
TOTAL_COUNT=$(jq -rs 'length' "${ORCH_DIR}/workers/"*.json 2>/dev/null || echo 0)
[[ -x "$ORCH_STATUS_SCRIPT" ]] && "$ORCH_STATUS_SCRIPT" emit \
--orch "${ORCH_NAME}" --phase monitoring --wave "${CURRENT_WAVE:-1}" \
--active "$ACTIVE_COUNT" --total "$TOTAL_COUNT" \
--summary "wave ${CURRENT_WAVE:-1} monitoring (${ACTIVE_COUNT}/${TOTAL_COUNT} active)" 2>/dev/null || true
Re-register with catalyst-broker daemon (CTL-257, CTL-303, CTL-357, CTL-452, CTL-491): Phase 3
already invoked orchestrate-register-interests.sh BEFORE worker dispatch (the CTL-491 race-fix
hoist). This Phase 4 entry calls the same helper again as a belt-and-suspenders measure; registration is
idempotent at the broker (upserts by interest_id), and a second invocation ensures the four
interests stay fresh even if the broker restarted between Phase 3 and Phase 4.
The helper emits four deterministic interests (route without Groq):
- pr_lifecycle: orchestrator-level aggregation across all worker PRs.
- ticket_lifecycle: Linear ticket state changes for the orchestrator's tickets.
- comms_lifecycle: worker-posted attention / done messages on the shared channel.
- phase_lifecycle (CTL-452, CTL-484, CTL-512): phase.<name>.complete.<TICKET>,
  phase.<name>.failed.<TICKET>, phase.<name>.turn-cap-exhausted.<TICKET>, and
  phase.<name>.skipped.<TICKET> events emitted by phase agents. One interest per active ticket,
  covering all 9 phase names. The turn-cap status routes through orchestrate-revive's continuation
  branch (separate budget from error revives). The skipped status (CTL-512, monitor-deploy
  terminal-no-deploy) routes the same as complete via orchestrate-phase-advance; the advance no-ops
  on monitor-deploy, so the wake's purpose is to free the wave slot via the scheduler's in-flight
  predicate. Only registered when catalyst.orchestration.dispatchMode = "phase-agents".

CTL-357 retired the Groq prose interest (~95% false-positive rate). All four interests share the
same notify_event: "filter.wake.${ORCH_NAME}", so the orchestrator's wait-for filter does not
change.
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-register-interests.sh" \
--orch-dir "${ORCH_DIR}" \
--orch-id "${ORCH_NAME}" \
--config "${REPO_ROOT:-$(git rev-parse --show-toplevel 2>/dev/null || echo .)}/.catalyst/config.json"
# Helper internals: see plugins/dev/scripts/orchestrate-register-interests.sh
# for the exact wire format of each filter.register event. Soft-fails (exit 0)
# when neither catalyst-broker nor catalyst-filter is running.
Replay race-window phase events (CTL-491): Before entering the event-driven wait loop, catch up
on any phase.<name>.{complete,failed,turn-cap-exhausted,skipped}.<TICKET> events that landed
between the Phase 2 baseline (state.json.race.startLineCursor) and now. With the Phase 3
pre-dispatch registration the window should be zero-length in steady state, but the replay is
idempotent and cheap — it scans only the tail of the current month's event log past the baseline
cursor and routes each match through orchestrate-phase-advance (for complete / skipped) or
orchestrate-revive (for failed / turn-cap-exhausted). Cross-orchestrator events are filtered out by checking
workers/*.json for the ticket.
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-replay-phase-events.sh" \
--orch-dir "${ORCH_DIR}" \
--orch-id "${ORCH_NAME}" || true
# Soft-fail with || true so a malformed event log or missing baseline can't
# stall the orchestrator; the live Monitor/wait-for path picks up new events
# regardless. The helper exits 1 only when state.json.race is missing.
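The replay routing can be sketched as follows. This is an illustration under stated assumptions, not the helper's implementation: the log is assumed to be JSON lines carrying .attributes."event.name", the cursor is assumed to be a line count, and the function names are hypothetical. The cross-orchestrator ownership check against workers/*.json is left out.

```shell
#!/usr/bin/env bash
# route_phase_event: map a phase event name to the helper that would handle it.
# Prints "<route> <phase> <ticket>", or returns 1 for non-phase events.
route_phase_event() {
  local name="$1" prefix phase status ticket
  # Split phase.<name>.<status>.<TICKET> on dots; ticket takes the remainder.
  IFS=. read -r prefix phase status ticket <<< "$name"
  [ "$prefix" = "phase" ] && [ -n "$ticket" ] || return 1
  case "$status" in
    complete|skipped)          echo "advance $phase $ticket" ;;
    failed|turn-cap-exhausted) echo "revive $phase $ticket" ;;
    *)                         return 1 ;;
  esac
}

# replay_from: scan a JSONL event log past a line cursor and route each match.
replay_from() {
  local log="$1" cursor="$2"
  tail -n "+$((cursor + 1))" "$log" \
    | jq -r '.attributes."event.name" // empty' \
    | while IFS= read -r name; do
        route_phase_event "$name" || true
      done
}

LOG="$(mktemp)"
cat > "$LOG" <<'EOF'
{"attributes":{"event.name":"phase.triage.complete.PROJ-100"}}
{"attributes":{"event.name":"github.push"}}
{"attributes":{"event.name":"phase.triage.complete.PROJ-101"}}
{"attributes":{"event.name":"phase.monitor-deploy.skipped.PROJ-102"}}
{"attributes":{"event.name":"phase.triage.failed.PROJ-103"}}
EOF
replay_from "$LOG" 1   # cursor=1: the first line predates the baseline and is skipped
```

Routing on the status segment alone keeps the replay safe to repeat, provided the downstream advance and revive steps are themselves idempotent, as the text says they are.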
Phase 4 is event-driven, not poll-driven (CTL-210, CTL-243). The orchestrator subscribes to the
unified event log via catalyst-events tail (wrapped in the Monitor tool) and wakes on every
relevant GitHub / Linear / orchestrator-lifecycle event. A 10-minute idle timer is the safety-net
fallback for daemon-down or missed-event scenarios — never the primary mechanism. Do NOT self-pace
with sleeps or "wake in N minutes" framing — that defeats the event-driven contract and burns
context to no purpose. See plugins/dev/skills/monitor-events/SKILL.md for the full pattern.
Launch the Monitor before entering the reactive scan. Wrap this command with the Monitor tool
— each emitted line is a wake-up.
The recommended filter is scope-aware: build it from the orchestrator's worker signal directory
using catalyst-events build-orchestrator-filter, then pass the result verbatim as --filter:
FILTER=$(catalyst-events build-orchestrator-filter "$ORCH_DIR")
catalyst-events tail --filter "$FILTER"
Use this command EXACTLY as shown — do NOT improvise a | grep … or | jq … post-pipe (see "Filter
discipline" below for why, including the specific | grep -v 'filter.wake' anti-pattern).
catalyst-filter alternative (CTL-257): When the filter daemon is running, you can replace the
broad catalyst-events tail with a targeted catalyst-events wait-for on
filter.wake.${ORCH_NAME}. The daemon batches raw events through Groq, classifies which are relevant
to this orchestrator's context, and emits a single wake event — so the reactive scan runs only on
semantically meaningful events rather than on every raw webhook. The 10-minute timeout acts as the
safety-net fallback for daemon-down scenarios.
WAKE_EVENT=$(catalyst-events wait-for \
--filter ".attributes.\"event.name\" == \"filter.wake.${ORCH_NAME}\"" \
--timeout 600 2>/dev/null || true)
if [ -n "$WAKE_EVENT" ]; then
WAKE_REASON=$(echo "$WAKE_EVENT" | jq -r '.body.payload.reason // "unknown"' 2>/dev/null || echo "unknown")
echo "[Phase 4] Filter wake: ${WAKE_REASON}"
fi
# Always follow with an authoritative REST re-check — the reason string is informational only.
Use WAKE_REASON for logging only; the reactive scan below reads authoritative state from
gh pr view, git rev-list, and the signal file regardless of how the wake was triggered.
The helper reads ${ORCH_DIR}/workers/*.json and emits a single jq predicate that matches
catalyst-origin events for this orchestrator, worker lifecycle events for any in-orch ticket, github
events scoped by branch-ref prefix (refs/heads/<orch>-...) or PR-number set,
check_suite/workflow_run events whose detail.prNumbers intersect the PR set, and linear events
for any in-orch ticket. Re-build it after orchestrate-dispatch-next adds new workers so the
PR/ticket sets stay in sync.
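To make the PR-number-set scoping concrete, here is a small sketch that derives a PR set from signal files and applies it to sample events with plain jq. It is not the build-orchestrator-filter implementation; the function names are hypothetical, and it assumes .attributes."vcs.pr.number" is numeric.

```shell
#!/usr/bin/env bash
# pr_set: collect PR numbers from worker signal files into a JSON array.
pr_set() {
  jq -s -c '[.[] | .pr.number? // empty] | unique' "$@"
}

# matches_pr_set: print names of events whose vcs.pr.number is in the set.
matches_pr_set() {
  local prs="$1"
  jq -r --argjson prs "$prs" \
    'select(.attributes."vcs.pr.number" as $n | any($prs[]; . == $n))
     | .attributes."event.name"'
}

DIR="$(mktemp -d)"
echo '{"ticket":"PROJ-101","pr":{"number":416}}' > "$DIR/PROJ-101.json"
echo '{"ticket":"PROJ-102","pr":null}'           > "$DIR/PROJ-102.json"
PRS="$(pr_set "$DIR"/*.json)"
echo "PR set: $PRS"

# Only the event for PR 416 belongs to this orchestrator's set.
printf '%s\n' \
  '{"attributes":{"event.name":"github.pr.merged","vcs.pr.number":416}}' \
  '{"attributes":{"event.name":"github.pr.merged","vcs.pr.number":999}}' \
  | matches_pr_set "$PRS"
```

Since the set is computed from the signal files at build time, it goes stale as soon as a new worker opens a PR, which is why the text asks for a rebuild after orchestrate-dispatch-next.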
Filter discipline (CTL-240, CTL-372). All noise filtering belongs inside --filter. Do NOT
pipe catalyst-events tail through a downstream awk/sed/grep/jq stage for additional
filtering or projection. The primary reason is clarity — --filter is the single place a reader can
look to see what reaches the consumer. The secondary reason is buffering: BSD awk (and unflagged
grep/sed) buffer stdout in 4 KB blocks when stdout is not a TTY, and with the typical ~1–3
events/min orchestrator cadence the buffer never fills and notifications stall silently for 15+
minutes despite live PR activity. grep --line-buffered and jq --unbuffered DO line-flush
mechanically (per their macOS/Linux man pages), but you should still not need either flag because
filtering belongs in --filter.
Anti-pattern (CTL-372): … | grep -v '"event.name":"filter.wake"'. This was observed in a real
orchestrator session and is wrong for two reasons: (a) filter.wake.* events are emitted by the
broker as canonical OTel envelopes with no top-level .event, .orchestrator, or .scope field.
build-orchestrator-filter's predicate reads only v1 paths, so canonical filter.wake.* events
never satisfy any clause and never reach the consumer in the first place. (b) The grep pattern would
also strip this orchestrator's OWN intended filter.wake.${ORCH_NAME} wake — the very event
registered with the broker. Since CTL-346 the broker no longer re-classifies its own emissions, so
there is no feedback loop to defend against on the consumer side either. If you find yourself
wanting to remove filter.wake noise from tail output, the answer is to use
build-orchestrator-filter (which already excludes them) and avoid any hand-rolled --filter that
adds a .attributes."catalyst.orchestrator.id" clause without an event-type guard.
GitHub event schema (CTL-240). github.* webhook events carry orchestrator: null and
worker: null on every line — they are scoped only by .attributes."vcs.repository.name",
.attributes."vcs.ref.name", .attributes."vcs.pr.number", .attributes."vcs.revision", and (for
check_suite / workflow_run) body.payload.prNumbers. Predicates that try to scope github events
by .attributes."catalyst.orchestrator.id" == "<orch>" will silently drop every github event.
build-orchestrator-filter handles this correctly — prefer it over hand-rolled filters.
If you need to write a filter by hand (e.g. for one-off wait-for calls), the broad event-type
recommendation is:
catalyst-events tail --filter '
(.attributes."event.name" | startswith("github.pr.")) or
(.attributes."event.name" | startswith("github.pr_review")) or
(.attributes."event.name" | startswith("github.issue_comment")) or
(.attributes."event.name" | startswith("github.check_")) or
(.attributes."event.name" | startswith("github.workflow_run")) or
(.attributes."event.name" | startswith("github.deployment")) or
(.attributes."event.name" == "github.push") or
(.attributes."event.name" | startswith("linear.issue.")) or
(.attributes."event.name" == "orchestrator.worker.phase_advanced") or
(.attributes."event.name" == "orchestrator.worker.status_terminal") or
(.attributes."event.name" == "orchestrator.worker.pr_created") or
(.attributes."event.name" == "orchestrator.worker.done") or
(.attributes."event.name" == "orchestrator.worker.failed") or
(.attributes."event.name" == "orchestrator.attention.raised") or
(.attributes."event.name" == "orchestrator.attention.resolved")
'
This list extends the pre-CTL-240 recommendation with pr_review_comment (Codex review threads land
here — needed for CTL-64 BLOCKED auto-fixup detection), issue_comment (general PR comments), and
workflow_run (the most reliable CI-done signal). The broad form has no scope filter, so events
from sibling orchestrators sharing the repo will also fire wake-ups; prefer
build-orchestrator-filter when you have an $ORCH_DIR to draw on.
Orchestrator-scoped filtering (CTL-234):
github.* events now carry .attributes."catalyst.orchestrator.id" (stamped at receive time by the webhook handler). You may safely add (.attributes."catalyst.orchestrator.id" == "${ORCH_NAME}") and (...) to narrow the filter to this run's PRs only.
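A scoped predicate of that shape can be tried against sample events with plain jq, as in the sketch below. The event-type guard is a shortened version of the broad list, the sample events are invented for illustration, and passing the orchestrator name via jq --arg is an assumption of this sketch; with catalyst-events the name would be written into the filter string itself.

```shell
#!/usr/bin/env bash
# Predicate: scope clause AND an event-type guard, never the scope clause alone.
SCOPED_FILTER='
  (.attributes."catalyst.orchestrator.id" == $orch) and (
    (.attributes."event.name" | startswith("github.pr.")) or
    (.attributes."event.name" | startswith("github.check_")) or
    (.attributes."event.name" == "github.push")
  )'

# scoped_events: print the names of events on stdin that satisfy SCOPED_FILTER.
scoped_events() {
  jq -r --arg orch "$1" "select($SCOPED_FILTER) | .attributes.\"event.name\""
}

# First event passes; second fails the type guard; third fails the scope clause.
printf '%s\n' \
  '{"attributes":{"event.name":"github.pr.merged","catalyst.orchestrator.id":"demo-orch"}}' \
  '{"attributes":{"event.name":"github.issue_comment.created","catalyst.orchestrator.id":"demo-orch"}}' \
  '{"attributes":{"event.name":"github.push","catalyst.orchestrator.id":"other-orch"}}' \
  | scoped_events "demo-orch"
```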
Wake narration (MANDATORY, CTL-369). Every Monitor wake — including ones classified as routine
or already-addressed — must produce a single short line of assistant text before returning to the
wait. The Claude Code harness wraps each Monitor stdout line in a <task-notification> XML user
message; if the orchestrator's response is end_turn with only thinking blocks and no text
content, the UI renders the next <task-notification>'s <task-id> as a phantom
Human:\n<task-id> line in the transcript. The narration line defeats that artifact and gives the
operator reading the transcript later a record of what fired and what was decided.
Line shape (pick one; keep it under ~120 characters):
wake: <event.name> #<pr> [interest=<type>] — <action being taken>
wake: <event.name> — routine, staying in event loop
wake: <event.name> — already addressed, no-op
wake: idle-timeout — running periodic reconciliation scan
Surface the matched interest when the wake came from the broker (filter.wake.${ORCH_NAME}): include
.body.payload.interest_id (or its type: pr_lifecycle / ticket_lifecycle / comms_lifecycle)
and a one-clause restatement of .body.payload.reason. For broad-form catalyst-events tail wakes,
surface the raw event.name and the PR/ticket scope instead.
See plugins/dev/skills/monitor-events/SKILL.md § Narration for the full rule and the good-vs-bad
transcript fixture.
Wake-up classification. When a line arrives on the Monitor, classify it before re-entering the
scan so the response stays proportional. Every reaction reads authoritative state from gh pr view,
git rev-list, or the signal file — events are wake-up triggers, never sources of truth.
| Event | Reaction |
| ----------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| orchestrator.worker.phase_advanced | Routine in-flight progress; re-render DASHBOARD.md. Coalesced — .body.payload.changes carries the batch (CTL-229) |
| orchestrator.worker.status_terminal, orchestrator.worker.done, orchestrator.worker.failed | Terminal transition; re-render DASHBOARD.md and run orchestrate-dispatch-next to fill freed slots. PR-bearing transitions carry .body.payload.pr.{number,url} (CTL-229) |
| orchestrator.worker.pr_created | Reconcile the PR number into signal/state; re-render DASHBOARD.md |
| orchestrator.attention.raised, orchestrator.attention.resolved | Re-render the DASHBOARD.md NEEDS ATTENTION banner |
| github.pr.merged, github.pr.closed | Run the merge-confirmation scan for that PR |
| github.pr.synchronize, github.push | Re-evaluate mergeStateStatus for the affected PR; if DIRTY ≥2 min, orchestrate-auto-rebase may dispatch a rebase worker (BEHIND auto-resolves via auto-merge) |
| github.check_*, github.workflow_run.completed | Re-check CI; if BLOCKED ≥10 min, orchestrate-auto-fixup may dispatch a fix-up. workflow_run.completed is the most reliable CI-done signal |
| github.pr_review*, github.pr_review_comment*, github.issue_comment.created | Re-evaluate mergeStateStatus; surface review activity on the dashboard. Codex review threads land as pr_review_comment.created — required for CTL-64 BLOCKED auto-fixup detection |
| github.deployment* | Record deploy outcome on the worker's signal file |
| linear.issue.state_changed | Reconcile Linear state with the worker signal |
| filter.wake.${ORCH_NAME} (matched on full dotted event.name) | Daemon-filtered semantic wake: read .body.payload.reason for log context, then run the full reactive scan. The reason describes what triggered the daemon (e.g., "CI failed on PR #416") but is never the authoritative source |
| phase.<name>.complete.<TICKET> (via phase_lifecycle, CTL-452) | Resolve the next phase via orchestrate-phase-advance --ticket <T> --completed-phase <name>; that helper looks up the next phase in the canonical 9-phase sequence and calls orchestrate-dispatch-next --phase <next> --ticket <T>. If completed-phase=monitor-deploy, no advance (terminal). The advance is idempotent under redundant wakes |
| phase.<name>.failed.<TICKET> (via phase_lifecycle, CTL-452) | Run orchestrate-revive once for the affected ticket; on the second failure (reviveCount ≥ MAX_REVIVES), mark worker stalled and post attention to the shared comms channel. Matches the existing one-retry-then-escalate handling for legacy oneshot workers |
| phase.<name>.turn-cap-exhausted.<TICKET> (via phase_lifecycle, CTL-484) | Run orchestrate-revive (same script — its continuation branch handles this status). The branch reads handoffPath from the per-phase signal, dispatches a claude --bg --resume continuation with CATALYST_IS_CONTINUATION=true + CATALYST_HANDOFF_PATH + CATALYST_CONTINUATION_COUNT, and bumps .continuationCount on a budget separate from .reviveCount (default 3). On budget exhaustion: stalled + attentionReason="continuation-budget-exhausted" |
| phase.<name>.skipped.<TICKET> (via phase_lifecycle, CTL-512) | Same as complete: resolve via orchestrate-phase-advance --ticket <T> --completed-phase <name>. Only emitted by phase-monitor-deploy when no deployment_status event arrived before PHASE_DEPLOY_TIMEOUT_SEC. Because completed-phase=monitor-deploy, the advance no-ops (terminal); the wake's purpose is to free the wave slot via the scheduler's in-flight predicate (scheduler.mjs:isTicketInFlight) |
| 10-minute idle (no event) | Run the full reactive scan as a safety net |
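The table can be read as a dispatcher keyed on the event name. The sketch below covers a subset of rows; the reaction labels are hypothetical shorthand, and rows not listed here (reviews, deployments, Linear changes) fall through to the default branch only to keep the example short.

```shell
#!/usr/bin/env bash
# classify_wake: map an event name to a reaction label.
# Labels are hypothetical shorthand for rows of the table above.
classify_wake() {
  case "$1" in
    orchestrator.worker.phase_advanced)
      echo "render-dashboard" ;;
    orchestrator.worker.status_terminal|orchestrator.worker.done|orchestrator.worker.failed)
      echo "render-and-dispatch-next" ;;
    github.pr.merged|github.pr.closed)
      echo "merge-confirmation-scan" ;;
    github.check_*|github.workflow_run.completed)
      echo "recheck-ci" ;;
    phase.*.complete.*|phase.*.skipped.*)
      echo "phase-advance" ;;
    phase.*.failed.*|phase.*.turn-cap-exhausted.*)
      echo "revive" ;;
    *)
      # filter.wake.*, the 10-minute idle tick, and anything unlisted here.
      echo "full-reactive-scan" ;;
  esac
}

for EVENT in \
  "orchestrator.worker.done" \
  "github.check_suite.completed" \
  "phase.triage.complete.PROJ-101" \
  "filter.wake.demo-orch"; do
  printf '%-34s -> %s\n' "$EVENT" "$(classify_wake "$EVENT")"
done
```

Whatever branch is taken, the reaction still re-reads authoritative state; the label only decides how much work the wake triggers.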
Ground truth is git + PR, not the signal file. The signal file is advisory — it reports the
worker's self-described phase. Authoritative decisions (done, stalled) come from gh pr view /
gh pr list --head <branch> and git rev-list --count <base>..<branch>. A merged upstream PR on a
worker's branch means the worker is done, regardless of what the signal file says. A worker with a
live upstream PR is not stalled even if its signal file is stale. When the signal disagrees with
git/PR, the orchestrator reconciles the signal from the authoritative source.
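The precedence rule can be condensed into a small decision function. This is a simplified sketch: the signal_stale flag and the stall-candidate label are hypothetical, and real stall detection involves more inputs than are modelled here.

```shell
#!/usr/bin/env bash
# authoritative_status: PR state wins over the worker's self-reported status.
#   $1 pr_state       MERGED | OPEN | NONE (as observed on GitHub)
#   $2 signal_status  status field from the worker signal file
#   $3 signal_stale   yes | no (hypothetical: signal not updated recently)
authoritative_status() {
  local pr_state="$1" signal_status="$2" signal_stale="$3"
  case "$pr_state" in
    MERGED)
      echo "done" ;;                 # merged PR means done, whatever the signal says
    OPEN)
      echo "$signal_status" ;;       # live PR: staleness alone never implies a stall
    *)
      if [ "$signal_stale" = "yes" ]; then
        echo "stall-candidate"       # hypothetical label: defer to the stalled-worker scan
      else
        echo "$signal_status"
      fi ;;
  esac
}

authoritative_status MERGED researching yes
authoritative_status OPEN   pr-created  yes
authoritative_status NONE   researching yes
```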
Reactive scan (per wake-up):
# CTL-511: re-run the phase-agent stall scan on every reactive scan (every wake
# + the 10-min idle fallback), not just once per dispatch wave. A phase agent
# that dies after launch is flipped to status:"stalled" here; orchestrate-healthcheck
# then emits phase.<name>.failed to wake orchestrate-revive within one scan
# interval instead of ~14 h. Idempotent — terminal/stalled signals are skipped,
# so frequent runs are safe. --grace-seconds 0 skips the 15s legacy-PID settle
# sleep: the phase-mode state.json check has its own --stale-bg-seconds
# threshold, so the sleep would only add latency to every wake.
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-healthcheck" \
--orch-dir "${ORCH_DIR}" --orch-id "${ORCH_NAME}" --grace-seconds 0 \
>/dev/null 2>&1 || true
# For each active worker:
for WORKER_SIGNAL in "${ORCH_DIR}/workers/"*.json; do
TICKET=$(jq -r '.ticket' "$WORKER_SIGNAL")
WORKER_DIR="${WORKTREE_BASE}/${ORCH_NAME}-${TICKET}"
# 1. Read worker signal file for self-reported status
STATUS=$(jq -r '.status' "$WORKER_SIGNAL")
# 2. Check git state in worker worktree (skip the worker if the worktree is gone)
cd "$WORKER_DIR" || continue
BRANCH=$(git branch --show-current)
COMMIT_COUNT=$(git rev-list --count "${BASE_BRANCH}..HEAD" 2>/dev/null || echo 0)
# 3. Check for PR
PR_URL=$(gh pr list --head "$BRANCH" --json url --jq '.[0].url' 2>/dev/null || echo "")
# 4. If PR exists, check CI
if [ -n "$PR_URL" ]; then
CI_STATUS=$(gh pr checks "$BRANCH" --json state --jq '.[].state' 2>/dev/null | sort -u)
fi
# 5. Update dashboard
done
Update DASHBOARD.md on each wake-up using the dashboard template — every incoming event
re-renders the file. The orch-monitor daemon file-watches DASHBOARD.md and forwards changes to
connected UI clients via SSE, so per-event writes propagate to operators immediately. Include:
Update state.json with machine-readable state for crash recovery.
Update global state and heartbeat after each wake-up:
# Heartbeat — proves the orchestrator is alive
"$STATE_SCRIPT" heartbeat "${ORCH_NAME}"
# Sync each worker's status from signal file to global state
for WORKER_SIGNAL in "${ORCH_DIR}/workers/"*.json; do
TICKET=$(jq -r '.ticket' "$WORKER_SIGNAL")
W_STATUS=$(jq -r '.status' "$WORKER_SIGNAL")
W_PHASE=$(jq -r '.phase' "$WORKER_SIGNAL")
W_BRANCH=$(git -C "${WORKTREE_BASE}/${ORCH_NAME}-${TICKET}" branch --show-current 2>/dev/null || echo "")
W_PR=$(jq -c '.pr // null' "$WORKER_SIGNAL")
"$STATE_SCRIPT" worker "${ORCH_NAME}" "${TICKET}" \
".status = \"${W_STATUS}\" | .phase = ${W_PHASE} | .branch = \"${W_BRANCH}\" | .pr = ${W_PR}"
done
# Re-render DASHBOARD.md so the file artifact reflects current state (CTL-230).
# `--roll-usage` first iterates ${ORCH_DIR}/workers/*.json and invokes
# orchestrate-roll-usage.sh -v per worker (CTL-115, CTL-233, CTL-487) — bounded
# and idempotent (no-op when signal.cost is already populated) and writes the
# audit trail to ${ORCH_DIR}/.roll-usage.log. Then renders the dashboard so the
# refreshed signal.cost values appear in the same pass. Same inputs produce a
# byte-identical file. The orch-monitor daemon file-watches DASHBOARD.md and
# forwards changes to UI clients via SSE.
"${CLAUDE_PLUGIN_ROOT}/scripts/update-dashboard.sh" \
--orch "${ORCH_NAME}" --orch-dir "${ORCH_DIR}" --roll-usage \
>/dev/null 2>>"${ORCH_DIR}/.update-dashboard.log" || true
Merge confirmation fallback (CTL-31, refined by CTL-80, CTL-133, CTL-243, CTL-252):
Workers now exit at status: "done" after actively merging their own PR (CTL-252). The
orchestrator's merge confirmation scan is a safety-net fallback for workers that stalled or
crashed before completing their own merge. Every Monitor wake-up triggered by github.pr.merged,
github.pr.closed, github.push, or github.check_suite.completed runs this scan, and the
10-minute idle fallback re-runs it so daemon-down windows do not block indefinitely. For each worker
whose signal shows pr.number but not yet pr.mergedAt, ping GitHub directly:
for WORKER_SIGNAL in "${ORCH_DIR}/workers/"*.json; do
TICKET=$(jq -r '.ticket' "$WORKER_SIGNAL")
W_STATUS=$(jq -r '.status' "$WORKER_SIGNAL")
PR_NUMBER=$(jq -r '.pr.number // empty' "$WORKER_SIGNAL")
PR_URL=$(jq -r '.pr.url // empty' "$WORKER_SIGNAL")
MERGED_AT=$(jq -r '.pr.mergedAt // empty' "$WORKER_SIGNAL")
# Skip terminal failure states and already-reconciled merges early.
[ -n "$MERGED_AT" ] && continue
[ "$W_STATUS" = "failed" ] && continue
[ "$W_STATUS" = "stalled" ] && continue
# If the signal does not have a PR number, try to discover one from the worker's
# branch. This catches workers that merged their PR but died before writing
# pr.number to their signal file (the ADV-224 class of failure — CTL-32).
if [ -z "$PR_NUMBER" ]; then
WORKER_DIR="${WORKTREE_BASE}/${ORCH_NAME}-${TICKET}"
BRANCH=$(git -C "$WORKER_DIR" branch --show-current 2>/dev/null || echo "")
[ -z "$BRANCH" ] && continue
REPO_SLUG=$(git -C "$WORKER_DIR" remote get-url origin 2>/dev/null \
| sed -E 's|.*github\.com[:/]([^/]+/[^/.]+)(\.git)?$|\1|')
[ -z "$REPO_SLUG" ] && continue
DISCOVERED=$(gh -R "$REPO_SLUG" pr list \
--head "$BRANCH" --state all \
--json number,state,mergedAt,url --limit 1 2>/dev/null || echo "[]")
PR_NUMBER=$(echo "$DISCOVERED" | jq -r '.[0].number // empty')
PR_URL=$(echo "$DISCOVERED" | jq -r '.[0].url // empty')
[ -z "$PR_NUMBER" ] && continue
# Record the discovery in the signal so future wake-ups take the fast path.
jq --argjson n "$PR_NUMBER" --arg u "$PR_URL" \
'.pr = ((.pr // {}) | .number = ($n | tonumber) | .url = $u)' \
"$WORKER_SIGNAL" > "$WORKER_SIGNAL.tmp" && mv "$WORKER_SIGNAL.tmp" "$WORKER_SIGNAL"
# CTL-341: refresh handled by the unconditional refresh block after this
# loop. The signal file update above (.pr.number = $n) feeds into it.
fi
# Parse repo from PR URL (e.g. https://github.com/org/repo/pull/123)
REPO=$(echo "$PR_URL" | sed -E 's|https://github.com/([^/]+/[^/]+)/pull/.*|\1|')
# Ask GitHub the authoritative question
PR_JSON=$(gh -R "$REPO" pr view "$PR_NUMBER" \
--json state,mergeStateStatus,mergedAt,mergeable,mergedBy,mergeCommit 2>/dev/null || echo '{}')
PR_STATE=$(echo "$PR_JSON" | jq -r '.state // "UNKNOWN"')
MERGE_STATE=$(echo "$PR_JSON" | jq -r '.mergeStateStatus // "UNKNOWN"')
PR_MERGED_AT=$(echo "$PR_JSON" | jq -r '.mergedAt // empty')
MERGE_COMMIT_SHA=$(echo "$PR_JSON" | jq -r '.mergeCommit.oid // empty')
# CTL-211: load per-repo deploy verification config. When
# skipDeployVerification is true (default for repos without GitHub
# Deployments), keep today's behavior — MERGED → done. When false, MERGED →
# merged, then the deploy sub-loop below drives merged → deploying →
# done | deploy-failed via deployment_status events.
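# jq's `//` treats false like null, so `... // true` would turn an explicit
# false into true. Test for null instead so a configured false survives.
SKIP_DEPLOY=$(jq -r --arg repo "$REPO" \
'.catalyst.deploy[$repo].skipDeployVerification | if . == null then true else . end' "$CONFIG_FILE" 2>/dev/null)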
PROD_ENV=$(jq -r --arg repo "$REPO" \
'.catalyst.deploy[$repo].productionEnvironment // "production"' "$CONFIG_FILE" 2>/dev/null)
DEPLOY_TIMEOUT_SEC=$(jq -r --arg repo "$REPO" \
'.catalyst.deploy[$repo].timeoutSec // 1800' "$CONFIG_FILE" 2>/dev/null)
case "$PR_STATE" in
MERGED)
if [ "$SKIP_DEPLOY" != "false" ]; then
# Today's behavior: MERGED → done immediately. The repo doesn't emit
# GitHub Deployments, or deploy verification is opted out per-repo.
TARGET_STATUS="done"
TARGET_PHASE=6
else
# CTL-211: MERGED → merged. The deploy state-machine sub-loop below
# advances it to deploying → done|deploy-failed on the SHA's
# deployment_status events.
TARGET_STATUS="merged"
TARGET_PHASE=5
fi
# Record merge in signal + global state, advance worker to TARGET_STATUS
# PROD_ENV is passed with --arg rather than spliced into the filter, so quotes
# in the value cannot break or alter the jq program.
jq --arg ts "$PR_MERGED_AT" --arg sha "$MERGE_COMMIT_SHA" \
--arg status "$TARGET_STATUS" --argjson phase "$TARGET_PHASE" \
--arg prod_env "$PROD_ENV" \
'.pr.ciStatus = "merged" | .pr.mergedAt = $ts | .status = $status
| (if $status == "done" then .completedAt = $ts | .phaseTimestamps.done = $ts else . end)
| .phase = $phase
| (if $sha != "" then .pr.mergeCommitSha = $sha else . end)
| (if $status == "merged" then .deploy = ((.deploy // {}) | .startedAt = $ts | .environment = $prod_env) else . end)' \
"$WORKER_SIGNAL" > "$WORKER_SIGNAL.tmp" && mv "$WORKER_SIGNAL.tmp" "$WORKER_SIGNAL"
"$STATE_SCRIPT" worker "${ORCH_NAME}" "${TICKET}" \
".status = \"${TARGET_STATUS}\" | .phase = ${TARGET_PHASE} | .pr.ciStatus = \"merged\" | .pr.mergedAt = \"${PR_MERGED_AT}\""
"$STATE_SCRIPT" event "$(jq -nc \
--arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg orch "${ORCH_NAME}" \
--arg w "${TICKET}" --argjson pr "$PR_NUMBER" --arg mt "$PR_MERGED_AT" \
'{ts:$ts, orchestrator:$orch, worker:$w, event:"worker-pr-merged", detail:{pr:$pr, mergedAt:$mt}}')"
# Transition Linear ticket via the shared helper (CTL-69). The helper
# reads stateMap from `.catalyst/config.json`, is idempotent, and
# respects --state-on-merge when the operator wants a non-default
# state (e.g., "Shipped"). Since CTL-252, workers write their own done-
# transition; this runs only as a fallback for stalled/crashed workers.
#
# CTL-211: only transition Linear to done when TARGET_STATUS is "done"
# (skipDeployVerification=true). When TARGET_STATUS is "merged", the
# deploy state-machine sub-loop below transitions Linear after the
# production deployment_status.success arrives.
if [ "$TARGET_STATUS" = "done" ]; then
STATE_ON_MERGE_FLAG=""
if [ -n "${STATE_ON_MERGE:-}" ]; then
STATE_ON_MERGE_FLAG="--state ${STATE_ON_MERGE}"
fi
"${CLAUDE_PLUGIN_ROOT}/scripts/linear-transition.sh" \
--ticket "${TICKET}" \
--transition done \
--config "$CONFIG_FILE" \
${STATE_ON_MERGE_FLAG} >/dev/null 2>&1 || true
fi
# Pull latest main in the primary worktree (CTL-198). Non-fatal.
"${CLAUDE_PLUGIN_ROOT}/scripts/pull-primary-worktree.sh" \
--branch "${BASE_BRANCH:-main}" 2>&1 || true
# Post-merge verification (CTL-130). Run adversarial verification on
# the merged commit. The worker auto-merges independently so verification
# is always post-merge — it surfaces gaps for remediation rather than
# gating merge. Skipped when verifyBeforeMerge is false.
if [ "$VERIFY_BEFORE_MERGE" = "true" ]; then
WORKER_DIR="${WORKTREE_BASE}/${ORCH_NAME}-${TICKET}"
if [ -d "$WORKER_DIR" ]; then
TEST_REQ=$(jq -r --arg scope "$(jq -r '.labels // "" | ascii_downcase' "$WORKER_SIGNAL")" \
'.catalyst.orchestration.testRequirements[$scope] // "backend"' "$CONFIG_FILE" 2>/dev/null || echo "backend")
"$STATE_SCRIPT" event "$(jq -nc \
--arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg orch "${ORCH_NAME}" --arg w "${TICKET}" \
'{ts:$ts, orchestrator:$orch, worker:$w, event:"verification-started", detail:null}')"
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-verify.sh" \
--worktree "$WORKER_DIR" \
--ticket "$TICKET" \
--base-branch "${BASE_BRANCH:-main}" \
--signal-file "$WORKER_SIGNAL" \
--test-requirements "$TEST_REQ"
VERIFY_EXIT=$?
VERIFY_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
VERIFY_RESULT=$([ $VERIFY_EXIT -eq 0 ] && echo "passed" || echo "failed")
jq --arg result "$VERIFY_RESULT" --arg ts "$VERIFY_TS" \
'.postMergeVerification = {result: $result, verifiedAt: $ts, remediationTicket: null}' \
"$WORKER_SIGNAL" > "$WORKER_SIGNAL.tmp" && mv "$WORKER_SIGNAL.tmp" "$WORKER_SIGNAL"
if [ $VERIFY_EXIT -eq 0 ]; then
"$STATE_SCRIPT" event "$(jq -nc \
--arg ts "$VERIFY_TS" --arg orch "${ORCH_NAME}" --arg w "${TICKET}" \
'{ts:$ts, orchestrator:$orch, worker:$w, event:"verification-passed", detail:null}')"
"$STATE_SCRIPT" resolve-attention "${ORCH_NAME}" "${TICKET}"
else
"$STATE_SCRIPT" event "$(jq -nc \
--arg ts "$VERIFY_TS" --arg orch "${ORCH_NAME}" --arg w "${TICKET}" \
'{ts:$ts, orchestrator:$orch, worker:$w, event:"verification-failed", detail:null}')"
"$STATE_SCRIPT" attention "${ORCH_NAME}" "post-merge-verification-failed" "${TICKET}" \
"Post-merge verification found gaps in ${TICKET} — remediation needed"
# Write remediation file for human visibility
cat > "${ORCH_DIR}/workers/${TICKET}-remediation.md" <<REMEDIATION_EOF
# Post-Merge Verification Failed — ${TICKET}
PR #${PR_NUMBER} merged but independent verification found gaps.
Review the verification output above and file a follow-up ticket.
The orchestrator will $([ "$ALLOW_SELF_REPORTED" = "true" ] && echo "advance the wave (advisory mode)" || echo "block wave advancement until remediation is filed").
REMEDIATION_EOF
fi
fi
fi
;;
CLOSED)
# PR was closed without merge — surface for attention
"$STATE_SCRIPT" attention "${ORCH_NAME}" "pr-closed" "${TICKET}" \
"PR #${PR_NUMBER} was closed without merging"
;;
OPEN)
# Not merged yet — this is normal. Stay in the event-driven loop: the next
# github.push (auto-merge rebase or worker fixup), github.check_suite.completed
# (CI flip), or github.pr_review.submitted event will retrigger this scan
# with fresh state. Only raise attention for genuinely stuck states that a
# worker cannot unblock (CLEAN=pass, BLOCKED=review/CI gating, UNSTABLE=CI
# failed, BEHIND=needs rebase, DIRTY=conflicts).
case "$MERGE_STATE" in
DIRTY)
# Out-of-band: orchestrate-auto-rebase runs after this scan and handles
# DIRTY (merge conflicts) by dispatching a rebase worker once the state
# has been stable for ≥2 minutes (CTL-232). Falls back to attention only
# when the rebase budget is exhausted.
;;
BEHIND)
# Often auto-resolves when auto-merge rebases; log only. The next
# github.push event on the PR branch will wake the orchestrator to
# re-evaluate mergeStateStatus.
;;
BLOCKED)
# Out-of-band: orchestrate-auto-fixup runs after this scan and handles
# BLOCKED (unresolved review threads, failing checks, review-required)
# once the state has been stable for ≥10 minutes (CTL-64).
;;
esac
;;
esac
done
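The MERGED branch's choice between done and merged can be read as a small pure function. The sketch below is illustrative only: merge_target is a hypothetical helper name, not one of the skill's scripts. It mirrors the `!= "false"` test in the loop, so any value other than the literal false (including an empty one) keeps the immediate-done behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): map skipDeployVerification
# to the worker's post-merge status and phase, as the MERGED branch does.
merge_target() {
  local skip_deploy="$1"
  if [ "$skip_deploy" != "false" ]; then
    # Deploy verification skipped or unset: MERGED goes straight to done.
    echo "done 6"
  else
    # Deploy verification enabled: hand off to the deploy sub-loop.
    echo "merged 5"
  fi
}

# Split the two-word result into positional parameters.
set -- $(merge_target "false")
echo "status=$1 phase=$2"
```

Keeping the decision in one place like this makes it easy to check that an unset or malformed config value never silently enables deploy verification.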
Refresh broker registration when PR or ticket set has changed (CTL-341, CTL-357, CTL-491). On
every Phase 4 wake-up, re-invoke the same registration helper with --refresh. The helper diffs the
current active PR + ticket sets against the prior .last-registration.json baseline:

- It re-registers the deterministic-set interests (pr_lifecycle, ticket_lifecycle,
  comms_lifecycle) with the updated set.
- It registers a phase_lifecycle interest for each newly-added ticket. Existing tickets already
  have per-ticket interests upserted at the broker from the initial pre-dispatch registration.
- Re-registration would be harmless in any case (the broker upserts by interest_id); the diff
  just keeps the event log quiet.

This block replaces the older .last-pr-registration.json PR-only refresh path: the helper covers
both deterministic-set refreshes AND the previously-missing phase_lifecycle refresh that would
otherwise leave wave 2 tickets without broker coverage (a localized CTL-491 race).
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-register-interests.sh" \
--orch-dir "${ORCH_DIR}" \
--orch-id "${ORCH_NAME}" \
--config "${REPO_ROOT:-$(git rev-parse --show-toplevel 2>/dev/null || echo .)}/.catalyst/config.json" \
--refresh
# In --refresh mode the helper is a no-op when the PR + ticket sets are
# unchanged; otherwise it emits a minimal set of filter.register events to
# bring the broker in sync. See plugins/dev/scripts/orchestrate-register-interests.sh.
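The diff-against-baseline idea can be sketched in a few lines. This is an illustration only and not the logic of orchestrate-register-interests.sh; the function name added_tickets and the sample ticket IDs are hypothetical, and the real baseline lives in .last-registration.json rather than in shell strings.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of diffing ticket sets against a prior baseline.
# Not the real orchestrate-register-interests.sh logic.
added_tickets() {
  # $1 = prior set, $2 = current set (space-separated ticket IDs)
  local prior=" $1 " t
  for t in $2; do
    case "$prior" in
      *" $t "*) ;;        # already covered by the baseline
      *) echo "$t" ;;     # newly added since the last registration
    esac
  done
}

new=$(added_tickets "CTL-1 CTL-2" "CTL-1 CTL-2 CTL-3")
if [ -z "$new" ]; then
  echo "sets unchanged: refresh is a no-op"
else
  echo "newly added: $new"
fi
```

An empty result corresponds to the no-op case described in the comments above; a non-empty one is the set that would need new phase_lifecycle coverage.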
Deploy state-machine sub-loop (CTL-211): runs on each wake-up for any worker in merged,
deploying, or deploy-failed. Wakes on github.deployment* events from the event log; otherwise the
10-minute fallback sweep catches missed events. The authoritative source is
gh api repos/<repo>/deployments and /deployments/<id>/statuses. Events are wake-up triggers only.
for WORKER_SIGNAL in ${ORCH_DIR}/workers/*.json; do
W_STATUS=$(jq -r '.status' "$WORKER_SIGNAL")
case "$W_STATUS" in
merged|deploying|deploy-failed) ;;
*) continue ;;
esac
TICKET=$(jq -r '.ticket' "$WORKER_SIGNAL")
PR_URL=$(jq -r '.pr.url // empty' "$WORKER_SIGNAL")
MERGE_SHA=$(jq -r '.pr.mergeCommitSha // empty' "$WORKER_SIGNAL")
REPO=$(echo "$PR_URL" | sed -E 's|https://github.com/([^/]+/[^/]+)/pull/.*|\1|')
PROD_ENV=$(jq -r --arg repo "$REPO" \
'.catalyst.deploy[$repo].productionEnvironment // "production"' "$CONFIG_FILE")
TIMEOUT_SEC=$(jq -r --arg repo "$REPO" \
'.catalyst.deploy[$repo].timeoutSec // 1800' "$CONFIG_FILE")
STARTED_AT=$(jq -r '.deploy.startedAt // empty' "$WORKER_SIGNAL")
FAILED_ATTEMPTS=$(jq -r '.deploy.failedAttempts // 0' "$WORKER_SIGNAL")
# 1. Hard timeout — escalate via comms.attention, set status=stalled.
if [ -n "$STARTED_AT" ]; then
NOW_EPOCH=$(date -u +%s)
START_EPOCH=$(date -u -j -f "%Y-%m-%dT%H:%M:%SZ" "$STARTED_AT" +%s 2>/dev/null \
|| date -u -d "$STARTED_AT" +%s)
ELAPSED=$((NOW_EPOCH - START_EPOCH))
if [ "$ELAPSED" -gt "$TIMEOUT_SEC" ]; then
jq '.status = "stalled"' "$WORKER_SIGNAL" > "$WORKER_SIGNAL.tmp" \
&& mv "$WORKER_SIGNAL.tmp" "$WORKER_SIGNAL"
"$STATE_SCRIPT" attention "${ORCH_NAME}" "deploy-timeout" "${TICKET}" \
"Deploy verification timed out after ${TIMEOUT_SEC}s for ${REPO}@${MERGE_SHA}"
continue
fi
fi
# 2. Authoritative deploy lookup. Fetch the most recent deployment_status
# for the merge SHA on the production environment.
[ -z "$MERGE_SHA" ] && continue
DEPLOY_JSON=$(gh api -X GET "/repos/${REPO}/deployments" \
-f sha="$MERGE_SHA" -f environment="$PROD_ENV" --jq '.[0] // empty' 2>/dev/null || echo "")
[ -z "$DEPLOY_JSON" ] && continue
DEPLOY_ID=$(echo "$DEPLOY_JSON" | jq -r '.id // empty')
[ -z "$DEPLOY_ID" ] && continue
STATUS_JSON=$(gh api "/repos/${REPO}/deployments/${DEPLOY_ID}/statuses" \
--jq '.[0] // empty' 2>/dev/null || echo "")
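# An empty STATUS_JSON (no status rows yet) is treated as pending; without the
# null fallback jq would print nothing and no case branch would match.
DEPLOY_STATE=$(echo "${STATUS_JSON:-null}" | jq -r '.state // "pending"')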
case "$DEPLOY_STATE" in
success)
jq --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --argjson did "$DEPLOY_ID" \
'.status = "done" | .phase = 6 | .completedAt = $ts | .phaseTimestamps.done = $ts
| .deploy.completedAt = $ts | .deploy.deploymentId = $did | .deploy.result = "success"' \
"$WORKER_SIGNAL" > "$WORKER_SIGNAL.tmp" && mv "$WORKER_SIGNAL.tmp" "$WORKER_SIGNAL"
"$STATE_SCRIPT" worker "${ORCH_NAME}" "${TICKET}" \
".status = \"done\" | .phase = 6"
"$STATE_SCRIPT" event "$(jq -nc --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--arg orch "${ORCH_NAME}" --arg w "${TICKET}" --argjson did "$DEPLOY_ID" \
'{ts:$ts, orchestrator:$orch, worker:$w, event:"worker-deploy-success", detail:{deploymentId:$did}}')"
# Transition Linear → done now that deploy succeeded.
STATE_ON_MERGE_FLAG=""
[ -n "${STATE_ON_MERGE:-}" ] && STATE_ON_MERGE_FLAG="--state ${STATE_ON_MERGE}"
"${CLAUDE_PLUGIN_ROOT}/scripts/linear-transition.sh" \
--ticket "${TICKET}" --transition done --config "$CONFIG_FILE" \
${STATE_ON_MERGE_FLAG} >/dev/null 2>&1 || true
;;
failure|error)
NEW_ATTEMPTS=$((FAILED_ATTEMPTS + 1))
MAX_ATTEMPTS=3
jq --argjson n "$NEW_ATTEMPTS" --argjson did "$DEPLOY_ID" --arg state "$DEPLOY_STATE" \
'.status = "deploy-failed" | .deploy.failedAttempts = $n | .deploy.deploymentId = $did
| .deploy.lastFailureState = $state' \
"$WORKER_SIGNAL" > "$WORKER_SIGNAL.tmp" && mv "$WORKER_SIGNAL.tmp" "$WORKER_SIGNAL"
if [ "$NEW_ATTEMPTS" -ge "$MAX_ATTEMPTS" ]; then
"$STATE_SCRIPT" attention "${ORCH_NAME}" "deploy-budget-exhausted" "${TICKET}" \
"Production deploy ${DEPLOY_STATE} ${NEW_ATTEMPTS}× for ${REPO}@${MERGE_SHA} — manual intervention required"
else
"$STATE_SCRIPT" attention "${ORCH_NAME}" "deploy-failed" "${TICKET}" \
"Production deploy ${DEPLOY_STATE} (attempt ${NEW_ATTEMPTS}/${MAX_ATTEMPTS}) for ${REPO}@${MERGE_SHA}"
fi
"$STATE_SCRIPT" event "$(jq -nc --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--arg orch "${ORCH_NAME}" --arg w "${TICKET}" --arg state "$DEPLOY_STATE" \
--argjson did "$DEPLOY_ID" --argjson att "$NEW_ATTEMPTS" \
'{ts:$ts, orchestrator:$orch, worker:$w, event:"worker-deploy-failed", detail:{deploymentId:$did, state:$state, attempts:$att}}')"
;;
in_progress|pending|queued)
# Advance status only if not already deploying.
if [ "$W_STATUS" = "merged" ]; then
jq --argjson did "$DEPLOY_ID" \
'.status = "deploying" | .deploy.deploymentId = $did' \
"$WORKER_SIGNAL" > "$WORKER_SIGNAL.tmp" && mv "$WORKER_SIGNAL.tmp" "$WORKER_SIGNAL"
"$STATE_SCRIPT" worker "${ORCH_NAME}" "${TICKET}" '.status = "deploying"'
fi
;;
esac
done
The state-machine transition logic is mirrored in
plugins/dev/scripts/orch-monitor/lib/deploy-state-machine.ts as a pure function
(nextDeployState) so the transitions are mechanically verified by unit tests independently of this
bash glue.
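The transitions applied by the loop can also be written as a pure function in bash. The sketch below mirrors the case statement in the sub-loop; it is not a port of nextDeployState from deploy-state-machine.ts, whose signature is not shown here, and the name next_deploy_state is hypothetical. The timeout path (status=stalled) is deliberately left out.

```shell
#!/usr/bin/env bash
# Hypothetical pure transition function mirroring the bash sub-loop.
# $1 = current worker status, $2 = latest deployment status state.
next_deploy_state() {
  local status="$1" deploy_state="$2"
  case "$deploy_state" in
    success) echo "done" ;;
    failure|error) echo "deploy-failed" ;;
    in_progress|pending|queued)
      # Only a freshly merged worker advances; others keep their status.
      if [ "$status" = "merged" ]; then echo "deploying"; else echo "$status"; fi ;;
    *) echo "$status" ;;   # unknown state: no transition
  esac
}

for pair in "merged pending" "deploying success" "deploying failure" "deploy-failed queued"; do
  set -- $pair
  echo "$1 + $2 -> $(next_deploy_state "$1" "$2")"
done
```

Separating the decision from the signal-file writes is what makes the table of transitions testable without touching GitHub or the filesystem.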
Since CTL-252, workers exit at status: "done" after actively merging their own PR — this
orchestrator scan is a safety-net fallback that writes pr.mergedAt + status: "done" for workers
that stalled before completing their own merge.
Drain shared comms channel for attention (CTL-111, CTL-269):
Workers post type:attention messages to ${ORCH_NAME} when blocked. On each wake-up, the
orchestrator drains new messages from the channel and promotes any attention to a state-level
attention item so the dashboard's NEEDS ATTENTION banner surfaces it (with author + reason).
The wake mechanism for comms attention is now unified with the filter daemon (CTL-269): when a
worker posts to comms, catalyst-comms send emits a comms.message.posted event to the unified
event log; the filter daemon matches it against the orchestrator's filter.register prompt (which
now mentions "any of my workers posts a comms message of type attention to me") and emits
filter.wake.${ORCH_NAME}. The orchestrator wakes from that single event and runs this drain step
to act on the attention.
A small cursor file ${ORCH_DIR}/.comms-cursor tracks the line count already processed so repeated
wake-ups don't re-surface the same message. Single-writer (this scan) so no race. The cursor-based
drain remains the action mechanism even though wakes come via filter.wake.
if [ -n "$COMMS_BIN" ]; then
CURSOR_FILE="${ORCH_DIR}/.comms-cursor"
SINCE=$(cat "$CURSOR_FILE" 2>/dev/null || echo 0)
CH_FILE="${HOME}/catalyst/comms/channels/${ORCH_NAME}.jsonl"
# Count lines only when the channel file exists; a missing file means 0 messages.
TOTAL=0
[ -f "$CH_FILE" ] && TOTAL=$(wc -l < "$CH_FILE" | tr -d ' ')
if [ "${TOTAL:-0}" -gt "${SINCE:-0}" ]; then
# `poll` here is the catalyst-comms CLI subcommand name (read since cursor),
# not a poll-loop metaphor — the orchestrator runs this on wake-up only.
"$COMMS_BIN" poll "${ORCH_NAME}" --since "$SINCE" 2>/dev/null | \
while IFS= read -r MSG; do
MSG_TYPE=$(echo "$MSG" | jq -r '.type // ""' 2>/dev/null)
MSG_FROM=$(echo "$MSG" | jq -r '.from // ""' 2>/dev/null)
MSG_BODY=$(echo "$MSG" | jq -r '.body // ""' 2>/dev/null)
if [ "$MSG_TYPE" = "attention" ]; then
# Extract the ticket id from the author name (workers use their TICKET_ID as --as)
MSG_TICKET=$(echo "$MSG_FROM" | grep -oE '^[A-Z]+-[0-9]+' || echo "$MSG_FROM")
"$STATE_SCRIPT" attention "${ORCH_NAME}" "comms-attention" "$MSG_TICKET" \
"[$MSG_FROM] $MSG_BODY" 2>/dev/null || true
fi
done
echo "$TOTAL" > "$CURSOR_FILE"
fi
fi
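The cursor technique used by the drain step is shown in isolation below. This is a sketch under stated assumptions: it reads new lines with tail as a stand-in for catalyst-comms poll --since, and drain_new_lines plus the temporary file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a line-count cursor over an append-only JSONL file.
drain_new_lines() {
  # $1 = channel file, $2 = cursor file; prints only unprocessed lines
  local ch="$1" cursor="$2" since total
  since=$(cat "$cursor" 2>/dev/null || echo 0)
  total=0
  [ -f "$ch" ] && total=$(wc -l < "$ch" | tr -d ' ')
  if [ "$total" -gt "$since" ]; then
    tail -n "+$((since + 1))" "$ch"   # lines after the cursor position
    echo "$total" > "$cursor"         # advance the cursor after reading
  fi
}

tmp=$(mktemp -d)
printf '%s\n' '{"type":"info"}' '{"type":"attention"}' > "$tmp/ch.jsonl"
drain_new_lines "$tmp/ch.jsonl" "$tmp/cursor"    # prints both lines
drain_new_lines "$tmp/ch.jsonl" "$tmp/cursor"    # prints nothing
printf '%s\n' '{"type":"attention","body":"blocked"}' >> "$tmp/ch.jsonl"
drain_new_lines "$tmp/ch.jsonl" "$tmp/cursor"    # prints only the new line
rm -rf "$tmp"
```

Because only one process writes the cursor, a plain overwrite is enough; a second writer would need locking or an atomic rename.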
Detect stalled workers and raise attention:
Before raising stalled, consult git + PR state. A stale signal file is not stall evidence on its
own — if the worker's upstream branch has an OPEN or MERGED PR, the worker is progressing (or
finished) regardless of what the signal file says. Only escalate when no authoritative source shows
activity.
for WORKER_SIGNAL in ${ORCH_DIR}/workers/*.json; do
TICKET=$(jq -r '.ticket' "$WORKER_SIGNAL")
W_STATUS=$(jq -r '.status' "$WORKER_SIGNAL")
UPDATED=$(jq -r '.updatedAt' "$WORKER_SIGNAL")
# If no update in 15+ minutes and not in a terminal state, consider escalating.
if [[ "$W_STATUS" != "done" && "$W_STATUS" != "failed" && "$W_STATUS" != "stalled" ]]; then
STALE_CUTOFF=$(date -u -v-15M +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
|| date -u -d "15 minutes ago" +%Y-%m-%dT%H:%M:%SZ)
if [[ "$UPDATED" < "$STALE_CUTOFF" ]]; then
# Before raising stalled, consult git + PR state. CTL-32: a stale signal
# on its own is not stall evidence when the PR shows real progress.
WORKER_DIR="${WORKTREE_BASE}/${ORCH_NAME}-${TICKET}"
BRANCH=$(git -C "$WORKER_DIR" branch --show-current 2>/dev/null || echo "")
COMMITS_AHEAD=0
HAS_UPSTREAM=0
PR_STATE="NONE"
if [ -n "$BRANCH" ]; then
COMMITS_AHEAD=$(git -C "$WORKER_DIR" rev-list --count \
"${BASE_BRANCH:-main}..HEAD" 2>/dev/null || echo 0)
if git -C "$WORKER_DIR" ls-remote --heads origin "$BRANCH" 2>/dev/null | grep -q .; then
HAS_UPSTREAM=1
fi
REPO_SLUG=$(git -C "$WORKER_DIR" remote get-url origin 2>/dev/null \
| sed -E 's|.*github\.com[:/]([^/]+/[^/.]+)(\.git)?$|\1|')
if [ -n "$REPO_SLUG" ]; then
PR_STATE=$(gh -R "$REPO_SLUG" pr list --head "$BRANCH" --state all \
--json state --jq '.[0].state // "NONE"' 2>/dev/null || echo "NONE")
fi
fi
case "$PR_STATE" in
MERGED|OPEN)
# Worker's PR is the authoritative progress signal. Clear any prior
# stalled attention that an earlier wake-up may have raised on signal
# staleness alone (the merge-confirmation scan will reconcile to done if MERGED).
"$STATE_SCRIPT" resolve-attention "${ORCH_NAME}" "${TICKET}" 2>/dev/null || true
;;
*)
"$STATE_SCRIPT" attention "${ORCH_NAME}" "stalled" "${TICKET}" \
"No progress for 15+ minutes (last update: ${UPDATED}); branch=${BRANCH:-?} commits=${COMMITS_AHEAD} pushed=${HAS_UPSTREAM} pr=${PR_STATE}"
;;
esac
fi
fi
done
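The escalation rule in this scan reduces to two inputs: whether the signal is stale, and what the PR state says. The sketch below isolates that decision; stall_action and its output labels are hypothetical names, and the timestamps are sample values.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the stalled-worker decision.
stall_action() {
  # $1 = signal updatedAt, $2 = staleness cutoff (ISO-8601 UTC), $3 = PR state
  local updated="$1" cutoff="$2" pr_state="$3"
  # Fixed-width UTC timestamps sort lexicographically in chronological order.
  if [ "$updated" \< "$cutoff" ]; then
    case "$pr_state" in
      MERGED|OPEN) echo "resolve-attention" ;;  # PR shows real progress
      *) echo "raise-stalled" ;;                # no authoritative activity
    esac
  else
    echo "fresh"
  fi
}

stall_action "2024-01-01T00:00:00Z" "2024-01-01T00:15:00Z" "NONE"   # raise-stalled
stall_action "2024-01-01T00:00:00Z" "2024-01-01T00:15:00Z" "OPEN"   # resolve-attention
stall_action "2024-01-01T00:20:00Z" "2024-01-01T00:15:00Z" "NONE"   # fresh
```

The string comparison only works because both timestamps use the same zero-padded UTC format; mixed time zones or formats would need conversion to epoch seconds first.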
Auto-revive dead/wedged workers (CTL-63, CTL-62):
After the stalled-worker scan, attempt to resume any dead, heartbeat-stale, or
API-stream-idle-timeout'd worker from its original session_id. Resumed sessions preserve tool-call
history, plan context, and PR state at ~10× lower cost than a fresh redispatch.
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-revive" \
--orch-dir "$ORCH_DIR" \
--orch-id "$ORCH_NAME"
The script checks every non-terminal worker signal and revives when any of:

- kill -0 <pid> fails (CTL-63)
- lastHeartbeat older than 15 minutes (catches zombie-sleep PIDs whose process is alive but
  idle) (CTL-63)
- workers/output/<ticket>-stream.jsonl contains a type=result, is_error=true event whose
  api_error_status or result mentions Stream idle timeout or partial response received, and
  whose uuid differs from the signal's lastApiErrorUuid (CTL-62)

Each successful revive records lastReviveReason (pid-dead / heartbeat-stale /
api-stream-idle-timeout) in the signal file and emits a worker-revived event with the same
reason in its detail. The per-ticket revive budget (default 10) applies across all reasons combined.
Workers whose budget is exhausted or whose session_id cannot be found transition to status=stalled
with an attention item so you can decide between manual intervention and a fresh redispatch. Session
resume uses workers/output/<ticket>-stream.jsonl (with legacy / transcript fallbacks) to find the
original session_id.
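The three revive triggers can be expressed as one classification function. The following is a sketch, not the orchestrate-revive implementation: the function name, the argument layout, and the order in which the conditions are tested are all assumptions; only the reason labels and the 15-minute threshold come from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick a revive reason from already-collected facts.
revive_reason() {
  # $1 = "alive" or "dead" (outcome of kill -0 <pid>)
  # $2 = heartbeat age in seconds
  # $3 = uuid of the latest stream idle-timeout error ("" if none)
  # $4 = lastApiErrorUuid from the signal file
  local pid_state="$1" hb_age="$2" err_uuid="$3" last_uuid="$4"
  if [ "$pid_state" = "dead" ]; then
    echo "pid-dead"
  elif [ "$hb_age" -gt 900 ]; then          # 15 minutes
    echo "heartbeat-stale"
  elif [ -n "$err_uuid" ] && [ "$err_uuid" != "$last_uuid" ]; then
    echo "api-stream-idle-timeout"          # a new, not-yet-handled error
  else
    echo "none"
  fi
}

revive_reason dead 10 "" ""            # pid-dead
revive_reason alive 1200 "" ""         # heartbeat-stale
revive_reason alive 30 "u-2" "u-1"     # api-stream-idle-timeout
revive_reason alive 30 "u-1" "u-1"     # none
```

Comparing the error uuid with lastApiErrorUuid is what prevents the same idle-timeout event from triggering a revive on every wake-up.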
Auto-resolve already-fixed bot review threads (CTL-378):
After revive and before auto-fixup, a pass resolves unresolved bot review threads that the
worker's last pushed commit already addressed — closing the gap where a fix-up worker pushes a
correct fix but dies before calling resolveReviewThread, which would otherwise make auto-fixup
dispatch a wasteful second worker.
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-resolve-fixed-threads" \
--orch-dir "$ORCH_DIR" \
--orch-id "$ORCH_NAME"
It reads the shared blockedSince clock (written by auto-fixup) and acts only on PRs stably BLOCKED
for --stable-minutes (default 10). A thread is auto-resolved only when ALL hold: its author is a
bot (__typename == "Bot"); the PR's last pushed commit touches the thread's file; and that commit
landed after the thread's comment. Human-authored threads are never auto-resolved. After resolving
≥1 thread it re-checks the authoritative merge state via REST (repos/{owner}/{repo}/pulls/{n}).
| Field | Written by | Purpose |
| --------------------- | --------------------------------- | --------------------------------------------- |
| resolvedThreadCount | orchestrate-resolve-fixed-threads | Cumulative count of auto-resolved bot threads |
| threadsResolvedAt | orchestrate-resolve-fixed-threads | Timestamp of the most recent auto-resolution |
Because this runs before auto-fixup on the same cycle, any thread it resolves disappears from
auto-fixup's unresolved-thread count, so auto-fixup will not classify the PR as threads-unresolved
or consume a fixupAttempts budget slot for work that is already done.
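The three conditions for auto-resolving a thread combine into a single predicate. The sketch below assumes the inputs have already been fetched (author type, thread path, files touched by the last pushed commit, and both timestamps as epoch seconds); thread_auto_resolvable is a hypothetical name and not part of orchestrate-resolve-fixed-threads.

```shell
#!/usr/bin/env bash
# Hypothetical predicate: succeed only when ALL three conditions hold.
thread_auto_resolvable() {
  # $1 = author __typename, $2 = thread file path,
  # $3 = newline-separated files touched by the last pushed commit,
  # $4 = commit time (epoch seconds), $5 = thread comment time (epoch seconds)
  local author_type="$1" path="$2" touched="$3" commit_epoch="$4" comment_epoch="$5"
  [ "$author_type" = "Bot" ] || return 1                            # never human threads
  printf '%s\n' "$touched" | grep -Fxq -- "$path" || return 1        # commit touches the file
  [ "$commit_epoch" -gt "$comment_epoch" ] || return 1               # commit landed after comment
  return 0
}

touched=$(printf '%s\n' src/a.ts src/b.ts)
if thread_auto_resolvable Bot src/a.ts "$touched" 200 100; then echo "resolve"; else echo "keep"; fi
if thread_auto_resolvable User src/a.ts "$touched" 200 100; then echo "resolve"; else echo "keep"; fi
```

The first call prints resolve and the second prints keep, since a non-bot author fails the first check regardless of the other two.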
Auto-dispatch fix-up workers for BLOCKED PRs (CTL-64):
After resolve-fixed-threads, a further pass detects PRs stuck in state=OPEN, mergeStateStatus=BLOCKED and either auto-dispatches orchestrate-fixup or escalates to an attention
item depending on the cause.
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-auto-fixup" \
--orch-dir "$ORCH_DIR" \
--orch-id "$ORCH_NAME"
The script records blockedSince on the worker signal the first time it observes BLOCKED, then —
once the state has been stable for --stable-minutes (default 10) — classifies the cause via
gh pr view and an api graphql query for unresolved review threads:
| Classification | Trigger | Action |
| -------------------- | --------------------------------------------------- | ------------------------------------------------------------------------ |
| ci-running | any check status ∈ {IN_PROGRESS, QUEUED, PENDING} | defer — try again next tick |
| checks-failing | any check conclusion ∈ {FAILURE, TIMED_OUT, …} | raise checks-failing attention (worker's own loop / revive handles it) |
| threads-unresolved | checks pass AND unresolved review threads exist | dispatch orchestrate-fixup with --issues composed from thread bodies |
| review-required | checks pass AND reviewDecision = REVIEW_REQUIRED | raise review-required attention (human must approve) |
| blocked-unknown | none of the above (rare — shape not yet classified) | raise blocked-unknown attention |
Each auto-dispatch bumps fixupAttempts on the signal. When fixupAttempts ≥ --max-fixups (default
2), the script raises fixup-budget-exhausted attention instead of dispatching again, so a human
can decide between manual intervention and abandonment.
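Read top to bottom, the classification table behaves like a first-match rule chain. The sketch below assumes the check and thread counts have already been computed from gh pr view and the GraphQL query, and that rows are evaluated in table order; classify_blocked is a hypothetical name, not the script's own function.

```shell
#!/usr/bin/env bash
# Hypothetical first-match classifier for a stably BLOCKED PR.
classify_blocked() {
  # $1 = number of running checks, $2 = number of failing checks,
  # $3 = number of unresolved review threads, $4 = reviewDecision
  if [ "$1" -gt 0 ]; then
    echo "ci-running"           # defer until the next tick
  elif [ "$2" -gt 0 ]; then
    echo "checks-failing"       # attention only, no dispatch
  elif [ "$3" -gt 0 ]; then
    echo "threads-unresolved"   # the only class that dispatches a fix-up
  elif [ "$4" = "REVIEW_REQUIRED" ]; then
    echo "review-required"      # a human must approve
  else
    echo "blocked-unknown"
  fi
}

classify_blocked 1 0 3 REVIEW_REQUIRED   # ci-running
classify_blocked 0 0 3 REVIEW_REQUIRED   # threads-unresolved
classify_blocked 0 0 0 REVIEW_REQUIRED   # review-required
classify_blocked 0 0 0 APPROVED          # blocked-unknown
```

Ordering matters here: running or failing checks take precedence, so unresolved threads only trigger a dispatch once CI has settled green.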
Signal-file fields the script reads/writes:
| Field | Written by | Purpose |
| ----------------------- | ---------------------- | ------------------------------------------------------------ |
| blockedSince | orchestrate-auto-fixup | First observation of BLOCKED; cleared when PR leaves BLOCKED |
| fixupAttempts | orchestrate-auto-fixup | Auto-dispatch counter (max = --max-fixups) |
| lastFixupDispatchedAt | orchestrate-auto-fixup | Timestamp of the most recent dispatch (for the dashboard) |
Auto-dispatch rebase workers for DIRTY PRs (CTL-232):
After auto-fixup, a further pass detects PRs stuck in state=OPEN, mergeStateStatus=DIRTY (merge
conflicts that GitHub's auto-merge cannot resolve on its own — it can only handle BEHIND, never
DIRTY) and dispatches orchestrate-rebase to spawn a worker that rebases the PR branch onto current
base, resolves conflicts, and force-pushes (--force-with-lease).
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-auto-rebase" \
--orch-dir "$ORCH_DIR" \
--orch-id "$ORCH_NAME"
The script records dirtySince on the worker signal the first time it observes DIRTY, then — once
the state has been stable for --stable-minutes (default 2; DIRTY is unambiguous so the window is
much shorter than auto-fixup's 10) — reads baseRefName from gh pr view and dispatches
orchestrate-rebase --base-branch <ref>.
Each auto-dispatch bumps rebaseAttempts on the signal. When rebaseAttempts ≥ --max-rebases
(default 2), the script raises rebase-budget-exhausted attention instead of dispatching again. A
dispatch failure raises rebase-dispatch-failed attention.
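The stable-window and budget checks can be sketched as follows. This is an illustration under assumptions: rebase_action and its output labels are hypothetical, times are epoch seconds, and checking stability before the budget is a choice made for the sketch rather than something the text above specifies.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the DIRTY stable-window and budget decision.
rebase_action() {
  # $1 = dirtySince (epoch seconds, "" on first observation), $2 = now (epoch),
  # $3 = rebaseAttempts, $4 = --max-rebases, $5 = --stable-minutes
  local dirty_since="$1" now="$2" attempts="$3" max="$4" stable_min="$5"
  if [ -z "$dirty_since" ]; then
    echo "record-dirtySince"                 # first time DIRTY is seen
  elif [ $((now - dirty_since)) -lt $((stable_min * 60)) ]; then
    echo "wait"                              # not yet stable long enough
  elif [ "$attempts" -ge "$max" ]; then
    echo "rebase-budget-exhausted"           # raise attention instead
  else
    echo "dispatch"
  fi
}

rebase_action "" 1000 0 2 2        # record-dirtySince
rebase_action 1000 1060 0 2 2      # wait (60s < 120s)
rebase_action 1000 1120 0 2 2      # dispatch
rebase_action 1000 1120 2 2 2      # rebase-budget-exhausted
```

The same shape applies to auto-fixup with blockedSince and a 10-minute window; only the defaults differ.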
Signal-file fields the script reads/writes:
| Field | Written by | Purpose |
| ------------------------ | ----------------------- | ---------------------------------------------------------- |
| dirtySince | orchestrate-auto-rebase | First observation of DIRTY; cleared when PR leaves DIRTY |
| rebaseAttempts | orchestrate-auto-rebase | Auto-dispatch counter (max = --max-rebases) |
| lastRebaseDispatchedAt | orchestrate-auto-rebase | Timestamp of the most recent dispatch (for the dashboard) |
| rebaseCommit | rebase worker | SHA of the rebased HEAD (written by the dispatched worker) |
Deregister from catalyst-broker (CTL-257, updated CTL-303): When all workers in the current wave
reach a terminal state and Phase 4 exits, emit filter.deregister so the daemon stops routing
events to this orchestrator:
if catalyst-broker status >/dev/null 2>&1 || catalyst-filter status >/dev/null 2>&1; then
"$STATE_SCRIPT" event "$(jq -nc \
--arg orch "${ORCH_NAME}" \
'{ts: (now | todate), event: "filter.deregister", orchestrator: $orch, worker: null, detail: null}')" \
2>/dev/null || true
fi
if [[ -n "${CATALYST_SESSION_ID:-}" && -x "$SESSION_SCRIPT" ]]; then
"$SESSION_SCRIPT" phase "$CATALYST_SESSION_ID" "verifying" --phase 5
fi
[[ -x "$ORCH_STATUS_SCRIPT" ]] && "$ORCH_STATUS_SCRIPT" emit \
--orch "${ORCH_NAME}" --phase reviewing --wave "${CURRENT_WAVE:-1}" \
--summary "wave ${CURRENT_WAVE:-1} post-merge verification" 2>/dev/null || true
Context (CTL-130, updated by CTL-252): Workers actively merge their own PRs via
gh pr merge --squash --delete-branch (no --auto) after the listen loop confirms CLEAN. The merge
happens the moment the worker's listen loop confirms CI passed and reviews are satisfied.
Verification therefore runs post-merge — it surfaces gaps for remediation rather than gating
merge.
The Phase 4 merge-confirmation scan (MERGED branch) already runs orchestrate-verify.sh on every
merged PR when verifyBeforeMerge is true (default). Phase 5 aggregates those results and handles
remediation for any failures.
Aggregation: For each worker in the current wave, read postMergeVerification.result from its
signal file. Three possible states:

- "passed": verification ran and succeeded, no action needed
- "failed": verification ran and found gaps, remediation needed
- null: verification hasn't run yet (worker merged between wake-ups, worktree already cleaned
  up, or verifyBeforeMerge is false)

On failure, file a remediation ticket:

Since the code is already on main, the orchestrator cannot "send the worker back." Instead, work
from ${ORCH_DIR}/workers/${TICKET_ID}-remediation.md (written by Phase 4 on verification failure)
and, once the remediation ticket is filed, record its ID on the worker signal:

jq --arg ticket "$REMEDIATION_TICKET_ID" \
'.postMergeVerification.remediationTicket = $ticket' \
"$WORKER_SIGNAL" > "$WORKER_SIGNAL.tmp" && mv "$WORKER_SIGNAL.tmp" "$WORKER_SIGNAL"
Wave advancement gating (interacts with allowSelfReportedCompletion):

- If ALLOW_SELF_REPORTED is "false" (default) AND any worker has result: "failed": block wave
  advancement until all remediation tickets are filed. The wave does not advance until every
  worker either passes verification or has a filed remediation ticket.
- If ALLOW_SELF_REPORTED is "true" AND any worker has result: "failed": log a warning, file
  remediation tickets, but allow wave advancement to proceed. Verification is advisory.

The verification script checks:

- /scan-reward-hacking on changed files: check for as any, @ts-ignore, void patterns,
  suppressed errors.

Verification outcomes:

- A failure blocks wave advancement when allowSelfReportedCompletion is false (until a
  remediation ticket is filed, not until it is resolved, which would be a future cycle's work).

When ALL tickets in the current wave are merged and verified:
Confirm merges and verification (CTL-130): Before advancing the wave, check every worker in this wave:

- status="done" with a non-null pr.mergedAt confirms the PR merged.
- If VERIFY_BEFORE_MERGE is "true": postMergeVerification.result must not be null
  (verification must have run).
- If ALLOW_SELF_REPORTED is "false" (default) AND any worker has
  postMergeVerification.result: "failed": block wave advancement until all failed workers have
  a non-null postMergeVerification.remediationTicket (the remediation ticket has been filed).
  The ticket does not need to be resolved; filing it is sufficient to unblock.
- If ALLOW_SELF_REPORTED is "true" AND any worker has result: "failed": log a warning but
  allow wave advancement (verification is advisory).
- If any worker is still in pr-created or merging, run one more Phase 4 reactive scan before
  proceeding. If --auto-merge is off, flag these PRs for human review on the dashboard instead
  of advancing.

Write wave briefing for the next wave (see Wave Briefing section below). Then persist a copy to the thoughts repository so it survives worktree cleanup:
HANDOFF_DIR="thoughts/shared/handoffs/${ORCH_NAME}"
mkdir -p "${HANDOFF_DIR}"
TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")
cp "${ORCH_DIR}/wave-${WAVE}-briefing.md" \
"${HANDOFF_DIR}/${TIMESTAMP}_wave-${WAVE}-briefing.md"
Clean up completed worktrees: Run teardown hooks from config, then remove.
WORKER_DIR="${WORKTREE_BASE}/${ORCH_NAME}-${TICKET_ID}"
BRANCH_NAME="${ORCH_NAME}-${TICKET_ID}"
# Run teardown hooks from catalyst.orchestration.hooks.teardown
# Variable substitution: ${WORKTREE_PATH}, ${BRANCH_NAME}, ${TICKET_ID}, ${REPO_NAME}
# Read one hook per line; a `for` over $(...) would split each hook on spaces
# and run the fragments as separate commands.
while IFS= read -r HOOK; do
[ -n "$HOOK" ] || continue
HOOK="${HOOK//\$\{WORKTREE_PATH\}/$WORKER_DIR}"
HOOK="${HOOK//\$\{BRANCH_NAME\}/$BRANCH_NAME}"
HOOK="${HOOK//\$\{TICKET_ID\}/$TICKET_ID}"
eval "$HOOK" || true
done < <(echo "$TEARDOWN_HOOKS" | jq -r '.[]')
# If teardown hooks didn't already remove the worktree, do it now.
# CTL-649: stop any bg sessions still cwd'd into this worker dir before
# yanking the filesystem. Without the presweep, the supervisor session
# lingers as an ORPHAN (status=idle, cwd=<deleted>) — that's 70% of the
# 157-session leak observed on the affected host. --force fallback keeps
# us moving when an individual `claude stop` fails; the periodic reaper
# picks up any survivors on its next tick.
if [ -d "$WORKER_DIR" ]; then
PRESWEEP_BIN="$CATALYST_PLUGIN_DIR/scripts/lib/worktree-presweep.sh"
if [ -x "$PRESWEEP_BIN" ]; then
"$PRESWEEP_BIN" "$WORKER_DIR" 2>/dev/null \
|| "$PRESWEEP_BIN" --force "$WORKER_DIR" 2>/dev/null || true
fi
git worktree remove "$WORKER_DIR" 2>/dev/null || true
git branch -D "$BRANCH_NAME" 2>/dev/null || true
fi
Provision next wave: Create worktrees for Wave N+1 tickets using the same
create-worktree.sh invocation from Phase 2.
Dispatch next wave workers: Include wave briefing in dispatch prompt:
IMPORTANT: Read the Wave ${PREV} briefing before starting:
${ORCH_DIR}/wave-${PREV}-briefing.md
This briefing contains patterns, conventions, test helpers, and gotchas
discovered by the previous wave. Build ON TOP of these — do not reinvent.
Update dashboard, local state, and global state: Advance currentWave, update wave statuses.
# Update global state for wave advancement
"$STATE_SCRIPT" update "${ORCH_NAME}" \
".progress.currentWave = ${NEXT_WAVE} | .progress.completedTickets += ${COMPLETED_COUNT}"
"$STATE_SCRIPT" event "$(jq -nc \
--arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--arg orch "${ORCH_NAME}" \
--argjson wave "$NEXT_WAVE" \
--argjson tickets "$(printf '%s\n' "${NEXT_WAVE_TICKETS[@]}" | jq -R . | jq -sc .)" \
'{ts: $ts, orchestrator: $orch, worker: null, event: "wave-started", detail: {wave: $wave, tickets: $tickets}}')"
NEXT_TOTAL=${#NEXT_WAVE_TICKETS[@]}
[[ -x "$ORCH_STATUS_SCRIPT" ]] && "$ORCH_STATUS_SCRIPT" emit \
--orch "${ORCH_NAME}" --phase dispatching --wave "$NEXT_WAVE" \
--total "$NEXT_TOTAL" \
--summary "wave ${NEXT_WAVE} dispatching (${NEXT_TOTAL} tickets)" 2>/dev/null || true
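The wave-advancement checks from the "Confirm merges and verification" step can be condensed into one gate function. This is a sketch with hypothetical names (can_advance_wave, the result:ticket pair encoding, and the sample ticket ID CTL-900); the real orchestrator reads these values from each worker's signal file.

```shell
#!/usr/bin/env bash
# Hypothetical gate: succeed when the wave may advance.
can_advance_wave() {
  # $1 = ALLOW_SELF_REPORTED, $2 = VERIFY_BEFORE_MERGE
  # remaining args = one "result:remediationTicket" pair per worker,
  #   e.g. "passed:", "failed:CTL-900", "failed:" (no ticket yet), "null:"
  local allow="$1" verify="$2" pair result ticket
  shift 2
  for pair in "$@"; do
    result="${pair%%:*}"
    ticket="${pair#*:}"
    # Verification is required but has not run for this worker.
    if [ "$verify" = "true" ] && [ "$result" = "null" ]; then
      return 1
    fi
    # Strict mode: a failure blocks until a remediation ticket is filed.
    if [ "$result" = "failed" ] && [ "$allow" != "true" ] && [ -z "$ticket" ]; then
      return 1
    fi
  done
  return 0
}

if can_advance_wave false true "passed:" "failed:CTL-900"; then echo "advance"; else echo "blocked"; fi
if can_advance_wave false true "passed:" "failed:"; then echo "advance"; else echo "blocked"; fi
if can_advance_wave true true "passed:" "failed:"; then echo "advance"; else echo "blocked"; fi
```

The three calls print advance, blocked, and advance: filing the ticket unblocks strict mode, and advisory mode never blocks on a failure.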
When all waves are complete:
Write final summary to ${ORCH_DIR}/SUMMARY.md:
Then render the dashboard one final time so the persisted handoff copy reflects the run's terminal state (CTL-230), and persist both alongside any remaining briefings:
"${CLAUDE_PLUGIN_ROOT}/scripts/update-dashboard.sh" \
--orch "${ORCH_NAME}" --orch-dir "${ORCH_DIR}" --roll-usage \
>/dev/null 2>>"${ORCH_DIR}/.update-dashboard.log" || true
HANDOFF_DIR="thoughts/shared/handoffs/${ORCH_NAME}"
mkdir -p "${HANDOFF_DIR}"
TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")
cp "${ORCH_DIR}/SUMMARY.md" \
"${HANDOFF_DIR}/${TIMESTAMP}_${ORCH_NAME}-summary.md"
cp "${ORCH_DIR}/DASHBOARD.md" \
"${HANDOFF_DIR}/${TIMESTAMP}_${ORCH_NAME}-dashboard.md"
Archive orchestrator artifacts (CTL-110).
Before any worktree cleanup, sweep artifacts from the runs dir and worktrees into
~/catalyst/archives/${ORCH_NAME}/ and index them in ~/catalyst/catalyst.db. The sweep is
filesystem-first: blobs are written to the archive root BEFORE the SQLite rows are inserted.
If SQLite write fails, the filesystem artifacts remain on disk (syncable later via
catalyst-archive sync).
bun "${CLAUDE_PLUGIN_ROOT}/scripts/orch-monitor/catalyst-archive.ts" sweep "${ORCH_NAME}"
The sweep is idempotent (ON CONFLICT upserts). Re-running is safe. If it fails, capture the
exit code and stderr but proceed with the remaining cleanup steps — artifacts can be re-swept
later before teardown.
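The "capture the exit code and stderr, then proceed" behavior described above could be wrapped roughly as follows. This is a minimal sketch, not part of catalyst-archive: `run_sweep`, `SWEEP_RC`, and the `.archive-sweep.log` filename are hypothetical names chosen for illustration.

```bash
# Hypothetical wrapper: run a command, append its stderr to a log, remember the
# exit code in SWEEP_RC, and always return 0 so later cleanup steps still run.
run_sweep() {
  sweep_log="$1"; shift
  SWEEP_RC=0
  "$@" 2>>"$sweep_log" || SWEEP_RC=$?
  if [ "$SWEEP_RC" -ne 0 ]; then
    echo "archive sweep failed (exit ${SWEEP_RC}); stderr in ${sweep_log}" >&2
  fi
  return 0
}

# Intended call site (assumes bun and the plugin scripts are installed):
#   run_sweep "${ORCH_DIR}/.archive-sweep.log" \
#     bun "${CLAUDE_PLUGIN_ROOT}/scripts/orch-monitor/catalyst-archive.ts" sweep "${ORCH_NAME}"
```

`SWEEP_RC` can be inspected later, for example when deciding whether a re-sweep is needed before teardown.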
Verify Linear states: Check all tickets are in stateMap.done. If any are stuck, update them
using the Linearis CLI (run linearis issues usage for update syntax).
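One way to reconcile stuck tickets is to loop over them with the idempotent linear-transition.sh helper documented in this skill. A minimal sketch follows; `reconcile_done` is a hypothetical function name, and the ticket IDs in the usage comment are placeholders.

```bash
# Hypothetical helper: push every ticket to stateMap.done through the
# transition script. Prints the tickets whose transition failed, one per line.
reconcile_done() {
  helper="$1"; config="$2"; shift 2
  for ticket in "$@"; do
    "$helper" --ticket "$ticket" --transition done --config "$config" >/dev/null \
      || echo "$ticket"
  done
}

# Intended call site:
#   STUCK=$(reconcile_done "${CLAUDE_PLUGIN_ROOT}/scripts/linear-transition.sh" \
#     .catalyst/config.json PROJ-101 PROJ-102)
```

Because the helper is a no-op for tickets already in the target state, running this over the full ticket list should be safe.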
File improvement findings (CTL-176 / CTL-183 routing): Drain the shared findings queue and
file one ticket per entry. The orchestrator and every dispatched worker share one queue (dispatch
sets CATALYST_FINDINGS_FILE=$ORCH_DIR/findings.jsonl), so this one pass covers everything
surfaced across the whole run. Runs as a no-op when the queue is empty.
Recording findings during the run. The moment you or a worker notices friction worth fixing (workflow gaps, bugs spotted in adjacent code, recurring manual steps, gaps in tooling), record it on the shared queue:
"${CLAUDE_PLUGIN_ROOT}/scripts/add-finding.sh" \
--title "Short imperative title" \
--body "Reproduction + expected + observed + any links" \
--skill orchestrate --severity low
Record inline, the moment it's observed — context compaction loses it otherwise. Don't prompt the user mid-run; don't wait for the end; don't batch. Step 4 below files the whole queue in one pass.
What counts: friction the maintainer would want fixed, bugs in adjacent catalyst code spotted incidentally, gaps in tooling, manual steps that should be automated. What doesn't: this run's own ticket TODOs (those go in the PR body), user preferences that should be durable memory, routine debugging.
FEEDBACK="${CLAUDE_PLUGIN_ROOT}/scripts/file-feedback.sh"
CONSENT="${CLAUDE_PLUGIN_ROOT}/scripts/feedback-consent.sh"
FINDINGS_FILE="${CATALYST_FINDINGS_FILE:-${ORCH_DIR}/findings.jsonl}"
if [ -x "$FEEDBACK" ] && [ -f "$FINDINGS_FILE" ] && [ -s "$FINDINGS_FILE" ]; then
COUNT=$(wc -l < "$FINDINGS_FILE" | tr -d ' ')
# Autonomous mode (orchestrator runs without a TTY): file only when consent
# is already granted — never prompt. Interactive maintainer invocations
# prompt once, then persist on yes.
if [ "$("$CONSENT" check)" != "granted" ] && [ -z "${CATALYST_AUTONOMOUS:-}" ] && [ -t 0 ]; then
read -r -p "File $COUNT improvement tickets at end of run? [Y/n] " yn
case "$yn" in [Nn]*) : ;; *) "$CONSENT" grant >/dev/null ;; esac
fi
if [ "$("$CONSENT" check)" = "granted" ]; then
FILED=0
while IFS= read -r line; do
TITLE=$(jq -r '.title' <<<"$line")
BODY=$(jq -r '.body' <<<"$line")
SKILL=$(jq -r '.skill // "orchestrate"' <<<"$line")
RESULT=$("$FEEDBACK" --title "$TITLE" --body "$BODY" --skill "$SKILL" --json 2>/dev/null || true)
STATUS=$(jq -r '.status // "failed"' <<<"$RESULT")
if [ "$STATUS" = "filed" ]; then
ID=$(jq -r '.identifier // .url // ""' <<<"$RESULT")
echo " filed: $ID ($TITLE)"
FILED=$((FILED + 1))
fi
done < "$FINDINGS_FILE"
# Preserve queue on partial failure; delete on full success.
[ "$FILED" -eq "$COUNT" ] && rm -f "$FINDINGS_FILE"
fi
fi
Clean up all worktrees (including orchestrator worktree, unless user wants to keep it). Use
/catalyst-dev:teardown ${ORCH_NAME} for a safe, archive-gated deletion. Teardown refuses to run
unless step 2's sweep succeeded (use --force to override).
Sync thoughts: humanlayer thoughts sync to persist any shared documents.
Complete and archive global state:
# CTL-111: post orchestrator done to shared comms channel. Workers have already
# posted their own done messages from their merging-loop exit; this closes out the orch
# participant and is advisory — rc is ignored. Channel cleanup is deferred to
# CTL-110's archive sweep (do NOT call `catalyst-comms gc` here — gc is global).
if [ -n "$COMMS_BIN" ]; then
"$COMMS_BIN" done "${ORCH_NAME}" --as orchestrator >/dev/null 2>&1 || true
fi
# Mark completed in global state
"$STATE_SCRIPT" update "${ORCH_NAME}" \
'.status = "completed" | .completedAt = $now | .progress.completedTickets = .progress.totalTickets | .progress.inProgressTickets = 0'
"$STATE_SCRIPT" event "$(jq -nc --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg orch "${ORCH_NAME}" \
'{ts: $ts, orchestrator: $orch, worker: null, event: "orchestrator-completed", detail: null}')"
# Mark worktree as done (distinguishes done vs in-progress in ls)
touch "${WORKTREE_PATH}/.done" 2>/dev/null || true
# Archive to history (removes from active state)
"$STATE_SCRIPT" archive "${ORCH_NAME}"
# End session tracking
if [[ -n "${CATALYST_SESSION_ID:-}" && -x "$SESSION_SCRIPT" ]]; then
"$SESSION_SCRIPT" end "$CATALYST_SESSION_ID" --status done
fi
Report to user:
Orchestration Complete — "api-redesign"
Waves: 3/3 complete
Tickets: 6/6 merged
PRs: #87, #88, #89, #91, #92, #93
Duration: 2h 14m
Verification: 2 tickets required remediation
- PROJ-101: Missing Bruno API tests (fixed on retry)
- PROJ-105: as any cast (fixed on retry)
Summary: ${ORCH_DIR}/SUMMARY.md
History: ~/catalyst/history/${ORCH_NAME}--<timestamp>.json
Before dispatching each wave after Wave 1, the orchestrator writes a briefing document to
${ORCH_DIR}/wave-${N}-briefing.md summarizing what prior waves learned.
How the briefing is created:
- git diff --stat)
- thoughts/shared/

Use the wave briefing template from plugins/dev/templates/orchestrate-wave-briefing.md.
When two tickets in the same wave both add a Supabase migration, they can race on the same NNN_
filename prefix — whichever PR merges first wins, the other must rebase post-PR. The orchestrator
pre-assigns numbers in the briefing to prevent this.
Generation step (run before rendering the template):
# Scan migrations dir and assign numbers to migration-likely tickets in the NEXT wave.
# Prints a Markdown "## Migration Number Assignments" section, or nothing if the
# project has no supabase/migrations/ directory or no ticket in the wave is migration-
# likely. Safe to append unconditionally to the briefing.
MIG_SECTION=$("${CLAUDE_PLUGIN_ROOT}/scripts/pre-assign-migrations.sh" \
--migrations-dir "${ORCH_WORKTREE}/supabase/migrations" \
--tickets "${NEXT_WAVE_TICKETS[*]}") || MIG_SECTION=""
The script replaces the ${MIGRATION_ASSIGNMENTS} placeholder in the briefing template.
Detection heuristic (matches pre-assign-migrations.sh):
- database, migration, schema
- supabase/migrations, migration, schema, ALTER TABLE, CREATE TABLE

Behavior:
- If supabase/migrations/ does not exist in the orchestrator worktree, the script emits nothing (repo-agnostic: projects without Supabase are unaffected).
- Otherwise, it takes the highest existing NNN_ prefix and assigns NNN+1, NNN+2, ... to each migration-likely ticket in input order.

Tests: plugins/dev/scripts/__tests__/pre-assign-migrations.test.sh covers the detection heuristic, the scanning logic, repo-agnostic fallback, and sequential assignment.
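The scan-and-assign behavior can be illustrated with a small stand-alone function. This is a sketch under assumptions, not the contents of pre-assign-migrations.sh: `assign_migration_numbers` is a hypothetical name, the caller is assumed to pass only tickets already judged migration-likely, and the output format is invented for the example.

```bash
# Hypothetical helper: print "<ticket> <NNN>" for each ticket, continuing from
# the highest NNN_ prefix found in the migrations directory.
assign_migration_numbers() {
  dir="$1"; shift
  [ -d "$dir" ] || return 0            # no migrations dir: emit nothing
  max=0
  for f in "$dir"/[0-9]*_*; do
    [ -e "$f" ] || continue            # glob matched nothing
    n=${f##*/}                         # basename
    n=${n%%_*}                         # text before the first underscore
    case "$n" in *[!0-9]*) continue ;; esac
    while [ "${n#0}" != "$n" ]; do n=${n#0}; done   # strip leading zeros (avoid octal)
    n=${n:-0}
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  for ticket in "$@"; do
    max=$((max + 1))
    printf '%s %03d\n' "$ticket" "$max"
  done
}
```

Stripping leading zeros before the comparison keeps prefixes such as 008 from being read as invalid octal in shell arithmetic.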
Thoughts persistence: Every briefing is copied to thoughts/shared/handoffs/${ORCH_NAME}/ with
timestamped filenames (YYYY-MM-DD_HH-MM-SS_wave-N-briefing.md). This ensures briefings survive
worktree cleanup and are available via thoughts sync across workspaces. The final SUMMARY.md is
also persisted there at completion.
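The timestamped copy described above might look like the following. It is a minimal sketch: `persist_briefing` is a hypothetical function name, and the timestamp format mirrors the one used for the final summary copy.

```bash
# Hypothetical helper: copy a briefing into the handoff directory under a
# timestamped name and print the destination path.
persist_briefing() {
  src="$1"; handoff_dir="$2"; wave="$3"
  ts=$(date +"%Y-%m-%d_%H-%M-%S")
  dest="${handoff_dir}/${ts}_wave-${wave}-briefing.md"
  mkdir -p "$handoff_dir" && cp "$src" "$dest" && echo "$dest"
}

# Intended call site:
#   persist_briefing "${ORCH_DIR}/wave-${N}-briefing.md" \
#     "thoughts/shared/handoffs/${ORCH_NAME}" "${N}"
```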
Why this matters: This is a unique advantage over other frameworks. GSD executors are stateless. Gas Town Polecats don't share findings. Wave briefings mean:
In addition to per-wave briefings (which summarize upstream for downstream workers), the orchestrator also exposes an aggregate rollup briefing for humans reviewing the whole run. The rollup is not written by the orchestrator; the orch-monitor derives it on read from:
- Worker signal files (${ORCH_DIR}/workers/${ticket}.json): provide the "what shipped" list (any worker with pr.number set).
- Rollup fragments (${ORCH_DIR}/workers/${ticket}-rollup.md): optional markdown files written by workers after a successful merge (see oneshot/SKILL.md Phase 5 Step 4). Each fragment contributes a ### ${ticket} section to the "Gotchas" area, and its first non-blank line becomes the one-liner next to the shipped PR.

The orch-monitor assembles these on every snapshot: there is no persisted rollup file to maintain and no sync step for the orchestrator to run. Workers that do not write a fragment simply appear in "What shipped" with PR title only.
The rollup surfaces in the orch-monitor UI under the existing Briefing tab (first section, above
per-wave briefings) and as a small rollup pill on the orchestrator dashboard card.
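To preview which line of a fragment would become its one-liner, the "first non-blank line" rule can be reproduced in the shell. This is only an illustration of the rule; the orch-monitor does its own parsing, and `fragment_one_liner` is a hypothetical name.

```bash
# Hypothetical helper: print the first line of a rollup fragment that contains
# any non-whitespace text (awk's NF is 0 for blank lines).
fragment_one_liner() {
  awk 'NF { print; exit }' "$1"
}

# Intended call site:
#   fragment_one_liner "${ORCH_DIR}/workers/${ticket}-rollup.md"
```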
Every worker dispatch includes mandatory testing requirements in the prompt itself. Not a suggestion — a hard requirement. The prompt explicitly states that work will be independently verified and workers should not claim done without tests.
The existing quality gate system (/validate-type-safety, /security-review, code-reviewer,
pr-test-analyzer) plus config-based gates run inside each worker's /oneshot pipeline. These are
the worker's own self-checks.
The orchestrator's own verification script audits the worker's output AFTER the worker claims done. This is the anti-reward-hacking layer — the worker can't game its own quality gates because the orchestrator runs a separate, adversarial check. The verification agent has no incentive to pass — it's scored on catching gaps, not shipping fast.
The orchestrator maintains a live dashboard at ${ORCH_DIR}/DASHBOARD.md. Re-rendered on each
Monitor wake-up (per-event), not on a poll cycle. Uses the template from
plugins/dev/templates/orchestrate-dashboard.md.
The dashboard includes:
The orchestrator manages Linear state transitions as the primary authority (CTL-133):
| Event | Linear Action |
| -------------------------- | ------------------------------------------------- |
| Worker dispatched | Move ticket to stateMap.inProgress |
| Worker creates PR | Verify ticket is stateMap.inReview — fix if not |
| Worker passes verification | No change (already in review) |
| PR merged | Verify ticket is stateMap.done — fix if not |
| Worker fails/stalls | Add comment with status, keep inProgress |
The orchestrator also adds comments to tickets for visibility using the Linearis CLI (run
linearis comments usage for syntax).
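The table above can be read as a lookup from event to transition key. A minimal sketch, assuming hypothetical event names (the orchestrator's real event identifiers may differ); only the stateMap keys come from the table.

```bash
# Hypothetical event names; the stateMap keys are the ones in the table above.
# Prints the transition key to apply, or nothing when no state change is needed.
transition_for_event() {
  case "$1" in
    worker-dispatched)            echo inProgress ;;
    pr-created)                   echo inReview ;;
    verification-passed)          ;;   # already in review
    pr-merged)                    echo done ;;
    worker-failed|worker-stalled) ;;   # stays inProgress; add a comment instead
    *) echo "unknown event: $1" >&2; return 1 ;;
  esac
}
```

An empty result means "leave the ticket state alone", which matches the no-change and keep-inProgress rows.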
linear-transition.sh
All Linear state transitions go through plugins/dev/scripts/linear-transition.sh. Since CTL-133,
the orchestrator's Phase 4 monitor is the primary source of done transitions (workers exit at
merging before merge completes). The helper reads stateMap from .catalyst/config.json, is
idempotent (no-op when the ticket is already in the target state), and exits 0 when the linearis
CLI is not installed (graceful skip).
# Transition via transition-key (reads stateMap.done from config):
"${CLAUDE_PLUGIN_ROOT}/scripts/linear-transition.sh" \
--ticket PROJ-123 --transition done --config .catalyst/config.json
# Override with an explicit state name (e.g., --state-on-merge "Shipped"):
"${CLAUDE_PLUGIN_ROOT}/scripts/linear-transition.sh" \
--ticket PROJ-123 --state "Shipped" --config .catalyst/config.json
The --state-on-merge flag on orchestrate is passed through to this helper whenever it is set.
orchestrate-bulk-close
For runs that predated the state-transition wiring (or where the orchestrator's monitor exited
before reconciling tickets), run the bulk-close helper. It walks workers/*.json, inspects each PR
via gh, and transitions tickets via linear-transition.sh:
- stateMap.done
- stateMap.canceled
- status=done (zero-scope) → stateMap.canceled

# Preview what the helper would do (no changes):
plugins/dev/scripts/orchestrate-bulk-close --orch-dir ~/catalyst/runs/<orch-name> --dry-run
# Actually transition tickets:
plugins/dev/scripts/orchestrate-bulk-close --orch-dir ~/catalyst/runs/<orch-name>
# JSON summary for scripting:
plugins/dev/scripts/orchestrate-bulk-close --orch-dir ~/catalyst/runs/<orch-name> --json
Flags mirror orchestrate: --state-on-merge <name> and --state-on-canceled <name> override the
respective defaults.
Start the orchestrator with remote control for access from claude.ai/code. The orchestrator worktree
was already created in Phase 2, so cd into it — do not pass -w (that would ask claude to create
a new worktree using the path as a name):
( cd "${ORCH_WORKTREE}" && claude --remote-control "${ORCH_NAME}" )
Workers should NOT use remote control — they're autonomous. The human monitors workers through the orchestrator's dashboard.
Multiple orchestrators can run concurrently. Worktree names are prefixed with the orchestrator name to avoid collisions:
${WORKTREE_BASE}/
├── auth-orch/ # orchestrator 1
├── auth-orch-PROJ-101/ # orchestrator 1's worker
├── auth-orch-PROJ-102/ # orchestrator 1's worker
├── dash-orch/ # orchestrator 2
├── dash-orch-PROJ-201/ # orchestrator 2's worker
└── dash-orch-PROJ-202/ # orchestrator 2's worker
Workers write status updates to ${ORCH_DIR}/workers/${TICKET_ID}.json. The /oneshot skill
detects orchestrator presence by checking for a sibling orchestrator/ directory or the
CATALYST_ORCHESTRATOR_DIR environment variable.
Worker signal file schema: See plugins/dev/templates/worker-signal.json for the full JSON
schema including the definitionOfDone block.
How workers detect orchestrator mode:
# Detection order:
# 1. CATALYST_ORCHESTRATOR_DIR env var (set by orchestrator in dispatch)
# 2. Sibling directory matching *-orchestrator or <prefix>/ pattern
# 3. ../*/workers/ directory exists (convention-based)
ORCH_DIR="${CATALYST_ORCHESTRATOR_DIR:-}"
if [ -z "$ORCH_DIR" ]; then
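# Check sibling directories for a workers/ subdirectory. find's -name matches
# only the basename, so search for "workers" at depth 2 rather than "*/workers".
PARENT=$(dirname "$(pwd)")
ORCH_DIR=$(find "$PARENT" -mindepth 2 -maxdepth 2 -type d -name workers 2>/dev/null | head -1 | sed 's|/workers$||')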
fi
Under the CTL-80 contract, workers poll until state=MERGED and exit at done. If a worker exits
earlier (stalled, failed, or process crash), or if findings surface after merge, the
orchestrator triages them. Two recovery patterns cover the cases that came up in
orch-data-import-2026-04-13 Round 2:
Did the PR already merge?
├── No — PR is still OPEN
│ └── Blockers on the existing PR (Codex inline threads, CI failure, missed review point).
│ → Pattern A: FIX-UP WORKER (orchestrate-fixup)
│
└── Yes — PR is MERGED
└── Findings surfaced AFTER merge (late scan, post-merge review, prod observation).
→ Pattern B: FOLLOW-UP TICKET (orchestrate-followup)
Run gh pr view $PR_NUMBER --json state before choosing. OPEN → fix-up. MERGED/CLOSED →
follow-up. If the PR is MERGED you physically cannot push to that branch anymore; a fix-up attempt
will fail silently or push to an orphan branch.
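The decision tree reduces to a lookup on the PR state. A minimal sketch, with `choose_recovery` as a hypothetical function name; the `gh` call in the usage comment relies on the CLI's `--json` and `--jq` flags.

```bash
# Hypothetical helper: map a PR state to the recovery pattern to use.
choose_recovery() {
  case "$1" in
    OPEN)          echo fixup ;;      # Pattern A: fix-up worker
    MERGED|CLOSED) echo followup ;;   # Pattern B: follow-up ticket
    *)             echo "unknown PR state: $1" >&2; return 1 ;;
  esac
}

# Intended call site (requires gh and a real PR number):
#   state=$(gh pr view "$PR_NUMBER" --json state --jq .state)
#   choose_recovery "$state"
```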
Used on ADV-219 / PR #130 and ADV-220 / PR #132 during orch-data-import-2026-04-13. Either the
original worker exited at stalled because Codex or a security scanner posted inline threads it
could not resolve in its own poll loop, or the worker process died before reaching MERGED.
Auto-merge is blocked on unresolved threads.
When to use:
- pr.state = OPEN

How the orchestrator dispatches:
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-fixup" "${TICKET}" \
--issues "src/auth/middleware.ts:42: handle null session token
src/auth/middleware.ts:89: Codex flagged timing-attack comparison
test/auth.test.ts: add regression test for the null-token path" \
--pr "${PR_NUMBER}" \
--dispatch
--issues accepts a multi-line value verbatim — pass each blocker on its own
line as shown above and the rendered prompt's ## Blockers to resolve section
will contain every line in order. No caller-side escaping or \n-joining is
required.
What this does:
- Renders templates/fixup-prompt.md → ${ORCH_DIR}/workers/fixup-${TICKET}-prompt.md
- Renders templates/dispatch-fixup.sh.template → ${ORCH_DIR}/workers/dispatch-fixup-${TICKET}.sh
- With --dispatch, runs the dispatch script in the background (via claude -p with streaming JSON output)

The fix-up worker:
- Commits the fixes as fix(...): resolve review feedback on #${PR}
- Resolves the review threads via gh api graphql resolveReviewThread
- Records fixupCommit in its signal file
- Polls until state=MERGED (CTL-80 contract), writes pr.mergedAt + status: "done", transitions Linear, then exits

The orchestrator's Phase 4 monitor is the authoritative merge watcher: if the fix-up worker exits before merge, the orchestrator observes the eventual MERGED state via the next github.pr.merged event (or 10-minute idle fallback) and writes the merge signal. fixupCommit is metadata for the dashboard.
Typical cost: ~$2 (much cheaper than a fresh worker because scope is narrow).
Used on ADV-221 → ADV-222 / PR #133 during orch-data-import-2026-04-13. The parent PR merged
cleanly; a post-merge security review or prod observation surfaced issues later. A fix-up is
physically impossible — the merged branch is gone.
When to use:
- pr.state = MERGED

How the orchestrator dispatches:
"${CLAUDE_PLUGIN_ROOT}/scripts/orchestrate-followup" "${PARENT_TICKET}" \
--findings "post-merge: validateSessionToken allows empty string (src/auth/middleware.ts:42)
post-merge: missing rate-limit on POST /api/auth/token (src/api/auth.ts:18)"
What this does:
- Creates a new Linear ticket via linearis issues create, with a description that references the parent and enumerates the findings. Title defaults to Follow-up: <PARENT_TICKET> post-merge findings; override with --title.
- Creates a worktree from main via create-worktree.sh, named ${ORCH_NAME}-${NEW_TICKET}.
- Writes the worker signal file with followUpTo: "${PARENT_TICKET}". The orchestrator and dashboard both use this field to render the ancestry.
- Renders templates/followup-prompt.md → ${ORCH_DIR}/workers/${NEW_TICKET}-prompt.md, which points the worker at the findings, the parent PR, and the TDD contract.
- Prints the claude -p command to actually start the worker. It does NOT auto-dispatch (follow-up tickets are heavier and warrant human confirmation).

The follow-up worker runs the full /oneshot pipeline (research → plan → implement → validate → ship), same as any other worker. Its PR description must reference the parent PR number; the prompt enforces this.
Typical cost: ~$4 (full pipeline, but scoped to the findings).
Skip Linear ticket creation with --ticket <id> if you filed the ticket manually or Linear is
unavailable. The rest of the flow proceeds with the given ticket ID.
| Field | Written by | Pattern |
| ------------- | ------------------------------ | ------- |
| fixupCommit | fix-up worker (after push) | A |
| followUpTo | orchestrator (at provisioning) | B |
These fields are additive — they do not conflict with pr.prOpenedAt, pr.autoMergeArmedAt, or
pr.mergedAt (which remain worker-owned / orchestrator-owned per the normal split).
DASHBOARD.md has two additional columns: Fix-up Commit (short SHA, empty for normal workers) and
Follow-up To (parent ticket ID, empty for normal workers and fix-up workers). See
templates/orchestrate-dashboard.md.
All error paths that stop the orchestrator must end the session:
if [[ -n "${CATALYST_SESSION_ID:-}" && -x "$SESSION_SCRIPT" ]]; then
"$SESSION_SCRIPT" end "$CATALYST_SESSION_ID" --status failed
fi
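One way to make sure no error path skips this step is to route it through an exit trap. The sketch below is an assumption about wiring, not something the orchestrator is documented to do; `end_session_on_failure` is a hypothetical name, while the `end <id> --status failed` call is the one shown above.

```bash
# Hypothetical handler: end the session as failed when the exit code is
# non-zero and session tracking is configured.
end_session_on_failure() {
  rc="$1"
  if [ "$rc" -ne 0 ] && [ -n "${CATALYST_SESSION_ID:-}" ] && [ -x "${SESSION_SCRIPT:-}" ]; then
    "$SESSION_SCRIPT" end "$CATALYST_SESSION_ID" --status failed
  fi
}

# Intended wiring, near the top of the orchestrator script:
#   trap 'end_session_on_failure "$?"' EXIT
```

A zero exit code is ignored here because the successful path already ends the session with --status done.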
Worker crashes or stalls:
Orchestrator crash recovery:
- ${ORCH_DIR}/state.json + worker signal files
- /catalyst-dev:orchestrate --resume ${ORCH_DIR}

Worktree conflicts:

- /oneshot with all its phases
- --auto-merge flag applies to workers, not the orchestrator
- thoughts/shared/handoffs/${ORCH_NAME}/ for archival (CTL-230)
- .catalyst/ paths. Falls back to .claude/ if .catalyst/ doesn't exist