plugins/dev/skills/monitor-events/SKILL.md
Reference for the canonical event-driven wait pattern in Catalyst skills. Use when a skill needs to block on a state change (PR merged, CI completed, push to branch, ticket transitioned) WITHOUT polling. Pairs the `catalyst-events` CLI with the Claude Code `Monitor` tool and `wait-for` for short-lived workers.
CTL-210 unified the Catalyst event log: every GitHub webhook, Linear webhook, comms post,
and orchestrator/worker lifecycle event flows through ~/catalyst/events/YYYY-MM.jsonl.
Consumers no longer poll gh pr view, linearis read, or signal files — they subscribe
to the event stream via filter.
This skill documents the canonical patterns. Use it as a reference when writing or migrating skill prose; do not invoke it as a slash command.
The two primitives below read from ~/catalyst/events/YYYY-MM.jsonl, which is populated
by the orch-monitor daemon (plugins/dev/scripts/orch-monitor/server.ts). When the
daemon is not running:
- catalyst-events tail returns an empty stream
- catalyst-events wait-for blocks until its --timeout expires (default 600s) and exits non-zero, so callers fall back to gh pr view polling, which can't see deploys

Liveness check (the same call wired into check-project-setup.sh):
plugins/dev/scripts/catalyst-monitor.sh status # human-readable
plugins/dev/scripts/catalyst-monitor.sh status --json # {"running":true,"pid":...}
Skills that invoke check-project-setup.sh (orchestrate, oneshot, merge-pr) handle the
liveness check automatically — interactive runs prompt to start the daemon, autonomous
runs warn-to-stderr and proceed. If you reuse the primitives outside those skills, run
the status check yourself and either start the daemon (catalyst-monitor.sh start) or
plan for the polling fallback.
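A minimal sketch of that manual check for callers outside those skills. The `monitor_is_running` and `choose_wait_mode` names and the `CATALYST_MONITOR` override variable are illustrative assumptions, not part of the Catalyst scripts; the only things taken from the text above are the `status --json` call and its `"running"` field.

```shell
# Hypothetical helper: returns 0 when the orch-monitor daemon reports running.
# CATALYST_MONITOR is an assumed override so the script path can be swapped out.
monitor_is_running() {
  status_json=$("${CATALYST_MONITOR:-plugins/dev/scripts/catalyst-monitor.sh}" status --json 2>/dev/null) || return 1
  # Match {"running":true,...} without requiring jq.
  printf '%s\n' "$status_json" | grep -Eq '"running"[[:space:]]*:[[:space:]]*true'
}

# Decide between the event-driven path and the polling fallback.
choose_wait_mode() {
  if monitor_is_running; then
    echo "events"
  else
    echo "polling"
  fi
}
```

A caller would branch on `choose_wait_mode` before reaching for `wait-for`, and either start the daemon or go straight to the polling path when it prints `polling`.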
Three patterns are available; pick by cost shape, not just by mechanism. Listed cheapest first.
| Pattern | Mechanism | Cost shape | When to use |
|---|---|---|---|
| Broker interest (preferred) | The catalyst-broker daemon (see [[broker]]) classifies events between Claude turns and emits filter.wake.{id} only on semantic match. Deterministic interest types (pr_lifecycle, ticket_lifecycle, comms_lifecycle) match by typed-field comparison; prose interests go through Groq Llama 3.1 8B. | Lowest. Zero turns while blocked; 1 turn per matched wake. Deterministic routes cost 0 LLM tokens; prose routes ~$0.05–0.10/M tokens at small batch sizes. Pre-filtering happens out of Claude's context entirely. | Whenever the broker daemon is running. Worker-scope (single PR) and orchestrator-scope (many PRs) both supported via the same registration mechanism. |
| catalyst-events wait-for with jq | Blocking Bash CLI; one jq predicate; exits on first match or --timeout. Works without the broker. | Zero turns while blocked, 1 turn per match. Same per-match cost as Monitor but workers usually wait for ONE specific event, so total turn count stays low. Filter expansion (CI + reviews + push + merge) widens the per-match probability. | Short-lived claude -p workers when the broker is not running; standalone one-shot waits; CI scripts. |
| Monitor over catalyst-events tail | Claude Code's Monitor tool wraps tail --filter; every matching line surfaces as a turn-resuming notification. | Highest. 1 wake per matching line means broad filters can dominate context. | Long-lived orchestrator only. NOT for short-lived claude -p workers — they have no long-lived turn loop to consume notifications. |
Worker contract (matches oneshot Phase 5, CTL-371): dispatched workers prefer the broker (Pattern 3), fall back to wait-for (Pattern 2) when the daemon is down, and never use Monitor/tail (Pattern 1). See plugins/dev/skills/oneshot/SKILL.md Phase 5 and the orchestrate dispatch prompt for the canonical invocation.

Note on numbering. The "Pattern N" labels in the recipe sections below (Pattern 1: worker waits for PR merge, Pattern 2: long-lived orchestrator wakes, Pattern 3: reactive PR lifecycle, Pattern 4: tail by ticket) are recipe IDs, not the cost-tier rank in the table above. The recipes pre-date the broker integration; both the broker-preferred path and the wait-for fallback inside each recipe map to the table rows above.
Both wait-for and Monitor use catalyst-events under the hood. tail is the streaming
foundation; wait-for is tail | head -n 1 with a timeout. The broker reads the same event log
as tail does but classifies before waking, which is why its per-wake cost is so much lower.
A claude -p worker that just opened PR #342 needs to block until the PR merges, then
do post-merge work.
Preferred (when catalyst-filter is running, CTL-269): register a single semantic
interest covering every concern the worker cares about (CI, comms, reviews, BEHIND,
Linear), then wait on filter.wake.${CATALYST_SESSION_ID}. The Groq-backed daemon
classifies raw events against the natural-language prompt and emits one wake per
match. See [[catalyst-filter]] for the full registration recipe and the daemon-restart
contract. The two-phase pattern below is the fallback for environments where the
daemon is not running.
Use the two-phase pattern from [[wait-for-github]]: a 3-minute Phase 1 with a diagnostic checkpoint before committing to the full 2-hour wait.
# Two-phase pattern — see [[wait-for-github]] for full reference.
REPO=$(gh repo view --json nameWithOwner --jq '.nameWithOwner')
EVENT=""
_WFG_MATCHED=false
# Phase 1: short wait with diagnostic checkpoint (3 minutes).
EVENT=$(catalyst-events wait-for \
  --filter ".attributes.\"event.name\" == \"github.pr.merged\" and .attributes.\"vcs.pr.number\" == ${PR_NUMBER}" \
  --timeout 180 2>/dev/null || true)
if [ -n "$EVENT" ]; then
  _WFG_MATCHED=true
else
  # Phase 1 timed out: run diagnostics before extending to Phase 2.
  echo "Phase 1 timed out after 3 min, running diagnostics..."
  STALLED=false
  FILTER_MISMATCH=false
  _LOG_FILE=~/catalyst/events/$(date -u +%Y-%m).jsonl
  _LOG_LINES=$(wc -l < "$_LOG_FILE" 2>/dev/null | tr -d ' ')
  _SINCE_LINE=$(( ${_LOG_LINES:-0} > 500 ? ${_LOG_LINES:-0} - 500 : 0 ))
  # Check 1: any heartbeats in the last ~500 log lines?
  HEARTBEATS=$(catalyst-events tail --since-line "$_SINCE_LINE" 2>/dev/null \
    | jq -c 'select(.attributes."event.name" == "session.heartbeat")' | wc -l | tr -d ' ')
  [ "${HEARTBEATS:-0}" -eq 0 ] && { echo "WARN: No heartbeats; event log may be stalled"; STALLED=true; }
  # Check 2: did any event for this PR arrive that the filter failed to match?
  RAW_HIT=$(catalyst-events tail --since-line "$_SINCE_LINE" 2>/dev/null | jq -c \
    --argjson pr "$PR_NUMBER" \
    'select((.attributes."vcs.pr.number" == $pr) or (.body.payload.prNumbers // [] | contains([$pr])))' | head -1)
  if [ -n "$RAW_HIT" ]; then
    echo "WARN: Event arrived but filter did not match. Raw event:"; echo "$RAW_HIT" | jq .
    FILTER_MISMATCH=true
  fi
  # Check 3: is the webhook tunnel connected?
  TUNNEL_STATE=$(catalyst-monitor status --json 2>/dev/null | jq -r '.webhookTunnel.connected // false')
  [ "$TUNNEL_STATE" != "true" ] && { echo "WARN: Webhook tunnel not running"; STALLED=true; }
  if [ "$FILTER_MISMATCH" = "false" ] && [ "$STALLED" = "false" ]; then
    # Infrastructure healthy: extend to Phase 2.
    EVENT=$(catalyst-events wait-for \
      --filter ".attributes.\"event.name\" == \"github.pr.merged\" and .attributes.\"vcs.pr.number\" == ${PR_NUMBER}" \
      --timeout 7200 2>/dev/null || true)
    [ -n "$EVENT" ] && _WFG_MATCHED=true
  fi
fi
# Authoritative REST confirmation — always follows any wait-for path.
MERGED=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}" --jq '.merged' 2>/dev/null || echo "false")
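if [ "$MERGED" = "true" ]; then
  # Proceed with post-merge work. The ":" no-op keeps this branch valid shell
  # (an if body holding only a comment is a syntax error) until real commands replace it.
  :
fi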
Non-negotiable: every wait-for is paired with an authoritative REST check. Reasons:

- If the event never reaches the log (for example, because the daemon is down), wait-for blocks until timeout. The gh api call after timeout is the safety net.
- An event that lands before wait-for starts is invisible to wait-for. The fallback covers that gap too.
- wait-for returns the first matching line; gh api returns canonical truth. Use gh api (REST), never gh pr view --json (GraphQL).

The orchestrator's Phase 4 used to poll every 2–3 minutes for every active worker. With
CTL-210, the orchestrator runs a Monitor watching all PR/CI/push/lifecycle events, and
the reactive scan drops to a 10-minute idle fallback as the safety net (CTL-243).
Preferred (when catalyst-filter is running, CTL-257 + CTL-269): the orchestrate
skill emits filter.register at Phase 4 start with a prompt covering CI events,
PR transitions, BEHIND-state pushes, comms attention from workers, and Linear ticket
changes. Phase 4 then waits on filter.wake.${ORCH_NAME} for a single unified wake
covering all those concerns. See [[catalyst-filter]]. The Monitor-over-tail pattern
below is the fallback for environments without the daemon.
The recommended shape is scope-aware, generated from the orchestrator's worker signal directory (CTL-240):
Use the `Monitor` tool with this command:
FILTER=$(catalyst-events build-orchestrator-filter "$ORCH_DIR")
catalyst-events tail --filter "$FILTER"
When a notification arrives, re-evaluate the affected worker's state via the
canonical `gh pr view` query. Do NOT trust the event's payload as the source
of truth — use it only as a wake-up trigger.
build-orchestrator-filter reads ${ORCH_DIR}/workers/*.json and emits a single jq
predicate that scopes catalyst-origin events by orchestrator name, github events by
branch-ref prefix and PR-number set, check_suite / workflow_run events by
detail.prNumbers, and linear events by ticket. Re-build it after dispatching new
workers so the PR/ticket sets stay in sync.
If you need a hand-rolled equivalent (e.g. the orchestrator name isn't yet known, or you only want broad event-type coverage and don't care about scoping out sibling orchestrators), the broad form is:
catalyst-events tail --filter '
(.attributes."event.name" | startswith("github.pr.")) or
(.attributes."event.name" | startswith("github.pr_review")) or
(.attributes."event.name" | startswith("github.issue_comment")) or
(.attributes."event.name" | startswith("github.check_")) or
(.attributes."event.name" | startswith("github.workflow_run")) or
(.attributes."event.name" | startswith("github.deployment")) or
(.attributes."event.name" == "github.push") or
(.attributes."event.name" | startswith("linear.issue.")) or
(.attributes."event.name" == "orchestrator.worker.phase_advanced") or
(.attributes."event.name" == "orchestrator.worker.status_terminal") or
(.attributes."event.name" == "orchestrator.worker.pr_created") or
(.attributes."event.name" == "orchestrator.worker.done") or
(.attributes."event.name" == "orchestrator.worker.failed") or
(.attributes."event.name" == "orchestrator.attention.raised") or
(.attributes."event.name" == "orchestrator.attention.resolved")
'
pr_review_comment events are where Codex review threads land (required for CTL-64
BLOCKED auto-fixup detection); workflow_run.completed is the most reliable
CI-done signal. The filter is intentionally broad — it covers every event type that
could require a dashboard re-render, a fix-up dispatch, or a merge-confirmation
re-scan. See orchestrate/SKILL.md Phase 4 for the wake-up classification table that
maps each event to its reaction.
The orchestrator continues to maintain its 10-minute fallback scan (defense-in-depth). The fast path is event-driven; the slow path is the safety net.
Cross-orchestrator scoping (CTL-234). When multiple orchestrators run on the same
machine, narrow the filter with (.attributes."catalyst.orchestrator.id" == "orch-foo") to ignore events from
sibling runs. As of CTL-234, the webhook receiver stamps .attributes."catalyst.orchestrator.id" (and
the back-compat top-level .orchestrator) on github.* events for PRs whose head
branch starts with <orchId>-, so the filter
(.attributes."catalyst.orchestrator.id" == "orch-foo") and (
(.attributes."event.name" | startswith("github.pr.")) or
(.attributes."event.name" | startswith("github.check_")) or
(.attributes."event.name" == "github.push") or
(.attributes."event.name" | startswith("worker-"))
)
works for both worker-lifecycle events (already attributed) and webhook events
(now attributed via PR-number lookup or head-ref prefix). Events that don't belong
to any active orchestrator (human-merged PRs to main, dependabot PRs, etc.) keep
.orchestrator == null and are filtered out, which is the desired behaviour.
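The scoped predicate above can be generated for any orchestrator ID. A sketch, assuming a hypothetical `orch_scoped_filter` helper (it is not a `catalyst-events` subcommand); it only assembles the jq predicate shown above as a string.

```shell
# Hypothetical helper: print the cross-orchestrator jq predicate for one orchestrator ID.
orch_scoped_filter() {
  orch_id=$1
  # Unquoted heredoc: ${orch_id} expands, the jq double quotes stay literal.
  cat <<EOF
(.attributes."catalyst.orchestrator.id" == "${orch_id}") and (
  (.attributes."event.name" | startswith("github.pr.")) or
  (.attributes."event.name" | startswith("github.check_")) or
  (.attributes."event.name" == "github.push") or
  (.attributes."event.name" | startswith("worker-"))
)
EOF
}

# Usage (requires the catalyst-events CLI):
#   catalyst-events tail --filter "$(orch_scoped_filter orch-foo)"
```

Keeping the predicate in one function means the orchestrator ID is spliced in at a single place instead of being repeated by hand in every filter.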
Every Monitor wake and every event-driven wait-for return MUST be acknowledged
with a single short line of assistant text before the agent returns to waiting.
This is the canonical way to defeat the Human:\n<task-id> rendering bleed —
without it, transcripts become unreadable.
Why this is non-negotiable. The Claude Code harness wraps every Monitor wake
in a user-role <task-notification> XML message whose <summary> is the
description field you passed to the Monitor tool. If the assistant returns
an end_turn containing only thinking blocks (no text content) — which
happens when the model decides the event is routine and has "nothing to say" —
the UI renders the next <task-notification> raw and the <task-id> element
leaks into the visible transcript as a phantom user message:
⏺ Monitor event: "orch-adv-931 events (PR/CI/worker/comms)"
⏺ Human:
ba18h9cyy
⏺ Monitor event: "orch-adv-931 events (PR/CI/worker/comms)"
⏺ Human:
ba18h9cyy
The "ba18h9cyy" lines look like the human user spoke but are actually harness
XML element content. The fix is on the agent side, not the harness side: emit
any text content in the response turn and the rendering artifact disappears.
Required line shapes. Pick the one that matches the wake. Each is a single line under ~120 characters; both "what arrived" and "what we're doing about it" must appear (even if "what we're doing" is "nothing").
| Situation | Line shape |
|---|---|
| Actionable wake | wake: <event.name> #<pr> [interest=<type>] — <action being taken> |
| Non-actionable / routine | wake: <event.name> — routine, staying in event loop |
| Already addressed (stale broker re-fire) | wake: <event.name> — already addressed, no-op |
| Idle-timeout safety-net scan | wake: idle-timeout — running periodic reconciliation scan |
| Daemon-down / REST fallback | wake: rest-poll — broker down, polling gh api |
Include the matched filter clause or interest type when available. Broker
wakes (filter.wake.<sid>) carry .body.payload.interest_id and
.body.payload.reason; surface both. Hand-rolled tail / wait-for filters
have no interest_id — surface the event name and the matched PR/ticket scope
instead. The audience for this line is a human reading the transcript later
trying to reconstruct why the agent fired now and what it decided to do.
Wake-narration fixture (good vs bad). Use this as the acceptance check when reviewing a transcript:
# BAD — orchestrator returned thinking-only end_turn, task-id leaks
⏺ Monitor event: "orch-adv-931 events (PR/CI/worker/comms)"
⏺ Human:
ba18h9cyy
⏺ Monitor event: "orch-adv-931 events (PR/CI/worker/comms)"
⏺ Human:
c9a4f2x1q
# GOOD — orchestrator narrates every wake; no task-id bleed
⏺ Monitor event: "orch-adv-931 events (PR/CI/worker/comms)"
⏺ wake: github.check_suite.completed #412 [interest=pr_lifecycle] —
CI failed on PR #412, dispatching auto-fixup
⏺ Monitor event: "orch-adv-931 events (PR/CI/worker/comms)"
⏺ wake: orchestrator.worker.phase_advanced — routine, staying in event loop
This requirement applies to every event-driven listen loop in Catalyst skills:
- orchestrate Phase 4 (orchestrator's own Monitor over catalyst-events tail)
- orchestrate worker dispatch prompt (each worker's listen loop)
- oneshot Phase 5 (worker's PR listen loop)
- Any other skill that uses Monitor or catalyst-events wait-for

The narration line is not for the agent; it is for the operator reading the transcript later. Treat the wake as a question; treat the line as the answer.
When the broker daemon is running and a filter.wake.* event arrives, the payload contains
richer context than just the reason string. Use catalyst-events wake-extract to normalize
the varied payload shapes into a single predictable object:
EVENT=$(catalyst-events wait-for \
--filter ".attributes.\"event.name\" | startswith(\"filter.wake.${CATALYST_SESSION_ID}\")" \
--timeout 600)
# Narrate the wake (mandatory — see Narration section above)
FIELDS=$(echo "$EVENT" | catalyst-events wake-extract)
REASON=$(echo "$FIELDS" | jq -r '.reason // "unknown"')
INTEREST=$(echo "$FIELDS" | jq -r '.interest_id // "unknown"')
echo "wake: filter.wake [interest=${INTEREST}] — ${REASON}"
# Branch on normalized fields instead of re-querying GitHub/Linear
CI_CONCLUSION=$(echo "$FIELDS" | jq -r '.ci_conclusion // empty')
REVIEW_STATE=$(echo "$FIELDS" | jq -r '.review_state // empty')
MERGED=$(echo "$FIELDS" | jq -r '.merged // empty')
case "$CI_CONCLUSION" in
failure|timed_out)
# CI failed — pull logs, fix, push without a separate gh api call
;;
esac
case "$MERGED" in
true)
# PR merged event in the payload — still confirm via gh api REST before declaring done
;;
esac
See [[broker]] §10 for the complete wake-extract output schema and the per-interest-type
reason string catalogue.
When source_events is empty (watchdog wakes, some Groq prose wakes): all wake-extract
fields are null except interest_id and reason. Treat the wake as a "go re-check" signal
and fall back to the authoritative REST check.
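A sketch of detecting that "go re-check" case before branching on individual fields. It assumes `jq` is installed and looks only at the three fields used in this section (`ci_conclusion`, `review_state`, `merged`); the full `wake-extract` schema may carry more, and `wake_is_recheck_only` is a hypothetical helper name.

```shell
# Hypothetical helper: reads wake-extract output (JSON) on stdin.
# Returns 0 when none of the three fields used in this section carry a value.
wake_is_recheck_only() {
  # jq -e maps a final "true" to exit 0 and "false" to exit 1.
  jq -e '[.ci_conclusion, .review_state, .merged] | all(. == null)' >/dev/null
}

# Usage (requires the catalyst-events CLI):
#   FIELDS=$(echo "$EVENT" | catalyst-events wake-extract)
#   if echo "$FIELDS" | wake_is_recheck_only; then
#     gh api "repos/${REPO}/pulls/${PR_NUMBER}" --jq '.merged'
#   fi
```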
Pattern 2/3 fallback (no broker): the raw event patterns in this skill use raw
github.* events from wait-for, not filter.wake.* wakes — wake-extract does not apply
to those paths.
Pattern 1's single-event wait is fine for the happy path: the PR merges, the worker exits. But between PR-create and PR-merge, four things can happen that the agent should react to, not just sleep through:
| Event | Means | Agent should |
|---|---|---|
| github.check_suite.completed (conclusion=failure / timed_out) | CI failed | pull failure logs, fix, push, re-enter the wait |
| github.pr_review.submitted (state=changes_requested) | Reviewer requested changes | run /review-comments, push, re-enter the wait |
| github.push to the base branch | PR is now BEHIND | gh pr update-branch, re-enter the wait |
| github.pr.merged / github.pr.closed | terminal | confirm via gh api REST, exit |
Wrap one disjunctive wait-for around all of them; classify with a case on the
matched event's .attributes."event.name"; re-enter the loop on every non-terminal
event. The authoritative gh api REST check runs on every wake-up, the same safety rule as Pattern 1.
# Two-phase compliant cadence loop — see [[wait-for-github]]. The 1800s timeout
# serves as a cadence fallback; the authoritative REST check runs on every wake-up.
REPO=$(gh repo view --json nameWithOwner --jq '.nameWithOwner')
BASE_BRANCH=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}" --jq '.base.ref')
ITER=0
MAX_ITER=20
while [ "$ITER" -lt "$MAX_ITER" ]; do
  ITER=$((ITER + 1))
  EVENT_JSON=$(catalyst-events wait-for \
    --filter '
      (.attributes."event.name" == "github.pr.merged" and .attributes."vcs.pr.number" == '"$PR_NUMBER"') or
      (.attributes."event.name" == "github.pr.closed" and .attributes."vcs.pr.number" == '"$PR_NUMBER"') or
      (.attributes."event.name" == "github.check_suite.completed"
        and (.body.payload.prNumbers // [] | index('"$PR_NUMBER"') != null)
        and (.attributes."cicd.pipeline.run.conclusion" == "failure" or .attributes."cicd.pipeline.run.conclusion" == "timed_out")) or
      (.attributes."event.name" == "github.pr_review.submitted"
        and .attributes."vcs.pr.number" == '"$PR_NUMBER"'
        and (.body.payload.state == "changes_requested"
          or (.body.payload.state == "commented" and (.body.payload.author.type // "") == "Bot"))) or
      (.attributes."event.name" == "github.push" and .attributes."vcs.ref.name" == "refs/heads/'"$BASE_BRANCH"'")
    ' \
    --timeout 1800 || true)
  # MANDATORY authoritative REST re-check on every wake-up.
  PR_STATE=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}" \
    --jq 'if .merged then "MERGED" elif .state == "closed" then "CLOSED" else "OPEN" end' \
    2>/dev/null || echo "OPEN")
  if [ "$PR_STATE" = "MERGED" ]; then break; fi
  if [ "$PR_STATE" = "CLOSED" ]; then exit 1; fi
  EVENT=$(echo "$EVENT_JSON" | jq -r '.attributes."event.name" // ""')
  case "$EVENT" in
    github.check_suite.completed)
      # Pull failure logs, classify, fix, push. Then re-enter the loop.
      ;;
    github.pr_review.submitted)
      # Bot reviewers are addressable inline; humans require operator action.
      # See "Bot vs human authorship" below for the routing heuristic.
      AUTHOR_TYPE=$(echo "$EVENT_JSON" | jq -r '.body.payload.author.type // "User"')
      if [ "$AUTHOR_TYPE" = "Bot" ]; then
        /catalyst-dev:review-comments "$PR_NUMBER"
      fi
      ;;
    github.push)
      gh pr update-branch "$PR_NUMBER" || true
      ;;
    "")
      # Timed out with no event. The gh api check above confirmed not merged;
      # fall through to next iteration.
      ;;
  esac
done
Review and comment events carry body.payload.author = { login, type } where type
is GitHub's user.type field — typically "User" or "Bot". Use it to route
review-changes-requested events without re-fetching from the GitHub API:
AUTHOR_TYPE=$(echo "$EVENT_JSON" | jq -r '.body.payload.author.type // "User"')
case "$AUTHOR_TYPE" in
Bot)
# codex, claude-code-review, dependabot — addressable inline.
/catalyst-dev:review-comments "$PR_NUMBER"
;;
*)
# Human reviewer — surface to the operator and keep waiting.
;;
esac
The // "User" fallback ensures pre-CTL-228 events (no author field) are
treated as human-authored — the safer default.
check_suite.completed has no vcs.pr.number. A check suite spans many
PRs; the affected PR numbers live in body.payload.prNumbers. Filter with
(.body.payload.prNumbers // [] | index($PR) != null), not .attributes."vcs.pr.number" == $PR.
The filter is one jq expression. Clauses are joined with or, not
comma. Each clause is parenthesized.
Bash quoting. The shell-variable interpolation ('"$PR_NUMBER"') is
intentional — the outer single quotes protect the jq syntax from $-expansion,
the inner double quotes re-enable it for one variable. Test your filter
by piping a fixture event through jq -c "select(<filter>)" before
trusting it in production.
Iteration cap. MAX_ITER=20 prevents runaway loops on a stuck failure
mode. Apply per-failure-type fix budgets inside each handler too (e.g. give
up after 3 distinct fix attempts on the same CI check).
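The fixture check suggested in the Bash quoting note can be sketched as follows. The fixture event is hand-written for illustration (it is not captured from a real log), `filter_matches` is a hypothetical helper, and only `jq` is required.

```shell
# Hypothetical helper: filter_matches <jq-filter> <event-json>; returns 0 on a match.
filter_matches() {
  # jq -e exits non-zero when select() produces no output.
  printf '%s\n' "$2" | jq -e -c "select($1)" >/dev/null 2>&1
}

PR_NUMBER=342
# Hand-written fixture in the canonical envelope shape used by this doc.
FIXTURE='{"attributes":{"event.name":"github.check_suite.completed","cicd.pipeline.run.conclusion":"failure"},"body":{"payload":{"prNumbers":[341,342]}}}'

# Same quoting trick as the loop above: close the single quotes, expand, reopen.
FILTER='.attributes."event.name" == "github.check_suite.completed"
  and (.body.payload.prNumbers // [] | index('"$PR_NUMBER"') != null)'

if command -v jq >/dev/null 2>&1; then
  if filter_matches "$FILTER" "$FIXTURE"; then
    echo "filter matches fixture"
  else
    echo "filter does NOT match fixture" >&2
  fi
fi
```

Running the same fixture against `.attributes."vcs.pr.number" == 342` yields no match, which is the check_suite gotcha described above showing up before the filter reaches a live wait.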
All filtering belongs inside the --filter jq predicate (CTL-240, CTL-372).
Do NOT add a downstream | grep … / | awk … / | sed … / | jq …
post-pipe to a catalyst-events tail invocation. The primary reason is
clarity: --filter is the single place a reader can look to know what
reaches the consumer. Splitting filter logic across two stages hides
conditions and invites small regressions (someone drops the
--line-buffered flag, or the post-pipe pattern no longer matches the
canonical envelope). Use catalyst-events build-orchestrator-filter "$ORCH_DIR" to generate a complete scope-aware predicate from the worker
signal directory instead of hand-rolling secondary pipes.
Secondary reason (the historical CTL-240 concern): BSD awk, unflagged
BSD grep, and unflagged sed buffer stdout in 4 KB blocks when stdout
is not a TTY (the Monitor harness captures it). With the typical
~1–3 events/min orchestrator cadence the buffer never fills and
notifications stall silently for 15+ minutes despite live PR activity.
grep --line-buffered and jq --unbuffered DO mechanically flush per
line on macOS and Linux (per their man pages), so the buffering failure
mode is conditional, not absolute — but you should still not need either
flag, because filtering belongs in --filter.
Anti-pattern: | grep -v '"event.name":"filter.wake"' on the
orchestrator's Monitor (observed in real sessions). Wrong for two reasons:
(a) filter.wake.* envelopes are canonical-only and do not satisfy any
clause of build-orchestrator-filter's v1 predicate, so they never reach
the consumer in the first place. (b) The pattern would also strip the
orchestrator's OWN intended filter.wake.${ORCH_NAME} wake — the event
the orchestrator registered for. Since CTL-346 the broker no longer
re-classifies its own emissions, so there is no feedback loop to defend
against on the consumer side either.
github.* events carry orchestrator: null and worker: null (CTL-240).
Real webhook events are scoped only by .attributes."vcs.repository.name",
.attributes."vcs.ref.name", .attributes."vcs.pr.number",
.attributes."vcs.revision", and .body.payload.prNumbers. A scope predicate like
.attributes."catalyst.orchestrator.id" == "orch-foo" will silently drop every github event.
Use branch-ref prefix matching (.attributes."vcs.ref.name" | startswith("refs/heads/orch-foo-"))
and PR-number-set matching (.attributes."vcs.pr.number" | IN(501,502)) instead — or use
build-orchestrator-filter which handles this for you.
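For cases where `build-orchestrator-filter` is not an option, the branch-prefix plus PR-set predicate described above can be assembled by hand. A sketch with a hypothetical `github_scope_filter` helper; it joins the PR numbers into the `IN(...)` argument list and prints a jq predicate.

```shell
# Hypothetical helper: github_scope_filter <orch-id> [<pr-number>...]
github_scope_filter() {
  orch_id=$1
  shift
  pr_list=""
  for pr in "$@"; do
    # Accept digits only so nothing unexpected is spliced into the jq program.
    case "$pr" in
      ''|*[!0-9]*) echo "not a PR number: $pr" >&2; return 2 ;;
    esac
    pr_list="${pr_list:+$pr_list,}$pr"
  done
  printf '(.attributes."vcs.ref.name" | startswith("refs/heads/%s-"))' "$orch_id"
  if [ -n "$pr_list" ]; then
    printf ' or (.attributes."vcs.pr.number" | IN(%s))' "$pr_list"
  fi
  printf '\n'
}

# Usage (requires the catalyst-events CLI):
#   catalyst-events tail --filter "$(github_scope_filter orch-foo 501 502)"
```

As with the generated filter, the PR set goes stale when new workers are dispatched, so the predicate needs to be rebuilt at that point.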
The orchestrator's Phase 4 loop has used this shape for a while —
Monitor over tail with a disjunctive filter, then case on the
gh pr view result. The pattern above is the short-lived claude -p-friendly
equivalent: wait-for instead of Monitor, case on the matched event
instead of the canonical PR state. They share the same safety rule: treat
events as wake-up triggers; treat gh pr view (or its equivalent) as truth.
The worker emitter splits phase transitions into two topics so subscribers can
filter by severity instead of inspecting .detail fields:
| Topic | Tier | When | Coalesces? | Carries detail.pr? |
|---|---|---|---|---|
| worker-phase-advanced | info | routine in-flight phases (researching, planning, implementing, validating, shipping) | yes — batched per orchestrator within windowSec (default 30 s) | no |
| worker-status-terminal | act | actionable transitions (pr-created, merging, merged, done, failed, stalled, deploy-failed, deploying) | no — emitted immediately and flushes any pending coalesce queue | yes when to ∈ {pr-created, merging, merged, done, deploy-failed} |
Coalesced orchestrator.worker.phase_advanced events leave
attributes."catalyst.worker.ticket" unset at the envelope level; the per-change
worker lives inside .body.payload.changes[]:
{
"ts": "2026-05-04T22:00:00Z",
"orchestrator": "orch-foo",
"worker": null,
"event": "worker-phase-advanced",
"detail": {
"windowSec": 30,
"changes": [
{ "ts": "2026-05-04T21:59:32Z", "worker": "CTL-229", "from": "researching", "to": "planning" },
{ "ts": "2026-05-04T21:59:36Z", "worker": "CTL-232", "from": "planning", "to": "implementing" }
]
}
}
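To act on individual transitions, a subscriber has to unpack the batch. A sketch using `jq`, assuming the change list lives at `.body.payload.changes` as described above, with `.detail.changes` (the shape in the JSON sample) as a fallback; `list_phase_changes` is a hypothetical helper name.

```shell
# Hypothetical helper: reads one coalesced event on stdin and prints
# one "worker from->to" line per change in the batch.
list_phase_changes() {
  jq -r '(.body.payload.changes // .detail.changes // [])[]
         | "\(.worker) \(.from)->\(.to)"'
}

# Usage (requires the catalyst-events CLI):
#   catalyst-events wait-for --timeout 600 \
#     --filter '.attributes."event.name" == "orchestrator.worker.phase_advanced"' \
#     | list_phase_changes
```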
Stragglers (the last event in a sequence) flush via the next emit OR via an
explicit emit-worker-status-change.sh flush --orch <id> invocation. The
orchestrator's 10-min idle scan is the documented contract for periodic
flushing — a worker exiting between phases does not need to flush its own
queue.
Subscriber recipes:
# Subscribe to actionable transitions only (no routine progress noise)
catalyst-events tail --filter '.attributes."event.name" == "orchestrator.worker.status_terminal"'
# Subscribe to routine progress (already coalesced into batches)
catalyst-events tail --filter '.attributes."event.name" == "orchestrator.worker.phase_advanced"'
# A worker just opened a PR — wait until it tells you the PR number
catalyst-events wait-for --timeout 600 \
--filter '.attributes."event.name" == "orchestrator.worker.status_terminal" and .body.payload.to == "pr-created" and .attributes."catalyst.worker.ticket" == "CTL-229"' \
| jq -r '.body.payload.pr.number'
Useful for live debugging or operator dashboards:
# linear.issue.identifier for Linear-event context; catalyst.worker.ticket for worker/orchestrator context
catalyst-events tail --filter '.attributes."linear.issue.identifier" == "CTL-210" or .attributes."catalyst.worker.ticket" == "CTL-210"'
Captures GitHub PR events scoped to that ticket, Linear webhook events for the issue, comms posts where the ticket is the from/parent, and orchestrator/worker lifecycle events.
The patterns above are all subscription-mode usage. tail and wait-for seek to EOF on
first run, so they only see events that arrive after the command starts. That is the
correct default when a worker is blocking on a fresh PR merge or an orchestrator is
waking on live progress — historical heartbeat noise would otherwise drown out the
signal.
It is the wrong default when the question is "are events flowing at all?"
# User runs this to "check if any events are coming through"
catalyst-events tail --filter '.attributes."event.name" | startswith("github.")'
# Sits silent. User concludes: tunnel is dead.
# Reality: tunnel is fine, just no NEW events since they started tailing.
A silent live-tail does NOT mean the tunnel is dead. It means there has been no NEW
activity matching your filter since you started tailing. To verify flow, switch to
diagnostic mode by passing --since-line 0, which reads the entire current month's log
from the start.
# Most recent github event of any kind, regardless of repo
catalyst-events tail --since-line 0 --filter '.attributes."event.name" | startswith("github.")' \
| tail -1
# Hourly count over the current log file
catalyst-events tail --since-line 0 --filter '.attributes."event.name" | startswith("github.")' \
| jq -r '.ts | sub("Z$"; "") | sub(":[0-9]{2}:[0-9]{2}$"; ":00:00")' \
| sort | uniq -c
# Per-repo breakdown — distinguishes "quiet repo" from "dead tunnel"
catalyst-events tail --since-line 0 --filter '.attributes."event.name" | startswith("github.")' \
| jq -r '.attributes."vcs.repository.name"' | sort | uniq -c | sort -rn
The per-repo breakdown is the one that most often resolves the misdiagnosis — a tunnel can be perfectly healthy while one watched repo has been quiet for hours and another is flowing normally.
Once CTL-244 lands, catalyst-monitor status --json will expose a webhookTunnel
object ({connected, smeeUrl, lastEventAt, eventCount24h, eventCount24hByRepo}). That
is the structured first diagnostic step and should be checked before reaching for the
recipes above. The diagnostic recipes here are the manual deep-dive when status JSON is
unavailable, insufficient, or contradicts what you expect.
All event.name values are the canonical OTel form that appears on disk. The
authoritative list of actionable names for workers lives in
[[event-name-allowlist]]; the rows below are illustrative filters built from it.
| Need | Filter |
|---|---|
| All GitHub webhook events | .attributes."event.name" \| startswith("github.") |
| All Linear webhook events | .attributes."event.name" \| startswith("linear.") |
| One PR's merge | .attributes."event.name" == "github.pr.merged" and .attributes."vcs.pr.number" == 342 |
| Any push to a branch | .attributes."event.name" == "github.push" and .attributes."vcs.ref.name" == "refs/heads/main" |
| CI completion | .attributes."event.name" \| startswith("github.check_suite.") |
| CI failure for one PR | .attributes."event.name" == "github.check_suite.completed" and .attributes."cicd.pipeline.run.conclusion" == "failure" and (.body.payload.prNumbers // [] \| index(342) != null) |
| Review changes-requested by a bot | .attributes."event.name" == "github.pr_review.submitted" and .body.payload.state == "changes_requested" and .body.payload.author.type == "Bot" |
| Comment from a human on a PR | .attributes."event.name" == "github.issue_comment.created" and (.body.payload.author.type // "User") != "Bot" |
| Linear ticket state change | .attributes."event.name" == "linear.issue.state_changed" and .attributes."linear.issue.identifier" == "CTL-210" |
| Comms message in one channel | .attributes."event.name" == "comms.message.posted" and .body.payload.channel == "orch-foo" |
| Routine worker phase transitions (info-tier, coalesced batches; CTL-229) | .attributes."event.name" == "orchestrator.worker.phase_advanced" |
| Worker terminal transitions (PR-created, merging, done, fail; CTL-229) | .attributes."event.name" == "orchestrator.worker.status_terminal" |
| One worker's terminal events with PR number | .attributes."event.name" == "orchestrator.worker.status_terminal" and .attributes."catalyst.worker.ticket" == "CTL-210" and (.body.payload.pr.number // null) |
| Worker reached terminal state | .attributes."event.name" == "orchestrator.worker.done" or .attributes."event.name" == "orchestrator.worker.failed" |
| PR review activity | (.attributes."event.name" \| startswith("github.pr_review")) or (.attributes."event.name" == "github.issue_comment.created") |
| Deploy outcome | .attributes."event.name" \| startswith("github.deployment") |
| Attention raised in this orchestrator | .attributes."event.name" == "orchestrator.attention.raised" and .attributes."catalyst.orchestrator.id" == "orch-foo" |
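A filter from the table can be generated rather than hand-edited when only the PR number varies. The sketch below is illustrative: `pr_merged_filter` is a hypothetical helper, not part of `catalyst-events`, and it simply prints the "One PR's merge" row with a validated number substituted in.

```shell
# Hypothetical helper (not shipped with catalyst-events): print the
# "One PR's merge" jq filter from the table for a given PR number.
pr_merged_filter() {
  local pr="$1"
  # Reject empty or non-numeric input so it cannot inject jq syntax.
  case "$pr" in
    ''|*[!0-9]*)
      echo "pr_merged_filter: PR number must be numeric" >&2
      return 2
      ;;
  esac
  printf '.attributes."event.name" == "github.pr.merged" and .attributes."vcs.pr.number" == %s\n' "$pr"
}

# Example use (requires the catalyst-events CLI on PATH):
# catalyst-events wait-for --filter "$(pr_merged_filter 342)" --timeout 600
```

When copying rows that contain `\|`, note that the backslash is Markdown table-cell escaping; the jq operator passed to `--filter` is a plain `|`.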
## --timeout semantics

- `wait-for --timeout N` exits 1 after N seconds with no output. The caller decides what to do (usually: run the authoritative one-shot, then either re-invoke `wait-for` or give up).
- `--timeout 7200`. The fallback after timeout re-checks via `gh` and either continues or re-invokes `wait-for`.

The event stream is a single point of failure. Mitigations:

- Pair `wait-for` with a one-shot fallback. No skill prose may say "trust the event stream"; every wait must be paired with an authoritative check.
- `tail -F` the event log; no daemon required for reads.

The event log carries two legacy schemas plus the new canonical shape (CTL-300):

- v1 (`catalyst-state.sh event`): `{ ts, event, orchestrator, worker, detail }`
- v2: `id`, `schemaVersion: 2`, `source`, `scope` (replacing flat `orchestrator` / `worker` with a nested object; v2 still emits the flat fields too as backward-compat aliases).
- Canonical: `attributes."event.name"`, `attributes."vcs.pr.number"`, etc. All new producers emit canonical; filters in this doc target canonical paths.

Filters that read `.attributes."vcs.repository.name"` / `.attributes."vcs.pr.number"` / `.attributes."linear.issue.identifier"` only match canonical envelopes. Filters that read `.attributes."event.name"` work for canonical; `.event` / `.worker` / `.orchestrator` work for v1/v2. Choose based on which sources you need to match: webhook events use canonical, orchestrator events may still use v1/v2.
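When a consumer has to read both canonical and v1/v2 lines, one option is to try the canonical path first and fall back to the legacy field. The sketch below assumes `jq` is installed; `event_name` is a hypothetical helper, and it only extracts the name. It does not assume that legacy and canonical producers use the same event names.

```shell
# Hypothetical helper: read JSONL on stdin and print each line's event name.
#   canonical envelope -> .attributes."event.name"
#   v1/v2 envelope     -> .event
# Lines carrying neither field produce no output.
event_name() {
  jq -r '.attributes."event.name" // .event // empty'
}

# Example use against a log file of your choosing:
# event_name < "$HOME/catalyst/events/YYYY-MM.jsonl" | sort | uniq -c
```

The `//` alternative operator returns its right-hand side when the left is `null` or `false`, so a v1/v2 line with no `attributes` object falls through to `.event`.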
catalyst-events tail [--filter <jq>] [--since-line <N>]
catalyst-events wait-for [--filter <jq>] [--timeout <sec>]
# Exit codes:
# 0 wait-for: matched a line (printed to stdout)
# 1 wait-for: timed out
# 2 usage error
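The exit codes above are enough to drive the wait-then-verify pattern. The following is a minimal sketch, not the implementation used by any Catalyst skill: `wait_or_check` is a hypothetical wrapper, and the authoritative check is whatever command the caller passes in.

```shell
# Hypothetical wrapper: block on wait-for, then run an authoritative
# one-shot check if the wait times out.
# Usage: wait_or_check <jq-filter> <timeout-sec> <check-command> [args...]
wait_or_check() {
  # Require a check command so a timeout can never be mistaken for success.
  [ "$#" -ge 3 ] || return 2
  local filter="$1" timeout="$2" rc=0
  shift 2
  catalyst-events wait-for --filter "$filter" --timeout "$timeout" || rc=$?
  case "$rc" in
    0) return 0 ;;       # matched; the event line was printed to stdout
    1) "$@" ;;           # timed out; the check's exit status becomes ours
    *) return "$rc" ;;   # 2 = usage error; surface it unchanged
  esac
}

# Example with a caller-defined check (pr_is_merged is illustrative):
# pr_is_merged() { [ "$(gh pr view "$1" --json state --jq .state)" = "MERGED" ]; }
# wait_or_check '.attributes."event.name" == "github.pr.merged"' 600 pr_is_merged 342
```

A caller that wants to keep waiting can loop on a non-zero return and re-invoke the wrapper; the sketch leaves that policy to the caller.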
Environment:

- `CATALYST_DIR`: base directory (default `$HOME/catalyst`)
- `CATALYST_EVENTS_DIR`: events directory (default `$CATALYST_DIR/events`)
- `CATALYST_EVENTS_FILE`: override path entirely (used by tests)

- `merge-pr` Phase 6: uses Pattern 3 (reactive PR lifecycle, CTL-228)
- `create-pr` Step 12: uses Pattern 3 (reactive PR lifecycle, CTL-228)
- `oneshot` Phase 5: worker exits at `merging`; long-lived watchers (orchestrator Phase 4, standalone `/merge-pr`) consume Pattern 3 on its behalf
- `orchestrate` Phase 4: uses `Monitor` over `tail` with a disjunctive filter; the long-lived precedent for Pattern 3
- `catalyst-comms`: agent-to-agent pub/sub on per-channel files; `comms.message.posted` fan-out events go through this same log
- `catalyst-broker` daemon protocol (auto-correlation via `agent.checkin`, deterministic `pr_lifecycle` / `ticket_lifecycle` / `comms_lifecycle` routes). Preferred wake mechanism when running; collapses the per-concern jq filters in the recipes below into a single `filter.wake.{id}` per agent (CTL-303, CTL-371)
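The precedence among the three environment variables listed under Environment can be sketched as a small resolver. `events_file` is a hypothetical helper, and using the local clock for the `YYYY-MM` file name is an assumption; the CLI may compute the month differently.

```shell
# Hypothetical helper: print the path of the current month's event log,
# applying the documented defaults in order of precedence.
events_file() {
  local base="${CATALYST_DIR:-$HOME/catalyst}"
  local dir="${CATALYST_EVENTS_DIR:-$base/events}"
  # CATALYST_EVENTS_FILE overrides the computed path entirely.
  # Assumption: the month in the file name follows the local clock.
  printf '%s\n' "${CATALYST_EVENTS_FILE:-$dir/$(date +%Y-%m).jsonl}"
}

# Example: follow the raw log without the daemon.
# tail -F "$(events_file)"
```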