cmux-agents

Orchestration layer for AI agents in cmux panes. Low-level pane operations (splits, reads, sends) use cmux MCP tools — this skill handles the workflow on top.

Default Lifecycle (2026-04-19)

Visible worker peers now use the agent-based cmux MCP tools by default:

mcp__cmux__spawn_agent({repo, cli, prompt, workspace?, parent_agent_id?})
Capture the returned agent_id immediately and write it into your collab / AGENT_REGISTRY.
mcp__cmux__wait_for({agent_id, target_state, timeout_ms}) for lifecycle gates.
mcp__cmux__send_to_agent({agent_id, text, press_enter}) for follow-ups.
mcp__cmux__get_agent_state({agent_id}) for a combined registry + parsed-screen snapshot.
mcp__cmux__list_agents(...) or mcp__cmux__my_agents(...) whenever topology changes.
mcp__cmux__stop_agent({agent_id}) to stop or recycle a worker.

Reference agents by agent_id, not surface index. Surface IDs drift after crashes, respawns, and pane reuse; agent_id is the durable handle. Surface refs still matter for raw inspection (read_screen) and non-agent panes (new_surface, browser tabs), but not for the default worker lifecycle.

repoGolem Launcher Contract

Default path is still spawn_agent. When FR-01, non-agent terminals, or recovery flows force a manual launcher, use the repoGolem one true form:

{repo}{Tool} -s "prompt"

brainlayerCursor -s "audit X"      # gather/read-only
brainlayerCodex  -s "fix Y"        # implement
brainlayerClaude -s "coordinate Z" # orchestrate
brainlayerGemini -s "visual task"  # visual/OCR

The launcher already handles cd, MCP wiring, env vars, iTerm profile, 1Password-backed secrets, and tab title. Do not add cd ~/Gits/..., MCP_CONNECTION_NONBLOCKING=1, CLAUDE_CODE_NO_FLICKER=1, source ~/.zshrc &&, raw cursor/codex/claude, or --fast. See /repogolem for launcher flags and /agent-routing AP11 for the incident history.

Model + Headless Gates (2026-06-05)

repoGolem launchers are the only authoritative model source for visible cmux pane workers. Do not pass model to cmux spawn_agent, and do not bake stale model strings into launcher commands. Stale overrides silently downgraded real panes (including a Claude 1M-context pin falling back to 200K, and stale Codex strings while the pane was actually on a newer default). Internal ephemeral subagents: see /agent-routing AP11 (canonical model-param rule).
Agent sessions are interactive. Do not use --print, repoGolem -p, or other headless/print modes for workers, leads, or verification gates. Run a labeled interactive pane and harvest the result from the live session.

Worker vs Audit Placement

| Intent | Placement | Why | |--------|-----------|-----| | Worker: code, implementation, collab | Split in current workspace | Visible beside the lead session; easy to monitor and nudge | | Audit/research: read-only analysis, perspective, discovery | Separate named workspace | Keeps the working view uncluttered and easy to find in the sidebar |

Name audit/research workspaces clearly, e.g. Audit: golems or Research: auth. Use same-workspace down splits only when the user explicitly asks to keep the audit visible next to the lead.

Current Caveats

FR-01 — hyphenated repo names can mis-resolve to the wrong launcher

Known live failure: spawn_agent({repo:"skill-creator"}) can guess skill-creatorClaude instead of the real repoGolem launcher skillcreatorClaude. The immediate workaround lives in /repogolem; verify the launcher there before relying on spawn_agent for a new hyphenated repo.

FR-06 — state parser ambiguity (`booting` vs idle-with-spinner)

wait_for and get_agent_state depend on cmux's parser. In rare cases the registry still says booting while read_screen shows a perfectly usable prompt. If wait_for times out but the raw pane clearly shows the agent is ready, trust read_screen, file a cmuxlayer bug, and use one surface-level fallback send only if you must unblock delivery.

First-run Touch ID note

First launch of a newly signed launcher binary can still trigger a one-time Touch ID prompt. Treat this as a first-run caveat, not a primary orchestration strategy.

Completion Signals — file > wait_for > get_agent_state > read_screen

Why this exists: surface-based polling burned hundreds of read_screen calls and still missed real state transitions. The new default is event-driven waits on stable agent_ids, with raw screen reads reserved for ambiguity or output extraction.

Ranked reliability of completion signals (use the highest that fits the job):

| Signal | Reliability | When to use | |--------|-------------|-------------| | 1. Output file with DONE marker | Ground truth. File either exists + contains marker, or not. | Every multi-minute autonomous worker task. | | 2. wait_for({agent_id, target_state:"done"}) | Default event-driven lifecycle gate. | Standard worker completion, especially when you already have the agent_id. | | 3. get_agent_state({agent_id}) | Good current snapshot of registry state + parsed screen. | Quick checks without a full raw read. | | 4. read_agent_output / read_screen | Best adjudicator when parser and pane disagree. | FR-06, output extraction, or manual troubleshooting. | | 5. list_agents / my_agents | Discovery only. | "What is alive?" not "is this task complete?" |

Visual Proof / Screenshots

When Etan asks to see something, deliver a Computer Use screenshot. read_screen is text inspection, not a screenshot. After interactive probes (typing into a pane, reconnecting MCPs, choosing a model/menu option), screenshot proactively when the result is visual or user-facing.

Before pressing Enter in any TUI menu, verify the highlighted row first. Use a Computer Use screenshot when the user needs to see the state; use read_screen only when the terminal text clearly exposes the selected row. If the selection is ambiguous, stop and inspect instead of pressing Enter blindly.

The file-based completion pattern (copy this)

Every autonomous worker task must end with a file write and a DONE marker:

# In the worker's prompt:
Write your report to /path/to/output/batch-WORKER.md. The last line of the
file must be exactly: DONE_WORKER_NAME

Orchestrator side, poll the file(s) until all expected outputs exist and each contains its DONE marker:

# run_in_background: true
until [ -f "$PLAN/batches/batch-M1.md" ] && grep -q DONE_MINER_M1 "$PLAN/batches/batch-M1.md" 2>/dev/null \
      && [ -f "$PLAN/batches/batch-M2.md" ] && grep -q DONE_MINER_M2 "$PLAN/batches/batch-M2.md" 2>/dev/null \
      ; do
  sleep 30
done
echo ALL_MINERS_DONE

When the background command completes, you know every miner finished AND wrote a real file (not a partial crash). This survives cmux state-sync bugs, splash-screen false-idles, and pane freezes.

FR-06 fallback when parser and screen disagree

If wait_for times out or send_to_agent rejects with current state: booting, do this in order:

get_agent_state({agent_id})
read_screen(surface: "...", lines: 20) on the linked surface
If the raw pane shows a prompt, trust the pane and file a cmuxlayer bug
If work is blocked, use one surface-level send_input + send_key("return") fallback, then return to the agent_id flow as soon as the registry catches up

The rule is not "surface sends are normal again." The rule is: agent_id is the source of truth; raw pane sends are an escape hatch for parser drift.

STOP — Before Anything Else

1. Spawn workers with mcp__cmux__spawn_agent. No hand-rolled repoGolem typing unless FR-01 forces a temporary launcher fallback.

2. Store the agent_id immediately. Every follow-up, wait, stop, and handoff should key off the durable id.

3. Re-discover live workers with list_agents / my_agents after crashes or layout changes. Do not trust a remembered surface number.

4. AGENT_REGISTRY — maintain after CLAUDE_COUNTER in every response with active agents:

AGENT_REGISTRY:
| Agent ID | Surface | Repo | Task | Status | Last Check |
|----------|---------|------|------|--------|------------|
| agent:abc123 | surface:153 | golems | Digest failures | WORKING | 12:35 |

Add on spawn. Update on check. Remove on kill.

MCP Primitives (use these for low-level ops)

| Operation | MCP Tool | Notes | |-----------|----------|-------| | Spawn worker | mcp__cmux__spawn_agent | Default for visible Claude/Codex/Cursor/Gemini/Kiro peers | | Send follow-up | mcp__cmux__send_to_agent | Replaces send_input + send_key for normal agent messaging | | Wait for state | mcp__cmux__wait_for | Replaces client-side poll loops | | Discover workers | mcp__cmux__list_agents / mcp__cmux__my_agents | Use after topology changes; agent_id survives surface drift | | Inspect state | mcp__cmux__get_agent_state | Registry + parsed screen snapshot | | Stop worker | mcp__cmux__stop_agent | Replaces closing a pane to kill a worker | | Read raw pane | mcp__cmux__read_screen | Fallback for FR-06 or output extraction | | Extract marker output | mcp__cmux__read_agent_output | Preferred when you set ---OUTPUT_START--- / ---OUTPUT_END--- | | Create non-agent tab | mcp__cmux__new_surface | Browsers, shell scratch pads, log tails | | Rename / status / progress | mcp__cmux__rename_tab, mcp__cmux__set_status, mcp__cmux__set_progress | UI polish for the visible pane | | Open browser | mcp__cmux__browser_surface | Non-terminal surfaces | | Surface-level escape hatch | send_input, send_key | Only when parser drift blocks the agent channel |

Use spawn_agent for full worker lifecycle. Keep raw surface tools for non-agent panes and FR-06 recovery.

Response Schemas (PR #76, 2026-04-17 — `list_surfaces`)

Why this shipped: An earlier mining sweep of 10 orcClaude sessions found list_surfaces burned 117,000 tokens across 49 calls (avg 2,387 / max 8,333 per call). Root causes: a duplicate bug (surfaces in workspace:N appeared N times in surfaces[]) and per-entry bloat (screen_preview 51%, UUIDs 15%, full remote blob 11% even for local-only workspaces). PR #76 fixed both: deduped surfaces[] unconditionally, and condensed the default response with opt-in backward compatibility via verbose: true. Default payload is ~89% smaller than the pre-PR shape.

Default response (no verbose):

{
  "ok": true,
  "workspaces": [
    {
      "ref": "workspace:1",
      "title": "🎯 orcClaude",
      "current_directory": "/Users/etanheyman/Gits/brainlayer",
      "remote_state": "local"
    }
  ],
  "surfaces": [
    {
      "ref": "surface:35",
      "title": "orcClaude",
      "type": "terminal",
      "workspace_ref": "workspace:1"
    }
  ]
}

remote_state values:

"local" — no SSH/proxy/daemon hints; ordinary macOS workspace. The common case.
"connected" — SSH session is up and connected.
"disconnected" — remote configured but currently offline.
"unavailable" — remote partially configured but daemon/proxy state is missing.

Pass verbose: true to restore the full historical schema — workspace id (UUID), index, pinned, selected, listening_ports, full remote blob (daemon / proxy / heartbeat / ports), plus every per-surface field (id, pane_id, index_in_pane, selected_in_pane, focused, window_ref, pane_ref). Dedup is still applied in verbose mode — it's a correctness fix, not opt-in.

When to pass verbose: true:

Debugging SSH workspace state (need daemon / proxy / heartbeat detail).
Port-forwarding workflows (need listening_ports / forwarded_ports).
UI layout reasoning that needs focused / selected_in_pane / index.
Migration testing against code that still reads old fields.

When the default is enough (99% of agent work):

Routing: "which surface is worker X on?" → need ref + title + workspace_ref.
Layout checks: "are there 2 panes in this workspace?" → need ref count.
Identifying a target for send_input / read_screen → only need surface:N ref.
Status reports: "list all active workers" → ref + title suffice.

Pre-PR gotcha now fixed: workspace_ref is no longer echoed at the top level of the response unless you explicitly passed workspace: "workspace:N" as a filter. Callers relying on the top-level echo must either pass a filter or read it per-surface.

spawn_agent crash_recover + session auto-capture (PR #77, 2026-04-17)

Why this shipped: When a worker's PTY died unexpectedly (shell exit, mac crash, cmux daemon restart), the agent record was orphaned with no way to resume. PR #77 adds an opt-in recovery loop driven by boot-time session-ID capture.

New crash_recover: boolean param on spawn_agent (default false). When true:

Boot capture window — for the first 30 seconds after spawn, the engine scans up to 80 lines of terminal output on each sweep tick for a UUID pattern ([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}). This is what every major CLI prints as its session header (claude --session-id, codex session, cursor agent --session). First UUID seen wins — the engine does not currently match against CLI-specific context markers, so any earlier UUID in boot output (e.g., a log line printing a workspace UUID, a trace ID, or a dependency version string) can steal the slot. Engine persists the first-seen UUID as cli_session_id on the agent record.
Crash detection — when the PTY dies with a recoverable error (surface disappeared, shell exited unexpectedly) AND crash_recover=true AND user_killed !== true AND cli_session_id is set, the next sweep attempts respawn.
Respawn path — new surface created, same launcher with the per-CLI resume command appended: <repo>Claude -s --resume <id> / <repo>Codex -s resume <id> / cursor agent --session <id>. Up to MAX_RESPAWN_ATTEMPTS = 10 attempts before giving up.
user_killed guard — if the user explicitly killed the agent via stop_agent (userInitiated=true, the default), crash_recover will NOT respawn it. Context-limit evictions pass userInitiated=false, keeping the agent eligible for recovery.

When to pass crash_recover: true:

Long-running Codex workers on PR loops (>30 min expected).
Cursor audit workers that need to survive a cmux daemon restart.
Claude workers executing multi-hour sprints where mac-crash resilience matters.
ANY worker in an overnight / autonomous context where you won't be watching.

When NOT to pass it:

Short-lived helpers (<5 min) where a crash means "task failed, start over".
Workers in ambiguous session-ID formats (older CLIs without UUID headers).
Debugging sessions where a crash IS the signal to investigate.

Caveat — the session-ID heuristic is regex-based. If a worker's early output happens to contain a UUID that isn't the CLI session (e.g., a log line printing a workspace UUID, a trace ID, a dependency version), the wrong ID gets captured and recovery will silently fail later.

Verification procedure for critical spawns:

Within 60 seconds of spawn: get_agent_state(agent_id: "...") → check cli_session_id is not null.
Cross-check against the CLI's own session display via read_screen:
- Claude: look for Session: or --session-id in the header.
- Codex: look for session in the splash / first turn.
- Cursor: look for --session in the boot log.
If cli_session_id does not match the on-screen session — or if it's null after 60s (capture window expired) — call stop_agent(userInitiated: false) to keep recovery eligibility, then respawn fresh and re-verify.
For autonomous / overnight workers: fold this check into your spawn script and fail loudly on mismatch. A wrong cli_session_id means crash_recover will respawn with the wrong --resume argument and the new session won't have your context.

Related new agent-record fields (visible via get_agent_state / list_agents):

cli_session_id — captured UUID, or null if capture window expired before a match.
respawn_attempts — count of recovery attempts made (0 to 10).
user_killed — true if user explicitly stopped via stop_agent.
workspace_id — owning workspace UUID (persisted for cross-session recovery).

Distinct from the -c continue flag on repoGolem launchers. -c resumes an exited session that the user re-launches manually. crash_recover handles unexpected PTY death mid-task, automatically. Use -c when you stop a worker and want to come back to it later; use crash_recover: true when you want the engine to auto-recover without your intervention.

Spawning Agents — launchers only, NEVER a model param (2026-06-05)

All cmux agent sessions inherit their model from the repoGolem launcher. Claude LEADs run on Opus 4.8 1M, pinned by the repoGolem launcher (golems #446). Spawning them any other way breaks the pin:

orc spawns Claude LEADs via proper repoGolem launchers — the focus-first bundle (new_split → send_command("<repo>Claude -s") → focus → read_screen verify model + context_window). NOT via spawn_agent with a model param.
NEVER pass model to cmux spawn_agent for visible pane workers. The param overrides launcher pins and can silently downgrade the pane. If a different model is genuinely needed, fix the launcher or use the CLI's in-session model switch; never the spawn param. Exception (stated once in /agent-routing AP11): internal ephemeral subagents spawned inside another agent session (Task tool / in-session child) may request Spark via explicit -m gpt-5.3-codex-spark or equivalent model field — that is NOT a cmux pane worker spawn.
Always verify post-boot: read_screen parsed model + context_window must show the 1M pin (1,000,000), not the 200K default. Wrong → fix with /model before sending the mission.
LEADs run their own monitor loops on their workers (see /agent-routing LEAD TOPOLOGY) and communicate via metacommlayer collabs (dispatch_to_agent / inbox_check + collab files), not ad-hoc pane sends.

Spawning Agents (MANDATORY: use `spawn_agent`)

mcp__cmux__spawn_agent({
  repo: "golems",
  cli: "codex",
  prompt: "Fix search ranking"
})
→ { agent_id, ... }

mcp__cmux__wait_for({ agent_id, target_state: "ready", timeout_ms: 120000 })
mcp__cmux__send_to_agent({ agent_id, text: "Narrow scope to ranking only", press_enter: true })
mcp__cmux__wait_for({ agent_id, target_state: "done", timeout_ms: 1800000 })

Options: repo, cli, prompt, workspace, parent_agent_id

Do not pass model params. The repoGolem launcher is the source of truth.

Examples:

spawn_agent({ repo: "golems", cli: "claude", prompt: "Survey search regressions" })
spawn_agent({ repo: "golems", cli: "cursor", prompt: "Audit code quality and write findings to /tmp/audit.md" })
spawn_agent({ repo: "orchestrator", cli: "gemini", prompt: "Survey patterns and summarize in 5 bullets" })
spawn_agent({ repo: "golems", cli: "kiro", prompt: "Draft a concise troubleshooting note" })

Hyphenated repo names: if the repo contains -, verify the real repoGolem launcher in /repogolem first (FR-01). The doc warning is there because cmuxlayer can still mis-resolve the launcher name.

Auto-workspace-categorization by launcher root

When spawning a visible pane via mcp__cmux__new_split + sending a launcher command, the pane MUST land in the workspace whose name contains (case-insensitive substring) the launcher's repo root. Examples:

coachClaude -s → workspace whose title contains "Coach"
orcClaude -s → workspace whose title contains "orc" (or workspace:2 by default)
voicelayerCodex -s → workspace whose title contains "voicelayer"
brainlayerCodex -s → workspace whose title contains "brainlayer" or "orc-buddy"

Lookup protocol BEFORE new_split:

Identify the launcher (coachClaude, orcClaude, etc.).
Extract the repo root (coach, orc, voicelayer, brainlayer).
Call mcp__cmux__list_surfaces() to enumerate live workspaces.
Pick the workspace whose title contains the repo-root substring (case-insensitive).
Pass that workspace ref to new_split.
After spawn, verify via list_surfaces that the new surface ended up in the intended workspace. If not, IMMEDIATELY call mcp__cmux__move_surface(surface, workspace) to fix.

Why post-spawn verify is mandatory: the current new_split MCP call's workspace arg is advisory — actual placement may follow current focus, not the arg. Until cmux MCP enforces the arg, agents MUST verify + move post-spawn. Skipping the verify step is how the live 2026-05-17 incident happened.

Evidence: Live 2026-05-17 ~02:32 IDT — skillCreator s:3 spawned 8 Batch D eval panes via new_split(workspace:"workspace:1"); 4 coach-launcher panes landed in workspace:2 anyway. orc had to manually move_surface to recover. Cost: ~5 minutes of orc context burning on layout repair instead of dispatch.

spawn_agent({workspace}) already routes correctly in most cases — this rule specifically covers the lower-level new_split path used when you need a non-agent terminal or a launcher invocation that isn't a cli enum value.

NEVER Use These Commands (Common Mistakes)

Root cause (April 5 2026): mehayomClaude used cursor-cli in a cmux pane — command not found. Wasted a surface and debugging time.

| WRONG | RIGHT | Why | |-------|-------|-----| | cursor-cli "prompt" | cursor agent --trust "prompt" | cursor-cli does not exist as a command | | cursor --print --output-format text | cursor agent --output-format text | --print is not a cursor flag | | cursor agent --output-format (in cmux) | cursor agent --trust (in cmux) | Interactive cmux agents need --trust for permissions, not --output-format text which is for piped/batch output | | cursor "prompt" (bare) | <repo>Cursor -s launcher | Launcher functions handle cd, env, permissions | | <repo>Codex -s -p "prompt" | <repo>Codex -s interactive pane, then send prompt | repoGolem -p/print mode is not for agent sessions or verification gates |

For visible Cursor agents: Use spawn_agent({cli:"cursor"}) from the parent orchestrator. These workers should boot as addressable agent_ids, not as manually typed launchers. For batch/piped output: Use cursor agent --output-format text --model "<cursor-max-mode-model>" "PROMPT" directly. Substitute Cursor's current Max Mode model ID (verify via cursor agent --help or the Cursor changelog — IDs change between versions). Default to no --model flag (Auto) per /agent-routing AP3. For repoGolem launchers: Use <repo>Cursor -s (e.g., golemsCursor -s, brainlayerCursor -s). The -s flag skips permissions.

Cursor `/auto-run` fallback

Default path: put the task in spawn_agent.prompt and talk to the worker via send_to_agent({agent_id,...}).

If a live Cursor pane still stalls on approvals, send /auto-run as a follow-up first:

wait_for({ agent_id, target_state: "ready", timeout_ms: 120000 })
send_to_agent({ agent_id, text: "/auto-run", press_enter: true })
send_to_agent({ agent_id, text: "<task prompt>", press_enter: true })

If FR-06 blocks send_to_agent, confirm the raw pane with read_screen and use one surface-level fallback send.

Task Routing

| Need | Best CLI | Why | |------|----------|-----| | Deep reasoning, multi-file | Claude | Best reasoning, MCP, native worktrees | | Codebase-wide audit | Cursor | @codebase indexing, text output | | Fast structured output | Codex | Output contracts | | Large context research | Gemini | 1M tokens, free | | Quick PRs | Codex | Fast, low overhead | | Big refactors | Claude | --worktree, session resume | | Cross-model verification | Claude + Cursor | Different blind spots |

Agent Selection Guide

Use /agent-routing for who should do the work, and /repogolem for the exact launcher flags. This table is the cmux placement/CLI quick reference:

| Task | Launcher form | Notes | |------|---------------|-------| | Plan auditing, perspectives | {repo}Cursor -s | Read-only gather; default/Auto model, no model flag | | Code review, codebase analysis | {repo}Cursor -s | Uses codebase context; verify findings before acting | | Quick research, comparisons | {repo}Gemini -s | Large context and visual/OCR strength | | Parallel implementation | {repo}Codex -s | Default Codex model; override only when asked | | Collab agent writing to collab file | {repo}Claude -s | Use Sonnet for audit collabs when appropriate | | Deep reasoning, architecture | {repo}Claude -s | Opus/default only when the task needs it |

Task type routing (R28 — invoke /agent-routing):

| Task Type | Route To | NOT To | |-----------|----------|--------| | SQL queries, grep, file scanning | Cursor (read-only) | Claude (burns context) | | Code changes, bug fixes, PRs | Codex | Cursor (read-only!) | | Orchestration, decisions, monitoring | Claude | Cursor or Codex |

When spawning agents for a collab: assign Cursor workers for data gathering and Codex workers for implementation. The Claude agent coordinates. If a Claude agent is burning context on SQL or grep work — that's a routing violation.

Full CLI syntax and capabilities: adapters/ directory + adapters/capabilities.yaml.

Post-Restart Truth-vs-Display (2026-06-06)

After ANY cmux restart (daemon bounce, Mac wake, manual relaunch), every lead follows the proven checkpoint pattern:

checkpoint (record agent_ids + last-known state)
  → restart event
  → VERIFY (list_agents + read_screen per worker)
  → report liveness FROM EVIDENCE ONLY

Rules:

list_agents / registry alone is not enough — pair with read_screen scrollback on each worker you report as alive.
Pre-restart claims ("Codex s:63 still working") are stale until re-verified. Operator-direct catch: "I don't see any worker of yours working still."
Status reports to collab/Etan must cite what read_screen showed, not what you remembered before the restart.
Claims-lag window is real (fleet may repopulate over minutes) — say "unverified post-restart" until VERIFY completes.

Worker-Brief Authoring Contract (2026-06-06)

Every dispatch brief MUST satisfy all three before send (see workflows/prompt-audit.md §8):

Absolute verified paths — inline the full path to every artifact the worker must read, especially the collab file. Never "collab file above" with no path; workers lose orientation without it.
Truthful environment — state what exists (worktree path, branch, deps, Playwright, .mcp.json). If the worker must create something, say "create it yourself" with the exact command — never imply a promised path/deps exist when they don't.
Live data contracts — before describing schemas, field names, or queue shapes in the brief, verify them against the live artifact (Read / grep / query). Five brief-content drift incidents in one evening came from stale or invented data shapes.

Pre-Spawn Prompt Checklist

Every agent task prompt MUST include:

Max output length ("2-3 sentences" / "under 200 words")
Output format ("use this template: ...")
Audience — internal or client-facing (agents leak investigation details if not told)
What NOT to include — internal deliberation, false alarms
Done signal ("end with DONE_SIGNAL_NAME on its own line, right before CLAUDE_COUNTER")
Response markers — "Wrap your final deliverable in ---RESPONSE_START--- and ---RESPONSE_END--- markers"

Full audit: workflows/prompt-audit.md

Prompt Size Ceiling — `send_input` freezes surfaces above ~2000 chars

Why this exists: A 72-hour JSONL sweep (2026-04-12 → 2026-04-15) across 5 Claude Code orchestrator sessions found that send_input calls over 2,000 characters correlated strongly with surface freezes. In one Mehayom-app session: 868 total send_input calls, 13 exceeded 2,000 chars (max 4,382), and those 13 large calls preceded 6 frozen / surface-unresponsive incidents. In a control session (coach repo) every send_input stayed under 1,900 chars and there were zero freezes. The coach control case is the proof: discipline eliminates the symptom.

Direct evidence (from the Mehayom-app session):

"You're right — the cmux surface was frozen, my send_input commands weren't taking effect (sent 4 commands, none showed up). I burned ~3 minutes trying before deciding to implement directly rather than waste more time debugging cmux."

The rule (hard cap + warn)

| Payload length | Action | |----------------|--------| | < 1,500 chars | Send directly. Safe zone. | | 1,500–1,800 | Warn threshold. Trim if possible, then send. | | ≥ 1,800 | HARD CAP. Do NOT send_input directly. Use the file-based handoff pattern below. |

1,800 chars gives a ~17% safety margin under the smallest observed freeze point (~2,100 chars). Above the cap, surfaces freeze silently — send_input still returns ok:true but the target pane never sees the input.

File-based handoff pattern (for prompts > 1,800 chars)

Three steps. The first one is a shell command you run via the Bash tool. The next two are cmux MCP calls.

Step 1 — write the long prompt to disk via Bash (or the Write tool):

printf '%s' "$LONG_PROMPT" > /Users/etanheyman/Gits/orchestrator/collab/surface-N-$(date +%s).md

Prefer the Write tool when the prompt is already in-context — it avoids shell quoting pitfalls on multi-line prompts.

Step 2 — type the cat command into the target surface, then press Return:

send_input(surface: "surface:N", text: "cat /Users/etanheyman/Gits/orchestrator/collab/surface-N-<stamp>.md")
send_key(surface: "surface:N", key: "Return")

send_input on its own only TYPES the text — without send_key("Return") the command sits on the shell line unexecuted. This is the same execution contract used everywhere else in this skill.

Step 3 — verify the handoff landed: after a few seconds, read_screen the surface and confirm you see the file contents (or the agent's response to them). If the pane is frozen despite the short pointer, the file is still safe on disk — spawn a fresh surface and re-run Step 2 against the new surface.

Why the file pattern beats splitting the prompt into multiple sends:

Splitting into N sub-2000 chunks still hits the freeze path if any chunk pushes the pane state over some cumulative limit, and loses atomicity (the agent sees a partial prompt).
The file is independent of the pane's mutable state. If the pane freezes, spawn a new one and re-run the cat — the prompt is preserved.
File handoff also survives pane crashes, restarts, and compactions.

On a failed send_input: check the surface before retrying

A retry without root-cause investigation is almost always wrong. After any send_input that appears to have been lost (no output, no activity, retry instinct kicking in):

read_screen(surface: "surface:N", lines: 10) — is the pane frozen, scrolled back, or just slow?
If frozen → follow Step 3 of Surface Health Check (close, respawn, salvage).
If the payload you just sent was near or above the 1,800-char cap → that was the root cause. Don't retry at the same size; switch to file-based handoff.
If unsure → list_agents / my_agents to confirm the worker still exists.

Do not fire a second send_input of the same large payload hoping it "works this time." It won't, and you'll burn the same 3 minutes the earlier session logged.

Boot + deliver: FOCUS-FIRST (the reliable bundle for "send the prompt once booted")

Root cause (Etan, recurring — 2026-05-30): a cmux pane/agent does NOT initiate until its workspace/pane is FOCUSED. So boot_prompt_path and a bare wait_for({target_state:"ready"}) on an unfocused pane NEVER RESOLVE — they hang on a ready-state that can't arrive because the agent hasn't started. Etan: "It never fucking resolves — focus the pane for 3 seconds, then read, then if ready send the prompt." Do not lean on boot_prompt_path / blind long wait_for to deliver a boot prompt. Focus is the missing precondition.

The reliable bundle (proven spawning orc-gen-12 on 1M, 2026-05-30):

1. CREATE the pane:   new_split({direction, role, workspace, focus:true})  → surface
                      (focus:true is REQUIRED — role-based tab placement rejects focus:false)
2. LAUNCH (no boot_prompt_path):  send_command({surface, command:"<repo>Claude -s [-m '<model>']"})
                      (quote any model with brackets, e.g. -m 'claude-opus-4-8[1m]' — zsh globs [1m] otherwise)
3. FOCUS so it initiates:  select_workspace({workspace})  +  `cmux focus-pane --pane <pane>`
4. WAIT ~3s, then READ:  read_screen({surface, parsed_only:true})
                      → confirm status ready/idle/working AND verify `model` / `context_window`
                        (catch the L4 default: 200K vs the 1,000,000 you asked for)
5. IF READY, deliver:  send_command({surface, command:"<the prompt>"})  — pane is focused, it lands.

Verify the model in step 4, not just readiness. A launcher can boot Opus 4.8 at the 200K default when you wanted [1m] (1,000,000). read_screen parsed model
- context_window is the gate; if wrong, fix with /model in-session before sending.
boot_prompt_path is for already-focused/foreground spawns only. When you're driving from another pane, it will hang — use this focus-first bundle instead.
Upstream fix (route to cmuxLayer-LEAD): boot_prompt_path / wait_for should auto-focus the target pane before waiting for readiness — then this 5-step bundle collapses back into a single reliable primitive. Until then, focus-first is mandatory.

Composer-Wedge Runtime Doctrine (2026-06-06)

Why this exists: boot_prompt_delivered, submit_verified, working-status, and token_count have all returned false positives — including submit_verified:true on a Claude pane with unsubmitted text still sitting after the › prompt marker. Doc edits alone did not stop this class: seven observer-window catches occurred all post-#478-merge (tallies vary 3/4/5/7 across observers — no canonical ledger). This section is mitigation doctrine; the cure is the cmuxlayer composer-is-empty code fix (dispatched separately as D6).

Untrusted submit signals — treat ALL of these as hints only, never ground truth:

boot_prompt_delivered
submit_verified
parsed working / idle / ready status from registry
token_count (including phantom climbing counts on never-started sessions)

The only reliable submit check: prompt text visible in SCROLLBACK above the working line. Text still sitting after the › prompt marker = unsubmitted, regardless of what any delivery field says.

Mandatory post-dispatch read: read_screen (or get_agent_state + scrollback read on dispute) ≤15 seconds after EVERY dispatch — boot prompt, follow-up, or send_to_agent.

Anomaly triggers — force a full scrollback read:

'Queued follow-up inputs' visible on a worker that registry says is working
Token count unchanged across two reads after a send
Cost/token meter not moving while status says working

Idle Codex sends: when the pane is genuinely idle (at › / $ prompt), use send_command (atomic) or send_input + verified status flip — never fire-and-forget without the ≤15s read.

Surface Health Check (MANDATORY — before ANY prompt delivery)

Why this exists: Frozen/dead surfaces are a top-3 frustration source. Agents send prompts to unresponsive surfaces, losing work and burning ~3 minutes per incident on manual fallback. send_input returns ok:true even on frozen terminals.

Step 1 — After spawning a worker: wait_for({agent_id, target_state:"ready", timeout_ms:120000}) is the default health gate. If it times out, inspect the raw pane with read_screen. Two failures = dead worker — stop it and respawn. If wait_for hangs to timeout and the pane looks un-booted, the pane is probably UNFOCUSED — see "Boot + deliver: FOCUS-FIRST" above; focus it, then re-check.

# After spawn_agent → got agent_id + linked surface
wait_for({ agent_id: "agent:abc123", target_state: "ready", timeout_ms: 120000 })
# ✅ Ready/working state = worker booted
# ❌ Timeout: read_screen(surface: "...", lines: 10) to check for FR-06 parser drift

Step 2 — Before sending ANY prompt to an existing surface: Always read_screen first to confirm the surface is responsive. Never fire-and-forget a send_input without checking state first.

# Before sending prompt to surface:N
read_screen(surface: "surface:N", lines: 5)
# ✅ Responsive: shows shell prompt, cursor output, or agent activity
# ❌ Unresponsive: blank screen, no change from last check, or error output

Step 3 — Unresponsive surface recovery: If a surface fails 2 consecutive read_screen checks (blank, frozen, or no shell prompt):

read_screen(surface: "surface:N", lines: 80, scrollback: true) — salvage any partial work
brain_store any salvaged progress
stop_agent({agent_id})
spawn_agent({...same task...}) — create a fresh worker
Re-verify with Step 1 before sending the prompt

Step 4 — Safe prompt delivery (special character escaping): Never send special characters (backticks, quotes, markdown formatting) directly via send_input. They get interpreted by the shell and corrupt the prompt.

# WRONG — backticks and quotes break in send_input:
send_input(surface: "surface:N", text: "Fix the `processQueue` function")

# RIGHT — use heredoc pattern:
send_input(surface: "surface:N", text: "cat <<'PROMPT_EOF' | clipboard\nFix the processQueue function\nPROMPT_EOF")

# RIGHT — escape special characters:
send_input(surface: "surface:N", text: "Fix the \\`processQueue\\` function")

# SAFEST — write prompt to a temp file, then cat it:
# 1. Write prompt to /tmp/agent-prompt-N.txt via Bash
# 2. send_input: "cat /tmp/agent-prompt-N.txt"

Checklist (run mentally before every send_input):

[ ] Did I read_screen this surface in the last 60 seconds?
[ ] Does it show a shell prompt or active agent?
[ ] Does my prompt contain backticks, quotes, or markdown? → escape or heredoc
[ ] Is this a fresh surface? → did I wait 3s and verify the prompt?

Parsing Agent Output

When reading agent output via read_screen, search for ---RESPONSE_START--- to find the structured response. Everything between START and END is the deliverable — ignore terminal noise, tool calls, and deliberation outside the markers.

# Pattern: read enough scrollback to capture the full response
read_screen(surface: "surface:N", lines: 80, scrollback: true)
# Then look for ---RESPONSE_START--- ... ---RESPONSE_END--- in the output

If markers are missing, fall back to reading the last 50 lines + done signal. But if you wrote the prompt correctly (checklist above), markers will be there.

Monitoring Protocol

"I'll monitor them" is NOT monitoring. Monitoring = explicit waits on explicit agent_ids.

This applies at EVERY tier: each LEAD that dispatches a worker owns a monitor loop on that worker. orc's fleet monitor is a backstop for lead-busy/codex-idle inversion, not a substitute (gen-10 weave #26).

After spawning:

Update AGENT_REGISTRY
wait_for({agent_id, target_state:"ready"|"working", timeout_ms:120000}) to verify boot
For long tasks, require an output file + DONE marker and wait on wait_for({agent_id, target_state:"done"})
After any topology change or crash, re-run list_agents / my_agents before sending follow-ups — surface ids drift
If wait_for or get_agent_state disagrees with the visible pane, use read_screen to adjudicate FR-06
When the agent finishes, read output IMMEDIATELY — don't wait for the user to ask
After system events (Mac wake, BrainBar restart, network change, ANY cmux restart): inspect active workers via list_agents + read_screen on every worker you intend to report on. Agents can lose MCP silently. Never carry pre-restart liveness claims forward — see Post-Restart Truth-vs-Display below.

Worker utilization check (R28 enforcement): During monitoring, also check worker utilization:

Claude context >50% AND its Cursor/Codex worker has 0 tokens → ROUTING VIOLATION
Action: nudge Claude to delegate remaining data work to its idle worker
Worker surface gone (crashed/closed) → respawn immediately on new surface + resend task
After sprint: if Claude did >30% of data gathering itself → flag for process improvement

Anti-patterns:

Fire and forget — spawning without wait_for. You WILL miss failures.
Checking only when user asks — agent may have been stuck 20 min.
Not reading output files — agent finished, you never read results.
Using Task tool when user says "cmux agents" — Task agents are invisible. cmux agents are VISIBLE. The user wants to SEE them.
Sending task prompt to Cursor without /auto-run first — worker silently stalls on permission prompts.
Sending follow-ups to remembered surface numbers instead of agent_ids — surface drift will eventually burn you.
Claude doing SQL/grep while Cursor worker sits idle — route to Cursor (R28).

Envelope-vs-Delivery Pairing (MANDATORY)

Why this exists: orphan envelopes — [FROM=X TO=Y TYPE=Z] blocks written to the author's OWN pane and never actually delivered to Y — are a top friction source in multi-agent collabs. 64 orphan envelopes were observed in one Codex session (wave3-codex-bulk Block B). Verbatim user friction: "Why do you have these messages in your chat but no one enters them?" Recurred across Wave 1-3 (3+ logged occurrences).

The rule

Any [FROM=<self> TO=<target> TYPE=<status|mission|task_done|ack>] envelope block you emit to your own pane output MUST be paired with a mcp__cmux__send_to_agent({agent_id: <target>, text: ...}) or mcp__cmux__send_to(...) call in the SAME turn.

Rule of thumb: if you wrote TO=X in plaintext, you wrote it FOR X — so deliver it to X. An envelope in your own pane that wasn't sent is a message in a bottle, not communication.

Same-turn pairing pattern (copy this)

# 1. Compose the envelope you want X to see:
envelope = """
[FROM=orcClaude TO=coachClaude TYPE=task_done]
Audit finished, 14 findings, no commits.
[/FROM=orcClaude TO=coachClaude TYPE=task_done]
"""

# 2. SAME TURN — actually deliver it:
mcp__cmux__send_to_agent({
  agent_id: "agent:coach-...",
  text: envelope,
  press_enter: true
})

Both must appear in the same model turn. If you only emit step 1 (the plaintext envelope in your own pane), the message is undelivered — X never sees it.

Anti-patterns (these are the orphans)

| Anti-pattern | Why it fails | |---|---| | Writing [FROM=X TO=Y ...] in your pane "for the log" without send_to_agent | Y never sees it; orchestrator-side log is not a communication channel | | Splitting envelope and delivery across turns ("I'll send it next turn") | Forgetting is the default; next turn rarely happens | | Calling send_to_agent(text: "hi") after composing a FROM=/TO= envelope but sending only the bare message text | Recipient loses the FROM/TO/TYPE metadata the envelope was built to carry | | Emitting envelope to a CLAUDE_COUNTER summary instead of calling send_to_agent | The summary is yours, not the recipient's inbox |

Acceptance test

If your turn output contains a [FROM=...TO=...TYPE=...] block, your turn's tool calls MUST also contain a matching mcp__cmux__send_to_agent (or mcp__cmux__send_to) call whose agent_id corresponds to the TO= target and whose text includes the envelope contents. No pair → orphan envelope → rule violation.

Done Signals

Instruct agents to put the signal as the very last line before CLAUDE_COUNTER — not buried above a summary. Otherwise read_screen won't catch it.

Collab Pattern

Copy ~/Gits/orchestrator/collab/TEMPLATE.md first — never write a collab from scratch. The collab-guard.py hook WILL block you.

# 1. Write collab file from template
# 2. Spawn agents with collab instructions
for entry in "search:Agent1" "perf:Agent2" "security:Agent3"; do
  angle="${entry%%:*}"; name="${entry#*:}"
  spawn_agent({
    repo: "TARGET_REPO",
    cli: "claude",
    prompt: "Read collab/FILE.md — you are " name ". Claim " angle ". Update collab when done."
  })
done

Log every action in collab: spawns, completions, blockers. No silent work.

Git Worktree Isolation

Parallel agents MUST use separate worktrees — without them, agents clobber each other's git state.

Claude subagents: Agent(isolation="worktree") (built-in)
cmux agents: Create worktrees manually before spawning, then launch the worker via spawn_agent. See worktrees skill for details.
Codex apply_patch: when your assigned worktree differs from the session cwd, every apply_patch filename MUST be an absolute path inside the worktree. exec_command.workdir does not apply to apply_patch; relative patch paths resolve against the session cwd and can mutate the main checkout silently.

Pane Hygiene — harvest, review, close

For finished one-shots and worker panes, treat harvest → review → close pane as one sequence. Capture the output, confirm the task result, then close/stop the pane. Only panes hosting live processes stay open (for example a dev server, log tail, or active long-running worker), and the collab/status should say why that pane is still live.

Essential Rules

spawn_agent for every worker — no hand-rolled launch flow
Discover workers before acting — use list_agents / my_agents
NEVER read_screen your own surface — recursive output
Verify after launch — wait_for first, get_agent_state second, read_screen only for disputes
Workers = current-workspace splits, audits/research = separate named workspaces unless the user explicitly asks for same-workspace visibility
Sequential launch — brief spacing still helps on first-run biometric prompts
Cross-workspace: always pass workspace ref — surfaces are workspace-scoped
Name everything — mcp__cmux__rename_tab on every surface
Update collab after every action — spawns, completions, blockers
AGENT_REGISTRY is mandatory — maintains state across compaction
BrainLayer agents are sequential — stagger by 10s+ (SQLite = single writer)
wait_for on spawn, not client-side polling — set up monitoring immediately
Don't give hour-scale estimates for <500-line tasks — parallel agents are 8-10x faster
Before the user leaves, every active worker needs either wait_for coverage or a file-based completion contract — no silent "I'll monitor."
Gems to ALL active agents — when you receive new research/context while agents run, send to ALL active workers immediately via send_to_agent. Contextualize per agent: include a 1-line "why this matters to YOUR task" — raw paste is noise. Don't hoard, don't defer, don't send to only the "obvious" recipient.
Respawn > absorb — frozen agent → read_screen(lines:80) to salvage partial work → brain_store progress → stop_agent → spawn_agent → resend SAME task with "already completed: [X]" context. Exception: remaining work <5 min AND you log "ABSORBING: [reason]" to collab. Default is ALWAYS respawn.
Brief agents about external events — Mac restart, user away, network outage → explicitly tell agents. They can't infer from timestamps.
Post-sprint retrospective — after every multi-agent sprint, brain_store what failed, what was corrected, what user said. Tags: ["orchestration", "incident", "cmux-agents"]. Without this, future sessions start from zero.
Max 2-3 concurrent worker agents — more causes AP watchdog kernel panics (April 5 crash: 9+ agents + enrichment + metro bundler = SoC watchdog expired). Before spawning a new agent, count running agents via list_agents. If >=3 workers active, wait for one to finish or kill a stale one. Enrichment daemon + build tools count toward the limit.
Skills don't hot-reload in long sessions — a Claude agent spawned 4+ hours ago runs stale skill versions. If a skill was updated mid-sprint, the running agent won't see the changes. Mitigation: (a) /compact forces re-read of skills, (b) orcClaude should notify long-running agents of critical skill changes via send_to_agent, (c) for critical fixes, respawn the agent instead of nudging.
Collab file BEFORE spawning workers — when dispatching 2+ agents on related tasks, write a collab file from TEMPLATE.md FIRST, then spawn workers with "Read collab/<file>.md" in their prompt. Do NOT dispatch agents via remembered surface numbers or bare launcher typing. Collab files are the coordination channel — without them, agents work in isolation and duplicate effort.
Agent health check BEFORE every follow-up — never message a worker without a recent get_agent_state or wait_for on its agent_id. If the registry looks wrong, use read_screen as the FR-06 fallback.
No model params to spawns — launcher defaults are authoritative; stale params silently downgrade panes.
No --print/-p for agent sessions — all workers/leads/verification gates are interactive repoGolem launcher sessions.
Finished panes get closed — harvest, review, close. Only live processes stay.

Known Issues — cmux rename hooks (design proposal, do NOT edit without orcClaude review)

Background (2026-04-11): the cmux tab-rename auto-hook (launcher name → display name + color) has 5 observed bugs. Code changes are OUT OF SCOPE for skill-creator — this section is a design proposal for the next orcClaude session to dispatch.

| # | Bug | Symptom | Proposed fix | |---|---|---|---| | 5.1 | Flat tab colors | All tabs use the same (or default) color; agent type not visually distinguishable | Map launcher function → color in a dict (claude=blue, codex=orange, cursor=purple, gemini=green, kiro=red). Set via mcp__cmux__rename_tab(color=...) on spawn. | | 5.2 | Red-to-cyan weirdness | Some tabs flip from red (error/warning) to cyan unexpectedly; color state machine has a bad transition | Debug: instrument the rename hook to log every color-set call with timestamp + reason. Most likely cause: one code path sets color from agent state, another sets it from default, last write wins. | | 5.3 | Nested naming collapses | Tab names for agents spawned in worktrees or nested panes lose their parent context (e.g. "golemsClaude > feat-X" → "claude") | When generating the display name, walk the surface parent chain and prepend up to 1 level of context. Truncate via ellipsis if longer than the tab width budget. | | 5.4 | Weak semantic tag extraction | Tab names don't reflect what the agent is actually WORKING on (they just say "claude" instead of e.g. "claude: PR#232 fix") | Parse the task prompt on spawn — extract PR numbers (PR#\d+), issue refs (#\d+), and the first 3-5 imperative words. Fall back to launcher name if none found. | | 5.5 | Launcher-to-display-name mapping missing | Tab shows raw function name (golemsClaude -s) instead of a friendly display (golems • Claude) | Add a LAUNCHER_DISPLAY lookup table keyed on the base launcher pattern ({repo}Claude → {repo} • Claude, {repo}Cursor → {repo} • Cursor, etc.). Regex extract repo + agent. |

Where the hook lives: NOT located as of 2026-04-11 by skillCreatorBuddy. Not in ~/.claude/hooks/, not in ~/Gits/golems/hooks, not in ~/.config/cmux/settings.json. Likely candidates to check next session:

Swift cmux client source (may be built-in tab rename behavior, not a Python hook)
~/Gits/cmuxlayer/src/**/rename*
A LaunchAgent plist that watches cmux sockets
Part of the spawn_agent implementation / agent-registry wiring in cmuxlayer

Next action (for orcClaude, NOT for skill-creator):

Locate the actual rename hook source
Reproduce each bug in a safe surface
Fix behind a feature flag
A/B test against the current hook
Ship via /pr-loop

Until then: manually mcp__cmux__rename_tab + mcp__cmux__set_status after spawn, using the color/display-name conventions in the table above.

cmux-agents

Orchestration layer for AI agents in cmux panes. Low-level pane operations (splits, reads, sends) use cmux MCP tools — this skill handles the workflow on top.

Default Lifecycle (2026-04-19)

Visible worker peers now use the agent-based cmux MCP tools by default:

mcp__cmux__spawn_agent({repo, cli, prompt, workspace?, parent_agent_id?})
Capture the returned agent_id immediately and write it into your collab / AGENT_REGISTRY.
mcp__cmux__wait_for({agent_id, target_state, timeout_ms}) for lifecycle gates.
mcp__cmux__send_to_agent({agent_id, text, press_enter}) for follow-ups.
mcp__cmux__get_agent_state({agent_id}) for a combined registry + parsed-screen snapshot.
mcp__cmux__list_agents(...) or mcp__cmux__my_agents(...) whenever topology changes.
mcp__cmux__stop_agent({agent_id}) to stop or recycle a worker.

repoGolem Launcher Contract

Default path is still spawn_agent. When FR-01, non-agent terminals, or recovery flows force a manual launcher, use the repoGolem one true form:

{repo}{Tool} -s "prompt"

brainlayerCursor -s "audit X"      # gather/read-only
brainlayerCodex  -s "fix Y"        # implement
brainlayerClaude -s "coordinate Z" # orchestrate
brainlayerGemini -s "visual task"  # visual/OCR

Model + Headless Gates (2026-06-05)

repoGolem launchers are the only authoritative model source for visible cmux pane workers. Do not pass model to cmux spawn_agent, and do not bake stale model strings into launcher commands. Stale overrides silently downgraded real panes (including a Claude 1M-context pin falling back to 200K, and stale Codex strings while the pane was actually on a newer default). Internal ephemeral subagents: see /agent-routing AP11 (canonical model-param rule).
Agent sessions are interactive. Do not use --print, repoGolem -p, or other headless/print modes for workers, leads, or verification gates. Run a labeled interactive pane and harvest the result from the live session.

Worker vs Audit Placement

Name audit/research workspaces clearly, e.g. Audit: golems or Research: auth. Use same-workspace down splits only when the user explicitly asks to keep the audit visible next to the lead.

Current Caveats

FR-01 — hyphenated repo names can mis-resolve to the wrong launcher

FR-06 — state parser ambiguity (`booting` vs idle-with-spinner)

First-run Touch ID note

First launch of a newly signed launcher binary can still trigger a one-time Touch ID prompt. Treat this as a first-run caveat, not a primary orchestration strategy.

Completion Signals — file > wait_for > get_agent_state > read_screen

Why this exists: surface-based polling burned hundreds of read_screen calls and still missed real state transitions. The new default is event-driven waits on stable agent_ids, with raw screen reads reserved for ambiguity or output extraction.

Ranked reliability of completion signals (use the highest that fits the job):

Visual Proof / Screenshots

The file-based completion pattern (copy this)

Every autonomous worker task must end with a file write and a DONE marker:

# In the worker's prompt:
Write your report to /path/to/output/batch-WORKER.md. The last line of the
file must be exactly: DONE_WORKER_NAME

Orchestrator side, poll the file(s) until all expected outputs exist and each contains its DONE marker:

# run_in_background: true
until [ -f "$PLAN/batches/batch-M1.md" ] && grep -q DONE_MINER_M1 "$PLAN/batches/batch-M1.md" 2>/dev/null \
      && [ -f "$PLAN/batches/batch-M2.md" ] && grep -q DONE_MINER_M2 "$PLAN/batches/batch-M2.md" 2>/dev/null \
      ; do
  sleep 30
done
echo ALL_MINERS_DONE

When the background command completes, you know every miner finished AND wrote a real file (not a partial crash). This survives cmux state-sync bugs, splash-screen false-idles, and pane freezes.

FR-06 fallback when parser and screen disagree

If wait_for times out or send_to_agent rejects with current state: booting, do this in order:

get_agent_state({agent_id})
read_screen(surface: "...", lines: 20) on the linked surface
If the raw pane shows a prompt, trust the pane and file a cmuxlayer bug
If work is blocked, use one surface-level send_input + send_key("return") fallback, then return to the agent_id flow as soon as the registry catches up

The rule is not "surface sends are normal again." The rule is: agent_id is the source of truth; raw pane sends are an escape hatch for parser drift.

STOP — Before Anything Else

1. Spawn workers with mcp__cmux__spawn_agent. No hand-rolled repoGolem typing unless FR-01 forces a temporary launcher fallback.

2. Store the agent_id immediately. Every follow-up, wait, stop, and handoff should key off the durable id.

3. Re-discover live workers with list_agents / my_agents after crashes or layout changes. Do not trust a remembered surface number.

4. AGENT_REGISTRY — maintain after CLAUDE_COUNTER in every response with active agents:

AGENT_REGISTRY:
| Agent ID | Surface | Repo | Task | Status | Last Check |
|----------|---------|------|------|--------|------------|
| agent:abc123 | surface:153 | golems | Digest failures | WORKING | 12:35 |

Add on spawn. Update on check. Remove on kill.

MCP Primitives (use these for low-level ops)

Use spawn_agent for full worker lifecycle. Keep raw surface tools for non-agent panes and FR-06 recovery.

Response Schemas (PR #76, 2026-04-17 — `list_surfaces`)

Why this shipped: An earlier mining sweep of 10 orcClaude sessions found list_surfaces burned 117,000 tokens across 49 calls (avg 2,387 / max 8,333 per call). Root causes: a duplicate bug (surfaces in workspace:N appeared N times in surfaces[]) and per-entry bloat (screen_preview 51%, UUIDs 15%, full remote blob 11% even for local-only workspaces). PR #76 fixed both: deduped surfaces[] unconditionally, and condensed the default response with opt-in backward compatibility via verbose: true. Default payload is ~89% smaller than the pre-PR shape.

Default response (no verbose):

{
  "ok": true,
  "workspaces": [
    {
      "ref": "workspace:1",
      "title": "🎯 orcClaude",
      "current_directory": "/Users/etanheyman/Gits/brainlayer",
      "remote_state": "local"
    }
  ],
  "surfaces": [
    {
      "ref": "surface:35",
      "title": "orcClaude",
      "type": "terminal",
      "workspace_ref": "workspace:1"
    }
  ]
}

remote_state values:

"local" — no SSH/proxy/daemon hints; ordinary macOS workspace. The common case.
"connected" — SSH session is up and connected.
"disconnected" — remote configured but currently offline.
"unavailable" — remote partially configured but daemon/proxy state is missing.

When to pass verbose: true:

Debugging SSH workspace state (need daemon / proxy / heartbeat detail).
Port-forwarding workflows (need listening_ports / forwarded_ports).
UI layout reasoning that needs focused / selected_in_pane / index.
Migration testing against code that still reads old fields.

When the default is enough (99% of agent work):

Routing: "which surface is worker X on?" → need ref + title + workspace_ref.
Layout checks: "are there 2 panes in this workspace?" → need ref count.
Identifying a target for send_input / read_screen → only need surface:N ref.
Status reports: "list all active workers" → ref + title suffice.

spawn_agent crash_recover + session auto-capture (PR #77, 2026-04-17)

Why this shipped: When a worker's PTY died unexpectedly (shell exit, mac crash, cmux daemon restart), the agent record was orphaned with no way to resume. PR #77 adds an opt-in recovery loop driven by boot-time session-ID capture.

New crash_recover: boolean param on spawn_agent (default false). When true:

Boot capture window — for the first 30 seconds after spawn, the engine scans up to 80 lines of terminal output on each sweep tick for a UUID pattern ([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}). This is what every major CLI prints as its session header (claude --session-id, codex session, cursor agent --session). First UUID seen wins — the engine does not currently match against CLI-specific context markers, so any earlier UUID in boot output (e.g., a log line printing a workspace UUID, a trace ID, or a dependency version string) can steal the slot. Engine persists the first-seen UUID as cli_session_id on the agent record.
Crash detection — when the PTY dies with a recoverable error (surface disappeared, shell exited unexpectedly) AND crash_recover=true AND user_killed !== true AND cli_session_id is set, the next sweep attempts respawn.
Respawn path — new surface created, same launcher with the per-CLI resume command appended: <repo>Claude -s --resume <id> / <repo>Codex -s resume <id> / cursor agent --session <id>. Up to MAX_RESPAWN_ATTEMPTS = 10 attempts before giving up.
user_killed guard — if the user explicitly killed the agent via stop_agent (userInitiated=true, the default), crash_recover will NOT respawn it. Context-limit evictions pass userInitiated=false, keeping the agent eligible for recovery.

When to pass crash_recover: true:

Long-running Codex workers on PR loops (>30 min expected).
Cursor audit workers that need to survive a cmux daemon restart.
Claude workers executing multi-hour sprints where mac-crash resilience matters.
ANY worker in an overnight / autonomous context where you won't be watching.

When NOT to pass it:

Short-lived helpers (<5 min) where a crash means "task failed, start over".
Workers in ambiguous session-ID formats (older CLIs without UUID headers).
Debugging sessions where a crash IS the signal to investigate.

Verification procedure for critical spawns:

Within 60 seconds of spawn: get_agent_state(agent_id: "...") → check cli_session_id is not null.
Cross-check against the CLI's own session display via read_screen:
- Claude: look for Session: or --session-id in the header.
- Codex: look for session in the splash / first turn.
- Cursor: look for --session in the boot log.
If cli_session_id does not match the on-screen session — or if it's null after 60s (capture window expired) — call stop_agent(userInitiated: false) to keep recovery eligibility, then respawn fresh and re-verify.
For autonomous / overnight workers: fold this check into your spawn script and fail loudly on mismatch. A wrong cli_session_id means crash_recover will respawn with the wrong --resume argument and the new session won't have your context.

Related new agent-record fields (visible via get_agent_state / list_agents):

cli_session_id — captured UUID, or null if capture window expired before a match.
respawn_attempts — count of recovery attempts made (0 to 10).
user_killed — true if user explicitly stopped via stop_agent.
workspace_id — owning workspace UUID (persisted for cross-session recovery).

Spawning Agents — launchers only, NEVER a model param (2026-06-05)

orc spawns Claude LEADs via proper repoGolem launchers — the focus-first bundle (new_split → send_command("<repo>Claude -s") → focus → read_screen verify model + context_window). NOT via spawn_agent with a model param.
NEVER pass model to cmux spawn_agent for visible pane workers. The param overrides launcher pins and can silently downgrade the pane. If a different model is genuinely needed, fix the launcher or use the CLI's in-session model switch; never the spawn param. Exception (stated once in /agent-routing AP11): internal ephemeral subagents spawned inside another agent session (Task tool / in-session child) may request Spark via explicit -m gpt-5.3-codex-spark or equivalent model field — that is NOT a cmux pane worker spawn.
Always verify post-boot: read_screen parsed model + context_window must show the 1M pin (1,000,000), not the 200K default. Wrong → fix with /model before sending the mission.
LEADs run their own monitor loops on their workers (see /agent-routing LEAD TOPOLOGY) and communicate via metacommlayer collabs (dispatch_to_agent / inbox_check + collab files), not ad-hoc pane sends.

Spawning Agents (MANDATORY: use `spawn_agent`)

mcp__cmux__spawn_agent({
  repo: "golems",
  cli: "codex",
  prompt: "Fix search ranking"
})
→ { agent_id, ... }

mcp__cmux__wait_for({ agent_id, target_state: "ready", timeout_ms: 120000 })
mcp__cmux__send_to_agent({ agent_id, text: "Narrow scope to ranking only", press_enter: true })
mcp__cmux__wait_for({ agent_id, target_state: "done", timeout_ms: 1800000 })

Options: repo, cli, prompt, workspace, parent_agent_id

Do not pass model params. The repoGolem launcher is the source of truth.

Examples:

spawn_agent({ repo: "golems", cli: "claude", prompt: "Survey search regressions" })
spawn_agent({ repo: "golems", cli: "cursor", prompt: "Audit code quality and write findings to /tmp/audit.md" })
spawn_agent({ repo: "orchestrator", cli: "gemini", prompt: "Survey patterns and summarize in 5 bullets" })
spawn_agent({ repo: "golems", cli: "kiro", prompt: "Draft a concise troubleshooting note" })

Auto-workspace-categorization by launcher root

coachClaude -s → workspace whose title contains "Coach"
orcClaude -s → workspace whose title contains "orc" (or workspace:2 by default)
voicelayerCodex -s → workspace whose title contains "voicelayer"
brainlayerCodex -s → workspace whose title contains "brainlayer" or "orc-buddy"

Lookup protocol BEFORE new_split:

Identify the launcher (coachClaude, orcClaude, etc.).
Extract the repo root (coach, orc, voicelayer, brainlayer).
Call mcp__cmux__list_surfaces() to enumerate live workspaces.
Pick the workspace whose title contains the repo-root substring (case-insensitive).
Pass that workspace ref to new_split.
After spawn, verify via list_surfaces that the new surface ended up in the intended workspace. If not, IMMEDIATELY call mcp__cmux__move_surface(surface, workspace) to fix.

Why post-spawn verify is mandatory: the current new_split MCP call's workspace arg is advisory — actual placement may follow current focus, not the arg. Until cmux MCP enforces the arg, agents MUST verify + move post-spawn. Skipping the verify step is how the live 2026-05-17 incident happened.

NEVER Use These Commands (Common Mistakes)

Root cause (April 5 2026): mehayomClaude used cursor-cli in a cmux pane — command not found. Wasted a surface and debugging time.

Cursor `/auto-run` fallback

Default path: put the task in spawn_agent.prompt and talk to the worker via send_to_agent({agent_id,...}).

If a live Cursor pane still stalls on approvals, send /auto-run as a follow-up first:

wait_for({ agent_id, target_state: "ready", timeout_ms: 120000 })
send_to_agent({ agent_id, text: "/auto-run", press_enter: true })
send_to_agent({ agent_id, text: "<task prompt>", press_enter: true })

If FR-06 blocks send_to_agent, confirm the raw pane with read_screen and use one surface-level fallback send.

Task Routing

Agent Selection Guide

Use /agent-routing for who should do the work, and /repogolem for the exact launcher flags. This table is the cmux placement/CLI quick reference:

Task type routing (R28 — invoke /agent-routing):

Full CLI syntax and capabilities: adapters/ directory + adapters/capabilities.yaml.

Post-Restart Truth-vs-Display (2026-06-06)

After ANY cmux restart (daemon bounce, Mac wake, manual relaunch), every lead follows the proven checkpoint pattern:

checkpoint (record agent_ids + last-known state)
  → restart event
  → VERIFY (list_agents + read_screen per worker)
  → report liveness FROM EVIDENCE ONLY

Rules:

list_agents / registry alone is not enough — pair with read_screen scrollback on each worker you report as alive.
Pre-restart claims ("Codex s:63 still working") are stale until re-verified. Operator-direct catch: "I don't see any worker of yours working still."
Status reports to collab/Etan must cite what read_screen showed, not what you remembered before the restart.
Claims-lag window is real (fleet may repopulate over minutes) — say "unverified post-restart" until VERIFY completes.

Worker-Brief Authoring Contract (2026-06-06)

Every dispatch brief MUST satisfy all three before send (see workflows/prompt-audit.md §8):

Absolute verified paths — inline the full path to every artifact the worker must read, especially the collab file. Never "collab file above" with no path; workers lose orientation without it.
Truthful environment — state what exists (worktree path, branch, deps, Playwright, .mcp.json). If the worker must create something, say "create it yourself" with the exact command — never imply a promised path/deps exist when they don't.
Live data contracts — before describing schemas, field names, or queue shapes in the brief, verify them against the live artifact (Read / grep / query). Five brief-content drift incidents in one evening came from stale or invented data shapes.

Pre-Spawn Prompt Checklist

Every agent task prompt MUST include:

Max output length ("2-3 sentences" / "under 200 words")
Output format ("use this template: ...")
Audience — internal or client-facing (agents leak investigation details if not told)
What NOT to include — internal deliberation, false alarms
Done signal ("end with DONE_SIGNAL_NAME on its own line, right before CLAUDE_COUNTER")
Response markers — "Wrap your final deliverable in ---RESPONSE_START--- and ---RESPONSE_END--- markers"

Full audit: workflows/prompt-audit.md

Prompt Size Ceiling — `send_input` freezes surfaces above ~2000 chars

Why this exists: A 72-hour JSONL sweep (2026-04-12 → 2026-04-15) across 5 Claude Code orchestrator sessions found that send_input calls over 2,000 characters correlated strongly with surface freezes. In one Mehayom-app session: 868 total send_input calls, 13 exceeded 2,000 chars (max 4,382), and those 13 large calls preceded 6 frozen / surface-unresponsive incidents. In a control session (coach repo) every send_input stayed under 1,900 chars and there were zero freezes. The coach control case is the proof: discipline eliminates the symptom.

Direct evidence (from the Mehayom-app session):

"You're right — the cmux surface was frozen, my send_input commands weren't taking effect (sent 4 commands, none showed up). I burned ~3 minutes trying before deciding to implement directly rather than waste more time debugging cmux."

The rule (hard cap + warn)

File-based handoff pattern (for prompts > 1,800 chars)

Three steps. The first one is a shell command you run via the Bash tool. The next two are cmux MCP calls.

Step 1 — write the long prompt to disk via Bash (or the Write tool):

printf '%s' "$LONG_PROMPT" > /Users/etanheyman/Gits/orchestrator/collab/surface-N-$(date +%s).md

Prefer the Write tool when the prompt is already in-context — it avoids shell quoting pitfalls on multi-line prompts.

Step 2 — type the cat command into the target surface, then press Return:

send_input(surface: "surface:N", text: "cat /Users/etanheyman/Gits/orchestrator/collab/surface-N-<stamp>.md")
send_key(surface: "surface:N", key: "Return")

send_input on its own only TYPES the text — without send_key("Return") the command sits on the shell line unexecuted. This is the same execution contract used everywhere else in this skill.

Why the file pattern beats splitting the prompt into multiple sends:

Splitting into N sub-2000 chunks still hits the freeze path if any chunk pushes the pane state over some cumulative limit, and loses atomicity (the agent sees a partial prompt).
The file is independent of the pane's mutable state. If the pane freezes, spawn a new one and re-run the cat — the prompt is preserved.
File handoff also survives pane crashes, restarts, and compactions.

On a failed send_input: check the surface before retrying

A retry without root-cause investigation is almost always wrong. After any send_input that appears to have been lost (no output, no activity, retry instinct kicking in):

read_screen(surface: "surface:N", lines: 10) — is the pane frozen, scrolled back, or just slow?
If frozen → follow Step 3 of Surface Health Check (close, respawn, salvage).
If the payload you just sent was near or above the 1,800-char cap → that was the root cause. Don't retry at the same size; switch to file-based handoff.
If unsure → list_agents / my_agents to confirm the worker still exists.

Do not fire a second send_input of the same large payload hoping it "works this time." It won't, and you'll burn the same 3 minutes the earlier session logged.

Boot + deliver: FOCUS-FIRST (the reliable bundle for "send the prompt once booted")

Root cause (Etan, recurring — 2026-05-30): a cmux pane/agent does NOT initiate until its workspace/pane is FOCUSED. So boot_prompt_path and a bare wait_for({target_state:"ready"}) on an unfocused pane NEVER RESOLVE — they hang on a ready-state that can't arrive because the agent hasn't started. Etan: "It never fucking resolves — focus the pane for 3 seconds, then read, then if ready send the prompt." Do not lean on boot_prompt_path / blind long wait_for to deliver a boot prompt. Focus is the missing precondition.

The reliable bundle (proven spawning orc-gen-12 on 1M, 2026-05-30):

1. CREATE the pane:   new_split({direction, role, workspace, focus:true})  → surface
                      (focus:true is REQUIRED — role-based tab placement rejects focus:false)
2. LAUNCH (no boot_prompt_path):  send_command({surface, command:"<repo>Claude -s [-m '<model>']"})
                      (quote any model with brackets, e.g. -m 'claude-opus-4-8[1m]' — zsh globs [1m] otherwise)
3. FOCUS so it initiates:  select_workspace({workspace})  +  `cmux focus-pane --pane <pane>`
4. WAIT ~3s, then READ:  read_screen({surface, parsed_only:true})
                      → confirm status ready/idle/working AND verify `model` / `context_window`
                        (catch the L4 default: 200K vs the 1,000,000 you asked for)
5. IF READY, deliver:  send_command({surface, command:"<the prompt>"})  — pane is focused, it lands.

Verify the model in step 4, not just readiness. A launcher can boot Opus 4.8 at the 200K default when you wanted [1m] (1,000,000). read_screen parsed model
- context_window is the gate; if wrong, fix with /model in-session before sending.
boot_prompt_path is for already-focused/foreground spawns only. When you're driving from another pane, it will hang — use this focus-first bundle instead.
Upstream fix (route to cmuxLayer-LEAD): boot_prompt_path / wait_for should auto-focus the target pane before waiting for readiness — then this 5-step bundle collapses back into a single reliable primitive. Until then, focus-first is mandatory.

Composer-Wedge Runtime Doctrine (2026-06-06)

Why this exists: boot_prompt_delivered, submit_verified, working-status, and token_count have all returned false positives — including submit_verified:true on a Claude pane with unsubmitted text still sitting after the › prompt marker. Doc edits alone did not stop this class: seven observer-window catches occurred all post-#478-merge (tallies vary 3/4/5/7 across observers — no canonical ledger). This section is mitigation doctrine; the cure is the cmuxlayer composer-is-empty code fix (dispatched separately as D6).

Untrusted submit signals — treat ALL of these as hints only, never ground truth:

boot_prompt_delivered
submit_verified
parsed working / idle / ready status from registry
token_count (including phantom climbing counts on never-started sessions)

Mandatory post-dispatch read: read_screen (or get_agent_state + scrollback read on dispute) ≤15 seconds after EVERY dispatch — boot prompt, follow-up, or send_to_agent.

Anomaly triggers — force a full scrollback read:

'Queued follow-up inputs' visible on a worker that registry says is working
Token count unchanged across two reads after a send
Cost/token meter not moving while status says working

Idle Codex sends: when the pane is genuinely idle (at › / $ prompt), use send_command (atomic) or send_input + verified status flip — never fire-and-forget without the ≤15s read.

Surface Health Check (MANDATORY — before ANY prompt delivery)

Why this exists: Frozen/dead surfaces are a top-3 frustration source. Agents send prompts to unresponsive surfaces, losing work and burning ~3 minutes per incident on manual fallback. send_input returns ok:true even on frozen terminals.

# After spawn_agent → got agent_id + linked surface
wait_for({ agent_id: "agent:abc123", target_state: "ready", timeout_ms: 120000 })
# ✅ Ready/working state = worker booted
# ❌ Timeout: read_screen(surface: "...", lines: 10) to check for FR-06 parser drift

Step 2 — Before sending ANY prompt to an existing surface: Always read_screen first to confirm the surface is responsive. Never fire-and-forget a send_input without checking state first.

# Before sending prompt to surface:N
read_screen(surface: "surface:N", lines: 5)
# ✅ Responsive: shows shell prompt, cursor output, or agent activity
# ❌ Unresponsive: blank screen, no change from last check, or error output

Step 3 — Unresponsive surface recovery: If a surface fails 2 consecutive read_screen checks (blank, frozen, or no shell prompt):

read_screen(surface: "surface:N", lines: 80, scrollback: true) — salvage any partial work
brain_store any salvaged progress
stop_agent({agent_id})
spawn_agent({...same task...}) — create a fresh worker
Re-verify with Step 1 before sending the prompt

# WRONG — backticks and quotes break in send_input:
send_input(surface: "surface:N", text: "Fix the `processQueue` function")

# RIGHT — use heredoc pattern:
send_input(surface: "surface:N", text: "cat <<'PROMPT_EOF' | clipboard\nFix the processQueue function\nPROMPT_EOF")

# RIGHT — escape special characters:
send_input(surface: "surface:N", text: "Fix the \\`processQueue\\` function")

# SAFEST — write prompt to a temp file, then cat it:
# 1. Write prompt to /tmp/agent-prompt-N.txt via Bash
# 2. send_input: "cat /tmp/agent-prompt-N.txt"

Checklist (run mentally before every send_input):

[ ] Did I read_screen this surface in the last 60 seconds?
[ ] Does it show a shell prompt or active agent?
[ ] Does my prompt contain backticks, quotes, or markdown? → escape or heredoc
[ ] Is this a fresh surface? → did I wait 3s and verify the prompt?

Parsing Agent Output

# Pattern: read enough scrollback to capture the full response
read_screen(surface: "surface:N", lines: 80, scrollback: true)
# Then look for ---RESPONSE_START--- ... ---RESPONSE_END--- in the output

If markers are missing, fall back to reading the last 50 lines + done signal. But if you wrote the prompt correctly (checklist above), markers will be there.

Monitoring Protocol

"I'll monitor them" is NOT monitoring. Monitoring = explicit waits on explicit agent_ids.

After spawning:

Update AGENT_REGISTRY
wait_for({agent_id, target_state:"ready"|"working", timeout_ms:120000}) to verify boot
For long tasks, require an output file + DONE marker and wait on wait_for({agent_id, target_state:"done"})
After any topology change or crash, re-run list_agents / my_agents before sending follow-ups — surface ids drift
If wait_for or get_agent_state disagrees with the visible pane, use read_screen to adjudicate FR-06
When the agent finishes, read output IMMEDIATELY — don't wait for the user to ask
After system events (Mac wake, BrainBar restart, network change, ANY cmux restart): inspect active workers via list_agents + read_screen on every worker you intend to report on. Agents can lose MCP silently. Never carry pre-restart liveness claims forward — see Post-Restart Truth-vs-Display below.

Worker utilization check (R28 enforcement): During monitoring, also check worker utilization:

Claude context >50% AND its Cursor/Codex worker has 0 tokens → ROUTING VIOLATION
Action: nudge Claude to delegate remaining data work to its idle worker
Worker surface gone (crashed/closed) → respawn immediately on new surface + resend task
After sprint: if Claude did >30% of data gathering itself → flag for process improvement

Anti-patterns:

Fire and forget — spawning without wait_for. You WILL miss failures.
Checking only when user asks — agent may have been stuck 20 min.
Not reading output files — agent finished, you never read results.
Using Task tool when user says "cmux agents" — Task agents are invisible. cmux agents are VISIBLE. The user wants to SEE them.
Sending task prompt to Cursor without /auto-run first — worker silently stalls on permission prompts.
Sending follow-ups to remembered surface numbers instead of agent_ids — surface drift will eventually burn you.
Claude doing SQL/grep while Cursor worker sits idle — route to Cursor (R28).

Envelope-vs-Delivery Pairing (MANDATORY)

Why this exists: orphan envelopes — [FROM=X TO=Y TYPE=Z] blocks written to the author's OWN pane and never actually delivered to Y — are a top friction source in multi-agent collabs. 64 orphan envelopes were observed in one Codex session (wave3-codex-bulk Block B). Verbatim user friction: "Why do you have these messages in your chat but no one enters them?" Recurred across Wave 1-3 (3+ logged occurrences).

The rule

Rule of thumb: if you wrote TO=X in plaintext, you wrote it FOR X — so deliver it to X. An envelope in your own pane that wasn't sent is a message in a bottle, not communication.

Same-turn pairing pattern (copy this)

# 1. Compose the envelope you want X to see:
envelope = """
[FROM=orcClaude TO=coachClaude TYPE=task_done]
Audit finished, 14 findings, no commits.
[/FROM=orcClaude TO=coachClaude TYPE=task_done]
"""

# 2. SAME TURN — actually deliver it:
mcp__cmux__send_to_agent({
  agent_id: "agent:coach-...",
  text: envelope,
  press_enter: true
})

Both must appear in the same model turn. If you only emit step 1 (the plaintext envelope in your own pane), the message is undelivered — X never sees it.

Anti-patterns (these are the orphans)

Acceptance test

Done Signals

Instruct agents to put the signal as the very last line before CLAUDE_COUNTER — not buried above a summary. Otherwise read_screen won't catch it.

Collab Pattern

Copy ~/Gits/orchestrator/collab/TEMPLATE.md first — never write a collab from scratch. The collab-guard.py hook WILL block you.

# 1. Write collab file from template
# 2. Spawn agents with collab instructions
for entry in "search:Agent1" "perf:Agent2" "security:Agent3"; do
  angle="${entry%%:*}"; name="${entry#*:}"
  spawn_agent({
    repo: "TARGET_REPO",
    cli: "claude",
    prompt: "Read collab/FILE.md — you are " name ". Claim " angle ". Update collab when done."
  })
done

Log every action in collab: spawns, completions, blockers. No silent work.

Git Worktree Isolation

Parallel agents MUST use separate worktrees — without them, agents clobber each other's git state.

Claude subagents: Agent(isolation="worktree") (built-in)
cmux agents: Create worktrees manually before spawning, then launch the worker via spawn_agent. See worktrees skill for details.
Codex apply_patch: when your assigned worktree differs from the session cwd, every apply_patch filename MUST be an absolute path inside the worktree. exec_command.workdir does not apply to apply_patch; relative patch paths resolve against the session cwd and can mutate the main checkout silently.

Pane Hygiene — harvest, review, close

Essential Rules

spawn_agent for every worker — no hand-rolled launch flow
Discover workers before acting — use list_agents / my_agents
NEVER read_screen your own surface — recursive output
Verify after launch — wait_for first, get_agent_state second, read_screen only for disputes
Workers = current-workspace splits, audits/research = separate named workspaces unless the user explicitly asks for same-workspace visibility
Sequential launch — brief spacing still helps on first-run biometric prompts
Cross-workspace: always pass workspace ref — surfaces are workspace-scoped
Name everything — mcp__cmux__rename_tab on every surface
Update collab after every action — spawns, completions, blockers
AGENT_REGISTRY is mandatory — maintains state across compaction
BrainLayer agents are sequential — stagger by 10s+ (SQLite = single writer)
wait_for on spawn, not client-side polling — set up monitoring immediately
Don't give hour-scale estimates for <500-line tasks — parallel agents are 8-10x faster
Before the user leaves, every active worker needs either wait_for coverage or a file-based completion contract — no silent "I'll monitor."
Gems to ALL active agents — when you receive new research/context while agents run, send to ALL active workers immediately via send_to_agent. Contextualize per agent: include a 1-line "why this matters to YOUR task" — raw paste is noise. Don't hoard, don't defer, don't send to only the "obvious" recipient.
Respawn > absorb — frozen agent → read_screen(lines:80) to salvage partial work → brain_store progress → stop_agent → spawn_agent → resend SAME task with "already completed: [X]" context. Exception: remaining work <5 min AND you log "ABSORBING: [reason]" to collab. Default is ALWAYS respawn.
Brief agents about external events — Mac restart, user away, network outage → explicitly tell agents. They can't infer from timestamps.
Post-sprint retrospective — after every multi-agent sprint, brain_store what failed, what was corrected, what user said. Tags: ["orchestration", "incident", "cmux-agents"]. Without this, future sessions start from zero.
Max 2-3 concurrent worker agents — more causes AP watchdog kernel panics (April 5 crash: 9+ agents + enrichment + metro bundler = SoC watchdog expired). Before spawning a new agent, count running agents via list_agents. If >=3 workers active, wait for one to finish or kill a stale one. Enrichment daemon + build tools count toward the limit.
Skills don't hot-reload in long sessions — a Claude agent spawned 4+ hours ago runs stale skill versions. If a skill was updated mid-sprint, the running agent won't see the changes. Mitigation: (a) /compact forces re-read of skills, (b) orcClaude should notify long-running agents of critical skill changes via send_to_agent, (c) for critical fixes, respawn the agent instead of nudging.
Collab file BEFORE spawning workers — when dispatching 2+ agents on related tasks, write a collab file from TEMPLATE.md FIRST, then spawn workers with "Read collab/<file>.md" in their prompt. Do NOT dispatch agents via remembered surface numbers or bare launcher typing. Collab files are the coordination channel — without them, agents work in isolation and duplicate effort.
Agent health check BEFORE every follow-up — never message a worker without a recent get_agent_state or wait_for on its agent_id. If the registry looks wrong, use read_screen as the FR-06 fallback.
No model params to spawns — launcher defaults are authoritative; stale params silently downgrade panes.
No --print/-p for agent sessions — all workers/leads/verification gates are interactive repoGolem launcher sessions.
Finished panes get closed — harvest, review, close. Only live processes stay.

Known Issues — cmux rename hooks (design proposal, do NOT edit without orcClaude review)

Swift cmux client source (may be built-in tab rename behavior, not a Python hook)
~/Gits/cmuxlayer/src/**/rename*
A LaunchAgent plist that watches cmux sockets
Part of the spawn_agent implementation / agent-registry wiring in cmuxlayer

Next action (for orcClaude, NOT for skill-creator):

Locate the actual rename hook source
Reproduce each bug in a safe surface
Fix behind a feature flag
A/B test against the current hook
Ship via /pr-loop

Until then: manually mcp__cmux__rename_tab + mcp__cmux__set_status after spawn, using the color/display-name conventions in the table above.

Adoption

etanhey/cmux-agents

$ install --global

Security Scan Results

SKILL.md

cmux-agents

Default Lifecycle (2026-04-19)

repoGolem Launcher Contract

Model + Headless Gates (2026-06-05)

Worker vs Audit Placement

Current Caveats

FR-01 — hyphenated repo names can mis-resolve to the wrong launcher

FR-06 — state parser ambiguity (booting vs idle-with-spinner)

First-run Touch ID note

Completion Signals — file > wait_for > get_agent_state > read_screen

Visual Proof / Screenshots

The file-based completion pattern (copy this)

FR-06 fallback when parser and screen disagree

STOP — Before Anything Else

MCP Primitives (use these for low-level ops)

Response Schemas (PR #76, 2026-04-17 — list_surfaces)

spawn_agent crash_recover + session auto-capture (PR #77, 2026-04-17)

Spawning Agents — launchers only, NEVER a model param (2026-06-05)

Spawning Agents (MANDATORY: use spawn_agent)

Auto-workspace-categorization by launcher root

NEVER Use These Commands (Common Mistakes)

Cursor /auto-run fallback

Task Routing

Agent Selection Guide

Post-Restart Truth-vs-Display (2026-06-06)

Worker-Brief Authoring Contract (2026-06-06)

Pre-Spawn Prompt Checklist

Prompt Size Ceiling — send_input freezes surfaces above ~2000 chars

The rule (hard cap + warn)

File-based handoff pattern (for prompts > 1,800 chars)

On a failed send_input: check the surface before retrying

Boot + deliver: FOCUS-FIRST (the reliable bundle for "send the prompt once booted")

Composer-Wedge Runtime Doctrine (2026-06-06)

Surface Health Check (MANDATORY — before ANY prompt delivery)

Parsing Agent Output

Monitoring Protocol

Envelope-vs-Delivery Pairing (MANDATORY)

The rule

Same-turn pairing pattern (copy this)

Anti-patterns (these are the orphans)

Acceptance test

Done Signals

Collab Pattern

Git Worktree Isolation

Pane Hygiene — harvest, review, close

Essential Rules

Known Issues — cmux rename hooks (design proposal, do NOT edit without orcClaude review)

See Also

Related Skills

etanhey/phoenix-human-view

etanhey/mac-systems

etanhey/judge-fleet

etanhey/fleet-wrap

etanhey/cmux-agents

$ install --global

Security Scan Results

SKILL.md

cmux-agents

Default Lifecycle (2026-04-19)

repoGolem Launcher Contract

Model + Headless Gates (2026-06-05)

Worker vs Audit Placement

Current Caveats

FR-01 — hyphenated repo names can mis-resolve to the wrong launcher

FR-06 — state parser ambiguity (booting vs idle-with-spinner)

First-run Touch ID note

Completion Signals — file > wait_for > get_agent_state > read_screen

Visual Proof / Screenshots

The file-based completion pattern (copy this)

FR-06 fallback when parser and screen disagree

STOP — Before Anything Else

MCP Primitives (use these for low-level ops)

Response Schemas (PR #76, 2026-04-17 — list_surfaces)

spawn_agent crash_recover + session auto-capture (PR #77, 2026-04-17)

Spawning Agents — launchers only, NEVER a model param (2026-06-05)

FR-06 — state parser ambiguity (`booting` vs idle-with-spinner)

Response Schemas (PR #76, 2026-04-17 — `list_surfaces`)

Spawning Agents (MANDATORY: use `spawn_agent`)

Cursor `/auto-run` fallback

Prompt Size Ceiling — `send_input` freezes surfaces above ~2000 chars

FR-06 — state parser ambiguity (`booting` vs idle-with-spinner)

Response Schemas (PR #76, 2026-04-17 — `list_surfaces`)

Spawning Agents (MANDATORY: use `spawn_agent`)

Cursor `/auto-run` fallback

Prompt Size Ceiling — `send_input` freezes surfaces above ~2000 chars