skills/golem-powers/cmux-agents/SKILL.md
Spawn AI agents in cmux panes/workspaces through MCP tools and repoGolem launchers. Covers Claude, Cursor, Gemini, Codex, Kiro, external CLI agents, worker splits, audit/research workspaces, monitoring, prompt delivery, and collab patterns. Use when spawning visible AI workers, terminal agents, or multi-agent orchestration.
Orchestration layer for AI agents in cmux panes. Low-level pane operations (splits, reads, sends) use cmux MCP tools — this skill handles the workflow on top.
Visible worker peers now use the agent-based cmux MCP tools by default:
- Spawn with mcp__cmux__spawn_agent({repo, cli, model, prompt, workspace?, parent_agent_id?}).
- Store the returned agent_id immediately and write it into your collab / AGENT_REGISTRY.
- Use mcp__cmux__wait_for({agent_id, target_state, timeout_ms}) for lifecycle gates.
- Use mcp__cmux__send_to_agent({agent_id, text, press_enter}) for follow-ups.
- Use mcp__cmux__get_agent_state({agent_id}) for a combined registry + parsed-screen snapshot.
- Re-run mcp__cmux__list_agents(...) or mcp__cmux__my_agents(...) whenever topology changes.
- Call mcp__cmux__stop_agent({agent_id}) to stop or recycle a worker.

Reference agents by agent_id, not surface index. Surface IDs drift after crashes, respawns, and pane reuse; agent_id is the durable handle. Surface refs still matter for raw inspection (read_screen) and non-agent panes (new_surface, browser tabs), but not for the default worker lifecycle.
Default path is still spawn_agent. When FR-01, non-agent terminals, or recovery
flows force a manual launcher, use the repoGolem one true form:
{repo}{Tool} -s "prompt"
brainlayerCursor -s "audit X" # gather/read-only
brainlayerCodex -s "fix Y" # implement
brainlayerClaude -s "coordinate Z" # orchestrate
brainlayerGemini -s "visual task" # visual/OCR
The launcher already handles cd, MCP wiring, env vars, iTerm profile,
1Password-backed secrets, and tab title. Do not add cd ~/Gits/...,
MCP_CONNECTION_NONBLOCKING=1, CLAUDE_CODE_NO_FLICKER=1,
source ~/.zshrc &&, raw cursor/codex/claude, or --fast. See
/repogolem for launcher flags and /agent-routing AP11 for the incident
history.
| Intent | Placement | Why |
|--------|-----------|-----|
| Worker: code, implementation, collab | Split in current workspace | Visible beside the lead session; easy to monitor and nudge |
| Audit/research: read-only analysis, perspective, discovery | Separate named workspace | Keeps the working view uncluttered and easy to find in the sidebar |
Name audit/research workspaces clearly, e.g. Audit: golems or
Research: auth. Use same-workspace down splits only when the user explicitly
asks to keep the audit visible next to the lead.
Known live failure: spawn_agent({repo:"skill-creator"}) can guess skill-creatorClaude instead of the real repoGolem launcher skillcreatorClaude. The immediate workaround lives in /repogolem; verify the launcher there before relying on spawn_agent for a new hyphenated repo.
FR-06 (booting vs idle-with-spinner): wait_for and get_agent_state depend on cmux's parser. In rare cases the registry still says booting while read_screen shows a perfectly usable prompt. If wait_for times out but the raw pane clearly shows the agent is ready, trust read_screen, file a cmuxlayer bug, and use one surface-level fallback send only if you must unblock delivery.
First launch of a newly signed launcher binary can still trigger a one-time Touch ID prompt. Treat this as a first-run caveat, not a primary orchestration strategy.
Why this exists: surface-based polling burned hundreds of read_screen calls and still missed real state transitions. The new default is event-driven waits on stable agent_ids, with raw screen reads reserved for ambiguity or output extraction.
Ranked reliability of completion signals (use the highest that fits the job):
| Signal | Reliability | When to use |
|--------|-------------|-------------|
| 1. Output file with DONE marker | Ground truth. File either exists + contains marker, or not. | Every multi-minute autonomous worker task. |
| 2. wait_for({agent_id, target_state:"done"}) | Default event-driven lifecycle gate. | Standard worker completion, especially when you already have the agent_id. |
| 3. get_agent_state({agent_id}) | Good current snapshot of registry state + parsed screen. | Quick checks without a full raw read. |
| 4. read_agent_output / read_screen | Best adjudicator when parser and pane disagree. | FR-06, output extraction, or manual troubleshooting. |
| 5. list_agents / my_agents | Discovery only. | "What is alive?" not "is this task complete?" |
Every autonomous worker task must end with a file write and a DONE marker:
# In the worker's prompt:
Write your report to /path/to/output/batch-WORKER.md. The last line of the
file must be exactly: DONE_WORKER_NAME
Orchestrator side, poll the file(s) until all expected outputs exist and each contains its DONE marker:
# run_in_background: true
until [ -f "$PLAN/batches/batch-M1.md" ] && grep -q DONE_MINER_M1 "$PLAN/batches/batch-M1.md" 2>/dev/null \
&& [ -f "$PLAN/batches/batch-M2.md" ] && grep -q DONE_MINER_M2 "$PLAN/batches/batch-M2.md" 2>/dev/null \
; do
sleep 30
done
echo ALL_MINERS_DONE
When the background command completes, you know every miner finished AND wrote a real file (not a partial crash). This survives cmux state-sync bugs, splash-screen false-idles, and pane freezes.
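The same completion contract can be checked from a script. The sketch below is a minimal Python equivalent of the shell loop above, with a deadline added so a crashed worker cannot block the orchestrator forever. The function names, the 30-second poll interval, and the 30-minute timeout are illustrative assumptions, not part of cmux or this skill.

```python
import time
from pathlib import Path


def pending_outputs(expected: dict[str, str]) -> list[str]:
    """Return the paths that are missing or do not yet contain their DONE marker."""
    pending = []
    for path, marker in expected.items():
        file = Path(path)
        # Same test as the shell loop: the file exists AND the marker is present.
        if not file.is_file() or marker not in file.read_text(errors="replace"):
            pending.append(path)
    return pending


def wait_for_outputs(expected: dict[str, str],
                     poll_s: float = 30.0,
                     timeout_s: float = 1800.0) -> bool:
    """Poll until every file carries its marker; return False if the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not pending_outputs(expected):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Like `grep -q`, this accepts the marker anywhere in the file. A stricter check could compare only the last line, since the worker prompt requires the marker to be the final line.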
If wait_for times out or send_to_agent rejects with current state: booting, do this in order:
1. Call get_agent_state({agent_id}).
2. Run read_screen(surface: "...", lines: 20) on the linked surface.
3. Use one send_input + send_key("return") fallback, then return to the agent_id flow as soon as the registry catches up.

The rule is not "surface sends are normal again." The rule is: agent_id is the source of truth; raw pane sends are an escape hatch for parser drift.
1. Spawn workers with mcp__cmux__spawn_agent. No hand-rolled repoGolem typing unless FR-01 forces a temporary launcher fallback.
2. Store the agent_id immediately. Every follow-up, wait, stop, and handoff should key off the durable id.
3. Re-discover live workers with list_agents / my_agents after crashes or layout changes. Do not trust a remembered surface number.
4. AGENT_REGISTRY — maintain after CLAUDE_COUNTER in every response with active agents:
AGENT_REGISTRY:
| Agent ID | Surface | Repo | Task | Status | Last Check |
|----------|---------|------|------|--------|------------|
| agent:abc123 | surface:153 | golems | Digest failures | WORKING | 12:35 |
Add on spawn. Update on check. Remove on kill.
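One way to keep the registry consistent is to hold it as data and render the table on every response. The sketch below is a hypothetical helper: `AgentRow` and `AgentRegistry` are illustrative names, not cmux types. It mirrors the three rules above: add on spawn, update on check, remove on kill.

```python
from dataclasses import dataclass, astuple

HEADER = (
    "AGENT_REGISTRY:\n"
    "| Agent ID | Surface | Repo | Task | Status | Last Check |\n"
    "|----------|---------|------|------|--------|------------|"
)


@dataclass
class AgentRow:
    agent_id: str
    surface: str
    repo: str
    task: str
    status: str
    last_check: str


class AgentRegistry:
    """In-memory registry keyed by the durable agent_id, never by surface."""

    def __init__(self) -> None:
        self._rows: dict[str, AgentRow] = {}

    def add(self, row: AgentRow) -> None:  # add on spawn
        self._rows[row.agent_id] = row

    def update(self, agent_id: str, status: str, last_check: str) -> None:  # update on check
        row = self._rows[agent_id]
        row.status, row.last_check = status, last_check

    def remove(self, agent_id: str) -> None:  # remove on kill
        self._rows.pop(agent_id, None)

    def render(self) -> str:
        """Render the table in the format shown above."""
        lines = [HEADER]
        for row in self._rows.values():
            lines.append("| " + " | ".join(astuple(row)) + " |")
        return "\n".join(lines)
```

Keying on `agent_id` means a surface that drifts after a respawn only changes one column of the row instead of orphaning it.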
| Operation | MCP Tool | Notes |
|-----------|----------|-------|
| Spawn worker | mcp__cmux__spawn_agent | Default for visible Claude/Codex/Cursor/Gemini/Kiro peers |
| Send follow-up | mcp__cmux__send_to_agent | Replaces send_input + send_key for normal agent messaging |
| Wait for state | mcp__cmux__wait_for | Replaces client-side poll loops |
| Discover workers | mcp__cmux__list_agents / mcp__cmux__my_agents | Use after topology changes; agent_id survives surface drift |
| Inspect state | mcp__cmux__get_agent_state | Registry + parsed screen snapshot |
| Stop worker | mcp__cmux__stop_agent | Replaces closing a pane to kill a worker |
| Read raw pane | mcp__cmux__read_screen | Fallback for FR-06 or output extraction |
| Extract marker output | mcp__cmux__read_agent_output | Preferred when you set ---OUTPUT_START--- / ---OUTPUT_END--- |
| Create non-agent tab | mcp__cmux__new_surface | Browsers, shell scratch pads, log tails |
| Rename / status / progress | mcp__cmux__rename_tab, mcp__cmux__set_status, mcp__cmux__set_progress | UI polish for the visible pane |
| Open browser | mcp__cmux__browser_surface | Non-terminal surfaces |
| Surface-level escape hatch | send_input, send_key | Only when parser drift blocks the agent channel |
Use spawn_agent for full worker lifecycle. Keep raw surface tools for non-agent panes and FR-06 recovery.
Why this shipped (list_surfaces): An earlier mining sweep of 10 orcClaude sessions found list_surfaces burned 117,000 tokens across 49 calls (avg 2,387 / max 8,333 per call). Root causes: a duplicate bug (surfaces in workspace:N appeared N times in surfaces[]) and per-entry bloat (screen_preview 51%, UUIDs 15%, full remote blob 11% even for local-only workspaces). PR #76 fixed both: deduped surfaces[] unconditionally, and condensed the default response with opt-in backward compatibility via verbose: true. Default payload is ~89% smaller than the pre-PR shape.
Default response (no verbose):
{
  "ok": true,
  "workspaces": [
    {
      "ref": "workspace:1",
      "title": "🎯 orcClaude",
      "current_directory": "/Users/etanheyman/Gits/brainlayer",
      "remote_state": "local"
    }
  ],
  "surfaces": [
    {
      "ref": "surface:35",
      "title": "orcClaude",
      "type": "terminal",
      "workspace_ref": "workspace:1"
    }
  ]
}
remote_state values:
- "local": no SSH/proxy/daemon hints; ordinary macOS workspace. The common case.
- "connected": SSH session is up and connected.
- "disconnected": remote configured but currently offline.
- "unavailable": remote partially configured but daemon/proxy state is missing.

Pass verbose: true to restore the full historical schema: workspace id (UUID), index, pinned, selected, listening_ports, full remote blob (daemon / proxy / heartbeat / ports), plus every per-surface field (id, pane_id, index_in_pane, selected_in_pane, focused, window_ref, pane_ref). Dedup is still applied in verbose mode; it's a correctness fix, not opt-in.
When to pass verbose: true:
- You need port details (listening_ports / forwarded_ports).
- You need focus or selection state (focused / selected_in_pane / index).

When the default is enough (99% of agent work):
- Locating a surface and its workspace: ref + title + workspace_ref.
- Counting surfaces: the ref count.
- send_input / read_screen → only need the surface:N ref.
- Picking a pane by name: ref + title suffice.

Pre-PR gotcha now fixed: workspace_ref is no longer echoed at the top level of the response unless you explicitly passed workspace: "workspace:N" as a filter. Callers relying on the top-level echo must either pass a filter or read it per-surface.
Why this shipped: When a worker's PTY died unexpectedly (shell exit, mac crash, cmux daemon restart), the agent record was orphaned with no way to resume. PR #77 adds an opt-in recovery loop driven by boot-time session-ID capture.
New crash_recover: boolean param on spawn_agent (default false). When true:
- Session-ID capture: during boot, the engine takes the first UUID it sees in the pane output (regex [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}). This is what every major CLI prints as its session header (claude --session-id, codex session, cursor agent --session). First UUID seen wins: the engine does not currently match against CLI-specific context markers, so any earlier UUID in boot output (e.g., a log line printing a workspace UUID, a trace ID, or a dependency version string) can steal the slot. Engine persists the first-seen UUID as cli_session_id on the agent record.
- Respawn condition: when the PTY dies and crash_recover=true AND user_killed !== true AND cli_session_id is set, the next sweep attempts respawn.
- Respawn command: <repo>Claude -s --resume <id> / <repo>Codex -s resume <id> / cursor agent --session <id>. Up to MAX_RESPAWN_ATTEMPTS = 10 attempts before giving up.
- user_killed guard: if the user explicitly killed the agent via stop_agent (userInitiated=true, the default), crash_recover will NOT respawn it. Context-limit evictions pass userInitiated=false, keeping the agent eligible for recovery.

When to pass crash_recover: true:
When NOT to pass it:
Caveat — the session-ID heuristic is regex-based. If a worker's early output happens to contain a UUID that isn't the CLI session (e.g., a log line printing a workspace UUID, a trace ID, a dependency version), the wrong ID gets captured and recovery will silently fail later.
Verification procedure for critical spawns:
1. get_agent_state(agent_id: "...") → check cli_session_id is not null.
2. Compare it with the session shown by read_screen:
   - Claude: Session: or --session-id in the header.
   - Codex: session in the splash / first turn.
   - Cursor: --session in the boot log.
3. If cli_session_id does not match the on-screen session, or if it's null after 60s (capture window expired), call stop_agent(userInitiated: false) to keep recovery eligibility, then respawn fresh and re-verify.

A wrong cli_session_id means crash_recover will respawn with the wrong --resume argument and the new session won't have your context.

Related new agent-record fields (visible via get_agent_state / list_agents):
- cli_session_id: captured UUID, or null if capture window expired before a match.
- respawn_attempts: count of recovery attempts made (0 to 10).
- user_killed: true if user explicitly stopped via stop_agent.
- workspace_id: owning workspace UUID (persisted for cross-session recovery).

Distinct from the -c continue flag on repoGolem launchers. -c resumes an exited session that the user re-launches manually. crash_recover handles unexpected PTY death mid-task, automatically. Use -c when you stop a worker and want to come back to it later; use crash_recover: true when you want the engine to auto-recover without your intervention.
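The first-UUID-wins caveat and the verification step are easier to see with a small reproduction. The sketch below is an illustration, not the engine's code: it reuses the UUID regex quoted above, and the function names and sample boot output are hypothetical.

```python
import re
from typing import Optional

# Same pattern the capture heuristic is described as using.
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def first_uuid(boot_output: str) -> Optional[str]:
    """Return the first UUID in the text, which is what 'first seen wins' captures."""
    match = UUID_RE.search(boot_output)
    return match.group(0) if match else None


def capture_matches_screen(cli_session_id: Optional[str], screen_text: str) -> bool:
    """True only when an ID was captured and it appears in the visible session header."""
    return cli_session_id is not None and cli_session_id in screen_text


# Hypothetical boot output: a trace ID is printed before the real session header.
header = "Session: 11111111-2222-3333-4444-555555555555"
noisy_boot = "trace_id=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee\n" + header

captured = first_uuid(noisy_boot)                    # the trace ID steals the slot
needs_respawn = not capture_matches_screen(captured, header)
```

Here `needs_respawn` ends up true, which is the case where the procedure above says to call stop_agent(userInitiated: false) and respawn fresh.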
spawn_agent:
mcp__cmux__spawn_agent({
  repo: "golems",
  cli: "codex",
  model: "codex",
  prompt: "Fix search ranking"
})
→ { agent_id, ... }
mcp__cmux__wait_for({ agent_id, target_state: "ready", timeout_ms: 120000 })
mcp__cmux__send_to_agent({ agent_id, text: "Narrow scope to ranking only", press_enter: true })
mcp__cmux__wait_for({ agent_id, target_state: "done", timeout_ms: 1800000 })
Options: repo, cli, model, prompt, workspace, parent_agent_id
Examples:
spawn_agent({ repo: "golems", cli: "claude", model: "sonnet", prompt: "Survey search regressions" })
spawn_agent({ repo: "golems", cli: "cursor", model: "codex", prompt: "Audit code quality and write findings to /tmp/audit.md" })
spawn_agent({ repo: "orchestrator", cli: "gemini", model: "gemini", prompt: "Survey patterns and summarize in 5 bullets" })
spawn_agent({ repo: "golems", cli: "kiro", model: "kiro", prompt: "Draft a concise troubleshooting note" })
Hyphenated repo names: if the repo contains -, verify the real repoGolem launcher in /repogolem first (FR-01). The doc warning is there because cmuxlayer can still mis-resolve the launcher name.
When spawning a visible pane via mcp__cmux__new_split + sending a launcher
command, the pane MUST land in the workspace whose name contains
(case-insensitive substring) the launcher's repo root. Examples:
- coachClaude -s → workspace whose title contains "Coach"
- orcClaude -s → workspace whose title contains "orc" (or workspace:2 by default)
- voicelayerCodex -s → workspace whose title contains "voicelayer"
- brainlayerCodex -s → workspace whose title contains "brainlayer" or "orc-buddy"

Lookup protocol BEFORE new_split:
1. Identify the launcher (coachClaude, orcClaude, etc.).
2. Derive the repo root (coach, orc, voicelayer, brainlayer).
3. Call mcp__cmux__list_surfaces() to enumerate live workspaces.
4. Pass the matching workspace ref to new_split.
5. Confirm via list_surfaces that the new surface ended up in the intended workspace. If not, IMMEDIATELY call mcp__cmux__move_surface(surface, workspace) to fix.

Why post-spawn verify is mandatory: the current new_split MCP call's workspace arg is advisory; actual placement may follow current focus, not the arg. Until cmux MCP enforces the arg, agents MUST verify + move post-spawn. Skipping the verify step is how the live 2026-05-17 incident happened.
Evidence: Live 2026-05-17 ~02:32 IDT — skillCreator s:3 spawned 8 Batch D
eval panes via new_split(workspace:"workspace:1"); 4 coach-launcher panes
landed in workspace:2 anyway. orc had to manually move_surface to recover.
Cost: ~5 minutes of orc context burning on layout repair instead of dispatch.
spawn_agent({workspace}) already routes correctly in most cases — this rule
specifically covers the lower-level new_split path used when you need a
non-agent terminal or a launcher invocation that isn't a cli enum value.
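The lookup and the post-spawn check can be sketched against the default list_surfaces response shape shown earlier. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the tool suffix list is taken from the launcher examples in this skill, and the special cases above (the workspace:2 default for orc, the "orc-buddy" alias for brainlayer) are deliberately left out.

```python
from typing import Optional

# Tool suffixes seen in the launcher examples in this skill (assumed complete here).
TOOL_SUFFIXES = ("Claude", "Codex", "Cursor", "Gemini", "Kiro")


def repo_root(launcher: str) -> str:
    """Strip the tool suffix from a launcher name, e.g. 'coachClaude' -> 'coach'."""
    for suffix in TOOL_SUFFIXES:
        if launcher.endswith(suffix):
            return launcher[: -len(suffix)]
    raise ValueError(f"not a recognised launcher name: {launcher!r}")


def find_workspace(launcher: str, workspaces: list[dict]) -> Optional[str]:
    """Return the ref of the first workspace whose title contains the repo root."""
    root = repo_root(launcher).lower()
    for workspace in workspaces:
        if root in workspace["title"].lower():  # case-insensitive substring match
            return workspace["ref"]
    return None


def misplaced(surface_ref: str, intended_workspace: str, surfaces: list[dict]) -> bool:
    """Post-spawn verify: True when the surface is absent or in another workspace."""
    for surface in surfaces:
        if surface["ref"] == surface_ref:
            return surface["workspace_ref"] != intended_workspace
    return True
```

When `misplaced(...)` returns true, the protocol above calls for an immediate move_surface. Because matching is by substring and the first hit wins, a short root such as "orc" can also match a title like "orc-buddy", so ambiguous titles are worth resolving by hand.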
Root cause (April 5 2026): mehayomClaude used cursor-cli in a cmux pane: command not found. Wasted a surface and debugging time.
| WRONG | RIGHT | Why |
|-------|-------|-----|
| cursor-cli "prompt" | cursor agent --trust "prompt" | cursor-cli does not exist as a command |
| cursor --print --output-format text | cursor agent --output-format text | --print is not a cursor flag |
| cursor agent --output-format (in cmux) | cursor agent --trust (in cmux) | Interactive cmux agents need --trust for permissions, not --output-format text which is for piped/batch output |
| cursor "prompt" (bare) | <repo>Cursor -s launcher | Launcher functions handle cd, env, permissions |
For visible Cursor agents: Use spawn_agent({cli:"cursor"}) from the parent orchestrator. These workers should boot as addressable agent_ids, not as manually typed launchers.
For batch/piped output: Use cursor agent --output-format text --model "<cursor-max-mode-model>" "PROMPT" directly. Substitute Cursor's current Max Mode model ID (verify via cursor agent --help or the Cursor changelog — IDs change between versions). Default to no --model flag (Auto) per /agent-routing AP3.
For repoGolem launchers: Use <repo>Cursor -s (e.g., golemsCursor -s, brainlayerCursor -s). The -s flag skips permissions.
/auto-run fallback
Default path: put the task in spawn_agent.prompt and talk to the worker via send_to_agent({agent_id,...}).
If a live Cursor pane still stalls on approvals, send /auto-run as a follow-up first:
wait_for({ agent_id, target_state: "ready", timeout_ms: 120000 })
send_to_agent({ agent_id, text: "/auto-run", press_enter: true })
send_to_agent({ agent_id, text: "<task prompt>", press_enter: true })
If FR-06 blocks send_to_agent, confirm the raw pane with read_screen and use one surface-level fallback send.
| Need | Best CLI | Why |
|------|----------|-----|
| Deep reasoning, multi-file | Claude | Best reasoning, MCP, native worktrees |
| Codebase-wide audit | Cursor | @codebase indexing, text output |
| Fast structured output | Codex | Output contracts, GPT-5.4 |
| Large context research | Gemini | 1M tokens, free |
| Quick PRs | Codex | Fast, low overhead |
| Big refactors | Claude | --worktree, session resume |
| Cross-model verification | Claude + Cursor | Different blind spots |
Use /agent-routing for who should do the work, and /repogolem for the exact
launcher flags. This table is the cmux placement/CLI quick reference:
| Task | Launcher form | Notes |
|------|---------------|-------|
| Plan auditing, perspectives | {repo}Cursor -s | Read-only gather; default/Auto model, no model flag |
| Code review, codebase analysis | {repo}Cursor -s | Uses codebase context; verify findings before acting |
| Quick research, comparisons | {repo}Gemini -s | Large context and visual/OCR strength |
| Parallel implementation | {repo}Codex -s | Default Codex model; override only when asked |
| Collab agent writing to collab file | {repo}Claude -s | Use Sonnet for audit collabs when appropriate |
| Deep reasoning, architecture | {repo}Claude -s | Opus/default only when the task needs it |
Task type routing (R28 — invoke /agent-routing):
| Task Type | Route To | NOT To |
|-----------|----------|--------|
| SQL queries, grep, file scanning | Cursor (read-only) | Claude (burns context) |
| Code changes, bug fixes, PRs | Codex | Cursor (read-only!) |
| Orchestration, decisions, monitoring | Claude | Cursor or Codex |
When spawning agents for a collab: assign Cursor workers for data gathering and Codex workers for implementation. The Claude agent coordinates. If a Claude agent is burning context on SQL or grep work — that's a routing violation.
Full CLI syntax and capabilities: adapters/ directory + adapters/capabilities.yaml.
Every agent task prompt MUST include:
- Response markers: ---RESPONSE_START--- and ---RESPONSE_END---

Full audit: workflows/prompt-audit.md
send_input freezes surfaces above ~2000 chars
Why this exists: A 72-hour JSONL sweep (2026-04-12 → 2026-04-15) across 5 Claude Code orchestrator sessions found that send_input calls over 2,000 characters correlated strongly with surface freezes. In one Mehayom-app session: 868 total send_input calls, 13 exceeded 2,000 chars (max 4,382), and those 13 large calls preceded 6 frozen / surface-unresponsive incidents. In a control session (coach repo) every send_input stayed under 1,900 chars and there were zero freezes. The coach control case is the proof: discipline eliminates the symptom.
Direct evidence (from the Mehayom-app session):
"You're right — the cmux surface was frozen, my send_input commands weren't taking effect (sent 4 commands, none showed up). I burned ~3 minutes trying before deciding to implement directly rather than waste more time debugging cmux."
| Payload length | Action |
|----------------|--------|
| < 1,500 chars | Send directly. Safe zone. |
| 1,500–1,799 | Warn threshold. Trim if possible, then send. |
| ≥ 1,800 | HARD CAP. Do NOT send_input directly. Use the file-based handoff pattern below. |
1,800 chars gives a ~17% safety margin under the smallest observed freeze point (~2,100 chars). Above the cap, surfaces freeze silently — send_input still returns ok:true but the target pane never sees the input.
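The threshold table reduces to a small length check that can run before every send. The sketch below is illustrative: the function name and the returned labels are assumptions, while the two cut-offs come from the table.

```python
WARN_AT = 1500   # start of the warn band in the table
HARD_CAP = 1800  # at or above this, do not send_input directly


def delivery_action(payload: str) -> str:
    """Map a payload to the action the threshold table prescribes."""
    length = len(payload)  # counts characters, matching the table's unit
    if length >= HARD_CAP:
        return "file_handoff"
    if length >= WARN_AT:
        return "trim_then_send"
    return "send"
```

A payload of exactly 1,800 characters is treated as over the cap, following the "≥ 1,800" row.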
Three steps. The first one is a shell command you run via the Bash tool. The next two are cmux MCP calls.
Step 1 — write the long prompt to disk via Bash (or the Write tool):
printf '%s' "$LONG_PROMPT" > /Users/etanheyman/Gits/orchestrator/collab/surface-N-$(date +%s).md
Prefer the Write tool when the prompt is already in-context — it avoids shell quoting pitfalls on multi-line prompts.
Step 2 — type the cat command into the target surface, then press Return:
send_input(surface: "surface:N", text: "cat /Users/etanheyman/Gits/orchestrator/collab/surface-N-<stamp>.md")
send_key(surface: "surface:N", key: "Return")
send_input on its own only TYPES the text — without send_key("Return") the command sits on the shell line unexecuted. This is the same execution contract used everywhere else in this skill.
Step 3 — verify the handoff landed: after a few seconds, read_screen the surface and confirm you see the file contents (or the agent's response to them). If the pane is frozen despite the short pointer, the file is still safe on disk — spawn a fresh surface and re-run Step 2 against the new surface.
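Step 1 and the command for Step 2 can be prepared together. The sketch below is a hypothetical helper, not part of cmux: it writes the prompt to disk and returns the short cat command to type into the pane. The file naming follows the surface-N-<stamp>.md pattern above, and the directory is passed in rather than hard-coded.

```python
import shlex
import time
from pathlib import Path

HARD_CAP = 1800  # send_input hard cap from the threshold table


def stage_prompt(prompt: str, collab_dir: str, surface: str) -> str:
    """Write the prompt to disk and return the short `cat` command for send_input."""
    directory = Path(collab_dir)
    directory.mkdir(parents=True, exist_ok=True)
    stem = surface.replace(":", "-")  # "surface:7" -> "surface-7"
    path = directory / f"{stem}-{int(time.time())}.md"
    path.write_text(prompt, encoding="utf-8")
    command = "cat " + shlex.quote(str(path))  # quote in case the path has spaces
    if len(command) >= HARD_CAP:
        raise ValueError("pointer command itself exceeds the send_input cap")
    return command
```

The returned string is what goes into send_input, followed by send_key with Return. Writing from Python sidesteps the shell-quoting pitfalls mentioned for multi-line prompts, in the same way the Write tool does.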
Why the file pattern beats splitting the prompt into multiple sends:
- The file survives a failed send: re-run cat against a fresh surface and the prompt is preserved.

A retry without root-cause investigation is almost always wrong. After any send_input that appears to have been lost (no output, no activity, retry instinct kicking in):
1. read_screen(surface: "surface:N", lines: 10): is the pane frozen, scrolled back, or just slow?
2. list_agents / my_agents to confirm the worker still exists.

Do not fire a second send_input of the same large payload hoping it "works this time." It won't, and you'll burn the same 3 minutes the earlier session logged.
Root cause (Etan, recurring, 2026-05-30): a cmux pane/agent does NOT initiate until its workspace/pane is FOCUSED. So boot_prompt_path and a bare wait_for({target_state:"ready"}) on an unfocused pane NEVER RESOLVE: they hang on a ready-state that can't arrive because the agent hasn't started. Etan: "It never fucking resolves — focus the pane for 3 seconds, then read, then if ready send the prompt." Do not lean on boot_prompt_path / blind long wait_for to deliver a boot prompt. Focus is the missing precondition.
The reliable bundle (proven spawning orc-gen-12 on 1M, 2026-05-30):
1. CREATE the pane: new_split({direction, role, workspace, focus:true}) → surface
(focus:true is REQUIRED — role-based tab placement rejects focus:false)
2. LAUNCH (no boot_prompt_path): send_command({surface, command:"<repo>Claude -s [-m '<model>']"})
(quote any model with brackets, e.g. -m 'claude-opus-4-8[1m]' — zsh globs [1m] otherwise)
3. FOCUS so it initiates: select_workspace({workspace}) + `cmux focus-pane --pane <pane>`
4. WAIT ~3s, then READ: read_screen({surface, parsed_only:true})
→ confirm status ready/idle/working AND verify `model` / `context_window`
(catch the L4 default: 200K vs the 1,000,000 you asked for)
5. IF READY, deliver: send_command({surface, command:"<the prompt>"}) — pane is focused, it lands.
- Context window: the 1M variant is the [1m] suffix (1,000,000). The read_screen parsed model / context_window is the gate; if wrong, fix with /model in-session before sending.
- boot_prompt_path is for already-focused/foreground spawns only. When you're driving from another pane, it will hang; use this focus-first bundle instead.
- Upstream fix wanted: boot_prompt_path / wait_for should auto-focus the target pane before waiting for readiness; then this 5-step bundle collapses back into a single reliable primitive. Until then, focus-first is mandatory.

Why this exists: Frozen/dead surfaces are a top-3 frustration source. Agents send prompts to unresponsive surfaces, losing work and burning ~3 minutes per incident on manual fallback. send_input returns ok:true even on frozen terminals.
Step 1 — After spawning a worker:
wait_for({agent_id, target_state:"ready", timeout_ms:120000}) is the default health gate. If it times out, inspect the raw pane with read_screen. Two failures = dead worker — stop it and respawn. If wait_for hangs to timeout and the pane looks un-booted, the pane is probably UNFOCUSED — see "Boot + deliver: FOCUS-FIRST" above; focus it, then re-check.
# After spawn_agent → got agent_id + linked surface
wait_for({ agent_id: "agent:abc123", target_state: "ready", timeout_ms: 120000 })
# ✅ Ready/working state = worker booted
# ❌ Timeout: read_screen(surface: "...", lines: 10) to check for FR-06 parser drift
Step 2 — Before sending ANY prompt to an existing surface:
Always read_screen first to confirm the surface is responsive. Never fire-and-forget a send_input without checking state first.
# Before sending prompt to surface:N
read_screen(surface: "surface:N", lines: 5)
# ✅ Responsive: shows shell prompt, cursor output, or agent activity
# ❌ Unresponsive: blank screen, no change from last check, or error output
Step 3 — Unresponsive surface recovery:
If a surface fails 2 consecutive read_screen checks (blank, frozen, or no shell prompt):
1. read_screen(surface: "surface:N", lines: 80, scrollback: true) to salvage any partial work
2. brain_store any salvaged progress
3. stop_agent({agent_id})
4. spawn_agent({...same task...}) to create a fresh worker

Step 4 — Safe prompt delivery (special character escaping):
Never send special characters (backticks, quotes, markdown formatting) directly via send_input. They get interpreted by the shell and corrupt the prompt.
# WRONG — backticks and quotes break in send_input:
send_input(surface: "surface:N", text: "Fix the `processQueue` function")
# RIGHT — use heredoc pattern:
send_input(surface: "surface:N", text: "cat <<'PROMPT_EOF' | clipboard\nFix the processQueue function\nPROMPT_EOF")
# RIGHT — escape special characters:
send_input(surface: "surface:N", text: "Fix the \\`processQueue\\` function")
# SAFEST — write prompt to a temp file, then cat it:
# 1. Write prompt to /tmp/agent-prompt-N.txt via Bash
# 2. send_input: "cat /tmp/agent-prompt-N.txt"
Checklist (run mentally before every send_input):
- Did I read_screen this surface in the last 60 seconds?

When reading agent output via read_screen, search for ---RESPONSE_START--- to find the structured response. Everything between START and END is the deliverable; ignore terminal noise, tool calls, and deliberation outside the markers.
# Pattern: read enough scrollback to capture the full response
read_screen(surface: "surface:N", lines: 80, scrollback: true)
# Then look for ---RESPONSE_START--- ... ---RESPONSE_END--- in the output
If markers are missing, fall back to reading the last 50 lines + done signal. But if you wrote the prompt correctly (checklist above), markers will be there.
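Marker extraction from a screen dump can be sketched as a plain string search. The function name is hypothetical and the input is whatever text read_screen returned. The sketch takes the last complete START/END pair, on the assumption that earlier occurrences in scrollback may be the prompt echoing the marker names.

```python
from typing import Optional

START = "---RESPONSE_START---"
END = "---RESPONSE_END---"


def extract_response(screen_text: str) -> Optional[str]:
    """Return the text between the last complete marker pair, or None if incomplete."""
    end = screen_text.rfind(END)
    if end == -1:
        return None  # no END marker yet: response missing or still streaming
    start = screen_text.rfind(START, 0, end)
    if start == -1:
        return None  # END without a preceding START: scrollback was cut off
    return screen_text[start + len(START):end].strip()
```

A `None` result is the cue for the fallback described above: read the last 50 lines and look for the done signal.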
"I'll monitor them" is NOT monitoring. Monitoring = explicit waits on explicit agent_ids.
After spawning:
1. wait_for({agent_id, target_state:"ready"|"working", timeout_ms:120000}) to verify boot.
2. wait_for({agent_id, target_state:"done"}) to gate on completion.
3. Re-run list_agents / my_agents before sending follow-ups; surface ids drift.
4. If wait_for or get_agent_state disagrees with the visible pane, use read_screen to adjudicate FR-06.
5. Sweep with list_agents / get_agent_state, then use read_screen only where the registry looks suspect. Agents can lose MCP silently.

Worker utilization check (R28 enforcement): During monitoring, also check worker utilization:

Anti-patterns:
- Spawning without a wait_for. You WILL miss failures.
- Sending a Cursor task without /auto-run first: worker silently stalls on permission prompts.
- Tracking surface numbers instead of agent_ids: surface drift will eventually burn you.

Why this exists: orphan envelopes, meaning [FROM=X TO=Y TYPE=Z] blocks written to the author's OWN pane and never actually delivered to Y, are a top friction source in multi-agent collabs. 64 orphan envelopes were observed in one Codex session (wave3-codex-bulk Block B). Verbatim user friction: "Why do you have these messages in your chat but no one enters them?" Recurred across Wave 1-3 (3+ logged occurrences).
Any [FROM=<self> TO=<target> TYPE=<status|mission|task_done|ack>] envelope
block you emit to your own pane output MUST be paired with a
mcp__cmux__send_to_agent({agent_id: <target>, text: ...}) or
mcp__cmux__send_to(...) call in the SAME turn.
Rule of thumb: if you wrote TO=X in plaintext, you wrote it FOR X — so
deliver it to X. An envelope in your own pane that wasn't sent is a message in
a bottle, not communication.
# 1. Compose the envelope you want X to see:
envelope = """
[FROM=orcClaude TO=coachClaude TYPE=task_done]
Audit finished, 14 findings, no commits.
[/FROM=orcClaude TO=coachClaude TYPE=task_done]
"""
# 2. SAME TURN — actually deliver it:
mcp__cmux__send_to_agent({
agent_id: "agent:coach-...",
text: envelope,
press_enter: true
})
Both must appear in the same model turn. If you only emit step 1 (the plaintext envelope in your own pane), the message is undelivered — X never sees it.
| Anti-pattern | Why it fails |
|---|---|
| Writing [FROM=X TO=Y ...] in your pane "for the log" without send_to_agent | Y never sees it; orchestrator-side log is not a communication channel |
| Splitting envelope and delivery across turns ("I'll send it next turn") | Forgetting is the default; next turn rarely happens |
| Calling send_to_agent(text: "hi") after composing a FROM=/TO= envelope but sending only the bare message text | Recipient loses the FROM/TO/TYPE metadata the envelope was built to carry |
| Emitting envelope to a CLAUDE_COUNTER summary instead of calling send_to_agent | The summary is yours, not the recipient's inbox |
If your turn output contains a [FROM=...TO=...TYPE=...] block, your turn's
tool calls MUST also contain a matching mcp__cmux__send_to_agent (or
mcp__cmux__send_to) call whose agent_id corresponds to the TO= target
and whose text includes the envelope contents. No pair → orphan envelope →
rule violation.
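The pairing rule lends itself to a mechanical self-check. The sketch below assumes a turn can be represented as its plaintext output plus a list of send calls; `SendCall`, `orphan_envelopes`, and the name-to-agent_id mapping are hypothetical, not cmux structures.

```python
import re
from dataclasses import dataclass

# Matches opening envelope headers only; the closing "[/FROM=...]" tag is skipped.
ENVELOPE_RE = re.compile(r"\[FROM=(\S+) TO=(\S+) TYPE=(\w+)\]")


@dataclass
class SendCall:
    agent_id: str
    text: str


def orphan_envelopes(turn_output: str,
                     sends: list[SendCall],
                     agent_ids: dict[str, str]) -> list[str]:
    """Return the TO= targets whose envelope has no matching send in the same turn."""
    orphans = []
    for match in ENVELOPE_RE.finditer(turn_output):
        header, target = match.group(0), match.group(2)
        wanted_id = agent_ids.get(target)  # None when the target is unknown
        delivered = any(
            call.agent_id == wanted_id and header in call.text for call in sends
        )
        if not delivered:
            orphans.append(target)
    return orphans
```

Requiring the envelope header inside the sent text also catches the anti-pattern where only a bare message is sent and the FROM/TO/TYPE metadata is dropped.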
Instruct agents to put the signal as the very last line before CLAUDE_COUNTER — not buried above a summary. Otherwise read_screen won't catch it.
Copy ~/Gits/orchestrator/collab/TEMPLATE.md first — never write a collab from scratch. The collab-guard.py hook WILL block you.
# 1. Write collab file from template
# 2. Spawn agents with collab instructions
#    (pseudocode: spawn_agent is an MCP tool call, issued once per angle/name pair)
for entry in "search:Agent1" "perf:Agent2" "security:Agent3"; do
  angle="${entry%%:*}"; name="${entry#*:}"
  spawn_agent({
    repo: "TARGET_REPO",
    cli: "claude",
    model: "sonnet",
    prompt: "Read collab/FILE.md. You are ${name}. Claim ${angle}. Update collab when done."
  })
done
Log every action in collab: spawns, completions, blockers. No silent work.
Parallel agents MUST use separate worktrees — without them, agents clobber each other's git state.
- Agent(isolation="worktree") (built-in)
- For workers launched with spawn_agent, see the worktrees skill for details.

Rules:
- spawn_agent for every worker; no hand-rolled launch flow.
- Re-discover workers with list_agents / my_agents.
- wait_for first, get_agent_state second, read_screen only for disputes.
- mcp__cmux__rename_tab on every surface.
- wait_for on spawn, not client-side polling; set up monitoring immediately.
- Every worker needs wait_for coverage or a file-based completion contract; no silent "I'll monitor."
- Share findings with send_to_agent. Contextualize per agent: include a 1-line "why this matters to YOUR task"; raw paste is noise. Don't hoard, don't defer, don't send to only the "obvious" recipient.
- Dead or stalled worker: stop_agent → spawn_agent → resend SAME task with "already completed: [X]" context. Exception: remaining work <5 min AND you log "ABSORBING: [reason]" to collab. Default is ALWAYS respawn.
- After an incident, brain_store what failed, what was corrected, what user said. Tags: ["orchestration", "incident", "cmux-agents"]. Without this, future sessions start from zero.
- Before spawning, check list_agents. If >=3 workers active, wait for one to finish or kill a stale one. Enrichment daemon + build tools count toward the limit.
- Skill changes and running agents: (a) /compact forces re-read of skills, (b) orcClaude should notify long-running agents of critical skill changes via send_to_agent, (c) for critical fixes, respawn the agent instead of nudging.
- Every dispatched agent gets "Read collab/<file>.md" in their prompt. Do NOT dispatch agents via remembered surface numbers or bare launcher typing. Collab files are the coordination channel; without them, agents work in isolation and duplicate effort.
- Before messaging a worker, check get_agent_state or wait_for on its agent_id. If the registry looks wrong, use read_screen as the FR-06 fallback.

Background (2026-04-11): the cmux tab-rename auto-hook (launcher name → display name + color) has 5 observed bugs. Code changes are OUT OF SCOPE for skill-creator; this section is a design proposal for the next orcClaude session to dispatch.
| # | Bug | Symptom | Proposed fix |
|---|---|---|---|
| 5.1 | Flat tab colors | All tabs use the same (or default) color; agent type not visually distinguishable | Map launcher function → color in a dict (claude=blue, codex=orange, cursor=purple, gemini=green, kiro=red). Set via mcp__cmux__rename_tab(color=...) on spawn. |
| 5.2 | Red-to-cyan weirdness | Some tabs flip from red (error/warning) to cyan unexpectedly; color state machine has a bad transition | Debug: instrument the rename hook to log every color-set call with timestamp + reason. Most likely cause: one code path sets color from agent state, another sets it from default, last write wins. |
| 5.3 | Nested naming collapses | Tab names for agents spawned in worktrees or nested panes lose their parent context (e.g. "golemsClaude > feat-X" → "claude") | When generating the display name, walk the surface parent chain and prepend up to 1 level of context. Truncate via ellipsis if longer than the tab width budget. |
| 5.4 | Weak semantic tag extraction | Tab names don't reflect what the agent is actually WORKING on (they just say "claude" instead of e.g. "claude: PR#232 fix") | Parse the task prompt on spawn — extract PR numbers (PR#\d+), issue refs (#\d+), and the first 3-5 imperative words. Fall back to launcher name if none found. |
| 5.5 | Launcher-to-display-name mapping missing | Tab shows raw function name (golemsClaude -s) instead of a friendly display (golems • Claude) | Add a LAUNCHER_DISPLAY lookup table keyed on the base launcher pattern ({repo}Claude → {repo} • Claude, {repo}Cursor → {repo} • Cursor, etc.). Regex extract repo + agent. |
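The proposed fixes for 5.1 and 5.5 can be sketched as a single lookup. This is a minimal illustration, not the hook's actual code (which has not been located): the names `AGENT_COLORS`, `LAUNCHER_RE`, and `describe_launcher` are hypothetical, and the regex assumes launchers always follow the `{repo}{Tool}` shape with a lowercase-initial repo prefix.

```python
import re

# Colors proposed in row 5.1 of the table above.
AGENT_COLORS = {
    "claude": "blue",
    "codex": "orange",
    "cursor": "purple",
    "gemini": "green",
    "kiro": "red",
}

# Assumed launcher shape: {repo}{Tool}, e.g. "golemsClaude" or "brainlayerCodex".
LAUNCHER_RE = re.compile(
    r"^(?P<repo>[a-z][a-zA-Z0-9]*?)(?P<agent>Claude|Codex|Cursor|Gemini|Kiro)\b"
)


def describe_launcher(command):
    """Return (display_name, color) for a launcher command, or None if it does not match."""
    match = LAUNCHER_RE.match(command.strip())
    if match is None:
        return None
    agent = match.group("agent")
    display = f"{match.group('repo')} • {agent}"
    return display, AGENT_COLORS[agent.lower()]


if __name__ == "__main__":
    print(describe_launcher('golemsClaude -s "coordinate Z"'))
```

Returning `None` for anything that does not look like a launcher lets the caller leave non-agent tabs untouched rather than guessing a color.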
Where the hook lives: NOT located as of 2026-04-11 by skillCreatorBuddy. Not in ~/.claude/hooks/, not in ~/Gits/golems/hooks, not in ~/.config/cmux/settings.json. Likely candidates to check next session:
- ~/Gits/cmuxlayer/src/**/rename*
- spawn_agent implementation / agent-registry wiring in cmuxlayer

Next action (for orcClaude, NOT for skill-creator):
Until then: manually call mcp__cmux__rename_tab and mcp__cmux__set_status after each spawn, using the color and display-name conventions in the table above.
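When naming tabs by hand, the title itself can follow the proposals in rows 5.3 and 5.4. The sketch below only builds the string and does not call any cmux tool. `semantic_tag` and `tab_title` are hypothetical helpers, the 32-character width budget is an assumption, and picking the first PR or issue reference (falling back to the first few words of the prompt) is one possible reading of the proposal.

```python
import re

PR_RE = re.compile(r"PR#\d+")
# Bare issue refs such as "#45"; the lookbehind skips the "#232" inside "PR#232".
ISSUE_RE = re.compile(r"(?<![\w#])#\d+")


def semantic_tag(prompt, max_words=4):
    """Pick a short tag: first PR ref, else first issue ref, else leading words."""
    refs = PR_RE.findall(prompt) or ISSUE_RE.findall(prompt)
    if refs:
        return refs[0]
    return " ".join(prompt.split()[:max_words])


def tab_title(launcher_name, prompt, parent=None, width=32):
    """Build a tab title with one level of parent context, truncated with an ellipsis."""
    tag = semantic_tag(prompt)
    # Fall back to the bare launcher name when the prompt yields no tag.
    title = f"{launcher_name}: {tag}" if tag else launcher_name
    if parent:
        title = f"{parent} > {title}"
    if len(title) > width:
        title = title[: width - 1] + "…"
    return title


if __name__ == "__main__":
    print(tab_title("claude", "fix PR#232 review comments", parent="golemsClaude"))
```

The resulting string is what would be passed as the tab name in the manual rename step; the color comes from the conventions in the table.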
- /agent-routing: task routing matrix, Cursor=gather, Codex=implement, Claude=orchestrate (R28)
- /cmux: low-level pane operations (splits, reads, sends, browser)
- /orc: orchestration decisions, state machine, collab protocols
- /pr-loop: every agent working on code must invoke this for every PR