Agent Routing — CLI Tool Assignment Matrix

Auto-dispatch triggers (canonical in orc/SKILL.md C4): batch reads ≥3, transcription ≥2, web research ≥1, or any "in parallel" / "all of these" phrasing → fan out sub-agents in the SAME message before asking permission.

"I want Cursor for gathering information, Codex to change stuff, and Claude's for orchestrating stuff and interacting with you. Look back in your context. We spoke about all of this. It's not new." — User, stated 5+ times across April 4, 2026, session (lines 4519, 5838, 5870, 4514, 3957)

This skill encodes the routing matrix that determines which CLI agent handles which type of work. It was violated in every collab session (March 29, April 3, April 4) despite being stated explicitly. The violations were always the same: Claude agents ignored their Cursor/Codex workers and did everything themselves, burning context on mechanical work.

🔥 THE ONE TRUE FLAG IS `-s` (read before dispatching any worker)

-s is the ONLY flag you need to dispatch a YOLO worker. The repoGolem launcher maps it to the correct CLI-specific bypass:

| CLI | -s maps to |
|-----|--------------|
| Claude | --dangerously-skip-permissions |
| Codex | --dangerously-bypass-approvals-and-sandbox |
| Cursor | --yolo --approve-mcps |
| Gemini | --yolo |
| Kiro | --trust-all-tools |
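As a rough sketch of the mapping in the table, the lookup could be written as a small shell function. This is not the launcher's actual code; the function name `bypass_flags_for` is hypothetical, and only the flag strings come from the table.

```shell
# Hypothetical helper: prints the bypass flags that -s expands to for a CLI.
# The flag strings are copied from the mapping table; this is NOT the real
# repoGolem launcher implementation.
bypass_flags_for() {
  case "$1" in
    Claude) echo "--dangerously-skip-permissions" ;;
    Codex)  echo "--dangerously-bypass-approvals-and-sandbox" ;;
    Cursor) echo "--yolo --approve-mcps" ;;
    Gemini) echo "--yolo" ;;
    Kiro)   echo "--trust-all-tools" ;;
    *)      echo "unknown CLI: $1" >&2; return 1 ;;
  esac
}
```

In practice you never call anything like this yourself: `-s` on the launcher already performs the expansion.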

The one-liner to dispatch ANY worker:

brainlayerCursor -s "one-sentence task prompt here"     # gather / read-only
brainlayerCodex  -s "one-sentence task prompt here"     # implement

That is it. No cd, no MCP_CONNECTION_NONBLOCKING=1, no CLAUDE_CODE_NO_FLICKER=1, no source ~/.zshrc &&, no raw cursor/codex. The launcher already does cd + env-vars + iTerm profile + MCP wiring + 1Password secrets + tab title.

--fast is not a repoGolem routing flag. Do not use it. For visible cmux pane workers, use -s only. If an internal ephemeral subagent genuinely needs Spark, request the model explicitly with -m gpt-5.3-codex-spark or the equivalent model field.

Pre-dispatch checklist (run before spawning any worker):

[ ] Am I about to write cd ~/Gits/…? → STOP. Use {repo}{Tool} -s instead.
[ ] Am I about to write MCP_CONNECTION_NONBLOCKING=1 or CLAUDE_CODE_NO_FLICKER=1? → STOP. Launcher already exports both.
[ ] Am I about to call cursor agent or codex or claude directly? → STOP. Use the {repo}{Tool} launcher.
[ ] Am I about to use --fast? → STOP. Use -s (one-true-flag).

Verification tip (Codex lies about its own model). Codex's text responses always say "gpt-5.4" regardless of the actual model running. To confirm which model actually ran, check session metadata and read the "model" field directly:

# Source of truth: session metadata's model field
grep -h -E '"model":' ~/.codex/sessions/$(date +%Y/%m/%d)/*.jsonl | sort -u

# Per-session sanity check
for f in ~/.codex/sessions/$(date +%Y/%m/%d)/*.jsonl; do
  echo "=== $(basename "$f") ==="
  grep -o '"model":"[^"]*"' "$f" | head -1
done

"model":"gpt-5.3-codex-spark" → Spark dispatch succeeded ✅
"model":"gpt-5.5" (or "gpt-5.5-codex") → current main pool ✅
"model":"gpt-5.4" → previous main pool (pre-2026-04-23 sessions)
"model":"gpt-5.4-mini" / "gpt-5.5-mini" → fallback tier

THE ROUTING MATRIX (non-negotiable)

| Tool | Role | What It Does | What It NEVER Does |
|------|------|-------------|-------------------|
| Cursor | Data gathering | SQL queries, file scanning, codebase search, grep, read-only lookups, audit scans | Code changes, implementations, PRs, decisions |
| Codex | Implementation | Code changes, bug fixes, refactoring, test writing, PRs | Research, data gathering, orchestration |
| Gemini (CLI) | Visual heavy-lift | Frame batches, OCR, image-heavy /qa-video work, screenshot review, visual UI critique | Codebase changes, multi-file refactors, long human-fluid sessions |
| Claude | Orchestration | Coordination, user interaction, decisions, synthesis, BrainLayer queries, monitoring, long human-fluid sessions | SQL queries, bulk file reads, code implementation, bulk image reads |

The boundary is hard. A Claude agent that runs sqlite3 or writes implementation code is violating this matrix. A Cursor agent that makes code changes is violating it. The only exception is trivial (<5 line) edits where spawning a worker costs more context than the edit itself.

LEAD TOPOLOGY — delegate to Codex-xhigh + own your monitor loop (2026-06-05, gen-10 weave #26)

"that's why we fucking use Codex on xhigh" / "Leads should be looping and monitoring just like you." — Etan, gen-10 window (verbatim, red-team verified ✅RT)

Domain LEADs (brainlayerClaude, voicelayerClaude, phx-LEAD, skillCreatorClaude, …) are orchestrators one tier down from orc. The same routing matrix applies to them:

LEADs delegate implementation to Codex on xhigh reasoning. A LEAD writing implementation code itself while its Codex worker idles is the same AP1/AP4 violation orc commits — one tier down.
EVERY dispatching agent owns its OWN monitor loop. orc's fleet monitor does NOT absolve a LEAD. If you dispatched a worker, YOU run the /loop or cron watching it. Delegate ≠ fire-and-forget. Evidence: gen-9 Angle-A codex stalled silently for a full round because the LEAD assumed orc was watching (collab:347 standing correction, re-violated by gen-11 → this rule).
Leads loop only on their workers — not on orc, not on the user, not on sibling leads. One loop per dispatched worker, deleted when the worker completes (orc REF9).
Inter-agent comms go through metacommlayer collabs (dispatch_to_agent / inbox_check + collab files), not ad-hoc cmux text sends.

CAPABILITY STRENGTHS (added 2026-04-30 per user statement)

User stated this matrix 2026-04-30 ~13:55 IDT (brainbar- chunk, importance 9). It's the user's stated routing preference, with the user's stated reasoning attached. Treat it as the user's policy, not as an independently-verified universal truth — when in doubt, it's still policy.

| Strength | Agent | Reasoning the user gave |
|----------|-------|-------------------------|
| Visuals | Gemini (CLI YOLO + skip-permissions) | Google owns Google Photos → training-data advantage on visual content. (User's reasoning, not a measured benchmark.) |
| Indexing | Cursor | Cursor is purpose-built for codebase indexing + read-only fact-gathering. (Matches R28: Cursor=gather.) |
| Coding | Codex | OpenAI has invested heavily in code-specialized models recently: gpt-5.5 (announced 2026-04-23 in Codex CLI) is "recommended for implementation, refactors, debugging, testing, validation." See brainbar-65e7688d-ce3. |
| Big context + human-fluid | Claude (1M Opus 4.7) | 1M-token Opus context + best-in-class instruction following for fluid human collaboration. (Matches R28: Claude=orchestrate.) |

Important caveat: the "visuals → Gemini" call is a routing PREFERENCE based on a training-data hypothesis, not a benchmark we ran. Same for "coding → Codex" — based on OpenAI's GPT-5.5 announcement language, not on internal eval numbers. If a future eval contradicts either, this section gets revised. Until then: route per the user's matrix.

When the user has said to pick Gemini

The user has explicitly named Gemini for these triggers (this is policy, not Claude's preference):

User pastes a video URL + says "extract" / "analyze" / "process this video"
Frame-by-frame OCR / vision read across many frames (/qa-video)
Visual UI critique / screenshot review when there are multiple screenshots
Anything where the natural plan is "spawn claude to read 30 frames" — switch to gemini and save Claude's 1M context for orchestration

/qa-video flows: Gemini does the visual heavy-lift; Claude wraps up (brain_digest, brain_store, ledger update, Drive archive).

Tool-surface evolution: Cursor SDK Custom Agents (2026-04 announcement)

Cursor SDK Custom Agents was announced only a few days before this note was written. Composer 2 is 50% off through 2026-05-06. The SDK enables programmatic agent invocation from CI/CD scripts. Recorded for /whats-new follow-up.

Cursor's ROLE is unchanged — still indexing / read-only audits per R28. Don't migrate Cursor into "implementation" because of the SDK.
The INVOCATION path may shift over time — today we spawn cursor IDE in a cmux pane; the SDK opens the door to scripted/headless audit runs. Useful for the dispatch-of-N-cursor-prompts workflow that's now common.
Don't refactor anything yet. This is a marker for future audits + /whats-new reviews to catch the migration moment.
Source: brainbar-33dfc260-4f6 (per orcClaude 2026-04-30).

Cross-references

/qa-video owns the Gemini-for-visuals note for video QA, frame OCR, and gems extraction.
/whats-new should re-check Cursor SDK / Composer 2 status weekly until 2026-05-06 expiry.

DECISION TREE

When you have a task to assign, walk this tree:

Is it a READ-ONLY operation? (query, scan, search, audit, lookup)
├── YES → CURSOR
│   Examples: SQL queries, grep patterns, file listing, codebase audit,
│   "what does this function do?", "find all usages of X"
│
└── NO → Does it change code or files?
    ├── YES → CODEX
    │   Examples: bug fix, refactor, new feature, test writing,
    │   "implement X", "fix the bug in Y", "add tests for Z"
    │
    └── NO → Is it coordination, synthesis, or decision-making?
        ├── YES → CLAUDE (you)
        │   Examples: plan review, collab kickoff, agent monitoring,
        │   BrainLayer queries, user interaction, research routing
        │
        └── UNCLEAR → Default to CURSOR for the data-gathering phase,
            then CODEX for any resulting implementation.
            Split into 2 tasks if needed.

Split rule: If a task has BOTH a gathering phase and an implementation phase, split it into two tasks. Cursor gathers, writes findings to docs.local/. Codex reads findings and implements. Claude reviews.
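The decision tree can be sketched as a shell `case` statement. This is only an illustration: the function name `route_task` and the task-kind keywords are assumptions chosen to mirror the examples in the tree, not part of any launcher.

```shell
# Hypothetical helper: prints the tool the decision tree assigns to a task kind.
# The keywords are illustrative labels, not an official vocabulary.
route_task() {
  case "$1" in
    query|scan|search|audit|lookup)   echo "Cursor" ;;  # read-only operation
    fix|refactor|feature|test)        echo "Codex" ;;   # changes code or files
    plan|monitor|synthesis|decision)  echo "Claude" ;;  # coordination or decisions
    *)                                echo "Cursor, then Codex" ;;  # unclear: split in two
  esac
}
```

For example, `route_task audit` prints `Cursor`, while an unrecognized kind falls through to the split default (gather first, then implement).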

Fan-out rule (parallel units → /cursor-multitask): when a task decomposes into N independent parallel units (classify N files, audit M things, tests+docs+examples, parallel verification passes), invoke /cursor-multitask to pick the engine — Cursor /multitask (in-editor GUI, ||| syntax), headless cursor-agent shell fan-out, the Claude Workflow tool, or the cmux fleet (visible multi-vendor workers → /cmux-agents). Evaled 2026-06-05: baseline 78.6% → with_skill 100% (cursor-multitask/evals/results/headless-ab-2026-06-05.json).

VERIFICATION GATES

Gate 1: Pre-Collab — Routing Declaration

Every collab file MUST include a routing section that declares which tool handles which task:

## Agent Routing
| Task | Tool | Surface | Status |
|------|------|---------|--------|
| Scan BrainLayer DB schema | Cursor | surface:XX | PENDING |
| Implement FTS5 fix | Codex | surface:YY | PENDING |
| Coordinate + review | Claude (orcClaude) | self | IN_PROGRESS |

If a collab lacks this section, add it before spawning agents.

Gate 2: Mid-Sprint — Worker Utilization Check

Every monitoring cycle (cron or manual), check:

Is the Claude agent's context >50%? If yes:
- Check if its Cursor/Codex workers have received tasks
- If workers are idle while Claude is burning context → VIOLATION
- Action: nudge the Claude agent to delegate remaining data work
Are Cursor/Codex surfaces alive? Run list_surfaces:
- If a worker surface is gone (crashed/closed) → respawn immediately
- Don't wait for the Claude to notice — orcClaude owns surface health
Is the Claude doing Cursor work? Check if Claude is running:
- sqlite3 or SQL queries → should be Cursor
- grep or find across many files → should be Cursor
- git log analysis across repos → should be Cursor
Does EACH dispatching LEAD have its own monitor loop on its workers? A lead that dispatched a worker and went idle without a /loop or cron watching it has committed a fire-and-forget violation. Flag the LEAD, not just the worker. orc's fleet monitor catches lead-busy/codex-idle inversions but does not replace the lead's own loop.

Gate 3: Post-Sprint — Utilization Audit

After a sprint completes, check:

Did each Claude agent actually use its assigned workers?
What % of data-gathering was done by Cursor vs Claude?
If Claude did >30% of the data gathering → flag for process improvement
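The 30% threshold is simple integer arithmetic. The sketch below assumes you have already counted data-gathering operations per agent; how an "operation" is counted is not defined in this skill, and both function names are hypothetical.

```shell
# Hypothetical helpers for the Gate 3 audit.
# $1 = data-gathering operations done by Cursor, $2 = done by Claude.
claude_gather_share() {
  _total=$(( $1 + $2 ))
  if [ "$_total" -eq 0 ]; then
    echo 0            # nothing was gathered, so nothing to flag
    return 0
  fi
  echo $(( $2 * 100 / _total ))   # integer percent, rounded down
}

# Succeeds (exit 0) when Claude did more than 30% of the gathering.
needs_process_flag() {
  [ "$(claude_gather_share "$1" "$2")" -gt 30 ]
}
```

With 14 Cursor operations and 6 Claude operations the share is exactly 30, which does not trip the flag because the rule says more than 30%.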

ANTI-PATTERNS (from real sessions)

AP1: Claude Does Everything Itself

"So no cursors were run, it seems. Am I correct?" — User, L4357 "Correct. brainClaude spawned one but never executed... skillCreatorClaude never spawned one at all." — orcClaude, L4357-4360

Pattern: Claude agent spawns a Cursor surface but never sends it work. Does all SQL/file scanning itself, burning 70%+ context on mechanical data extraction.

Fix: After spawning a Cursor worker, the FIRST action must be sending it a task. Verify delivery within 15 seconds (read_screen token count check).

AP2: Cursor Used for Code Changes

"I stopped Cursor because it seems like it sent it to do things I'm not looking for anyone to do things. This is research." — User, L4514-4517

Pattern: Cursor agent receives a task that includes implementation instructions, starts making code changes.

Fix: Cursor prompts must include: "READ-ONLY: Do NOT modify any files. Report findings to [output path]. Exit when done."
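One way to make the guard hard to forget is to build every Cursor prompt through a small wrapper. This is a sketch only: `readonly_prompt` is a hypothetical helper, and the example task and output path are placeholders.

```shell
# Hypothetical helper: prefixes a task with the mandatory READ-ONLY guard.
# $1 = task description, $2 = path where findings should be written.
readonly_prompt() {
  printf 'READ-ONLY: Do NOT modify any files. Report findings to %s. Exit when done. Task: %s' "$2" "$1"
}
```

A dispatch would then look like `brainlayerCursor -s "$(readonly_prompt "List all FTS5 tables" "docs.local/fts5.md")"`, which keeps the canonical launcher form while guaranteeing the guard text is present.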

AP3: Wrong Model on Worker

"brainlayer cursor scan is GPT-5.4. What the hell?" — User, L3822

Pattern: Worker launched with a specific expensive model when Auto/default would suffice.

Fix — Cursor: For data-gathering tasks, use default model (no --model flag). Cursor Pro default mode is unlimited.

Fix — Codex: Codex Pro 5x has a tiered model lineup. Match model to task complexity, not habit:

| Model | Speed | Use For | Budget Impact |
|-------|-------|---------|---------------|
| GPT-5.5 (default) | Standard | Multi-file architecture, >500 LOC changes, tasks needing a collab file, architecture-sensitive changes. Recommended for implementation/refactor/debug per OpenAI's 2026-04-23 Codex CLI announcement (replaces GPT-5.4 as the main-pool default). Reached via -s (default model, no -m flag). | Main Codex pool |
| GPT-5.3-Codex-Spark | Fastest | Optional explicit override for tasks describable in under 3 sentences, <500 LOC, single-file focus, code reviews, quick fixes, linting, tight follow-up patches. Reached only via -s -m gpt-5.3-codex-spark. | Separate Spark pool with separate billing from the main pool |
| GPT-5.5-mini / GPT-5.4-mini | Fast | Last-resort fallback only when the main pool / Spark are unavailable or quota-constrained and the task can tolerate reduced reasoning depth. Reached via -s -m gpt-5.5-mini. | Use sparingly; NOT routine Codex work |

Default: -s (one-true-flag). The default model is fine for most Codex work. If you specifically need Spark for a narrowly-scoped task, override with -s -m gpt-5.3-codex-spark. Do not normalize the mini tier for day-to-day Codex work either; R28 says Codex is for complex implementation, and mini is only the fallback when quota pressure forces a compromise.

Spark Routing Notes

GPT-5.3-Codex-Spark is billed from a quota pool separate from the main pool (currently GPT-5.5; previously GPT-5.4). If a sharply scoped task lands on Spark via -m gpt-5.3-codex-spark, the dispatch burns the Spark pool instead of the main pool, which is useful when main-pool quota is constrained. This is not the default routing policy, just an explicit model knob. The default is -s alone, with the default model.

Spark-class signals (when a sharply-scoped task might benefit from -m gpt-5.3-codex-spark):

The task is describable in under 3 sentences
The expected diff is under ~500 LOC
The work is single-file or narrowly scoped
The job is a review, quick fix, lint cleanup, or follow-up patch

Use the mini tier (gpt-5.5-mini / gpt-5.4-mini) only when:

The main pool and Spark are not viable because of quota pressure, AND
The task is still worth doing with weaker reasoning, AND
You explicitly accept the tradeoff instead of pretending mini is the normal Codex tier
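The measurable Spark-class signals can be checked mechanically. The sketch below assumes all three numeric signals must hold at once, which is one reading of the list rather than a stated rule; `is_spark_class` is a hypothetical name, and the task-type signal (review, quick fix, lint) still needs human judgment.

```shell
# Hypothetical helper: succeeds when a task looks Spark-class.
# $1 = sentences needed to describe the task
# $2 = expected diff size in lines of code
# $3 = number of files touched
is_spark_class() {
  [ "$1" -lt 3 ] && [ "$2" -lt 500 ] && [ "$3" -le 1 ]
}
```

A passing check only means the explicit `-m gpt-5.3-codex-spark` override is worth considering; plain `-s` stays the default dispatch form.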

AP4: Claude Implements When It Should Orchestrate

brainClaude started implementing code fixes when it should only orchestrate — L4525-4548

Pattern: A Claude agent assigned as coordinator starts writing code itself instead of dispatching to Codex.

Fix: Claude agents in a collab with assigned Codex workers must NEVER use Write/Edit tools for implementation. Exception: collab file updates, docs, research prompts.

AP5: Orc Burns Context on Content Creation

orcClaude spent hundreds of lines writing research prompts, project files, and context docs directly — L343-598, 876-895

Pattern: Orchestrator writes long documents (research prompts, project descriptions) instead of delegating to a subagent or worker.

Fix: If a document will be >50 lines, delegate writing to a subagent. orcClaude should outline (5-10 bullet points) and assign, not draft 100-line documents.

INTEGRATION WITH OTHER SKILLS

This skill is a building block used by higher-level skills:

| Skill | How It Uses Agent Routing |
|-------|--------------------------|
| /orc | Iron Rules R28+ reference this routing matrix |
| /cmux-agents | spawn-agent uses routing to pick CLI type |
| /large-plan | Phase assignment uses routing for tool selection |
| /pr-loop | Implementation phases route to Codex, review to Cursor |
| /collab | Collab template includes routing declaration section |

AP6: False Tool Limitations (April 6, 2026)

brainClaude: "Cursor Pro hit usage limit — can't use for audits this cycle" User: "CORRECTION: Cursor Pro does NOT have a usage limit"

Pattern: Agent assumes a tool has a usage cap and skips work. brainClaude skipped Cursor audits on PR #212-216 citing a nonexistent "Cursor Pro usage limit." The actual issue was Max Mode has a daily cap, but regular cursor agent mode is unlimited on Pro.

Fix: Cursor Pro limitations:

cursor agent "prompt" (default model) — UNLIMITED. Use for all audits.
cursor agent --model "<cursor-max-mode-model>" "prompt" (Max Mode) — has daily cap. Use Cursor's current Max Mode model ID; verify the picker via cursor agent --help or the Cursor changelog. (Historic example: gpt-5.2-codex-xhigh was the Max Mode pick at one point; do NOT hardcode — the user wants Cursor on Auto by default per R28.)
NEVER skip audits citing "usage limit." Switch to default model instead.

AP7: Trusting Codex's Text Response About Its Own Model (April 15, 2026)

Codex output: "I'm running as gpt-5.4..."
Actual session metadata: "model":"gpt-5.3-codex-spark"

Pattern: Agent asks Codex which model it is, or reads Codex's self-description, and treats that text as authoritative. Codex's text response consistently says "gpt-5.4" regardless of which model is actually running. This masks misrouted launches because the self-id stays the same even when the actual session model changes.

Fix: Never trust Codex's self-identification. The source of truth is the session JSONL, and you must read the "model" field directly:

# Today's sessions — model field is the source of truth
grep -h -E '"model":' ~/.codex/sessions/$(date +%Y/%m/%d)/*.jsonl | sort -u

# Specific date
grep -h -E '"model":' ~/.codex/sessions/2026/04/15/*.jsonl | sort -u

"model":"gpt-5.3-codex-spark" confirms Spark. Check immediately after the task starts — don't ask Codex.

AP9: Using Raw `codex` Instead of repoGolem Launchers (April 15, 2026)

19/19 sessions violated — 100% bypass rate.

Pattern: Agent spawns codex "prompt" directly instead of using {repo}Codex -s launcher.

Why it's wrong: No cd to repo dir, no iTerm profile, no model preset, no workspace isolation.

Fix: ALWAYS use {repo}Codex launcher (e.g., golemsCodex -s, brainlayerCodex -s). Use --raw escape hatch for edge cases only.

Evidence: batch-M6-codex.md — 0/19 used launchers.

AP10: Skill/Hook Authorship Bypassing skillCreator (2026-05-16, incident-2026-05-16)

Source: yashClaude + MainCodex session-mining 2026-05-16. brainbar-c95a8f3a-508 (audit), brainbar-9e70b920-079 (yashClaude mine), brainbar-fab97680-5ea (MainCodex mine), brainbar-ff137da8-e10 (routing-violation log).

Pattern: An orchestrator agent (yashClaude here) dispatches an implementation agent (MainCodex) with a mission that includes editing or creating files under ~/.claude/skills/** or ~/.claude/hooks/** — bypassing skillCreator (whose domain those paths are).

Concrete example from 2026-05-16: yashClaude at L3191 of its session sent MainCodex the full 4-layer Daemon Verification Gate mission, which included modifying ~/.claude/skills/golem-powers/pr-loop/SKILL.md + creating ~/.claude/hooks/daemon-gate-precheck.py + registering it in ~/.claude/settings.json. MainCodex shipped all four layers cleanly — but the work passed through ZERO skillCreator audit before merge. Quality was fine in this case (skillCreator post-hoc audit found SHIP-grade hygiene per brainbar-c95a8f3a-508) but the ROUTING was wrong.

Why it's wrong: Skills + hooks are skillCreator's domain. The skillCreator agent has the expertise for skill description-triggering, hook PreToolUse stdout protocol (the legacy sys.exit(0) empty-stdout pattern was a bug fixed at brainbar today), failure-mode catalog discipline, and /skill-creator audit standards. Sending these to Codex or any other agent risks shipping with a stale convention or missing audit step.

Fix — orchestrators MUST route-check before dispatch:

Before sending a mission to ANY worker, grep the mission text for path patterns: ~/.claude/skills/, ~/.claude/hooks/, ~/.claude/agents/, ~/.claude/CLAUDE.md, settings.json.
If ANY match: re-route the touching parts of the mission to skillCreator (spawn skillCreator subagent if needed), OR add an explicit skillCreator-audit step BEFORE the worker's PR merges.
If the orchestrator IS skillCreator, no re-route needed.
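The orchestrator-side route-check can be sketched with a single `grep`. The path patterns are the ones listed in the first step; the function name `touches_skillcreator_domain` is hypothetical, and matching on the bare string `settings.json` is deliberately broad.

```shell
# Hypothetical helper: succeeds when the mission text mentions a
# skillCreator-domain path and therefore needs a re-route or an audit step.
# $1 = full mission text about to be sent to a worker.
touches_skillcreator_domain() {
  printf '%s\n' "$1" |
    grep -Eq '~/\.claude/(skills|hooks|agents)/|~/\.claude/CLAUDE\.md|settings\.json'
}
```

A non-matching mission can be dispatched as usual; a match means stopping to involve skillCreator before the worker receives the prompt.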

Fix — workers MUST route-check before patching:

When a worker receives a mission, before its first Edit/Write to a ~/.claude/skills/** or ~/.claude/hooks/** path, brain_search("agent-routing skillCreator domain") to confirm.
If skillCreator is NOT already in the loop, send the orchestrator a route-check signal: "This task touches skillCreator-domain files. Re-route or add skillCreator audit?"
Pause the patch until orchestrator confirms.

Evidence: Two acknowledgements landed only POST-incident — MainCodex's retirement brain_store ("future changes under ~/.claude/skills/** and ~/.claude/hooks/** should route through skillCreator ownership") and yashClaude's handoff note. Catching it mid-flight would have prevented the routing violation (output quality was fine, but the principle matters for next time).

Test for compliance: When you (the orchestrator OR the worker) are about to Edit a file under ~/.claude/skills/** or ~/.claude/hooks/**, did skillCreator review the change first? If no → STOP. Route through skillCreator.

AP11: Verbose Launcher Invocation Instead of `{repo}{Tool} -s` (2026-05-21, severity-10 user mandate)

User, 2026-05-21 ~08:15 IDT (severity 10, 3+ corrections tonight): "For fuck's sakes, no 1 knows how to use cmux here and it's annoying. It's really annoying. Every time we go over this, you don't need to use the clawed code, no flicker. I don't know why the fuck that comes in every time. You don't need to CD. You don't need MCP connection non-blocking. You just do BrainLayer cursor -s."

User, earlier same night: "Why the fuck do you need BrainLayer codex fast? It should have been BrainLayer codex -s in order to make it YOLO, but you don't need a fucking fast."

Pattern: Agents (orcClaude, brainlayerClaude, brainlayerCodex itself, etc.) dispatch visible cmux pane workers using verbose explicit forms instead of the canonical {repo}{Tool} -s launcher. Tonight (2026-05-21) the documented violations included:

| Wrong dispatch (observed 2026-05-21) | Correct form |
|--|--|
| cd ~/Gits/brainlayer && MCP_CONNECTION_NONBLOCKING=1 CLAUDE_CODE_NO_FLICKER=1 cursor agent "prompt" | brainlayerCursor -s "prompt" |
| cd ~/Gits/X && cursor agent "prompt" | {X}Cursor -s "prompt" |
| source ~/.zshrc && cd ~/Gits/X && claude -s "prompt" | {X}Claude -s "prompt" |
| brainlayerCodex --fast "prompt" | brainlayerCodex -s "prompt" |
| cursor agent "prompt" (raw cursor) | {repo}Cursor -s "prompt" |
| codex --full-auto "prompt" (raw codex) | {repo}Codex -s "prompt" |
| claude --dangerously-skip-permissions "prompt" (raw claude) | {repo}Claude -s "prompt" |

Why it's wrong:

cd ~/Gits/X is redundant — _golem_launch_* already cds (verified in golem-dispatch.zsh:313,407,523,579,643).
MCP_CONNECTION_NONBLOCKING=1 CLAUDE_CODE_NO_FLICKER=1 is redundant — _golem_setup_env unconditionally exports both (verified in golem-dispatch.zsh:109-110).
source ~/.zshrc && is redundant — the calling shell already sourced it (launchers are functions, not subprocesses).
--fast is not a launcher policy. For visible cmux pane workers, use -s directly.
Raw cursor/codex/claude skips iTerm profile, tab title, 1Password secrets, agent-context injection, and registry-driven MCP wiring. Workers spawned with raw CLIs are measurably worse — they hit sandbox prompts, missing MCPs, wrong models. Tonight: 5+ sandbox-approval prompts that wouldn't have happened with -s.

Fix — visible cmux pane workers:

{repo}{Tool} -s "prompt"

That's the entire form. Anything that adds cd, MCP_CONNECTION_NONBLOCKING=1, CLAUDE_CODE_NO_FLICKER=1, source, --fast, or a raw CLI name is a violation.

Internal subagents are different (canonical model-param rule — cite this from /cmux-agents, do not restate). A Task-tool/spawned internal subagent is ephemeral and runs inside another agent session; it is not a top-level visible cmux pane. Spark is allowed there when the orchestrating agent intentionally asks for it, but only through explicit model selection:

-m gpt-5.3-codex-spark

Do not use --fast as an alias in either layer. The distinction is:

cmux pane worker: top-level visible terminal worker → {repo}{Tool} -s "prompt" only. NEVER pass model to cmux spawn_agent.
internal subagent: ephemeral child of another agent session → Spark allowed via explicit -m gpt-5.3-codex-spark or equivalent model field on the in-session spawn — not a license to pass model on cmux pane spawn_agent.

Fix — self-check before sending any dispatch command:

Does the command start with {lowercase-repo}{Tool}? (e.g., brainlayerCursor, golemsCodex, orcClaude) — if no, STOP.
Is the next argument -s (and nothing more before the prompt)? — if no, STOP unless you have a specific override reason.
Is the rest just the prompt string? — if yes, ship it. If no, you're adding ceremony the launcher already handles.
Is this an internal subagent rather than a cmux pane? If yes, explicit Spark model selection is allowed; --fast is still not allowed.

Evidence: 5+ violations on 2026-05-21 alone (observed in Etan's terminal, captured at incident time). The pattern was so common Etan opened this very task as a severity-10 user mandate: "make a skillCreator that will fix this fucking issue."

Test for compliance: Before every dispatch command, did the command match the regex ^[a-z][a-z0-9]*(Claude|Codex|Cursor|Gemini|Kiro) -s followed by the prompt? If not, fix it.
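The compliance regex can be applied mechanically before sending a command. This is a minimal sketch using `grep -E`; the function name `is_canonical_dispatch` is hypothetical, and it only checks the command prefix, not the prompt that follows.

```shell
# Hypothetical helper: succeeds when a dispatch command starts with the
# canonical {repo}{Tool} -s form described in AP11.
# $1 = the full command line about to be sent.
is_canonical_dispatch() {
  printf '%s\n' "$1" |
    grep -Eq '^[a-z][a-z0-9]*(Claude|Codex|Cursor|Gemini|Kiro) -s '
}
```

For example, `brainlayerCursor -s "scan schema"` passes, while `brainlayerCodex --fast "x"` and `cd ~/Gits/brainlayer && cursor agent "x"` are rejected.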

SPAWN INFRASTRUCTURE DEFAULTS (added 2026-04-29)

User-driven correction after recurring infra friction (brainbar-1251a844-5f8). The "isolation by default" pattern (worktree + sandbox per worker) keeps biting us — losing 30+ min per session to TCC resets, MCP gaps, sandbox build blocks, launcher cd-conflicts, multi-app-bundle TCC scoping.

User quote: "we always regress due to missconfig, using worktrees and sandboxes slowed the agents because of mcps and local builds...."

Rule 1: Worker sandbox = `--dangerously-bypass-approvals-and-sandbox` (Codex), `--dangerously-skip-permissions` (Claude). NEVER `--sandbox workspace-write` for trusted local workers.

The repoGolem launcher's -s flag already does the right thing (--dangerously-bypass-approvals-and-sandbox for Codex). The trap is bypassing the launcher during recovery flows — raw codex resume --last defaults to a restrictive sandbox.

Worktree resume recipe (when launcher won't work because it cd's away from worktree):

cd ~/Gits/<repo>-<branch> && codex resume --last --dangerously-bypass-approvals-and-sandbox

NOT codex resume --last --sandbox workspace-write — that gives workspace-write mode which blocks xcodebuild + swift build + most Swift toolchain operations that write outside CWD.

Rule 2: Worktrees are OPT-IN, not default. Use feature branch on main repo for sequential specialist work.

| Scenario | Use |
|---|---|
| Sequential specialist (one at a time, like W13 → W22 → W23) | git checkout -b fix/foo in main repo. No worktree, no sandbox. One canonical app. |
| Truly parallel work, NO file overlap (Round 1-style sprint) | Native git worktree is fine. Verify MCP/config paths explicitly. Still no restrictive sandbox. |
| Parallel work WITH file overlap | Force serialize. Don't try to parallelize. |

Worktrees are now a native git operation, not a dedicated golem skill. Ask if isolation is actually needed before creating one, then verify .mcp.json and any local symlinks yourself.

Rule 3: Don't use `--sandbox` flag at all for our local trusted workers.

If launcher -s does the right thing → use launcher. If you must bypass launcher → use the explicit bypass flag (--dangerously-bypass-approvals-and-sandbox). Restrictive sandboxes (read-only / workspace-write) are for cloud-agents-on-untrusted-code, not for our local Codex workers building Swift apps.

Cross-references

Native git worktree: create only for real isolation needs, then verify MCP/config paths
/repogolem skill: launcher flag reference (-s mappings already correct)
/orc skill: pre-relay verification rule (Rule added 2026-04-29 to stop relaying stale evidence from workers)

CODEX BUDGET MANAGEMENT (added 2026-04-12)

Codex Pro 5x provides 200-1000 GPT-5.4-equivalent local messages per 5-hour rolling window (this displayed range already includes the temporary 2x boost active until May 31, 2026). There's also a weekly cap. Burning the entire budget in one sprint leaves nothing for the rest of the window.

Budget Rules

5-hour rolling window awareness. Before starting a multi-agent sprint with Codex workers, estimate total main-pool GPT-5.4 usage. If the plan will burn >50% of the 5h window, split the sprint or offload the short tasks to Spark.
Weekly cap exists. Don't assume unlimited. If you're dispatching Codex workers daily, track rough consumption. When approaching the weekly cap, reserve GPT-5.4 for the hard work and let Spark absorb the short work.
Spark is an explicit override, not the default dispatch form. Spark has separate billing from GPT-5.4. Use it only when you intentionally add -m gpt-5.3-codex-spark; otherwise dispatch Codex with -s alone.
GPT-5.4-mini is the last-resort fallback. It is not the routine tier for Codex work. Only use it when quota pressure forces it and the task can survive weaker reasoning.
Do not treat Spark as a cheaper GPT-5.4-equivalent bucket. Its value is that it burns from a separate fast-model pool, so offload short tasks there instead of spending main GPT-5.4 quota.
2x boost expires May 31, 2026. After that, the displayed 200-1000 range drops to 100-500. Plan accordingly — habits built at boosted rates will break at 1x.

Sprint Budgeting Template

Before dispatching Codex workers in a collab, add this to the routing section:

### Codex Budget Estimate
| Task | Model | Est. Messages | GPT-5.4-eq | Rationale |
|------|-------|---------------|------------|-----------|
| Core rewrite | GPT-5.4 | ~15 | 15 | Multi-file, architecture-sensitive |
| Review follow-up patch | Spark | ~3 | Separate pool | Single-file, under 3-sentence prompt |
| Last-resort cleanup | GPT-5.4-mini | ~6 | Fallback only | Use only if main pool is constrained |
| **Total main pool** | | **~21 raw** | **Main + fallback only** | |
| **Total Spark pool** | | **~3 raw** | Separate pool | |

Track Spark separately because it does not consume the main GPT-5.4 pool. Prefer Spark for sharply scoped work before falling back to mini.
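The 50% rule from the budget rules reduces to one comparison. The sketch below assumes you supply your own estimate of the 5-hour window budget, since the displayed range is wide (200-1000); `exceeds_half_window` is a hypothetical name.

```shell
# Hypothetical helper: succeeds when a sprint would burn more than half
# of the 5-hour main-pool window.
# $1 = estimated main-pool messages for the sprint
# $2 = assumed window budget in messages (your own estimate)
exceeds_half_window() {
  [ $(( $1 * 2 )) -gt "$2" ]
}
```

Using the template's roughly 21 main-pool messages against the low end of the displayed range, `exceeds_half_window 21 200` fails, so the sprint fits; `exceeds_half_window 120 200` succeeds, which means split the sprint or move the short tasks to Spark.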

SELF-CHECK: Am I About to Violate R28?

Run this check before EVERY Write/Edit/Bash-with-code-changes:

PAUSE. Am I about to Write/Edit code?
├── Am I an orchestrator (orcClaude, or coordinating a collab)?
│   ├── YES → VIOLATION. Route to Codex/Cursor via cmux.
│   │   Exception: collab files, docs, research prompts (<5 lines trivial)
│   └── NO → Am I a domain agent (mehayomClaude, voiceClaude, etc.)?
│       ├── YES + no Codex worker assigned → OK (you ARE the implementer)
│       └── YES + Codex worker assigned → VIOLATION. Send to your Codex.
└── Is this >5 lines of implementation? → Consider Codex anyway.

From JSONL data (April 1-6, 2026): Orchestrator sessions averaged 80+ Write/Edit calls per session. The worst had 190. The R28 target is <30 for orchestrators.
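
The self-check tree above can be sketched as a shell function. The name `r28_check` and its three arguments are hypothetical. Applying the collab/docs/research-prompt exception to domain agents as well follows AP4 rather than the tree itself, and the ">5 lines" advisory branch is left out.

```shell
# Hypothetical encoding of the R28 self-check.
# $1 role:           "orchestrator" or "domain"
# $2 codex_assigned: "yes" or "no"
# $3 exempt:         "yes" if the edit is a collab file, doc, or research prompt
r28_check() {
  role=$1
  codex_assigned=$2
  exempt=$3
  if [ "$exempt" = "yes" ]; then
    echo "OK"
    return 0
  fi
  case "$role" in
    orchestrator)
      echo "VIOLATION" ;;          # route to Codex/Cursor via cmux
    domain)
      if [ "$codex_assigned" = "yes" ]; then
        echo "VIOLATION"           # send the work to your Codex
      else
        echo "OK"                  # you are the implementer
      fi ;;
    *)
      echo "UNKNOWN" ;;
  esac
}
```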

QUICK REFERENCE — Copy-Paste for Collab Templates

## Agent Routing (MANDATORY)

| Task | Tool | Rationale |
|------|------|-----------|
| [data gathering task] | Cursor (read-only) | Scanning, no changes needed |
| [implementation task] | Codex | Code changes, needs reasoning |
| [coordination task] | Claude | Orchestration, user interaction |

**Rules:**
- Cursor prompts MUST include "READ-ONLY: Do NOT modify any files"
- Codex gets findings from Cursor's output, not raw data
- **DISPATCH FORM IS `{repo}{Tool} -s "prompt"`** — no `cd`, no `MCP_CONNECTION_NONBLOCKING=1`, no `CLAUDE_CODE_NO_FLICKER=1`, no `source ~/.zshrc &&`, no `--fast`, no raw `cursor`/`codex`/`claude`. The launcher handles everything. See AP11.
- Claude reviews Codex's PR, doesn't implement itself
- If a worker crashes, respawn within 60 seconds


🔥 THE ONE TRUE FLAG IS `-s` (read before dispatching any worker)

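-s is the ONLY flag you need to dispatch a YOLO worker. The repoGolem launcher maps it to the correct CLI-specific bypass, for example --dangerously-bypass-approvals-and-sandbox for Codex and --dangerously-skip-permissions for Claude.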

The one-liner to dispatch ANY worker:

brainlayerCursor -s "one-sentence task prompt here"     # gather / read-only
brainlayerCodex  -s "one-sentence task prompt here"     # implement

Pre-dispatch checklist (run before spawning any worker):

[ ] Am I about to write cd ~/Gits/…? → STOP. Use {repo}{Tool} -s instead.
[ ] Am I about to write MCP_CONNECTION_NONBLOCKING=1 or CLAUDE_CODE_NO_FLICKER=1? → STOP. Launcher already exports both.
[ ] Am I about to call cursor agent or codex or claude directly? → STOP. Use the {repo}{Tool} launcher.
[ ] Am I about to use --fast? → STOP. Use -s (one-true-flag).
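
The checklist can be approximated with a small lint function. This is a sketch, not part of the launcher: `lint_dispatch` is a hypothetical name, and it uses plain substring matching, so a prompt that merely quotes one of these strings will also trip it.

```shell
# Hypothetical lint: print the first checklist rule a dispatch command breaks,
# or "ok" when none of the four patterns appear.
lint_dispatch() {
  case "$1" in
    *"cd ~/Gits/"*)
      echo "STOP: drop cd, use {repo}{Tool} -s" ;;
    *MCP_CONNECTION_NONBLOCKING=1*|*CLAUDE_CODE_NO_FLICKER=1*)
      echo "STOP: launcher already exports both" ;;
    "cursor "*|"codex "*|"claude "*)
      echo "STOP: use the {repo}{Tool} launcher" ;;
    *--fast*)
      echo "STOP: use -s" ;;
    *)
      echo "ok" ;;
  esac
}
```

For example, `lint_dispatch 'brainlayerCodex -s "fix the bug"'` prints ok, while a command that starts with a raw `codex ` is stopped by the third rule.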

# Source of truth: session metadata's model field
grep -h -E '"model":' ~/.codex/sessions/$(date +%Y/%m/%d)/*.jsonl | sort -u

# Per-session sanity check
for f in ~/.codex/sessions/$(date +%Y/%m/%d)/*.jsonl; do
  echo "=== $(basename "$f") ==="
  grep -o '"model":"[^"]*"' "$f" | head -1
done

"model":"gpt-5.3-codex-spark" → Spark dispatch succeeded ✅
"model":"gpt-5.5" (or "gpt-5.5-codex") → current main pool ✅
"model":"gpt-5.4" → previous main pool (pre-2026-04-23 sessions)
"model":"gpt-5.4-mini" / "gpt-5.5-mini" → fallback tier

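The mapping above can be sketched as a lookup, so a monitor loop can label each session without reading it by eye. Both function names are hypothetical; the model strings and the grep pattern are the ones listed above.

```shell
# Hypothetical helper: read JSONL on stdin and print the first "model" value.
first_model() {
  grep -o '"model":"[^"]*"' | head -1 | cut -d'"' -f4
}

# Hypothetical helper: map a model value to the pool it draws from.
model_pool() {
  case "$1" in
    gpt-5.3-codex-spark)       echo "spark" ;;
    gpt-5.5|gpt-5.5-codex)     echo "main (current)" ;;
    gpt-5.4)                   echo "main (previous)" ;;
    gpt-5.4-mini|gpt-5.5-mini) echo "fallback" ;;
    *)                         echo "unknown" ;;
  esac
}
```

A typical use would be `model_pool "$(first_model < "$f")"` inside the per-session loop shown earlier.
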
THE ROUTING MATRIX (non-negotiable)

LEAD TOPOLOGY — delegate to Codex-xhigh + own your monitor loop (2026-06-05, gen-10 weave #26)

"that's why we fucking use Codex on xhigh" / "Leads should be looping and monitoring just like you." — Etan, gen-10 window (verbatim, red-team verified ✅RT)

Domain LEADs (brainlayerClaude, voicelayerClaude, phx-LEAD, skillCreatorClaude, …) are orchestrators one tier down from orc. The same routing matrix applies to them:

LEADs delegate implementation to Codex on xhigh reasoning. A LEAD writing implementation code itself while its Codex worker idles is the same AP1/AP4 violation orc commits — one tier down.
EVERY dispatching agent owns its OWN monitor loop. orc's fleet monitor does NOT absolve a LEAD. If you dispatched a worker, YOU run the /loop or cron watching it. Delegate ≠ fire-and-forget. Evidence: gen-9 Angle-A codex stalled silently for a full round because the LEAD assumed orc was watching (collab:347 standing correction, re-violated by gen-11 → this rule).
Leads loop only on their workers — not on orc, not on the user, not on sibling leads. One loop per dispatched worker, deleted when the worker completes (orc REF9).
Inter-agent comms go through metacommlayer collabs (dispatch_to_agent / inbox_check + collab files), not ad-hoc cmux text sends.

CAPABILITY STRENGTHS (added 2026-04-30 per user statement)

When the user has said to pick Gemini

The user has explicitly named Gemini for these triggers (this is policy, not Claude's preference):

User pastes a video URL + says "extract" / "analyze" / "process this video"
Frame-by-frame OCR / vision read across many frames (/qa-video)
Visual UI critique / screenshot review when there are multiple screenshots
Anything where the natural plan is "spawn claude to read 30 frames" — switch to gemini and save Claude's 1M context for orchestration

/qa-video flows: Gemini does the visual heavy-lift; Claude wraps up (brain_digest, brain_store, ledger update, Drive archive).

Tool-surface evolution: Cursor SDK Custom Agents (2026-04 announcement)

Cursor SDK Custom Agents was announced in late April 2026, within a few days of this note. Composer 2 is 50% off through 2026-05-06. The SDK enables programmatic agent invocation from CI/CD scripts. Recorded for /whats-new follow-up.

Cursor's ROLE is unchanged — still indexing / read-only audits per R28. Don't migrate Cursor into "implementation" because of the SDK.
The INVOCATION path may shift over time — today we spawn cursor IDE in a cmux pane; the SDK opens the door to scripted/headless audit runs. Useful for the dispatch-of-N-cursor-prompts workflow that's now common.
Don't refactor anything yet. This is a marker for future audits + /whats-new reviews to catch the migration moment.
Source: brainbar-33dfc260-4f6 (per orcClaude 2026-04-30).

Cross-references

/qa-video owns the Gemini-for-visuals note for video QA, frame OCR, and gems extraction.
/whats-new should re-check Cursor SDK / Composer 2 status weekly until 2026-05-06 expiry.

DECISION TREE

When you have a task to assign, walk this tree:

Is it a READ-ONLY operation? (query, scan, search, audit, lookup)
├── YES → CURSOR
│   Examples: SQL queries, grep patterns, file listing, codebase audit,
│   "what does this function do?", "find all usages of X"
│
└── NO → Does it change code or files?
    ├── YES → CODEX
    │   Examples: bug fix, refactor, new feature, test writing,
    │   "implement X", "fix the bug in Y", "add tests for Z"
    │
    └── NO → Is it coordination, synthesis, or decision-making?
        ├── YES → CLAUDE (you)
        │   Examples: plan review, collab kickoff, agent monitoring,
        │   BrainLayer queries, user interaction, research routing
        │
        └── UNCLEAR → Default to CURSOR for the data-gathering phase,
            then CODEX for any resulting implementation.
            Split into 2 tasks if needed.
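
The tree above reduces to a small lookup. In this sketch the task kinds (`read`, `change`, `coordinate`) are labels you assign yourself; they and the name `route_task` are hypothetical and not part of any CLI.

```shell
# Hypothetical encoding of the decision tree.
# $1 is a coarse task kind chosen by the dispatcher.
route_task() {
  case "$1" in
    read)       echo "CURSOR" ;;             # query, scan, search, audit, lookup
    change)     echo "CODEX" ;;              # bug fix, refactor, feature, tests
    coordinate) echo "CLAUDE" ;;             # planning, monitoring, user interaction
    *)          echo "CURSOR then CODEX" ;;  # unclear: gather first, then implement
  esac
}
```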

VERIFICATION GATES

Gate 1: Pre-Collab — Routing Declaration

Every collab file MUST include a routing section that declares which tool handles which task:

## Agent Routing
| Task | Tool | Surface | Status |
|------|------|---------|--------|
| Scan BrainLayer DB schema | Cursor | surface:XX | PENDING |
| Implement FTS5 fix | Codex | surface:YY | PENDING |
| Coordinate + review | Claude (orcClaude) | self | IN_PROGRESS |

If a collab lacks this section, add it before spawning agents.
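
A minimal sketch of an automated Gate 1 check, assuming collab files are markdown and use the `## Agent Routing` heading shown above. The function name `has_routing_section` is hypothetical.

```shell
# Hypothetical Gate 1 check: succeed only if the file has a routing heading
# and at least one table row whose Tool column names Cursor, Codex, or Claude.
has_routing_section() {
  file=$1
  grep -q '^## Agent Routing' "$file" &&
    grep -Eq '^[|].*[|] *(Cursor|Codex|Claude)' "$file"
}
```

Usage: `has_routing_section path/to/collab.md || echo "add a routing section before spawning agents"`.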

Gate 2: Mid-Sprint — Worker Utilization Check

Every monitoring cycle (cron or manual), check:

1. Is the Claude agent's context >50%? If yes:
   - Check whether its Cursor/Codex workers have received tasks
   - If workers are idle while Claude is burning context → VIOLATION
   - Action: nudge the Claude agent to delegate remaining data work
2. Are Cursor/Codex surfaces alive? Run list_surfaces:
   - If a worker surface is gone (crashed/closed) → respawn immediately
   - Don't wait for the Claude agent to notice; orcClaude owns surface health
3. Is the Claude agent doing Cursor work? Check whether it is running:
   - sqlite3 or SQL queries → should be Cursor
   - grep or find across many files → should be Cursor
   - git log analysis across repos → should be Cursor
4. Does EACH dispatching LEAD have its own monitor loop on its workers? A LEAD that dispatched a worker and then went idle without a /loop or cron watching it has committed a fire-and-forget violation. Flag the LEAD, not just the worker. orc's fleet monitor catches lead-busy/codex-idle inversions but does not replace the LEAD's own loop.

Gate 3: Post-Sprint — Utilization Audit

After a sprint completes, check:

Did each Claude agent actually use its assigned workers?
What % of data-gathering was done by Cursor vs Claude?
If Claude did >30% of the data gathering → flag for process improvement
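
The 30% threshold can be checked with integer arithmetic. This sketch assumes you have already counted data-gathering tasks per tool; how they are counted is up to the auditor, and the name `gathering_audit` is hypothetical.

```shell
# Hypothetical Gate 3 helper.
# $1 = data-gathering tasks done by Claude, $2 = done by Cursor
gathering_audit() {
  claude=$1
  cursor=$2
  total=$((claude + cursor))
  if [ "$total" -eq 0 ]; then
    echo "NO_DATA"
    return 0
  fi
  # claude/total > 30%  <=>  claude * 100 > total * 30 (no rounding needed)
  if [ $((claude * 100)) -gt $((total * 30)) ]; then
    echo "FLAG"
  else
    echo "OK"
  fi
}
```

With 4 Claude tasks and 6 Cursor tasks the share is 40%, so `gathering_audit 4 6` prints FLAG; `gathering_audit 3 7` sits exactly at 30% and prints OK.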

ANTI-PATTERNS (from real sessions)

AP1: Claude Does Everything Itself

"So no cursors were run, it seems. Am I correct?" — User, L4357
"Correct. brainClaude spawned one but never executed... skillCreatorClaude never spawned one at all." — orcClaude, L4357-4360

Pattern: Claude agent spawns a Cursor surface but never sends it work. Does all SQL/file scanning itself, burning 70%+ context on mechanical data extraction.

Fix: After spawning a Cursor worker, the FIRST action must be sending it a task. Verify delivery within 15 seconds (read_screen token count check).

AP2: Cursor Used for Code Changes

"I stopped Cursor because it seems like it sent it to do things I'm not looking for anyone to do things. This is research." — User, L4514-4517

Pattern: Cursor agent receives a task that includes implementation instructions, starts making code changes.

Fix: Cursor prompts must include: "READ-ONLY: Do NOT modify any files. Report findings to [output path]. Exit when done."

AP3: Wrong Model on Worker

"brainlayer cursor scan is GPT-5.4. What the hell?" — User, L3822

Pattern: Worker launched with a specific expensive model when Auto/default would suffice.

Fix — Cursor: For data-gathering tasks, use default model (no --model flag). Cursor Pro default mode is unlimited.

Fix — Codex: Codex Pro 5x has a tiered model lineup. Match model to task complexity, not habit:

Spark Routing Notes

Spark-class signals (when a sharply-scoped task might benefit from -m gpt-5.3-codex-spark):

The task is describable in under 3 sentences
The expected diff is under ~500 LOC
The work is single-file or narrowly scoped
The job is a review, quick fix, lint cleanup, or follow-up patch
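
One way to apply these signals consistently is to count them. The sketch below is illustrative only: `spark_signals` and its arguments are hypothetical, a file count of 1 stands in for "single-file or narrowly scoped", and how many signals justify Spark is left to the dispatcher.

```shell
# Hypothetical helper: count how many of the four Spark-class signals hold.
# $1 = sentences in the task description
# $2 = expected diff size in LOC
# $3 = number of files touched
# $4 = job kind: review | quick-fix | lint | follow-up | anything else
spark_signals() {
  sentences=$1
  est_loc=$2
  files=$3
  kind=$4
  n=0
  if [ "$sentences" -lt 3 ]; then n=$((n + 1)); fi
  if [ "$est_loc" -lt 500 ]; then n=$((n + 1)); fi
  if [ "$files" -le 1 ]; then n=$((n + 1)); fi
  case "$kind" in
    review|quick-fix|lint|follow-up) n=$((n + 1)) ;;
  esac
  echo "$n"
}
```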

Use the mini tier (gpt-5.5-mini / gpt-5.4-mini) only when:

The main pool and Spark are not viable because of quota pressure, AND
The task is still worth doing with weaker reasoning, AND
You explicitly accept the tradeoff instead of pretending mini is the normal Codex tier

AP4: Claude Implements When It Should Orchestrate

brainClaude started implementing code fixes when it should only orchestrate — L4525-4548

Pattern: A Claude agent assigned as coordinator starts writing code itself instead of dispatching to Codex.

Fix: Claude agents in a collab with assigned Codex workers must NEVER use Write/Edit tools for implementation. Exception: collab file updates, docs, research prompts.

AP5: Orc Burns Context on Content Creation

orcClaude spent hundreds of lines writing research prompts, project files, and context docs directly — L343-598, 876-895

Pattern: Orchestrator writes long documents (research prompts, project descriptions) instead of delegating to a subagent or worker.

Fix: If a document will be >50 lines, delegate writing to a subagent. orcClaude should outline (5-10 bullet points) and assign, not draft 100-line documents.

INTEGRATION WITH OTHER SKILLS

This skill is a building block used by higher-level skills:

AP6: False Tool Limitations (April 6, 2026)

brainClaude: "Cursor Pro hit usage limit — can't use for audits this cycle" User: "CORRECTION: Cursor Pro does NOT have a usage limit"

Fix: know Cursor Pro's actual limits:

cursor agent "prompt" (default model) — UNLIMITED. Use for all audits.
cursor agent --model "<cursor-max-mode-model>" "prompt" (Max Mode) — has daily cap. Use Cursor's current Max Mode model ID; verify the picker via cursor agent --help or the Cursor changelog. (Historic example: gpt-5.2-codex-xhigh was the Max Mode pick at one point; do NOT hardcode — the user wants Cursor on Auto by default per R28.)
NEVER skip audits citing "usage limit." Switch to default model instead.

AP7: Trusting Codex's Text Response About Its Own Model (April 15, 2026)

Codex output: "I'm running as gpt-5.4..." Actual session metadata: "model":"gpt-5.3-codex-spark"

Fix: Never trust Codex's self-identification. The source of truth is the session JSONL, and you must read the "model" field directly:

# Today's sessions — model field is the source of truth
grep -h -E '"model":' ~/.codex/sessions/$(date +%Y/%m/%d)/*.jsonl | sort -u

# Specific date
grep -h -E '"model":' ~/.codex/sessions/2026/04/15/*.jsonl | sort -u

"model":"gpt-5.3-codex-spark" confirms Spark. Check immediately after the task starts — don't ask Codex.

AP9: Using Raw `codex` Instead of repoGolem Launchers (April 15, 2026)

19/19 sessions violated — 100% bypass rate.

Pattern: Agent spawns codex "prompt" directly instead of using {repo}Codex -s launcher.

Why it's wrong: No cd to repo dir, no iTerm profile, no model preset, no workspace isolation.

Fix: ALWAYS use {repo}Codex launcher (e.g., golemsCodex -s, brainlayerCodex -s). Use --raw escape hatch for edge cases only.

Evidence: batch-M6-codex.md — 0/19 used launchers.

AP10: Skill/Hook Authorship Bypassing skillCreator (2026-05-16, incident-2026-05-16)

Source: yashClaude + MainCodex session-mining 2026-05-16. brainbar-c95a8f3a-508 (audit), brainbar-9e70b920-079 (yashClaude mine), brainbar-fab97680-5ea (MainCodex mine), brainbar-ff137da8-e10 (routing-violation log).

Fix — orchestrators MUST route-check before dispatch:

Before sending a mission to ANY worker, grep the mission text for path patterns: ~/.claude/skills/, ~/.claude/hooks/, ~/.claude/agents/, ~/.claude/CLAUDE.md, settings.json.
If ANY match: re-route the touching parts of the mission to skillCreator (spawn skillCreator subagent if needed), OR add an explicit skillCreator-audit step BEFORE the worker's PR merges.
If the orchestrator IS skillCreator, no re-route needed.
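
The grep step above might look like the following sketch. The function name `touches_skillcreator_domain` is hypothetical; the path patterns are the ones listed in the first step.

```shell
# Hypothetical pre-dispatch check: exit 0 if the mission text mentions any
# skillCreator-domain path, so the caller can re-route or add an audit step.
touches_skillcreator_domain() {
  printf '%s\n' "$1" | grep -Eq \
    '~/\.claude/(skills|hooks|agents)/|~/\.claude/CLAUDE\.md|settings\.json'
}
```

Usage: `touches_skillcreator_domain "$mission" && echo "re-route to skillCreator or add an audit step"`.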

Fix — workers MUST route-check before patching:

When a worker receives a mission, before its first Edit/Write to a ~/.claude/skills/** or ~/.claude/hooks/** path, brain_search("agent-routing skillCreator domain") to confirm.
If skillCreator is NOT already in the loop, send the orchestrator a route-check signal: "This task touches skillCreator-domain files. Re-route or add skillCreator audit?"
Pause the patch until orchestrator confirms.

AP11: Verbose Launcher Invocation Instead of `{repo}{Tool} -s` (2026-05-21, severity-10 user mandate)

User, 2026-05-21 ~08:15 IDT (severity 10, 3+ corrections tonight): "For fuck's sakes, no one knows how to use cmux here and it's annoying. It's really annoying. Every time we go over this, you don't need to use the Claude Code no-flicker. I don't know why the fuck that comes in every time. You don't need to CD. You don't need MCP connection non-blocking. You just do BrainLayer cursor -s."

User, earlier same night: "Why the fuck do you need BrainLayer codex fast? It should have been BrainLayer codex -s in order to make it YOLO, but you don't need a fucking fast."

Why it's wrong:

cd ~/Gits/X is redundant — _golem_launch_* already cds (verified in golem-dispatch.zsh:313,407,523,579,643).
MCP_CONNECTION_NONBLOCKING=1 CLAUDE_CODE_NO_FLICKER=1 is redundant — _golem_setup_env unconditionally exports both (verified in golem-dispatch.zsh:109-110).
source ~/.zshrc && is redundant — the calling shell already sourced it (launchers are functions, not subprocesses).
--fast is not a launcher policy. For visible cmux pane workers, use -s directly.
Raw cursor/codex/claude skips iTerm profile, tab title, 1Password secrets, agent-context injection, and registry-driven MCP wiring. Workers spawned with raw CLIs are measurably worse — they hit sandbox prompts, missing MCPs, wrong models. Tonight: 5+ sandbox-approval prompts that wouldn't have happened with -s.

Fix — visible cmux pane workers:

{repo}{Tool} -s "prompt"

That's the entire form. Anything that adds cd, MCP_CONNECTION_NONBLOCKING=1, CLAUDE_CODE_NO_FLICKER=1, source, --fast, or a raw CLI name is a violation.

For an internal ephemeral subagent that genuinely needs Spark, request the model explicitly:

-m gpt-5.3-codex-spark

Do not use --fast as an alias in either layer. The distinction is:

cmux pane worker: top-level visible terminal worker → {repo}{Tool} -s "prompt" only. NEVER pass model to cmux spawn_agent.
internal subagent: ephemeral child of another agent session → Spark allowed via explicit -m gpt-5.3-codex-spark or equivalent model field on the in-session spawn — not a license to pass model on cmux pane spawn_agent.

Fix — self-check before sending any dispatch command:

Does the command start with {lowercase-repo}{Tool}? (e.g., brainlayerCursor, golemsCodex, orcClaude) — if no, STOP.
Is the next argument -s (and nothing more before the prompt)? — if no, STOP unless you have a specific override reason.
Is the rest just the prompt string? — if yes, ship it. If no, you're adding ceremony the launcher already handles.
Is this an internal subagent rather than a cmux pane? If yes, explicit Spark model selection is allowed; --fast is still not allowed.

Test for compliance: Before every dispatch command, did the command match the regex ^[a-z][a-z0-9]*(Claude|Codex|Cursor|Gemini|Kiro) -s followed by the prompt? If not, fix it.
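
The compliance test can be run mechanically. In this sketch the regex is the one quoted above; requiring the prompt to start with a double quote is an added assumption, and `is_compliant_dispatch` is a hypothetical name.

```shell
# Hypothetical compliance check: exit 0 only when the command is
# {repo}{Tool} -s followed directly by a double-quoted prompt.
# LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
is_compliant_dispatch() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[a-z][a-z0-9]*(Claude|Codex|Cursor|Gemini|Kiro) -s "'
}
```

Usage: `is_compliant_dispatch "$cmd" || echo "fix the dispatch command before sending"`.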

SPAWN INFRASTRUCTURE DEFAULTS (added 2026-04-29)

User quote: "we always regress due to missconfig, using worktrees and sandboxes slowed the agents because of mcps and local builds...."

Rule 1: Worker sandbox = `--dangerously-bypass-approvals-and-sandbox` (Codex), `--dangerously-skip-permissions` (Claude). NEVER `--sandbox workspace-write` for trusted local workers.

Worktree resume recipe (when launcher won't work because it cd's away from worktree):

cd ~/Gits/<repo>-<branch> && codex resume --last --dangerously-bypass-approvals-and-sandbox

NOT codex resume --last --sandbox workspace-write — that gives workspace-write mode which blocks xcodebuild + swift build + most Swift toolchain operations that write outside CWD.

Rule 2: Worktrees are OPT-IN, not default. Use feature branch on main repo for sequential specialist work.

Worktrees are now a native git operation, not a dedicated golem skill. Ask if isolation is actually needed before creating one, then verify .mcp.json and any local symlinks yourself.

Rule 3: Don't use `--sandbox` flag at all for our local trusted workers.

Cross-references

Native git worktree: create only for real isolation needs, then verify MCP/config paths
/repogolem skill: launcher flag reference (-s mappings already correct)
/orc skill: pre-relay verification rule (Rule added 2026-04-29 to stop relaying stale evidence from workers)

