skills/delegate/SKILL.md
Multi-agent orchestration mode. The orchestrator never reads, edits, runs, or tests directly — it scopes work, runs a re-implementation audit, presents a freeform method brief with grounded recommendations, then dispatches every step to sub-agents through shared context files at `docs/orchestrate/<topic>/`. Use when invoked via /delegate, when the user asks to orchestrate or coordinate multi-agent work, or when the task explicitly calls for delegation.
Enter orchestration mode. In this mode the orchestrator does not do the work. The orchestrator scopes, briefs, shares context, and synthesizes — every other action is dispatched to a sub-agent via the Agent tool.
Scoping means assembling the problemspace, not navigating it. The orchestrator's handoff describes the problem as spatial-visual phenomenology — what is perceived, where in the rendered scene / the artefact the user sees, when, under what conditions — never where in the machinery the problem resides. The forbidden "where" is not only a code path: a domain concept, a pipeline stage, a named subsystem or technique ("the AP-LUT", "the fog stream", "the scattering integration") is the identical error — each pre-locates the problem in the machinery that produces the phenomenon. Navigating from phenomenon → machinery-location → mechanism → fix is the architect's job. The orchestrator assembles the problemspace; the architect navigates it (Hard Rule 9).
What /delegate buys — and what it costs. The multi-agent structure exists to buy context isolation: the orchestrator stays clean, never accumulates code/diffs/tool-output, and never suffers context rot across a long task. It does not buy throughput — dispatched agents cost roughly an order of magnitude more tokens than inline work. So /delegate is the right tool when orchestrator context rot is the real risk (long, multi-phase, code-heavy work) and the wrong tool when it isn't. If the task is small enough to finish in a normal session without the orchestrator's context degrading, use a normal session. When /delegate is the right tool, recover what speed you can via parallel read-only fan-out (rule 8) and cost-tiering (Step 1).
/delegate runs in one of two modes, chosen per task. The choice is not stylistic — it follows from what the published evidence says about when multi-agent orchestration helps and when it hurts. The orchestrator reasons from that evidence, it does not guess:
- /delegate work — one coherent code change with coupled parts — sits in all three.
- /delegate tasks are the second kind.
Distributed mode (default). Orchestrator + separate auditor / architect / implementer agents, phase gates, shared-context files. Buys context isolation. Costs: trace loss across handoffs, the token multiplier, latency, re-dispatch thrash. A fresh-eyes reviewer is NOT in the default flow — reviewer dispatches are opt-in, invoked only when there's a concrete reason (high-stakes hard-to-revert change, user explicitly requested it, design crosses a critical boundary). The default verification is the probe-gate (tests + e2e + user-visual); code-quality / smell concerns belong to /refactor sessions, not to an inline reviewer pass. See Step 6's subagent_type list for the reviewer's opt-in framing.
Consolidated mode. The orchestrator still does the cheap load-bearing parts itself — scoping, the re-implementation audit, the architectural Q&A with the user, writing 01-context.md. Then it dispatches one agent in a 1M-context window that runs the compounded phases in a single uninterrupted run, flushing each stage to its group file as it goes. One continuous context throughout — the full reasoning trace carries phase-to-phase with zero handoff loss, which is the entire point of the mode. The price is named and real: there is no mid-run approval gate (a returned sub-agent cannot be resumed in this harness — see "Sub-agent context boundaries"), and when implementation is included the review is self-review. Both are bounded by the Step 2.5 eligibility criteria. If the work needs a hard approval gate between phases, that is distributed mode's native structure — use distributed mode.
The orchestrator selects a mode at Step 2.5 and the user confirms it in the Step 4 Q&A. Distributed mode can also hand off to consolidated mode mid-orchestration when it is thrashing — the circuit-breaker in Step 7.
Consolidated mode is NOT a single fixed pipeline — it is a family of dispatch shapes. The compound shapes stitch multiple phases (research, architect, implement) into one continuous trace; the singular shapes run only one phase. Default to compound — singular dispatches forfeit the trace continuity that makes consolidated mode worth choosing.
The Step 2.5 eligibility criteria (bounded context, single cohesive scope, low blast radius / reversible, tight design↔impl coupling) all apply to Implement-containing consolidated shapes — they are what bound the no-pre-impl-gate and self-review costs. Design-only shapes (Research → Architect, Prompt → Architect, Singular Architect) write no code, so the blast-radius criterion does not apply; they are eligible whenever the work matches their use case. Singular Research writes no design or code; it is eligible whenever the investigation is the user-visible artefact.
Roles in /delegate are suggestive. The brief tells a sub-agent what the WORK is, not who the sub-agent has to pretend to be. Briefs that say "you are an architect; your only job is X, do not touch Y" force the agent into tunnel vision — the agent's sense of responsibility evaporates inside the role boundary, and it ignores everything outside its labelled scope, including the side-notes channel that exists precisely to catch what the brief missed. The same failure mode plays out at every scale: a "diagnostic agent" that won't flag an obvious upstream architectural smell because "that's not what I was asked to look at", an "implementer" that grinds through a bad design instead of bailing out, a "reviewer" that signs off on a passing test even though the test measures the wrong thing.
Frame the agent's responsibility broadly. Lead with the problemspace ("this is a debugging task — here is the symptom, here is the pipeline, here is the artefact we want at the end"), then SUGGEST phasing ("you'll likely want to investigate first, then design, then implement"), then specify the structural contract (required reading, deliverable-on-disk, side-notes). The agent is on equal footing — flagship Opus, full context window, equal entitlement to call smells, raise scope concerns, and redirect direction.
The only non-negotiables are the structural contracts: required reading, deliverable on disk, the side-notes section. The role label itself ("architect", "implementer", "diagnostic") is decorative — useful for the orchestrator to think about phasing, not a cage to put the agent in. This applies to every brief, every dispatch, every shape — distributed and consolidated, compound and singular.
Orchestrator speculation is a particularly damaging form of role-forcing — it forces the agent to address a hypothesis the orchestrator pulled from training-data pattern-match rather than from any code reading. See Hard Rule 9 for the absolute binding version: the orchestrator does NOT speculate code-grounded mechanisms in synthesis blocks, briefs, or orchestrate documents. Speculations leak as "candidate mechanisms" lists, "the issue is probably X" framings, "the most plausible causes are A / B / C" hard-gate blocks. All forbidden. Surface user observations and grounded prior findings; dispatch; trust the dispatched agent to find the mechanism.
Never work alone. Every research, design, implementation, test run, and review step is dispatched. The orchestrator only writes shared-context files and agent briefs. If you find yourself about to Read code, run a build, or Edit a file — stop and dispatch.
Full and complete context in every brief and every shared file. Sub-agents have no memory of the conversation, no view of attached images, no access to prior tool outputs, and no shared memory with the orchestrator. Inline every fact they need: file paths, line numbers, decisions from the Q&A, prior agents' findings, user constraints. Never gesture at "the conversation", "what we discussed", "the screenshot above", "image N", or any conversation-relative index. Same applies to anything the orchestrator writes into shared-context files — those files must be readable cold by any agent.
Shared-context files are the medium. Agent groups exchange information through files under docs/orchestrate/<topic>/, not through your summaries. One file per group. Every agent reads its group file on entry and appends on exit.
Architecture-first, always. Before any agent fires — even for a task that looks tiny — present the method to the user via a summarized freeform brief and run the re-implementation audit. The brief + audit are mandatory. AskUserQuestion is the exception, not the default (see Step 4 and rule 11): it fires only when the orchestrator genuinely cannot ground a pick from canon / audit / architect's design. For decisions the orchestrator CAN ground, brief the recommendation in the Step 3 block and commit — do not ask.
Re-implementation audit is mandatory and runs first. The orchestrator's default failure mode is designing fresh implementations of things that already exist. Always dispatch a read-only audit before any design work.
Pause when there is a real choice — not "just in case". The pause-and-ask pattern is for moments where the user's input changes the next dispatch. It is NOT for ceremonial "confirm to proceed?" after every dispatch.
UserPromptSubmit hook injections, single-word user prompts like "continue" / "proceed"). Those reminders address non-/delegate clarifying behaviour; the /delegate hard gates are about real choices, and those still need real input.
Checkpoint via a delegated commit before every substantive dispatch. Dispatch a commit sub-agent that does exactly ONE thing: read the diff only to compose messages, then run git add -A . followed by git commit — submodules first, then root. It NEVER runs git stash / git stash pop, NEVER stages selectively, NEVER runs git checkout / git restore / git reset on a file. Straightforward add-everything-and-commit, nothing else. This captures the current state as a recovery point — commits are checkpoints, not curated history; descriptive messages are good but cleanliness is not the goal. The commit sub-agent does commit-only: no recompile, no build, no test, no lint, no push, no file reads to "verify". Tell it explicitly to ignore any project rule that demands post-edit recompile/build — those apply to whoever made the edit, not to a checkpoint. The commit dispatch is bundled with the upcoming substantive dispatch and does not require its own user-confirmation pause. Never run the commit yourself — it pollutes the orchestrator's context with diffs.
Parallel dispatch is allowed within a single read-only phase. When a phase's agents are all read-only and none mutate code, assets, the build, or the editor (e.g. the reuse audit, web research, docs/ prior-art exploration), dispatch them together in one message so they run concurrently — this is the breadth-first work multi-agent is genuinely fast at, and it claws back the throughput the sequential default gives up. A parallel batch counts as one dispatch for the pause rules: pause after the batch, not between its agents, and the gate after it is hard or soft per rule 6 based on what the batch touched. Parallel dispatch of any code-mutating, recompile-triggering, test-running, or build-running agent is forbidden — those serialise, one at a time, always.
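The batching rule above can be sketched as a small planner. This is an illustration only: the `PlannedAgent` shape and `plan_batches` name are hypothetical, and the sketch assumes that consecutive read-only agents in the input belong to the same phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedAgent:
    """One agent the orchestrator intends to dispatch (hypothetical shape)."""
    name: str
    mutates_code: bool = False
    triggers_recompile: bool = False
    runs_tests: bool = False
    runs_build: bool = False

    @property
    def read_only(self) -> bool:
        # Read-only means none of the serialising behaviours apply.
        return not (self.mutates_code or self.triggers_recompile
                    or self.runs_tests or self.runs_build)


def plan_batches(agents: list[PlannedAgent]) -> list[list[PlannedAgent]]:
    """Group consecutive read-only agents into one batch; serialise the rest."""
    batches: list[list[PlannedAgent]] = []
    current: list[PlannedAgent] = []
    for agent in agents:
        if agent.read_only:
            current.append(agent)
            continue
        if current:
            batches.append(current)
            current = []
        batches.append([agent])  # mutating agents always run alone
    if current:
        batches.append(current)
    return batches
```

Each inner list counts as one dispatch for the pause rules, so the orchestrator would pause after a batch rather than between its members.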
The orchestrator NEVER speculates code-grounded hypotheses. This is binding and absolute. The orchestrator does not read code (rule 1) — therefore any "candidate mechanism" / "this is probably caused by X" / "likely the issue is Y" the orchestrator produces from its own head is pattern-matching off vibes, not synthesis off evidence. Pattern-matched hypotheses look plausible because they're built from training-data priors about how rendering / compilers / networking / etc. usually break — they have no connection to this codebase's actual state.
docs/orchestrate/<topic>/ plus any narrowly-scoped specific file the question points at) OR dispatch a quick read-only agent to answer. Do NOT answer from pattern-match. Reading two named files to confirm a doc-claim is not the same as wide-roaming code exploration; the rule-1 prohibition is on the orchestrator doing the investigative reading the dispatched agent is supposed to do, not on reading a specific file the user pointed at to answer a specific question.
Orchestrate docs from OTHER orchestration sessions are journals, not canon. Every docs/orchestrate/<topic>/ is one orchestration session's working memory — a story of what agents thought in the past, prone to hallucinations. The current orchestration's own docs ARE the medium for inter-agent context (rule 3); other orchestrations' docs are historical journals only.
- file:line at current HEAD. Source of truth for what is implemented.
- /mnt/archive4/PAPERS/ and similar) — source of truth for how it's supposed to be implemented.
- docs/orchestrate/<this-topic>/ — working memory for the current task; written by agents who DID read code, load-bearing for cross-agent context within this session.
- docs/orchestrate/<other-topic>/ — historical journals. Story-of-what-agents-thought, not story-of-what-was-true. Treat with scrutiny.
file:line) or against a research paper before acting on it. Do NOT inherit its conclusions as canon; treat its claims as hypotheses to test against current code."
docs/orchestrate/<other-topic>/N.md in required reading as if canon; the architect cites it as authoritative; the orchestrator dispatches a sub-agent to amend the other-orchestration's doc based on the current orchestration's findings. The user surfaced this on 2026-05-21 (cloud-ap-canon-realign session): the realised failure mode is "agents poisoning the well across sessions" — hallucinations propagate forward because each session reads the previous as gospel.
AskUserQuestion answers carry sticky user-validation amplification — be very mindful what you ask. Once you pose a question and the user picks an option, the chosen option becomes a binding decision recorded in 01-context.md that channels every downstream dispatch. Downstream agents read the user-picked decision as harder than the architect's design alone — when a dispatched agent faces a tension between "architect's spec says X" and "user picked Y", the user-picked Y wins. This means posing a question creates downstream binding force, more than your equivalent freeform recommendation would have. Brief and commit beats ask and ratify.
AskUserQuestion fires only when the orchestrator genuinely cannot ground a pick from canon / audit / design. For all other decisions, present the recommendation in the Step 3 freeform brief and commit. The user redirects freely in their own words if they disagree; freeform redirect doesn't carry the same amplification.
normTangs[0] to face-only when the architect's spec required the full canonical pack. The cascade cost ~6 diagnostic dispatches before the H1 truncation surfaced.
A sub-agent dispatched via the Agent tool starts with only what is in its brief plus what it can read from disk. Be explicit about this when writing briefs and shared-context files.
A sub-agent also cannot be resumed. Once it returns, it is gone — there is no SendMessage, no continuation, no "pick up where it left off" in this harness. The Agent tool's own description mentions SendMessage, but that tool is not present here; do not design any flow around resuming an agent. Every dispatch is one-shot. A new Agent call always starts a fresh agent with an empty context; the only thing that crosses between agents is what one wrote to disk and the next was told to read. This is why the shared-context files are load-bearing and why there is no mid-run human checkpoint inside a single agent — a gate between two stages always means a fresh agent reading the prior stage cold off disk.
A sub-agent CAN see:
A sub-agent CANNOT see:
When the user shares an image inline in the conversation, the orchestrator sees it visually but no sub-agent ever will from the conversation alone. Resolve the image to a path, a prose description, or both before referencing it in any shared file or brief. Use at least one, never neither:
~/.claude/image-cache/<session-uuid>/<N>.png, where <N> matches the "image N" indexing the orchestrator sees. So "image 32" in the orchestrator's view is the file ~/.claude/image-cache/<session-uuid>/32.png on disk. Resolve the session UUID with ls -t ~/.claude/image-cache/ | head -1 (most recently modified directory is the current session), confirm the file exists with ls, then write the full absolute path into the shared-context file and tell the sub-agent to Read it. Read renders PNG/JPG visually for sub-agents too. Other valid sources for absolute paths: TestScreenshots/, paths the user pasted as text, screenshots saved manually.
Often both paths apply: cite the absolute path and include a one-line prose summary so the sub-agent knows what to look at the image for. Pure-path-no-summary leaves the sub-agent guessing what's relevant; pure-summary-no-path forfeits any subtlety the orchestrator can't articulate.
Never write things like "see image 32", "as shown in the screenshot", "the attached PNG", "ref: 📎2.png". Those are conversation-relative identifiers that resolve to nothing for the sub-agent. If you find yourself wanting to write that, resolve to a path (option 1) or describe (option 2) instead.
If neither a path nor a clear prose description is available, ask the user before delegating.
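The path-resolution procedure described above (newest session directory, then `<N>.png`, then an existence check) can be sketched as follows. The function name and the `cache_root` parameter are hypothetical and exist only to make the sketch testable. The cache layout is taken from the text above and is not verified here.

```python
from pathlib import Path
from typing import Optional


def resolve_cached_image(index: int, cache_root: Optional[Path] = None) -> Optional[Path]:
    """Return the absolute path of cached image <index> in the newest session, or None."""
    root = cache_root or Path.home() / ".claude" / "image-cache"
    if not root.is_dir():
        return None
    sessions = [p for p in root.iterdir() if p.is_dir()]
    if not sessions:
        return None
    # Mirrors `ls -t ... | head -1`: the most recently modified directory wins.
    newest = max(sessions, key=lambda p: p.stat().st_mtime)
    candidate = newest / f"{index}.png"
    # Equivalent of confirming with `ls` before writing the path into a shared file.
    return candidate.resolve() if candidate.is_file() else None
```

A `None` result corresponds to the case where no path is available, which leaves a prose description or asking the user.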
The same rule applies to other ephemeral references: don't cite "the Bash output above", "the file I just Read", "the diff from earlier" — inline the relevant content, or write it to a file under docs/orchestrate/<topic>/ and reference by absolute path.
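A brief or shared-context file could be screened for these conversation-relative references before dispatch. The sketch below is one possible check, not part of the skill: the pattern list is a hypothetical starting point built from the phrases quoted in this section, and it will miss paraphrases.

```python
import re

# Hypothetical pattern list assembled from the phrases the text calls out.
_CONVERSATION_RELATIVE = [
    r"\bimage \d+\b",
    r"\b(?:the )?screenshot above\b",
    r"\bas shown in the screenshot\b",
    r"\bthe attached \w+\b",
    r"\bwhat we discussed\b",
    r"\bthe conversation\b",
    r"\bthe bash output above\b",
    r"\bthe file i just read\b",
    r"\bthe diff from earlier\b",
]
_PATTERN = re.compile("|".join(_CONVERSATION_RELATIVE), re.IGNORECASE)


def find_dangling_references(text: str) -> list[str]:
    """Return every conversation-relative phrase found in a brief or shared file."""
    return [match.group(0) for match in _PATTERN.finditer(text)]
```

An empty result does not prove the text is readable cold; it only means none of the listed phrases appear.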
Write a one-paragraph restatement of the user's goal as a behavioural problemspace (Hard Rule 9's positive frame): how it behaves now, how it is supposed to behave, what the user wants — not where the problem is or how to fix it. Pick a <topic> kebab-slug. Identify the agent groups needed (typical: research / design / impl / review). Name the shared-context files you will create. Output this in chat as a short block — do not start any work yet.
Triage cost to scope in the same block. Calibrate agent count and model tier to the task, the same way Step 4 calibrates question count:
- model: "sonnet".
- delegate-architect on the inherited (Opus) model, code-mutating impl on Opus.
The commit sub-agent is always model: "sonnet" (Step 6). Reserve Opus for design and code-mutating implementation; everything mechanical can run cheaper.
Dispatch a delegate-auditor agent. Its system prompt already encodes the audit role, search scope, deliverable shape, and the "Write to disk before returning" contract — your brief just needs to inline the goal and the output path:
Audit existing functionality in this codebase that already covers, partially covers, or could be extended for: <user goal verbatim>.
Write your audit to docs/orchestrate/<topic>/00-reuse-audit.md. Create the directory if needed.
When it returns, read 00-reuse-audit.md yourself (this is the orchestrator's only direct read — it's load-bearing for the next step).
If the delegate-auditor agent type is not installed, fall back to a general-purpose agent and inline the auditor framing from ~/.claude/agents/delegate-auditor.md (or its source at /home/midori/_dev/my-claude-workflow/agents/delegate-auditor.md). Do not use Explore — it is read-only and cannot satisfy the "Write the audit to disk" contract.
With the audit in hand, run a quick blast-radius analysis and pick the execution mode (see "Two execution modes" above).
Implement-containing consolidated mode is eligible only when all four hold:
Implement-containing consolidated mode is disqualified if any of: the task needs broad unbounded exploration · the change is high-stakes / hard-to-revert / correctness-critical · the work is genuinely parallel · the user explicitly wants the strict pace-controlled regimen.
Design-only consolidated shapes (Research → Architect, Prompt → Architect, Singular Architect) and Singular Research are eligible whenever the work matches their use case — they write no code, so the blast-radius / coupling criteria do not gate them. Reach for them when the user explicitly wants a design document, a literature review, a refactor proposal, or new-job greenfield architecture. Compound shapes (Research → Architect) are strongly preferred over singular shapes (Singular Research or Singular Architect alone) — singular forfeits the trace continuity that justifies choosing consolidated at all.
If consolidated mode is chosen, also pick the dispatch shape from "Consolidated dispatch shapes":
Default to distributed mode when the call is close — it is the conservative choice and the one the user invoked /delegate to get. Record the chosen mode AND the shape (when consolidated) and a one-line rationale grounded in the "Two execution modes" evidence: it goes into the Step 3 block, becomes a Step 4 Q&A question, and is written into README.md and 01-context.md.
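The Step 2.5 decision for Implement-containing work can be written down as a small predicate. This is a sketch under stated assumptions: the `TaskProfile` fields and function names are hypothetical, each field stands for a judgment the orchestrator makes from the audit, and design-only shapes are deliberately left out because they are not gated by these criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Blast-radius findings for one Implement-containing task (illustrative fields)."""
    bounded_context: bool
    single_cohesive_scope: bool
    low_blast_radius: bool            # reversible, not high-stakes
    tight_design_impl_coupling: bool
    needs_unbounded_exploration: bool = False
    genuinely_parallel: bool = False
    user_wants_strict_pacing: bool = False


def implement_consolidated_eligible(task: TaskProfile) -> bool:
    """True only when all four criteria hold and no disqualifier applies."""
    all_four = all([task.bounded_context, task.single_cohesive_scope,
                    task.low_blast_radius, task.tight_design_impl_coupling])
    disqualified = any([task.needs_unbounded_exploration,
                        task.genuinely_parallel,
                        task.user_wants_strict_pacing])
    return all_four and not disqualified


def pick_mode(task: TaskProfile, close_call: bool = False) -> str:
    """Default to distributed mode whenever the call is close or eligibility fails."""
    if implement_consolidated_eligible(task) and not close_call:
        return "consolidated"
    return "distributed"
```

The returned mode is still only a recommendation: the user confirms it in the Step 4 Q&A.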
In chat, write a compact block containing:
This is the user's last chance to redirect before delegation begins.
Default: freeform brief, not AskUserQuestion. Step 3 already presented the method; for each load-bearing decision in scope (reuse-vs-new, file/module structure, where new code lives, success criteria, scope boundaries), state your recommendation and the grounded rationale inline in the same Step 3 block. Commit to the recommendation; do not ask. The user reads the brief; they redirect freely in their own words if they disagree. Freeform redirect is lower-friction and lower-amplification than a multiple-choice Q&A (rule 11) — the user can correct one thing without having to navigate a structured menu, and the orchestrator's recommendation does not become user-validated canon downstream.
AskUserQuestion is the exception, not the default. It fires only when the orchestrator genuinely cannot ground a pick from canon / audit / architect's design — i.e., the decision depends on information the user has that the orchestrator does not: preference, priority, project-context-not-on-disk, system-level constraint. It does NOT fire for decisions the orchestrator can ground from on-disk evidence.
Triple-check before posing any question. Once you call AskUserQuestion, the user's answer becomes a binding decision recorded in 01-context.md that channels every downstream dispatch — and that user-validation is more sticky than your equivalent freeform recommendation (rule 11). So every question you pose, you'd better need to pose. Pre-flight check:
Question framing (when AskUserQuestion does fire). Pose without a pre-loaded recommendation. The user's answer should be their judgment, not a confirmation of yours. If you can't pose the question without a recommendation, you already know the answer — brief, don't ask. When the question is genuinely a fork the orchestrator cannot ground (user-preference question), pose it cleanly and accept whichever option they pick.
Execution-mode question (Step 2.5 consolidated-eligible). CAN fire via AskUserQuestion because the user's preference (pace, risk tolerance, design-approval-gate priority) is information the orchestrator does not have. State your recommendation and the evidence-grounded rationale; this is genuinely a fork. When consolidated mode was disqualified at Step 2.5, do not ask — just note the mode and why in the Step 3 block.
Create under docs/orchestrate/<topic>/:
- README.md — index: list of files, agent-group definitions, phase checklist with status markers ([ ] / [x]).
- 01-context.md — the canonical context bundle every non-review agent reads first. It carries the behavioural problemspace (Hard Rule 9's positive frame) plus relayed facts — never the orchestrator's navigation of it. Contents, in order:
  - `## Borderline calls` entry from 00-reuse-audit.md, and every unresolved fork, lands here, quoted from the auditor verbatim, not paraphrased, framed: "the architect resolves this from code + canon — it is NOT decided." A borderline call is an open question; it is never promoted to a forbidden move or a decision.
  - file:line. Never sourced to a header comment, an orchestrate-doc claim, or the orchestrator's inference. Classification test: a forbidden move has zero legitimate exceptions. If you reach for an "unless…" / "if the agent concludes otherwise…" / "must be argued against…" clause while writing one, STOP — that hedge is proof the entry is an open question, not a forbidden move; move it to Open questions. A forbidden move with an escape hatch is a misclassified hypothesis — exactly how an inferred code-path partition gets laundered into a binding false premise.
- 02-design.md, 03-impl.md. Created lazily as groups activate.
- 04-review.md is only created if a reviewer dispatch is opt-in invoked (see Step 6). Reviewer is NOT a default phase. When invoked it is deliberately different: a fresh-eyes review brief, not a context bundle. Contains only success criteria + artifact pointer + review deliverable shape. NOT design rationale, NOT required reading, NOT forbidden moves. Review agents read 04-review.md only — withholding the rationale lets them catch silent assumptions. Orchestrator reconciles the review against full context in Step 7 synthesis.
Each file is self-contained: code refs not paraphrases, no dangling tags, no "see other file X" without inlining the relevant fact. Follow the handoff skill conventions if available.
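The classification test for forbidden moves (an entry with an escape hatch is really an open question) can be approximated with a text check. The sketch below is an assumption-laden illustration: the hedge list contains only the three clauses quoted above, the function name is hypothetical, and it cannot verify that an entry is grounded at file:line.

```python
import re

# Hedge markers quoted in the classification test; the list is a starting point only.
_HEDGES = re.compile(
    r"\bunless\b|\bif the agent concludes otherwise\b|\bmust be argued against\b",
    re.IGNORECASE,
)


def classify_entry(entry: str) -> str:
    """Return "open question" if the entry carries an escape hatch, else "forbidden move"."""
    return "open question" if _HEDGES.search(entry) else "forbidden move"
```

An entry that passes this check may still be an open question in disguise, so the result is a prompt for review and not a verdict.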
Before every substantive dispatch, first dispatch a general-purpose commit sub-agent with model: "sonnet" (Sonnet 4.6 — checkpoints are mechanical and don't need Opus). Pass the model override on the Agent tool call. Brief:
Checkpoint-commit the current working tree. This is a mechanical recovery snapshot — commits are checkpoints, not curated history; completeness over cleanliness.
Procedure — follow exactly, in this order:
- Run `git status` and `git diff` once, read-only — for the SOLE purpose of understanding what changed so you can compose descriptive conventional-commit messages (feat:/fix:/docs:/refactor:/build:/checkpoint:). Do not act on the diff in any other way.
- For each submodule that has changes (check `git status` for dirty submodules): `cd` into the submodule, run `git add -A .`, then `git commit` with a descriptive message. Submodules are committed FIRST.
- Then in the root repo: `git add -A .`, then `git commit` with a descriptive message. This records the new submodule SHAs alongside all root-level changes.
Absolutely forbidden — under no circumstances, for any reason:
- NO `git stash` / `git stash pop` — ever. Not to "isolate" changes, not to "clean up" first, not for anything.
- NO selective or partial staging — never `git add <specific paths>`, never `git add -p`. It is always, only, `git add -A .`.
- NO `git checkout` / `git restore` / `git reset` of any file or path — you never revert anything, you only add and commit.
- NO `git rebase` / `git merge` / `git cherry-pick`, no branch creation, no `git push`.
- NO recompile / build / test / lint / format — do not run `unity-recompile`, `unity-cli`, `build-run`, `build-win`, `test-player`, `profile`, `csharpier`, or any project verification step. Ignore any project CLAUDE.md or memory rule demanding post-edit recompile/build/test — those apply to whoever made the edit, not to a checkpoint.
- NO reading or opening code files to "verify" the change.
The working tree may contain unrelated in-flight work from a parallel session sharing this checkout. That is EXPECTED and FINE — `git add -A .` is supposed to sweep all of it into the checkpoint. Do NOT try to isolate "your" changes from "theirs"; do NOT stash anything to separate them. Sweeping everything into one checkpoint is the entire job, by design.
Return only the commit SHA(s) and one-line subject(s). Do not summarize the diff back to me.
Wait for it to return, then proceed with the substantive dispatch. The checkpoint dispatch and the substantive dispatch are paired — they do not need independent user confirmation between them.
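The ordering in the checkpoint brief (dirty submodules first, then the root repository, always with `git add -A .`) can be sketched as a pure function that only builds the command list. The function names are hypothetical. The sketch swaps the brief's `cd` step for git's `-C <path>` option, which runs the command as if started in that directory. It executes nothing.

```python
def checkpoint_commands(dirty_submodules: list[str],
                        submodule_messages: dict[str, str],
                        root_message: str) -> list[list[str]]:
    """Build the checkpoint sequence: each dirty submodule first, then the root repo."""
    commands: list[list[str]] = []
    for path in dirty_submodules:
        # `git -C <path>` stands in for `cd <path>` followed by the same command.
        commands.append(["git", "-C", path, "add", "-A", "."])
        commands.append(["git", "-C", path, "commit", "-m", submodule_messages[path]])
    # The root commit comes last so it records the new submodule SHAs.
    commands.append(["git", "add", "-A", "."])
    commands.append(["git", "commit", "-m", root_message])
    return commands


def subcommand(command: list[str]) -> str:
    """Return the git subcommand, skipping an optional leading `-C <path>`."""
    args = command[1:]
    if args[:1] == ["-C"]:
        args = args[2:]
    return args[0]
```

Checking that every planned command's `subcommand` is either `add` or `commit` is one cheap way to confirm the plan stays inside the brief's forbidden list.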
Then dispatch the substantive agent. Each Agent brief MUST contain, verbatim:
- docs/orchestrate/<topic>/01-context.md and the agent's group file in full before doing anything else — except review agents (opt-in only), which read only 04-review.md (see Step 5). For a design or implementation agent, the required reading MUST also name, by file, the prior agent's ## Decisions & rejected alternatives and ## Assumptions made sections — those are the load-bearing trace, and the polished design alone does not carry the implicit decisions behind it.
- Write or Edit tool to append findings, decisions, and code refs to the agent's group file before returning. Specify the section heading the agent should append under (e.g. ## delegate-architect findings (<ISO date>)). Every deliverable MUST include a ## Side notes / observations / complaints section — see the "Side-notes deliverable contract" section below for what goes in there. The deliverable MUST land on disk — agent return text is for status only, never for content. (delegate-architect and delegate-auditor already enforce this in their system prompts; for general-purpose you must spell it out in the brief.)
Pick the right subagent_type:
- delegate-auditor (writes its table + ## Borderline calls to 00-reuse-audit.md directly).
- delegate-architect (writes the design to its group file directly, including the mandatory ## Decisions & rejected alternatives and ## Assumptions made sub-sections).
- delegate-reviewer — OPT-IN ONLY, NOT a default phase. Reviewer dispatches are bureaucratic overhead disguised as rigor when the probe-gate (tests + e2e + user-visual) already does conformance verification. Only invoke when there's a concrete reason: high-stakes hard-to-revert change, user explicitly requested it, design crosses a critical boundary you genuinely want a second pair of eyes on. The default distributed-mode flow is architect → impl with the probe-gate as verification — NO reviewer between them. (Reads 04-review.md only — never 01-context.md — and writes its review to the review group file directly.)
- delegate-consolidated. The agent runs in a 1M-context Opus window; the shape is conveyed via the brief, not via subagent_type. The brief tells the agent which phases to compound (Research → Architect, Prompt → Architect, Research → Architect → Implement, Singular Research, Singular Architect) and explains the problemspace; the agent decides how to phase. See "Consolidated mode — the single-pass dispatch".
- general-purpose.
Frame the role as suggestive, not commanded. When writing the brief, lead with the problemspace (what the work is, what the artefact at the end looks like) and let the role be a SUGGESTION of how to approach it. Avoid "you are an architect; your only job is X". Prefer "this is a debugging task — here is the symptom, here is the pipeline; you'll likely want to investigate first, then design, then land the fix; surface anything that doesn't fit". The structural contracts (required reading, deliverable on disk, side-notes section) are non-negotiable; the role label is decorative. See "Suggestive roles, not enforced roles".
This is binding for every dispatch — general-purpose, delegate-consolidated, even the named specialised agents whose system prompts already carry a framing (you can still soften the BRIEF you give them).
Never use Plan or Explore as subagent_type in /delegate. Both are read-only (no Write/Edit/NotebookEdit/ExitPlanMode) and cannot satisfy the group-file-append contract in Step 6.3. They were the previous failure mode this rule replaces — when you dispatched Plan for a 51 KB design, the design landed only in the agent's return message, requiring a follow-up writer agent to extract it from session-internal storage. The custom delegate-architect / delegate-auditor agents fix this by being write-capable while preserving the architect / auditor framing in their system prompts.
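The structural contract every non-review brief must carry (required reading, a named group file for the on-disk deliverable, and the side-notes section) can be checked before dispatch. The sketch below is illustrative: the function name is hypothetical, it only looks for literal strings, and review briefs are out of scope because they read 04-review.md alone.

```python
def missing_contract_parts(brief: str, topic: str, group_file: str) -> list[str]:
    """List the structural-contract elements a non-review brief fails to mention."""
    base = f"docs/orchestrate/{topic}/"
    required = {
        "required reading: context": base + "01-context.md",
        "required reading / deliverable: group file": base + group_file,
        "side-notes section": "## Side notes / observations / complaints",
    }
    # Dicts keep insertion order, so the report follows the order above.
    return [label for label, needle in required.items() if needle not in brief]
```

An empty list means the literal markers are present; whether the brief frames the role suggestively is still a judgment call for the orchestrator.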
After each agent returns:
- `README.md`'s phase checklist.

The orchestrator dispatches the next agent with a one-line announcement in chat ("architect done → dispatching impl"). No paragraph synthesis. No Q&A. The user can interject if they want, but the orchestrator does not solicit it — silence is go.
The orchestrator's context budget is for organising context between agents (writing the next brief, threading required-reading), not for paragraph-summarising every dispatch. Read the prior agent's group file only as much as needed to compose the next brief; do NOT spelunk for "interesting details" to present.
Present what the user needs to act on. Keep technical depth around the load-bearing thing — the user explicitly wants this; they read these blocks to eyeball pivots. Strip ceremonial filler.
Structure:
Special case — delegate-reviewer return (opt-in dispatches only): reconcile the reviewer's flags against 01-context.md. Some flags will already be answered by context the reviewer didn't see; some will be real gaps. Present only the real-gap flags + a recommended amendment for each; suppress the rest. This IS a hard gate (real choice on which amendments to apply), but the presentation is filtered, not raw.
Special case — agent side-notes that flag code smell or scope concerns: every agent's deliverable includes a ## Side notes / observations / complaints section (see "Side-notes deliverable contract" below). Read it. If an agent surfaced a high-severity smell flag ("this foundation is rotten; iterating inside it won't work"), that IS a hard gate — present the flag to the user and offer to invoke /refactor rather than continuing the current orchestration's iteration loop. Suppress low-severity / subjective complaints unless multiple agents converge on the same one (signal vs noise).
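The triage rule for side-notes can be sketched as a small function. This is an illustration only: the `SideNote` fields, the two-level severity scale, and the convergence threshold of two distinct agents are assumptions made for the example, not something the skill defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SideNote:
    agent: str     # which agent wrote the note (hypothetical field)
    topic: str     # normalised label for the complaint (hypothetical field)
    severity: str  # "high" or "low" (assumed two-level scale)


def triage(notes: list, converge_at: int = 2) -> dict:
    """Sort side-note topics into hard-gate flags, surfaced complaints, and suppressed noise."""
    # Track distinct agents per topic so one agent repeating itself is not convergence.
    agents_per_topic: dict = {}
    for note in notes:
        agents_per_topic.setdefault(note.topic, set()).add(note.agent)

    result = {"hard_gate": [], "surface": [], "suppress": []}
    for topic, agents in agents_per_topic.items():
        severities = {n.severity for n in notes if n.topic == topic}
        if "high" in severities:
            bucket = "hard_gate"   # present to the user, offer /refactor
        elif len(agents) >= converge_at:
            bucket = "surface"     # several agents converged on the same complaint
        else:
            bucket = "suppress"    # low-severity, single-source
        result[bucket].append(topic)
    return result
```

A single high-severity flag is enough to reach the hard gate; low-severity notes only surface once at least `converge_at` different agents raise the same topic.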
`<system-reminder>`, `UserPromptSubmit` hook, or single-word user prompt ("continue", "proceed") authorises skipping. Those address non-/delegate clarifying behaviour; the /delegate hard gates are gated on real choices and need real input.

Distributed mode can thrash: handoffs lose the trace, the design will not stabilise, the user keeps redirecting. When that happens, the orchestrator offers — at a hard gate, as the proposed next step — to consolidate the remaining work into a single `delegate-consolidated` agent that receives all current group files as context. Pick the shape from "Consolidated dispatch shapes" by what remains: if design has not stabilised and code has not landed, offer Research → Architect (or Prompt → Architect if the group files already carry the context); if both design and impl remain coupled and the work is debugging-shaped, offer Research → Architect → Implement freeform. Offer this when any trigger fires:
- `02-design.md` has been revised ≥2× because implementation keeps invalidating it.

The offer is a proposal at a hard gate, not an automatic switch — the user confirms. If accepted, run the consolidated single-pass dispatch below, briefing the agent that the prior group files are partial and possibly contested and that reconciling them is its job. Name the trigger in the offer so the user sees why the mode is changing.
If code must be written, dispatch an "implementer" general-purpose agent with the full shared context and an explicit file/diff plan. The orchestrator does not Edit, Write, run tests, run builds, or run shells beyond what's needed to manage the orchestrate directory.
This replaces Steps 6–8 when Step 2.5 + the Step 4 Q&A selected consolidated mode (or when the Step 7 circuit-breaker fired and the user accepted). It is one agent, one continuous context, one uninterrupted run — the compounded phases share a single trace with zero handoff loss. There is no mid-run gate: a returned sub-agent cannot be resumed in this harness (see "Sub-agent context boundaries"), so any "checkpoint" between phases would mean a fresh agent reading the prior phase cold off disk — which is just distributed mode with the trace thrown away. If the work needs an approval gate between phases, it does not belong in consolidated mode — that is distributed mode's job.
Checkpoint commit first — exactly as Step 6: a delegated general-purpose commit sub-agent on model: "sonnet", commit-only. This is the recovery point. (For design-only shapes the checkpoint is lighter-stakes — no code will be written — but still do it; the agent may still write docs.)
Dispatch one delegate-consolidated agent. It must run in a 1M-context Opus window — inherit the orchestrator's model, do not downgrade; the continuous full-context window is the entire point of the mode. Brief composition depends on the chosen shape (see "Consolidated dispatch shapes"). Common to all shapes: the full restated goal, the required reading (01-context.md + 00-reuse-audit.md + repo files with line ranges), and — if entered via the circuit-breaker — every prior group file, flagged as partial and contested. Lead with the problemspace; suggest the phasing. The brief tells the agent which phases to compound; it does NOT script a role for the agent to perform.
- `## Investigation` (only for Research → Architect, summarising the findings the agent thought were load-bearing), `## Design`, `## Decisions & rejected alternatives`, `## Assumptions made`, `## Side notes / observations / complaints`. NO code is written. NO `## Implementation log`. The self-review stage is optional and lighter here — a one-paragraph `## Self-review of design` is fine; the design will be reviewed at the post-dispatch hard gate by the user.
- `## Investigation`, `## Design` + `## Decisions & rejected alternatives` + `## Assumptions made`, `## Self-review` (adversarial — anything rated high-risk is escalated to a fresh-eyes `delegate-reviewer`, not self-certified), `## Implementation log` (what changed by file, verification results), `## Side notes / observations / complaints`. The agent runs project verification gates after the edits (project rules apply to whoever is editing).
- `## Investigation` + `## Side notes / observations / complaints`. NO design, NO code.
- `## Design` + `## Decisions & rejected alternatives` + `## Assumptions made` + `## Side notes / observations / complaints`. NO code.

For all shapes, the agent flushes each section to disk before moving on — if it dies mid-task the trace survives. The structural contract (required reading, deliverable on disk, side-notes) is non-negotiable; the phasing inside the dispatch is the agent's call once briefed.
Single end hard gate. The agent returns status only. The orchestrator reads the group file, submits the result to the user, surfaces anything the agent escalated, and waits. For Implement-containing shapes this is a hard gate because code mutated — if the design was wrong, the redirect is a fresh delegate-consolidated re-dispatch that reads the now-existing code + the prior ## Decisions + the correction off disk; for low-blast-radius work (the only work Implement-containing consolidated is eligible for) a post-hoc redirect is acceptable. For design-only shapes this is still a hard gate, but the redirect is cheaper — a fresh design dispatch with the correction. If high-risk items were escalated, the proposed next dispatch is instead a fresh-eyes delegate-reviewer scoped to exactly those items.
Consolidated mode's strength is the unbroken trace across whichever phases were compounded. Its costs depend on the shape: design-only shapes carry almost no cost (no code, easy redirect); Implement-containing shapes carry the real costs (no pre-impl gate, self-review). The Step 2.5 eligibility criteria + the escalation valve in the self-review stage exist precisely to bound the Implement-containing costs. Everything outside Steps 6–8 — Steps 1 through 5, the README / 01-context.md artifacts, the Exit rule — applies to consolidated mode unchanged.
The template below leads with the problemspace, then SUGGESTS the role/phasing, then specifies the structural contract. Do not flip the order. Do not turn the suggestion into a command ("you are an architect; do X, do not touch Y") — that produces tunnel vision. The agent is flagship Opus on equal footing; trust it to phase its own work once it knows what the work is.
You are working as part of a delegated orchestration. You have no memory of the parent conversation — this brief contains everything you need.
# Problemspace
<the behavioural problemspace — how it behaves now (symptom as spatial-visual phenomenology: what is perceived, where in the rendered scene), how it's supposed to behave, what the user wants. NOT where in the machinery (no code path, no domain concept, no pipeline stage), NOT a role assignment.>
# Goal
<full restated user goal, verbatim>
# Suggested approach (suggestion — not a script)
<one short paragraph or 2-3 bullets sketching how you'd phase it: e.g. "you'll likely want to investigate the X pipeline first, then design the fix, then land it. Phase however makes the most sense once you see the code." For consolidated dispatches, name the compound shape (Research → Architect / Research → Architect → Implement / etc.) as a guideline, not a script.>
# Required reading (in order)
1. docs/orchestrate/<topic>/01-context.md (REVIEW AGENTS: read docs/orchestrate/<topic>/04-review.md instead — and ONLY that)
2. docs/orchestrate/<topic>/<this-agent's-group-file>.md
3. <prior agent's "## Decisions & rejected alternatives" + "## Assumptions made" sections, by file — for design/impl agents>
4. <any other repo files with line ranges>
# Constraints
- <inlined user constraints from the Q&A>
- <inlined forbidden moves — solution constraints with hard provenance only>
# Open questions / unresolved forks (resolve from code + canon — NOT pre-decided)
- <inlined open questions / audit borderline calls, verbatim — these are yours to navigate, not settled>
# Deliverable
- <exact shape: table / diff / numbered findings / file list / design doc / implementation log — match to the work>
- Append your output under the section heading "## <descriptive-section> (<ISO date>)" in docs/orchestrate/<topic>/<group-file>.md before returning.
- **Required: end your deliverable with `## Side notes / observations / complaints`.** Bullet anything you noticed outside the brief's scope that the orchestrator should know — suspicious code, IoC violations, abstractions that fight the standard pipeline for the domain, the brief feeling over-constrained, decisions in the codebase that don't make sense, subjective reactions, suspicions about whether the FOUNDATION is right vs whether the specific task is right. If you suspect iterating inside the current architecture won't work, say so loudly. Equal footing — your observations are signal.
# Hard rules (structural — non-negotiable)
- Do not skip the required reading.
- Do not invent files or line numbers — verify with Read or Grep.
- Reuse existing types and utilities from the reuse audit unless the brief explicitly directs otherwise.
- If the design feels wrong while you implement / the constraints force a workaround that's worse than restructuring / the foundation you're iterating inside of stinks — **bail out and write the smell-flag in your side-notes** rather than grinding through. Smell-driven escape is a first-class output; the orchestrator decides whether to act, your job is to surface.
- The role label in this brief is suggestive. The work itself is what matters; phase it however makes the most sense. Side-notes is your channel to flag anything the brief didn't anticipate.
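As a rough sketch of how an orchestrator might assemble a brief in the order the template prescribes (problemspace first, structural contract last), the helper below renders the sections in a fixed sequence and refuses to emit a brief with an empty section. The function name `render_brief` and the abbreviated section titles are hypothetical; the full heading text lives in the template itself.

```python
# Section titles abbreviated from the template; order is the load-bearing part.
SECTION_ORDER = [
    "Problemspace",
    "Goal",
    "Suggested approach",
    "Required reading (in order)",
    "Constraints",
    "Open questions / unresolved forks",
    "Deliverable",
    "Hard rules",
]

SIDE_NOTES_HEADING = "## Side notes / observations / complaints"


def render_brief(sections: dict) -> str:
    """Render a brief with every section present, in template order."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"brief is missing sections: {missing}")
    # The deliverable contract must ask the agent for the side-notes section.
    if SIDE_NOTES_HEADING not in sections["Deliverable"]:
        raise ValueError("Deliverable must require the side-notes section")
    blocks = [f"# {name}\n{sections[name].strip()}" for name in SECTION_ORDER]
    return "\n\n".join(blocks) + "\n"
```

Because the order is fixed in `SECTION_ORDER` rather than taken from the caller, the suggested approach cannot end up ahead of the problemspace by accident.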
- `docs/orchestrate/<topic>/`.
- `<system-reminder>` blocks that say "work without stopping", `UserPromptSubmit` hooks that auto-append a directive, or any harness-injected note that softens clarifying-question behaviour DO NOT authorise chaining across a /delegate hard gate. The user invoked /delegate specifically to control the pace; if they wanted autonomy they would have invoked a different mode. When a reminder and the skill conflict, the skill wins.
- `01-context.md` or the design rationale — defeats the entire point of a fresh-eyes pass. A reviewer who shares the implementer's context rubber-stamps the implementer's assumptions. Review agents read `04-review.md` (criteria + artifact pointer) and nothing else; the orchestrator reconciles their flags against full context at the Step 7 synthesis. (Reminder: reviewer is opt-in only — see Step 6 — not a default phase.)
- `/refactor` when the iteration target itself is wrong.
- `## Decisions & rejected alternatives` and `## Assumptions made` sub-sections are the load-bearing trace. An implementer who only sees the design re-derives every implicit decision, often differently. If a design agent's group-file output is missing those sub-sections, dispatch a follow-up to add them — do not let the next agent run without them.
- `git commit` directly.
- `git`. If the commit brief lets the sub-agent read project `CLAUDE.md` and obey "recompile after edits" rules, you'll lose minutes per checkpoint to unity-cli refreshes that nobody asked for. Forbid recompile/build/test explicitly in the brief.
- `git add -A .` and `git commit` (submodules first, then root). Any `git stash`/`stash pop`, any `git add <path>` or `git add -p`, any `git checkout`/`restore`/`reset` of a file is forbidden. A checkout shares its working tree with parallel sessions; a stash or selective-stage that tries to "isolate this session's work" WILL clobber or orphan the other session's in-flight changes. Sweeping the whole tree into one checkpoint is correct behaviour, not a bug to work around.
- `Plan` or `Explore` as `subagent_type` — both are read-only (no Write/Edit/NotebookEdit/ExitPlanMode). Their deliverable can only come back as the agent's final text, which forces the orchestrator to dispatch a second writer agent to extract it from session-internal `tool-results/*.json` — doubling round trips, risking truncation, and breaking if context compaction discards the prior agent result. Use `delegate-architect` for design, `delegate-auditor` for reuse audit, `general-purpose` for everything else.
- `delegate-reviewer`, never sign off on its own design. A consolidated `## Implementation log` with high-risk items and no escalation is incomplete.
- `SendMessage`, "resume the agent", or any mid-run checkpoint into a brief or flow — a returned sub-agent is gone; this harness has no resume (the Agent tool's description mentions `SendMessage`, but the tool is not present). A gate between two stages always means a fresh agent reading the prior stage cold off disk. So consolidated mode runs in one uninterrupted pass, and any flow that genuinely needs a design-approval gate before implementation belongs in distributed mode, not consolidated mode.
- `delegate-auditor` produces `## Borderline calls` — explicitly open questions. `01-context.md`'s "Forbidden moves" slot has a frame that invites declarative assertions; an unresolved fork dropped into it inherits that settled tone and becomes a binding false premise. The architect then obeys it — it cannot cleanly pick the option the forbidden-move struck out, so it designs onto the other one and flags the wrongness in side-notes instead of escaping. Signature: a "forbidden move" carrying an "unless the agent concludes otherwise" escape hatch (the hedge is proof it is an open question), or one sourced to a header comment / orchestrate-doc claim rather than a user decision / research paper / verified file:line. Cure: borderline calls go to `01-context.md`'s `## Open questions` section verbatim from the auditor; forbidden moves are solution constraints with hard provenance only. The 2026-05-21 godrays-farfield session is the canonical instance — the orchestrator promoted the audit's raymarch-vs-AP-LUT borderline call into forbidden-move #2 ("far-field atmosphere was reassigned to the AP-LUT", a topology claim sourced to a self-narrating header comment); the architect dutifully designed godrays onto the wrong surface and flagged "AP-LUT is 32×32, may be too coarse" in side-notes; the user caught it, not the pipeline.
- `docs/orchestrate/<other-topic>/` are historical journals — what agents thought in the past, prone to hallucinations. They are NOT source of truth and MUST NOT be listed in a sub-agent's required reading as if canonical. Source of truth = the code (for "what is implemented") + research papers (for "how it should be implemented"). The current orchestration's own docs ARE load-bearing for inter-agent context within THIS session; other orchestrations' docs are not. Only exception: the user explicitly names another session as relevant — and even then the brief frames the reference as "what an agent thought in the past, verify every claim against code (file:line) or against a research paper before acting on it". The 2026-05-21 cloud-ap-canon-realign session is the canonical bad instance — an architect cited `docs/orchestrate/unified-far-field-raymarch/19` as canon, and the orchestrator dispatched a sub-agent to amend that other-session's doc based on the current session's findings. The user named the failure mode: "agents poisoning the well".

When an orchestration's scope includes adding or modifying an e2e gate that is intended to capture a user-visible artefact (a visual glitch, a runtime behaviour, anything whose ground-truth is "what the user sees"), the gate is NOT considered analytically valid until the user has visually confirmed that its captures show the artefact.
A passing/failing variance ratio + a numerical threshold are not sufficient — they only prove the metric responds to the captured pixels, not that the captured pixels are the artefact.
This rule exists because the dominant failure mode of this orchestration mode is: an agent builds an e2e gate that compiles and passes a pre-fix/post-fix smell test, but the captured framebuffers are smeary, mis-timed, off-camera, or otherwise not actually showing the symptom the user described. The fix lands, the gate is green, and the artefact is still there because the gate was measuring something else.
E2e gate authoring is split into a separate phase from the fix-implementation phase. The two are NEVER bundled into a single dispatch — not "two stages of a consolidated dispatch", not "two steps of one impl agent's brief". Separate dispatches with a hard user-verification gate between them.
Gate-authoring phase. Dispatch an implementer to:
- `target/e2e-screenshots/<gate-name>-frame-<N>.png`. Every new visual-capturing gate MUST land with on-disk screenshots; a gate that only emits a numerical metric is not finishable.

Visual verification hard gate (user-facing, mandatory). Present the captured screenshots to the user as absolute paths in chat — one line per frame, so they can open each frame and judge. Ask explicitly: "Do these captures show the artefact you described? Is the timing right (capturing the shift frame, not a frame before/after)? Is the camera path right? Are the captures sharp, not smeary?"
Threshold calibration phase. Only after the captures are user-confirmed: dispatch the metric + threshold authoring. The threshold is calibrated against the user-confirmed-artefact pre-fix run.
Fix-implementation phase. Now (and only now) dispatch the actual fix. Re-run the gate post-fix; the gate must PASS. Optionally re-present the post-fix screenshots to the user for a second visual confirmation — recommended for symptoms where the post-fix expectation isn't a sharp binary (e.g. "this should be eliminated" vs "this should be reduced").
If the captures are smeary or mis-timed, no threshold calibration can save the gate. A 1.40× ratio reduction between "wrong frames pre-fix" and "wrong frames post-fix" tells you the fix changed something — but says nothing about whether it changed the thing you cared about. The visual verification gate is what ties the e2e gate to the user-reported artefact rather than to a coincidentally correlated GPU signal.
The shape that fails: bundling gate-authoring + threshold-calibration + fix-implementation into one dispatch, then declaring victory because pre-fix FAILED and post-fix PASSED. The threshold calibration can still be tuned to make almost any pair of captures produce a FAIL→PASS transition — that doesn't make the gate analytically valid.
The shape that succeeds: split the dispatches, hand the screenshots to the user before the metric is even calibrated, and accept that gate-authoring may need its own redirect loop before the fix can land.
The rule applies whenever the e2e gate's job is to capture a user-described visual symptom and reduce it to a metric — that's where the smear/timing failure mode lives.
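The phase ordering described above can be sketched as a tiny state machine. This is a minimal illustration under assumed names (`Phase`, `next_phase`); the point it shows is that the visual-verification step cannot be passed without an explicit user verdict, and that a rejected capture sends the flow back to gate authoring rather than forward to calibration.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    GATE_AUTHORING = "gate-authoring"
    VISUAL_VERIFICATION = "visual-verification"
    THRESHOLD_CALIBRATION = "threshold-calibration"
    FIX_IMPLEMENTATION = "fix-implementation"


def next_phase(current: Phase, user_confirmed: Optional[bool] = None) -> Phase:
    """Return the phase that may be dispatched after `current`."""
    if current is Phase.GATE_AUTHORING:
        return Phase.VISUAL_VERIFICATION
    if current is Phase.VISUAL_VERIFICATION:
        if user_confirmed is None:
            # Hard gate: no verdict from the user means no further dispatch.
            raise RuntimeError("wait for the user's verdict on the screenshots")
        # Rejected captures loop back; confirmed captures unlock calibration.
        return Phase.THRESHOLD_CALIBRATION if user_confirmed else Phase.GATE_AUTHORING
    if current is Phase.THRESHOLD_CALIBRATION:
        return Phase.FIX_IMPLEMENTATION
    raise RuntimeError("fix-implementation is the last phase")
```

Each transition corresponds to a separate dispatch, so there is no path from gate authoring to the fix that skips the user's look at the frames.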
A separate trigger from the consolidated-mode circuit-breaker in Step 7. Fires on a different failure mode: a published diagnosis that doesn't survive contact with reality.
Trigger: the user reports that a fix did NOT visibly reduce the user-visible symptom (a live visual check, a runtime check, anything where the ground-truth is what the user sees). Even ONCE. Even slightly. "Pretty much the same" / "still blinking" / "no change" — all trigger.
Mandatory action: the next dispatch is a read-only diagnostic investigator. No exceptions. No "let me tighten the hash" / "let me widen the parity bit" / "let me try option B from the prior analysis". A speculative second-pass fix is never an option.
Do NOT present a Q&A at all. Not "diagnose vs try-fix-B vs revert" — none of that is a menu. Diagnose is the only path. A speculative fix is not an alternative the orchestrator can offer. Revert is sometimes the right call but it is a self-realisation ("we screwed up scope, the cleanest move is to back out and restart") — never a user-facing menu item alongside diagnose. Presenting alternatives is itself evasion: the user picks whatever sounds fastest, which is precisely the bias this rule exists to override.
What the orchestrator does instead: state in chat that the visual check failed and diagnose-first is firing. Summarise in one or two sentences what the diagnostic agent will look at. Dispatch it (it is read-only, so the soft-gate rule applies — announce and proceed, no user-confirmation pause). The user can interject if they want a different path, but the orchestrator does not solicit that — the default is dispatch-now, silence-is-go.
A speculative fix is never an option. Revert is the orchestrator's silent escape hatch when scope is wrong, not a menu choice.
The diagnostic investigator's brief:
This rule overrides "the diagnosis was line-grounded and confident". The handoff that produced the original diagnosis was, by the /handoff skill's framing, written by a session that itself could not finish the task — its diagnosis is unverified by construction. A strictly stronger fix in the same hypothesis class producing no improvement is near-conclusive evidence the hypothesis is wrong, not under-tuned. The orchestrator MUST treat "user-visible symptom did not move" as a kill signal for the current hypothesis class, not as "fix needs more tuning".
Anti-pattern this defends against: the orchestrator reads the prior fix's failure as "the implementer self-flagged Finding X as a known-residual tradeoff; let's address Finding X next", and dispatches a refinement targeting that Finding. That is the trap. A self-flagged "known residual" turning out to be load-bearing for the symptom is much less likely than "the diagnosis is wrong and Finding X is irrelevant". When in doubt, observe before iterating.
Sanity check before any post-fix dispatch (predict-the-outcome rule). Before dispatching iteration N+1 of a fix, write down — in chat, in one line — what the user-visible symptom would look like if iteration N had been the right fix. This is the falsification line. When iteration N's actual user check produces an outcome inconsistent with that prediction (e.g. you predicted "blink eliminated" and the user reports "unchanged"), the hypothesis is falsified, not under-tuned. Diagnose-first fires. The predict-the-outcome line goes into the orchestrator's chat output BEFORE the user runs the check, so the comparison is honest after the fact.
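One way to picture the predict-the-outcome rule is as a record that enforces the ordering: the falsification line must exist before the user's check is recorded, and a mismatch routes to the read-only diagnostic. The class and method names below are hypothetical, and whether the user's report "matches" the prediction is a human judgement passed in as a boolean, not something the code decides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixIteration:
    number: int
    prediction: Optional[str] = None  # the falsification line
    matched: Optional[bool] = None    # did the user check match the prediction?

    def predict(self, line: str) -> None:
        if self.matched is not None:
            raise RuntimeError("prediction must be written before the user check")
        if not line.strip():
            raise ValueError("prediction must not be empty")
        self.prediction = line.strip()

    def record_user_check(self, matches_prediction: bool) -> None:
        if self.prediction is None:
            raise RuntimeError("no falsification line on record; write it first")
        self.matched = matches_prediction

    def next_dispatch(self) -> str:
        if self.matched is None:
            raise RuntimeError("user check not recorded yet")
        # A mismatch falsifies the hypothesis; diagnose-first fires.
        return "proceed" if self.matched else "read-only diagnostic investigator"
```

For example, predicting "blink eliminated" and then recording a non-matching user report makes `next_dispatch()` return the diagnostic investigator rather than another fix.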
Every agent's group-file deliverable MUST end with a ## Side notes / observations / complaints section (or equivalent heading — "agent notes", "out-of-scope flags", "things I'd improve", "rants"). The orchestrator reads it. It is OPTIONAL content with EXPLICIT permission to surface anything the agent noticed that doesn't fit the deliverable contract.
The framing for the agent (include in every brief):
Anything you noticed while doing this task that doesn't fit the deliverable but you think the orchestrator should know — write it here. Examples:
- Code that looks suspicious or stinky (conflated concerns, IoC violations, accidentally-global state, "dead memory nobody reads", abstractions that fight the standard pipeline for the domain).
- The brief felt over-constrained or asked the wrong question — say what you'd have done differently.
- Decisions in the codebase that don't make sense to you.
- Tools that were missing; context that was missing; signals you wish you'd had.
- Even subjective reactions — "this was confusing", "I felt like I was making the same observation as the prior agent", "the brief asked X but Y seemed more relevant", "I think this orchestration is heading in the wrong direction".
- If you suspect the FOUNDATION is wrong (not the specific task you were given, but the architecture you'd be iterating inside of), say so loudly. The orchestrator decides whether to act; your job is to surface.
Stay terse — bullet points are fine. The orchestrator reads side-notes as signal, not noise; one sharp observation beats five paragraphs of hedging.
Why this is binding (not optional): the orchestrator's tunnel-vision failure mode is scoping every dispatch narrowly and ignoring everything that doesn't fit the scope. The side-notes channel is the structural cure. Agents are flagship Opus dispatches with rich context windows full of observations; suppressing those observations because the brief didn't ask for them is the most expensive waste in this skill. Equal footing — every agent (architect, impl, auditor, diagnostic, even reviewer when invoked) writes side-notes; orchestrator reads them.
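A minimal check for the structural half of this contract might look like the sketch below. It only tests for the canonical heading; the equivalent headings the skill allows ("agent notes", "out-of-scope flags", and so on) would need an allow-list, which is left out here. The function name is an assumption.

```python
import re

SIDE_NOTES_HEADING = "## Side notes / observations / complaints"


def ends_with_side_notes(deliverable: str) -> bool:
    """True when the last level-2 heading in the deliverable is the side-notes heading."""
    # "^## " matches level-2 headings only; "### ..." lines do not match.
    headings = re.findall(r"^## .+$", deliverable, flags=re.MULTILINE)
    return bool(headings) and headings[-1].strip() == SIDE_NOTES_HEADING
```

The check says nothing about whether the section has useful content, which stays the orchestrator's reading job.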
/refactor

A separate trigger from diagnose-first and the consolidated-mode handoff. Fires when the orchestration is stuck iterating inside a broken foundation rather than solving the user's symptom.
Trigger — any one of:
Mandatory action: present the trigger to the user at a hard gate and offer to switch from /delegate iteration into a /refactor session. The framing: "we've iterated N times against this symptom; the foundation looks rotten; the right next step is to refactor toward [the missing pattern] BEFORE more fix attempts, otherwise the next dispatch likely lands on top of the same rot." User confirms; if yes, the current orchestration pauses (docs stay intact) and /refactor takes over.
This is NOT a speculative second fix — it's a structural acknowledgement that the iteration target is wrong. Diagnose-first prevents speculative same-class fixes; loop-detection prevents speculative-iteration-inside-rot. They're complementary circuit-breakers, not alternatives.
Do NOT push past this circuit-breaker silently. When a trigger fires, the orchestrator stops and offers. Continuing without the offer is the tunnel-vision failure mode this rule exists to prevent — same shape as ignoring diagnose-first when a fix didn't help.
An opt-in mode that trades orchestrator-context-hygiene for sub-agent autonomy. Where diagnose-first keeps the orchestrator in the loop and dispatches one read-only diagnostic at a time, brute-force dispatches ONE sub-agent that owns the entire hypothesise-test-iterate loop end-to-end, against a deterministic probe-gate, with a private progress file the orchestrator NEVER reads. The orchestrator sees only a final-deliverable summary on success or an architectural-escape-hatch report.
This mode is legitimate alongside diagnose-first, not a replacement. They suit different conditions.
All four should hold:
Single dispatch, full autonomy. The orchestrator briefs ONE sub-agent with:
- `docs/orchestrate/<topic>/NN-brute-force-log.md`).
- `<progress-file>`. Do not echo your attempts back via your return text. Only the final summary in `<summary-file>` reaches the orchestrator."

Independent pre-round of investigation. Before touching code, the agent reads:
Agent poses an independent hypothesis set, ordered by probability. At least three hypotheses, ranked by their own (fresh-eyes) read of the code. The agent does NOT follow the orchestrator's "recommended fix shape" — that recommendation is part of the prior context, not a directive. Independent ranking is the point; if the agent just executes the orchestrator's recommendation, the brute-force protocol has no value over distributed mode.
Agent writes a predict-the-outcome line per hypothesis BEFORE testing. What the probe-gate would do (PASS/FAIL on which checks) if the hypothesis is the right fix. This is the falsification line (mirror of diagnose-first's predict-the-outcome rule). Without it, brute-force devolves into "tune until something passes".
Test each hypothesis against the probe-gate. For each:
Indeterminism check on first PASS. Run the probe-gate TWO MORE TIMES on the same code. All three runs must PASS. If any of the additional runs FAILs, the first PASS was a fluke (or there's flakiness in the gate) — record + continue iterating against the next hypothesis. Three-consecutive-PASS is the only acceptable submission criterion.
Architectural-escape-hatch (binding). If at any point the agent's analysis reveals that the right fix requires a LARGE architectural change — e.g., touches multiple modules, introduces a new system, crosses world boundaries, requires API redesign, or otherwise blasts past "bounded scope" — the agent MUST EXIT the brute-force protocol and report this to the orchestrator. The escape signal is a one-paragraph design sketch in the summary doc: "I've identified the load-bearing fix but it requires <architectural change description>; the orchestrator should switch to distributed mode (architect → reviewer → impl) for this." The brute-force protocol is for bounded iteration, not for delegating architectural decisions to a sub-agent.
Submit on three consecutive PASSes. Write a clean summary in the deliverable doc (e.g. NN-brute-force-summary.md) that the orchestrator reads. The private progress log stays separate — the orchestrator should not need to read it to understand the submission.
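The three-consecutive-PASS criterion can be sketched as follows. `run_gate` stands in for whatever invokes the probe-gate and reports PASS as `True`; it is a hypothetical callable, not an interface the skill defines.

```python
from typing import Callable


def confirmed_pass(run_gate: Callable[[], bool], required: int = 3) -> bool:
    """Return True only if the gate passes `required` times in a row on the same code."""
    for _ in range(required):
        if not run_gate():
            # Any FAIL means the earlier PASS was a fluke or the gate is flaky.
            return False
    return True
```

A `False` result here does not end the protocol; per the steps above, the agent records the outcome and moves on to the next hypothesis.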
`<progress-file>`. The orchestrator never reads that file. Do NOT echo attempt details back via your return text — that pollutes the orchestrator's context the same way."

| | Diagnose-first | Brute-force |
|---|---|---|
| Trigger | A fix didn't visibly help (mandatory) | Repeated diagnose-first failure + clean gate exists (opt-in) |
| Per-cycle dispatches | Read-only diagnostic → fix → user check → repeat | One agent owns whole loop |
| Orchestrator context per cycle | Sees diagnosis findings, presents to user | Sees only final summary or escape report |
| Verification surface | User's eye + e2e gate after fix | Probe-gate within the agent's loop |
| Cost | Lower per-dispatch; more dispatches | One dispatch; heavier sub-agent context |
| Architectural decisions | Orchestrator decides per cycle | Agent escapes back to orchestrator if needed |
| Bias-resistance | Fresh-eyes per cycle (each diagnostic agent independent) | Agent's pre-round investigation is the fresh-eyes pass |
Either mode satisfies the principle "no speculative fixes" — diagnose-first by forcing a read-only diagnostic between attempts; brute-force by requiring an independent pre-round of investigation + predict-the-outcome per hypothesis + 3×PASS submission criterion.
The mode ends when the user signals done or when README.md's phase checklist is fully [x]. Leave docs/orchestrate/<topic>/ intact — it's the durable artifact. Do not delete or condense it on exit unless the user asks.