Name: skills/planner-worker
Author: RonanCodes

/ro:planner-worker (alias /ro:swarm)

Multi-agent coding swarm. Sibling to /ro:ralph. Where Ralph is one agent doing stories in series, planner-worker is N agents doing stories in parallel under a planner and a merger, with optional judge-agent termination.

Part of the local factory

This skill is one part of the local factory — the family of agent-loop skills that run autonomously on Ronan's machine. Siblings: /ro:ralph, /ro:matt-pocock-coding-workflow, /ro:night-shift, /ro:day-shift. They share artefact shape, gitignore rules, and PR conventions. See /ro:ralph § "Run artefacts (the canonical shape)" for the canonical reference.

The companion is the remote factory — the Factory app (tracked separately) that runs equivalent loops as a cloud service. Story formats and PR conventions are kept compatible so a queue from one can be consumed by the other.

For parallel workers specifically: every worker MUST write its scratch to .ralph/sessions/<session-id>/<worker-id>.md (gitignored). The planner harvests learnings into .ralph/patterns.md, writes the per-session detail log to .ralph/sessions/<session-id>.md, and updates the per-phase rolling aggregate at .ralph/<phase>.session.md at end-of-wave. At the same end-of-wave point the planner also harvests .swarm/shared-learnings.md into .ralph/patterns.md and clears .swarm/claims.jsonl (see US-3c2). No worker touches a shared file. No merge conflicts on the progress log.

Close-with-summary (mandatory for every shipped slice): every worker MUST post a structured comment on the GH issue BEFORE merging via gh issue comment <N>. Sections: Shipped, Surprises encountered, Patterns promoted to patterns.md, Local learnings, Follow-ups. See /ro:ralph § "Close-with-summary comment (required for every story)" for the canonical body shape. The merger should reject any PR whose linked issue is missing this comment.

When to use

A PRD has been grilled via /ro:grill-me and written via /ro:write-a-prd and now has 5+ vertical-slice issues to ship
You want overnight autonomous execution (--afk)
The issues are independent enough that parallelisation actually helps (the planner emits depends_on to serialise hot-file work)
You're on the Max plan and want Agent-tool fan-out, not API-billed Sandcastle

When NOT to use

One-shot single-story task (use /ro:ralph --mode single)
Cross-repo / monorepo coordination (out of scope v1)
Hostile-code or untrusted-input runs (no container isolation; future --sandcastle flag)
You don't have a PRD yet (run /ro:grill-me then /ro:write-a-prd first)

Quick start

# Interactive: grills you for config, then runs against a named PRD
/ro:planner-worker --prd lekkertaal-pwa-bootstrap

# Autonomous night-shift (AFK)
/ro:planner-worker --prd lekkertaal-pwa-bootstrap --afk

# Real GitHub issues + PRs
/ro:planner-worker --prd lekkertaal-pwa-bootstrap --github

# Resume after a human-resolved escalation
/ro:planner-worker --resume

Filter / scope: `prd:draft` is NEVER picked up

prd:draft issues are NEVER picked up by this skill. They represent ideas captured in the agent-native repo's "inbox", not ready work. Drafts have freeform bodies, NOT Pocock's 7-section template, and have not been grilled. The planner picking one up would mean dispatching workers against an unfinished spec.

To promote a draft into ready work, the user runs /grill on the issue. The grill-with-docs flow rewrites the body into the Pocock 7-section template, then the user swaps the label from prd:draft to the repo's gate label (ready-for-agent by default, or the project synonym like Sandcastle / swarm configured in docs/agents/triage-labels.md).

When the planner (or any phase of this skill) queries GitHub for backlog issues — typically via --source github:<label> from /agentic-e2e-flow or /ro:night-shift — ALWAYS exclude prd:draft. gh label semantics are tricky: --label <gate> matches issues that have the gate label, but an issue can have BOTH prd:draft AND the gate label (mis-labelling drift). Defence in depth:

Pass --label <gate> to scope the initial query.
Post-filter the JSON: drop any issue whose labels[].name contains prd:draft.
Equivalently, use gh issue list --label <gate> --search "-label:prd:draft" ... to push the exclusion server-side.

Tip: if you're not sure what's queued vs what's still an idea, run /ro:list-draft-prds first to see the drafts inbox before kicking off the swarm.

US-0: Interactive config grilling

Unless --skip-grill (or --afk which implies it) is passed, the skill opens with a project-aware grilling phase. Each missing flag is asked via AskUserQuestion, one at a time, with a recommendation derived from project state.

Order and project-awareness rules:

GitHub issues + PRs for this run?
- Resolve repo mode first via the 4-line snippet from /ro:repo-mode. If mode == work → recommend "no" (hard rule, do NOT prompt; work-mode repos must stay invisible to the work GH/Jira/ADO project). If mode == personal → recommend "yes" when a gh remote resolves. If mode == unset → run the first-run prompt from /ro:repo-mode § "First-run prompt", persist, then re-resolve.
- Probe: gh repo view --json visibility,nameWithOwner 2>/dev/null
- For personal mode: recommend "yes" when a remote resolves and visibility is PRIVATE; "no" when no remote configured or repo is public + you don't want noise.
- CLI equivalent: --github
Worker count?
- Probe: cores via sysctl -n hw.ncpu (macOS) or nproc (linux); free disk on .swarm/worktrees/ parent via df -k .
- Recommend min(5, cores - 2) and warn if free disk < (repo_size_mb * worker_count * 2)
- Rationale: default 5 on a fresh usage window. Scale DOWN to 2-3 when sharing the window with a long interactive session (2026-06-11 lesson: 5 workers + a heavy main session exhausted the weekly cap mid-run). Scale UP via --workers N, .swarm.json workers, or --unsafe-many past 10.
- CLI equivalent: --workers N
Run mode?
- Recommend "interactive" by default
- Offer "AFK" if the user said anything like "kick off and ping me", "run overnight", "wake me when done"
- CLI equivalent: --afk for AFK, nothing for interactive
Merge target?
- Probe: git rev-parse --abbrev-ref HEAD
- Recommend the current branch; offer a staging branch (swarm-staging) as alt
- CLI equivalent: --staging-branch <name> to override; default = current branch
Worker model?
- Recommend sonnet; offer opus only if the PRD is flagged as "research-heavy" (more than 3 issues mention "investigate" or "explore")
- CLI equivalent: encoded in .swarm.json models.worker
Add a Judge agent?
- Heuristic: count concrete acceptance criteria in the PRD. If the PRD looks "closed" (>= 3 numbered acceptance criteria), recommend "no". If open-ended ("we'll know when we see it"), recommend "yes"
- CLI equivalent: --judge-agent
Trust local pre-push CI; skip waiting for GitHub Actions?
- Probe: test -f .husky/pre-push AND grep that file for any of pnpm quality, pnpm test, pnpm build, pnpm lint, pnpm typecheck
- Recommend "yes" when a pre-push hook with at least build + test + lint is present (the case in any repo scaffolded by /ro:new-tanstack-app)
- Recommend "no" when no pre-push hook, OR when the repo's .github/workflows/*.yml runs checks that are NOT in the pre-push hook (e.g. cross-OS matrix, integration tests against a managed service)
- CLI equivalent: --trust-local-ci (yes) or --wait-for-gh-ci (no)
- Persisted in .swarm.json as trust-local-ci: true|false
- Workers configured with trust-local-ci=true push, verify the push hook ran clean, and squash-merge immediately. They do NOT poll gh api .../check-runs.
- If the repo has branch-protection rules that require status checks, the planner detects this via gh api repos/<owner>/<repo>/branches/main/protection and overrides to wait-for-gh-ci=true (printing why)

Each answer is echoed back as the CLI flag the user could have passed next time, e.g. "noted, equivalent to --workers 5". CLI flags supplied at invocation skip their corresponding prompt.

--skip-grill short-circuits to defaults: workers=5, github=<true if repo-mode=personal AND gh remote present, else false>, mode=interactive, merge-target=current-branch, worker-model=sonnet, judge-agent=false, trust-local-ci=true. The github default specifically defers to /ro:repo-mode — work-mode repos always run with github=false so nothing leaks to the work GH/Jira/ADO project. If repo-mode is unset and --skip-grill is in play (e.g. --afk), default github=false (safe failure mode); a separate session will catch the unset state and prompt.

US-1: Planner

The planner is invoked once at the start of a run via the Agent tool, with the Opus model.

Planner prompt template (see scripts/planner-prompt.sh for the canonical version):

You are the PLANNER agent for /ro:planner-worker.

Read the PRD at .ralph/<name>/prd.md.
Explore the codebase enough to size the work.
Emit a backlog of vertical-slice issues to .ralph/<name>/issues/<NNN>-<slug>.md.

Each issue file MUST have frontmatter:

  id: 001
  title: <one-line>
  status: ready
  depends_on: []        # array of issue ids that must merge first
  estimate: <S|M|L>
  dod:
    - <pnpm test passes>
    - <pnpm typecheck passes>
    - <any extra acceptance check>

Body: 5-30 lines describing what to build + why, file pointers, test plan.

Every body MUST end with the verbatim `### Close-the-loop tests` block
(unit + integration + e2e + live smoke) from `/ro:slice-into-issues`.
The dispatcher refuses to send a worker against a slice missing it
(see US-2a "Close-the-loop AC gate"); pre-emit it so we never hit
that gate.

Rules:
- Vertical slices only. No "scaffold the schema" issues without a UI touchpoint.
- Mark depends_on for hot-file conflicts (two issues touching the same component).
- 3-12 issues for v1. If you'd plan more than 12, split the PRD instead.
- Exit cleanly when issues are written. Do not implement anything.

The planner is only re-invoked under US-7 (worker failure re-decomposition).

US-2: Confirm the backlog

After the planner exits, the skill prints a one-screen summary:

Backlog plan ready (.ralph/<name>/issues/):

  001 [S] schema: add points column                    deps: -
  002 [M] service: award points on completed task      deps: 001
  003 [M] dashboard: tile rendering points trend       deps: 002
  004 [S] hook: usePoints                              deps: 001
  005 [M] integration test: points e2e                 deps: 003,004

Dispatch order: [001] -> [002, 004] -> [003] -> [005]
Workers will run up to N=3 in parallel.

[y] dispatch  [n] re-plan with feedback  [q] abort

Skipped under --auto-approve or --afk. Rejecting with feedback re-invokes the planner Agent with the feedback inlined.

US-2a: Close-the-loop AC gate

kind:research issues BYPASS this gate. A research/investigation issue legitimately has no test matrix — its close-the-loop is a doc in docs/research/ + the LLM wiki, per [[canon:research-tasks]]. When a candidate issue carries kind:research, skip the test-AC check below, dispatch the research-worker prompt (see "US-3d: Research-worker mode"), and let the merger assert the research close-the-loop matrix (doc exists + index updated + wiki ingested + bidirectional issue↔doc links) instead of tests.

Before dispatching ANY non-research worker, the planner parses each candidate slice body for the marker ### Close-the-loop tests (the narrower predecessor) OR ### Close-the-loop verification matrix (the full 7-row matrix per [[close-the-loop-verification-matrix]]). Slices missing both are guaranteed leaks: the night-shift swarm only implements what the AC list spells out, so a slice without an explicit e2e AC ships with no Playwright coverage (lesson captured at [[close-the-loop-tests-acs]]; canonical example is dataforce #229 / #231).

Behaviour is controlled by the repo-local .ronan-skills.json flag swarm.missing_test_acs:

refuse (default, safe): print the offending slice ids + bodies, skip dispatching them, fail-loud if zero dispatchable slices remain. Forces the human to either re-run /ro:slice-into-issues against the parent PRD (which now emits the section automatically) or hand-edit the slice body and re-queue.
inject: auto-prepend the boilerplate ### Close-the-loop tests block (verbatim from /ro:slice-into-issues) before dispatch AND tag the issue with the acs-auto-injected label (so reviewers know to read the diff for actual test files). Faster but riskier: a worker may implement the slice without writing the tests if the issue body doesn't already reference them in the "What to build" prose.

Parse logic (cheap, no LLM needed) — accepts EITHER the narrower ### Close-the-loop tests marker OR the full ### Close-the-loop verification matrix marker:

issue_body="$(gh issue view "$NUM" --json body --jq .body)"
if ! grep -qE '^### Close-the-loop (tests|verification matrix)' <<< "$issue_body"; then
  case "$(jq -r '.swarm.missing_test_acs // "refuse"' .ronan-skills.json)" in
    inject) inject_block_and_label_acs_auto_injected ;;
    *)      mark_slice_skipped_with_reason "missing-close-the-loop-tests" ;;
  esac
fi

This gate runs in both --prd <name> mode (planner Agent-emitted slices) AND --queue <path> mode (ranked night-shift input). The night-shift loop should not be expected to backfill ACs into pre-existing issues; if a slice arrives without the section, the gate is the last line of defence.

US-2b: Ranked queue input from /ro:night-shift

When invoked with --queue <path>, the skill skips US-1 (planner runs) and US-2 (confirmation) and reads the ranked slice list straight from the file. Format is a one-slice-per-line plaintext list of <issue-number> (when --github) or <issue-id> (when local). Used by /ro:night-shift to pass its ranker output through and prevent every wave from re-running the planner Agent.

The file is consumed in order — top of file = highest priority. The skill still respects depends_on and the US-3b file-area graph; the ranked order only sets dispatch priority among work that's otherwise dispatchable.

--queue and --prd are mutually exclusive. When --queue is set the planner Agent is NOT re-invoked; that's the night-shift loop's job between waves.

US-3: Worker dispatch

For each unblocked issue (depends_on all merged or empty), up to the worker cap:

gh issue develop <gh-issue-number> --name swarm/<id> — creates the branch with the issue↔branch dev-link (canon ronan-skills/canon/labels.md § Branch flow). NEVER plain git checkout -b here; the dev-link is what makes the Closes #N propagation reliable and lets the night-shift retro walk issue↔branch↔PR without title-matching.
git worktree add .swarm/worktrees/<id> swarm/<id> — check the just-created branch out into the isolated worktree.
Append .swarm/ to .gitignore if not already present (commit only on first run, on a chore(swarm): gitignore .swarm commit)
Spawn a Claude Code Agent (default Sonnet) with the issue body as instructions and working directory pinned to the new worktree

Workers are dispatched as multiple Agent tool calls in a single assistant message so the runtime fans out. They are independent and do not coordinate.

US-3b: File-area conflict detection (hybrid static + scout)

Two parallel workers touching the same files race each other at merge time. The skill mitigates this with a hybrid graph, separate from the depends_on graph.

Canonical-label hints (read first, free): before doing any path-set analysis, read these labels from the canonical label system (the plugin's canon/labels.md):

parallel-eligible — the slicer asserted this slice is file-disjoint from its siblings. The planner treats this as "safe to fan out unless the static pass disagrees with a path overlap".
repo-lock — the slice takes a repo-wide lock (lockfile churn, schema reset, top-level config rewrite). Mark it serial-only: it runs in its own wave after every parallel-eligible slice drains.
hitl-likely — reviewer expected to escalate (ORM, schema-migration, billing, OAuth, secret rotation). The planner counts these against the HITL concurrency cap (default 1 in flight at a time) to keep reviewer attention focused.

These hints are advisory: the static pass below still runs and a path conflict still serialises. The labels exist so the planner doesn't need to re-derive the slicer's intent.

Static pass (every wave, free): read each candidate slice body for:

A ## Files touched or ## Touches section listing paths
An estimate: or paths: frontmatter array
Inline mentions matching src/..., app/..., apps/..., packages/..., drizzle/..., etc.

Two slices conflict if their declared path sets intersect by exact file OR share a directory at depth ≥ 2 (e.g. src/routes/settings.tsx and src/routes/settings.test.ts collide via src/routes/). Conflicting slices are never dispatched in the same wave, even if they're both unblocked.

Scout pass (only after a wave hits a cross-worker merge conflict): spawn a one-shot scout sub-agent per remaining candidate slice with the prompt:

You are a SCOUT sub-agent for /ro:planner-worker.

Slice: <issue body inlined>

Task: read the slice and predict the SET of file paths the implementation will touch.
Output a JSON object on stdout, nothing else:
  {"paths": ["src/routes/...", "src/lib/..."], "confidence": "high|medium|low"}

Rules:
- Do not edit anything.
- Do not run pnpm install or any side-effects.
- Read the codebase enough to be accurate. ~2-3 min budget.
- If you cannot tell, output paths: [] with confidence: low.

Scout outputs are merged into the file-area graph and persist for subsequent waves of the same run.

--no-scout disables the scout pass entirely (purely static). Use when you'd rather waste worker time on conflicts than spend planner time on probing.

US-3c2: Live claims registry + shared learnings (every wave)

The static + scout passes above predict conflicts BEFORE dispatch. This protocol catches the conflicts prediction cannot: scope drift, where a worker legitimately wanders outside its predicted territory mid-implementation. Proof case: nutmeg #167 and #168 (2026-06-11) both created admin/sources.tsx in the same wave — the predicted territories were disjoint, the actual work wasn't, and the collision only surfaced at merge time. A live claims file would have surfaced it mid-run.

Two append-only files in .swarm/, active in every wave. Critical: they live in the PRIMARY checkout's .swarm/, NOT the worker's worktree. A worker runs inside .swarm/worktrees/<id>/ (or .claude/worktrees/agent-<id>/ under the Agent tool's isolation:"worktree"), and .swarm/ is gitignored — so a relative .swarm/... write from a worktree creates a private file in that worktree, isolated from every wave-mate, silently breaking the "benefit immediately" promise (observed nutmeg 2026-06-15). Workers MUST target the shared path: source skills/planner-worker/scripts/swarm-shared.sh (exposes swarm_learn / swarm_claim / swarm_read_learnings / swarm_read_claims, all pointed at the primary .swarm/), or compute it inline once with SWARM="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)/.swarm" (robust across both worktree layouts; the older ../../ relative hop assumed .swarm/worktrees/<id> and breaks under the Agent-tool layout).

.swarm/claims.jsonl (i.e. $SWARM/claims.jsonl) — the live claims registry. One JSON object per line:

{"worker": "003", "issue": 168, "paths": ["src/routes/admin/sources.tsx", "src/lib/sources/"], "ts": "2026-06-11T21:14:02Z"}

Appends are atomic — swarm_claim '<json>', or flock-guarded when available (flock "$SWARM/.lock" ...), else a single-line printf '%s\n' "$line" >> "$SWARM/claims.jsonl" (atomic under PIPE_BUF on a local FS; macOS has no flock, so single-line append is the default path). Protocol:

A worker appends its claim BEFORE editing an area — not after.
A worker reads all existing claims on arrival AND again before any scope expansion (touching a path it didn't originally claim).
On a clash with another live claim: do NOT edit the contested path. Flag the clash in .swarm/status.md and in the final report so the planner arbitrates (serialise, re-scope, or hand the path to one worker).

.swarm/shared-learnings.md (i.e. $SWARM/shared-learnings.md) — append-only one-liner discoveries: build quirks ("run pnpm build first to regenerate routeTree.gen"), flaky tests, migration-journal gotchas. Workers append discoveries with swarm_learn "..." (or printf '%s\n' "- ..." >> "$SWARM/shared-learnings.md") as they hit them, and swarm_read_learnings on arrival and between phases — wave-mates benefit immediately instead of waiting for the next wave's prompts.

Honest framing: this is cooperative, not enforced — LLM workers follow it because the prompt says so. The merge-time rebase + full DoD re-run remains the hard gate. Claims exist to catch scope drift early, not to replace the gate. It pays off most at 4+ workers; at 2-3 the static graph already does well.

Planner duties: at wave end, harvest .swarm/shared-learnings.md into .ralph/patterns.md (alongside the existing worker-scratch harvest) and clear .swarm/claims.jsonl — stale claims from a finished wave must not block the next one.

Relation to the factory: this is the local, cooperative sibling of the remote factory's enforced DO file-lock graph (github.com/RonanCodes/factory). Keep the claim record shape (worker, issue, paths, ts) compatible so a queue can move between local swarm and cloud factory.

US-3c: Canonical lifecycle transitions

The planner is the lifecycle-label authority while a wave is in flight, per the canonical label system (the plugin's canon/labels.md):

Slice pickup query (GH mode): gh issue list --label kind:slice --label ready-for-agent --state open. The planner picks from this set, applying the conflict graph + label hints above.
On dispatch: swap ready-for-agent → in-progress on the chosen issue (single gh issue edit --add-label in-progress --remove-label ready-for-agent).
On reviewer reject: swap in-progress → ready-for-agent (back into the queue for the next wave).
On HITL escalation: swap in-progress → needs-human and post a structured comment naming the human action required.
On PR merge: the worker MUST explicitly gh issue edit <NUM> --remove-label in-progress BEFORE the squash-merge call. The Closes #N autoclose flips the issue from open to closed but does NOT strip labels — without the explicit remove, the closed issue keeps a stale in-progress label and the lifecycle-label invariant breaks. Closed state is the absence of any lifecycle label.

Branch flow: workers branch off their assigned issue with gh issue develop <issue-number> --name <slug> --checkout rather than plain git checkout -b. This produces the GitHub issue→branch dev-link, which makes Closes #N automatic in the PR and lets the night-shift retro walk issue→PR without title-matching.

US-3d: Research-worker mode (`kind:research`)

When the dispatched issue carries kind:research, the planner routes to the research-worker prompt instead of the code-worker template below, and the merger asserts the research close-the-loop matrix instead of tests. The full prompt + matrix + folder convention live in [[canon:research-tasks]]. In short: the worker does deep web research (WebSearch + WebFetch + /ro:perplexity-research), writes a cited doc to docs/research/<slug>/README.md with bidirectional issue↔doc front-matter, updates the docs/research/README.md index, ingests the doc into the designated LLM wiki vault via /ro:wiki, posts a close-with-summary comment, and opens a docs-only PR. CI gate is lint/build only — research PRs touch docs/**, never application code, so there are no unit/integration/e2e tests to wait on. Research issues are always parallel-eligible (they only touch docs/research/, never conflicting with code workers' file-areas).

Worker prompt template (code workers; for kind:research see US-3d above and [[canon:research-tasks]]):

You are a WORKER agent for /ro:planner-worker.

Issue: .ralph/<name>/issues/<id>-<slug>.md (full body inlined here)
   OR a GH issue number; in that case, FIRST run:
     gh issue develop <issue-number> --name <slug> --checkout
   to branch off the issue (NOT git checkout -b). This produces the
   issue->branch dev-link required by the canon.
Worktree: .swarm/worktrees/<id>/
You are pinned to this worktree. Do NOT touch sibling worktrees.

Shared state lives in the PRIMARY checkout's .swarm/, NOT your worktree
(.swarm/ is gitignored, so a relative .swarm/... path writes a private copy
nobody else sees). FIRST resolve it once:
  source <repo>/skills/planner-worker/scripts/swarm-shared.sh   # if available
  # else: SWARM="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)/.swarm"
Then use swarm_read_learnings / swarm_read_claims / swarm_learn / swarm_claim
(or $SWARM/shared-learnings.md and $SWARM/claims.jsonl directly).

Workflow:
1. On arrival: swarm_read_learnings (repo quirks your wave-mates already hit)
   and swarm_read_claims (which paths other live workers hold)
2. Read the issue body
3. swarm_claim '{"worker","issue","paths","ts"}' BEFORE editing an area — not
   after. Re-read claims and append a fresh claim BEFORE any scope expansion
   (touching a path you didn't originally claim)
4. NEVER edit a path another live claim holds. On a clash: do NOT edit —
   flag it in $SWARM/status.md and your final report; the planner arbitrates
5. Implement the feature/fix
6. swarm_learn "..." (build quirks, flaky tests, migration gotchas) as
   one-liners AS YOU HIT THEM, not at the end
7. Run the DoD commands (auto-detected or .swarm.json `dod`)
8. Only exit successfully when ALL DoD commands pass
9. Commit on the worktree branch with an emoji-conventional message
10. Append a one-line summary to $SWARM/logs/<id>.log (primary checkout, not your worktree)
11. Before squash-merge (or just before exit, whichever applies in your dispatch shape): `gh issue edit <NUM> --remove-label in-progress`. The `Closes #N` autoclose strips the issue's open state but does NOT strip labels; without this step the closed issue keeps its stale `in-progress` label and the canonical lifecycle invariant breaks.

Worktree guard — run BEFORE every git command (including `git status` / `git rev-parse`):

```bash
test "$(git rev-parse --abbrev-ref HEAD)" != "main" \
  || { echo "ERROR: on main branch, refusing to commit"; exit 1; }
test "$(pwd)" != "<MAIN_REPO_ABSOLUTE_PATH>" \
  || { echo "ERROR: in main checkout, refusing to commit"; exit 1; }

If either check fails, exit "stuck" immediately — do NOT recover by switching branches. Lesson #5a documents the failure mode: the Agent tool's isolation: "worktree" flag occasionally fails to take effect and the worker lands in the shared main checkout; once there, every subsequent git op mutates the main checkout's index instead of the worktree's. Branch protection on main is the load-bearing backstop (rejects the eventual push), but the in-prompt guard catches the drift earlier and avoids a "stuck on rejected push" loop.

If git push is rejected by GitHub's branch-protection hook (e.g. remote: error: GH006: Protected branch update failed), do NOT retry, do NOT use --force, --force-with-lease, or --admin. That rejection means you drifted out of your worktree onto main. Exit "stuck" with the rejection message verbatim so the orchestrator can intervene.

Rules:

One commit per logical change, but ALL must land before exit
Commit early and often: commit each coherent unit AS SOON as it exists, so progress survives a kill or crash. A worker that dies mid-task should leave commits behind, not a pile of uncommitted edits (lesson: a nutmeg wave-5 worker died with 111 uncommitted lines; attempt 2 with this protocol shipped clean)
NO silent waits over 5 minutes on builds/tests. If a command hasn't returned in 5 minutes: kill it, diagnose verbosely (re-run with verbose flags, read the logs), and report what you found — never sit quietly
Do not push (the merger handles that)
Do not edit issues/, prd.md, or other worktrees
If you fail DoD after one retry with the failure context, exit with status "stuck" and a one-line cause to .swarm/logs/<id>.log


## US-4: DoD detection + override

Auto-detect from `package.json` in this order:

1. `.swarm.json` `dod` block (highest precedence)
2. If `package.json` has `scripts.test` and `scripts.typecheck`: use them
3. Else if `tsconfig.json` present: `tsc --noEmit`
4. Else: `pnpm test` if `pnpm-lock.yaml` present; `npm test` if `package-lock.json` present

If `.swarm.json` lists `dod.extras`, append those (e.g. `pnpm lint`, `./verify-quick.sh`).

Worker exits successfully **only** when every DoD command passes (non-zero exit = stuck).

## US-5: Merger agent

When a worker reports green, dispatch a **merger Agent** (Opus) for that worktree:

You are the MERGER agent for /ro:planner-worker, issue <id>.

Worktree: .swarm/worktrees/<id>/ Merge target: <merge-target-branch>

Workflow:

cd into the worktree
Read the diff (git diff <merge-target>...HEAD)
Sanity check vs the issue spec: does the diff match the DoD? Any obvious red flags?
git rebase <merge-target>
Re-run the DoD commands on the rebased branch
If clean: cd back to repo root and git merge --no-ff swarm/<id> (or gh pr merge --squash if --github)
On success: git worktree remove .swarm/worktrees/<id> and git branch -D swarm/<id>
On any failure (rebase conflict, DoD red, or red flags in the diff): STOP, write one-line cause to .swarm/status.md, exit "escalated"

MANDATORY stale-squash guard (--github mode): after ANY git push --force-with-lease (e.g. post-rebase), do NOT merge blind. Re-read the PR head:

remote_sha=$(gh pr view <PR#> --json headRefOid --jq .headRefOid) local_sha=$(git rev-parse HEAD)

Only run gh pr merge --squash when remote_sha == local_sha (your rebased HEAD). On mismatch: wait a few seconds and re-check — GitHub's view of the branch lags the push. NEVER merge a stale head. Lesson: nutmeg night-shift ns-20260611-1624 — a merge fired immediately after a force-push, the squash captured the pre-push branch state, main landed red, and fix-forward #170 was needed. Applied manually for waves 5-6: zero recurrence.


The merger has merge authority on the merge-target branch. Default = the branch the skill was invoked from. `--staging-branch swarm-staging` routes merges to a staging branch instead; main is updated only when the user explicitly merges staging.

## US-6: Escalation

If the merger exits `escalated`:

1. Leave the worktree intact (forensics)
2. Append a one-line summary to stdout AND `.swarm/status.md` (under `Escalations:`)
3. Mark the issue `status: escalated` in its file
4. The dispatcher continues with the next unblocked issue
5. On all-issues-resolved-or-escalated, the run pauses for the user
6. User fixes the conflict, then `/ro:planner-worker --resume` picks up where we left off

## US-7: Worker failure -> planner re-plan (retry-and-split state machine)

Implements [[close-the-loop-verification-matrix]] § "Retry-and-split state machine". If a worker exits stuck:

1. Mark the issue `status: stuck`
2. **Attempt 2** — Retry ONCE with the failure context inlined in the worker prompt (single retry, fresh context, same models). Capture the failure-class (the matrix row that failed) so the retro can track patterns.
3. If the retry hits a **different** failure-class, treat that as attempt 2.b (reset, no counter bump): re-dispatch fresh with both failure contexts injected as constraints.
4. **Attempt 3** — If still stuck on the same failure-class, re-dispatch with model bump (Opus implementer + Opus reviewer; both fresh contexts).
5. **Auto-split** — If attempt 3 fails: re-invoke the **planner Agent** with:
   - The stuck issue body
   - All three workers' failure logs
   - The matrix-row(s) that failed across attempts
   - Instructions to decompose this single issue into 2-4 smaller sub-issues that route around the failure mode
6. New sub-issues are appended to `.ralph/<name>/issues/` with ids like `001a`, `001b` and `depends_on` set sensibly. When `--github`, they also get GH issue numbers via the standard slicer flow and the parent gets the `split-by-loop` label.
7. Original stuck issue is marked `status: replaced-by: [001a, 001b]`
8. Dispatcher picks up the new sub-issues next cycle
9. **Hard escalation** — If the same parent ancestry triggers auto-split twice (e.g. `001` → `001a` + `001b`, then `001a` → `001a-i` + `001a-ii`), STOP and escalate to HITL via Pushover + Telegram. Anthropic's "0% across many trials = broken task" rule says the spec is wrong, not the agent.

### Emit `failures[]` for the retro

For every auto-split, EVERY model-bumped attempt, AND every successful merge that required >= 2 attempts, append a JSON line to `.swarm/failures.jsonl`:

```json
{
  "slice": 234,
  "mode": "thrashing|proxy-gaming|misalignment|non-convergence|context-drift|runaway-resource|other",
  "evidence_ref": "<PR url or .swarm/logs/<id>.log>",
  "narrative": "<one to two sentences>"
}

/ro:night-shift-retro reads .swarm/failures.jsonl as part of US-1 (Gather run data) and folds entries into the retro JSON's failures[] block. The retro classifier may re-classify the mode based on cross-slice patterns (a single thrashing looks like non-convergence; three across slices is a SYSTEM signal).

This is the only path that re-invokes the planner mid-run.

US-8: --github flag

When --github is set:

The planner ALSO calls gh issue create --title <title> --body <body> --label swarm for each backlog issue and stores the issue number in the issue file's frontmatter (github-issue: 42)
Workers git push origin swarm/<id> and gh pr create --base <merge-target> --head swarm/<id> --title <title> --body "Closes #42"
Merger uses gh pr merge --squash --delete-branch instead of local merge — gated by the MANDATORY stale-squash guard: after any git push --force-with-lease, re-read gh pr view <PR#> --json headRefOid and only merge when it equals the local rebased HEAD SHA; on mismatch wait and re-check, never merge a stale head (nutmeg #168 incident: the squash captured pre-push state, main landed red, fix-forward #170 required)
On escalation, the PR is left open with a swarm:escalated label added
Prereq: gh auth status must be green; gh repo view must resolve; the skill fails loudly otherwise

US-9: --judge-agent

Default termination: empty backlog = exit.

With --judge-agent, after each dispatch cycle (one cycle = "dispatch all unblocked, wait for all merged or escalated"), invoke a judge Agent (Opus):

You are the JUDGE agent for /ro:planner-worker.

PRD: .ralph/<name>/prd.md (re-read it)
Backlog state: .ralph/<name>/issues/ (re-read all)
This cycle: <summary of merged + escalated>

Decide: KEEP_GOING or STOP.

KEEP_GOING means: the PRD has goals not yet covered by the merged backlog, and you can describe what's missing in 1-3 bullets. Returning KEEP_GOING re-invokes the planner with those bullets.

STOP means: the PRD's acceptance criteria are met, or all remaining work is escalated, or you're going in circles.

Hard caps (regardless of judge): --max-cycles (default 10), --max-runtime (default 4h).

KEEP_GOING re-invokes the planner with the judge's missing-work bullets; STOP exits cleanly.

US-10: --afk

--afk is the autonomous combination flag. It implies:

--auto-approve (skip US-2 confirmation)
--judge-agent (Cursor-style termination)
--skip-grill (use defaults for unspecified flags)

Always fire completion-report + Pushover at end (load-bearing default)

ANY /ro:planner-worker run against a real backlog ends with TWO tail calls, in this order:

/ro:completion-report --prs <merged-pr-list> --title "<prd-name> swarm" --no-open — writes a browsable HTML report to <repo>/.completion-reports/<ts>-<slug>.html with per-PR cards (CI status, file stats, summary), per-file diffs, per-PR revert command, per-file rollback command, and a risk panel (schema migrations, env changes, large deletions, lockfile-only PRs, no-tests-on-feat). Capture the absolute path it prints. Pass --prs rather than --prd because the swarm tracks the merged PR list directly; falls back to --prd <name> if PR list isn't available.
/ro:pushover --url file://<path-from-step-1> — sends the done / paused / blocked / crashed ping with the report path as a deep link.

If step 1 reports an empty range, skip --url on step 2 but STILL send the ping with state.

Pushover message anatomy:

night shift done: <N> stories merged, <M> stuck, <duration>

with the report URL attached so tapping the notification opens the diff browser.

Skip BOTH tail calls ONLY when:

--plan-only (planner runs but no workers dispatch)
User explicitly passed --no-ping

Per the global Pushover firing rule (see ~/CLAUDE.md § Pushover Notifications, condition 4) and the completion-report skill at the plugin's skills/completion-report/SKILL.md.

US-11: --workers N

N defaults to 5. Hard ceiling of 10 without --unsafe-many. Above 10:

WARNING: >10 worktrees will likely thermal-throttle your laptop and consume
<estimate_mb> MB of disk via duplicate node_modules. Pass --unsafe-many to
override.

The skill computes estimate_mb as repo_size_mb * worker_count.

US-12: --resume

After a human-resolved escalation, --resume:

Re-reads .swarm/status.md and .ralph/<name>/issues/*.md
Re-dispatches any issue with status: ready and unmet depends_on now met
Re-attempts any issue with status: escalated whose worktree was modified since the escalation timestamp
Idempotent: running --resume on a clean state is a no-op

US-13: Live status + logs

During the run:

.swarm/logs/<id>.log is the worker's own append-only log
.swarm/status.md is rewritten on every state change with this template:

# planner-worker live status

PRD: <name>
Started: <ISO8601>
Cycle: <N> of <max>

## Workers active
- 002 (worker, 4 min)
- 003 (merger, rebasing)

## Merged
- 001 schema: add points column          <commit-hash>

## Escalated
- (none)

## Stuck (retrying)
- (none)

## Last 5 events
- 12:14:02 worker-001 GREEN
- 12:14:05 merger-001 dispatched
- 12:14:18 merger-001 MERGED  abc1234
- 12:14:19 worker-002 dispatched
- 12:14:19 worker-003 dispatched

Tail with watch -n 2 cat .swarm/status.md or code -r .swarm/status.md.

US-14: Postmortem on exit

On final exit, the skill writes .swarm/run-<ISO8601>.md:

# Swarm run postmortem

PRD: <name>
Duration: 47m 12s (started 2026-05-14T07:30Z, ended 2026-05-14T08:17Z)
Workers cap: 5

## Planned issues: 5
- 001 schema: add points column
- 002 service: award points on completed task
- 003 dashboard: tile
- 004 hook: usePoints
- 005 integration test

## Merged: 4
- 001  abc1234  (4 min worker, 2 min merger)
- 002  def5678  (8 min worker, 1 min merger)
- 003  ghi9abc  (12 min worker, 3 min merger)
- 004  jkl0def  (6 min worker, 1 min merger)

## Stuck: 1
- 005 integration test: failed e2e setup after retry; planner re-decomposed into 005a, 005b (queued for next run)

## Escalated: 0

## PRs (github mode)
- (n/a; local merge mode)

Per-repo config (.swarm.json)

Optional. If present in the repo root:

{
  "dod": {
    "test": "pnpm test",
    "typecheck": "pnpm typecheck",
    "extras": ["pnpm lint", "./verify-quick.sh"]
  },
  "models": {
    "planner": "opus",
    "worker": "sonnet",
    "merger": "opus",
    "judge": "opus"
  },
  "workers": 5,
  "github": false,
  "merge-target": "main"
}

Sensible defaults when the file is absent. CLI flags always win over the file.

File layout (in the user's repo)

.ralph/<name>/
  prd.md
  issues/
    001-schema-points.md
    002-service-award.md
    ...

.swarm/                         # gitignored
  swarm.json                    # optional config
  worktrees/                    # active git worktrees
    001-schema-points/
    ...
  logs/
    001-schema-points.log
    ...
  claims.jsonl                  # live claims registry (US-3c2), cleared at wave end
  shared-learnings.md           # append-only one-liner discoveries (US-3c2)
  status.md                     # live state
  run-<ISO8601>.md              # postmortem (one per run)

Anti-patterns

DO NOT silently flip to batched / no-worktree mode. Worktree isolation is the whole point
DO NOT run without a PRD. If .ralph/<name>/prd.md is missing, fail loudly and point to /ro:write-a-prd
DO NOT mutate main directly from the worker; only the merger has merge authority
DO NOT push worker branches in non---github mode (no noise on origin)
DO NOT exceed 10 workers without --unsafe-many. Real laptops thermal-throttle hard

Lessons from live runs

Captured from the lekkertaal swarm run on 2026-05-14 (8 PRs landed across two waves). These are baked into the prompt template each worker receives.

1. Pre-assign migration numbers when N workers all touch the DB

When multiple workers each run pnpm drizzle-kit generate, they all grab the same next number (e.g. 0004) and stomp drizzle/meta/_journal.json. The conflict is mechanical to resolve but burns planner cycles.

Fix: the planner assigns each DB-touching worker a unique migration slot in its dispatch prompt: "your migration is 0007_<slug>.sql; do not run drizzle-kit generate until you've manually claimed that number in _journal.json." Workers without a DB change don't get a slot.

2. Worker poll loops must be foreground bash, NOT in-context monitor

Workers that delegate CI watching to an in-context monitor tool can park silently if the monitor doesn't deliver events back into the worker's conversation. We saw one worker exit with "I'll wait for the monitor events to come through" — the PR was open, conflicts present, and the worker never came back.

Fix: the dispatch prompt MUST tell the worker to poll CI via a foreground bash loop using gh api repos/<owner>/<repo>/commits/<sha>/check-runs, sleep 30s, re-poll. Hard cap 15 minutes. NEVER use an in-context monitor for this. If checks stay pending past the cap, STOP and report; do not retry blindly.

3. CI may not trigger on the first push to a new branch

We observed a worker push a branch + open a PR, and GitHub Actions did not fire on it. Force-pushing the same branch after a no-op rebase triggered CI normally. Root cause not fully isolated — could be path-filter quirk, draft-PR gating, or a brief Actions hiccup.

Fix: after git push, the worker must verify CI fired within 60s via gh api .../actions/runs?head_sha=<sha>. If no run exists, force-push a no-op (git commit --amend --no-edit && git push -f) to nudge it. If that still doesn't work, escalate to the planner.

4. Workers should rebase, not merge, on parent-branch drift

When PR-1 lands while PR-2 is still in flight, PR-2 will hit conflicts. Rebasing PR-2 onto the new main and force-pushing is cleaner than merging main into PR-2 (which leaves merge bubbles in the PR). The lekkertaal run #66 (prompt caching) rebased cleanly onto #65 (telemetry) mid-flow and ended up composing both behaviours — a real win.

Fix: worker prompt should say "if your push is rejected because main moved, rebase onto origin/main, resolve conflicts, force-push-with-lease. Do not merge main into your branch."

5a. Parallel workers MUST use isolated worktrees; verify on arrival

Even with isolation: "worktree" in the Agent tool, workers occasionally land on the SHARED main checkout instead of a fresh worktree. When that happens, parallel workers stomp each other's uncommitted edits as they git checkout between branches. Observed 4+ times in the lekkertaal run (PRs #79, #82, #100, #103).

Symptoms: a worker reports "the main checkout had uncommitted edits from another worker" or "my edits were wiped when another worker switched branches".

Fix in worker prompt: every dispatch prompt should include a self-check at the top:

ON ARRIVAL:
1. Run `git worktree list` and `pwd`. Confirm your pwd is NOT
   <main-repo-path>. If it IS, create your own worktree at
   `<main-repo-path>-wt-<issue-number>` and `cd` into it.
2. `git status` should show clean (no other worker's edits).
3. Only then start work.

This guarantees isolation even when the harness's isolation: "worktree" doesn't take effect for some reason. The cost is one extra mkdir + checkout per worker; the benefit is no lost work.

Fix in planner: when 3+ workers are dispatched in parallel and ANY two report worktree collisions, the planner should add a pre-dispatch step that creates the worktrees explicitly and passes the path to each worker.

5. Trust local pre-push CI; do not wait for GitHub Actions

If the repo has a husky pre-push hook running pnpm quality-checks (format + lint + build + test + audit), pushing a branch already proves the same checks GitHub Actions will run. Waiting for GH CI to repeat the gauntlet adds 1-2 minutes per worker for zero additional safety.

Default (--trust-local-ci, ON): workers run local checks, push, immediately squash-merge. If pre-push hook is missing or SKIP_QUALITY_CHECKS=1 was used, fail loudly and do not merge.

Opt-out (--wait-for-gh-ci): workers still poll gh api .../check-runs and only merge on green. Use when:

The repo has no pre-push hook
The repo lacks parity between local and CI (e.g. CI does additional integration tests against managed services)
The repo has branch-protection rules that require status checks before merge (in which case the planner cannot merge without CI green anyway — the flag is forced ON)

Setup-time prompt (US-0): the planner asks the user once per repo:

"This repo has a pre-push hook running [<list of checks>]. GitHub Actions runs effectively the same checks. Skip GH CI wait and merge immediately after a clean push? [Y/n]"

The answer is persisted in .swarm.json as trust-local-ci: true|false.

Safety net: GH CI still runs on the merged commit on main. If a fluke bug lands, the post-merge CI run will flag it on main, and the planner can revert. The frequency of this happening (zero in the lekkertaal 8-PR run) does not justify the per-PR wait cost.

6. Pre-dispatch conflict prediction cannot catch scope drift — use live claims

From the nutmeg double-shift, 2026-06-11: slices #167 and #168 had disjoint predicted territories, passed the static file-area pass, and were dispatched in the same wave. Both workers then independently created admin/sources.tsx — the work drifted into the same file mid-implementation, and the collision only surfaced at merge time.

Fix: the live claims registry (US-3c2). Workers append {worker, issue, paths, ts} to .swarm/claims.jsonl BEFORE editing an area and re-check before any scope expansion; a clash with a live claim means stop-and-flag, not edit. Prediction handles the conflicts you can see from the slice bodies; claims handle the ones that only appear once the work is underway. Cooperative, not enforced — the merge-time rebase + DoD re-run is still the hard gate.

7. Never squash-merge a freshly force-pushed PR without verifying the head; workers must commit early and never wait silently

Two failure modes from the nutmeg night-shift ns-20260611-1624:

Stale squash: a merger ran git push --force-with-lease (post-rebase) and immediately gh pr merge --squash --admin. The squash captured the STALE pre-push branch state — GitHub's view of the head hadn't caught up — so main landed red and fix-forward #170 was required.

Fix: the mandatory stale-squash guard in the merger template (US-5): after any force-push, re-read gh pr view <PR#> --json headRefOid and only merge when it equals the local rebased HEAD SHA; on mismatch wait and re-check. Applied manually for waves 5-6 of the same run — zero recurrence.

Worker stall: a wave-5 worker died mid-task with 111 lines of uncommitted work — all lost, and the stall itself was invisible because the worker waited silently on a hung command.

Fix: stall-hardening in the worker template (US-3 Rules): commit each coherent unit as soon as it exists so progress survives a kill, and never wait silently more than 5 minutes on a build/test — kill, diagnose verbosely, report. Attempt 2 with this protocol shipped clean.

/ro:planner-worker (alias /ro:swarm)

Part of the local factory

When to use

A PRD has been grilled via /ro:grill-me and written via /ro:write-a-prd and now has 5+ vertical-slice issues to ship
You want overnight autonomous execution (--afk)
The issues are independent enough that parallelisation actually helps (the planner emits depends_on to serialise hot-file work)
You're on the Max plan and want Agent-tool fan-out, not API-billed Sandcastle

When NOT to use

One-shot single-story task (use /ro:ralph --mode single)
Cross-repo / monorepo coordination (out of scope v1)
Hostile-code or untrusted-input runs (no container isolation; future --sandcastle flag)
You don't have a PRD yet (run /ro:grill-me then /ro:write-a-prd first)

Quick start

# Interactive: grills you for config, then runs against a named PRD
/ro:planner-worker --prd lekkertaal-pwa-bootstrap

# Autonomous night-shift (AFK)
/ro:planner-worker --prd lekkertaal-pwa-bootstrap --afk

# Real GitHub issues + PRs
/ro:planner-worker --prd lekkertaal-pwa-bootstrap --github

# Resume after a human-resolved escalation
/ro:planner-worker --resume

Filter / scope: `prd:draft` is NEVER picked up

Pass --label <gate> to scope the initial query.
Post-filter the JSON: drop any issue whose labels[].name contains prd:draft.
Equivalently, use gh issue list --label <gate> --search "-label:prd:draft" ... to push the exclusion server-side.

Tip: if you're not sure what's queued vs what's still an idea, run /ro:list-draft-prds first to see the drafts inbox before kicking off the swarm.

US-0: Interactive config grilling

Order and project-awareness rules:

GitHub issues + PRs for this run?
- Resolve repo mode first via the 4-line snippet from /ro:repo-mode. If mode == work → recommend "no" (hard rule, do NOT prompt; work-mode repos must stay invisible to the work GH/Jira/ADO project). If mode == personal → recommend "yes" when a gh remote resolves. If mode == unset → run the first-run prompt from /ro:repo-mode § "First-run prompt", persist, then re-resolve.
- Probe: gh repo view --json visibility,nameWithOwner 2>/dev/null
- For personal mode: recommend "yes" when a remote resolves and visibility is PRIVATE; "no" when no remote configured or repo is public + you don't want noise.
- CLI equivalent: --github
Worker count?
- Probe: cores via sysctl -n hw.ncpu (macOS) or nproc (linux); free disk on .swarm/worktrees/ parent via df -k .
- Recommend min(5, cores - 2) and warn if free disk < (repo_size_mb * worker_count * 2)
- Rationale: default 5 on a fresh usage window. Scale DOWN to 2-3 when sharing the window with a long interactive session (2026-06-11 lesson: 5 workers + a heavy main session exhausted the weekly cap mid-run). Scale UP via --workers N, .swarm.json workers, or --unsafe-many past 10.
- CLI equivalent: --workers N
Run mode?
- Recommend "interactive" by default
- Offer "AFK" if the user said anything like "kick off and ping me", "run overnight", "wake me when done"
- CLI equivalent: --afk for AFK, nothing for interactive
Merge target?
- Probe: git rev-parse --abbrev-ref HEAD
- Recommend the current branch; offer a staging branch (swarm-staging) as alt
- CLI equivalent: --staging-branch <name> to override; default = current branch
Worker model?
- Recommend sonnet; offer opus only if the PRD is flagged as "research-heavy" (more than 3 issues mention "investigate" or "explore")
- CLI equivalent: encoded in .swarm.json models.worker
Add a Judge agent?
- Heuristic: count concrete acceptance criteria in the PRD. If the PRD looks "closed" (>= 3 numbered acceptance criteria), recommend "no". If open-ended ("we'll know when we see it"), recommend "yes"
- CLI equivalent: --judge-agent
Trust local pre-push CI; skip waiting for GitHub Actions?
- Probe: test -f .husky/pre-push AND grep that file for any of pnpm quality, pnpm test, pnpm build, pnpm lint, pnpm typecheck
- Recommend "yes" when a pre-push hook with at least build + test + lint is present (the case in any repo scaffolded by /ro:new-tanstack-app)
- Recommend "no" when no pre-push hook, OR when the repo's .github/workflows/*.yml runs checks that are NOT in the pre-push hook (e.g. cross-OS matrix, integration tests against a managed service)
- CLI equivalent: --trust-local-ci (yes) or --wait-for-gh-ci (no)
- Persisted in .swarm.json as trust-local-ci: true|false
- Workers configured with trust-local-ci=true push, verify the push hook ran clean, and squash-merge immediately. They do NOT poll gh api .../check-runs.
- If the repo has branch-protection rules that require status checks, the planner detects this via gh api repos/<owner>/<repo>/branches/main/protection and overrides to wait-for-gh-ci=true (printing why)

Each answer is echoed back as the CLI flag the user could have passed next time, e.g. "noted, equivalent to --workers 5". CLI flags supplied at invocation skip their corresponding prompt.

US-1: Planner

The planner is invoked once at the start of a run via the Agent tool, with the Opus model.

Planner prompt template (see scripts/planner-prompt.sh for the canonical version):

You are the PLANNER agent for /ro:planner-worker.

Read the PRD at .ralph/<name>/prd.md.
Explore the codebase enough to size the work.
Emit a backlog of vertical-slice issues to .ralph/<name>/issues/<NNN>-<slug>.md.

Each issue file MUST have frontmatter:

  id: 001
  title: <one-line>
  status: ready
  depends_on: []        # array of issue ids that must merge first
  estimate: <S|M|L>
  dod:
    - <pnpm test passes>
    - <pnpm typecheck passes>
    - <any extra acceptance check>

Body: 5-30 lines describing what to build + why, file pointers, test plan.

Every body MUST end with the verbatim `### Close-the-loop tests` block
(unit + integration + e2e + live smoke) from `/ro:slice-into-issues`.
The dispatcher refuses to send a worker against a slice missing it
(see US-2a "Close-the-loop AC gate"); pre-emit it so we never hit
that gate.

Rules:
- Vertical slices only. No "scaffold the schema" issues without a UI touchpoint.
- Mark depends_on for hot-file conflicts (two issues touching the same component).
- 3-12 issues for v1. If you'd plan more than 12, split the PRD instead.
- Exit cleanly when issues are written. Do not implement anything.

The planner is only re-invoked under US-7 (worker failure re-decomposition).

US-2: Confirm the backlog

After the planner exits, the skill prints a one-screen summary:

Backlog plan ready (.ralph/<name>/issues/):

  001 [S] schema: add points column                    deps: -
  002 [M] service: award points on completed task      deps: 001
  003 [M] dashboard: tile rendering points trend       deps: 002
  004 [S] hook: usePoints                              deps: 001
  005 [M] integration test: points e2e                 deps: 003,004

Dispatch order: [001] -> [002, 004] -> [003] -> [005]
Workers will run up to N=3 in parallel.

[y] dispatch  [n] re-plan with feedback  [q] abort

Skipped under --auto-approve or --afk. Rejecting with feedback re-invokes the planner Agent with the feedback inlined.

US-2a: Close-the-loop AC gate

Behaviour is controlled by the repo-local .ronan-skills.json flag swarm.missing_test_acs:

refuse (default, safe): print the offending slice ids + bodies, skip dispatching them, fail-loud if zero dispatchable slices remain. Forces the human to either re-run /ro:slice-into-issues against the parent PRD (which now emits the section automatically) or hand-edit the slice body and re-queue.
inject: auto-prepend the boilerplate ### Close-the-loop tests block (verbatim from /ro:slice-into-issues) before dispatch AND tag the issue with the acs-auto-injected label (so reviewers know to read the diff for actual test files). Faster but riskier: a worker may implement the slice without writing the tests if the issue body doesn't already reference them in the "What to build" prose.

Parse logic (cheap, no LLM needed) — accepts EITHER the narrower ### Close-the-loop tests marker OR the full ### Close-the-loop verification matrix marker:

issue_body="$(gh issue view "$NUM" --json body --jq .body)"
if ! grep -qE '^### Close-the-loop (tests|verification matrix)' <<< "$issue_body"; then
  case "$(jq -r '.swarm.missing_test_acs // "refuse"' .ronan-skills.json)" in
    inject) inject_block_and_label_acs_auto_injected ;;
    *)      mark_slice_skipped_with_reason "missing-close-the-loop-tests" ;;
  esac
fi

US-2b: Ranked queue input from /ro:night-shift

--queue and --prd are mutually exclusive. When --queue is set the planner Agent is NOT re-invoked; that's the night-shift loop's job between waves.

US-3: Worker dispatch

For each unblocked issue (depends_on all merged or empty), up to the worker cap:

gh issue develop <gh-issue-number> --name swarm/<id> — creates the branch with the issue↔branch dev-link (canon ronan-skills/canon/labels.md § Branch flow). NEVER plain git checkout -b here; the dev-link is what makes the Closes #N propagation reliable and lets the night-shift retro walk issue↔branch↔PR without title-matching.
git worktree add .swarm/worktrees/<id> swarm/<id> — check the just-created branch out into the isolated worktree.
Append .swarm/ to .gitignore if not already present (commit only on first run, on a chore(swarm): gitignore .swarm commit)
Spawn a Claude Code Agent (default Sonnet) with the issue body as instructions and working directory pinned to the new worktree

Workers are dispatched as multiple Agent tool calls in a single assistant message so the runtime fans out. They are independent and do not coordinate.

US-3b: File-area conflict detection (hybrid static + scout)

Two parallel workers touching the same files race each other at merge time. The skill mitigates this with a hybrid graph, separate from the depends_on graph.

Canonical-label hints (read first, free): before doing any path-set analysis, read these labels from the canonical label system (the plugin's canon/labels.md):

parallel-eligible — the slicer asserted this slice is file-disjoint from its siblings. The planner treats this as "safe to fan out unless the static pass disagrees with a path overlap".
repo-lock — the slice takes a repo-wide lock (lockfile churn, schema reset, top-level config rewrite). Mark it serial-only: it runs in its own wave after every parallel-eligible slice drains.
hitl-likely — reviewer expected to escalate (ORM, schema-migration, billing, OAuth, secret rotation). The planner counts these against the HITL concurrency cap (default 1 in flight at a time) to keep reviewer attention focused.

These hints are advisory: the static pass below still runs and a path conflict still serialises. The labels exist so the planner doesn't need to re-derive the slicer's intent.

Static pass (every wave, free): read each candidate slice body for:

A ## Files touched or ## Touches section listing paths
An estimate: or paths: frontmatter array
Inline mentions matching src/..., app/..., apps/..., packages/..., drizzle/..., etc.

Scout pass (only after a wave hits a cross-worker merge conflict): spawn a one-shot scout sub-agent per remaining candidate slice with the prompt:

You are a SCOUT sub-agent for /ro:planner-worker.

Slice: <issue body inlined>

Task: read the slice and predict the SET of file paths the implementation will touch.
Output a JSON object on stdout, nothing else:
  {"paths": ["src/routes/...", "src/lib/..."], "confidence": "high|medium|low"}

Rules:
- Do not edit anything.
- Do not run pnpm install or any side-effects.
- Read the codebase enough to be accurate. ~2-3 min budget.
- If you cannot tell, output paths: [] with confidence: low.

Scout outputs are merged into the file-area graph and persist for subsequent waves of the same run.

--no-scout disables the scout pass entirely (purely static). Use when you'd rather waste worker time on conflicts than spend planner time on probing.

US-3c2: Live claims registry + shared learnings (every wave)

.swarm/claims.jsonl (i.e. $SWARM/claims.jsonl) — the live claims registry. One JSON object per line:

{"worker": "003", "issue": 168, "paths": ["src/routes/admin/sources.tsx", "src/lib/sources/"], "ts": "2026-06-11T21:14:02Z"}

A worker appends its claim BEFORE editing an area — not after.
A worker reads all existing claims on arrival AND again before any scope expansion (touching a path it didn't originally claim).
On a clash with another live claim: do NOT edit the contested path. Flag the clash in .swarm/status.md and in the final report so the planner arbitrates (serialise, re-scope, or hand the path to one worker).

US-3c: Canonical lifecycle transitions

The planner is the lifecycle-label authority while a wave is in flight, per the canonical label system (the plugin's canon/labels.md):

Slice pickup query (GH mode): gh issue list --label kind:slice --label ready-for-agent --state open. The planner picks from this set, applying the conflict graph + label hints above.
On dispatch: swap ready-for-agent → in-progress on the chosen issue (single gh issue edit --add-label in-progress --remove-label ready-for-agent).
On reviewer reject: swap in-progress → ready-for-agent (back into the queue for the next wave).
On HITL escalation: swap in-progress → needs-human and post a structured comment naming the human action required.
On PR merge: the worker MUST explicitly gh issue edit <NUM> --remove-label in-progress BEFORE the squash-merge call. The Closes #N autoclose flips the issue from open to closed but does NOT strip labels — without the explicit remove, the closed issue keeps a stale in-progress label and the lifecycle-label invariant breaks. Closed state is the absence of any lifecycle label.

US-3d: Research-worker mode (`kind:research`)

Worker prompt template (code workers; for kind:research see US-3d above and [[canon:research-tasks]]):

You are a WORKER agent for /ro:planner-worker.

Issue: .ralph/<name>/issues/<id>-<slug>.md (full body inlined here)
   OR a GH issue number; in that case, FIRST run:
     gh issue develop <issue-number> --name <slug> --checkout
   to branch off the issue (NOT git checkout -b). This produces the
   issue->branch dev-link required by the canon.
Worktree: .swarm/worktrees/<id>/
You are pinned to this worktree. Do NOT touch sibling worktrees.

Shared state lives in the PRIMARY checkout's .swarm/, NOT your worktree
(.swarm/ is gitignored, so a relative .swarm/... path writes a private copy
nobody else sees). FIRST resolve it once:
  source <repo>/skills/planner-worker/scripts/swarm-shared.sh   # if available
  # else: SWARM="$(cd "$(git rev-parse --git-common-dir)/.." && pwd)/.swarm"
Then use swarm_read_learnings / swarm_read_claims / swarm_learn / swarm_claim
(or $SWARM/shared-learnings.md and $SWARM/claims.jsonl directly).

Workflow:
1. On arrival: swarm_read_learnings (repo quirks your wave-mates already hit)
   and swarm_read_claims (which paths other live workers hold)
2. Read the issue body
3. swarm_claim '{"worker","issue","paths","ts"}' BEFORE editing an area — not
   after. Re-read claims and append a fresh claim BEFORE any scope expansion
   (touching a path you didn't originally claim)
4. NEVER edit a path another live claim holds. On a clash: do NOT edit —
   flag it in $SWARM/status.md and your final report; the planner arbitrates
5. Implement the feature/fix
6. swarm_learn "..." (build quirks, flaky tests, migration gotchas) as
   one-liners AS YOU HIT THEM, not at the end
7. Run the DoD commands (auto-detected or .swarm.json `dod`)
8. Only exit successfully when ALL DoD commands pass
9. Commit on the worktree branch with an emoji-conventional message
10. Append a one-line summary to $SWARM/logs/<id>.log (primary checkout, not your worktree)
11. Before squash-merge (or just before exit, whichever applies in your dispatch shape): `gh issue edit <NUM> --remove-label in-progress`. The `Closes #N` autoclose strips the issue's open state but does NOT strip labels; without this step the closed issue keeps its stale `in-progress` label and the canonical lifecycle invariant breaks.

Worktree guard — run BEFORE every git command (including `git status` / `git rev-parse`):

```bash
test "$(git rev-parse --abbrev-ref HEAD)" != "main" \
  || { echo "ERROR: on main branch, refusing to commit"; exit 1; }
test "$(pwd)" != "<MAIN_REPO_ABSOLUTE_PATH>" \
  || { echo "ERROR: in main checkout, refusing to commit"; exit 1; }

Rules:

One commit per logical change, but ALL must land before exit
Commit early and often: commit each coherent unit AS SOON as it exists, so progress survives a kill or crash. A worker that dies mid-task should leave commits behind, not a pile of uncommitted edits (lesson: a nutmeg wave-5 worker died with 111 uncommitted lines; attempt 2 with this protocol shipped clean)
NO silent waits over 5 minutes on builds/tests. If a command hasn't returned in 5 minutes: kill it, diagnose verbosely (re-run with verbose flags, read the logs), and report what you found — never sit quietly
Do not push (the merger handles that)
Do not edit issues/, prd.md, or other worktrees
If you fail DoD after one retry with the failure context, exit with status "stuck" and a one-line cause to .swarm/logs/<id>.log


## US-4: DoD detection + override

Auto-detect from `package.json` in this order:

1. `.swarm.json` `dod` block (highest precedence)
2. If `package.json` has `scripts.test` and `scripts.typecheck`: use them
3. Else if `tsconfig.json` present: `tsc --noEmit`
4. Else: `pnpm test` if `pnpm-lock.yaml` present; `npm test` if `package-lock.json` present

If `.swarm.json` lists `dod.extras`, append those (e.g. `pnpm lint`, `./verify-quick.sh`).

Worker exits successfully **only** when every DoD command passes (non-zero exit = stuck).

## US-5: Merger agent

When a worker reports green, dispatch a **merger Agent** (Opus) for that worktree:

You are the MERGER agent for /ro:planner-worker, issue <id>.

Worktree: .swarm/worktrees/<id>/ Merge target: <merge-target-branch>

Workflow:

cd into the worktree
Read the diff (git diff <merge-target>...HEAD)
Sanity check vs the issue spec: does the diff match the DoD? Any obvious red flags?
git rebase <merge-target>
Re-run the DoD commands on the rebased branch
If clean: cd back to repo root and git merge --no-ff swarm/<id> (or gh pr merge --squash if --github)
On success: git worktree remove .swarm/worktrees/<id> and git branch -D swarm/<id>
On any failure (rebase conflict, DoD red, or red flags in the diff): STOP, write one-line cause to .swarm/status.md, exit "escalated"

MANDATORY stale-squash guard (--github mode): after ANY git push --force-with-lease (e.g. post-rebase), do NOT merge blind. Re-read the PR head:

remote_sha=$(gh pr view <PR#> --json headRefOid --jq .headRefOid) local_sha=$(git rev-parse HEAD)


The merger has merge authority on the merge-target branch. Default = the branch the skill was invoked from. `--staging-branch swarm-staging` routes merges to a staging branch instead; main is updated only when the user explicitly merges staging.

## US-6: Escalation

If the merger exits `escalated`:

1. Leave the worktree intact (forensics)
2. Append a one-line summary to stdout AND `.swarm/status.md` (under `Escalations:`)
3. Mark the issue `status: escalated` in its file
4. The dispatcher continues with the next unblocked issue
5. On all-issues-resolved-or-escalated, the run pauses for the user
6. User fixes the conflict, then `/ro:planner-worker --resume` picks up where we left off

## US-7: Worker failure -> planner re-plan (retry-and-split state machine)

Implements [[close-the-loop-verification-matrix]] § "Retry-and-split state machine". If a worker exits stuck:

1. Mark the issue `status: stuck`
2. **Attempt 2** — Retry ONCE with the failure context inlined in the worker prompt (single retry, fresh context, same models). Capture the failure-class (the matrix row that failed) so the retro can track patterns.
3. If the retry hits a **different** failure-class, treat that as attempt 2.b (reset, no counter bump): re-dispatch fresh with both failure contexts injected as constraints.
4. **Attempt 3** — If still stuck on the same failure-class, re-dispatch with model bump (Opus implementer + Opus reviewer; both fresh contexts).
5. **Auto-split** — If attempt 3 fails: re-invoke the **planner Agent** with:
   - The stuck issue body
   - All three workers' failure logs
   - The matrix-row(s) that failed across attempts
   - Instructions to decompose this single issue into 2-4 smaller sub-issues that route around the failure mode
6. New sub-issues are appended to `.ralph/<name>/issues/` with ids like `001a`, `001b` and `depends_on` set sensibly. When `--github`, they also get GH issue numbers via the standard slicer flow and the parent gets the `split-by-loop` label.
7. Original stuck issue is marked `status: replaced-by: [001a, 001b]`
8. Dispatcher picks up the new sub-issues next cycle
9. **Hard escalation** — If the same parent ancestry triggers auto-split twice (e.g. `001` → `001a` + `001b`, then `001a` → `001a-i` + `001a-ii`), STOP and escalate to HITL via Pushover + Telegram. Anthropic's "0% across many trials = broken task" rule says the spec is wrong, not the agent.

### Emit `failures[]` for the retro

For every auto-split, EVERY model-bumped attempt, AND every successful merge that required >= 2 attempts, append a JSON line to `.swarm/failures.jsonl`:

```json
{
  "slice": 234,
  "mode": "thrashing|proxy-gaming|misalignment|non-convergence|context-drift|runaway-resource|other",
  "evidence_ref": "<PR url or .swarm/logs/<id>.log>",
  "narrative": "<one to two sentences>"
}

This is the only path that re-invokes the planner mid-run.

US-8: --github flag

When --github is set:

The planner ALSO calls gh issue create --title <title> --body <body> --label swarm for each backlog issue and stores the issue number in the issue file's frontmatter (github-issue: 42)
Workers git push origin swarm/<id> and gh pr create --base <merge-target> --head swarm/<id> --title <title> --body "Closes #42"
Merger uses gh pr merge --squash --delete-branch instead of local merge — gated by the MANDATORY stale-squash guard: after any git push --force-with-lease, re-read gh pr view <PR#> --json headRefOid and only merge when it equals the local rebased HEAD SHA; on mismatch wait and re-check, never merge a stale head (nutmeg #168 incident: the squash captured pre-push state, main landed red, fix-forward #170 required)
On escalation, the PR is left open with a swarm:escalated label added
Prereq: gh auth status must be green; gh repo view must resolve; the skill fails loudly otherwise

US-9: --judge-agent

Default termination: empty backlog = exit.

With --judge-agent, after each dispatch cycle (one cycle = "dispatch all unblocked, wait for all merged or escalated"), invoke a judge Agent (Opus):

You are the JUDGE agent for /ro:planner-worker.

PRD: .ralph/<name>/prd.md (re-read it)
Backlog state: .ralph/<name>/issues/ (re-read all)
This cycle: <summary of merged + escalated>

Decide: KEEP_GOING or STOP.

KEEP_GOING means: the PRD has goals not yet covered by the merged backlog, and you can describe what's missing in 1-3 bullets. Returning KEEP_GOING re-invokes the planner with those bullets.

STOP means: the PRD's acceptance criteria are met, or all remaining work is escalated, or you're going in circles.

Hard caps (regardless of judge): --max-cycles (default 10), --max-runtime (default 4h).

KEEP_GOING re-invokes the planner with the judge's missing-work bullets; STOP exits cleanly.

US-10: --afk

--afk is the autonomous combination flag. It implies:

--auto-approve (skip US-2 confirmation)
--judge-agent (Cursor-style termination)
--skip-grill (use defaults for unspecified flags)

Always fire completion-report + Pushover at end (load-bearing default)

ANY /ro:planner-worker run against a real backlog ends with TWO tail calls, in this order:

/ro:completion-report --prs <merged-pr-list> --title "<prd-name> swarm" --no-open — writes a browsable HTML report to <repo>/.completion-reports/<ts>-<slug>.html with per-PR cards (CI status, file stats, summary), per-file diffs, per-PR revert command, per-file rollback command, and a risk panel (schema migrations, env changes, large deletions, lockfile-only PRs, no-tests-on-feat). Capture the absolute path it prints. Pass --prs rather than --prd because the swarm tracks the merged PR list directly; falls back to --prd <name> if PR list isn't available.
/ro:pushover --url file://<path-from-step-1> — sends the done / paused / blocked / crashed ping with the report path as a deep link.

If step 1 reports an empty range, skip --url on step 2 but STILL send the ping with state.

Pushover message anatomy:

night shift done: <N> stories merged, <M> stuck, <duration>

with the report URL attached so tapping the notification opens the diff browser.

Skip BOTH tail calls ONLY when:

--plan-only (planner runs but no workers dispatch)
User explicitly passed --no-ping

Per the global Pushover firing rule (see ~/CLAUDE.md § Pushover Notifications, condition 4) and the completion-report skill at the plugin's skills/completion-report/SKILL.md.

US-11: --workers N

N defaults to 5. Hard ceiling of 10 without --unsafe-many. Above 10:

WARNING: >10 worktrees will likely thermal-throttle your laptop and consume
<estimate_mb> MB of disk via duplicate node_modules. Pass --unsafe-many to
override.

The skill computes estimate_mb as repo_size_mb * worker_count.

US-12: --resume

After a human-resolved escalation, --resume:

Re-reads .swarm/status.md and .ralph/<name>/issues/*.md
Re-dispatches any issue with status: ready and unmet depends_on now met
Re-attempts any issue with status: escalated whose worktree was modified since the escalation timestamp
Idempotent: running --resume on a clean state is a no-op

US-13: Live status + logs

During the run:

.swarm/logs/<id>.log is the worker's own append-only log
.swarm/status.md is rewritten on every state change with this template:

# planner-worker live status

PRD: <name>
Started: <ISO8601>
Cycle: <N> of <max>

## Workers active
- 002 (worker, 4 min)
- 003 (merger, rebasing)

## Merged
- 001 schema: add points column          <commit-hash>

## Escalated
- (none)

## Stuck (retrying)
- (none)

## Last 5 events
- 12:14:02 worker-001 GREEN
- 12:14:05 merger-001 dispatched
- 12:14:18 merger-001 MERGED  abc1234
- 12:14:19 worker-002 dispatched
- 12:14:19 worker-003 dispatched

Tail with watch -n 2 cat .swarm/status.md or code -r .swarm/status.md.

US-14: Postmortem on exit

On final exit, the skill writes .swarm/run-<ISO8601>.md:

# Swarm run postmortem

PRD: <name>
Duration: 47m 12s (started 2026-05-14T07:30Z, ended 2026-05-14T08:17Z)
Workers cap: 5

## Planned issues: 5
- 001 schema: add points column
- 002 service: award points on completed task
- 003 dashboard: tile
- 004 hook: usePoints
- 005 integration test

## Merged: 4
- 001  abc1234  (4 min worker, 2 min merger)
- 002  def5678  (8 min worker, 1 min merger)
- 003  ghi9abc  (12 min worker, 3 min merger)
- 004  jkl0def  (6 min worker, 1 min merger)

## Stuck: 1
- 005 integration test: failed e2e setup after retry; planner re-decomposed into 005a, 005b (queued for next run)

## Escalated: 0

## PRs (github mode)
- (n/a; local merge mode)

Per-repo config (.swarm.json)

Optional. If present in the repo root:

{
  "dod": {
    "test": "pnpm test",
    "typecheck": "pnpm typecheck",
    "extras": ["pnpm lint", "./verify-quick.sh"]
  },
  "models": {
    "planner": "opus",
    "worker": "sonnet",
    "merger": "opus",
    "judge": "opus"
  },
  "workers": 5,
  "github": false,
  "merge-target": "main"
}

Sensible defaults when the file is absent. CLI flags always win over the file.

File layout (in the user's repo)

.ralph/<name>/
  prd.md
  issues/
    001-schema-points.md
    002-service-award.md
    ...

.swarm/                         # gitignored
  swarm.json                    # optional config
  worktrees/                    # active git worktrees
    001-schema-points/
    ...
  logs/
    001-schema-points.log
    ...
  claims.jsonl                  # live claims registry (US-3c2), cleared at wave end
  shared-learnings.md           # append-only one-liner discoveries (US-3c2)
  status.md                     # live state
  run-<ISO8601>.md              # postmortem (one per run)

Anti-patterns

DO NOT silently flip to batched / no-worktree mode. Worktree isolation is the whole point
DO NOT run without a PRD. If .ralph/<name>/prd.md is missing, fail loudly and point to /ro:write-a-prd
DO NOT mutate main directly from the worker; only the merger has merge authority
DO NOT push worker branches in non---github mode (no noise on origin)
DO NOT exceed 10 workers without --unsafe-many. Real laptops thermal-throttle hard

Lessons from live runs

Captured from the lekkertaal swarm run on 2026-05-14 (8 PRs landed across two waves). These are baked into the prompt template each worker receives.

1. Pre-assign migration numbers when N workers all touch the DB

2. Worker poll loops must be foreground bash, NOT in-context monitor

3. CI may not trigger on the first push to a new branch

4. Workers should rebase, not merge, on parent-branch drift

Fix: worker prompt should say "if your push is rejected because main moved, rebase onto origin/main, resolve conflicts, force-push-with-lease. Do not merge main into your branch."

5a. Parallel workers MUST use isolated worktrees; verify on arrival

Symptoms: a worker reports "the main checkout had uncommitted edits from another worker" or "my edits were wiped when another worker switched branches".

Fix in worker prompt: every dispatch prompt should include a self-check at the top:

ON ARRIVAL:
1. Run `git worktree list` and `pwd`. Confirm your pwd is NOT
   <main-repo-path>. If it IS, create your own worktree at
   `<main-repo-path>-wt-<issue-number>` and `cd` into it.
2. `git status` should show clean (no other worker's edits).
3. Only then start work.

This guarantees isolation even when the harness's isolation: "worktree" doesn't take effect for some reason. The cost is one extra mkdir + checkout per worker; the benefit is no lost work.

5. Trust local pre-push CI; do not wait for GitHub Actions

Default (--trust-local-ci, ON): workers run local checks, push, immediately squash-merge. If pre-push hook is missing or SKIP_QUALITY_CHECKS=1 was used, fail loudly and do not merge.

Opt-out (--wait-for-gh-ci): workers still poll gh api .../check-runs and only merge on green. Use when:

The repo has no pre-push hook
The repo lacks parity between local and CI (e.g. CI does additional integration tests against managed services)
The repo has branch-protection rules that require status checks before merge (in which case the planner cannot merge without CI green anyway — the flag is forced ON)

Setup-time prompt (US-0): the planner asks the user once per repo:

"This repo has a pre-push hook running [<list of checks>]. GitHub Actions runs effectively the same checks. Skip GH CI wait and merge immediately after a clean push? [Y/n]"

The answer is persisted in .swarm.json as trust-local-ci: true|false.

6. Pre-dispatch conflict prediction cannot catch scope drift — use live claims

7. Never squash-merge a freshly force-pushed PR without verifying the head; workers must commit early and never wait silently

Two failure modes from the nutmeg night-shift ns-20260611-1624:

Worker stall: a wave-5 worker died mid-task with 111 lines of uncommitted work — all lost, and the stall itself was invisible because the worker waited silently on a hung command.

Adoption

RonanCodes/skills/planner-worker

$ install --global

Security Scan Results

SKILL.md

/ro:planner-worker (alias /ro:swarm)

Part of the local factory

When to use

When NOT to use

Quick start

Filter / scope: prd:draft is NEVER picked up

US-0: Interactive config grilling

US-1: Planner

US-2: Confirm the backlog

US-2a: Close-the-loop AC gate

US-2b: Ranked queue input from /ro:night-shift

US-3: Worker dispatch

US-3b: File-area conflict detection (hybrid static + scout)

US-3c2: Live claims registry + shared learnings (every wave)

US-3c: Canonical lifecycle transitions

US-3d: Research-worker mode (kind:research)

US-8: --github flag

US-9: --judge-agent

US-10: --afk

Always fire completion-report + Pushover at end (load-bearing default)

US-11: --workers N

US-12: --resume

US-13: Live status + logs

US-14: Postmortem on exit

Per-repo config (.swarm.json)

File layout (in the user's repo)

Anti-patterns

Lessons from live runs

1. Pre-assign migration numbers when N workers all touch the DB

2. Worker poll loops must be foreground bash, NOT in-context monitor

3. CI may not trigger on the first push to a new branch

4. Workers should rebase, not merge, on parent-branch drift

5a. Parallel workers MUST use isolated worktrees; verify on arrival

5. Trust local pre-push CI; do not wait for GitHub Actions

6. Pre-dispatch conflict prediction cannot catch scope drift — use live claims

7. Never squash-merge a freshly force-pushed PR without verifying the head; workers must commit early and never wait silently

See also

Related Skills

RonanCodes/skills/linear-pipeline

RonanCodes/skills/in-your-face

RonanCodes/codex

RonanCodes/warranty-check

RonanCodes/skills/planner-worker

$ install --global

Security Scan Results

SKILL.md

/ro:planner-worker (alias /ro:swarm)

Part of the local factory

When to use

When NOT to use

Quick start

Filter / scope: prd:draft is NEVER picked up

US-0: Interactive config grilling

US-1: Planner

US-2: Confirm the backlog

US-2a: Close-the-loop AC gate

US-2b: Ranked queue input from /ro:night-shift

US-3: Worker dispatch

US-3b: File-area conflict detection (hybrid static + scout)

US-3c2: Live claims registry + shared learnings (every wave)

US-3c: Canonical lifecycle transitions

US-3d: Research-worker mode (kind:research)

US-8: --github flag

US-9: --judge-agent

US-10: --afk

Always fire completion-report + Pushover at end (load-bearing default)

US-11: --workers N

US-12: --resume

US-13: Live status + logs

US-14: Postmortem on exit

Per-repo config (.swarm.json)

File layout (in the user's repo)

Anti-patterns

Lessons from live runs

1. Pre-assign migration numbers when N workers all touch the DB

Filter / scope: `prd:draft` is NEVER picked up

US-3d: Research-worker mode (`kind:research`)

Filter / scope: `prd:draft` is NEVER picked up

US-3d: Research-worker mode (`kind:research`)