skills/golem-powers/pr-loop/SKILL.md
The complete PR loop — branch, implement, test, commit, push, PR, WAIT FOR REVIEW, fix, merge, cleanup. Includes PR creation and review comment fetching. Use whenever creating a PR or finishing work. This is NOT optional. Every change goes through this loop. No exceptions.
npx skillsauth add etanhey/golems pr-loopInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
The full loop. Not "create PR." Not "push and move on." The FULL loop through MERGED.
MISSION = MERGED
Not "tests pass." Not "PR created." Not "pushed."
Done = PR merged + branch deleted + main pulled.
If you are running autonomously (no human in the loop), these rules are mandatory:
When a dispatch brief says LEAD owns merge (or "worker endpoint = PR + review responses"):
MISSION = MERGED and merge locally unless the brief explicitly
grants merge authority to this worker.Evidence: two independent re-derivations of the conflict (vlW7#5 "brief explicitly says no merge"; kg-harvest#2).
Draft PRs silently skip bot reviews (CodeRabbit skipped a PR because it was draft — phoenix-revival#3).
WRONG: gh pr create --draft … then wait for @coderabbitai
RIGHT: Create ready-for-review PRs; if draft was used, `gh pr ready <N>` BEFORE
invoking reviewers. Never merge while still draft.
"Review without merge" means mark ready-for-review and run the review loop — NOT leave the PR in draft state.
LEAD (or any merger) must never merge a worker's PR mid-review-fix:
# BEFORE gh pr merge — confirm head matches worker's latest push
gh pr view <N> --json headRefOid,commits,reviewDecision
Checklist:
headRefOid matches the worker's latest pushed commit (not an earlier SHA
mid-fix)Evidence: #257 merged at 61d6f50 before fix 750efdd landed (recovered via
#260); #256 same evening — stranded fixes.
Merged ≠ serving. Deploy/live claims require a same-moment live probe of the running process — not merge SHA alone.
| Claim | Required probe (same turn) |
|---|---|
| "deployed" / "live" / "restarted" | Running PID + binary/commit of the served process (ps, launchctl list, health endpoint) |
| "page server updated" | Hit the served URL/asset AFTER rebuild/restart — not the build directory |
Expands Post-Merge Verification below. Evidence: "deployed #250" collab claim was wrong because the atomic app rebuild never restarted the page server (77631d2e#3); orchestrator 62517efa:1105.
Artifacts cited in PR bodies, review threads, or merge verification cannot live
only in gitignored docs.local/ — reviewers and CI cannot fetch them.
WRONG: PR body links orchestrator/docs.local/plans/foo.md (gitignored)
RIGHT: Commit the artifact to a tracked path, paste the excerpt inline, or link
a committed copy; docs.local is local cache only when a committed record exists
Evidence: phoenix-revival#7.
On EtanHey repos (single GitHub account), agent PR verdicts go as PR comments
— inline review comments or gh pr comment — NOT formal self-review
REQUEST_CHANGES (GitHub blocks self-approve/self-request-changes).
WRONG: gh pr review --request-changes on your own PR
RIGHT: Post structured verdict as a PR comment; merge authority follows Merge
Authority below after a clean bot-reviewed loop
Evidence: codexE08#5. Complements the existing Merge Authority single-account note.
1. BRANCH git checkout main && git pull && git checkout -b feat/name
2. IMPLEMENT Write code (invoke /superpowers:test-driven-development)
3. TEST Run full test suite — ALL must pass
4. VERIFY Invoke /superpowers:verification-before-completion
↳ DAEMON GATE: If this PR touches daemon/socket/MCP code,
you MUST test with a real client session before proceeding.
See "Daemon Verification Gate" below.
5. COMMIT git add <specific files> → CodeRabbit pre-commit review → commit
↳ Codex env: run `coderabbit review --agent` with a ~3 minute
hard timeout BEFORE committing. If the local CLI hangs or hits
rate/review limits, stop it, record the limitation, and commit
on fresh test evidence. After the PR exists, require PR-level
bot status/comments before merge.
If CRITICAL issues found → fix first. If minor → proceed.
Ralph mode: `--story=ID --message=MSG` for atomic commit + criterion.
6. PUSH git push -u origin feat/name
7. PR Create PR (see "Creating the PR" below)
8. REVIEW Fetch + read review comments (see "Reading Reviews" below)
9. FIX Address real bugs from review
10. MERGE gh pr merge <N> --squash --delete-branch
11. CLEANUP git checkout main && git pull
git status is clean (no uncommitted changes, no staged or unstaged work-in-progress)keeping because X, orshould be gitignored, orcommitting nowIf the assigned worktree differs from the session cwd, every apply_patch filename must be an absolute path inside the worktree. exec_command.workdir does not apply to apply_patch; relative patch paths resolve against the session cwd and can mutate the main checkout silently. If a patch lands unexpectedly, check git status in both the main checkout and the assigned worktree before continuing.
gh CLI installed (brew install gh)gh auth login# Push branch first
git push -u origin HEAD
# Create PR with structured body
gh pr create --title "feat: description" --body-file - <<'EOF'
## Summary
- What changed and why
## Test plan
- [ ] Tests pass
- [ ] Manual verification done
Co-Authored-By: Claude Opus 4.6 <[email protected]>
EOF
If the PR body or comment contains backticks, use --body-file (or stdin).
Shell command substitution will mangle inline code and can silently rewrite the
evidence you meant to preserve.
gh pr view to check, don't create duplicate.gh pr create --base devThis is NOT optional. This is NOT "auto-merge."
# Check BEFORE merging — empty reviewDecision + <2 comments = nobody looked
gh pr view <N> --json reviewDecision,comments
Review Wait Timer (ENFORCED):
CLEAN status with no reviews ≠ approved. It means NOBODY LOOKED.
| Bot | Expected time | Notes | |-----|--------------|-------| | CodeRabbit | 2-5 min | Auto-reviews on push | | Greptile | Needs OSS approval | Manual activation | | Macroscope | Needs activation | Auto-reviews once installed |
If reviewDecision is empty and comments < 2 → DO NOT MERGE.
Always explicitly request reviews. Don't wait for auto-detection.
# Invoke all available reviewers (do this right after PR creation)
gh pr comment <N> --body "@coderabbitai review"
gh pr comment <N> --body "@greptileai review"
gh pr comment <N> --body "@codex review"
gh pr comment <N> --body "@cursor @bugbot review"
Reviewer roster reality can degrade. If Greptile is unavailable, Cursor/Bugbot
is billing-blocked, or CodeRabbit is rate-limited, do not burn dead mentions.
Use Codex + Macroscope + cr review --plain before commit, and document which
reviewers were unavailable in the PR.
# Option A: Use coderabbit:code-reviewer subagent
Agent(subagent_type="coderabbit:code-reviewer", prompt="Review PR #N")
# Option B: Use cr CLI
coderabbit review --agent # Codex env, bounded to ~3 minutes
# Poll for reviews (preferred: /loop 2m, or CronCreate */2, or manual sleep 90)
gh pr view <N> --comments
Fetch comments from all review sources with full context:
# Quick view of all comments
gh pr view <N> --comments
# Detailed: get review comments with diff context
gh api repos/{owner}/{repo}/pulls/{N}/comments
To reply to an inline review thread, use the replies endpoint:
gh api --method POST \
repos/{owner}/{repo}/pulls/{N}/comments/{comment_id}/replies \
-f body='Fixed in commit abc123.'
comment_id is the numeric top-level review comment ID, not a node ID. Replies
are one level deep; you cannot reply to an existing reply. The bare comments
collection path does not reply to a thread; use /replies or the documented
in_reply_to form.
Review sources (coverage stack):
| Source | Type | How to Trigger / Check |
|--------|------|----------------------|
| CodeRabbit | AI review + auto-summaries | Auto on PR. Also: CodeRabbit plugin or coderabbit review --agent in Codex env (cr review --plain for human terminal use) |
| Codex Cloud | AI code review | gh pr comment <N> --body "@codex review" or comment manually on GitHub. Auto-reviews if enabled in Codex settings. Reads AGENTS.md "Review guidelines". Flags P0/P1 by default. |
| Cursor Bugbot | Bug detection | gh pr comment <N> --body "@cursor @bugbot review" or comment manually on GitHub. For re-review after fixes: gh pr comment <N> --body "@cursor @bugbot re-review". Bot responds as cursor[bot]. |
| Greptile | AI review + codebase understanding | Comment @greptileai review. Needs OSS activation. |
| DeepSource | Static analysis | Check via CI status |
After fixing review feedback, trigger re-review on every reviewer:
gh pr comment <N> --body "@coderabbitai review"
gh pr comment <N> --body "@codex review"
gh pr comment <N> --body "@cursor @bugbot re-review"
Codex Cloud is enabled on: EtanHey/voicelayer, EtanHey/orchestrator, EtanHey/golems, EtanHey/brainlayer.
Default stance = "let me investigate" — NOT "this is intentional."
The worst PR loop failure mode: auto-dismissing a reviewer suggestion with "intentional per design doc" without checking if the reviewer found a real gap the design doc missed.
WRONG:
CodeRabbit: "Missing orphan reparenting when a node is deleted"
You: "@coderabbitai This is intentional per phase5-v2-synthesis.md. Please learn this."
Later: Realize CodeRabbit was right. Design was wrong. You taught it a bad Learning.
RIGHT:
CodeRabbit: "Missing orphan reparenting when a node is deleted"
You: Read the design doc. Does it explicitly address THIS tradeoff?
→ Yes, with clear reasoning → Push back with the specific passage.
→ No, or vague → Treat as potential gap. Investigate before closing.
The investigation protocol for any "conflicts with design" suggestion:
Teaching a reviewer a bad Learning is worse than not teaching it anything. A false Learning suppresses future flags on a real bug category. Audit any Learning you've set if the underlying assumption turned out to be wrong:
# CodeRabbit: flag a Learning for correction
@coderabbitai I need to correct a previous learning. [Pattern X] does actually require
[handling Y] — our earlier design was incomplete. Please update your understanding:
[correct explanation].
Every CRITICAL or HIGH review comment MUST receive an explicit reply. Not fixing is acceptable. Not replying is NOT.
WRONG: PR #84 had 5 CRITICAL/HIGH CodeRabbit findings. Zero replies. Merged.
→ One of those findings was the root cause of BrainBar socket death.
RIGHT: Every CRITICAL/HIGH comment gets one of:
1. "Fixed in commit abc123" (fix)
2. "Won't fix because X — [specific technical reason]" (acknowledged)
3. "Investigating — will address in follow-up PR #N" (deferred with ticket)
Pre-merge checklist (verify before gh pr merge):
1. READ: Complete feedback without reacting
2. UNDERSTAND: Restate requirement in own words (or ask)
3. VERIFY: Check against codebase reality
4. EVALUATE: Technically sound for THIS codebase?
5. RESPOND: Technical acknowledgment or reasoned pushback
6. IMPLEMENT: One item at a time, test each
Forbidden responses: NEVER say "You're absolutely right!", "Great point!", "Thanks for catching that!" — instead restate the technical requirement, ask clarifying questions, or just fix it silently.
Implementation order (multi-item feedback):
Max 3 review-fix rounds — skip persistent nitpicks after that.
| Type | Action | |------|--------| | CRITICAL (data loss, crash, security) | FIX. Blocks merge. Must reply. | | MAJOR (real bug) | FIX immediately. Push fix. Re-review. Must reply. | | TRIVIAL (style, nitpick) | Fix if genuinely better. Skip if bikeshed. | | CONFLICTS WITH DESIGN | INVESTIGATE first. Only dismiss with explicit doc evidence. |
When in doubt, the reviewer might be right. Push back only with explicit evidence.
Every PR is an opportunity to make reviewers smarter. Use the right format for each.
| Reviewer | How it learns | Reply format | What persists |
|----------|--------------|--------------|--------------|
| CodeRabbit | @coderabbitai replies → explicit Learnings | @coderabbitai [explain design]. See [doc]. Please learn this for future reviews. | Permanent Learning applied to ALL future reviews on this repo |
| Greptile | Observes all reply patterns passively | Any plain reply explaining the design decision | Updates internal preference model — stops flagging dismissed patterns |
| Macroscope | macroscope.md file in repo root (no reply-learning) | Add rule to macroscope.md file | Persists as a repo-level rule, referenced in every future review |
CodeRabbit — reply with @coderabbitai [explain design]. Please learn this for future reviews. Never just "I'll leave this as-is" — that teaches nothing.
Greptile — reply naturally with design context. It passively learns from your replies.
Macroscope — add rules to macroscope.md in repo root. No reply-learning.
Rule: Always reply with context. The reply compounds knowledge across every future PR.
Round 1: Push fixes → request re-review from all bots
Round 2: Read re-review → fix any new issues found
Round 3+: Only if new issues surfaced. Max 3 rounds for nitpicks.
CodeRabbit auto-re-reviews on new pushes. For others, comment @bot re-review explicitly.
If reviewer finds new issues in round 2 → fix and go to round 3. Never merge with open issues.
Never put real client data in public PRs:
+1-555-0123, client-abc-123, Group: Example CoSanitize in PR description, code comments, test fixtures, and commit messages.
git add <files> && git commit -m "fix: address review feedback"
git push
# Wait for re-review — CodeRabbit auto-triggers, others need manual @mention
# Verify reviews are actually in before merging
gh pr view <N> --json reviewDecision,comments
gh pr merge <N> --squash --delete-branch
git checkout main && git pull
If a child PR is based on the branch you are merging, deleting the base branch
can auto-close the child. Closed PRs cannot be retargeted (Cannot change the base branch of a closed pull request; #456 needed replacement #462).
Safe order: merge base without --delete-branch, then immediately retarget
children. If squash artifacts conflict, rebase children with
git rebase --onto origin/<target-branch> origin/<base-branch> while the base
exists. Delete the base branch only after children are retargeted, rebased if
needed, and mergeable. For golems/master, <target-branch> is master.
After every merge, verify the merge commit contains the changes from your latest
pushed SHA. For squash merges, this is content/tree verification, not ancestor
containment. Mid-review external merges can strand fixes (#475 merged without
d6a292a; recovered by cherry-pick #476).
Merged is not deployed. Rebuild dist, restart consumers, and live re-test
before claiming a fix is live. Evidence: "Docking, fixed, and merged. Are you
sure? ... Today's evidence argues against it." (orchestrator 62517efa:1105).
Etan, verbatim: "you can fucking merge… that's why we have a PR loop."
The PR loop IS the gate. Once the loop is clean — reviewers invoked, every CRITICAL/MAJOR/HIGH comment fixed-or-replied, CI green, self-QA gates passed (daemon gate + visual self-QA below where they trigger) — merge. Do NOT bolt a heavyweight human diff-review gate on top of a clean loop; that gate was dropped deliberately.
reviewDecision can never reach APPROVED on
EtanHey repos (self-approve is structurally blocked). A clean loop with bot
reviews replied-to is the approval. Where branch protection blocks the merge,
--admin after a clean loop is the standing policy — not a violation.Every merged PR MUST update its tracking. No exceptions.
~/Gits/orchestrator/roadmap/README.mdbrain_store what changed and why (tagged pr-merged, <project>)WRONG: Merge PR, exit silently ← Tracking drift!
WRONG: "I'll update the collab later" ← You won't. Do it NOW.
WRONG: Only update one of collab/roadmap/BL ← Update ALL relevant trackers.
If you are an autonomous agent, this step is NON-NEGOTIABLE. The orchestrator should NEVER discover completed work by accident.
[!CAUTION] NON-NEGOTIABLE HARD STOP: IF A PR TOUCHES DAEMON, SOCKET, MCP, OR PROTOCOL CODE, YOU MUST COMPLETE REAL CLIENT RUNTIME VERIFICATION BEFORE MERGE. MARKDOWN INTENT IS NOT ENOUGH.
Triggers when: PR touches daemon, socket, MCP, protocol, VoiceBar runtime, or any socket-based daemon code (BrainBar, VoiceBar, cmux MCP, VoiceLayer MCP daemon, flow-bar/**, launchd/**, src/mcp-server*.ts, src/mcp-socket-owner.ts, src/socket-*.ts, src/daemon*.ts, src/paths.ts, src/process-lock.ts, src/resolve-binary.ts, src/whisper-server.ts).
socat/curl/unit tests are NOT sufficient. Real client test required.
For VoiceLayer:
./scripts/voicelayer-verify.sh BEFORE gh pr merge.verification test, release, and confirm paste fired..verified/verified-runtime-<branch>-<short-sha>.txt exists and contains Verified-Runtime: <sha>.Verified-Runtime: <sha> line to the PR body.If you run gh pr merge --admin without the matching verify artifact, you violated the gate. brain_store the violation immediately as an orc-correction with what happened, why the hook/skill failed, and how to prevent recurrence.
For non-VoiceLayer daemon PRs:
cmux new-split right).Why: Real regressions passed markdown-only gates and shell smoke tests. Persistent clients, app focus, hotkeys, paste behavior, and MCP framing failures only show up in live sessions.
Triggers when: the PR touches anything a human SEES — UI views, dashboards, HTML pages, menu-bar apps (BrainBar/VoiceBar), Phoenix views, TUI surfaces.
The rule: the BUILDER clicks INTO a real running session and screenshots it BEFORE merge. Mechanical checks (PID running, commit-matches, server responds) are NOT a functional pass — "generated" ≠ "verified."
/never-fabricate R7 Verification Receipt to the PR/collab message.LSUIElement
grant dialog is a focus state, not a hard block (fallback: screencapture
CLI + coordinate clicks).WRONG: gh pr create && gh pr merge --auto ← No review!
WRONG: gh pr create && gh pr merge --squash ← Same message, no review!
WRONG: "PR created, done!" ← Mission is MERGED, not created
WRONG: Skip review "because it's a small change" ← Small changes break things too
Not every branch goes through the full PR flow. When implementation is done:
| Option | When to Use | Commands |
|--------|-------------|----------|
| Create PR (default) | Most cases — full review loop | Continue with steps 7-11 above |
| Merge locally | Small team, already reviewed | git checkout main && git merge <branch> && git branch -d <branch> |
| Keep as-is | Need to park work | Just stop. Worktree preserved. |
| Discard | Wrong approach, start over | Requires typed "discard" confirmation. git branch -D <branch> |
Worktree cleanup: For options 1 (after merge), 3 (never), and 4 (after discard):
# Check if in worktree
git worktree list | grep $(git branch --show-current)
# If yes, after merging/discarding:
git worktree remove <worktree-path>
After git worktree add, read the file from the worktree path before editing.
Reads from the primary checkout or another worktree do not carry over.
When operating from a linked worktree, do not merge or cleanup from that
worktree. Move to the original checkout, run gh pr merge, verify the remote
merge SHA, delete the remote branch, then remove the worktree. This avoids the
local failure when main is checked out elsewhere and keeps cleanup from
deleting the session underneath you.
For every NEW file > 50 lines or with non-obvious architecture decisions:
brain_store("New file: {path} ({lines} lines). Purpose: {why}. Key decisions: {list}. Alternatives considered: {list}.",
tags=["component-reasoning", "{repo}", "pr-{N}"], importance=7)
Future sessions query this instead of opening files to understand design choices.
This skill is referenced by:
/large-plan — every phase goes through this loopThis skill references:
/superpowers:test-driven-development — step 2 (implement with TDD)/superpowers:verification-before-completion — step 4 (verify before claiming)/never-fabricate — never claim review is green without reading itNote: the CodeRabbit commit gate and /coderabbit receiving-side logic are inlined in Steps 5 and 8 respectively. pr-loop is self-contained.
# The whole loop in commands:
git checkout main && git pull
git checkout -b feat/my-feature
# ... implement with TDD ...
bun test # or npm test
git add src/changed-file.ts tests/new-test.ts
git commit -m "feat: description
Co-Authored-By: Claude Opus 4.6 <[email protected]>"
git push -u origin feat/my-feature
gh pr create --title "feat: description" --body "## Summary\n..."
# WAIT for review (60-90s for bots, or run coderabbit:code-reviewer)
gh pr view <N> --comments # READ the review
# Fix any real bugs, push again if needed
gh pr merge <N> --squash --delete-branch
git checkout main && git pull
tools
The human-eval UX contract for Phoenix views: turn-by-turn scrollable replay (not a scorecard), hide-but-copyable IDs, collapsed thinking, identity chips, tool filters, tiny frozen starter datasets, mark-wrong-in-thread, mobile-first. Use when: building or reviewing ANY Phoenix/eval view, annotation UI, session replay, or human-grading surface. Triggers: phoenix view, eval UI, annotation view, session replay, human eval UX, grading interface. NOT for: Phoenix data pipelines/ingest (capture scripts have their own specs).
tools
macOS systems specialist — AppKit NSPanel architecture, launchd services, socket activation, MCP bridge resilience, syspolicyd, and high-frequency SwiftUI dashboards. Use when building menu-bar apps, LaunchAgents, debugging syspolicyd/Gatekeeper/TCC, resilient UDS/MCP bridges, or SwiftUI dashboards at 10Hz+.
development
Bulk LLM-judging protocol for fleet-dispatched verdict runs (KG cluster, eval harness). Use when: dispatching or running judge workers (J1/J2/RT), planning bulk-apply from verdict JSONL, or triaging evidence_degraded outputs. Triggers: judge fleet, bulk judge, R3 verdicts, kg-judge, RT gate, evidence_degraded. NOT for: single-item code review, Phoenix view UX (use phoenix-human-view), or non-judge eval pipelines.
development
Quiet-down protocol for sprint close: when the fleet wraps, delete ALL polling crons and monitors, send ONE final dashboard + ONE message, then go SILENT. Use when: fleet wraps, all workers done, overnight queue exhausted, sprint close, Etan asleep/away with nothing approved left. Triggers: fleet wrap, wrap the fleet, stand down, going quiet, sprint close. NOT for: mid-sprint monitoring (keep your loops), spawning a successor (use /session-handoff first).