skills/golem-powers/_archive/code-review/SKILL.md
Full code review lifecycle: requesting reviews (CodeRabbit, Greptile, Bugbot, GitHub PR comments) and receiving feedback (classify issues, implement fixes, push back on wrong suggestions). Use when: creating a PR review, reading review comments, handling reviewer feedback, fixing review items, or deciding whether to accept or reject a suggestion. NOT for: running tests directly or CI/CD pipeline issues (use relevant repo tools).
npx skillsauth add etanhey/golems code-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Technical evaluation, not emotional performance. Verify before implementing. Push back when wrong.
Mandatory: After completing a feature, before merge to main, after each task in multi-step work. Optional: When stuck (fresh perspective), before refactoring, after fixing complex bug.
# 1. Get git SHAs
BASE_SHA=$(git rev-parse HEAD~1) # or origin/main
HEAD_SHA=$(git rev-parse HEAD)
# 2. Use CodeRabbit CLI
cr review --plain
# 3. Or use coderabbit:code-reviewer subagent
Agent(subagent_type="coderabbit:code-reviewer", prompt="Review PR #N")
# 4. Or use Greptile plugin
# list_merge_request_comments, trigger_code_review
1. READ: Complete feedback without reacting
2. UNDERSTAND: Restate requirement in own words (or ask)
3. VERIFY: Check against codebase reality
4. EVALUATE: Technically sound for THIS codebase?
5. RESPOND: Technical acknowledgment or reasoned pushback
6. IMPLEMENT: One item at a time, test each
NEVER say: "You're absolutely right!", "Great point!", "Thanks for catching that!" INSTEAD: Restate the technical requirement, ask clarifying questions, or just fix it silently.
| Type | Action | |------|--------| | Real bug / Security | FIX immediately | | Important improvement | Fix before proceeding | | Style preference | Fix if genuinely better, skip if bikeshed | | Over-engineering | SKIP with reasoning | | False positive | SKIP with reasoning |
Max 3 review-fix rounds — skip persistent nitpicks after that.
Push back when:
How: Use technical reasoning, reference working tests/code, ask specific questions.
If you were wrong: "Checked [X] and you're correct. Fixing." State it factually, move on.
From human partner: Trusted — implement after understanding. Still ask if scope unclear. From external reviewers: Verify against codebase first. Check: technically correct? Breaks things? Works on all platforms? Full context? From bots (CodeRabbit, Greptile, Bugbot): High signal but not infallible. Verify critical findings.
IF reviewer suggests "implementing properly":
grep codebase for actual usage
IF unused: "This endpoint isn't called. Remove it (YAGNI)?"
IF used: Then implement properly
Reply in the comment thread, not as top-level PR comment:
gh api repos/{owner}/{repo}/pulls/{pr}/comments/{id}/replies -f body="Fixed in <commit>"
Use for security-sensitive changes, auth code, input handling, data mutations. See workflows/red-team.md and references/red-team-prompt.md.
Use for architecture changes, new features, refactors, UI work. See workflows/blue-team.md and references/blue-team-prompt.md.
Run both: 1 red + 1 blue = comprehensive coverage. Red team catches what will break. Blue team catches what will rot. Deduplicate findings across both reports — red H findings take priority over blue suggestions on the same code.
tools
The human-eval UX contract for Phoenix views: turn-by-turn scrollable replay (not a scorecard), hide-but-copyable IDs, collapsed thinking, identity chips, tool filters, tiny frozen starter datasets, mark-wrong-in-thread, mobile-first. Use when: building or reviewing ANY Phoenix/eval view, annotation UI, session replay, or human-grading surface. Triggers: phoenix view, eval UI, annotation view, session replay, human eval UX, grading interface. NOT for: Phoenix data pipelines/ingest (capture scripts have their own specs).
tools
macOS systems specialist — AppKit NSPanel architecture, launchd services, socket activation, MCP bridge resilience, syspolicyd, and high-frequency SwiftUI dashboards. Use when building menu-bar apps, LaunchAgents, debugging syspolicyd/Gatekeeper/TCC, resilient UDS/MCP bridges, or SwiftUI dashboards at 10Hz+.
development
Bulk LLM-judging protocol for fleet-dispatched verdict runs (KG cluster, eval harness). Use when: dispatching or running judge workers (J1/J2/RT), planning bulk-apply from verdict JSONL, or triaging evidence_degraded outputs. Triggers: judge fleet, bulk judge, R3 verdicts, kg-judge, RT gate, evidence_degraded. NOT for: single-item code review, Phoenix view UX (use phoenix-human-view), or non-judge eval pipelines.
development
Quiet-down protocol for sprint close: when the fleet wraps, delete ALL polling crons and monitors, send ONE final dashboard + ONE message, then go SILENT. Use when: fleet wraps, all workers done, overnight queue exhausted, sprint close, Etan asleep/away with nothing approved left. Triggers: fleet wrap, wrap the fleet, stand down, going quiet, sprint close. NOT for: mid-sprint monitoring (keep your loops), spawning a successor (use /session-handoff first).