skills/golem-powers/_archive/critique-waves/SKILL.md
Use when needing multi-agent verification of complex work. Runs parallel critique agents until consensus. Covers verification, consensus, multi-agent review, validate work. NOT for: simple code reviews (use coderabbit), single-reviewer tasks.
npx skillsauth add etanhey/golems critique-wavesInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Run iterative waves of verification agents until consensus is achieved. Useful for verifying PRs, code changes, or any work that needs multi-agent validation.
| What you want to do | Workflow | |---------------------|----------| | Set up verification folder and tracker | workflows/setup.md | | Run a wave of parallel agents | workflows/run-wave.md | | Handle failures and iterate to consensus | workflows/iteration.md |
Execute directly - they handle errors and edge cases:
| Script | Purpose | Usage |
|--------|---------|-------|
| scripts/init-tracker.sh | Initialize verification folder with templates | bash ~/.claude/commands/critique-waves/scripts/init-tracker.sh <branch-name> [goal] |
Critique Waves uses multiple parallel agents to verify code changes. Each agent independently checks the same files against FORBIDDEN/REQUIRED patterns. By requiring multiple consecutive passes from all agents, we achieve high-confidence verification.
┌─────────────────┐
│ Setup Phase │
│ - Create folder│
│ - instructions │
│ - tracker.md │
└────────┬────────┘
│
┌────────▼────────┐
┌─────│ Launch Wave │─────┐
│ │ (3 agents max) │ │
│ └────────┬────────┘ │
┌───▼───┐ ┌─────▼─────┐ ┌───▼───┐
│Agent 1│ │ Agent 2 │ │Agent 3│
└───┬───┘ └─────┬─────┘ └───┬───┘
│ │ │
└──────────────┼─────────────┘
│
┌────────▼────────┐
│ Update Tracker │
│ - Log results │
│ - Check passes │
└────────┬────────┘
│
┌──────────────┼──────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Any FAIL │ │ All PASS │ │ Goal Met │
│ Reset=0 │ │ Increment │ │ DONE! │
│ Fix issue │ │ passes │ │ │
└─────┬─────┘ └─────┬─────┘ └───────────┘
│ │
└──────────────┘
Loop
Setting up for a new verification?
scripts/init-tracker.shReady to run agents?
Agent returned a FAIL?
docs.local/<BRANCH>/Setting up docs.local/feature-branch/...
Created instructions.md and tracker.md
Wave 1: Launching 3 agents...
Results: Agent 1 PASS, Agent 2 FAIL (found forbidden pattern), Agent 3 PASS
Consecutive: 0 (reset due to failure)
Fixing issue found by Agent 2...
Wave 2: Launching 3 agents...
Results: All 3 PASS
Consecutive: 3
Wave 3: Launching 3 agents...
Results: All 3 PASS
Consecutive: 6
...
Wave 7: Launching 3 agents...
Results: All 3 PASS
Consecutive: 21
GOAL ACHIEVED: 21 consecutive passes (exceeded 20 goal)
tools
The human-eval UX contract for Phoenix views: turn-by-turn scrollable replay (not a scorecard), hide-but-copyable IDs, collapsed thinking, identity chips, tool filters, tiny frozen starter datasets, mark-wrong-in-thread, mobile-first. Use when: building or reviewing ANY Phoenix/eval view, annotation UI, session replay, or human-grading surface. Triggers: phoenix view, eval UI, annotation view, session replay, human eval UX, grading interface. NOT for: Phoenix data pipelines/ingest (capture scripts have their own specs).
tools
macOS systems specialist — AppKit NSPanel architecture, launchd services, socket activation, MCP bridge resilience, syspolicyd, and high-frequency SwiftUI dashboards. Use when building menu-bar apps, LaunchAgents, debugging syspolicyd/Gatekeeper/TCC, resilient UDS/MCP bridges, or SwiftUI dashboards at 10Hz+.
development
Bulk LLM-judging protocol for fleet-dispatched verdict runs (KG cluster, eval harness). Use when: dispatching or running judge workers (J1/J2/RT), planning bulk-apply from verdict JSONL, or triaging evidence_degraded outputs. Triggers: judge fleet, bulk judge, R3 verdicts, kg-judge, RT gate, evidence_degraded. NOT for: single-item code review, Phoenix view UX (use phoenix-human-view), or non-judge eval pipelines.
development
Quiet-down protocol for sprint close: when the fleet wraps, delete ALL polling crons and monitors, send ONE final dashboard + ONE message, then go SILENT. Use when: fleet wraps, all workers done, overnight queue exhausted, sprint close, Etan asleep/away with nothing approved left. Triggers: fleet wrap, wrap the fleet, stand down, going quiet, sprint close. NOT for: mid-sprint monitoring (keep your loops), spawning a successor (use /session-handoff first).