skills/review/SKILL.md
Use after all tasks in an epic complete, after refactoring verifies, or before merging to main. Triggers when independent validation is needed that code meets requirements, has no security gaps, passes quality standards, and has no performance regressions. User phrases like "review this", "is this ready to merge", "validate the implementation".
npx skillsauth add joshsymonds/gambit reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Dispatch four specialized reviewer agents to independently audit completed work, then dispatch a dedicated verifier sub-agent to kill-or-keep each finding, then synthesize surviving findings into a gate decision. Reviewers and verifier run in fresh context — they haven't seen the implementation process and have no sunk cost bias.
The verification work is delegated to a dedicated verifier sub-agent, not done in the main context. Main context's job is dispatch + assembly; the verifier's job is ruthless kill-or-keep classification. This split follows Anthropic's CitationAgent pattern and avoids the synthesizer becoming context-starved from juggling four simultaneous roles (dispatch, verify, dedup, implement).
Works in two modes:
Core principle: The implementing context is the worst reviewer of its own work. Delegate review to fresh agents.
Announce at start: "I'm using gambit:review to validate this implementation before finishing."
LOW FREEDOM — Dispatch all four reviewers. Synthesize all findings. No approval if any reviewer finds gaps. No skipping reviewers for "simple" changes.
| Step | Action | STOP If |
|------|--------|---------|
| 1. Detect Context | Epic Task or workflow Task | Can't find any Task |
| 2. Load Context | Task + changed files list | Can't load task |
| 3. Prepare Brief | Requirements/goal + file list + base branch | Brief incomplete |
| 4. Dispatch Reviewers | 4 agents in parallel, each reading its own instructions by path | Any agent fails to run |
| 5. Dedupe Candidates | Merge reviewer outputs; dedupe on byte-identical (file, line, verify_by) tuples | — |
| 6. Dispatch Verifier | 1 verifier sub-agent with the deduped candidate list | Verifier fails to run |
| 7. Assemble Findings | Route verifier verdicts: confirmed → kept, gap → surfaced, refuted → dropped | — |
| 8. Implement Improvements | Implement ALL confirmed reviewer improvements | Tests fail after changes |
| 9. Gate | APPROVED or GAPS FOUND with verification counts | Confirmed gaps → fix tasks, STOP |
gambit:executing-plans Step 5)gambit:refactoring completes changes (mandatory)gambit:finishing-branchDon't use when:
gambit:executing-plansDetermine what you're reviewing against:
Epic context (default when epic exists):
TaskList → find epic Task (subject starts with "Epic:")
TaskGet → epic (requirements, success criteria, anti-patterns)
TaskList → all subtasks (verify all completed)
Task context (refactoring or standalone work):
TaskList → find the workflow Task (most recent in-progress or just-completed Task)
TaskGet → task (goal, implementation steps, success criteria)
The review brief adapts based on which context is detected. If both exist (e.g., a refactor during an epic), prefer the epic context.
For epic context:
TaskGet → epic (requirements, success criteria, anti-patterns)
TaskList → all subtasks (verify all completed)
For task context:
TaskGet → workflow task (goal, success criteria)
Both contexts:
git diff main...HEAD --name-only # Changed files
git diff main...HEAD --stat # Change summary
Build a brief that each reviewer agent will receive. Include:
For epic context:
--name-only outputFor task context:
--name-only outputDo NOT include your opinions, implementation notes, or rationale. The reviewers should form their own conclusions from the code.
Resolve the absolute path to this skill's reviewers/ directory once (Glob **/skills/review/reviewers/conformance.md if you don't already know it). You pass this path to the agents — do NOT read the reviewer files into this context. The four reviewer files are ~8k tokens; reading them here and re-emitting them as prompts wastes ~18k tokens every review. Each agent reads its own instruction file in its own fresh context.
In ONE message, emit exactly four general-purpose Agent calls. Each prompt is just: (1) a directive to read and follow that agent's instruction file by path, then (2) the review brief.
Agent subagent_type="general-purpose" description="Conformance review" prompt="Read <abs>/reviewers/conformance.md — that file is your complete instructions; your FIRST action must be to Read it, then follow it exactly.\n\n## Review Brief\n\n[brief]"
Agent subagent_type="general-purpose" description="Security review" prompt="Read <abs>/reviewers/security.md — that file is your complete instructions; your FIRST action must be to Read it, then follow it exactly.\n\n## Review Brief\n\n[brief]"
Agent subagent_type="general-purpose" description="Quality review" prompt="Read <abs>/reviewers/quality.md — that file is your complete instructions; your FIRST action must be to Read it, then follow it exactly.\n\n## Review Brief\n\n[brief]"
Agent subagent_type="general-purpose" description="Performance review" prompt="Read <abs>/reviewers/performance.md — that file is your complete instructions; your FIRST action must be to Read it, then follow it exactly.\n\n## Review Brief\n\n[brief]"
Parallelism is structural, not a reminder. That single message contains four Agent calls and nothing else — no Read calls, no prose between them. Reading one reviewer file before each dispatch is exactly what forces the agents sequential; passing paths removes the read step, so there's nothing left to interleave. If you catch yourself using Read on a reviewer file, you've reverted to the old serializing pattern — stop and dispatch by path.
Each reviewer will:
**Verify by:** line to every Gap and Improvement (required — see each reviewer file's "Verification Requirement" section)Critical: Reviewers are strictly advisory. They must NOT run tests, execute commands, or edit files. All tests are already passing by the time review runs — their job is code analysis only. They DO have access to WebFetch and WebSearch and should use them to validate edge cases, check API documentation, verify security patterns, or confirm language-specific behavior when they aren't confident from code reading alone.
Collect the four reviewer reports into one candidate list. Each finding carries a **Verify by:** line; assign each finding an opaque id (any stable string — reviewer name + sequence works).
Dedupe on byte-identical (path, line_range, Verify by:) tuples only. Do NOT dedupe on semantic similarity.
Semantic dedup ("these two findings sound alike, collapse them") silently drops true positives — different reviewers flagging the same line with different verify_by steps have different investigation paths, and losing one loses coverage. Only collapse when all three fields match byte-for-byte. The verifier handles near-duplicates downstream.
Before dispatching to the verifier, build a side-table keyed by id recording each finding's category (gap or improvement), verify_by (original reviewer text), and reviewer (which of the four emitted it). The verifier never sees this side-table — it's the main context's private state. You need it back in Step 7 to route confirmed verdicts to the Gaps vs. Improvements sections, in Step 8 to identify which confirmed items are implementable improvements, and in Step 9 to render Verify by text in the GAPS FOUND template. Losing this mapping breaks all three steps.
The deduped list goes to the verifier in Step 6.
Dispatch ONE general-purpose agent. As with the reviewers, pass the path — do NOT read verifier.md into this context. The candidate list IS passed inline (it's dynamic):
Agent subagent_type="general-purpose" description="Verify candidates" prompt="Read <abs>/reviewers/verifier.md — that file is your complete instructions; your FIRST action must be to Read it, then follow it exactly.\n\n## Candidate Findings\n\n[deduped list with ids]"
Do NOT include reviewer severity, category (Gap vs. Improvement), or reasoning chain in the candidate list. The verifier receives only: id, path, line_range, body, verify_by. Fresh context prevents the verifier anchoring on the reviewer's confidence. The stripped category, original verify_by, and reviewer are retained by main context in the Step 5 side-table — they're restored via id lookup after the verifier returns.
Do NOT verify findings in the main context. Main context's job is dispatch + assembly. The verifier is the single source of truth for classification.
Skip the verifier dispatch only if the candidate list is empty — in which case all reviewers returned APPROVED, and the overall verdict is APPROVED with zero findings.
The verifier returns one classification per candidate, each with verdict, quoted_evidence, evidence_location, tool_calls_made, confidence, and (for gaps) gap_reason.
Route by verdict, using the Step 5 side-table to recover each finding's original category:
quoted_evidence / evidence_location. Place the finding in the "Gaps" section if the side-table's category is gap, or the "Improvements to Implement" section if improvement.gap_reason verbatim.Present the unified summary:
## Review: [Epic/Task Name]
### Verification Summary
- Candidates sent to verifier: [N]
- Confirmed (kept): [X]
- Refuted (dropped): [Y]
- Gaps (surfaced, not blocking): [Z]
(If [Y] is consistently ~0 across reviews, the verifier is rubber-stamping — reconsider whether it earns its slot. If [Y] is meaningful and the counter-evidence below holds up, it's filtering real noise.)
### Conformance / Security / Quality / Performance Review
[Per-reviewer sections with only CONFIRMED findings, each carrying evidence]
### Gaps (confirmed blocking issues, if any)
[Consolidated blocking findings]
### Improvements to Implement (confirmed)
[Consolidated non-blocking improvements — one entry per unique issue, credit all reviewers who flagged it]
### 🔍 Couldn't verify (for your awareness, not blocking)
- [Area the reviewer wanted to check] — [verifier's specific gap_reason: tool + error, missing credential, inaccessible system]
### Refuted (dropped — audit trail, not blocking)
- `path:line` — [reviewer's one-line claim] → refuted: [verifier's quoted counter-evidence]
### Overall Verdict: APPROVED / GAPS FOUND
Conflict resolution is gone — the verifier already picked. Do not override verifier verdicts. If a refuted finding looks correct to you, that's a calibration signal about reviewer or verifier prompts, not a reason to un-drop.
Gap vs. dropped: Gaps are literal walls the verifier ran into — tool failures, missing credentials, inaccessible systems. Not "evidence was ambiguous" (that's refuted). If the gap section is populated with prose like "couldn't confirm" or "plausible but hard to verify," those entries are badly-classified and should be refuted instead — treat them as calibration signal and drop.
Collect ALL items categorized as Improvements that the verifier returned confirmed for. These are non-blocking findings that reviewers determined should be fixed before merge, and that the verifier independently proved hold. Gap-classified findings are surfaced to the user but not auto-implemented — the user decides whether to investigate.
You MUST implement every improvement. Do not list them and move on. Do not defer them to a follow-up. Do not describe them as "non-blocking suggestions" and skip them. Reviewer improvements are work items, not commentary.
For each improvement:
After implementing all improvements, run the project's test suite to verify nothing broke.
The only valid reason to skip an improvement is if the reviewer fundamentally misunderstood the code — their suggestion is incoherent or would break correctness. If skipping, you MUST document:
"Low priority," "not blocking," or "can be done later" are NOT valid reasons to skip.
The gate evaluates confirmed findings only. Refuted findings are dropped; gap-classified findings are surfaced to the user but don't block.
If APPROVED (no confirmed gaps remain and all confirmed improvements are implemented):
Announce: "Review passed. [N] candidates verified — [X] confirmed, [Y] refuted, [Z] gaps. All confirmed improvements implemented. Tests green (ran in Step 8 after improvements, or already green since review began and no code changed). Proceeding to finishing-branch."
Invoke gambit:finishing-branch directly via Skill tool. Because tests are known-green at this handoff, finishing-branch will skip its own test run (its Step 2). State this explicitly in your handoff announcement so finishing-branch knows tests were just verified.
If GAPS FOUND (any confirmed gap remains):
## Gaps Found — Cannot Proceed
### Verification Summary
[X] confirmed / [Y] refuted / [Z] gaps
### Issues by Reviewer (confirmed only)
[Consolidated list with the verifier's `quoted_evidence` and `evidence_location`, plus the reviewer's original `Verify by` text recovered from the Step 5 side-table, attributed to the `reviewer` recorded there]
### Recommended Fix Tasks
- [Concrete task description for each confirmed gap]
### 🔍 Couldn't verify (for your awareness, not blocking)
- [Area] — [verifier's specific gap_reason]
### Refuted (dropped — audit trail, not blocking)
- `path:line` — [reviewer's one-line claim] → refuted: [verifier's quoted counter-evidence]
Create fix tasks with TaskCreate for each confirmed gap. Set dependencies. Then STOP — return to gambit:executing-plans to implement fixes (for epic context) or address fixes directly (for task context).
Do NOT proceed to finishing-branch with confirmed gaps. Do NOT override verifier verdicts. Do NOT proceed with unimplemented confirmed improvements. Do NOT create fix tasks for refuted findings — they were disproved for a reason. Do NOT create fix tasks for gap-classified findings — a gap means the verifier hit a literal wall, not that a bug exists.
# Stage 1: Resolve reviewers/ path once (no file reads). ONE message, four Agent calls, parallel:
Agent general-purpose: "Read <abs>/conformance.md + follow it" + [brief] (parallel)
Agent general-purpose: "Read <abs>/security.md + follow it" + [brief] (parallel)
Agent general-purpose: "Read <abs>/quality.md + follow it" + [brief] (parallel)
Agent general-purpose: "Read <abs>/performance.md + follow it" + [brief] (parallel)
# All four read their own instructions in their own context and return findings.
# Dedupe on byte-identical tuples.
# Stage 2: ONE more Agent call, sequential — verifier reads its own file too:
Agent general-purpose: "Read <abs>/verifier.md + follow it" + [deduped candidate list]
# Verifier returns confirmed/refuted/gap per candidate. Assemble final verdict.
# Context detection: no epic, found refactoring Task #5
# TaskGet #5 → goal: "Extract connection-pool setup into a reusable factory"
# Brief includes task goal + success criteria + changed files
# All four reviewers dispatched with task-level brief
# Conformance checks the change addresses the stated goal
# Security checks no new vulnerabilities introduced
# Quality checks the code quality of the change
# Performance checks the change doesn't regress performance
# WRONG: Only dispatching two reviewers
"Security isn't relevant for this config change"
# WRONG: Reviewing or verifying in main context instead of dispatching
"Let me just quickly check the architecture myself..."
"I'll verify these findings inline instead of spawning another agent..."
# WRONG: Skipping the verifier
"The findings look good, let's just implement them"
# WRONG: Overriding a verifier verdict
"The verifier refuted this but I think it's real"
# WRONG: Feeding the verifier reviewer severity/reasoning
Agent general-purpose: "[verifier.md] + [full reviewer reports with severity tags]"
# Correct: strip to id/path/line/body/verify_by only
# WRONG: Semantic dedup before the verifier
"Findings 2 and 5 look similar, collapse them"
# Correct: byte-identical (path, line, verify_by) tuples only
# WRONG: Creating fix tasks for gap-classified findings
"The gap says we couldn't check LaunchDarkly — add a fix task to check it"
# Correct: gap is a tool-access boundary, not a bug
# WRONG: Pasting reviewer file contents into the prompt (wastes ~18k tokens/review)
Agent general-purpose: "[full conformance.md contents] + brief"
# Correct: pass the path — "Read <abs>/conformance.md and follow it" + brief
# WRONG: Inventing instructions from memory instead of pointing at the file
"I'll just tell the agent to check for security issues..."
# Correct: the agent reads the real reviewer file by path
# WRONG: Reading a reviewer file before each dispatch (serializes the agents)
Read conformance.md → Agent → Read security.md → Agent → ...
# Correct: no reads; four Agent calls in one message, each pointed at its file
# WRONG: Listing improvements without implementing them
"Non-blocking suggestions: consider extracting a helper..."
"None of these block the commit. Ready to commit."
# WRONG: Deferring improvements to follow-up work
"These are good ideas for a future PR."
# WRONG: Skipping review for "small" debugging fixes
"This was just a one-line fix, no need for four reviewers"
id, path, line_range, body, verify_by; fresh context prevents anchoringid → {category, verify_by, reviewer} side-table before dispatch; losing it breaks Steps 7–9 routingCommon rationalizations:
| Excuse | Reality | |--------|---------| | "I already reviewed during implementation" | You're biased — that's why agents exist | | "Security isn't relevant here" | Every project has an attack surface | | "Performance review is overkill" | Dispatch it anyway — it's parallel, costs nothing | | "The reviewer is being too strict" | The verifier handles this — trust the verdict | | "I can review faster myself" | Speed isn't the goal — unbiased review is | | "These are non-blocking suggestions" | Improvements are work items — implement them | | "Good ideas for a future PR" | No. Implement now, before this merge | | "None of these block the commit" | Improvements don't block the verdict, but they block the merge | | "It's just a small debugging fix" | Small fixes can introduce regressions. Review anyway | | "The verifier refuted this but I think it's real" | Refuted is refuted. If you disagree, that's a prompt-calibration signal, not a gate-override | | "This gap looks like a real bug to me" | A gap means the verifier hit a literal wall, not that a bug exists. If you suspect a bug, re-run review after fixing the tool-access issue the verifier cited | | "I'll just skip the verifier for this small change" | The verifier is where the quality comes from. Main context cannot verify without anchoring |
Verify by: on every finding(path, line, verify_by) tuplesid built before verifier dispatch (category + verify_by + reviewer)Called by:
gambit:executing-plans (Step 5, when all tasks complete)gambit:refactoring (mandatory, after final verification passes)/gambit:reviewCalls:
gambit:finishing-branch (if approved)Dispatches general-purpose agents (parallel, read-only). Each agent reads its own instruction file by path — main context never loads them:
reviewers/conformance.md — completeness, architecture, dead codereviewers/security.md — OWASP audit, secrets, auth, data exposurereviewers/quality.md — language idioms, linter circumvention, test qualityreviewers/performance.md — scaling, N+1, resource managementDispatches one verification agent (sequential, after reviewers, read-only), also by path:
reviewers/verifier.md — kill-or-keep each candidate finding with evidence, three-verdict enum (confirmed/refuted/gap)Call chain (epic context):
executing-plans (all tasks done) → review → finishing-branch
↓
(if gaps: STOP → fix → re-review)
Call chain (task context):
refactoring (changes verified) → review → finishing-branch
↓
(if gaps: STOP → fix → re-review)
testing
Use when creating a new skill, modifying an existing skill, writing or rewriting a SKILL.md file, auditing a skill's description for discoverability, or when user mentions "create a skill", "write a skill", "new skill", "modify skill", "improve skill", "edit the skill".
development
Use before any completion claim, success statement, or marking a task done. Triggers when about to say "Great!", "Perfect!", "Done", "All set", "Ready to commit", before creating a PR, before moving to the next task, or when code has changed since the last test run.
data-ai
Use when starting an isolated feature branch, when working on multiple features simultaneously, when experimenting without affecting the main workspace, or when a clean workspace is needed before beginning implementation. User phrases like "start a new branch", "set up a worktree", "isolated workspace", "work on feature X separately".
development
Use at the start of every session before any response or action. Also invoke whenever uncertain which gambit skill applies, when about to implement / debug / refactor / test / plan / brainstorm, or when a user request could match any gambit skill even at 1% probability.