skills/code-review/SKILL.md
Use this skill whenever the user wants a multi-agent review of local changes — triggers include "review my code", "review these changes", "do a code review", or "check my changes before I commit". Writes REVIEW.md. Do NOT use for an open PR by number (use /review) or a security-specific pass (use /security-review).
npx skillsauth add antjanus/skillbox code-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Runs five narrow-lane reviewer agents in parallel, then a verifier pass that keeps only evidence-backed findings, re-rates each by real impact, and distills the "fix first" shortlist — merged into REVIEW.md at the repo root, sorted by severity. Core principle: narrow-scope reviewers find more real issues than one broad reviewer covering everything. The skill only scopes, dispatches, and renders — the reviewers and verifier do all the judging.
Model tier: Opus or Sonnet for the agents; Haiku needs more guidance. Not for: security passes (/security-review), open PRs by number (/review), or trivial one-line/doc changes.
detect scope → dispatch 5 reviewers (parallel Task) → verifier (1 Task) → synthesize → REVIEW.md
The five reviewers, each in a tight lane (full prompts in reference/AGENTS.md — load when dispatching):
| Lane | Owns | Out of scope |
|------|------|--------------|
| basics | unused/dead code, stale comments, leftover debug stmts, typos, orphaned new symbols (declared + used in-file but no external call site) | architecture, naming, tests, readability |
| architecture | pattern fit with siblings (must read 3-5 sibling files first) + structural holes: swallowed errors, missing retry/timeout, context not threaded | hygiene, naming, tests, design-health (/simplify) |
| clarity | readability, naming, function length, nesting; same-scope pre-existing issues → [Pre-existing] | unused code, arch, tests, design-health |
| testing | coverage of the change + assertion strength (membership vs exact, weak truthiness, mocked oracles) | test-file organization, prod code quality |
| repo-hygiene | secrets, env-var doc drift, manifest/lockfile alignment, README/CLAUDE.md/AGENTS.md drift (reads manifests, lockfiles, .env.example, docs) | code-level cleanups, arch, tests, readability |
Resolve files to review, in priority order: explicit path arg → --staged (git diff --cached) → --branch <base> (git diff <base>...HEAD) → default (staged + unstaged, git status --porcelain + git diff HEAD) → fallback (on a feature branch with nothing dirty, git diff main...HEAD).
Filter out binary/image assets, generated files, build/dependency/venv output. Do NOT filter lockfiles, package manifests, env templates, or project docs — repo-hygiene needs them (the other lanes skip them by definition; pass the full list to all five).
Before dispatching, you MUST have a concrete file list + the diff text, and report the scope to the user ("Reviewing N files: …"). If scope is empty, stop — don't invent work.
Send all five Task calls in a single message so they run concurrently; use subagent_type: Explore for all (read-only traversal). Each prompt includes the file list + diff, states the lane and what to ignore, specifies the output format, and demands file:line specificity. Use the exact skeletons in reference/AGENTS.md — each agent returns findings as [SEVERITY] path:line blocks or NO FINDINGS.
After all five return, dispatch one verifier (single Task, Explore) with the file list, diff, and merged candidates. Full prompt: reference/AGENTS.md#verifier. It runs two stages, then distills:
Stage 1 — Evidence. Re-read each finding's cited file:line against current code:
[Unverified], demote one tier (Critical→Major→Minor→Nit→drop), add a Verifier note:. Skips Stage 2.Stage 2 — Impact (evidence-holds findings only). Re-rate by real blast radius on this change, independent of the lane reviewer's severity (each saw only its own lane): promote under-rated findings, demote over-rated ones. The re-rated severity is authoritative and replaces the lane reviewer's; a Verifier note: records the reasoning whenever severity moves; the original is not shown. [Pre-existing] findings keep their tag and are never promoted into the blocking buckets — they stay informational.
Distillation — "what to fix first". Select what to address before shipping (every Critical + highest-impact Majors; 3-6 items), returned as an ordered path:line — why it matters list. If only Minor/Nit survive: Nothing blocking — only polish remains.
The verifier returns the distillation block, the kept findings (agents' format, at re-rated severities), and a summary line: Verifier summary: kept N of M; promoted P; demoted K; dropped J (demoted = Stage-1 thin demotions + Stage-2 lowers, combined). The evidence stage mirrors Anthropic's two-stage filter; Datadog reports a 60%→13% false-positive reduction with it.
The verifier already judged evidence, severity, and priority — synthesis only formats what it returned:
## What to fix first at the top (above ## Critical), each item the verifier's one-line path:line — why it matters, in order. If it returned Nothing blocking — only polish remains., render that single line.[Unverified] tag + Verifier note:. Don't show the lane reviewer's original severity.## Nit block, terse one-liners, no cap.REVIEW.md at the repo root (git rev-parse --show-toplevel) with the Write tool, overwriting any existing one. Then print a chat one-liner: REVIEW.md written — X Critical, Y Major, Z Minor, W Nit, P Pre-existing.Always write REVIEW.md even when clean (zeroed header + All five reviewers returned NO FINDINGS. Ship it.). Synthesizer scope is structural, not editorial — no fresh findings, no re-judging severity, no rewordings, no dumping the report into chat. Anything beyond render/parse/group/dedupe/write belongs to the verifier.
REVIEW.md at repo root, section order: header (Generated / Scope / Agents / Verifier: kept N of M; promoted P; demoted K; dropped J / Total findings) → ## What to fix first → ## Critical → ## Major → ## Minor (full blocks: [lane] line — issue, then risk/evidence + fix sub-bullets, grouped by file; re-rated and [Unverified] findings carry a Verifier note:) → ## Nit (grouped by file, no cap) → ## Pre-existing (informational) → Agents that found nothing: footer.
Full worked report (promoted finding, distillation, grouped nits, clean-report form): reference/EXAMPLE-REVIEW.md.
✅ Good: all five Task calls in one message → wait for all → verifier Task → write REVIEW.md → chat shows only REVIEW.md written — 2 Critical, 1 Major, 4 Minor, 3 Nit. A good finding is specific: [Major] src/lib/fetch-user:42 — new timeout error path has no test; risk: silent retry-loop exit loses the request; fix: add "retries once on timeout" test asserting the retry counter.
❌ Bad: agents dispatched sequentially (wastes time — they're independent); the full report dumped into chat instead of REVIEW.md; a finding like "could use more tests, consider error cases" (no file:line, no risk, no fix); lanes bleeding into each other (basics flagging naming, clarity flagging an unused import) — duplicates findings and dilutes attribution.
## What to fix first sits at the top; severities reflect the verifier's impact re-rating (moves carry a Verifier note:), not the lane.file:line, a concrete fix, and evidence the verifier confirmed..env.example/lockfile read.reference/AGENTS.md + "no preamble, no summary — only findings in the specified format."Lane overlap, env-var false positives, fixture vs real-secret, empty scope, huge diffs, verifier kept-everything, empty "what to fix first": reference/TROUBLESHOOTING.md.
Pairs with /security-review (security concerns), /review (open PRs), track-session (track fixes), ideal-react-component (React findings). Typical loop: edit → /code-review → read REVIEW.md, fix Critical/Major → re-run (overwrites) → commit when clean or only nits remain. Add REVIEW.md to .gitignore. Not a replacement for CI linting, static analysis, or human PR review — it's a pre-commit pass that catches what linters and humans miss. Maintainers: keep an evals/ dir with representative diffs (clean / real bug / tempting false positive); re-run after every prompt edit.
development
EXPERIMENTAL. Mine recent Claude Code transcripts for friction events, cluster them by active skill, propose patches for skills with 3+ friction events, validate each patch via headless replay, scrub the report through /publish-check, and present an EVOLUTION_REPORT.md for human review on a branch (never auto-merge). Use when asked to "evolve my skills", "audit skills against recent friction", "propose skill improvements from transcripts", "run the skill evolution pipeline", or as part of a weekly skill-quality cadence.
testing
Manual QA tracking — things tests can't verify. Use when asked to "create a QA list", "set up QA for this project", "what should I QA", "track manual QA", "audit the QA list", or "start manual QA".
development
Multi-source web research with cited synthesis in chat. Use when asked to "research X", "deep research on Y", "deep dive on Z", "investigate this topic", "compare X and Y", "pros and cons of X", or "survey the landscape of Y".
testing
Resume work, track progress across sessions, verify completion. Use when asked to "resume work", "pick up where I left off", "what was I doing", "save progress", or "are we done". For multi-session tasks.