skills/code-review/SKILL.md
Parallel multi-agent code review. Launch reviewer team, synthesize findings, auto-fix blocking issues, loop until clean. Use when: "review this", "code review", "is this ready to ship", "check this code", "review my changes". Trigger: /code-review, /review.
npx skillsauth add phrazzld/spellbook code-review
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Multi-provider, multi-harness code review. You are the marshal — read the diff, select reviewers, craft prompts, dispatch everything in parallel, synthesize results, fix blockers, loop until clean.
Delegation floor applies: probe the roster first; dispatch two or more
providers for substantive work; direct solo only for mechanical, emergency,
user-forbidden, or fewer-than-two-providers cases. See
harnesses/shared/AGENTS.md (Roster).
Local lane guidance: Use independent adversarial review lanes calibrated to what would embarrass us in production; give reviewers the diff, acceptance criteria, and risk lens, not author reasoning.
Completion evidence core applies: use harnesses/shared/AGENTS.md
(Completion Evidence) as the universal evidence shape, then fill the local
fields below for review verdicts.
Read the diff. git diff $BASE...HEAD (default base: main or master).
Classify: what changed? (API, UI, tests, infra, security, perf, data model, etc.)
Load review pattern catalogs. If the target repo has a local
references/review-patterns.md or equivalent repo-local pattern catalog,
load it before dispatching reviewers. When a local entry links to shared
Harness Kit references such as
references/bounded-payload-discipline.md, include that reference in the
reviewer prompt and require findings to cite the local entry ID.
Select internal reviewers (lens map). Do NOT hand-pick. Run the
selection algorithm in references/bench-map.yaml:
- git diff --name-only $BASE...HEAD → changed files
- Start from default; for every rule whose glob matches any changed file,
  apply replace, union add, de-dupe, then cap at 5 (critic pinned)
- See references/bench-map.md for the full algorithm and
  references/internal-bench.md for each reviewer lens.

Then craft a tailored prompt per selected reviewer. Load
references/deep-review-lens.md when the diff needs root-cause,
provenance, or long-running autoreview discipline.

Dispatch all three tiers in parallel:
| Tier | What | How |
|------|------|-----|
| Internal bench | 3-5 Explore sub-agents with philosophy lenses | Agent tool, tailored prompts |
| Thinktank review | 10 agents, 8 model providers | thinktank review CLI. See references/thinktank-review.md |
| Cross-harness | Two or more available non-lead providers from .harness-kit/agents.yaml | See references/cross-harness.md |
Thinktank-specific rule: wait for the process to exit, or for
trace/summary.json to reach complete or degraded with a
run_completed event in trace/events.jsonl, before you consume the run.
Mid-run output directories are not final artifacts.
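The completion rule above can be sketched as a small check. This is a minimal sketch, not the thinktank API: the top-level "status" key in trace/summary.json and the "event" key in each line of trace/events.jsonl are assumed field names. The source names only the two files, the complete and degraded states, and the run_completed event.

```python
import json
from pathlib import Path

# States the rule treats as final.
FINAL_STATES = {"complete", "degraded"}


def run_is_final(run_dir: str) -> bool:
    """Return True only when the run directory looks finished.

    Assumed schema (hypothetical): summary.json has a "status" key and
    each events.jsonl line is a JSON object with an "event" key.
    """
    trace = Path(run_dir) / "trace"
    summary_path = trace / "summary.json"
    events_path = trace / "events.jsonl"
    if not summary_path.is_file() or not events_path.is_file():
        return False
    try:
        summary = json.loads(summary_path.read_text())
    except json.JSONDecodeError:
        return False  # the file may still be mid-write
    if not isinstance(summary, dict) or summary.get("status") not in FINAL_STATES:
        return False
    for line in events_path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a partially written trailing line
        if isinstance(event, dict) and event.get("event") == "run_completed":
            return True
    return False
```

Waiting for the process to exit remains the simpler signal; a check like this only matters when polling a run started elsewhere.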
Synthesize. Collect all outputs. Deduplicate findings across tiers. Rank by severity: blocking (correctness, security) > important (architecture, testing) > advisory (style, naming).
Verdict. If no blocking findings → Ship. If blocking findings exist → fix loop (below).
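The synthesize and verdict steps can be sketched as follows. The Finding shape and the dedup key (file, line, normalized message) are assumptions made for illustration; the source specifies only deduplication across tiers, the severity order, and the blocking rule.

```python
from dataclasses import dataclass

# Severity order from the text: blocking > important > advisory.
SEVERITY_RANK = {"blocking": 0, "important": 1, "advisory": 2}


@dataclass(frozen=True)
class Finding:
    """Hypothetical normalized shape for one reviewer finding."""
    file: str
    line: int
    severity: str
    message: str
    tier: str


def synthesize(findings):
    """Deduplicate across tiers, keep the most severe copy, rank by severity."""
    best = {}
    for f in findings:
        # Assumed dedup key: same location and same message modulo case/spacing.
        key = (f.file, f.line, " ".join(f.message.lower().split()))
        kept = best.get(key)
        if kept is None or SEVERITY_RANK[f.severity] < SEVERITY_RANK[kept.severity]:
            best[key] = f
    return sorted(
        best.values(),
        key=lambda f: (SEVERITY_RANK[f.severity], f.file, f.line),
    )


def verdict(findings):
    """Any blocking finding sends the review into the fix loop."""
    return "fix-loop" if any(f.severity == "blocking" for f in findings) else "ship"
```

Real findings rarely match on exact text, so the marshal will usually need judgment (or a looser key) to merge duplicates that different tiers phrased differently.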
For each blocking finding, spawn a builder sub-agent with the specific file:line and fix instruction. Builders fix strategically (Ousterhout) and simply (grug). Builder fixes, runs tests, commits.
After all fixes land, re-dispatch all three review tiers. Full re-review, not a spot-check. Loop until no blocking findings remain. Max 3 iterations — escalate to human if still blocked.
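The control flow of the fix loop can be sketched like this. The review and fix callables are hypothetical stand-ins for dispatching all three tiers and spawning a builder sub-agent, and the sketch reads "max 3 iterations" as three fix rounds, each followed by a full re-review.

```python
from typing import Callable, List

MAX_ITERATIONS = 3  # from the text: escalate to a human if still blocked


def fix_loop(review: Callable[[], List[str]], fix: Callable[[str], None]) -> str:
    """Run fix rounds until no blocking findings remain or the cap is hit.

    review() performs a full review and returns the blocking findings;
    fix(finding) applies one fix. Both are hypothetical placeholders.
    """
    blocking = review()
    for _ in range(MAX_ITERATIONS):
        if not blocking:
            return "ship"
        for finding in blocking:
            fix(finding)
        # Full re-review of all tiers, not a spot-check of the fixed lines.
        blocking = review()
    return "ship" if not blocking else "escalate"
```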
Trigger: the diff touches user-facing patterns — .tsx, .jsx, pages/,
app/, routes/, api/, endpoints/, or component directories.
Rule: at least one reviewer must exercise the affected routes/components. Ship verdict is blocked until live verification passes.
Skip: pure refactors, config-only, test-only, backend-only with no user-facing surface.
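The trigger above can be approximated from the changed-file list. This is a rough sketch: treating "components" as the name of a component directory is an assumption, and the Skip cases (pure refactor, test-only, and so on) are not encoded because they need judgment about the diff, not just its paths.

```python
from pathlib import PurePosixPath

USER_FACING_SUFFIXES = {".tsx", ".jsx"}
# "components" stands in for "component directories" (assumed name).
USER_FACING_DIRS = {"pages", "app", "routes", "api", "endpoints", "components"}


def needs_live_verification(changed_files):
    """Return True if any changed path matches a user-facing pattern."""
    for name in changed_files:
        path = PurePosixPath(name)
        if path.suffix in USER_FACING_SUFFIXES:
            return True
        # Compare directory components only, not the file name itself.
        if USER_FACING_DIRS.intersection(path.parts[:-1]):
            return True
    return False
```

Feed it the output of git diff --name-only $BASE...HEAD, split into lines.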
Trigger: the diff adds or materially changes an executable path —
script entrypoint, CLI, package.json / make target, Dagger function,
migration, runner, responder, or job entrypoint.
Rule: the marshal or a reviewer must either:
Helper tests, fixture-only critics, and adjacent lanes do not count.
Verdict: if the path was not exercised, label it an
unverified runtime path. That is a blocking finding for a Ship
verdict unless the diff is explicitly framed as scaffolding only.
LLMs optimize for plausibility, not correctness. Reviewers must hunt for:
After review passes, if diff > 200 LOC net:
For meaningful implementation diffs, apply a harsh maintainability lens before issuing a Ship verdict:
- any casts, deep nesting, and codegen residue

Route behavior-preserving cleanup to /refactor; block Ship when the slop or
spaghetti makes the changed behavior harder to reason about.
Recommend /hardening when the diff exposes a test-strength gap:
- /hardening property for broad input domains, parser/serializer laws,
  normalization, state transitions, totals, allocation, scheduling, sorting, or
  round trips;
- /hardening mutation when branch-heavy changed code is covered only by
  shallow examples or assertion-light tests;
- /hardening acceptance when user-facing fixtures, Gherkin, API examples,
  CLI transcripts, or golden paths should fail if important values change;
- /hardening risk when complexity and coverage need to pick the next target
  before a Ship verdict is credible.

Do not block Ship merely because a hardening mode exists. Block only when the changed behavior is high-risk, the current tests are visibly weak, or the review cannot name a live evidence path that would catch the likely failure.
Every Ship or Conditional verdict must include:
Completion evidence core applies: behavior changed or verified, live evidence,
exact command/path/route, repo-fit check, and residual risk. See
harnesses/shared/AGENTS.md (Completion Evidence).
Local fields include hardening recommendation / waiver.
## Completion Gate
- Exact end-user behavior changed: behavior or internal operator behavior the diff changes.
- Evidence that proves it: review finding, test output, QA artifact, or command result behind the verdict.
- Exact command/path/route exercised: command, route, file path, artifact, or tool call inspected.
- Repo-fit check: live repo pattern or contract the verdict compared against.
- Hardening recommendation / waiver: mode recommended, mode run, or waiver reason.
- Residual risk: unverified path, accepted survivor, or none with reason.
If there is no end-user behavior because the diff is internal, say so and name the developer/operator behavior that changed instead.
After the final verdict, append one JSON line to .groom/review-scores.ndjson
in the target project root (create .groom/ if needed):
{"date":"2026-04-06","pr":42,"correctness":8,"depth":7,"simplicity":9,"craft":8,"verdict":"ship","providers":["claude","thinktank","codex","gemini"]}
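Appending a line in that shape can be sketched as below. The function name and parameters are hypothetical; the file path, the field names, and the three allowed verdict values come from this document.

```python
import json
from datetime import date
from pathlib import Path

ALLOWED_VERDICTS = {"ship", "conditional", "dont-ship"}


def append_review_score(project_root, pr, scores, verdict, providers):
    """Append one JSON line to .groom/review-scores.ndjson and return it.

    pr may be None (written as null) when the branch has no PR.
    scores is a dict such as {"correctness": 8, "depth": 7, ...}.
    """
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    groom_dir = Path(project_root) / ".groom"
    groom_dir.mkdir(parents=True, exist_ok=True)  # create .groom/ if needed
    record = {
        "date": date.today().isoformat(),
        "pr": pr,
        **scores,
        "verdict": verdict,
        "providers": list(providers),
    }
    line = json.dumps(record, separators=(",", ":"))
    with (groom_dir / "review-scores.ndjson").open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line
```

Opening in append mode keeps earlier scores intact, which is what a trend reader needs.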
- pr is the PR number, or null when reviewing a branch without a PR.
- verdict: "ship", "conditional", or "dont-ship".
- providers: which review tiers contributed.
- /groom reads it for quality trends.

After scoring, record the verdict as a git ref so /deliver --polish-only and
pre-merge hooks can enforce review requirements without GitHub PRs.
source scripts/lib/verdicts.sh
verdict_write "<branch>" '{"branch":"<branch>","base":"<base>","verdict":"<ship|conditional|dont-ship>","reviewers":[...],"scores":{...},"sha":"<HEAD-sha>","date":"<ISO-8601>"}'
- … /deliver --polish-only and the pre-merge hook).
- sha field MUST be git rev-parse HEAD at the time of review. If the branch
  gets new commits after review, the verdict is stale and /deliver --polish-only will re-trigger review.
- … refs/verdicts/<branch> and sync via git push/fetch.
- … .evidence/<branch>/<date>/review-synthesis.md and
  .evidence/<branch>/<date>/verdict.json for browsability.
- … HARNESS_KIT_NO_REVIEW=1) is handled at the caller (pre-merge-commit hook), never inside /code-review.

Run the snippet from the target project root. Skip this step if
scripts/lib/verdicts.sh does not exist there (Harness Kit-only feature, not
expected in downstream repos).
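The staleness rule for the sha field can be sketched as a comparison against the current HEAD. This is an illustration only, not how scripts/lib/verdicts.sh or /deliver implements the check; both function names are hypothetical.

```python
import json
import subprocess


def current_head(repo_dir="."):
    """Return the commit SHA printed by `git rev-parse HEAD` in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def verdict_is_stale(verdict_json, head_sha):
    """A verdict is stale when its recorded sha differs from the current HEAD.

    A verdict with no sha field is treated as stale (an assumption).
    """
    verdict = json.loads(verdict_json)
    return verdict.get("sha") != head_sha
```

A caller would pass the stored verdict JSON together with current_head() and re-trigger review when the result is True.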
- git diff main...HEAD is the scope.
- review.md, summary.md, and agents/*.md may not exist until late. Watch
  stderr progress or trace/summary.json, not just the directory listing.
  thinktank review eval is broken in 6.3.0, so consume the final stdout JSON
  or the finished files.