skills/review/parallel-code-review/SKILL.md
Parallel 3-reviewer code review: Security, Business-Logic, Architecture.
npx skillsauth add notque/claude-code-toolkit parallel-code-review
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Orchestrate three specialized code reviewers (Security, Business Logic, Architecture) in true parallel using the Fan-Out/Fan-In pattern. Each reviewer runs independently with domain-specific focus, then findings are aggregated by severity into a unified BLOCK/FIX/APPROVE verdict.
Goal: Determine changed files and select appropriate agents before dispatching.
Step 1: Read repository CLAUDE.md to load project-specific conventions that reviewers must respect.
Step 2: List changed files
# For recent commits:
git diff --name-only HEAD~1
# For PRs:
gh pr view --json files -q '.files[].path'
Step 3: Select architecture reviewer agent based on the dominant language. This ensures the architecture reviewer applies idiomatic standards rather than generic advice, because different languages have fundamentally different design patterns and conventions.
| File Types | Agent |
|-----------|-------|
| .go files | golang-general-engineer |
| .py files | python-general-engineer |
| .ts/.tsx files | typescript-frontend-engineer |
| Mixed or other | Explore |
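The selection rule in the table can be sketched as a small function. This is a minimal sketch, not the skill's actual implementation: the function name is hypothetical, "dominant" is assumed to mean the most frequent recognized extension, and a tie between languages is assumed to count as "Mixed".

```python
from collections import Counter
from pathlib import PurePosixPath

# Mapping copied from the table above.
AGENT_BY_EXTENSION = {
    ".go": "golang-general-engineer",
    ".py": "python-general-engineer",
    ".ts": "typescript-frontend-engineer",
    ".tsx": "typescript-frontend-engineer",
}
FALLBACK_AGENT = "Explore"


def select_architecture_agent(changed_files: list[str]) -> str:
    """Pick the architecture reviewer for the dominant language in a change set."""
    votes: Counter = Counter()
    for path in changed_files:
        suffix = PurePosixPath(path).suffix
        if suffix in AGENT_BY_EXTENSION:
            votes[AGENT_BY_EXTENSION[suffix]] += 1
    if not votes:
        return FALLBACK_AGENT  # no recognized language: "other"
    ranked = votes.most_common(2)
    # Assumption: a tie between the top two agents is treated as "Mixed".
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return FALLBACK_AGENT
    return ranked[0][0]
```

Feeding it the output of either listing command above (one path per line, split into a list) would return the agent name to use in Phase 2.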
Optional enrichments (only when user explicitly requests):
Gate: Changed files listed, architecture reviewer agent selected. Proceed only when gate passes.
Goal: Launch all 3 reviewers in a single message for true concurrent execution.
Critical constraint: All three Task calls MUST appear in ONE response. Sending them sequentially triples wall-clock time and defeats the purpose of parallel review. This is not optional—parallelism is the entire value proposition of this skill.
Dispatch exactly these 3 agents. This is a read-only review—reviewers observe and report but never modify code.
Reviewer 1 -- Security
file:line references
Reviewer 2 -- Business Logic
file:line references
Reviewer 3 -- Architecture (using agent selected in Phase 1)
file:line references
Dimension lenses (all 3 reviewers apply, in addition to their role focus). Roles cover who reviews; lenses cover what classes of issue every review must touch. Folding these into the existing briefs catches doc-accuracy and scope creep without paying for extra agents:
| Lens | What it catches | Owner reviewer (primary) |
|------|-----------------|--------------------------|
| Doc-accuracy | Comments, docstrings, READMEs, or PR description that no longer match the code's actual behavior | Business Logic |
| Scope / simplicity | Changes beyond the stated task, speculative abstraction, dead options, YAGNI violations | Architecture |
Each reviewer reports lens findings inline with its role findings, tagged with the same [Reviewer] and file:line format so they flow through the schema gate unchanged.
Critical constraint: Always run all 3 reviewers regardless of perceived change simplicity. Config changes can expose secrets, "trivial" fixes can break authorization, and each reviewer's specialization catches issues the others miss. Let a reviewer report "no findings" rather than skip it: a clean result from a specialist is information too.
Gate: All 3 Task calls dispatched in a single message. Proceed only when ALL 3 return results—never issue a verdict from partial results, because the missing reviewer may hold the only CRITICAL finding. Partial results are worse than no review.
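The Task calls themselves are issued by the agent, not from Python, but the Fan-Out/Fan-In shape this gate enforces can be illustrated by analogy with `asyncio.gather`. In this sketch the reviewer callables are hypothetical stand-ins for Task calls; the point is that all reviewers start together and nothing is returned until every one has finished.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in for a Task call: takes the changed files, returns review markdown.
Reviewer = Callable[[list[str]], Awaitable[str]]


async def fan_out_fan_in(
    reviewers: dict[str, Reviewer], changed_files: list[str]
) -> dict[str, str]:
    """Start every reviewer at once and return only when all have finished."""
    names = list(reviewers)
    # gather() runs the coroutines concurrently. By default the first exception
    # propagates, so a failed reviewer never produces a partial result set.
    outputs = await asyncio.gather(
        *(reviewers[name](changed_files) for name in names)
    )
    return dict(zip(names, outputs))
```

Awaiting the reviewers one after another would produce the same dictionary, only with roughly three times the wall-clock time, which is the cost the single-message constraint is meant to avoid.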
Goal: Merge all findings into a unified severity-classified report.
Critical constraint: Never dump raw reviewer outputs as three separate sections—the reader should not have to mentally merge findings across reviewers. Your job is to synthesize, not summarize.
Step 1: Classify each finding by severity
| Severity | Meaning | Action |
|----------|---------|--------|
| CRITICAL | Security vulnerability, data loss risk | BLOCK merge |
| HIGH | Significant bug, logic error | Fix before merge |
| MEDIUM | Code quality issue, potential problem | Should fix |
| LOW | Minor issue, style preference | Nice to have |
Step 2: Deduplicate overlapping findings
Multiple reviewers may flag the same issue. Merge duplicates, keeping the highest severity. Overlap between reviewers is a feature (independent confirmation), but the report should consolidate it so readers see a unified issue once, not three times.
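A minimal sketch of the merge rule, under a simplifying assumption: two findings count as duplicates when they cite the same file:line. Real deduplication also needs judgment about whether the descriptions refer to the same issue. The `Finding` type and function name are hypothetical.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"CRITICAL": 3, "HIGH": 2, "MEDIUM": 1, "LOW": 0}


@dataclass
class Finding:
    reviewers: list[str]   # every reviewer that flagged the issue
    severity: str          # CRITICAL, HIGH, MEDIUM, or LOW
    location: str          # "file:line"
    description: str


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Merge findings at the same location, keeping the highest severity."""
    merged: dict[str, Finding] = {}
    for finding in findings:
        kept = merged.get(finding.location)
        if kept is None:
            # Copy so the caller's objects are not mutated.
            merged[finding.location] = Finding(
                list(finding.reviewers), finding.severity,
                finding.location, finding.description,
            )
            continue
        # Record the independent confirmation instead of repeating the issue.
        for reviewer in finding.reviewers:
            if reviewer not in kept.reviewers:
                kept.reviewers.append(reviewer)
        if SEVERITY_RANK[finding.severity] > SEVERITY_RANK[kept.severity]:
            kept.severity = finding.severity
            kept.description = finding.description
    return list(merged.values())
```

Keeping the list of reviewers on the merged finding preserves the confirmation signal while the report shows the issue once.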
Step 3: Build reviewer summary matrix
Include this matrix in every report so stakeholders see the severity distribution at a glance:
| Reviewer | CRITICAL | HIGH | MEDIUM | LOW |
|----------------|----------|------|--------|-----|
| Security | N | N | N | N |
| Business Logic | N | N | N | N |
| Architecture | N | N | N | N |
| **Total** | **N** | **N**| **N** | **N**|
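The matrix can be generated from the classified findings rather than filled in by hand. This sketch assumes each finding has been reduced to a hypothetical `(reviewer, severity)` pair; the function name is also an assumption.

```python
from collections import Counter

REVIEWERS = ["Security", "Business Logic", "Architecture"]
SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def build_summary_matrix(findings: list[tuple[str, str]]) -> str:
    """Render the reviewer-by-severity count table as markdown."""
    counts = Counter(findings)  # missing (reviewer, severity) pairs count as 0
    lines = [
        "| Reviewer | " + " | ".join(SEVERITIES) + " |",
        "|" + "---|" * (len(SEVERITIES) + 1),
    ]
    for reviewer in REVIEWERS:
        cells = [str(counts[(reviewer, sev)]) for sev in SEVERITIES]
        lines.append(f"| {reviewer} | " + " | ".join(cells) + " |")
    totals = [sum(counts[(r, sev)] for r in REVIEWERS) for sev in SEVERITIES]
    lines.append("| **Total** | " + " | ".join(f"**{t}**" for t in totals) + " |")
    return "\n".join(lines)
```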
Gate: All findings classified, deduplicated, and summarized. Proceed only when gate passes.
Goal: Refute each surviving finding independently before it reaches the report, so speculative or already-handled findings are dropped instead of shipped. This is the behavior that cut a native review from 10 findings to 3 (~70% noise removed) and directly targets the false-positive class that produced a 25% false-positive rate earlier.
Run this phase ONLY when the gate condition below is true. Otherwise skip straight to Phase 4 — small, clean reviews pay nothing.
Gate condition (run the verify pass when EITHER is true):
findings_count >= 4 OR any finding is severity CRITICAL or HIGH
Otherwise (≤3 findings, all MEDIUM/LOW): skip this phase. State in the report: Adversarial verify: SKIPPED (N findings, max severity MEDIUM — below gate).
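The gate condition is a direct boolean and can be written down as one. The function name is hypothetical; the input is assumed to be the list of severities of the aggregated findings.

```python
def should_run_verify(severities: list[str]) -> bool:
    """Return True when the adversarial verify pass must run."""
    has_serious = any(sev in ("CRITICAL", "HIGH") for sev in severities)
    return len(severities) >= 4 or has_serious
```

With three or fewer findings that are all MEDIUM or LOW, this returns False and the report states that the pass was skipped.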
Step 1: Dispatch one verification check per finding. Each finding gets an independent check (a Task call) prompted to REFUTE the finding, not confirm it. Dispatch the checks for a single review in one message so they run concurrently, matching the parallel pattern of Phase 2. The verifier's default stance is not-real; the finding survives only if the verifier cannot refute it.
Verifier prompt contract (per finding):
file:line, its claimed severity, and read access to the cited code.
CONFIRMED or REFUTED, a one-line justification with a file:line citation, and (for confirmed findings) a verified severity per Step 2.
Step 2: Verify-and-downgrade severity. Reviewers over-grade. The verifier may re-grade a confirmed finding's severity, recording original→final with a written justification. Severity changes only on evidence (e.g., "downgraded CRITICAL→MEDIUM: the unsanitized value is a server-issued UUID, not user input; auth/token.go:88"). Record every re-grade; never silently alter a severity.
Step 3: Keep only CONFIRMED findings. REFUTED findings are removed from the aggregate and listed separately as refuted (with the refutation reason) so the reader sees what was filtered and why — transparency, not a silent drop.
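Steps 2 and 3 together amount to partitioning the verifier output and logging every severity change. A minimal sketch, assuming a hypothetical `VerifyResult` record that mirrors the verifier prompt contract and findings keyed by a hypothetical id:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerifyResult:
    finding_id: str
    status: str                               # "CONFIRMED" or "REFUTED"
    justification: str                        # one line, citing file:line
    verified_severity: Optional[str] = None   # only set for CONFIRMED findings


def apply_verify_results(
    original_severity: dict[str, str], results: list[VerifyResult]
) -> tuple[dict[str, str], list[VerifyResult], list[str]]:
    """Split verifier output into confirmed findings, refuted findings, and re-grades."""
    confirmed: dict[str, str] = {}     # finding id -> final severity
    refuted: list[VerifyResult] = []   # listed in the report, excluded from the verdict
    regrades: list[str] = []           # "id: ORIGINAL→FINAL (justification)"
    for result in results:
        if result.status == "REFUTED":
            refuted.append(result)
            continue
        if result.status != "CONFIRMED":
            raise ValueError(f"unexpected verifier status: {result.status!r}")
        before = original_severity[result.finding_id]
        after = result.verified_severity or before
        confirmed[result.finding_id] = after
        if after != before:
            # Never alter a severity silently: every change is recorded.
            regrades.append(
                f"{result.finding_id}: {before}→{after} ({result.justification})"
            )
    return confirmed, refuted, regrades
```

The three return values map onto the report's Combined Findings, Refuted Findings, and "Severity re-grades" entries respectively.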
Honest framing: per-finding verify catches false positives at the structure-and-plausibility level — it refutes findings whose failure path is hypothetical or already-guarded. It is not a correctness oracle: it cannot prove a confirmed finding is a real exploitable bug, only that a refutation attempt failed. Treat CONFIRMED as "survived refutation," not "proven."
Cost guardrail: the verify pass scales linearly with finding count (one check per finding) and is bounded two ways: (1) the gate above skips it entirely for small/clean reviews, and (2) it runs at most once per finding (no verify-the-verifier recursion). For larger diffs, cap the number of verify checks to the right-sizing tier for the review (the tier rules and scripts/right-size-review.py live outside this skill; reference the tier the review was sized to). When the cap is hit, verify the highest-severity findings first and note the unverified remainder in the report.
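The capping rule can be sketched as a sort followed by a slice. The `cap` parameter is a placeholder for whatever limit the review's right-sizing tier defines; those tier values live outside this skill and are not reproduced here. The function name and the dict shape of a finding are assumptions.

```python
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def plan_verify_checks(findings: list[dict], cap: int) -> tuple[list[dict], list[dict]]:
    """Return (findings to verify, remainder to note in the report)."""
    # sorted() is stable, so findings of equal severity keep their original order.
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    return ranked[:cap], ranked[cap:]
```

Each finding in the first list gets exactly one verify check; the second list is reported as not verified rather than dropped.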
Gate: Verify pass was either skipped (gate condition false, stated in report) or completed (every finding marked CONFIRMED/REFUTED, severity re-grades recorded). Proceed only when gate passes.
Goal: Produce final report with clear recommendation.
Critical constraint: Every review must end with an explicit verdict. Ambiguity is a decision to merge untested code. Choose: BLOCK, FIX, or APPROVE.
The verdict is computed from confirmed findings only (after Phase 3.5). When the verify pass was skipped, all aggregated findings count as confirmed. Use each finding's final severity (post-downgrade), not its original.
Step 1: Determine verdict
| Condition | Verdict |
|-----------|---------|
| Any CRITICAL findings | BLOCK |
| HIGH findings, no CRITICAL | FIX (fix before merge) |
| Only MEDIUM/LOW findings | APPROVE (with suggestions) |
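The verdict table reduces to a short precedence check over the final (post-downgrade) severities of confirmed findings. The function name is hypothetical, and returning APPROVE for an empty list is an assumption, since the table does not cover the no-findings case.

```python
def compute_verdict(final_severities: list[str]) -> str:
    """Map the final severities of confirmed findings to BLOCK, FIX, or APPROVE."""
    if "CRITICAL" in final_severities:
        return "BLOCK"
    if "HIGH" in final_severities:
        return "FIX"
    # Only MEDIUM/LOW findings remain (assumption: no findings also approves).
    return "APPROVE"
```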
Step 2: Output structured report
## Parallel Review Complete
### Severity Matrix
| Severity | Count | Summary |
|----------|-------|---------|
| Critical | N | One-line aggregated summary |
| High | N | One-line aggregated summary |
| Medium | N | One-line aggregated summary |
| Low | N | One-line aggregated summary |
Counts above are CONFIRMED findings (post-verify). Details by reviewer below.
### Adversarial Verify
- Status: RAN (gate: N findings / max severity X) | SKIPPED (gate condition false)
- Confirmed: N | Refuted: N
- Severity re-grades: [original→final with justification, or "none"]
### Refuted Findings (filtered, not in verdict)
1. [Reviewer] Original finding - file:line — Refuted: [speculative | already-handled | not-actionable] ([citing file:line])
### Combined Findings
#### CRITICAL (Block Merge)
1. [Reviewer] Issue description - file:line
#### HIGH (Fix Before Merge)
1. [Reviewer] Issue description - file:line
#### MEDIUM (Should Fix)
1. [Reviewer] Issue description - file:line
#### LOW (Nice to Have)
1. [Reviewer] Issue description - file:line
### Summary by Reviewer
[Matrix from Phase 3]
### Recommendation
**VERDICT** - [1-2 sentence rationale]
Step 3: Validate the structured output (schema gate)
Each reviewer returns markdown. Before accepting a reviewer's output into the aggregate, run it through the deterministic schema validator so a malformed review is caught mechanically rather than trusted on sight:
# Pipe each reviewer's markdown directly (preferred — no shared temp file, so
# 3 parallel reviewers never overwrite each other):
echo "$reviewer_markdown" | python3 scripts/validate-review-output.py --type parallel -
# Or, if writing to disk, use a per-reviewer path (NOT a shared /tmp file):
python3 scripts/validate-review-output.py --type parallel /tmp/reviewer-<name>.md
Exit codes: 0 = structurally valid (verdict present, severity_matrix complete, every finding carries [Reviewer] and a file:line location); 1 = schema errors; 2 = unparseable; 3 = jsonschema not installed (pip install jsonschema).
This gate verifies the review is structurally well-formed (verdict, severity buckets, and locations present) — it does NOT verify findings completeness (no minimum count; a parser-dropped malformed finding leaves no trace) NOR that the severity_matrix counts agree with the findings array (no matrix↔findings cross-check).
On validation failure: retry that ONE reviewer exactly once, then stop.
MISSING: reviewer, MISSING: location) so it knows precisely what to repair.
Validate the retried result. If it still fails, STOP and report the malformed output. Proceed only on review data that passes the schema, because a verdict synthesized from broken findings is worse than no verdict.
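The retry-once rule can be sketched as follows. The validator command is the one shown above; everything else is an assumption made for illustration: the function names, the `rerun_reviewer` callback (a stand-in for re-dispatching that one reviewer with the error list), the idea that the validator's errors appear on stdout or stderr, and the choice to fail fast on exit code 3 because re-running a reviewer cannot fix a missing dependency.

```python
import subprocess
from typing import Callable

VALIDATOR_CMD = ["python3", "scripts/validate-review-output.py", "--type", "parallel", "-"]


def run_validator(markdown: str) -> tuple[int, str]:
    """Pipe one reviewer's markdown to the validator; return (exit code, output)."""
    proc = subprocess.run(VALIDATOR_CMD, input=markdown, capture_output=True, text=True)
    # Assumption: error messages may land on either stream, so both are returned.
    return proc.returncode, proc.stdout + proc.stderr


def validate_with_one_retry(
    markdown: str,
    rerun_reviewer: Callable[[str], str],
    validate: Callable[[str], tuple[int, str]] = run_validator,
) -> str:
    """Return schema-valid markdown, retrying the reviewer at most once."""
    code, errors = validate(markdown)
    if code == 0:
        return markdown
    if code == 3:
        # Environment problem (jsonschema not installed), not a reviewer problem.
        raise RuntimeError("validator unavailable: pip install jsonschema")
    # Exactly one retry, handing the errors back so the reviewer knows what to repair.
    retried = rerun_reviewer(errors)
    code, errors = validate(retried)
    if code != 0:
        raise RuntimeError(f"output still malformed after one retry (exit {code}):\n{errors}")
    return retried
```

Injecting `validate` keeps the retry logic testable without the script on disk; in normal use the default pipes through stdin, so parallel reviewers never share a temp file.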
Step 4: If BLOCK verdict, initiate re-review protocol
After user addresses CRITICAL issues, re-run ALL 3 reviewers (not just the one that found the issue) to verify:
Re-run all three because fixes often introduce new issues in adjacent code, and you need confirmation across all three domains that the solution is safe.
Gate: Each reviewer's output passed validate-review-output.py --type parallel (exit 0), structured report delivered with verdict. Review is complete.
Cause: One or more Task agents exceed execution time.
Solution:
Cause: Systemic issue (bad file paths, permission errors, context overflow).
Solution:
Cause: Two reviewers disagree on severity or interpretation of same code.
Solution:
findings_count >= 4 OR any finding is CRITICAL/HIGH; else skip. One check per finding, capped to the right-sizing tier; refuted findings filtered, severity re-grades recorded