skills/review/systematic-code-review/SKILL.md
4-phase code review: UNDERSTAND, VERIFY, ASSESS risks, DOCUMENT findings.
npx skillsauth add notque/claude-code-toolkit systematic-code-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Systematic 4-phase code review: UNDERSTAND changes, VERIFY claims against actual behavior, ASSESS security/performance/architecture risks, DOCUMENT findings with severity classification. Each phase has an explicit gate that must pass before proceeding because skipping phases causes missed context, incorrect conclusions, and incomplete risk assessment.
| Signal | Load These Files | Why |
|---|---|---|
| implementation patterns | go-review-patterns.md | Loads detailed guidance from go-review-patterns.md. |
| tasks related to this reference | receiving-feedback.md | Loads detailed guidance from receiving-feedback.md. |
| tasks related to this reference | severity-classification.md | Loads detailed guidance from severity-classification.md. |
Goal: Map all changes and their relationships before forming any opinions.
Step 1: Read CLAUDE.md
Step 2: Read every changed file
Step 3: Identify dependencies
Step 3a: Caller Tracing (mandatory when diff modifies function signatures, parameter semantics, or introduces sentinel/special values)
When the change modifies how a function/method is called or what parameters mean:
.GetEvents( not just GetEvents) across the entire repo. For Go repos, prefer gopls go_symbol_references via ToolSearch("gopls") — it's type-aware and catches interface implementations.r.FormValue, r.URL.Query): user-controlled — ANY string including sentinel values like "*""*" meaning "all/unfiltered") that bypass security filtering.This step catches:
Step 4: Document scope
PHASE 1: UNDERSTAND
Changed Files:
- [file1.ext]: [+N/-M lines] [brief description of change]
- [file2.ext]: [+N/-M lines] [brief description of change]
Change Type: [feature | bugfix | refactor | config | docs]
Scope Assessment:
- Primary: [what's directly changed]
- Secondary: [what's affected by the change]
- Dependencies: [external systems/files impacted]
Caller Tracing (if signature/parameter semantics changed):
- [function/method]: [N] callers found
- [caller1:line] — parameter validated: [yes/no]
- [caller2:line] — parameter validated: [yes/no]
- Unvalidated paths: [list or "none"]
Questions for Author:
- [Any unclear aspects that need clarification]
Gate: All changed files read (not just some — reading 2 of 5 files and saying "I get the gist" fails this gate), scope fully mapped, callers traced (if applicable). Proceed only when gate passes.
Verification means execution, not reasoning. Run the command. Do not reason about whether the command would pass. Do not summarize the expected output. Execute the check, paste the exit code, paste the relevant output. A verification phase that produces a verdict without an observed tool result is not a verification — it is a guess with a rigor aesthetic.
Goal: Validate all assertions in code, comments, and PR description against actual behavior.
Step 1: Run tests
Step 2: Verify claims
Step 3: Document verification
PHASE 2: VERIFY
Claims Verification:
Claim: "[Quote from comment or PR description]"
Verification: [How verified - test run, code trace, etc.]
Result: VALID | INVALID | NEEDS-DISCUSSION
Test Execution:
$ [test command]
Result: [PASS/FAIL with summary]
Behavior Verification:
- Expected: [what change claims to do]
- Actual: [what code actually does]
- Match: YES | NO | PARTIAL
Gate: All assertions in code/comments verified against actual behavior. Tests executed with output captured. Proceed only when gate passes.
Goal: Evaluate security, performance, and architectural risks specific to these changes.
Step 1: Security assessment
Step 2: Performance assessment
Step 3: Architectural assessment
Step 4: Extraction severity escalation
Step 5: Document assessment
PHASE 3: ASSESS
Security Assessment:
SQL Injection: [N/A | CHECKED - how verified | ISSUE - details]
XSS: [N/A | CHECKED - how verified | ISSUE - details]
Input Validation: [N/A | CHECKED - how verified | ISSUE - details]
Auth: [N/A | CHECKED - how verified | ISSUE - details]
Secrets: [N/A | CHECKED - how verified | ISSUE - details]
Findings: [specific issues or "No security issues found"]
Performance Assessment:
Findings: [specific issues or "No performance issues found"]
Architectural Assessment:
Findings: [specific issues or "Architecturally sound"]
Risk Level: LOW | MEDIUM | HIGH | CRITICAL
Gate: Security, performance, and architectural risks explicitly evaluated (not skipped or hand-waved). Proceed only when gate passes.
Goal: Refute each candidate finding before it reaches the DOCUMENT report, so speculative or already-handled findings are filtered out instead of written up. This mirrors the per-finding adversarial verify that cut a parallel review from 10 findings to 3 and targets the false-positive class that drives over-grading.
Run this phase ONLY when the gate condition is true. Otherwise skip to Phase 4 — small, clean reviews pay nothing.
Gate condition (run the verify pass when EITHER is true):
candidate_findings_count >= 4 OR any finding is BLOCKING
Otherwise (≤3 findings, none BLOCKING): skip this phase and note in DOCUMENT: Finding verify: SKIPPED (N findings, none blocking — below gate).
Step 1: Refute each finding. For every candidate finding from Phases 2–3, attempt to refute it against the actual code. The default stance is not-real; a finding survives only if refutation fails. Mark REFUTED when:
Step 2: Verify-and-downgrade severity. Reviewers over-grade; the verify step may re-grade a confirmed finding's tier (BLOCKING / SHOULD FIX / SUGGESTIONS) with a written justification, recording original→final. Change severity only on evidence (e.g., "BLOCKING→SHOULD FIX: the value is a server-issued UUID, not user input — auth/token.go:88"). Never silently alter a tier.
Step 3: Keep only confirmed findings for DOCUMENT. List refuted findings separately with their refutation reason so the reader sees what was filtered and why.
Honest framing: this verify is structure-and-plausibility level — it refutes findings whose path is hypothetical or already-guarded. It is not a correctness oracle: a CONFIRMED finding survived a refutation attempt, it is not thereby proven a real exploitable bug.
Cost guardrail: the pass scales with finding count (one refutation per finding) and is bounded — the gate skips it for small/clean reviews, and each finding is verified at most once. For large reviews, cap verification to the right-sizing tier the review was sized to (tier rules and scripts/right-size-review.py live outside this skill); verify highest-severity findings first and note any uncapped remainder.
Gate: Verify pass was skipped (gate false, noted in DOCUMENT) or completed (each candidate finding CONFIRMED/REFUTED, severity re-grades recorded). Proceed only when gate passes.
Goal: Produce structured review output with clear verdict and rationale.
Report facts without self-congratulation. Show command output rather than describing it. Be concise but informative because the review consumer needs actionable findings, not commentary.
Only flag issues within the scope of the changed code because suggesting features outside PR scope is over-engineering — but DO flag all issues IN the changed code even if fixing them requires touching other files. No speculative improvements.
When classifying severity, use the Severity Classification Rules below and classify UP when in doubt because it is better to require a fix and have the author push back than to let a real issue slip through as "optional."
Document confirmed findings only (after Phase 3.5). When the verify pass was skipped, all candidate findings count as confirmed. Use each finding's final tier (post-downgrade).
PHASE 4: DOCUMENT
Review Summary:
Files Reviewed: [N]
Lines Changed: [+X/-Y]
Test Status: [PASS/FAIL/SKIPPED]
Risk Level: [LOW/MEDIUM/HIGH/CRITICAL]
Finding Verify: [RAN — confirmed N / refuted N; re-grades: original→final or "none"] | [SKIPPED — N findings, none blocking]
Refuted (filtered, not in verdict):
1. [Original finding with file:line] — Refuted: [speculative | already-handled | not-actionable] ([citing file:line])
Findings (use Severity Classification Rules - when in doubt, classify UP):
BLOCKING (cannot merge - security/correctness/reliability):
1. [Issue with file:line reference and category from rules]
SHOULD FIX (fix unless urgent - patterns/tests/debugging):
1. [Issue with file:line reference and category from rules]
SUGGESTIONS (author's choice - purely stylistic):
1. [Suggestion with benefit - only if genuinely optional]
POSITIVE NOTES:
1. [Good practice observed]
Verdict: APPROVE | REQUEST-CHANGES | NEEDS-DISCUSSION
Rationale: [1-2 sentences explaining verdict]
Validate the structured output (schema gate)
Before accepting this review as complete, run the DOCUMENT output through the deterministic schema validator so a malformed review is caught mechanically rather than trusted on sight:
# Write the review markdown to a temp file, then validate.
python3 scripts/validate-review-output.py --type systematic /tmp/review-output.md
# Or pipe directly:
echo "$review_markdown" | python3 scripts/validate-review-output.py --type systematic -
Exit codes: 0 = structurally valid (verdict in {APPROVE, REQUEST-CHANGES, NEEDS-DISCUSSION}, risk_level present, every finding carries a file:line location); 1 = schema errors; 2 = unparseable; 3 = jsonschema not installed (pip install jsonschema).
This gate verifies the review is structurally well-formed (verdict, risk level, and finding locations present) — it does NOT verify findings completeness; a finding the parser couldn't structure is silently dropped rather than flagged.
On validation failure: retry exactly once, then stop.
MISSING: risk_level, MISSING: location) to repair the precise defect.Validate the regenerated output. If it still fails, STOP and report the malformed output — deliver only a review that passes the schema, because a verdict built on findings without locations or a missing risk level is unactionable.
After producing the review, remove any temporary analysis files, notes, or debug outputs created during review because only the final review document should persist.
Gate: Output passed validate-review-output.py --type systematic (exit 0). Structured review output with clear verdict. Review is complete.
When conflicting information exists, trust in this order:
Three tiers: BLOCKING (cannot merge — security, correctness, reliability), SHOULD FIX (fix unless urgent — patterns, tests, debugging), SUGGESTIONS (genuinely optional — style, naming, micro-optimizations). When in doubt, classify UP.
See references/severity-classification.md for full classification tables, the decision tree, and common misclassification examples.
Watch for patterns that linters miss: type export design, concurrency patterns (batch+callback, loop variable capture, commit callbacks), resource management (defer placement, connection pool reuse), metrics pre-initialization, testing deduplication, and unnecessary function extraction.
For projects using shared organization libraries: check for manual SQL row iteration instead of helpers, incorrect assertion depth, raw sql.Open() in tests, dead migration files, and database-specific naming violations.
See references/go-review-patterns.md for full checklists and red flags.
When receiving feedback: read completely, restate requirement to confirm understanding, verify against codebase, evaluate technical soundness, respond with reasoning or just fix it. Never performative agreement. Apply YAGNI check before implementing "proper" features. Stop and clarify before implementing anything unclear — items may be related.
See references/receiving-feedback.md for the full reception pattern, pushback examples, implementation order, and external vs internal reviewer handling.
Cause: Missing context about the change or its purpose Solution:
Cause: Tests fail during Phase 2 verification Solution:
Cause: Cannot determine severity of an issue Solution:
User says: "Review this PR" Actions:
User says: "Check this before we merge" Actions:
documentation
Document translation: quick/normal/refined modes with chunked parallel subagents and glossary support.
development
AI image generation: Gemini and Nano Banana backends; single/series/batch workflows with prompt-to-disk.
testing
Unified voice content generation pipeline with mandatory validation and joy-check. 13-phase pipeline: LOAD, GROUND, STATS-CHECKPOINT, GENERATE, HOOK-GATE, VALIDATE, REFINE, VARIETY-GATE, JOY-CHECK, ANTI-AI, CLOSE-GATE, OUTPUT, CLEANUP. Use when writing articles, blog posts, or any content that uses a voice profile. Use for "write article", "blog post", "write in voice", "generate content", "draft article", "write about".
documentation
Critique-and-rewrite loop for voice fidelity validation.