skills/code-review/SKILL.md
Structured code reviews with severity-ranked findings and deep multi-agent mode. Use when performing a code review, auditing code quality, or critiquing PRs, MRs, or diffs.
npx skillsauth add iliaal/ai-skills code-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Stage 1 -- Spec compliance (do this FIRST): verify the changes implement what was intended. Check against the PR description, issue, or task spec. Identify missing requirements, unnecessary additions, and interpretation gaps. If the implementation is wrong, stop here -- reviewing code quality on the wrong feature wastes effort.
Stage 2 -- Code quality: only after Stage 1 passes, review for correctness, maintainability, security, and performance.
Pre-flight: verify git rev-parse --git-dir exists before anything else. If not in a git repo, ask for explicit file paths.
When no specific files are given, resolve scope via this fallback chain:
git diff --name-only for unstaged + staged)git diff --name-only HEAD)git ls-files --others --exclude-standard) -- new files are often most review-worthyExclude: lockfiles, minified/bundled output, vendored/generated code.
When the review target is a branch (not a working-tree diff), the comparison range is the merge-base, not the working-tree delta — resolve it before reading any diff. Full fallback chain (PR base → default-branch inference → origin/* fallback list → git merge-base → shallow-clone retry) and the "never fall back to git diff HEAD" rule in scope-resolution.md.
When a branch is stacked on another unmerged branch, git merge-base HEAD <default-branch> over-covers — it sweeps in the sibling branch's commits, fabricating findings on files this change doesn't touch. Prefer the hosting platform's authoritative base SHA (PR/MR base_sha, or gh pr diff) over a locally computed merge-base. After the run, intersect every finding's path with the change's --name-only set and discard off-scope ones.
Run this BEFORE reading the full diff. Use metadata only (git diff --stat, file list from scope resolution) to count signals. Reading the diff first creates analysis momentum that bypasses mode selection.
| Signal | Threshold | Detect from |
|--------|-----------|-------------|
| Lines changed | >300 | git diff --stat insertion + deletion totals, excluding test files |
| Files touched | >8 | File count from scope resolution, excluding test files |
| Modules/directories spanned | >3 | Unique top-level directories from non-test file list |
| Security-sensitive files (auth, crypto, payments, permissions) | any | File path matching |
| Database migrations present | any | File path matching |
| API surface changes (public endpoints, exported interfaces) | any | File path matching |
Test file exclusion: exclude test paths (tests/, test/, __tests__/, *.test.*, *.spec.*, *_test.*) from the lines/files/directories signals — they inflate complexity without adding review risk. Filter with git diff --stat -- ':!tests/' ':!*.test.*' ':!*.spec.*' ':!*_test.*' and report both totals: "450 lines changed (280 excluding tests)."
3+ signals → deep review. Inform the user, then dispatch parallel specialist agents per deep-review.md. Pass the diff to agents -- do NOT read it first. Reading and analyzing the diff yourself before dispatching agents defeats the purpose of deep review. Stop here -- do not proceed to the Review Process section.
2 signals → suggest: "This touches N files across M modules. Deep review? (y/n)"
0-1 signals → standard review. Proceed to Review Process below.
Before auto-switching to deep review, check the exceptions list in deep-review.md -- certain change types (pure docs, mechanical refactors, single-file <50 lines) override signal count.
Override: deep forces multi-agent, quick forces single-pass.
Standard reviews only. If mode selection triggered deep review, specialist agents handle the review per deep-review.md -- do not run these steps yourself.
git diff --stat against the PR's stated intent. Classify as CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING. If DRIFT, note the drifted files and ask the author: ship as-is, split, or remove unrelated changes?gh pr view/gh api commands are in scope-resolution.md.A) in the diff, use the diff content directly -- don't attempt to read them from the working tree when reviewing a remote branch.input is empty here?") instead of declarative statements to encourage author thinking.Large diffs & PR sizing: For diffs >500 lines, review by module rather than file-by-file. Flag oversized PRs (ideal ~100-300 meaningful lines, excluding generated code) and suggest a split. Module-review approach, sizing thresholds, and the four split strategies (stack / by-file-group / horizontal / vertical) in pr-sizing.md.
Four severity tiers (Critical / Important / Medium / Minor) and a 5-band confidence rubric (0.0-1.0 → Report / Report-if-actionable / Suppress) govern what lands in the report. Full rules, false-positive suppression categories, and the LLM-specific prompt-injection exception in severity-and-confidence.md.
Tie every finding to concrete code evidence (file path, line number, specific pattern), carried in the CR-XXX entry itself — not only in the surrounding prose. An entry that names a function or describes a bug without its [file:line] and `quoted code` can't be verified by the reader. Never fabricate references.
For category checklists (Correctness, Maintainability & Readability, Performance, Adversarial red-team pass, AI-generated code lens), load check-categories.md. It's the structured checklist for the line-by-line review step.
Language-specific checks live in language-profiles.md — load the profile matching the file extensions in the diff (TypeScript/React, Python, PHP, Shell/CI, Configuration, Data Formats, Security, LLM Trust Boundaries).
For every finding, classify the fix into one of four tiers: safe_auto / gated_auto / manual / advisory. Full decision rules and conflict-resolution policy in action-routing.md. When in doubt, escalate to gated_auto — never promote toward safe_auto on disagreement.
Prefix inline review comments so authors know what requires action:
CLAUDE.md, AGENTS.md, or an inline comment documents a deliberate bypass (e.g., "we allow X because Y"), honor it: don't re-raise the concern or work around it "just to be safe". If the override lacks a rationale, suggest documenting one — don't argue the rule.git show <base>:<file>), not just the diff hunk. A re-spec may have intentionally redefined the contract (its description, not the OLD code, is the oracle); a dropped branch may have been a latent bug; a sibling may have always omitted the field. Conversely, pre-existing code outside the diff is this change's responsibility when the new feature makes a previously-invisible defect user-visible -- frame that as introduced here, not a follow-up. When a regression is confirmed against the baseline, cite the introducing commit (SHA, author -- via git blame or git bisect) as part of the finding's evidence, not just the symptom.*_count/*_ids/total, the summary projection AND the detail projection. A filter applied to one projection leaks the entity via a sibling field on the same response. Require a test asserting the hidden entity is absent from each surfacing field, not just the primary list.[Minor], [FYI]) are fine, but phrases like "INFO only", "no action required", "optional cleanup", "operational tradeoff" read to a downstream validator or second reviewer as a self-dismissal and get the finding dropped regardless of real severity. Anchor severity in concrete constants and numbers from the code ("20-minute floor", "every inactive bar"), not hypotheticals -- a named constant is harder to wave off than "potentially hours".## Review: [brief title]
### Critical
- **CR-001.** [file:line] `quoted code` -- [issue]. Score: [0.0-1.0]. [What happens if not fixed]. Fix: [concrete suggestion].
### Important
- **CR-002.** [file:line] `quoted code` -- [issue]. Score: [0.0-1.0]. [Why it matters]. Consider: [alternative approach].
### Medium
- **CR-003.** [file:line] -- [issue]. Score: [0.0-1.0]. [Why it matters].
### Minor
- **CR-004.** [file:line] -- [observation].
### What's Working Well
- [specific positive observation with why it's good]
### Residual Risks
- [unresolved assumptions, areas not fully covered, open questions]
### Verdict
Ready to merge / Ready with fixes / Not ready -- [one-sentence rationale]
Number findings CR-001, CR-002... sequentially across all severities so they're referenceable by ID. Limit to 10 per severity; if more exist, note the count and show the highest-impact ones.
Markdown safety: in table cells, escape literal | as \| — code excerpts with pipe operators (a | b, string | null) split rows silently otherwise. Bullet output is pipe-safe.
For multi-agent consolidation (deep/parallel review), apply the merge algorithm in deep-review.md — same-line dedupe, conflicting severity, NEEDS DECISION flagging, cross-lens confidence boosting.
Clean review (no findings): if the code is solid, say so explicitly — summarize what was checked and why no issues were found. A clean review is a valid outcome, not a sign of insufficient effort.
| Document | Load when — what it covers |
|----------|----------------------------|
| security-patterns.md | Security step — grep-able detection patterns, 11 vulnerability classes |
| security-test-coverage.md | Security-audit deliverable (ia-security-sentinel) — auth/authz, input-boundary, concurrency, session, output checklist |
| language-profiles.md | Language-specific checks — TS/React, Python, PHP, Shell/CI, Config, LLM Trust |
| deep-review.md | Mode triggers deep review — specialist agents, prompt template, merge algorithm, model selection |
| review-traps-catalog.md | Any non-trivial review; "should"/"could"/"what if" findings — reachability-first, convention-from-3, speculative-design, enum drift, contract staleness, version gotchas |
| check-categories.md | Line-by-line step — correctness, maintainability, performance, adversarial, AI-code lens |
| action-routing.md | Per-finding fix tier — safe_auto / gated_auto / manual / advisory, conflict resolution |
| severity-and-confidence.md | Severity + confidence — 4 tiers, 5-band rubric, FP suppression |
| false-positive-suppression.md | FP categories — framework-idiom and test-specific overridable patterns |
| scope-resolution.md | Branch review or prior PR comments — merge-base resolution, discussion-fetch commands |
| pr-sizing.md | Large/oversized diff — module review, sizing thresholds, split strategies |
| external-review-subprocess.md | External-CLI reviewer (codex/claude -p, /code-review ultra) — heartbeat tolerance, run-until-clean, frozen-diff binding |
ia-receiving-code-review -- the inbound side (processing review feedback received from others). Action-routing terminology maps across: safe_auto ≈ AUTO-FIX, gated_auto ≈ ESCALATE-for-approval, manual ≈ ESCALATE, advisory ≈ FYI (no-op).ia-kieran-reviewer agent -- persona-driven Python/TypeScript deep quality review (type safety, naming, modern patterns)/ia-review -- full ceremony review (worktrees, ultra-thinking, multi-agent). Deep review is lighter: no worktrees, no plan verification, just parallel specialist agents on the same diff./resolve-pr-parallel command -- batch-resolve PR comments with parallel agentsia-security-sentinel agent -- deep security audit beyond the security step in this skill. Also supports threat-model mode for architectural security analysis when the diff introduces new trust boundaries, auth flows, or external API surfaces.development
Generic test writing discipline: test quality, real assertions, anti-patterns, and rationalization resistance. Use when writing tests, adding test coverage, or fixing failing tests for any language or framework. Complements language-specific skills.
testing
Enforces fresh verification evidence before any completion claim. Use when about to claim "tests pass", "bug fixed", "done", "ready to merge", or handing off work.
tools
Tailwind CSS v4 patterns: CSS-first config, utility classes, component variants, v3 migration. Use when styling with Tailwind, configuring @theme tokens, using tailwind-variants/CVA, migrating v3 to v4, or fixing Tailwind styles and dark mode.
development
Simplifies, polishes, and declutters code without changing behavior. Use when asked to simplify, clean up, refactor, declutter, remove dead code or AI slop, or improve readability. For analysis-only reports without code changes, use code-simplicity-reviewer agent.