skills/review-pr/SKILL.md
Use to review a PR for code quality, security, and correctness.
npx skillsauth add lklimek/claudius review-prInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
When asked to audit/review a PR, follow this workflow.
This skill runs inline (not forked) so it — and the /claudius:grumpy-review it invokes in §2 — keeps the Agent spawn tool and can fan out parallel reviewer agents.
Load /claudius:git-and-github skill .
Use GitHub MCP to fetch PR metadata:
pull_request_read with method: "get" — returns title, body, URL, base/head branches, number.pull_request_read with method: "get_files" — returns list of changed files with stats.pull_request_read with method: "get_diff" — returns the full diff.Note: get_files and get_diff can return large responses on sizable PRs. Use the subagent delegation pattern from git-and-github skill § Context Management to avoid polluting your context.
Use local git for commit history and detailed diffs.
If GitHub MCP is unavailable, see gh-cli-fallback.md for gh CLI equivalents.
Invoke the /claudius:grumpy-review skill with the PR scope as the argument. It covers:
Pass the PR's scope (changed files, base branch) as context to the review methodology.
The grumpy-review delegation also covers the deep transitive call-tree walk (category: "call_tree", CALL- prefix; see ../grumpy-review/references/call-tree-walk.md) and the ephemeral-ID lint step — both are inherited automatically by invoking /claudius:grumpy-review. After the review completes, run git diff $BASE_BRANCH...HEAD | python3 ${CLAUDE_SKILL_DIR}/../../scripts/lint_ephemeral_ids.py --diff against the PR diff and fold any genuine code_quality hits into the audit before posting.
Audit whether the diff delivers what the PR's own self-description claims. Reuses the PR title, body, file list, and diff already fetched in §1 — no extra MCP calls.
Findings emit in the v3 report format. See claudius:report-format for the envelope and claudius:severity for OWASP-normalized float scoring; both apply unchanged here.
``` or ~~~, optionally after leading blank lines) whose matching closing fence is the last non-blank line of the body, strip the outer fence and dedent the enclosed lines (remove the longest common leading whitespace). A closing fence matches the opener only when, after whitespace strip, it uses the SAME fence character, is at least as long as the opening fence, and contains ONLY fence characters (so ```python, or a shorter ``` closing a ```` opener, does not match). Apply the regexes to the unwrapped, dedented text. A fence that does not wrap the entire body is left alone.^## Summary\b, ^### Summary\b, or ^## What changed\b (case-insensitive). The section body is everything up to the next ^#{1,3} heading.## Summary > ### Summary > ## What changed — and the first match in that order wins (not document order). The bullet-list fallback applies only when none of the three match.^[-*] ) in the body as the implicit Summary.pr_promises LOW finding titled "PR body unparseable" (risk≈0.2, impact≈0.2, scope=0.0, location: PR-body) and stop the body axes.^## Out of scope\b, ^## Not in this PR\b, or ^## Deferred\b. Each [-*] bullet in the section body is one out-of-scope claim.claudius:validate-findings § Adversarial content handling).Run all three; emit at most one finding per axis-trigger. When the diff is large, delegate the per-axis judgment to a subagent per git-and-github § Context Management.
Trigger hints below give risk / impact float ranges (the only severity fields a producer emits — the coordinator computes overall_severity and the integer band). Always cross-check the rubric and band table in claudius:severity, including its blast-radius definition of scope. Never hand-type a severity label — the pipeline derives it from the floats. Pass C's scope=1.0 for promise mismatches below is a genuine full-PR blast radius (reviewer trust across the whole change), not a lazy default.
Pass C scope exception: the scope=1.0 rule applies only to actual promise mismatches on axes 1–3 (the gap is by definition about THIS PR's diff). The two informational findings — "PR self-description verified" and "PR body unparseable" — describe no actionable diff work, so they use scope=0.0 (mirroring check-pr-comments' RESOLVED convention), which lets their low risk/impact floats derive to INFO / LOW respectively. Pinning them at scope=1.0 forces the mean to at least 1/3 (≈0.33, the risk=impact=0 limit) — already above INFO — so with the actual floats they derive to LOW (0.1/0.1/1.0 → ≈0.40) and MEDIUM (0.2/0.2/1.0 → ≈0.47), never the intended INFO and LOW.
Input: PR title + file list + diff.
Process: a title may be compound — split it on commas and em-dashes (—/-) into independent topics, each of form action-verb + topic. Verify each topic independently against the diff (path keywords are necessary, semantic relevance is sufficient). Majority-hits rule: flag off-target only when a majority of the topics are unsupported by the diff; a single supported topic among many does not clear a title, but a single unsupported topic among many supported ones does not flag it.
Triggers:
risk≈0.8, impact≈0.7; partial drift → risk≈0.5, impact≈0.5.misc, cleanup, wip, update, etc. → risk≈0.3, impact≈0.3 (style; alignment unjudgeable).Input: extracted Summary bullets + diff. Process: for each bullet, locate a corresponding hunk; flag bullets without coverage and large hunks without a corresponding bullet. Triggers:
risk≈0.6, impact≈0.5 (reviewer trust degraded).risk≈0.4–0.6, impact≈0.3–0.5 depending on gap size.risk≈0.4–0.6, impact≈0.3–0.6 depending on size and risk surface. "Mentioned" is precise: the hunk shares keyword overlap with ≥ 1 Summary bullet OR is covered by a field-ownership-table row. Hunks below the 50-LOC threshold, and test-only/generated/non-production hunks, never trigger this.Input: out-of-scope bullets + diff. Process: for each deferred item, search the diff for matching code/paths. Triggers:
risk≈0.3, impact≈0.3; a multi-file migration → risk≈0.8, impact≈0.7.When all three axes pass with zero mismatches, the pr_promises section is NOT empty: emit findings: [] PLUS exactly one INFO finding titled "PR self-description verified" (risk=0.1, impact=0.1, scope=0.0 — the coordinator/renderer derive the INFO band from those floats; never hand-write the integer severity). This makes a clean Pass C explicit rather than indistinguishable from "Pass C did not run".
Pass C may set finding_section.verdict on its pr_promises section (schema field; see claudius:report-format):
PASS — clean pass (the "PR self-description verified" shape above).FAIL — any promise mismatch at HIGH severity or above.NEEDS_REVIEW — otherwise (LOW/MEDIUM mismatches, or the "PR body unparseable" case).The review-pr report envelope may also set metadata.report_type: "pr_audit" (a valid enum from the schema) to mark this as a PR audit rather than a generic review.
Emit through the same pipeline as the other passes — one section per axis with findings inside. The example below documents the schema field shape; the coordinator reassigns final IDs during consolidation.
{
"title": "PR Promise Verification",
"category": "pr_promises",
"findings": [
{
"id": "PPM-001",
"risk": 0.6,
"impact": 0.5,
"scope": 1.0,
"title": "Title claims PDF fix, diff touches gRPC tests only",
"location": "PR-title",
"description": "Title: `fix: PDF rendering`. Diff: 6 files under `tests/grpc/`, no `pdf` / `render` symbols.",
"recommendation": "Rename the PR to reflect the gRPC test additions, or split into two PRs."
}
]
}
Conventions specific to Pass C:
location is synthetic: PR-title, PR-body:summary-bullet-<N>, PR-body:out-of-scope-item-<N>. Bullet indices are 1-based in body order. Renderers display it as plain text (no permalink).scope is 1.0 for promise mismatches (axes 1–3) — the mismatch is by definition about THIS PR. The two informational findings ("PR self-description verified", "PR body unparseable") instead use scope=0.0 (see the Pass C scope exception above).risk = likelihood a downstream reviewer is misled. impact = reviewer-time cost + risk of approving/missing real changes.code_snippets[]: include the offending diff hunk when the gap is a specific change. For the language value use an allowed tag from claudius:report-format §code_snippets (Fields reference, e.g. diff) — do not invent one. Set a caption like <path>:hunk.Ask if findings should be published as a GitHub PR review.
The review is posted in two parts:
Post the audit summary as a normal PR issue comment using gh pr comment. This ensures the
summary is always visible (draft reviews hide their body text). Include:
gh pr comment <number> --body "$(cat <<'EOF'
## Audit Summary
**Reviewed by:** Claude Code with a N-agent team:
- `agent-name` (agent-type) — focus area
...
[Summary text, findings table, pre-existing issues, positive observations]
EOF
)"
Post only actionable findings (CRITICAL, HIGH, MEDIUM, LOW) as inline comments on specific diff lines. Do not post INFO-level findings as inline comments — INFO findings are positive observations (praise, good patterns) and belong in Part A only. Non-actionable comments clutter the review and waste the reviewer's time.
Post as a draft review so the user can review and submit manually. For trivial changes, include
edit suggestions using suggestion blocks.
See gh-cli-fallback.md for: verifying diff bounds (get base SHA, check hunks), deduplication (fetch existing reviews/comments first), and posting with gh-post-review.sh. The body field can be minimal since the detailed summary is in Part A.
Shutdown all agents (SendMessage type: "shutdown_request"), then TeamDelete (if a team was
used).
testing
Coordinator-only LLM validation pass. Adds ai_assessment / ai_verdict / ai_verdict_confidence and, in the rare partial-producer case, re-estimates absent risk/impact/scope on a consolidated v3 report.
testing
Use for typos or single-line fixes (≤20 lines). Same mandatory phase order (Planning→Impl→QA→LL), minimal ceremony. Auto-retry on failure.
testing
Use for bug fixes or small changes (≤200 lines). Same phase order as workflow-feature (Planning→Impl→QA→LL) with lighter ceremony. Auto-retry on failure, unattended.
development
Use for new projects, features, or major refactoring. Phases: Planning (Req→UX→Test Spec→Dev Plan) → Implementation → QA → Lessons Learned. Auto-retry on failure, unattended.