skills/pr-review/SKILL.md
Diff-based PR review across code quality, test coverage, silent failures, type design, and comment quality, with severity-ranked findings. Triggers on: "review my PR", "review this code", "check my changes", "audit this PR", "code review". NOT for the pre-landing gate; use pre-landing-review for that.
`npx skillsauth add mathews-tom/armory pr-review`

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Diff-based code review across five dimensions. Reads the changed files, selects applicable review methodologies, and produces an aggregated report with severity-ranked findings.
Native alternative: Claude Code's `/ultrareview` runs a lightweight native bug-focused review (three free per month on Pro/Max plans at Opus 4.7's launch). Use this skill for five-dimension severity-ranked analysis (code quality + tests + error handling + types + comments) with file:line references; use `/ultrareview` for a quick bug-hunting pass on a diff.
| File | Contents | Load When |
| ------------------------------- | ------------------------------------------------------- | ------------------------------- |
| references/code-review.md | Guideline compliance, bug detection, confidence scoring | Always |
| references/test-analysis.md | Behavioral test coverage, criticality rating | Test files changed |
| references/error-handling.md | Silent failure patterns, catch block analysis | Error handling changed |
| references/type-design.md | Invariant analysis, 4-dimension rating rubric | Type definitions added/modified |
| references/comment-quality.md | Comment accuracy, long-term value, rot detection | Comments/docstrings added |
- `git diff` (unstaged changes)
- `git diff main...HEAD` or `gh pr diff <number>`
- `git diff --name-only`

Classify changed files and select applicable dimensions:
| Condition | Dimension | Reference to Load |
| ---------------------------------------------------------------------------- | --------------- | ------------------------------- |
| Always | Code review | references/code-review.md |
| Files matching *test*, *spec*, *_test.*, test_* | Test analysis | references/test-analysis.md |
| Files containing try/catch, except, .catch, Result, error callbacks | Error handling | references/error-handling.md |
| Files containing class, interface, type, struct, enum, dataclass definitions | Type design | references/type-design.md |
| Files with new/modified docstrings, JSDoc, or block comments | Comment quality | references/comment-quality.md |
Load only the reference files that apply. Skip dimensions with no matching files.
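The routing table above can be sketched as a small classifier. This is a minimal illustration, not the skill's actual implementation: the function name `select_references` and the regular expressions are assumptions that only approximate the table's content conditions (for example, "error callbacks" cannot be detected by a simple pattern).

```python
import fnmatch
import re
from pathlib import PurePosixPath

# Filename globs taken from the routing table.
TEST_GLOBS = ("*test*", "*spec*", "*_test.*", "test_*")

# Hypothetical regex approximations of the table's "files containing" conditions.
ERROR_RE = re.compile(r"\b(try|catch|except|Result)\b")
TYPE_RE = re.compile(r"\b(class|interface|type|struct|enum|dataclass)\b")
COMMENT_RE = re.compile(r'"""|/\*')  # docstrings, JSDoc, block comments


def select_references(changed: dict) -> list:
    """Map {path: changed text} to the reference files to load, in table order."""
    refs = ["references/code-review.md"]  # code review always applies
    names = [PurePosixPath(path).name for path in changed]
    if any(fnmatch.fnmatchcase(n, g) for n in names for g in TEST_GLOBS):
        refs.append("references/test-analysis.md")
    text = "\n".join(changed.values())
    if ERROR_RE.search(text):
        refs.append("references/error-handling.md")
    if TYPE_RE.search(text):
        refs.append("references/type-design.md")
    if COMMENT_RE.search(text):
        refs.append("references/comment-quality.md")
    return refs
```

Dimensions with no matching files never appear in the returned list, which mirrors the "skip dimensions with no matching files" rule.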
For each applicable dimension, analyze the diff using the loaded methodology:
Merge all findings into a single report, deduplicated and severity-ranked.
Deduplication rules:
Severity mapping across dimensions:
| Dimension       | Maps to Critical    | Maps to Important        | Maps to Suggestion    |
| --------------- | ------------------- | ------------------------ | --------------------- |
| Code review     | Confidence 90-100   | Confidence 80-89         | n/a                   |
| Test analysis   | Rating 9-10         | Rating 7-8               | Rating 5-6            |
| Error handling  | CRITICAL            | HIGH                     | MEDIUM                |
| Type design     | Any rating <= 3/10  | Any rating 4-6/10        | Rating 7-8/10         |
| Comment quality | Factually incorrect | Misleading or incomplete | Restates obvious code |
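As a rough sketch, the severity mapping can be written as one lookup function. The name `to_severity` and the lowercase dimension keys are hypothetical. Returning `None` for values outside the listed ranges is an assumption, since the table does not say what happens to them.

```python
from typing import Optional, Union

# Dimensions whose native scale is a label rather than a number.
LABELS = {
    "error handling": {
        "CRITICAL": "Critical",
        "HIGH": "Important",
        "MEDIUM": "Suggestion",
    },
    "comment quality": {
        "factually incorrect": "Critical",
        "misleading or incomplete": "Important",
        "restates obvious code": "Suggestion",
    },
}


def to_severity(dimension: str, value: Union[int, str]) -> Optional[str]:
    """Translate a dimension-specific score into a shared severity level."""
    if dimension in LABELS:
        return LABELS[dimension].get(str(value))
    if dimension == "code review":  # confidence, 0-100
        if value >= 90:
            return "Critical"
        return "Important" if value >= 80 else None
    if dimension == "test analysis":  # criticality rating, higher is worse
        if value >= 9:
            return "Critical"
        if value >= 7:
            return "Important"
        return "Suggestion" if value >= 5 else None
    if dimension == "type design":  # rating out of 10, lower is worse
        if value <= 3:
            return "Critical"
        if value <= 6:
            return "Important"
        return "Suggestion" if value <= 8 else None
    raise ValueError(f"unknown dimension: {dimension}")
```

Note that the scales run in opposite directions: a low type-design rating is severe, while a low test-analysis rating is not reported at all under this assumption.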
# PR Review Summary
**Scope:** [X files changed, Y dimensions applied]
**Dimensions:** [list of active dimensions]
## Critical Issues (must fix before merge)
- **[dimension]** `file:line` — Description. Fix suggestion.
## Important Issues (should fix)
- **[dimension]** `file:line` — Description. Fix suggestion.
## Suggestions (consider)
- **[dimension]** `file:line` — Description.
## Strengths
- What's well-done in this changeset.
## Recommended Action
1. Fix critical issues
2. Address important issues
3. Consider suggestions
4. Re-run review after fixes
If no issues are found at any severity level, confirm the code meets standards with a brief summary of what was reviewed and which dimensions were applied.
Users can request specific dimensions instead of running all:
| User Says                                       | Dimensions Applied       |
| ----------------------------------------------- | ------------------------ |
| "review my PR" / "check my changes"             | All applicable (default) |
| "review the code" / "check code quality"        | Code review only         |
| "check the tests" / "is test coverage good"     | Test analysis only       |
| "check error handling" / "find silent failures" | Error handling only      |
| "review the types" / "check type design"        | Type design only         |
| "check the comments" / "review documentation"   | Comment quality only     |
When a specific aspect is requested, load only that reference file and skip routing.
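One way to picture this selection is a lookup from request phrase to a single reference file, falling back to the routed set for general requests. This is a simplified sketch: exact phrase matching and the names `REQUEST_DIMENSIONS` and `references_for` are assumptions for illustration only, and a real agent would interpret requests far more loosely.

```python
# Phrases from the table above, mapped to the one dimension they select.
REQUEST_DIMENSIONS = {
    "review the code": "code-review",
    "check code quality": "code-review",
    "check the tests": "test-analysis",
    "is test coverage good": "test-analysis",
    "check error handling": "error-handling",
    "find silent failures": "error-handling",
    "review the types": "type-design",
    "check type design": "type-design",
    "check the comments": "comment-quality",
    "review documentation": "comment-quality",
}


def references_for(request: str, routed: list) -> list:
    """Return one reference for a specific request, else the routed list."""
    key = request.strip().lower().rstrip("?")
    dimension = REQUEST_DIMENSIONS.get(key)
    if dimension is None:
        # General requests such as "review my PR" keep the default routing.
        return routed
    return [f"references/{dimension}.md"]
```

Here `routed` stands for whatever the file classification step selected; it is bypassed entirely when a specific aspect is requested.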
| Problem                     | Resolution                                                                    |
| --------------------------- | ----------------------------------------------------------------------------- |
| No git diff available       | Ask user to specify files or scope                                            |
| CLAUDE.md not found         | Review against general best practices; note the absence                       |
| No test files in diff       | Skip test analysis dimension; note in output                                  |
| Diff is empty               | Report "no changes to review" and stop                                        |
| Diff exceeds context limits | Focus on files the user is most likely to care about; summarize skipped files |
function declarations.

code-refiner skill's job. Keep the roles separate.

| Rationalization | Reality |
|---|---|
| "Tests pass, so the code is fine" | Tests are necessary but insufficient: they miss architecture, security, readability, and maintainability concerns |
| "It's a small diff, no real review needed" | Small changes cause most production incidents; a 3-line auth bypass is worse than a 300-line refactor |
| "We'll clean it up later" | Later never comes; the review IS the quality gate before code becomes legacy |
| "The author is senior, I trust them" | Seniority doesn't prevent mistakes; fresh eyes catch what familiarity blinds |
| "I already reviewed similar code recently" | Each diff has unique context; assumptions from past reviews cause missed issues |
| "This is just a refactor, nothing can break" | Refactors change behavior in subtle ways; verify with tests and trace call sites |