skills/grumpy-review/SKILL.md
Parallel-agent code review for quality, security, dependencies, and docs. Use for reviews, audits, or quality assessments. Produces deduplicated severity-ranked report.
Systematic code review using parallel specialist agents. Produces a consolidated report with severity-ranked, deduplicated findings.
Keep the Claudius/Skippy persona — sarcastic superiority, theatrical sighs, dry wit. Layer on extra grumpiness about the code: complain, express disbelief at obvious mistakes, be opinionated. But keep all written output (report JSON, markdown, HTML) strictly professional. The grumpiness is for the human; the report is for posterity.
Argument: $ARGUMENTS — optional scope description (e.g., "feat/zk branch", "packages/auth/",
"last 5 commits"). If empty, review all changes on the current branch vs the main branch.
Determine what to review:
# If reviewing a branch
BASE_BRANCH=<main-branch>
git log $BASE_BRANCH..HEAD --oneline
git diff $BASE_BRANCH...HEAD --stat
# If reviewing specific paths
git diff $BASE_BRANCH...HEAD -- <paths>
Assess scale:
Trivial: a single developer-bilby prompted with the security-best-practices and coding-best-practices skills. Skip the consolidation pipeline; the agent writes the report directly.

Choose agents based on what the code does. Not every review needs every agent type.
For trivial reviews (< 200 lines, < 5 files, single language), skip the multi-agent pipeline.
Spawn a single developer-bilby and instruct it to also apply security-best-practices and
coding-best-practices checklists. The agent writes the report JSON directly — no consolidation needed.
| Agent (subagent_type) | Focus |
|---|---|
| claudius:project-reviewer-adams | Cross-artifact consistency, convention adherence, doc accuracy, specialist orchestration |
| claudius:security-engineer-smythe | OWASP Top 10, injection, concurrency, panics, DoS, known vulns |
These agents handle code quality reviews — readability, idioms, error handling, duplication, performance. Always include the relevant language specialist; the project-reviewer does NOT cover language-specific code quality.
| Condition | Agent (subagent_type) | Focus |
|---|---|---|
| Rust code | claudius:developer-bilby | Code quality, idioms, ownership, error handling, clippy compliance |
| Go code | claudius:developer-bilby | Code quality, idioms, error wrapping, concurrency, table-driven tests |
| Python code | claudius:developer-bilby | Code quality, PEP 8, type hints, async patterns, pytest |
| Frontend code | claudius:developer-bilby | Code quality, TS/JS patterns, React/Vue, CSS, accessibility |
| Condition | Agent (subagent_type) | Focus |
|---|---|---|
| Documentation changes | claudius:technical-writer-trillian | Accuracy, completeness, API docs, changelog |
For crypto-heavy code or significant dependency changes, expand the single security-engineer's prompt scope to include crypto soundness and dependency audit — do NOT spawn a second instance.
For large reviews (50+ files, 5000+ lines), spawn multiple agents of the same type with different file scopes.
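The scale thresholds quoted in this skill could be checked mechanically from `git diff --numstat` output. The following is a minimal sketch, not part of the skill's scripts. The function name `classify_scale`, the "standard" label for everything between trivial and large, the use of file extensions as a proxy for "single language", and the reading of "50+ files, 5000+ lines" as "either is enough" are all assumptions.

```python
from pathlib import PurePosixPath

# Thresholds quoted from this skill. The "standard" label for everything in
# between is an assumption made for this sketch.
TRIVIAL_MAX_LINES = 200
TRIVIAL_MAX_FILES = 5
LARGE_MIN_FILES = 50
LARGE_MIN_LINES = 5000


def classify_scale(numstat: str) -> str:
    """Classify `git diff --numstat` output as trivial, standard, or large."""
    files = 0
    lines = 0
    extensions = set()
    for row in numstat.splitlines():
        parts = row.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        added, deleted, path = parts
        files += 1
        # Binary files report "-" for both counts; treat them as zero lines.
        lines += int(added) if added.isdigit() else 0
        lines += int(deleted) if deleted.isdigit() else 0
        # Assumption: one file extension stands in for "single language".
        extensions.add(PurePosixPath(path).suffix)
    # Assumption: either large threshold alone is enough to fan out further.
    if files >= LARGE_MIN_FILES or lines >= LARGE_MIN_LINES:
        return "large"
    if (
        lines < TRIVIAL_MAX_LINES
        and files < TRIVIAL_MAX_FILES
        and len(extensions) <= 1
    ):
        return "trivial"
    return "standard"
```

Feeding it the output of `git diff $BASE_BRANCH...HEAD --numstat` would give a first guess at which path to take; the final call still belongs to the coordinator.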
Follow the general agent prompt requirements. In addition, every review agent prompt MUST include these review-specific elements:
- […] git show <base>:<file> or git diff)
- […] security-engineer-smythe, project-reviewer-adams, developer-bilby, technical-writer-trillian, etc.) MUST preload coding-best-practices so its Cross-Cutting Rules govern every finding; state this explicitly in each spawn prompt.
- […] cat > file or heredoc redirections.

Agents MUST output findings as a JSON file containing an array of finding_section objects.
Each agent writes its output to the specified file path as valid JSON:
[
  {
    "title": "Section Title",
    "category": "security|project|code_quality|dependencies|documentation|call_tree",
    "findings": [
      {
        "id": "PREFIX-001",
        "risk": 0.6,
        "impact": 0.7,
        "scope": 1.0,
        "title": "Short finding title",
        "tags": ["A03 Injection", "CWE-79"],
        "location": "src/auth.rs:42-56",
        "description": "What the issue is and why it matters",
        "impact_description": "What could go wrong (Markdown narrative)",
        "recommendation": "How to fix it",
        "code_snippets": [
          {"language": "rust", "caption": "auth.rs:42", "content": "let user = unwrap_token(&hdr);"}
        ]
      }
    ],
    "positives": "Optional positive observations"
  }
]
Required finding fields: id, risk / impact / scope (floats 0.0–1.0), title, location, description, recommendation. See claudius:severity for the OWASP-normalized recipes that produce the three float dimensions and the band table that the coordinator uses to derive the integer severity. Rate scope as real blast radius per claudius:severity — never default it to 1.0. The float trio is the single source of truth; never hand-type a severity label — the pipeline derives it.
Optional: tags, impact_description (Markdown impact narrative; the numeric impact float is separate), code_snippets (emit only when you captured the exact source during analysis — never invent one).
Producers must NOT emit (downstream-owned): overall_severity, location_permalink, metadata.repository, ai_assessment, ai_verdict, ai_verdict_confidence, and the derived integer severity when emitting floats. risk/impact/scope are required — without all three, the coordinator cannot derive overall_severity and the schema rejects the finding. The validate-findings skill is the only documented path to populate floats post-hoc.
Metadata: emit metadata.commit as the full 40-character SHA (git rev-parse @{u}, falling back to git rev-parse HEAD when the branch has no upstream — use the pushed commit so permalinks resolve on GitHub; not --short); omit when not in a git repo. The coordinator derives metadata.repository from git remote get-url origin — producers do not emit it.
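The upstream-then-HEAD fallback for metadata.commit can be sketched as below. This is an illustrative helper, not code from the skill. The names `git_output` and `resolve_commit` are hypothetical, and the injectable `run` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes git arguments and returns stdout, or None when git fails.
Runner = Callable[[Sequence[str]], Optional[str]]


def git_output(args: Sequence[str]) -> Optional[str]:
    """Run a git command and return stripped stdout, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None  # git is not installed
    return result.stdout.strip() if result.returncode == 0 else None


def resolve_commit(run: Runner = git_output) -> Optional[str]:
    """Prefer the pushed upstream commit, fall back to HEAD, else None."""
    for ref in ("@{u}", "HEAD"):
        sha = run(["rev-parse", ref])
        # Only accept a full 40-character SHA, never an abbreviated one.
        if sha and len(sha) == 40:
            return sha
    return None  # not in a git repo: omit metadata.commit entirely
```

A producer would call `resolve_commit()` once and omit the `commit` key when it returns `None`.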
ID prefixes: SEC- security, PROJ- project, RUST-/PY-/GO-/FE- language, DOC- docs, CALL- call-tree.
Agents assign provisional sequential IDs within their prefix (e.g., SEC-001, SEC-002).
IDs may collide across parallel agents — the consolidation step (5c) deduplicates and reassigns
final IDs.
Location MUST include full file path (e.g., src/auth.rs:42-56), never bare line numbers.
Severity levels: CRITICAL > HIGH > MEDIUM > LOW > INFO (see severity skill).
Tags: classification references — OWASP (A01–A10), CWE, language best-practice IDs, etc.
Tag ALL security findings with OWASP categories. Non-security findings may omit tags.
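The producer rules above lend themselves to a quick self-check before an agent writes its file. The sketch below is a rough illustration under stated assumptions: `check_finding` is a hypothetical helper, the real gate is the schema validation in the assemble step, and the bare-line-number pattern is a guess at what "bare line numbers" looks like in practice.

```python
import re

REQUIRED = (
    "id", "risk", "impact", "scope",
    "title", "location", "description", "recommendation",
)
# Finding-level fields this skill lists as downstream-owned.
DOWNSTREAM_OWNED = (
    "overall_severity", "location_permalink", "ai_assessment",
    "ai_verdict", "ai_verdict_confidence", "severity",
)
# Assumption: a location such as "42" or "42-56" counts as bare line numbers.
BARE_LINES = re.compile(r"^\s*L?\d+(\s*-\s*L?\d+)?\s*$")


def check_finding(finding: dict, category: str) -> list:
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    for field in REQUIRED:
        if field not in finding:
            problems.append(f"missing required field: {field}")
    for dim in ("risk", "impact", "scope"):
        value = finding.get(dim)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if dim in finding and not (is_number and 0.0 <= value <= 1.0):
            problems.append(f"{dim} must be a float in 0.0-1.0")
    for field in DOWNSTREAM_OWNED:
        if field in finding:
            problems.append(f"producer must not emit: {field}")
    location = finding.get("location")
    if isinstance(location, str) and BARE_LINES.match(location):
        problems.append("location needs a full file path, not bare line numbers")
    if category == "security" and not finding.get("tags"):
        problems.append("security findings must carry OWASP tags")
    return problems
```

The check is deliberately shallow: it cannot tell whether a `scope` of 1.0 reflects the real blast radius, only that the value is in range.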
When the diff modifies or removes any function/method declaration, every code-quality reviewer agent MUST run a deep transitive in-repo caller walk before emitting findings. Methodology lives in references/call-tree-walk.md — read it once per review and follow the steps.
Finding shape: category: "call_tree", ID prefix CALL- (coordinator-assigned). The producer emits a provisional CALL-NNN. Every call_tree finding's description MUST start with a Walked via: <tool> line so the reader can judge walk depth and tool quality.
Skip the walk for pure additions, doc-only PRs, and changes confined to test files.
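The Walked via: requirement is easy to verify mechanically. A minimal sketch follows; `has_walk_header` is a hypothetical name, and the tool named in the usage line is only an example.

```python
def has_walk_header(description: str) -> bool:
    """True when the first line is 'Walked via: <tool>' with a non-empty tool."""
    prefix = "Walked via:"
    stripped = description.strip()
    if not stripped:
        return False
    first_line = stripped.splitlines()[0]
    # Require some text after the prefix so an empty tool name is rejected.
    return first_line.startswith(prefix) and bool(first_line[len(prefix):].strip())
```

For example, `has_walk_header("Walked via: rust-analyzer\nThree callers still pass the old argument.")` is true, while a description that opens directly with the narrative is not.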
After each agent emits findings, run the dumb ephemeral-ID lint against the diff:
git diff $BASE_BRANCH...HEAD | python3 ${CLAUDE_SKILL_DIR}/../../scripts/lint_ephemeral_ids.py --diff
For each hit, judge whether the surrounding context is a genuine violation or a quoted/escaped example (e.g. a code fence inside a skill file demonstrating the rule, a test fixture asserting the rule, or this lint's own docstring). Dismiss in-skill examples; promote genuine violations to code_quality findings with tags: ["ephemeral-id-reference"] and ID prefix CODE- (coordinator-assigned). The lint always exits 0 — judgement is yours.
This skill runs inline (not forked) specifically so it can spawn reviewer agents. For any
non-trivial review, confirm the Agent tool is available before fanning out. If it is not
(e.g. the skill is somehow executing inside a subagent, which cannot spawn nested agents), STOP
and report that the review cannot fan out — do NOT silently fall back to a single self-run
review. The single-agent TRIVIAL path in §1/§2 is the only legitimate one-agent review; every
non-trivial review REQUIRES fan-out.
Spawn all agents in parallel following the general spawning guidelines. Use model: "opus" for
thorough analysis by default. If the user requested a specific model for this review (e.g.
"review with Fable"), pass that model to every Agent spawn instead of opus.
Example spawn pattern:
Agent(subagent_type="claudius:security-engineer-smythe", model="opus", prompt="...", name="security-auditor")
Agent(subagent_type="claudius:project-reviewer-adams", model="opus", prompt="...", name="project-reviewer")
Agent(subagent_type="claudius:developer-bilby", model="opus", prompt="...", name="rust-reviewer")
After all agents complete, use the two-phase consolidation script. This automates the mechanical work (flattening, duplicate detection, ID assignment, statistics) and leaves judgment calls (dedup merging, severity re-assessment, executive summary) to you.
Run the consolidation script to flatten all agent reports, detect duplicate candidates, and scan for INTENTIONAL comments:
python3 ${CLAUDE_SKILL_DIR}/../../scripts/consolidate_reports.py prepare \
  security-engineer:${TMPDIR:-/tmp}/security-findings.json \
  project-reviewer:${TMPDIR:-/tmp}/project-findings.json \
  developer-bilby:${TMPDIR:-/tmp}/rust-findings.json \
  --repo-root "$(git rev-parse --show-toplevel)" \
  --output ${TMPDIR:-/tmp}/intermediate.json \
  --metadata '{"project":"...","date":"...","branch":"...","commit":"..."}'
This produces intermediate.json containing: flattened raw_findings (with agent attribution),
duplicate_groups (candidate clusters with overlap reasons), intentional_downgrades (findings
near INTENTIONAL comments), and section_positives.
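To make the idea of a duplicate candidate concrete, here is one plausible heuristic: two findings from different agents whose locations overlap in the same file. This is a sketch under assumptions, not the logic of consolidate_reports.py, whose actual overlap rules are not shown here. The `agent` key and the function names are hypothetical.

```python
from itertools import combinations


def parse_location(location: str):
    """Split 'path:start-end' into (path, start, end); a bare path has no lines."""
    path, _, span = location.partition(":")
    if not span:
        return path, None, None
    start, _, end = span.partition("-")
    return path, int(start), int(end or start)


def overlaps(a: str, b: str) -> bool:
    """True when both locations name the same file and their line ranges touch."""
    path_a, start_a, end_a = parse_location(a)
    path_b, start_b, end_b = parse_location(b)
    if path_a != path_b or start_a is None or start_b is None:
        return False
    return start_a <= end_b and start_b <= end_a


def duplicate_candidates(findings):
    """Return (id, id) pairs for cross-agent findings with overlapping locations."""
    pairs = []
    for first, second in combinations(findings, 2):
        # Same-agent overlaps are left alone: an agent rarely repeats itself.
        if first["agent"] != second["agent"] and overlaps(
            first["location"], second["location"]
        ):
            pairs.append((first["id"], second["id"]))
    return pairs
```

A pair flagged this way is only a candidate. Whether the two findings describe the same defect remains a judgment call for the coordinator.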
Read intermediate.json and make these decisions:
- For each duplicate_groups entry, decide whether to merge (keep the most detailed description, union tags) or keep separate. Remove redundant findings.
- For each intentional_downgrades entry, downgrade the finding's severity to INFO. These represent deliberate engineering decisions from previous triage.
- Load the severity skill (/severity), then re-assess every finding's severity using its criteria. Agents often over-inflate, so apply the definitions strictly.
- Write the executive summary: overall_assessment, summary_text, verdict_text, verdict_action.

Write the result as merged-findings.json with this structure:
{
  "metadata": { "project": "...", "date": "...", ... },
  "executive_summary": { "overall_assessment": "...", ... },
  "findings": [ { "title": "...", "category": "...", "findings": [...], "positives": "..." } ],
  "agent_stats": [ { "agent": "...", "unique": N, "redundant": N } ],
  "top_findings_override": null,
  "remediation_override": null
}
Findings do NOT need id fields — the script assigns them in phase 2. Set top_findings_override
or remediation_override to a JSON array to override auto-generation, or null to auto-generate.
Run the script to assign IDs, compute statistics, and produce a schema-valid report:
python3 ${CLAUDE_SKILL_DIR}/../../scripts/consolidate_reports.py assemble \
  --input ${TMPDIR:-/tmp}/merged-findings.json \
  --output ${REPORT_DIR:-.}/report.json
The script assigns sequential IDs by category (SEC-001, PROJ-001, RUST-001, etc.), computes
summary_statistics (severity counts, category matrix, redundancy ratio), generates
top_findings from CRITICAL/HIGH items, and creates remediation priority buckets. It
validates against the schema and REFUSES to write output if validation fails (exits with
code 1). Validation is mandatory and blocks output — jsonschema is a hard requirement.
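Sequential per-prefix ID assignment, as described above, might look like the sketch below. It is an illustration only, not the implementation in consolidate_reports.py. The `assign_ids` name and the caller-supplied category-to-prefix mapping are assumptions, since the way the real script picks a language prefix such as RUST- is not documented here.

```python
from collections import defaultdict


def assign_ids(sections, prefix_by_category):
    """Give every finding a final ID, numbered sequentially within its prefix."""
    counters = defaultdict(int)
    for section in sections:
        # Assumption: the caller decides which prefix each category maps to.
        prefix = prefix_by_category[section["category"]]
        for finding in section["findings"]:
            counters[prefix] += 1
            finding["id"] = f"{prefix}-{counters[prefix]:03d}"
    return sections
```

Because the counter is keyed by prefix rather than by section, two security sections in the same report continue one SEC- sequence instead of restarting at 001.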
The assemble step already validates and blocks output on failure, but you can re-validate manually (e.g., after hand-editing the report):
python3 ${CLAUDE_SKILL_DIR}/../../scripts/validate_report.py ${REPORT_DIR:-.}/report.json
If validation fails, fix the merged-findings.json and re-run assemble. Do NOT skip validation.
After validation, generate a human-readable markdown version:
python3 ${CLAUDE_SKILL_DIR}/../../scripts/generate_review_report.py ${REPORT_DIR:-.}/report.json --format md
This produces report.md next to the JSON file.
If initial review reveals areas needing deeper investigation:
If the user requests HTML or PDF versions, invoke the renderer directly:
python3 ${CLAUDE_SKILL_DIR}/../../scripts/generate_review_report.py ${REPORT_DIR:-.}/report.json --format html
python3 ${CLAUDE_SKILL_DIR}/../../scripts/generate_review_report.py ${REPORT_DIR:-.}/report.json --format pdf
For interactive triage, use the claudius:triage-findings skill with the ${REPORT_DIR:-.}/report.json path.
See git-and-github skill § Context Management for the subagent delegation pattern. CI logs via get_job_logs are a prime example — always delegate to a subagent that fetches the log and extracts relevant failure information.
See the general anti-patterns in the Claudius agent prompt. Additional review-specific pitfalls:
.unwrap() panics).
Always consolidate and deduplicate before presenting findings.