Code Review Methodology

Systematic code review using parallel specialist agents. Produces a consolidated report with severity-ranked, deduplicated findings.

Tone

Keep the Claudius/Skippy persona — sarcastic superiority, theatrical sighs, dry wit. Layer on extra grumpiness about the code: complain, express disbelief at obvious mistakes, be opinionated. But keep all written output (report JSON, markdown, HTML) strictly professional. The grumpiness is for the human; the report is for posterity.

Argument: $ARGUMENTS — optional scope description (e.g., "feat/zk branch", "packages/auth/", "last 5 commits"). If empty, review all changes on the current branch vs the main branch.

1. Scope the Review

Determine what to review:

```bash
# If reviewing a branch (replace <main-branch> with the base branch, e.g. main)
BASE_BRANCH=<main-branch>
git log $BASE_BRANCH..HEAD --oneline
git diff $BASE_BRANCH...HEAD --stat

# If reviewing specific paths
git diff $BASE_BRANCH...HEAD -- <paths>
```

Assess scale:

Trivial (< 200 lines, < 5 files, single language): 1 agent — single developer-bilby prompted with security-best-practices and coding-best-practices skills. Skip consolidation pipeline; agent writes report directly.
Small (< 500 lines, < 10 files): 2 agents
Medium (500-5000 lines, 10-50 files): 3-4 agents
Large (5000+ lines, 50+ files): 5+ agents, split by file groups
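The thresholds above can be applied mechanically. The following is a minimal sketch, not one of the skill's scripts: it parses git diff --numstat output (one added<TAB>deleted<TAB>path line per file, with "-" in place of counts for binary files) and returns a tier. Two choices are our own assumptions, flagged in the comments: when line count and file count point at different tiers the larger tier wins, and "single language" is judged from a small hypothetical extension map.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-language map; adjust for the repository.
LANGUAGE_BY_SUFFIX = {
    ".rs": "rust", ".go": "go", ".py": "python",
    ".ts": "frontend", ".tsx": "frontend", ".js": "frontend", ".vue": "frontend",
}


def parse_numstat(numstat: str) -> tuple[int, list[str]]:
    """Return (changed_lines, paths) from `git diff --numstat` output."""
    total, paths = 0, []
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        if added != "-":  # binary files report "-" instead of line counts
            total += int(added) + int(deleted)
        paths.append(path)
    return total, paths


def classify_scale(numstat: str) -> str:
    lines, paths = parse_numstat(numstat)
    files = len(paths)
    # Files with unmapped extensions (docs, config) do not count as a language.
    languages = {LANGUAGE_BY_SUFFIX.get(PurePosixPath(p).suffix) for p in paths} - {None}
    if lines < 200 and files < 5 and len(languages) <= 1:
        return "trivial"
    # Assumption: when lines and files disagree, the larger tier wins.
    if lines >= 5000 or files >= 50:
        return "large"
    if lines >= 500 or files >= 10:
        return "medium"
    return "small"
```

Feed it the output of git diff --numstat $BASE_BRANCH...HEAD and treat the tier as a starting point for the scale assessment rather than a verdict.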

2. Select Agent Mix

Choose agents based on what the code does. Not every review needs every agent type.

Trivial reviews (single agent)

For trivial reviews (< 200 lines, < 5 files, single language), skip the multi-agent pipeline. Spawn a single developer-bilby and instruct it to also apply security-best-practices and coding-best-practices checklists. The agent writes the report JSON directly — no consolidation needed.

Core agents (always include)

| Agent (subagent_type) | Focus |
|---|---|
| claudius:project-reviewer-adams | Cross-artifact consistency, convention adherence, doc accuracy, specialist orchestration |
| claudius:security-engineer-smythe | OWASP Top 10, injection, concurrency, panics, DoS, known vulns |

Language specialists (add per language in scope)

These agents handle code quality reviews — readability, idioms, error handling, duplication, performance. Always include the relevant language specialist; the project-reviewer does NOT cover language-specific code quality.

| Condition | Agent (subagent_type) | Focus |
|---|---|---|
| Rust code | claudius:developer-bilby | Code quality, idioms, ownership, error handling, clippy compliance |
| Go code | claudius:developer-bilby | Code quality, idioms, error wrapping, concurrency, table-driven tests |
| Python code | claudius:developer-bilby | Code quality, PEP 8, type hints, async patterns, pytest |
| Frontend code | claudius:developer-bilby | Code quality, TS/JS patterns, React/Vue, CSS, accessibility |

Other conditional agents

| Condition | Agent (subagent_type) | Focus |
|---|---|---|
| Documentation changes | claudius:technical-writer-trillian | Accuracy, completeness, API docs, changelog |

For crypto-heavy code or significant dependency changes, expand the single security-engineer's prompt scope to include crypto soundness and dependency audit — do NOT spawn a second instance.

Scaling for large codebases

For large reviews (50+ files, 5000+ lines), spawn multiple agents of the same type with different file scopes.
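The selection rules in this section can be sketched as a small function. This is an illustration under stated assumptions, not part of the skill: the input is the set of languages in scope plus a flag for documentation changes, the name values are hypothetical, and the large-review split uses a placeholder group size of 25 files. Only the subagent_type strings come from the tables above. It applies the tables as written and does not enforce the per-tier agent counts from §1.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    subagent_type: str
    name: str  # hypothetical display name
    files: list[str] = field(default_factory=list)  # empty means "whole diff"


def select_agents(scale: str, languages: set[str], docs_changed: bool) -> list[AgentSpec]:
    if scale == "trivial":
        # One developer-bilby, prompted to also apply the
        # security-best-practices and coding-best-practices checklists.
        return [AgentSpec("claudius:developer-bilby", "trivial-reviewer")]
    agents = [
        AgentSpec("claudius:project-reviewer-adams", "project-reviewer"),
        # Crypto or dependency audits widen this agent's prompt; never add a second one.
        AgentSpec("claudius:security-engineer-smythe", "security-auditor"),
    ]
    for lang in sorted(languages):
        agents.append(AgentSpec("claudius:developer-bilby", f"{lang}-reviewer"))
    if docs_changed:
        agents.append(AgentSpec("claudius:technical-writer-trillian", "docs-reviewer"))
    return agents


def split_by_file_groups(agent: AgentSpec, paths: list[str], group_size: int = 25) -> list[AgentSpec]:
    """Large reviews: clone one agent type across disjoint file groups."""
    groups = [paths[i:i + group_size] for i in range(0, len(paths), group_size)]
    return [
        AgentSpec(agent.subagent_type, f"{agent.name}-{n}", group)
        for n, group in enumerate(groups, start=1)
    ]
```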

3. Craft Agent Prompts

Follow the general agent prompt requirements. In addition, every review agent prompt MUST include these review-specific elements:

Comparison base: How to see what changed (git show <base>:<file> or git diff)
Finding format: Use the severity levels and structure defined below
Review checklists: Embed relevant checklist content or rely on the agent's preloaded skills
BP preload: every spawned reviewer agent (security-engineer-smythe, project-reviewer-adams, developer-bilby, technical-writer-trillian, etc.) MUST preload coding-best-practices so its Cross-Cutting Rules govern every finding — state this explicitly in each spawn prompt.
UX/DX lens: instruct agents to assess how findings affect end-user workflows and developer experience, not just code correctness
CI context: When MemCan/WebSearch are unavailable (e.g., CI), instruct agents: "Do not use memcan tools or WebSearch/WebFetch."
File output: Instruct agents to use the Write tool for creating files — never cat > file or heredoc redirections.
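To avoid forgetting an element, spawn prompts can be assembled from the checklist above. The helper below is hypothetical and its wording is illustrative; only the elements themselves come from the list.

```python
def build_review_prompt(focus: str, base_branch: str, output_path: str, ci: bool = False) -> str:
    """Assemble a spawn prompt that carries every review-specific element."""
    lines = [
        f"Review focus: {focus}.",
        # Comparison base
        f"See what changed with: git diff {base_branch}...HEAD "
        f"(or git show {base_branch}:<file> for the old version).",
        # Finding format
        f"Write findings as a JSON array of finding_section objects to {output_path}.",
        # Review checklists
        "Apply the review checklists from your preloaded skills.",
        # BP preload, stated explicitly
        "Preload coding-best-practices; its Cross-Cutting Rules govern every finding.",
        # UX/DX lens
        "Assess how each finding affects end-user workflows and developer experience.",
        # File output
        "Create files with the Write tool, never cat > file or heredoc redirections.",
    ]
    if ci:  # CI context: MemCan and WebSearch are unavailable
        lines.append("Do not use memcan tools or WebSearch/WebFetch.")
    return "\n".join(lines)
```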

Finding format (JSON)

Agents MUST output findings as a JSON file containing an array of finding_section objects. Each agent writes its output to the specified file path as valid JSON:

```json
[
  {
    "title": "Section Title",
    "category": "security|project|code_quality|dependencies|documentation|call_tree",
    "findings": [
      {
        "id": "PREFIX-001",
        "risk": 0.6,
        "impact": 0.7,
        "scope": 1.0,
        "title": "Short finding title",
        "tags": ["A03 Injection", "CWE-79"],
        "location": "src/auth.rs:42-56",
        "description": "What the issue is and why it matters",
        "impact_description": "What could go wrong (Markdown narrative)",
        "recommendation": "How to fix it",
        "code_snippets": [
          {"language": "rust", "caption": "auth.rs:42", "content": "let user = unwrap_token(&hdr);"}
        ]
      }
    ],
    "positives": "Optional positive observations"
  }
]
```

Required finding fields: id, risk / impact / scope (floats 0.0–1.0), title, location, description, recommendation. See claudius:severity for the OWASP-normalized recipes that produce the three float dimensions and the band table that the coordinator uses to derive the integer severity. Rate scope as real blast radius per claudius:severity — never default it to 1.0. The float trio is the single source of truth; never hand-type a severity label — the pipeline derives it.

Optional: tags, impact_description (Markdown impact narrative; the numeric impact float is separate), code_snippets (emit only when you captured the exact source during analysis — never invent one).

Producers must NOT emit (downstream-owned): overall_severity, location_permalink, metadata.repository, ai_assessment, ai_verdict, ai_verdict_confidence, and the derived integer severity when emitting floats. risk/impact/scope are required — without all three, the coordinator cannot derive overall_severity and the schema rejects the finding. The validate-findings skill is the only documented path to populate floats post-hoc.

Metadata: emit metadata.commit as the full 40-character SHA, never the --short form. Use git rev-parse @{u} (the pushed commit, so permalinks resolve on GitHub), falling back to git rev-parse HEAD when the branch has no upstream. Omit metadata.commit when not in a git repo. The coordinator derives metadata.repository from git remote get-url origin; producers do not emit it.
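The commit lookup with its fallback might look like this minimal Python sketch (the function name is ours). It shells out to the same two git rev-parse calls described above and returns None when git is unavailable or the directory is not inside a repository, which is the cue to omit metadata.commit.

```python
import subprocess
from typing import Optional


def resolve_commit_sha(repo_dir: str = ".") -> Optional[str]:
    """Full SHA for metadata.commit, or None when it should be omitted."""
    # Prefer the pushed upstream commit so GitHub permalinks resolve,
    # then fall back to HEAD when the branch has no upstream.
    for rev in ("@{u}", "HEAD"):
        try:
            result = subprocess.run(
                ["git", "rev-parse", rev],  # no --short: keep the full SHA
                cwd=repo_dir,
                capture_output=True,
                text=True,
            )
        except FileNotFoundError:  # git not installed, or repo_dir missing
            return None
        if result.returncode == 0:
            return result.stdout.strip()
    return None
```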

ID prefixes: SEC- security, PROJ- project, RUST-/PY-/GO-/FE- language, DOC- docs, CALL- call-tree. Agents assign provisional sequential IDs within their prefix (e.g., SEC-001, SEC-002). IDs may collide across parallel agents; consolidation deduplicates the findings (5a-5b) and the assemble step (5c) reassigns final IDs.

Location MUST include full file path (e.g., src/auth.rs:42-56), never bare line numbers.

Severity levels: CRITICAL > HIGH > MEDIUM > LOW > INFO (see severity skill).

Tags: classification references — OWASP (A01–A10), CWE, language best-practice IDs, etc. Tag ALL security findings with OWASP categories. Non-security findings may omit tags.
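A producer can sanity-check its own output before handing it to the consolidation script. The sketch below is not the project's schema or validate_report.py; it only encodes the rules stated above: required fields, the 0.0–1.0 float range, downstream-owned fields, the file-path rule for location, and OWASP tags on security findings. Two details are assumptions: the derived integer severity is taken to be a field named severity, and a location is accepted as a path with an optional :line or :start-end suffix.

```python
import re

REQUIRED = ("id", "risk", "impact", "scope", "title", "location", "description", "recommendation")
# Downstream-owned fields. Assumption: the derived integer severity is named "severity".
FORBIDDEN = ("overall_severity", "location_permalink", "ai_assessment",
             "ai_verdict", "ai_verdict_confidence", "severity")
# A path with an optional :line or :start-end suffix.
LOCATION_RE = re.compile(r"^(?P<path>[^:\s]+)(?::\d+(?:-\d+)?)?$")
OWASP_RE = re.compile(r"^A(0[1-9]|10)\b")  # OWASP Top 10 categories A01-A10


def is_unit_float(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool) and 0.0 <= value <= 1.0


def check_sections(sections: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means no rule was broken."""
    problems = []
    for section in sections:
        for f in section.get("findings", []):
            fid = f.get("id", "<no id>")
            problems += [f"{fid}: missing {k}" for k in REQUIRED if k not in f]
            problems += [f"{fid}: must not emit {k}" for k in FORBIDDEN if k in f]
            problems += [f"{fid}: {k} must be a float in [0.0, 1.0]"
                         for k in ("risk", "impact", "scope") if k in f and not is_unit_float(f[k])]
            if "location" in f:
                m = LOCATION_RE.match(str(f["location"]))
                # Reject bare line numbers such as "42-56".
                if not m or m["path"].replace("-", "").isdigit():
                    problems.append(f"{fid}: location needs a file path, not bare line numbers")
            if section.get("category") == "security" and not any(
                OWASP_RE.match(t) for t in f.get("tags", [])
            ):
                problems.append(f"{fid}: security findings need an OWASP tag")
    return problems
```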

Call-tree inspection

When the diff modifies or removes any function/method declaration, every code-quality reviewer agent MUST run a deep transitive in-repo caller walk before emitting findings. Methodology lives in references/call-tree-walk.md — read it once per review and follow the steps.

Finding shape: category: "call_tree", ID prefix CALL- (coordinator-assigned). The producer emits a provisional CALL-NNN. Every call_tree finding's description MUST start with a Walked via: <tool> line so the reader can judge walk depth and tool quality.

Skip the walk for pure additions, doc-only PRs, and changes confined to test files.
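The authoritative procedure is in references/call-tree-walk.md. Purely to illustrate what a deep transitive walk computes, the sketch below does a breadth-first search over a caller index, recording the shortest call depth at which each indirect caller is reached, and checks the mandatory Walked via: header. The callers mapping is a hypothetical input; in practice it comes from whichever tool performed the walk.

```python
import re
from collections import deque

WALK_HEADER = re.compile(r"Walked via: \S")


def transitive_callers(changed: list[str], callers: dict[str, list[str]]) -> dict[str, int]:
    """Map each direct or indirect caller of the changed functions to its depth.

    callers[f] lists the functions that call f (a hypothetical index).
    Depth 1 is a direct caller; BFS order guarantees the shortest depth,
    and the seen check stops cycles from looping forever.
    """
    seen: dict[str, int] = {}
    queue = deque((fn, 0) for fn in changed)
    while queue:
        fn, depth = queue.popleft()
        for caller in callers.get(fn, []):
            if caller not in seen and caller not in changed:
                seen[caller] = depth + 1
                queue.append((caller, depth + 1))
    return seen


def has_walk_header(description: str) -> bool:
    """call_tree descriptions must start with a 'Walked via: <tool>' line."""
    return WALK_HEADER.match(description) is not None
```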

Ephemeral-ID lint

After each agent emits findings, run the dumb ephemeral-ID lint against the diff:

```bash
git diff $BASE_BRANCH...HEAD | python3 ${CLAUDE_SKILL_DIR}/../../scripts/lint_ephemeral_ids.py --diff
```

For each hit, judge whether the surrounding context is a genuine violation or a quoted/escaped example (e.g. a code fence inside a skill file demonstrating the rule, a test fixture asserting the rule, or this lint's own docstring). Dismiss such examples; promote genuine violations to code_quality findings with tags: ["ephemeral-id-reference"] and ID prefix CODE- (coordinator-assigned). The lint always exits 0, so it never fails on its own; the judgement is yours.

4. Spawn Agents

This skill runs inline (not forked) specifically so it can spawn reviewer agents. For any non-trivial review, confirm the Agent tool is available before fanning out. If it is not (e.g. the skill is somehow executing inside a subagent, which cannot spawn nested agents), STOP and report that the review cannot fan out — do NOT silently fall back to a single self-run review. The single-agent TRIVIAL path in §1/§2 is the only legitimate one-agent review; every non-trivial review REQUIRES fan-out.

Spawn all agents in parallel following the general spawning guidelines. Use model: "opus" for thorough analysis by default. If the user requested a specific model for this review (e.g. "review with Fable"), pass that model to every Agent spawn instead of opus.

Example spawn pattern:

```
Agent(subagent_type="claudius:security-engineer-smythe", model="opus", prompt="...", name="security-auditor")
Agent(subagent_type="claudius:project-reviewer-adams", model="opus", prompt="...", name="project-reviewer")
Agent(subagent_type="claudius:developer-bilby", model="opus", prompt="...", name="rust-reviewer")
```

5. Consolidate Findings

After all agents complete, use the two-phase consolidation script. This automates the mechanical work (flattening, duplicate detection, ID assignment, statistics) and leaves judgment calls (dedup merging, severity re-assessment, executive summary) to you.

5a. Phase 1 — Prepare

Run the consolidation script to flatten all agent reports, detect duplicate candidates, and scan for INTENTIONAL comments:

```bash
python3 ${CLAUDE_SKILL_DIR}/../../scripts/consolidate_reports.py prepare \
    security-engineer:${TMPDIR:-/tmp}/security-findings.json \
    project-reviewer:${TMPDIR:-/tmp}/project-findings.json \
    developer-bilby:${TMPDIR:-/tmp}/rust-findings.json \
    --repo-root $(git rev-parse --show-toplevel) \
    --output ${TMPDIR:-/tmp}/intermediate.json \
    --metadata '{"project":"...","date":"...","branch":"...","commit":"..."}'
```

This produces intermediate.json containing: flattened raw_findings (with agent attribution), duplicate_groups (candidate clusters with overlap reasons), intentional_downgrades (findings near INTENTIONAL comments), and section_positives.
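The prepare step performs this detection itself, and its heuristics are not documented here. As a hypothetical illustration of one overlap reason a candidate cluster might carry, the sketch below pairs findings whose locations name the same file with overlapping line ranges.

```python
import re
from itertools import combinations
from typing import Optional

LOC_RE = re.compile(r"^(?P<path>[^:]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def parse_location(location: str) -> Optional[tuple[str, int, int]]:
    m = LOC_RE.match(location)
    if not m:
        return None  # no line range: this heuristic cannot compare it
    start = int(m["start"])
    return m["path"], start, int(m["end"] or start)


def duplicate_candidates(raw_findings: list[dict]) -> list[tuple[int, int, str]]:
    """Return (index_a, index_b, reason) for pairs that may describe one issue."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(raw_findings), 2):
        la, lb = parse_location(a["location"]), parse_location(b["location"])
        # Same file and overlapping [start, end] line ranges.
        if la and lb and la[0] == lb[0] and la[1] <= lb[2] and lb[1] <= la[2]:
            pairs.append((i, j, f"overlapping lines in {la[0]}"))
    return pairs
```

A real consolidation pass would also weigh titles, tags, and descriptions; line overlap alone only nominates candidates for the judgement in 5b.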

5b. Review and merge (LLM judgment)

Read intermediate.json and make these decisions:

Duplicate resolution: For each duplicate_groups entry, decide whether to merge (keep the most detailed description, union tags) or keep separate. Remove redundant findings.
INTENTIONAL downgrade: For each intentional_downgrades entry, downgrade the finding's severity to INFO. These represent deliberate engineering decisions from previous triage.
Severity re-evaluation: Load the severity skill (/severity), then re-assess every finding's severity using its criteria. Agents often over-inflate — apply the definitions strictly.
Merge sections: Combine agent sections with the same category into unified sections.
Executive summary: Write overall_assessment, summary_text, verdict_text, verdict_action.
Agent stats: Record per-agent unique vs redundant counts.
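Once a duplicate group is judged to be a single issue, the merge rule above (keep the most detailed description, union tags) can be applied mechanically. In this sketch the longest description stands in for "most detailed", which is a simplification; deciding whether to merge at all remains a judgement call.

```python
def merge_duplicates(group: list[dict]) -> dict:
    """Collapse findings judged to describe the same issue into one."""
    # Simplification: the longest description is taken as the most detailed.
    base = max(group, key=lambda f: len(f.get("description", "")))
    merged = dict(base)  # shallow copy so the inputs stay untouched
    tags: list[str] = []
    for finding in group:
        for tag in finding.get("tags", []):
            if tag not in tags:  # union, preserving first-seen order
                tags.append(tag)
    if tags:
        merged["tags"] = tags
    return merged
```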

Write the result as merged-findings.json with this structure:

```
{
  "metadata": { "project": "...", "date": "...", ... },
  "executive_summary": { "overall_assessment": "...", ... },
  "findings": [ { "title": "...", "category": "...", "findings": [...], "positives": "..." } ],
  "agent_stats": [ { "agent": "...", "unique": N, "redundant": N } ],
  "top_findings_override": null,
  "remediation_override": null
}
```

Findings do NOT need id fields — the script assigns them in phase 2. Set top_findings_override or remediation_override to a JSON array to override auto-generation, or null to auto-generate.

5c. Phase 2 — Assemble

Run the script to assign IDs, compute statistics, and produce a schema-valid report:

```bash
python3 ${CLAUDE_SKILL_DIR}/../../scripts/consolidate_reports.py assemble \
    --input ${TMPDIR:-/tmp}/merged-findings.json \
    --output ${REPORT_DIR:-.}/report.json
```

The script assigns sequential IDs by category (SEC-001, PROJ-001, RUST-001, etc.), computes summary_statistics (severity counts, category matrix, redundancy ratio), generates top_findings from CRITICAL/HIGH items, and creates remediation priority buckets. It validates against the schema and REFUSES to write output if validation fails (exits with code 1). Validation is mandatory and blocks output — jsonschema is a hard requirement.
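The per-category sequential numbering can be pictured with the sketch below. It is not the script's implementation: the caller supplies the category-to-prefix map (code_quality sections get a language prefix such as RUST-, which depends on the reviewer), and the numbering simply follows section order.

```python
from collections import Counter


def assign_ids(sections: list[dict], prefix_for: dict[str, str]) -> Counter:
    """Overwrite provisional IDs with final sequential ones, per prefix."""
    counts: Counter = Counter()
    for section in sections:
        prefix = prefix_for[section["category"]]  # e.g. {"security": "SEC"}
        for finding in section["findings"]:
            counts[prefix] += 1
            finding["id"] = f"{prefix}-{counts[prefix]:03d}"
    return counts
```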

5d. Validate report against schema

The assemble step already validates and blocks output on failure, but you can re-validate manually (e.g., after hand-editing the report):

```bash
python3 ${CLAUDE_SKILL_DIR}/../../scripts/validate_report.py ${REPORT_DIR:-.}/report.json
```

If validation fails, fix the merged-findings.json and re-run assemble. Do NOT skip validation.

5e. Render markdown report

After validation, generate a human-readable markdown version:

```bash
python3 ${CLAUDE_SKILL_DIR}/../../scripts/generate_review_report.py ${REPORT_DIR:-.}/report.json --format md
```

This produces report.md next to the JSON file.

6. Iterate if Needed

If the initial review reveals areas needing deeper investigation:

Spawn additional agents with narrower scope
Re-review specific files with different checklists
Audit forked dependencies against upstream

7. Additional Report Formats (Optional)

If the user requests HTML or PDF versions, invoke the renderer directly:

```bash
python3 ${CLAUDE_SKILL_DIR}/../../scripts/generate_review_report.py ${REPORT_DIR:-.}/report.json --format html
python3 ${CLAUDE_SKILL_DIR}/../../scripts/generate_review_report.py ${REPORT_DIR:-.}/report.json --format pdf
```

For interactive triage, use the claudius:triage-findings skill with the ${REPORT_DIR:-.}/report.json path.

CI Log Retrieval

See the git-and-github skill (§ Context Management) for the subagent delegation pattern. CI logs fetched via get_job_logs are a prime example: always delegate to a subagent that fetches the log and extracts the relevant failure information.

Anti-Patterns (Review-Specific)

See the general anti-patterns in the Claudius agent prompt. Additional review-specific pitfalls:

Skipping scope assessment: Always assess scale first. The agent mix and split strategy depend on whether the review is trivial, small, medium, or large.
Missing comparison base: Review agents need to know what changed. Always include the git diff or git show commands in the prompt.
No deduplication: Multiple agents will flag the same issue (e.g., .unwrap() panics). Always consolidate and deduplicate before presenting findings.


lklimek/grumpy-review
