plugins/agent-teams/skills/multi-reviewer-patterns/SKILL.md
Coordinate parallel code reviews across multiple quality dimensions with finding deduplication, severity calibration, and consolidated reporting. Use this skill when organizing multi-reviewer code reviews, calibrating finding severity, or consolidating review results.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add acaprino/alfio-claude-plugins multi-reviewer-patterns`
Patterns for coordinating parallel code reviews across multiple quality dimensions, deduplicating findings, calibrating severity, and producing consolidated reports.
| Dimension | Focus | When to Include |
| ----------------- | --------------------------------------- | ------------------------------------------- |
| Security | Vulnerabilities, auth, input validation | Always for code handling user input or auth |
| Performance | Query efficiency, memory, caching | When changing data access or hot paths |
| Architecture | SOLID, coupling, patterns | For structural changes or new modules |
| Testing | Coverage, quality, edge cases | When adding new functionality |
| Accessibility | WCAG, ARIA, keyboard nav | For UI/frontend changes |
| Scenario | Dimensions |
| ---------------------- | -------------------------------------------- |
| API endpoint changes | Security, Performance, Architecture |
| Frontend component | Architecture, Testing, Accessibility |
| Database migration | Performance, Architecture |
| Authentication changes | Security, Testing |
| Full feature review | Security, Performance, Architecture, Testing |
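The recommended combinations above can be encoded as a small lookup. This is a minimal sketch: the scenario slugs and the `dimensions_for` helper are hypothetical names, and falling back to the full-feature set for unknown scenarios is an assumption, not something the skill specifies.

```python
# Hypothetical lookup mirroring the scenario table; the keys are illustrative slugs.
RECOMMENDED_DIMENSIONS = {
    "api-endpoint": ["Security", "Performance", "Architecture"],
    "frontend-component": ["Architecture", "Testing", "Accessibility"],
    "database-migration": ["Performance", "Architecture"],
    "authentication": ["Security", "Testing"],
    "full-feature": ["Security", "Performance", "Architecture", "Testing"],
}


def dimensions_for(scenario: str) -> list:
    """Return the reviewer dimensions recommended for a scenario.

    Assumption: an unknown scenario falls back to the full-feature set.
    """
    fallback = RECOMMENDED_DIMENSIONS["full-feature"]
    return list(RECOMMENDED_DIMENSIONS.get(scenario, fallback))


print(dimensions_for("database-migration"))  # ['Performance', 'Architecture']
```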
When /team-review runs in pipeline mode (no --skip-interconnect), reviewers do not receive raw code only -- they receive two context artifacts produced in Phase 1:
- .deep-dive/ (from deep-dive-analysis plugin): 01-structure.md, 02-interfaces.md, 05-risks.md, and optionally 03-flows.md, 04-semantics.md, 06-documentation.md, 07-final-report.md.
- .team-review/02-interconnect.md (from senior-review:semantic-interconnect-mapper): contracts (formal / structural / implicit), invariants, domain rules, assumptions (verified / documented / unverified), integration hot-spots, change impact radius.

Without shared context, each reviewer re-reads the code from scratch. This is wasteful and, more importantly, blinds them to bugs that only manifest across components -- broken implicit contracts, invariant drift, bypass paths, non-idempotent retries, terminal state mutations. Phase 1 surfaces those concerns in the interconnect map, and Phase 2 reviewers use the map as a checklist.
Reviewers should not read the entire context file. They should Grep or read only the anchors relevant to their dimension, guided by the ## Reviewer Hints section at the bottom of .team-review/02-interconnect.md.
Default anchor routing:
| Reviewer dimension | Primary anchors in interconnect map |
|--------------------|-------------------------------------|
| security | ## Integration Hot-Spots (inbound), ## Assumptions (unverified), ## Contracts (implicit, input validation) |
| architecture (code-auditor) | ## Invariants, ## Contracts (structural + implicit), ## Call Graph |
| logic-integrity | ## Contracts (implicit, unverified), ## Invariants, ## Assumptions (unverified), ## Domain Rules |
| distributed-flows | ## Integration Hot-Spots (HTTP / queue / IPC), ## Call Graph (cross-service) |
| chicken-egg | ## Assumptions (initialization order), ## Integration Hot-Spots (Env / config), ## Invariants (cross-component) |
| ui-races | ## Invariants (temporal), ## Integration Hot-Spots (UI state) |
| api-contracts (future) | ## Contracts (formal) |
You are reviewing for the {dimension} dimension.
## Target
[...]
## Diff
[...]
## Context files
- Deep-dive output: .deep-dive/
- Interconnect map: .team-review/02-interconnect.md
Per `## Reviewer Hints` in the interconnect map, focus your reading on these anchors:
{anchors-for-this-dimension}
## Instructions
Follow your agent definition's phases and output format. Cite file:line for every finding.
Every finding that relates to a contract/invariant/assumption in the interconnect map should
also cite the map anchor that surfaced the concern.
Write your output to .team-review/findings-{dimension}.md.
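One way an orchestrator might combine the routing table with the spawn template is sketched below. The `ANCHOR_ROUTING` dict and `build_spawn_prompt` function are hypothetical names introduced for illustration; only three routing rows are reproduced, and the target and diff values are placeholders supplied by the caller.

```python
# Subset of the default anchor routing table (dimension -> map anchors).
ANCHOR_ROUTING = {
    "security": [
        "## Integration Hot-Spots (inbound)",
        "## Assumptions (unverified)",
        "## Contracts (implicit, input validation)",
    ],
    "distributed-flows": [
        "## Integration Hot-Spots (HTTP / queue / IPC)",
        "## Call Graph (cross-service)",
    ],
    "ui-races": [
        "## Invariants (temporal)",
        "## Integration Hot-Spots (UI state)",
    ],
}

PROMPT_TEMPLATE = """You are reviewing for the {dimension} dimension.

## Target
{target}

## Diff
{diff}

## Context files
- Deep-dive output: .deep-dive/
- Interconnect map: .team-review/02-interconnect.md

Per `## Reviewer Hints` in the interconnect map, focus your reading on these anchors:
{anchors}

## Instructions
Follow your agent definition's phases and output format. Cite file:line for every finding.
Every finding that relates to a contract/invariant/assumption in the interconnect map should
also cite the map anchor that surfaced the concern.

Write your output to .team-review/findings-{dimension}.md.
"""


def build_spawn_prompt(dimension: str, target: str, diff: str) -> str:
    """Fill the spawn template with the anchors routed to one dimension."""
    if dimension not in ANCHOR_ROUTING:
        raise KeyError(f"no anchor routing defined for dimension {dimension!r}")
    anchors = "\n".join(f"- {anchor}" for anchor in ANCHOR_ROUTING[dimension])
    return PROMPT_TEMPLATE.format(
        dimension=dimension, target=target, diff=diff, anchors=anchors
    )
```

Raising on an unrouted dimension rather than sending an empty anchor list is a design choice of this sketch; it keeps a reviewer from silently reading the whole map.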
A useful quality signal at the end of a review: what fraction of findings cite an interconnect-map anchor?
The logic-integrity-auditor dimension should be at >= 70% (its findings are almost entirely driven by the map).
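The signal can be computed once findings are parsed into records. The sketch below assumes a hypothetical `Finding` record with an optional `map_anchor` field; the file names in the sample data are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    dimension: str
    location: str                      # "file:line"
    map_anchor: Optional[str] = None   # hypothetical field: the cited map anchor, if any


def anchor_citation_rate(findings: list, dimension: str) -> float:
    """Fraction of one dimension's findings that cite an interconnect-map anchor."""
    selected = [f for f in findings if f.dimension == dimension]
    if not selected:
        return 0.0
    return sum(1 for f in selected if f.map_anchor) / len(selected)


findings = [
    Finding("logic-integrity", "orders.py:88", "## Invariants"),
    Finding("logic-integrity", "orders.py:120", "## Contracts (implicit, unverified)"),
    Finding("logic-integrity", "cart.py:15", "## Domain Rules"),
    Finding("logic-integrity", "cart.py:40"),
]
rate = anchor_citation_rate(findings, "logic-integrity")
print(f"{rate:.0%}")  # 75%
```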
--skip-interconnect mode

When the pipeline is skipped, reviewers receive only target + diff. In this mode:

- logic-integrity-auditor is not spawned (no map to drive it).
- No .deep-dive/ or .team-review/02-interconnect.md references should appear in reviewer prompts.

Every reviewer agent that runs as part of /team-review Phase 2 carries a `## Pipeline Conventions` section in its system prompt with four cross-cutting rules. The orchestrator does not need to repeat them in the spawn prompt, but should be aware of what they mandate:

- `## Cross-Reviewer Notes` section. Phase 3 consolidation must scan for this section and route the observations to the appropriate reviewer (or surface them in the consolidated report under the recipient dimension).

Reference: docs/references/agent-teams-best-practices.md § Pipeline Conventions.
When multiple reviewers report issues at the same location:
For each finding in all reviewer reports:
1. Check if another finding references the same file:line
2. If yes, check if they describe the same issue
3. If same issue: merge, keeping the more detailed description
4. If different issue: keep both, tag as "co-located"
5. Use highest severity among merged findings
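The five steps above can be sketched as follows. The `Finding` record is hypothetical, and two heuristics are assumptions of this sketch rather than rules from the skill: "same issue" is approximated by comparing normalized titles, and "more detailed" is approximated by description length. In practice an agent would make both judgments.

```python
from dataclasses import dataclass, field

SEVERITY_RANK = {"Low": 0, "Medium": 1, "High": 2, "Critical": 3}


@dataclass
class Finding:
    location: str       # "file:line"
    title: str
    description: str
    severity: str       # one of SEVERITY_RANK's keys
    dimensions: list
    tags: set = field(default_factory=set)


def same_issue(a: Finding, b: Finding) -> bool:
    # Assumption: two findings describe the same issue if their titles match
    # after trimming and lowercasing.
    return a.title.strip().lower() == b.title.strip().lower()


def deduplicate(findings: list) -> list:
    """Merge duplicate findings in place and tag distinct co-located ones."""
    merged = []
    for finding in findings:
        colocated = [m for m in merged if m.location == finding.location]  # step 1
        match = next((m for m in colocated if same_issue(m, finding)), None)  # step 2
        if match is None:
            if colocated:  # step 4: different issue at the same file:line
                finding.tags.add("co-located")
                for other in colocated:
                    other.tags.add("co-located")
            merged.append(finding)
            continue
        # Step 3: same issue, keep the more detailed (here: longer) description.
        if len(finding.description) > len(match.description):
            match.description = finding.description
        # Step 5: use the highest severity among merged findings.
        if SEVERITY_RANK[finding.severity] > SEVERITY_RANK[match.severity]:
            match.severity = finding.severity
        match.dimensions = sorted(set(match.dimensions) | set(finding.dimensions))
    return merged
```

Recording every contributing dimension on the merged finding is an addition of this sketch; it keeps the per-dimension summary counts reconstructible after merging.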
| Severity | Impact | Likelihood | Examples |
| ------------ | --------------------------------------------- | ---------------------- | -------------------------------------------- |
| Critical | Data loss, security breach, complete failure | Certain or very likely | SQL injection, auth bypass, data corruption |
| High | Significant functionality impact, degradation | Likely | Memory leak, missing validation, broken flow |
| Medium | Partial impact, workaround exists | Possible | N+1 query, missing edge case, unclear error |
| Low | Minimal impact, cosmetic | Unlikely | Style issue, minor optimization, naming |
## Code Review Report
**Target**: {files/PR/directory}
**Reviewers**: {dimension-1}, {dimension-2}, {dimension-3}
**Date**: {date}
**Files Reviewed**: {count}
### Critical Findings ({count})
#### [CR-001] {Title}
**Location**: `{file}:{line}`
**Dimension**: {Security/Performance/etc.}
**Description**: {what was found}
**Impact**: {what could happen}
**Fix**: {recommended remediation}
### High Findings ({count})
...
### Medium Findings ({count})
...
### Low Findings ({count})
...
### Summary
| Dimension | Critical | High | Medium | Low | Total |
| ------------ | -------- | ----- | ------ | ----- | ------ |
| Security | 1 | 2 | 3 | 0 | 6 |
| Performance | 0 | 1 | 4 | 2 | 7 |
| Architecture | 0 | 0 | 2 | 3 | 5 |
| **Total** | **1** | **3** | **9** | **5** | **18** |
### Recommendation
{Overall assessment and prioritized action items}
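The Summary table in the template can be generated from the consolidated findings rather than counted by hand. A minimal sketch, assuming findings are available as hypothetical `(dimension, severity)` pairs:

```python
from collections import Counter

SEVERITIES = ["Critical", "High", "Medium", "Low"]


def summary_table(findings: list) -> str:
    """Render the per-dimension severity counts as a markdown table.

    `findings` is a list of (dimension, severity) pairs; this shape is an
    assumption of the sketch, not a format defined by the skill.
    """
    counts = Counter(findings)
    # Preserve first-seen dimension order.
    dimensions = list(dict.fromkeys(dimension for dimension, _ in findings))
    lines = [
        "| Dimension | Critical | High | Medium | Low | Total |",
        "| --- | --- | --- | --- | --- | --- |",
    ]
    for dimension in dimensions:
        row = [counts[(dimension, severity)] for severity in SEVERITIES]
        cells = " | ".join(str(n) for n in row)
        lines.append(f"| {dimension} | {cells} | {sum(row)} |")
    totals = [sum(counts[(d, s)] for d in dimensions) for s in SEVERITIES]
    total_cells = " | ".join(f"**{n}**" for n in totals)
    lines.append(f"| **Total** | {total_cells} | **{sum(totals)}** |")
    return "\n".join(lines)
```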
After findings are consolidated (deduplicated), each selected finding is judged by a panel of 3 verifiers run in parallel, each with a distinct lens. This replaces single-judge validation: three independent mandates catch more failure modes than three identical refuters.
This section is the source of truth. /agent-teams:team-review (Phase 4b) and /senior-review:code-review (Step 4b) both drive the panel from here.
Spawn one Agent per lens per finding. Use subagent_type: general-purpose. Use model: opus for lenses 1 and 2 (reasoning-heavy), model: sonnet for lens 3 (calibration). Run all three (and across findings) in parallel via run_in_background: true.
Lens 1 prompt (Reachability / Correctness):
You are verifier LENS 1 of 3 (Reachability / Correctness) for one code-review finding.
Your job: determine whether the described defect REALLY exists and is reachable.
## The Finding
[severity, file:line, description, suggested fix]
## The Diff
[diff for the relevant file]
## Full File Content
[full content of the file containing the finding]
## Instructions
1. Locate the exact file:line. Is the citation correct?
2. Trace the control/data flow: is the buggy path actually reachable in normal or error execution?
3. Does the code truly exhibit the described problem, or is the description a misread?
Return REAL only if you can point to the concrete lines and the path that triggers the defect.
Respond with EXACTLY:
- Verdict: REAL or FALSE_POSITIVE
- Confidence: 0-100
- Reason: 1-2 sentences citing file:line
Lens 2 prompt (False-Positive Causes):
You are verifier LENS 2 of 3 (False-Positive Causes) for one code-review finding.
Your job: actively try to REFUTE the finding. Default to FALSE_POSITIVE if uncertain.
## The Finding
[severity, file:line, description, suggested fix]
## The Diff
[diff for the relevant file]
## Full File Content
[full content of the file containing the finding]
## Instructions
Try to explain the flagged code away as one of:
1. Framework convention (Django/FastAPI/pytest/etc. idiom that is correct by design)
2. Intentional design choice consistent with surrounding code or CLAUDE.md
3. Pre-existing code not introduced or made newly relevant by the diff
4. A misunderstanding of the code's actual behavior or context
Return REAL only if the finding survives refutation on all four counts.
Respond with EXACTLY:
- Verdict: REAL or FALSE_POSITIVE
- Confidence: 0-100
- Reason: 1-2 sentences citing file:line; if FALSE_POSITIVE, name the refutation category
Lens 3 prompt (Severity Calibration):
You are verifier LENS 3 of 3 (Severity Calibration) for one code-review finding.
Assume the finding is REAL. Your only job is to vote the correct severity.
## The Finding
[severity, file:line, description, suggested fix]
## The Diff
[diff for the relevant file]
## Full File Content
[full content of the file containing the finding]
## Calibration criteria
- Critical: data loss, security breach, complete failure; certain or very likely
- High: significant functionality impact or degradation; likely
- Medium: partial impact, workaround exists; possible
- Low: minimal or cosmetic; unlikely
Respond with EXACTLY:
- Verdict: REAL
- Severity_vote: Critical or High or Medium or Low
- Confidence: 0-100
- Reason: 1-2 sentences citing file:line
Each verifier returns: verdict (REAL or FALSE_POSITIVE; lens 3 always REAL), confidence (0-100), severity_vote (lens 3 only), reason (with a file:line citation).
- filtered (never silently dropped: the count appears in the report).
- contested. A flagged false positive is cheaper than a killed real bug.
- severity_vote when the finding is confirmed real; otherwise the original reviewer severity.

If a verifier errors or returns a malformed verdict, treat it as an abstention. If fewer than 2 valid verdicts return for a finding, apply the tie rule (survives, contested). The panel never crashes the pipeline and never silently drops a finding.
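The abstention handling can be sketched as a tolerant parser plus a small aggregation step. Parsing follows the response format the lens prompts require. The aggregation policy is partly an assumption of this sketch: the text states only that fewer than 2 valid verdicts means "survives, contested" and that lens 3's `severity_vote` applies when the finding is confirmed real. Treating lenses 1 and 2 as the existence votes, with unanimous REAL as confirmed, unanimous FALSE_POSITIVE as filtered, and everything else as contested, is a guess at the intended rule.

```python
import re
from dataclasses import dataclass
from typing import Optional

SEVERITIES = {"Critical", "High", "Medium", "Low"}
_FIELD = re.compile(
    r"^-[ \t]*(Verdict|Severity_vote|Confidence|Reason):[ \t]*(.+)$", re.MULTILINE
)


@dataclass
class Verdict:
    verdict: str
    confidence: int
    reason: str
    severity_vote: Optional[str] = None


def parse_verdict(text: str) -> Optional[Verdict]:
    """Parse one verifier response; return None (an abstention) if malformed."""
    fields = {key.lower(): value.strip() for key, value in _FIELD.findall(text)}
    try:
        verdict = fields["verdict"]
        confidence = int(fields["confidence"])
        reason = fields["reason"]
    except (KeyError, ValueError):
        return None
    severity = fields.get("severity_vote")
    if verdict not in {"REAL", "FALSE_POSITIVE"} or not 0 <= confidence <= 100:
        return None
    if severity is not None and severity not in SEVERITIES:
        return None
    return Verdict(verdict, confidence, reason, severity)


def aggregate(lens1, lens2, lens3, original_severity: str) -> tuple:
    """Return (status, severity); status is confirmed, contested, or filtered."""
    valid = [v for v in (lens1, lens2, lens3) if v is not None]
    if len(valid) < 2:
        return "contested", original_severity  # tie rule from the text
    if lens1 is not None and lens2 is not None:
        votes = {lens1.verdict, lens2.verdict}
        if votes == {"REAL"}:  # assumption: unanimous existence votes confirm
            if lens3 is not None and lens3.severity_vote:
                return "confirmed", lens3.severity_vote
            return "confirmed", original_severity
        if votes == {"FALSE_POSITIVE"}:  # assumption: unanimous refutation filters
            return "filtered", original_severity
    return "contested", original_severity
```

Any outcome short of unanimity lands in "contested", which matches the stated preference for a flagged false positive over a killed real bug.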
- >= 50% that survived deduplication, regardless of severity.
- `--rigorous` not set): narrow to stakes + uncertainty band, which is all Critical/High findings plus any Medium/Low in the 50-75% confidence band or with a severity that conflicted between reviewers. The remaining findings pass through unverified, tagged unverified (cost-guard). Declare the narrowing in the report.
- `--rigorous`: ignore the cap; verify everything above the floor.
- `--fast`: skip the entire gate (panel + critic).

In the prose/Agent substrate there is no token-budget API. The guard triggers on the number of surviving findings (threshold 25), not on real token consumption. State this wherever the guard is documented so no false precision is implied.
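The selection rules can be sketched as a single function. Several details are assumptions of this sketch because the text does not pin them down: the guard fires when more than 25 findings sit above the floor (rather than at exactly 25), the 50-75 band is inclusive at both ends, and the `Finding` record with its `severity_conflict` flag is hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 50   # findings below this are not sent to the panel
GUARD_THRESHOLD = 25    # count-based guard, not a token budget


@dataclass
class Finding:
    id: str
    severity: str                    # Critical, High, Medium, or Low
    confidence: int                  # 0-100
    severity_conflict: bool = False  # reviewers disagreed on severity


def select_for_verification(findings: list, rigorous: bool = False, fast: bool = False) -> tuple:
    """Return (to_verify, unverified_cost_guard)."""
    if fast:
        return [], []  # --fast skips the entire gate
    eligible = [f for f in findings if f.confidence >= CONFIDENCE_FLOOR]
    if rigorous or len(eligible) <= GUARD_THRESHOLD:
        return eligible, []  # verify everything above the floor

    def in_stakes_or_uncertainty_band(f: Finding) -> bool:
        return (
            f.severity in {"Critical", "High"}
            or 50 <= f.confidence <= 75
            or f.severity_conflict
        )

    to_verify = [f for f in eligible if in_stakes_or_uncertainty_band(f)]
    unverified = [f for f in eligible if not in_stakes_or_uncertainty_band(f)]
    return to_verify, unverified
```

The second list is what the report would tag as unverified (cost-guard); the caller remains responsible for declaring the narrowing.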
After verification, one critic agent asks what the review failed to cover. It turns blind spots from passive side effects into active output and, when warranted, into one more round of work.
This section is the source of truth. /agent-teams:team-review (Phase 4c) and /senior-review:code-review (Step 4c) both drive the critic from here.
The critic reads: the verified findings, the review scope, the list of dimensions that ran, and whatever context exists (deep-dive output and the interconnect map for team-review; .deep-dive/ if present for code-review).
The critic evaluates coverage against a fixed taxonomy and writes a ## Coverage Gaps block:
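- Dimensions warranted by the scope but not run.
- In-scope files cited by no finding.
- Interconnect-map assumptions marked unverified that no finding addressed.
- High-risk hot-spots (05-risks.md or the map's Integration Hot-Spots) with zero findings.

You are the completeness critic for a multi-dimensional code review. Your job is NOT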
to find new bugs directly. It is to find what the review did not examine.
## Verified findings
[the consolidated, verified findings]
## Scope
[changed files / target]
## Dimensions that ran
[list]
## Context available
[deep-dive paths and interconnect map path, or "none"]
## Instructions
Produce a "## Coverage Gaps" list across these categories, each item actionable and specific:
1. Dimensions warranted by the scope but not run
2. In-scope files cited by no finding
3. Interconnect-map assumptions marked unverified that no finding addressed
4. High-risk hot-spots (05-risks.md / Integration Hot-Spots) with zero findings
Then, if and ONLY if one gap is a high-risk uncovered area, name the single most valuable
follow-up: which dimension/agent should review which files. Output it under
"## Recommended follow-up" with one entry, or "## Recommended follow-up: none".
If the critic names a high-risk uncovered area, spawn one targeted reviewer (the most specialized agent for that area) for a single round. Its findings re-enter deduplication and then the verification panel. One round only: the critic does not run again on the follow-up output.
Under the cost guard or budget pressure, the critic degrades to report-only: it emits the ## Coverage Gaps list with no follow-up spawn, and the report states that the follow-up was skipped. --fast skips the critic entirely.
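The first two gap categories are mechanical set differences and can be computed before the critic runs, leaving the judgment-heavy categories to the agent. A minimal sketch, with hypothetical function and parameter names and an assumed "file:line" location format:

```python
def coverage_gaps(
    scope_files: list,
    finding_locations: list,
    dimensions_warranted: list,
    dimensions_run: list,
) -> str:
    """Render a '## Coverage Gaps' block for gap categories 1 and 2.

    Locations are assumed to be "file:line" strings; the part before the
    last colon is taken as the file path.
    """
    cited_files = {location.rsplit(":", 1)[0] for location in finding_locations}
    missing_dimensions = [d for d in dimensions_warranted if d not in dimensions_run]
    uncited_files = [f for f in scope_files if f not in cited_files]

    lines = ["## Coverage Gaps"]
    lines += [f"- Dimension warranted but not run: {d}" for d in missing_dimensions]
    lines += [f"- In-scope file cited by no finding: {f}" for f in uncited_files]
    if len(lines) == 1:
        lines.append("- none")
    return "\n".join(lines)
```

A file with no findings is not necessarily unexamined, so this output is best read as a prompt for the critic rather than a verdict.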