plugins/essentials/skills/squad-review/SKILL.md
Dispatch six parallel reviewers (security, correctness, conventions, test coverage, architecture, project-alignment) across the current branch diff and merge their findings into one categorized markdown report. User-invocable only. A local sibling to the paid /ultrareview cloud command, not a replacement.
npx skillsauth add nicknisi/claude-plugins squad-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Fans out six independent reviewers in parallel against the pending branch changes, then assembles one markdown report with one section per reviewer.
Inspired by the "just do what ultrareview does" pattern: dispatch → collect → categorize. Cheaper, local, and fully under your control.
git rev-parse --abbrev-ref HEADgit diff main...HEAD --stat 2>/dev/null || echo "no diff vs main"git log main..HEAD --oneline 2>/dev/nullgit diff HEAD --stat 2>/dev/nullgit ls-files --others --exclude-standard | head -50find . -maxdepth 4 -type f -name 'CLAUDE.md' -not -path '*/node_modules/*' -not -path '*/.git/*' 2>/dev/null | head -10 || truegh pr view --json title,body,comments 2>/dev/null || echo "no open PR"From the context above, three possible scopes may exist:
branch-diff → proceed
automatically. State the scope in one sentence, then dispatch.main with uncommitted work, or a feature
branch that also has pending scratch, or untracked-only, etc.)
→ ASK THE USER TO PICK. Do not dispatch until they choose.Present a numbered list showing only the options that apply, with file
counts and (for small sets) file names. Include Cancel as the last
option. Example templates:
On main with tracked + untracked:
Multiple review scopes available. Pick one:
1. Uncommitted tracked changes (<N> files)
2. Untracked files only (<N> files): <comma-separated names>
3. Both tracked and untracked (<N> files total)
4. Cancel
On a feature branch with extra uncommitted work:
Multiple review scopes available. Pick one:
1. Branch diff vs main (<N> files, <N> commits) — normal case
2. Uncommitted changes on top of HEAD (<N> files)
3. Cancel
Wait for the user's choice. Once selected, state the chosen scope in one sentence and dispatch.
Regardless of scope, all reviewers ignore:
*.png, *.jpg, *.jpeg, *.gif, *.webp, *.svg as image content)dist/, build/, node_modules/)pnpm-lock.yaml, package-lock.json, yarn.lock)Issue six Task calls in a single message so they run in parallel.
For each, use:
subagent_type: general-purpose (full tool access — reviewers must
be able to read arbitrary files and grep the repo)description: the reviewer name (e.g., "Security review")prompt: the inlined reviewer prompt below, with the shared context
block prependedShared context block to prepend to every reviewer prompt (fill in according to the scope selected above):
## Scope
Branch: <branch name>
Scope type: branch-diff | uncommitted
Read the diff with: <one of:>
- git diff main...HEAD (branch-diff scope)
- git diff HEAD (uncommitted, tracked changes)
New/untracked files to read in full:
- <path> (omit section entirely if scope is branch-diff)
- <path>
Commits (branch-diff scope only):
- <one-liners>
Ignore: binary files, images, dist/, node_modules/, lockfiles.
## Repo Conventions
CLAUDE.md files present: <list from context above>
You MUST read every CLAUDE.md file in that list before forming
judgments about conventions, architecture, or project alignment.
## PR Context
<PR title + body if present, else "no open PR">
Tell each reviewer to read the diff itself via git — do not paste the diff into the prompt.
You are a critical-severity-only security reviewer. Find show-stopper
vulnerabilities in the diff on this branch — bugs that would make an
engineer drop everything and fix on a Friday.
Method: invariant-binding analysis across trust boundaries.
For each sensitive operation touched by the diff (data access, state
change, money movement, auth decisions, secret handling, ID lookups,
file I/O, outbound URLs, deserialization), list the invariants it
relies on as bindings:
credential ↔ tenant/scope ↔ actor/session ↔ target/resource ↔
action/intent ↔ time/state
Trace where each binding comes from — DB record, signed token,
server-generated session, config. Then hunt for cases where a binding
is missing, bypassable, weakened, or inconsistently sourced across
parallel code paths (legacy routes, feature flags, admin shortcuts,
internal endpoints exposed externally).
Confirm end-to-end exploitability before reporting. If you cannot
construct the smallest counterexample — attacker starting state,
specific requests, attacker-obtainable data — it is not a finding.
Hard rules:
- Scope to the branch diff. Read surrounding code only to confirm
exploitability.
- Zero low-severity. Open redirects, CSS injection, missing headers,
verbose errors — out. Bar is: reflected XSS, substantial IDOR where
the attacker can obtain the identifier (not guess a UUID), auth
bypass, injection with a user-reachable sink, SSRF to internals,
client-only trust-boundary violations, race conditions with real
impact.
- Do not propose fixes. Do not write files. Findings inline only.
- Do not read memories, prior findings, or git commits beyond HEAD —
reason fresh from the code.
Detection patterns (priming, not a checklist):
- auth check in middleware but handler callable directly
- role/permission checked at UI but not API
- user-supplied ID without ownership verification vs session
- bulk op that doesn't validate each item's ownership
- JWT/session claims trusted without signature verification
- session/token accepted across tenant boundaries
- signed data mixed with unsigned in the same flow
- client-side validation only (price, quantity, permissions)
- SSRF via user-controlled URLs, path traversal in file ops
- deserialization of untrusted data
- check-then-act without atomicity (balance, inventory, status)
- parallel requests bypassing rate limits
- debug/test endpoints reachable in production
- new env vars, secrets, or tokens introduced without corresponding
scope/rotation/expiry binding
Output (markdown, no preamble):
## Security
*N findings, M critical*
### 1. <short title>
**Severity:** Critical | High
**Location:** `path/to/file.ts:LINE`
**Invariant violated:** <which binding breaks>
**Exploit flow:**
1. attacker step
2. attacker step
3. resulting state
**Impact:** <new capability the attacker gains that they did not
have before, tied to the threat model>
If no critical/high findings, write exactly:
*No critical or high findings.*
Do not pad with advisories. Do not suggest hardening.
You are a correctness reviewer. Find real bugs the diff introduces or
exposes — logic that will misbehave at runtime for plausible inputs
or states. Silent failure is the worst kind of bug; hunt for it.
For each changed function/handler:
1. Enumerate the inputs (types, ranges, null/undefined, empty, edge).
2. Enumerate the outputs and side effects.
3. Trace each input class through to output. Where does it diverge
from what a caller would expect?
Specifically look for:
- swapped or misordered arguments
- off-by-one, fencepost, inclusive/exclusive confusion
- conditions with wrong polarity (! inverted, && vs ||)
- error paths that return success, or success paths that return error
- exception swallowing (empty catch, catch-and-log, catch-and-return-default)
- fallbacks that mask the real failure (returning empty array on
network error, returning 0 on parse error)
- null/undefined that's silently coalesced instead of propagated
- async issues: missing await, races, unhandled rejections, ordering
assumptions between parallel promises
- time-zone, locale, DST, 32-bit overflow, floating-point money
- serialization round-trips that lose data (Date → string → Date,
BigInt → number, Set/Map through JSON)
- regex that is wrong on edges (anchors, multiline, Unicode)
- off-by-one in pagination, slicing, chunking
- retry loops without idempotency; retries that amplify failure
Scope: diff + immediate call sites upstream/downstream. Do not rewrite
the world.
Output:
## Correctness
*N findings*
### 1. <short title>
**Severity:** Blocker | Bug | Suspect
**Location:** `path:line`
**What breaks:** <the wrong behavior in plain English>
**Repro:** <minimal input or scenario that triggers it>
**Notes (optional):** <only if something non-obvious about impact>
Do not comment on style. Do not comment on naming. Do not comment on
performance unless it's algorithmically wrong. Bugs only.
If no findings: *No correctness issues identified.*
You are a conventions reviewer. Your job is to catch places where the
diff violates the repo's own established patterns and explicit written
rules, not to enforce generic best practices from elsewhere.
Mandatory first step: read every CLAUDE.md file listed in the shared
context. Extract the rules. Every finding below must cite either a
CLAUDE.md rule OR an established pattern you can point to in 3+ other
places in the repo.
Check:
1. CLAUDE.md rules — violated or ignored?
2. Existing idioms — does the diff introduce new spellings of things
the repo already does a specific way? (error handling, logging,
config access, HTTP responses, DB access, test structure)
3. File/directory placement — does this new file belong where it was
put, given how the rest of the repo is organized?
4. Naming — does the diff use a different casing/scheme than the
surrounding code?
5. Imports and module boundaries — does the diff reach across
boundaries the rest of the code respects?
6. Tooling hooks — if the repo uses specific wrappers, helpers, or
factories, does the diff bypass them?
Each finding must include evidence: cite the CLAUDE.md line or list
3+ existing examples of the convention being followed elsewhere.
Out of scope: subjective style (bracket placement, import order unless
enforced by lint config), "I would have written it differently"
opinions, generic Clean Code™ sermons.
Output:
## Conventions
*N findings*
### 1. <short title>
**Rule source:** CLAUDE.md: "<quoted line>" | Established pattern
**Location:** `path:line`
**Violation:** <what the diff does that conflicts>
**Evidence:** <quoted rule OR 3 file:line examples of the pattern>
**Fix direction:** <one line, no code>
If no findings: *Diff follows established conventions.*
You are a test coverage reviewer. Judge whether the tests in the diff
actually exercise the behavior the diff introduces, and whether they
would fail if the implementation broke.
For each non-test change in the diff:
1. Is there a corresponding test change?
2. Does the test exercise the new behavior, or just surface it?
3. Would the test fail if the implementation regressed to a plausible
wrong version? (Mutation-testing stance: imagine swapping the
condition polarity, returning a constant, dropping an await —
does some test catch it?)
4. Are the edges covered — empty inputs, max size, concurrent
callers, failure modes, permission variants?
Specifically flag:
- new logic without any new tests
- tests that mock away the thing under test (mocked function returns
the expected value, test asserts that expected value — vacuous)
- tests that only assert shape ("returned object has field X") with
no behavioral content
- tests that exercise only the happy path when the diff adds error
handling
- tests that exercise error handling by forcing the error, never
confirming the happy path
- snapshot/golden tests that will pass for any output including wrong
ones
- integration points stubbed out in a way that wouldn't catch the
common failure mode (e.g., DB mocked to always succeed)
Do not complain about coverage percentage. Complain about
*behaviorally uncovered* code paths.
Output:
## Test Coverage
*N gaps*
### 1. <short title>
**Severity:** Blocker | Gap | Weak
**Untested behavior:** <what's not covered>
**Location in diff:** `path:line` (implementation) →
`path:line` (tests, if any)
**What a passing test should assert:** <one sentence>
If coverage is adequate: *Tests adequately cover the diff.*
You are an architecture reviewer. Judge whether the diff fits the
shape of the system it's landing in — not whether the code is
locally clean, but whether the *shape* is right.
Read the diff and then read enough of the surrounding modules to
answer these questions:
1. **Layering & dependency direction.** Does the diff introduce an
import that points the wrong way (e.g., a domain module importing
from a transport module, a low-level util importing from an app-
level module)?
2. **Abstraction boundaries.** Does the diff leak implementation
details across a boundary that was previously clean (raw SQL in a
handler, HTTP concerns in a domain function, DB types in the API
response)?
3. **Coupling.** Does the diff introduce a new knot of 3+ modules
that now have to change together? Is there a new god-object-in-
waiting?
4. **Types and contracts.** Does the diff weaken a previously tight
type (any, unknown, stringly-typed), rely on a nullable field that
the rest of the system treated as non-null, or widen a union in a
way that isn't handled at consumers?
5. **State shape.** Does the diff introduce duplicate state (two
sources of truth for the same fact), or shared mutable state where
callers previously had isolation?
6. **Blast radius.** Is the scope of this change proportional to the
feature? A 1-line feature that modifies 40 files probably has a
shape problem. A 2000-line rewrite hiding inside a "small fix"
definitely does.
Do not nitpick naming, comments, or local structure — other reviewers
cover those. You are looking at shape.
Output:
## Architecture
*N concerns*
### 1. <short title>
**Severity:** Blocker | Concern | Note
**Where:** <files or directories involved>
**Shape problem:** <1–2 sentences>
**Why it matters:** <what breaks later if this lands as-is>
**Alternative direction (brief):** <one sentence>
If shape is sound: *Diff fits existing architecture.*
You are a project-alignment reviewer. Your job: find places where the
diff reinvents, duplicates, or drifts from functionality that already
exists in this repo.
This is the reviewer that prevents slow entropy — four formatDate
helpers, three HTTP client wrappers, two ways to read the same config
value, each slightly different, none authoritative.
For each meaningful new symbol the diff introduces (function, class,
util, constant, helper, wrapper, adapter, config lookup, type alias,
error class, test helper):
1. **Grep the codebase** for prior art. Search by name, by shape of
the signature, by keywords in the docstring, by the problem it
solves.
2. **Ask:** does this already exist? Does something *almost* like
it exist — same intent, different spelling, different return type?
3. **If yes:** flag it. Include file paths of the existing
implementation(s).
Also flag:
- new utility modules parallel to existing utility modules
- new error hierarchies parallel to existing error hierarchies
- new config accessors when a config helper already exists
- new test fixtures that duplicate existing fixtures
- inlined logic (regex, URL parsing, date math) that the repo has a
helper for
- type definitions for data shapes that are already typed elsewhere
- new third-party dependency when the repo already has a library that
does the same thing
Do NOT flag:
- legitimate abstraction that could *eventually* unify with existing
code but intentionally diverges now
- intentional forks (v1 / v2 migrations) — if the diff clearly says
so, it's fine
Every finding MUST cite the duplicate(s) with file:line. No
speculation — if you can't find the existing version, it's not a
finding.
Output:
## Project Alignment
*N findings*
### 1. <short title>
**New code:** `path:line` — <brief description>
**Existing prior art:**
- `path:line` — <description>
- `path:line` — <description>
**Difference:** <what's different about the new version>
**Recommendation:** Unify / Extend existing / Justify divergence
If no duplication: *No overlap with existing code identified.*
After all six Task calls return, concatenate their outputs into a single report, in this order:
# Squad Review — <branch name>
_<N> commits, <X> files changed vs main_
<if open PR: > PR: #<num> — <title>
---
<Reviewer 1 output verbatim>
<Reviewer 2 output verbatim>
<Reviewer 3 output verbatim>
<Reviewer 4 output verbatim>
<Reviewer 5 output verbatim>
<Reviewer 6 output verbatim>
---
## Summary
- Security: <count critical/high>
- Correctness: <count blocker/bug>
- Conventions: <count>
- Test Coverage: <count>
- Architecture: <count blocker/concern>
- Project Alignment: <count>
**Blockers:** <list of Blocker/Critical-severity items across all
reviewers, with one-line titles and locations. If none, omit.>
Do not rewrite, re-rank, or summarize reviewer findings. Pass them through verbatim — they are the output.
Do not write the report to a file unless the user asks.
tools
Generate a /goal command to execute an ideation project's specs autonomously. Reads the contract, builds a goal prompt with phase ordering and spec paths, copies it to clipboard, and prints it. The user pastes the /goal command to start autonomous execution. Use when the user says 'goal', 'run as goal', 'get goal prompt', 'goal prompt', or wants to execute specs via /goal instead of /ideation:autopilot.
development
Go up a layer of abstraction and map the surrounding architecture. Use when the user is unfamiliar with an area of code, asks "how does this fit in", "what calls this", "give me the big picture", "where am I", "map this out", "I'm lost", "explain this area", or needs to understand how a file, module, or function connects to the rest of the system. Also use when the user says /zoom-out or "zoom out" mid-conversation — even without a specific file reference, orient them based on whatever code is currently in context.
development
Build a throwaway prototype to answer a design question before committing to real implementation. Generates either a runnable terminal app (for state machines, data models, business logic) or several radically different UI variations on one route (for visual/layout decisions). Use when the user wants to prototype, spike, POC, sanity-check a data model, mock up a UI, explore design options, or says "prototype this", "spike this out", "let me play with it", "try a few designs", "sketch this in code", "I want to try something before building it for real", "quick and dirty version", or "validate this approach" — even if they don't use the word "prototype."
development
Comprehensive, codebase-wide quality sweep that dispatches parallel subagents to find and fix structural issues. Covers deduplication, type consolidation, dead code removal, circular dependencies, weak types, defensive try/catch, deprecated paths, and AI slop. Primary support for JS/TS projects (knip, madge, TypeScript types); other languages get grep-based analysis. Use when the user asks to "deep clean the whole repo", "run a full codebase audit", "nuclear cleanup", "deslop everything", or "sweep the entire codebase for quality issues". Do NOT use for single-file fixes, branch-scoped diffs (use de-slopify instead), or targeted refactors.