skills/review/SKILL.md
Collect critical feedback from all registered LLMs on an artifact (architecture doc, implementation, plan). Intellectual debate with push-back — no sycophancy. Reports findings and unresolved disagreements.
npx skillsauth add raine/consult-llm-mcp reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Collect critical, honest feedback from all LLMs on an artifact. Push back on weak arguments. Report both consensus findings and unresolved disagreements.
consult-llm skillLoad the consult-llm skill before proceeding — it defines the invocation contract (stdin heredoc, flags, output format, multi-turn). Do not call the CLI without loading it first.
Arguments: $ARGUMENTS
Check the arguments for flags:
Mode flags:
--rounds N → number of critique rounds (default: 2, max: 3)--dry-run → skip the final synthesis, just show raw reviews--models <list> → comma-separated selectors/model IDs to use as reviewers (default: gemini,openai,anthropic,deepseek)Strip all flags from arguments to get the review target — a file path, directory, or topic description.
Set variables:
REVIEWERS: list of model selectors from --models flag, or ["gemini", "openai", "anthropic", "deepseek"] if omitted-m flags by repeating -m <selector> for each reviewerDiscover which selectors and models are available in this environment:
!`consult-llm models`
Default reviewers (used when no --models flag is given): gemini, openai, anthropic, deepseek — all four selectors that have a configured backend.
Override with --models flag: --models gemini,openai to review with only two, or --models gemini,openai,anthropic for three. Any selector or exact model ID from the list above is accepted.
This skill exists to find problems, not to validate. Instruct every LLM call with:
Parse the arguments — determine what to review:
Gather context — use Glob, Grep, Read to understand:
Prepare the review brief — a summary of:
Have all four LLMs independently review the artifact in parallel using a single CLI call.
Review prompt:
You are a critical reviewer. Your job is to find problems, not to praise.
## What you are reviewing
[Review brief — artifact content and context]
## Your task
Provide a thorough, critical review:
1. **Problems found**: List concrete issues — bugs, logical errors, missing edge cases, architectural flaws, security concerns. Be specific with file paths and line numbers where applicable.
2. **Questionable decisions**: Decisions that might work but deserve scrutiny — are there better alternatives? What are the trade-offs not being considered?
3. **Missing considerations**: What's not addressed that should be? Gaps in error handling, testing, documentation, scalability, maintainability?
4. **Risks**: What could go wrong in production or during maintenance? What assumptions might not hold?
5. **What works well**: (Brief) What's genuinely solid and should be kept as-is?
Rules:
- Be direct and specific. "This could be improved" is useless. "The retry logic on line 45 silently swallows errors, which will make debugging impossible" is useful.
- Do NOT try to be balanced. If you find 10 problems and 1 good thing, report 10 problems and 1 good thing.
- Do NOT soften criticism. If something is bad, say it's bad and explain why.
- Prioritize your findings: critical issues first, minor nits last.
Invoke consult-llm with -m <selector> repeated for each reviewer in REVIEWERS, --task review, and -f <path> for each relevant file. Send the review prompt on stdin via quoted heredoc. All models are queried in parallel in a single call.
The response is in group format:
[thread_id:group_xxx]## Model: <id> header, then [model:<id>] [thread_id:<per-model-id>], then the response bodyExtract thread IDs: Parse each model's thread_id from the per-model header lines. These are needed for Phase 3 since each model receives the other three's responses.
Present all reviews to the user.
For each round (default 2, configurable with --rounds N, max 3):
Share a combined summary of all other reviewers' findings with each reviewer and ask them to challenge, validate, or push back. Use -t <thread_id> to continue each LLM's conversation.
Cross-review prompt (for each reviewer, include the other reviewers' findings):
The other reviewers provided these assessments:
[Combined summary of the other reviewers' latest responses, labeled by provider name]
Respond critically:
1. **Agree**: Which of their findings are valid? Don't just agree to be agreeable — only agree if you genuinely think they're right.
2. **Disagree**: Which findings are wrong, exaggerated, or missing context? Explain why. If they dismissed one of YOUR concerns, push back if you still think it's valid.
3. **New findings**: Did their reviews make you notice anything you missed?
4. **Priority adjustment**: Given all reviews, what are the TOP 3 most critical issues?
Do NOT be diplomatic. If they're wrong, say they're wrong and explain why. If you change your mind, say so explicitly — don't quietly drop a previous point.
Each model receives a different prompt (the other reviewers' responses embedded). Invoke consult-llm once with one --run flag per reviewer, continuing each model's thread:
consult-llm \
--run "model=<selector>,thread=$THREAD,prompt-file=$PROMPT" \
... # one --run per reviewer
-f <path> ...
Write each model's cross-review prompt to a temp file with mktemp, using __CONSULT_LLM_END__ as the heredoc terminator and >| to overwrite.
Present all responses to the user after each round.
If --dry-run: Present the raw reviews without synthesis.
Analyze all rounds and produce a structured report:
Go through every issue raised across all rounds and categorize:
## Review: [Artifact Name]
**Reviewed:** [What was reviewed — file paths or description]
**Reviewers:** Gemini, OpenAI, Anthropic, DeepSeek
**Rounds:** [N]
### Critical Issues (Consensus)
Issues where 3+ reviewers agree, ordered by severity:
1. **[Issue title]**
- **What:** [Specific description]
- **Where:** [File:line or section]
- **Why it matters:** [Impact]
- **Suggested fix:** [If one emerged from discussion]
- **Raised by:** [Which reviewers]
### Disputed Issues
Issues where reviewers disagree — all positions presented:
1. **[Issue title]**
- **For:** [Reviewers and their argument]
- **Against:** [Reviewers and their argument]
- **Moderator's take:** [Your assessment of who has the stronger argument]
### Minor Findings
Lower-severity issues and suggestions:
- [Finding 1]
- [Finding 2]
### What's Solid
Aspects reviewers consider well-done:
- [Strength 1]
- [Strength 2]
### Unresolved Questions
Open questions that need human judgment:
- [Question 1]
- [Question 2]
Add your own honest assessment as moderator:
Save the report to history/review-<artifact-name>.md.
-m flags in a single call. Do not show one reviewer's output to another until Phase 3.600000 on every consult-llm call — LLM responses routinely exceed the 2-minute default.data-ai
Coordinator workflow for multi-phase implementation across workmux worktrees. Generates or loads a master plan, dispatches phase agents using presets, verifies sentinels, merges serially, and performs integration verification.
testing
Interactive design session — agent facilitates a clarifying dialogue with the user, fans out to multiple LLMs in parallel for divergent approach generation, lets the user pick one, then co-designs the chosen approach with optional multi-LLM critique before saving.
development
Standalone multi-model code review of an existing diff. Multiple LLMs review in parallel; agent deduplicates, prioritizes by severity/confidence, and optionally applies localized fixes.
development
One-unit implementation workflow using presets. It writes a compact note or rich code-bearing plan, optionally consults external LLMs, optionally delegates execution to sideagent, verifies, commits, and summarizes.