skills/review-board/SKILL.md
Multi-LLM review board. Sends any document (architecture docs, design docs, RFCs, API specs, runbooks) to external LLM CLIs (Codex, Gemini, GitHub Copilot) for independent review, synthesizes feedback, then optionally drives rebuttal rounds and consensus building. The process is multi-round, and each round after Round 1 is opt-in. Trigger phrases include "review this doc", "get external review", "send to chatgpt and gemini", "review board", or any request to have a document reviewed by other LLMs.
Multi-round document review process using external LLMs as independent reviewers and Claude as the synthesizer and moderator. Works with any technical document -- architecture docs, design docs, RFCs, API specs, runbooks, proposals. Each round after Round 1 is optional -- the user advances when they want to go deeper.
| Reviewer | CLI Binary | Display Name |
|----------|-----------|--------------|
| ChatGPT | codex | ChatGPT (Codex) |
| Gemini | gemini | Gemini |
| GitHub Copilot | gh copilot | GitHub Copilot |
At least one reviewer CLI must be installed. Check with:
which codex 2>/dev/null; which gemini 2>/dev/null; gh copilot --version 2>/dev/null
If none are found, report the error and stop. If only some are available, proceed with those.
At startup, detect all available CLIs. Use all that are found. This is the default -- no user prompt needed.
If the user specifies reviewers in their request (e.g., "just send to gemini" or "use codex and copilot"), respect that choice. Only use the specified reviewers.
The skill requires at least 1 reviewer. With a single reviewer, skip the "Agreed Issues" and "Contradictions" categories in synthesis -- everything becomes "Unique Insights" from that reviewer.
With 2+ reviewers, the full synthesis categories apply. More reviewers increase the chance of catching issues but also increase noise. The synthesis step handles any number of reviewers.
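The detection rule above can be sketched as a small shell function. This is a minimal sketch, not the skill's actual implementation: the function name `detect_reviewers` and the short labels `chatgpt`, `gemini`, and `copilot` are assumptions, chosen to match the review file names used later in this document.

```shell
# Hypothetical helper: print the labels of the reviewers whose CLI is available.
detect_reviewers() {
  found=""
  # `command -v` is the portable equivalent of `which`.
  command -v codex  >/dev/null 2>&1 && found="$found chatgpt"
  command -v gemini >/dev/null 2>&1 && found="$found gemini"
  # Copilot is probed with the same command the prerequisite check uses.
  gh copilot --version </dev/null >/dev/null 2>&1 && found="$found copilot"
  # Strip the leading space; an empty line means no reviewer is available.
  printf '%s\n' "${found# }"
}

reviewers=$(detect_reviewers)
if [ -z "$reviewers" ]; then
  echo "No reviewer CLI found (codex, gemini, gh copilot)" >&2
else
  echo "Available reviewers: $reviewers"
fi
```

If the user names specific reviewers, the detected list would be filtered down to those names before any review is sent.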
All intermediate files go in a temp working directory. Only the final deliverable(s) are saved next to the source document.
Temp directory: <doc_dir>/.review-board-<docname>/
Create this directory at the start of the workflow. All raw reviews, rebuttals, and intermediate files go here.
Final deliverables (saved next to the source doc):
- <docname>-review-consolidated.md -- always produced (Round 1)
- <docname>-consensus.md -- only if Round 3 runs
- <docname>-decisions.md -- only if Round 4 runs
Temp files (in the working directory, cleaned up at the end):
- context-preamble.md -- project context prepended to doc for reviewers
- review-<reviewer>.md -- raw review from each reviewer (e.g., review-chatgpt.md, review-gemini.md, review-copilot.md)
- rebuttal.md -- Claude's rebuttal
- rebuttal-<reviewer>.md -- each reviewer's response to the rebuttal
Cleanup: When the workflow completes (user chooses to stop or finishes all rounds), ask the user whether to keep or delete the temp working directory. Default: delete.
At the start of the workflow, run:
mkdir -p "<doc_dir>/.review-board-<docname>"
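The working-directory lifecycle (derive the path, create it, delete it at the end) might look like the sketch below. Both function names are hypothetical. It assumes that `<docname>` means the file name without its extension, which the source does not state, and the prefix check before `rm -rf` is an extra safety guard added here, not something the skill prescribes.

```shell
# Hypothetical helper: derive the temp working directory for a document.
# Assumption: <docname> is the file name without its extension.
# Example: workdir_for docs/design.md prints docs/.review-board-design
workdir_for() {
  doc_dir=$(dirname "$1")
  docname=$(basename "$1")
  docname=${docname%.*}
  printf '%s/.review-board-%s\n' "$doc_dir" "$docname"
}

# Hypothetical helper: delete the working directory, but only when the path
# looks like one this workflow created (a guard around rm -rf).
cleanup_workdir() {
  case "$(basename "$1")" in
    .review-board-*) rm -rf "$1" ;;
    *) echo "refusing to delete $1" >&2; return 1 ;;
  esac
}
```

A caller would run `workdir=$(workdir_for "$DOC")` followed by `mkdir -p "$workdir"` at startup, and `cleanup_workdir "$workdir"` only after the user confirms deletion.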
These apply to all rounds that send content to external LLMs.
Important shell details:
- Use printf '%s\n\n%s' instead of echo to handle large documents safely.
- Gemini: pass the prompt as the -p argument via $(cat "$FILE") command substitution. Do NOT pipe via stdin -- Gemini CLI does not reliably read stdin with -p.
- Codex: pass - as the final argument to read from stdin.
- Set timeout: 600000 on all Bash calls (reviews can take a few minutes).
- Run reviewers as background Bash tasks (run_in_background: true) so they execute concurrently.
- Use TaskOutput with block: true and timeout: 300000 to wait for completion of each.
- Suppress stderr with 2>/dev/null on all reviewer CLIs.
Give reviewers access to the project codebase so they can browse source code, not just the document text. Determine the project root: walk up from the document's directory looking for .git, go.mod, package.json, or similar project markers. If found, use that as $PROJECT_DIR. If not found, use the document's parent directory.
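The project-root lookup described above (walk up from the document's directory until a project marker is found) could be sketched as follows. The function name `find_project_root` is hypothetical, and the marker list is limited to the three markers named in the text; a real implementation may check more.

```shell
# Hypothetical helper: print the project root for a document path.
find_project_root() {
  # Resolve the document's directory to an absolute path.
  doc_dir=$(cd "$(dirname "$1")" && pwd) || return 1
  dir=$doc_dir
  while :; do
    # Only the markers named in the text; extend the list as needed.
    for marker in .git go.mod package.json; do
      if [ -e "$dir/$marker" ]; then
        printf '%s\n' "$dir"
        return 0
      fi
    done
    # Stop once the filesystem root has been checked.
    [ "$dir" = "/" ] && break
    dir=$(dirname "$dir")
  done
  # No marker found: fall back to the document's parent directory.
  printf '%s\n' "$doc_dir"
}
```

The result would be assigned with `PROJECT_DIR=$(find_project_root "$INPUT_FILE")` before building the reviewer commands.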
ChatGPT (Codex) pattern:
cd "$PROJECT_DIR" && printf '%s\n\n%s' "$PROMPT" "$(cat "$INPUT_FILE")" | codex exec -C "$PROJECT_DIR" --skip-git-repo-check -o "$OUTPUT_FILE" - 2>/dev/null
-C sets the working root so Codex can read project files. cd also ensures CWD matches. The review prompt should include the explicit path $PROJECT_DIR so the model knows where to look.
Gemini pattern:
cd "$PROJECT_DIR" && gemini --include-directories "$PROJECT_DIR" -p "${PROMPT}
$(cat "$INPUT_FILE")" -o text > "$OUTPUT_FILE" 2>/dev/null
--include-directories adds the project to Gemini's workspace. cd sets CWD for file resolution. The review prompt should include the explicit path $PROJECT_DIR.
GitHub Copilot pattern:
cd "$PROJECT_DIR" && gh copilot -- -p "${PROMPT}
$(cat "$PREAMBLE_FILE")
$(cat "$INPUT_FILE")" --model "claude-sonnet-4.5" --add-dir "$PROJECT_DIR" --allow-all-tools --no-custom-instructions -s > "$OUTPUT_FILE" 2>/dev/null
cd "$PROJECT_DIR" sets CWD so the model's file tools resolve relative paths correctly. --add-dir whitelists the directory for file access. --allow-all-tools and -s (silent) are required for non-interactive mode. The review prompt should include the explicit path $PROJECT_DIR so the model knows where to look.
This round always runs. It is the minimum viable review.
Always ask the user which file to review using AskUserQuestion. Do not auto-detect or assume. If the user provides a file path as an argument, confirm it and proceed.
Validate the file with the Read tool. If it does not exist or is empty, report the error and stop.
Detect the available reviewer CLIs:
which codex 2>/dev/null; which gemini 2>/dev/null; gh copilot --version 2>/dev/null
Report which reviewers are available. If the user specified reviewers in their request, use only those. Otherwise use all available.
If zero reviewers are available, report the error and stop.
Create the temp working directory:
mkdir -p "<doc_dir>/.review-board-<docname>"
External reviewers receive only the document text -- they have no knowledge of the project, repo, codebase, or team constraints. Claude DOES have this context from the current session. Use it.
Generate a context preamble and write it to <workdir>/context-preamble.md. This file gets prepended to the document content before sending to each reviewer.
Build the preamble by gathering what you know from the conversation and the filesystem. Run ls on the project root to discover structure. Check for README, go.mod, package.json, etc. to identify tech stack.
Use this template:
## Context for Reviewers
> This context helps you understand the technical environment. Focus your review
> on engineering quality, not business strategy.
**System/Tool:** <what is this? e.g., "Kubernetes operator for DICOM VNA", "CLI for prompt engineering", "REST API for image processing">
**Problem Being Solved:** <1-2 sentences about the technical problem>
**Technology/Domain:** <e.g., "K8s operator (Go, controller-runtime)", "Python CLI", "REST API (Node.js, Express)", "ML pipeline (Python, PyTorch)">
**Environment:** <where it runs: AWS EKS, local CLI, edge devices, Docker, etc.>
**Key Integrations:** <external systems this interacts with: APIs, databases, services>
**Known Constraints:**
- <technical or business constraints that are non-negotiable>
- <e.g., "must use PostgreSQL", "cannot require root", "must handle 10K req/sec">
- <e.g., "must comply with HIPAA", "must work offline", "GitHub API rate limits">
**Current State:** <what exists in production vs. what's proposed>
**Recent Context:** <optional: recent incidents, similar tools, previous attempts>
---
Rules:
Send the context preamble + doc to all selected reviewers in parallel using this review prompt (replace $PROJECT_DIR with the actual absolute path):
You are a principal engineer reviewing this technical document.
The document is preceded by a "Context for Reviewers" section that describes the technical
environment and constraints. Use this context to calibrate your review -- do not flag things
that are explicitly listed as known constraints or out of scope.
IMPORTANT: The project source code is in your current working directory ($PROJECT_DIR).
You MUST browse the codebase to verify claims in the document. Start by listing the
top-level directory contents, then read key source files referenced in the document.
Check if referenced code, configs, or infrastructure actually exist and match what the
document describes. Cross-reference the doc against the implementation.
Focus on engineering quality and production readiness:
1) **Technical Correctness**
- Will this design/approach actually work for the stated problem?
- Are there fundamental flaws or wrong assumptions?
- Are dependencies, integrations, and prerequisites correctly identified?
2) **Failure Modes & Recovery**
- What breaks under failure scenarios? (Network, disk, API, service down)
- How does the system recover? Retry logic? Degraded mode?
- What's the blast radius of failures?
- Rollback/rollforward strategy?
3) **Implementation Gaps**
- What's missing to actually build this? (Configs, schemas, APIs, libraries)
- Are error handling paths specified?
- Edge cases or boundary conditions not addressed?
4) **Security & Safety**
- Authentication, authorization, input validation
- Secrets/credentials handling
- Privilege levels, access control
- Data exposure risks
5) **Observability & Operations**
- Monitoring, logging, metrics, alerting
- How do you debug "why didn't X work?"
- Health checks, status reporting
- Operational runbooks or procedures
6) **Production Readiness**
- Testing strategy (unit, integration, end-to-end)
- Deployment/upgrade approach
- Resource requirements (if relevant to design)
- Performance characteristics (if relevant)
7) **Design Quality**
- Is this over-engineered? Under-engineered? Right-sized?
- Does it follow established patterns for the domain?
- Are there simpler alternatives that achieve the same goals?
Be direct about flaws. Reference specific sections. Suggest concrete improvements.
If the document mentions strategic concerns (cost, team structure, training), note them
briefly but don't deep-dive unless they impact technical feasibility.
When building the input for each reviewer, concatenate: context preamble + blank line + document content. Use printf '%s\n\n%s\n\n%s' with three parts: prompt, preamble content, document content.
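The three-part concatenation could be wrapped in a helper like the one below. This is a sketch: the function name `build_input` and its argument order are assumptions.

```shell
# Hypothetical helper: assemble prompt + preamble + document for one reviewer.
# Arguments: prompt text, preamble file, document file.
build_input() {
  prompt=$1
  preamble_file=$2
  doc_file=$3
  # printf passes %s arguments through unchanged, whereas some echo
  # implementations interpret backslashes. Command substitution strips
  # trailing newlines, so each part is separated by exactly one blank line.
  printf '%s\n\n%s\n\n%s' "$prompt" "$(cat "$preamble_file")" "$(cat "$doc_file")"
}
```

The output can be piped to a reviewer that reads stdin (as in the Codex pattern) or captured with command substitution for a reviewer that takes the prompt as an argument.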
Run all reviewers as background Bash tasks concurrently. Save each to temp directory: <workdir>/review-<reviewer>.md (e.g., review-chatgpt.md, review-gemini.md, review-copilot.md).
If a reviewer fails, log the error and proceed with the others.
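The skill runs reviewers as background Bash tool tasks. For readers who want the same behaviour in a plain script, the sketch below swaps in shell job control (`&` and `wait`) instead. It assumes a hypothetical function `review_<name>` exists for each reviewer and writes the review to stdout, for example a wrapper around one of the CLI patterns above.

```shell
# Hypothetical driver: run one reviewer function per name in parallel,
# log failures, and print the names of the reviewers that succeeded.
run_reviewers() {
  workdir=$1
  shift
  jobs_list=""
  for name in "$@"; do
    # Each reviewer writes to its own file in the temp working directory.
    "review_$name" > "$workdir/review-$name.md" 2>/dev/null &
    jobs_list="$jobs_list $name:$!"
  done
  succeeded=""
  for job in $jobs_list; do
    name=${job%%:*}
    pid=${job##*:}
    # wait returns the exit status of that specific background job.
    if wait "$pid"; then
      succeeded="$succeeded $name"
    else
      echo "reviewer $name failed; continuing with the others" >&2
    fi
  done
  printf '%s\n' "${succeeded# }"
}
```

Calling `run_reviewers "$workdir" chatgpt gemini copilot` would leave one review file per reviewer and report which ones can be used for synthesis.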
Read all review files and the original doc. Classify every finding into one engineering-centric category. Note which reviewers flagged each finding and how many agree.
With a single reviewer, all findings are single-source. With 2+ reviewers, note agreement counts -- findings flagged by multiple reviewers have higher confidence. With 3 reviewers, "all three agree" is stronger than "two of three."
Write the consolidated review to the final location (next to source doc): <docname>-review-consolidated.md
Use this structure (adapt reviewer names to actual reviewers used):
# Consolidated Review: <docname>
Reviewed by: <list of reviewer display names>
Synthesized by: Claude
## Summary
<2-3 sentences: what type of document, total findings, key themes>
## Critical Issues (Must Fix Before Implementation)
Design flaws, missing components, or unhandled failure modes that will cause production incidents.
| # | Issue | Severity | Flagged By | Affected Sections | Technical Risk |
|---|-------|----------|-----------|-------------------|----------------|
### Details
#### 1. <Issue title>
**<Reviewer A> said:** <quote or paraphrase>
**<Reviewer B> said:** <quote or paraphrase> (if multiple flagged)
**Affected sections:** <section names>
**Technical risk:** <what breaks if not fixed>
**Recommended action:** <specific fix>
## Implementation Gaps
Missing pieces needed to actually build/deploy this: configs, schemas, error handling, integrations.
| # | Gap | Flagged By | Affected Sections | What's Needed |
|---|-----|-----------|-------------------|---------------|
### Details
(same format -- what's missing and what's needed to fill it)
## Risk Factors (Edge Cases & Failure Scenarios)
Things that work in happy path but could break under load, network issues, or edge conditions.
| # | Risk | Flagged By | Failure Scenario | Mitigation |
|---|------|-----------|------------------|-----------|
## Operational Concerns
Gaps in monitoring, debugging, incident response, or operational procedures.
| # | Concern | Flagged By | Impact on Operations |
|---|---------|-----------|---------------------|
## Security Issues
Authentication, authorization, secrets, privilege escalation, data exposure risks.
| # | Issue | Flagged By | Security Impact |
|---|-------|-----------|-----------------|
## Design Improvements
Better patterns, simpler alternatives, or refactoring suggestions.
| # | Suggestion | Flagged By | Benefit |
|---|-----------|-----------|---------|
## Contradictions (Models Disagree)
Technical disagreements requiring engineering judgment.
| # | Topic | Positions | Recommendation |
|---|-------|-----------|----------------|
### Details
(For each contradiction, present technical arguments from each reviewer)
## False Positives (Already Addressed)
| # | Flagged Issue | Reviewer | Already Covered In |
|---|---------------|----------|--------------------|
## Action Items (Priority Order)
- [ ] <action> -- [Critical/Gap/Risk/Operational/Security/Design] -- Section: <x>
Report the summary to the user in chat. Then ask:
Round 1 complete. You have three options:
1) Stop here -- work from the action items list
2) Round 2: Rebuttal -- I'll respond to each finding (accept/reject/partial), send rebuttals back to the reviewers, and see if they hold their positions
3) Skip to updating the doc -- I'll apply the accepted changes directly
Use AskUserQuestion with these three options.
If the user stops here, proceed to Cleanup.
Only runs if the user chooses to continue. The goal is to challenge the reviewers' findings and let them defend or withdraw.
Read the consolidated review and original doc. For EACH finding (agreed, unique, contradiction), write one of:
Write to temp directory: <workdir>/rebuttal.md
Use this structure:
# Rebuttal: <docname>
## Accepted (will fix)
| # | Original Finding | Response | Planned Change |
|---|-----------------|----------|----------------|
## Rejected (disagree)
| # | Original Finding | Rejection Rationale |
|---|-----------------|---------------------|
### Details
#### <Finding title>
**Original claim:** <what was said>
**Why it's wrong:** <specific reference to doc section, technical argument>
## Partially Accepted
| # | Original Finding | What We Accept | What We Reject |
|---|-----------------|----------------|----------------|
## Deferred (valid, not now)
| # | Original Finding | Why Deferred | When to Address |
|---|-----------------|--------------|-----------------|
Send the rebuttal + original doc back to all reviewers in parallel with this prompt:
You previously reviewed a technical document and provided feedback.
The document author has responded to your findings with a rebuttal.
For each item in the rebuttal:
- If ACCEPTED: acknowledge, no further action needed
- If REJECTED: do you still hold your position? If yes, explain why the rebuttal is insufficient. If the rebuttal convinced you, withdraw your finding.
- If PARTIALLY ACCEPTED: is the partial acceptance sufficient? What's still missing?
- If DEFERRED: is deferral reasonable or is this a risk that must be addressed now?
Be direct. If you were wrong, say so. If you still disagree, strengthen your argument.
Append the rebuttal file content after the prompt. Save responses to temp directory: <workdir>/rebuttal-<reviewer>.md for each reviewer.
Summarize the second-round responses in chat. Highlight:
Then ask:
Round 2 complete. Options:
1) Stop here -- work from accepted items + contested items for your judgment
2) Round 3: Consensus -- I'll classify everything as Consensus/Resolved/Deadlocked and present deadlocked items for your decision
If the user stops here, proceed to Cleanup.
Only runs if the user chooses to continue. The goal is to reach final decisions.
Read all files produced so far. Two tasks: classify findings and detect holdouts.
Before classifying findings, compute each reviewer's flexibility score from their rebuttal responses:
For each reviewer, count:
Compute: flexibility_rate = withdrew / (withdrew + held)
Flag a reviewer as a potential holdout if:
- flexibility_rate == 0 (withdrew nothing) AND they had 3+ findings challenged
This does NOT auto-dismiss their findings. It adds context for the user's decision-making. Include the scorecard in the consensus document.
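As a worked example of the scorecard arithmetic, the sketch below computes the flexibility rate as a whole-number percentage and applies the holdout rule. The function name `flexibility` is hypothetical. Reporting 0% when a reviewer neither withdrew nor held anything is an assumption made here to avoid dividing by zero; the source does not cover that case.

```shell
# Hypothetical helper: print "<rate>% <flag>" for one reviewer.
# Arguments: withdrew, held, challenged (non-negative integers).
# Example: flexibility 1 3 4 prints "25% --"
flexibility() {
  withdrew=$1
  held=$2
  challenged=$3
  total=$((withdrew + held))
  # Assumption: report 0% when there is nothing to divide by.
  if [ "$total" -eq 0 ]; then
    rate=0
  else
    # Integer arithmetic, so the percentage is truncated.
    rate=$((withdrew * 100 / total))
  fi
  flag="--"
  # Holdout: withdrew nothing AND had 3 or more findings challenged.
  if [ "$withdrew" -eq 0 ] && [ "$challenged" -ge 3 ]; then
    flag="Holdout"
  fi
  printf '%s%% %s\n' "$rate" "$flag"
}
```

A reviewer who withdrew nothing across four challenged findings (`flexibility 0 4 4`) is flagged, while one who withdrew nothing across only two (`flexibility 0 2 2`) is not.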
Classify every original finding into:
For deadlocked items:
Write to final location (next to source doc): <docname>-consensus.md
Use this structure:
# Consensus: <docname>
## Reviewer Flexibility Scorecard
| Reviewer | Findings Challenged | Withdrew | Held | Escalated | Flexibility Rate | Flag |
|----------|-------------------|----------|------|-----------|-----------------|------|
| <name> | N | N | N | N | N% | --/Holdout |
(If any reviewer is flagged: "Holdout flag does not invalidate findings -- it provides
context for weighting contested positions.")
## Consensus Items (agreed by all)
| # | Item | Resolution | Owner |
|---|------|-----------|-------|
## Resolved Items (rebuttal accepted)
| # | Item | Original Concern | Why Resolved |
|---|------|-----------------|--------------|
## Deadlocked Items (needs human decision)
### DL-X: <Technical Issue Title>
**The technical problem:** <what will break or what's missing>
**Model positions:**
- **<Reviewer A>:** <technical argument with specifics>
- **<Reviewer B>:** <technical argument with specifics>
- **<Reviewer C>:** <technical argument with specifics>
**Support:** X reviewers support position A, Y reviewers support position B
**Held by:** <which reviewer(s)> <holdout flag if applicable>
**Blast radius if wrong:** <what breaks in production -- concrete failure scenario>
**Technical options:**
A) <approach> -- Complexity: [Low/Med/High], Risk: [Low/Med/High], Tradeoff: <what you give up>
B) <approach> -- Complexity: [Low/Med/High], Risk: [Low/Med/High], Tradeoff: <what you give up>
C) <approach> -- Complexity: [Low/Med/High], Risk: [Low/Med/High], Tradeoff: <what you give up>
**Recommendation:** <which option and why -- based on engineering judgment, not ROI>
## Final Action Items
- [ ] <item> -- [Consensus/Resolved/Deadlocked-decided] -- Section: <x>
Present each deadlocked item to the user with the options. Use AskUserQuestion for each one (or batch them if there are few). The user makes the final call.
After decisions are made, update the consensus doc with the decisions.
Then ask:
Round 3 complete. Options:
1) Stop here -- work from the final action items
2) Round 4: Decision Records -- I'll generate formal decision records for every significant decision
If the user stops here, proceed to Cleanup.
Only runs if the user chooses to continue. Produces a formal record of what was decided and why.
For every significant decision (accepted changes, rejected concerns, deadlocked items that were decided), write a decision record.
Write to final location (next to source doc): <docname>-decisions.md
Use this structure:
# Decision Records: <docname>
## DR-001: <Decision Title>
**Status:** Accepted
**Context:** <why this decision was needed>
**Options Considered:**
1. <option A> -- <pros/cons>
2. <option B> -- <pros/cons>
**Decision:** <what was decided>
**Consequences:** <what changes, what risks are accepted>
**Reviewed by:** <list of reviewers>, Claude, <user>
(repeat for each significant decision)
Present the complete set of output files to the user. Summarize the journey:
Proceed to Cleanup.
If models raise cost, training, capacity planning, or other strategic concerns:
Acknowledge briefly in consolidated review: "Note: Models flagged potential cost/training concerns. See individual reviews for details."
Do NOT elevate to deadlock unless the concern makes the solution technically infeasible:
Do NOT include in pre-mortem analysis or final action items.
The review board focuses on engineering quality. Strategic concerns belong in separate business case analysis.
At the end of the workflow (regardless of which round the user stopped at):
- Ask the user: "Delete the temp files in <workdir>/? The final deliverables next to your doc are kept either way."
- If the user chooses to delete, run: rm -rf "<workdir>"
Use AskUserQuestion with two options: "Delete temp files (Recommended)" and "Keep temp files".
| Tool | Purpose |
|------|---------|
| AskUserQuestion | Ask for file, present round options, resolve deadlocks, cleanup |
| Read | Validate source doc, read all review/rebuttal files |
| Bash | Check CLI prerequisites, create/delete temp directory |
| Bash (background) | Run reviewer CLIs in parallel |
| TaskOutput | Wait for background task completion |
| Write | Save all output files |

- Suppress stderr with 2>/dev/null on all CLIs.
- If gh copilot explain fails, try gh copilot with alternative flags or skip with a warning.