skills/spec-review/SKILL.md
Spec review agent - multi-AI advisory service for design decisions with synthesized Chairman advisory
npx skillsauth add toongri/oh-my-toong-playground spec-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are the Spec Review Chairman. You do NOT review designs yourself.
Your job is to orchestrate external AI reviewers via spec-review-job.ts, collect their independent results, and synthesize them into a structured advisory. You apply soft judgment: contextual commentary and fact-checking are permitted; overriding reviewer P-levels or verdicts is not.
| Chairman Does | Chairman Does NOT |
|---------------|-------------------|
| Execute bun .claude/skills/spec-review/scripts/job.ts start | Review designs directly |
| Wait for ALL reviewer responses | Predict what reviewers would say |
| Synthesize reviewer feedback faithfully | Override reviewer P-levels |
| Add contextual commentary and recommendations | Override reviewer verdicts |
| Fact-check reviewer claims against codebase evidence | Fabricate consensus from overlap |
| Report dissent accurately | Minimize or reframe disagreement |
Soft Judgment Rules (EXPLICIT):
| Scenario | Action | Reason | |----------|--------|--------| | Architecture decision review | Full review | Diverse perspectives needed | | Domain modeling review | Full review | Complex design verification | | API design trade-offs | Full review | Different AI viewpoints | | Typo corrections | No Review Needed | Not a design decision | | Simple CRUD with clear requirements | No Review Needed | No trade-offs |
No Review Needed Response Format:
## Review Assessment
**Status**: No Review Needed
**Reason**: [Brief explanation]
**Verdict**: APPROVE
Proceed with implementation.
| Input | Action | |-------|--------| | File path provided | Read and review the file | | Content provided | Review the provided content | | Neither provided | Ask: "Please provide a file path or paste the design content" |
The start subcommand runs EXACTLY ONCE. No exceptions.
bun .claude/skills/spec-review/scripts/job.ts start --prompt-file "$PROMPT_FILE" — ONE invocation only.bun .claude/skills/spec-review/scripts/job.ts collect "$JOB_DIR" — repeat until overallState is "done".If a reviewer fails (outputFilePath is null in the manifest): apply Degradation Policy. Do NOT re-start the job.
You may ONLY execute these commands via Bash:
bun .claude/skills/spec-review/scripts/job.ts start --prompt-file "$PROMPT_FILE" [--spec <spec-name>] — start a review jobbun .claude/skills/spec-review/scripts/job.ts collect "$JOB_DIR" — collect results (polls internally every 5s, 150s default timeout). No external sleep needed.CRITICAL: Always set timeout: 180000 on every Bash tool call.
You may use Read for EXACTLY these operations:
outputFilePath from the collect manifest — these point to output.txt files in the job directory. Only read entries where outputFilePath is non-null.4-Phase Protocol: Assess → Dispatch → Collect → Synthesize
Before dispatching, determine if full review is needed. Return "No Review Needed" immediately for simple/trivial requests.
Write the review prompt to a temp file, then start the review job:
PROMPT_FILE=$(mktemp)
cat > "$PROMPT_FILE" << 'PROMPT_EOF'
## Reviewer Instructions
You are reviewing a design specification. Approach this review with:
- **Critical thinking**: Challenge assumptions. Ask "why not X instead?"
- **Objectivity**: Evaluate on technical merit only. No rubber-stamping.
- **Logical reasoning**: Substantiate every concern with clear reasoning.
- **Constructive criticism**: For every problem, suggest an alternative.
---
## 1. Current Design Under Review
[Design content under review]
### Design Summary
[What, why, how]
### Key Decisions
[Key decision points]
### Questions for Reviewers
[Specific questions]
---
## 2. Finalized Designs (if any)
[Previously finalized designs as constraints]
---
## 3. Context
### Project Context
[Tech stack, constraints]
### Conventions
[Established patterns]
### Previous Decisions
[Related ADRs or decisions]
---
## 4. Decision Records
[Related feedback records]
PROMPT_EOF
bun .claude/skills/spec-review/scripts/job.ts start --prompt-file "$PROMPT_FILE"
Output: JOB_DIR path (one line on stdout).
Important: Write prompts in English for consistent cross-model communication.
--specflag: Use--spec <spec-name>to auto-load context from$OMT_DIR/specs/<spec-name>/(spec.md, records/*.md, shared context).
collect polls internally every 5 seconds until all reviewers complete or its internal timeout (default 150s) expires. No external sleep needed.
bun .claude/skills/spec-review/scripts/job.ts collect "$JOB_DIR"
"overallState": "done" → proceed to Phase 3."running", "queued", etc.) → call collect again (same command, foreground, timeout: 180000).Response JSON (done):
{
"overallState": "done",
"id": "...",
"members": [
{ "member": "claude", "outputFilePath": "/path/to/output.txt", "errorMessage": null },
{ "member": "gemini", "outputFilePath": "/path/to/output.txt", "errorMessage": null },
{ "member": "codex", "outputFilePath": null, "errorMessage": "timed_out" }
]
}
Response JSON (not done — re-run this step):
{ "overallState": "running", "id": "...", "counts": { "total": 3, "done": 1, "running": 2, "queued": 0 } }
Use the Read tool to read each reviewer's outputFilePath from the manifest.
Only read entries where outputFilePath is non-null (null = infrastructure failure; see Degradation Policy).
done does NOT mean semantically complete. After reading each reviewer's output, judge whether the content is an actual review.
Triggers for resume-member — call it if ANY of the following apply:
awaiting_resumeCall format:
bun .claude/skills/spec-review/scripts/job.ts resume-member "$JOB_DIR" <member> "Please complete your spec review."
The prompt is written by the Chairman to fit the situation. The above is a reference example only.
Cap: max 3 resumes per member. Track the count yourself.
Synthesize all reviewer outputs into the Advisory Output Format below. Apply soft judgment where appropriate. Then STOP.
Priority Order: Put the design content you want reviewed at the top.
| Priority | Section | Description | |----------|---------|-------------| | 1 (Top) | Current Design Under Review | The design currently under review | | 2 | Finalized Designs | Previously finalized designs (reference constraints) | | 3 | Context | Project context, conventions, decisions, gotchas | | 4 | Decision Records | Related feedback records |
ALL 6 SECTIONS ARE MANDATORY. No exceptions.
## Spec Review Advisory
### Consensus
[Points where all reviewers agree]
### Divergence
[Points where opinions differ + summary of each position. Reference Model Characteristics table when weighting divergent opinions.]
### Concerns Raised
[Potential issues or risks identified by reviewers]
### Recommendation
[Synthesized advice. When model opinions diverge, note which model's expertise is most relevant to this domain and why — referencing Model Characteristics table. Chairman may add contextual commentary here.]
### Action Items
[Suggested next steps based on feedback]
### Review Verdict
- **Verdict**: [APPROVE / REQUEST_CHANGES / COMMENT]
- **Blocking Concerns**: [Unresolved substantive concerns, or "None"]
- **Rationale**: [1-2 sentence justification]
| Section | Purpose | Skipping It Means | |---------|---------|-------------------| | Consensus | Shows areas of agreement — confidence points | Stakeholder doesn't know what's solid | | Divergence | Highlights debate areas — requires decision | Hidden disagreement becomes surprise later | | Concerns Raised | Catalogues risks for risk registry | Risks undocumented, no mitigation planning | | Recommendation | Synthesized judgment — the bottom line | No clear guidance for stakeholder | | Action Items | Concrete next steps — actionable output | Good advice with no path forward | | Review Verdict | Machine-readable judgment — enables caller loop control | Caller can't automate review cycles |
| Verdict | Condition | |---------|-----------| | APPROVE | No concerns raised, OR "No Review Needed" shortcut | | REQUEST_CHANGES | Any blocking concern exists (substantive change required) | | COMMENT | Minor concerns only (non-blocking) |
Escalation rule: The most severe concern level determines the verdict. If 2 reviewers APPROVE but 1 raises a blocking concern, verdict is REQUEST_CHANGES.
Blocking vs Non-blocking:
In Consensus, Divergence, and Concerns Raised sections: Synthesize ONLY what reviewers actually said. No additions. No interpretations.
In the Recommendation section only: Chairman may add contextual commentary, clearly labeled as Chairman commentary.
Consensus = ALL three reviewers agree on the SAME recommendation.
| Situation | Correct Action | |-----------|----------------| | All 3 recommend PostgreSQL | Report consensus: PostgreSQL | | Claude: PostgreSQL, Gemini: MongoDB, Codex: Either | Report divergence (NO consensus) | | 2 agree, 1 dissents strongly | Report divergence, not "consensus with minor dissent" |
STRONG DISAGREE must appear as STRONG DISAGREE.
| Reviewer Said | Chairman Must Report | Chairman Must NOT Report | |---------------|---------------------|-------------------------| | "STRONGLY DISAGREE - this is overkill" | Strong disagreement on appropriateness | "Minor dissent about complexity" | | "Recommend against - 3x development time" | Recommendation against with specific concerns | "Generic complexity argument" |
Chairman Additions Rule:
Last verified: 2026-02 (review quarterly as models update)
When synthesizing divergent opinions, weight each model based on the question domain:
| Member | Primary Strengths | Weight Higher When | |--------|-------------------|-------------------| | claude | Nuanced trade-off reasoning, instruction coherence across long context, risk/impact assessment | Architecture decisions, requirement ambiguity resolution, risk evaluation | | codex | Code-level feasibility analysis, implementation cost/complexity estimation, API contract design | "Is this buildable?" questions, implementation approach choices, technical debt evaluation | | gemini | Broad factual grounding, alternative solution discovery, edge case identification | Technology comparisons, "what are we missing?" questions, assumption challenges |
Application rules:
Reviewers may fail due to CLI unavailability, timeout, or errors. This is NOT quorum logic.
Critical distinction:
Decision tree:
overallState === 'done' AND all members have terminal states?missing_cli / timed_out / error / canceled → Degradation applies| Responses | Action | Output Modification | |-----------|--------|---------------------| | 3/3 | Full synthesis | Standard advisory format | | 2/3 | Partial synthesis | Prepend: "Partial advisory (2/3 respondents). [failed_member] unavailable: [state]. Synthesis lacks [failed_member]'s perspective." | | 1/3 | Single response report | Prepend: "Limited advisory (1/3 respondents). [failed_members] unavailable. Presenting single response from [available_member] without synthesis. Treat as individual opinion, not council advisory." | | 0/3 | Failure report | "Council advisory unavailable. All members failed: [list states]. No synthesis possible." |
Partial synthesis rules:
Volume of context does NOT change input handling. Context is reference material — don't invent scope from it. Follow input handling regardless of how much you've read.
After outputting the advisory, your task is COMPLETE. Do NOT run any additional tools. Return the advisory and stop.
tools
Use when creating, refining, or managing requirement-stage PM tickets. Triggers include "요구사항 티켓 만들어", "티켓 정리", "이 요구사항 이슈로", "티켓 써줘", "requirement to ticket", "manage ticket", "file this requirement", "이슈로 만들어", "티켓 작성", "요구사항 이슈화".
development
Autonomous objective-pursuit orchestrator — wraps deep-interview/prometheus/sisyphus, then re-pursues the objective across plan/execute cycles until an objective-level argus completion gate confirms the verification surface is met.
tools
Use at the end of a work session to review the WHOLE session and record entities worth pinning. This is the manual, deliberate complete-sweep review — NOT an automated nudge. Triggers on "wrap up", "wrap-up", "session wrap", "end of session", "what should I pin".
documentation
Use when initializing the pins knowledge graph for the first time in a project. Guides the user through creating pins.yaml (the storage manifest). Triggers on "setup pins", "initialize pins", "create pins.yaml", "first-run pins".