Review Task

Audits and enriches beads tasks created by /create-task (or any source) so that when a developer picks up the work, they can focus purely on implementation. Catches stale references, missing edge cases, scope bloat, design flaws, and ambiguity — while changes are still cheap (editing a description vs. reworking code).

Core principle: A task that survives this review should require zero research from the implementing developer. They read, they code, they ship.

Key capability: Domain-specific skills (testing, unsafe, distributed systems, SIMD, performance, security, etc.) are dispatched automatically when the task touches their domain. This adds specialized depth that generalist reviewers cannot provide.

Invocation

/review-task <task-id>
/review-task <task-id> --deep                 # Dispatch /deep-research for gaps
/review-task <task-id> --deeper               # Dispatch /deeper-research for gaps
/review-task <task-id> --split-only           # Skip verification, assess scope and split
/review-task <task-id> --dry-run              # Print findings without modifying the task
/review-task <task-id> --skip=feasibility     # Drop a specialist
/review-task <task-id> --focus=distributed    # Adds domain context to all agents

Flags:

| Flag | Effect |
|------|--------|
| --deep | Dispatch /deep-research for any identified knowledge gaps |
| --deeper | Dispatch /deeper-research for critical knowledge gaps |
| --split-only | Skip verification phases, assess scope and split if needed |
| --dry-run | Print all findings and proposed edits without modifying the task |
| --skip=<agent> | Drop a specialist (references, edge-cases, feasibility, scope). Min 2 must run. |
| --focus=<domain> | Adds domain context to all agent prompts (e.g., distributed, concurrency, unsafe) |
| --no-research | Never dispatch research skills, even if gaps are found (just flag them) |
| --no-domain | Skip domain skill enrichment (Phase 1.5). Only run generalist verification. |
| --domain=<skill> | Force-dispatch a specific domain skill regardless of auto-detection (e.g., --domain=test-strategy) |

Phase 0 — Load & Understand

Fetch the task:
```
bd show <task-id>
```
Parse the task description and extract:
- Title, type, priority, labels
- All file paths and line numbers referenced
- All struct/trait/function/type names mentioned
- All code snippets embedded in the description
- Acceptance criteria
- Related tasks and dependencies
- Any [NEEDS ENRICHMENT] markers from /create-task
Quick pre-flight checks (fail fast):
- If the task has no description or a stub description (< 5 lines), tell the user and recommend running /create-task first. Stop.
- If the task is already closed, warn and ask before proceeding.
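
The extraction and pre-flight steps above can be sketched as a small helper. This is a minimal illustration in Python, not part of the skill itself: the function name `preflight`, the path regex, and the choice to count only non-blank lines toward the 5-line stub threshold are all assumptions, and `ring.rs` in the usage note is a made-up file name.

```python
import re

# Hypothetical pattern: a path ending in a known extension, optionally
# followed by ":<line>" or ":<start>-<end>". Real tasks may need more cases.
PATH_RE = re.compile(r"\b([\w./-]+\.(?:rs|toml|md))(?::(\d+)(?:-(\d+))?)?")
MARKER = "[NEEDS ENRICHMENT]"


def preflight(description: str, status: str) -> dict:
    """Extract file references and apply the fail-fast checks from Phase 0."""
    # Assumption: blank lines do not count toward the "< 5 lines" stub check.
    non_blank = [ln for ln in description.splitlines() if ln.strip()]
    refs = []
    for m in PATH_RE.finditer(description):
        path, start, end = m.group(1), m.group(2), m.group(3)
        refs.append((path, int(start) if start else None, int(end) if end else None))
    return {
        "is_stub": len(non_blank) < 5,          # recommend /create-task and stop
        "is_closed": status == "closed",        # warn and ask before proceeding
        "file_refs": refs,                      # feeds the Referenced Files table
        "enrichment_markers": description.count(MARKER),
    }
```

For a description that mentions `gossip-stdx/src/ring.rs:42-57`, `file_refs` would contain `("gossip-stdx/src/ring.rs", 42, 57)`. Whether each path actually exists still has to be checked against the codebase with Glob/ls.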

Build a Review Brief:

## Review Brief

### Task
- **ID**: {task-id}
- **Title**: {title}
- **Type**: {type} | **Priority**: P{priority}
- **Labels**: {labels}

### Referenced Files
| File | Lines | Exists? |
|------|-------|---------|
{for each file path in the description, check with Glob/ls}

### Referenced Identifiers
| Name | Kind | Found? | Location |
|------|------|--------|----------|
{for each type/function/trait name, Grep the codebase}

### NEEDS ENRICHMENT Markers
{list any sections marked as needing enrichment}

### Scope Indicators
- Files referenced: {count}
- Modules crossed: {count of distinct top-level crate directories}
- Has acceptance criteria: {yes/no}
- Has code snippets: {yes/no}
- Description length: {line count}

### Domain Signals
{detected domain signals — see Domain Skill Dispatch Table below}

Detect domain signals for Phase 1.5 enrichment. Scan the task description AND referenced files for the triggers listed in the Domain Skill Dispatch Table (below). Record which skills should be dispatched.
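If --split-only is set, skip the other verification agents: run only the Scope Assessment agent (Agent 4 in Phase 1), then proceed to the SPLIT handling in Phase 3.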

Phase 1 — Verification (2-4 Parallel Agents)

Launch all selected agents in a single message using the Task tool with subagent_type=general-purpose. Each agent gets the full task description + Review Brief but a different verification lens.

Common Preamble (included in every agent's prompt)

You are a specialist task reviewer. You have ONE job: review the task below
through the lens of {SPECIALTY}. Ignore issues outside your specialty — other
specialists cover those.

## Task Under Review

{FULL_TASK_DESCRIPTION}

## Review Brief

{REVIEW_BRIEF}

## Rules

- Only report findings within your specialty. Do NOT stray.
- Only report findings that REQUIRE action. No nits, no "nice to have."
- Be concrete: cite the specific task section, quoted text, file path, or code
  snippet for every finding.
- Explore the codebase (Glob, Grep, Read) to ground findings in reality. The
  most valuable findings come from gaps between what the task says and what the
  codebase actually contains.
- For each finding, state the PROBLEM and the RECOMMENDED FIX (specific text
  to add, remove, or change in the task description).
- Rate each finding:
  - severity: BLOCKER / MAJOR / MINOR
    - BLOCKER: task cannot be implemented as written (wrong file paths, incorrect
      API, missing critical context)
    - MAJOR: task can be implemented but will likely produce incorrect or
      incomplete results (missing edge case, stale pattern, design flaw)
    - MINOR: task is implementable but could be clearer or more complete
  - confidence (0-100): How sure are you this is a real issue?

## Output Format

Return a markdown document starting with:
`# {SPECIALTY} Review`

Then a findings list. For each finding:

### Finding N: {title}

- **Task section**: {which part of the task description}
- **Problem**: {what's wrong or missing}
- **Evidence**: {codebase evidence — file paths, actual code, grep results}
- **Recommended fix**: {specific edit to the task description}
- **Severity**: {BLOCKER / MAJOR / MINOR}
- **Confidence**: {N}%

End with: "Total findings: N" (0 is valid — do not invent issues).

Agent 1 — Reference Accuracy

Your specialty: REFERENCE ACCURACY

Focus exclusively on:
- Do all file paths in the task exist? Have any moved or been renamed?
- Do line numbers match? Read cited files and verify the code at those lines
  matches what the task quotes.
- Do referenced struct names, trait names, function signatures, enum variants,
  and type aliases actually exist in the codebase?
- Are code snippets in the task accurate copies of the current codebase, or
  have they drifted?
- Does the task reference design docs? If so, does the design doc content
  match what the task claims?
- Are dependency/crate references current (Cargo.toml)?
- Do referenced beads task IDs in the "Related Work" section exist?
  Run `bd show <id>` for each.

For every file path: Read the file. For every code snippet: compare character
by character. For every type name: Grep the codebase. Be exhaustive.

Do NOT review edge cases, design quality, or scope — other specialists do that.

Agent 2 — Edge Cases & Completeness

Your specialty: EDGE CASES & COMPLETENESS

Focus exclusively on:
- What inputs, states, or conditions does the task not address?
  - Empty/zero/nil inputs
  - Boundary values (max capacity, zero-length, single element)
  - Concurrent access patterns if the code is shared across threads
  - Error paths and failure modes not mentioned
  - Rollback/cleanup on partial failure
- Does the "Desired State" cover ALL cases, or only the happy path?
- Are the acceptance criteria specific enough to verify? Could an
  implementer satisfy the criteria while missing the actual intent?
- Does the task account for existing callers/consumers of modified APIs?
  (Grep for call sites the task doesn't mention)
- Are there related invariants documented in design docs or code comments
  that the task should preserve but doesn't mention?
- Does the implementation guidance miss files that will obviously need
  changes? (e.g., mod.rs re-exports, test files, Cargo.toml features)

For each edge case found: describe the scenario, explain what would go wrong,
and propose a specific addition to the task description.

Do NOT review reference accuracy, design alternatives, or scope — other
specialists do that.

Agent 3 — Feasibility & Design

Your specialty: FEASIBILITY & DESIGN

Focus exclusively on:
- Is the proposed approach actually feasible given the codebase architecture?
  Read the relevant modules and assess whether the task's implementation
  guidance is compatible with existing patterns.
- Are there simpler alternatives the task overlooks? Check for existing
  utilities in gossip-stdx, sibling modules, or trait implementations that
  could be reused (per duplication prevention rules).
- Does the approach introduce unnecessary complexity? (New abstractions,
  generics, indirection that aren't justified)
- Are there performance concerns the task should flag?
  - Hot path allocations (check if touched code is in HOT tier)
  - Lock contention or oversized critical sections
  - Unbounded growth patterns
- Does the approach contradict any project conventions?
  - No-versioning policy (no V1/V2, no deprecated, no compatibility shims)
  - Allocation policy tiers (HOT/WARM/COLD)
  - Comment policy (no tracking IDs, no temporal narration)
- Are there design trade-offs the task should document but doesn't?
- Will the approach compose well with in-flight work? Check `bd list --status=in_progress`
  for potentially conflicting changes.

Do NOT review reference accuracy, edge case enumeration, or scope — other
specialists do that.

Agent 4 — Scope Assessment

Your specialty: SCOPE ASSESSMENT

Focus exclusively on:
- Is this task appropriately sized for a single implementation session?
  A well-scoped task modifies 1-4 files in 1-2 modules. Flag if it exceeds:
  - 6+ files modified
  - 3+ modules crossed
  - 3+ distinct behavioral changes
  - Both production code AND test infrastructure changes that could be separate
- Can this task be decomposed into independent sub-tasks that deliver
  incremental value? If so, propose specific splits with:
  - Sub-task title
  - Which files/sections of the original task belong to each
  - Dependency ordering between sub-tasks
  - Whether each sub-task is independently testable
- Does the task mix concerns? Common anti-patterns:
  - Refactor + new feature in one task
  - Bug fix + performance optimization in one task
  - API change + migration of all callers in one task
- Are there prerequisite tasks that should be extracted?
  (e.g., "add trait X" before "implement trait X for types A, B, C")
- Is the task UNDER-scoped? Does it describe a change that won't be useful
  without follow-up work that isn't tracked?

For each scope finding, provide a concrete split recommendation with titles,
file assignments, and dependency ordering.

Do NOT review reference accuracy, edge cases, or design quality — other
specialists do that.
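
The size thresholds in the Scope Assessment prompt above are mechanical enough to pre-compute before the agent runs. A minimal sketch in Python, assuming the three counts have already been gathered; the function and parameter names are hypothetical and not part of the skill.

```python
def scope_flags(files_modified: int, modules_crossed: int,
                behavioral_changes: int) -> list[str]:
    """Return the scope thresholds a task exceeds; an empty list means none."""
    flags = []
    if files_modified >= 6:
        flags.append(f"{files_modified} files modified (threshold: 6+)")
    if modules_crossed >= 3:
        flags.append(f"{modules_crossed} modules crossed (threshold: 3+)")
    if behavioral_changes >= 3:
        flags.append(f"{behavioral_changes} behavioral changes (threshold: 3+)")
    return flags
```

Note that a 5-file task sits between the "well-scoped" range (1-4 files) and the flag threshold (6+), so this sketch returns no flag for it and leaves the call to the reviewer. The "production code AND test infrastructure" and mixed-concern checks need judgment and are not modeled here.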

Phase 1.5 — Domain Enrichment

After Phase 1 specialists complete but before synthesis, dispatch domain-specific skills to provide specialized depth. Skip this phase if --no-domain is set.

Domain Skill Dispatch Table

The orchestrator detects signals from the task description, referenced files, and Phase 1 findings to determine which domain skills to dispatch. Maximum 3 domain skills per review to keep scope manageable.

| Signal | Skill | What It Adds to the Task |
|--------|-------|--------------------------|
| Task mentions testing strategy, or acceptance criteria lack test type guidance, or task touches code with no test coverage | /test-strategy | Specific test types (unit, rstest, proptest, fuzz, Kani, sim), patterns, and commands |
| Referenced files contain unsafe blocks, or task adds new unsafe code | /unsafe-review | Safety invariant audit, test coverage matrix (Miri/Kani/fuzz/proptest gaps) |
| Task adds unsafe AND needs safe public API wrapping | /safe-over-unsafe | API boundary design, module privacy soundness checklist |
| Referenced files are in gossip-coordination/, gossip-contracts/src/coordination/, or task touches leases/shards/epochs/fencing | /dist-sys-auditor | Citation verification, locked decision compliance, anti-pattern check |
| Task touches coordination AND mentions simulation or fault tolerance | /sim-review | DST compatibility check, sans-IO pattern enforcement |
| Task is labeled performance or touches HOT-tier code paths, or Phase 1 feasibility agent flagged performance concerns | /performance-analyzer | Allocation audit, cache analysis, hot-path verification |
| Task involves SIMD, vectorization, or std::arch intrinsics | /simd-optimize | ISA detection, pattern classification, implementation strategy |
| Task involves assembly-level optimization or codegen quality | /asm-forge | ASM audit scope, codegen red flags to include in task guidance |
| Task modifies public API surface (pub fn, pub struct, pub trait) | /interface-design-review | Misuse-resistance audit, enforcement hierarchy check |
| Task touches parsing, buffer handling, or security-sensitive operations | /security-reviewer | Memory safety audit, CWE mapping, high-risk file identification |

Detection Logic

File-based signals: For each referenced file, check which crate and module it belongs to. Map to domain skills:
- gossip-coordination/ or gossip-contracts/src/coordination/ → /dist-sys-auditor
- gossip-stdx/src/ data structures with unsafe → /unsafe-review
- scanner-engine/src/engine/ hot paths → /performance-analyzer
- Files containing std::arch:: or SIMD intrinsics → /simd-optimize

Content-based signals: Grep the task description for keywords:
- "unsafe", "SAFETY:", "raw pointer", "MaybeUninit" → /unsafe-review
- "lease", "epoch", "fence", "shard", "cursor", "coordination" → /dist-sys-auditor
- "SIMD", "vectorize", "intrinsic", "NEON", "AVX" → /simd-optimize
- "benchmark", "latency", "throughput", "hot path", "allocation" → /performance-analyzer
- "test strategy", "property test", "fuzz", "simulation" → /test-strategy
- "public API", "trait", "builder", "interface" → /interface-design-review

Phase 1 escalation: If a Phase 1 specialist flags a domain-specific concern that they cannot fully evaluate (e.g., feasibility agent says "this touches unsafe code but I can't assess soundness"), escalate to the corresponding domain skill.

Forced dispatch: If --domain=<skill> is set, always include that skill regardless of auto-detection.

Priority when > 3 skills triggered: Rank by relevance to the task type:
- Bug tasks: prioritize /test-strategy, /unsafe-review, /security-reviewer
- Feature tasks: prioritize /interface-design-review, /dist-sys-auditor, /test-strategy
- Performance tasks: prioritize /performance-analyzer, /asm-forge, /simd-optimize
- Safety tasks: prioritize /unsafe-review, /safe-over-unsafe, /security-reviewer

Dispatch Protocol

For each selected domain skill, the orchestrator:

1. Invokes the skill with a scoped prompt. The skill receives:
   - The full task description
   - The specific files/code relevant to its domain (not the entire codebase)
   - A directive to produce an enrichment report (not a full audit):

   You are being invoked as a domain enrichment step during task review. Your job is NOT to do a full audit. Instead, produce a focused report answering: "What domain-specific information should be added to this task description to make it implementable without further research?"

   Specifically:
   - What domain-specific edge cases or gotchas does the task miss?
   - What domain-specific patterns, utilities, or conventions should it reference?
   - What domain-specific acceptance criteria should be added?
   - What domain-specific risks should be called out?

   Keep output concise: aim for 5-15 specific, actionable items.

2. Collects the enrichment report and passes it to the Phase 2 synthesizer alongside the Phase 1 specialist reports.

Parallel Dispatch

If multiple domain skills are selected, dispatch them in parallel in a single message — they operate on different domains and don't conflict.

If a domain skill invocation fails or times out, proceed without it. Record the failure in the synthesis so the user knows what enrichment was skipped.

Phase 2 — Synthesize Findings (Single Agent)

After Phase 1 specialists AND Phase 1.5 domain enrichment complete, launch 1 synthesizer agent using the Task tool with subagent_type=general-purpose.

Synthesizer Prompt

You are the Task Review Synthesizer. Specialist reviewers have independently
audited the same beads task, and domain-specific skills have provided
specialized enrichment. Your job is to merge ALL inputs into one actionable
report and determine whether the task needs revision, research, splitting,
or is ready for implementation.

## Original Task
{FULL_TASK_DESCRIPTION}

## Specialist Reports (Phase 1)
{ALL_SPECIALIST_REPORTS}

## Domain Enrichment Reports (Phase 1.5)
{ALL_DOMAIN_ENRICHMENT_REPORTS}
(or "No domain skills dispatched" if Phase 1.5 was skipped)

## Your Task

### 1. Deduplicate

Multiple specialists may flag the same underlying issue from different angles.
Group these into single findings and note which specialists flagged it.

### 2. Score & Filter

For every unique finding, assess:

- **Severity**: BLOCKER / MAJOR / MINOR
  - BLOCKER: Task cannot be correctly implemented as written
  - MAJOR: Implementation will likely produce incorrect or incomplete results
  - MINOR: Task is implementable but could be clearer
- **Confidence** (0-100): How sure is this a real issue?

Discard any finding with confidence < 50. Every finding in the final report
must require action.

### 3. Integrate Domain Enrichment

For each domain enrichment report, extract actionable items and classify:

- **Add to task**: Domain-specific edge cases, patterns, conventions,
  acceptance criteria, or risk callouts that should be folded into the task
  description.
- **Contradicts task**: Domain skill found something that conflicts with the
  task's proposed approach. Elevate to a MAJOR or BLOCKER finding.
- **Confirms task**: Domain skill validated the approach. Note as supporting
  evidence (no action needed).

### 4. Classify Into Verdicts

Based on the surviving findings (from both specialists and domain enrichment),
assign the task an overall verdict:

- **READY**: 0 BLOCKERs, 0-2 MAJORs, task is implementable. List MINORs as
  optional improvements.
- **REVISE**: 0 BLOCKERs, but 3+ MAJORs or significant gaps, or domain
  enrichment produced items that should be folded into the task. Task needs
  specific edits before implementation.
- **RESEARCH**: Findings reveal ambiguity or unknowns that cannot be resolved
  by reading the codebase alone — external research is needed. Flag specific
  questions for `/deep-research` or `/deeper-research`.
- **SPLIT**: Task scope is too large for a single implementation session.
  Propose concrete sub-tasks.
- **REWORK**: 2+ BLOCKERs. Task description is fundamentally flawed. Recommend
  re-running `/create-task` with corrected information.

A task can receive multiple verdicts (e.g., REVISE + SPLIT).

### 5. Research Questions (if verdict includes RESEARCH)

For each knowledge gap that requires external research:

- **Question**: {specific question that needs answering}
- **Why it matters**: {impact on the implementation}
- **What we know**: {best available evidence from the codebase}
- **What's missing**: {the specific gap}
- **Recommended skill**: `/deep-research` or `/deeper-research`
- **Suggested problem statement**: {ready-to-paste prompt for the research skill}

### 6. Split Recommendations (if verdict includes SPLIT)

For each proposed sub-task:

- **Title**: {imperative statement}
- **Scope**: {which files and sections from the original task}
- **Dependencies**: {which sub-tasks must complete first}
- **Independently testable**: {yes/no}
- **Priority relative to original**: {same / higher / lower}

### 7. Revision Checklist (if verdict includes REVISE)

A numbered list of specific edits to make to the task description. Include
both specialist-sourced revisions and domain enrichment items:

1. {section} — {what to change and why} — source: {specialist / domain skill}
2. ...

Each item must be concrete enough that the edit can be made mechanically.

### Output Format

```markdown
## Task Review Report

**Task**: {id} — {title}
**Verdict**: {READY / REVISE / RESEARCH / SPLIT / REWORK}
**Findings**: {N total — X BLOCKERs, Y MAJORs, Z MINORs}
**Domain skills dispatched**: {list or "none"}

### Findings

| # | Finding | Severity | Confidence | Source |
|---|---------|----------|------------|--------|

**Details:**

#### 1. {Finding title}
- **Problem**: {description}
- **Evidence**: {codebase evidence}
- **Recommended fix**: {specific edit}
- **Source**: {specialist name and/or domain skill name}

### Domain Enrichment Summary

| Domain Skill | Items to Add | Contradictions | Confirmations |
|--------------|-------------|----------------|---------------|
| /test-strategy | 3 | 0 | 1 |
| /unsafe-review | 2 | 1 | 0 |

{Details of each domain enrichment item}

### {Verdict-specific sections as applicable}
```

Rules

- Do NOT add your own findings: synthesize, don't review.
- If a specialist's finding seems wrong, lower its confidence. If it drops below 50%, discard it.
- Preserve file paths and codebase citations from specialist reports.
- Be honest about the verdict. Do not inflate READY to avoid work. A task that ships with BLOCKERs wastes more time than revising the description.
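
The Score & Filter and verdict rules in the synthesizer prompt can be read as a small decision procedure. The Python sketch below is one possible reading, not the skill's actual logic: `Finding` and `classify` are hypothetical names, and the prompt does not say what happens with exactly one BLOCKER, so treating that case as REVISE is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    severity: str    # "BLOCKER", "MAJOR", or "MINOR"
    confidence: int  # 0-100


def classify(findings, enrichment_items_to_add=0,
             needs_research=False, needs_split=False):
    """Return (surviving findings, verdict list) for one reviewed task."""
    # Discard any finding with confidence < 50.
    kept = [f for f in findings if f.confidence >= 50]
    blockers = sum(f.severity == "BLOCKER" for f in kept)
    majors = sum(f.severity == "MAJOR" for f in kept)

    if blockers >= 2:
        return kept, ["REWORK"]  # REWORK supersedes all other verdicts

    verdicts = []
    if needs_research:
        verdicts.append("RESEARCH")
    # Assumption: a single BLOCKER is handled as REVISE (not defined in the prompt).
    if blockers == 1 or majors >= 3 or enrichment_items_to_add > 0:
        verdicts.append("REVISE")
    if needs_split:
        verdicts.append("SPLIT")
    return kept, verdicts or ["READY"]
```

The verdict list comes back in the order RESEARCH, REVISE, SPLIT, which matches the execution order given under Compound Verdicts.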


---

## Phase 3 — Act on Verdict

Based on the synthesizer's verdict, take the appropriate action. If `--dry-run`
is active, print the proposed actions without executing them.

### Verdict: READY

Present the report to the user. No modifications needed.

Task {id} passed review.
Verdict: READY
Findings: {summary, e.g., "2 MINORs (optional improvements)"}

{list MINORs if any, as optional suggestions}


### Verdict: REVISE

Apply the revision checklist to the task description:

1. Read the current task description: `bd show <task-id>`
2. Apply each revision from the checklist.
3. If `--dry-run`, print the diff and stop.
4. Update the task: `bd update <task-id> --description="..."`
5. Show the updated task to the user.

Task {id} revised.
Verdict: REVISE
Changes applied: {count}
{summary of each change}


### Verdict: RESEARCH

Dispatch the appropriate research skill for each identified gap:

1. If `--no-research` is set, present the research questions to the user
   and stop. The user can manually run `/deep-research` or `/deeper-research`.

2. If `--deep` is set (or inferred from the gap severity):
   - Invoke `/deep-research` with the suggested problem statement.
   - After research completes, fold key findings into the task description:
     add them to the Implementation Guidance, Design Notes, or Risk Analysis
     sections as appropriate.
   - Update the task: `bd update <task-id> --description="..."`

3. If `--deeper` is set (for critical/highest-stakes gaps):
   - Invoke `/deeper-research` with the suggested problem statement.
   - Fold findings into the task description as above.
   - Update the task.

4. After research is folded in, re-run Phases 1-2 (without the research
   dispatch) to verify the enriched task is now READY or REVISE.

### Verdict: SPLIT

Create sub-tasks from the split recommendations:

1. For each proposed sub-task:
   - Extract the relevant sections from the original task description.
   - Create a new task using `/create-task --quick` with the extracted scope.
     Use `--quick` because the context already exists — no need for fresh
     research.
   - Register dependencies: `bd dep add <child-id> <dependency-id>`
   - Set the parent: if the original task is an epic, use `--parent=<original-id>`.
     Otherwise, convert the original task to an epic first:
     `bd update <original-id> --type=epic`

2. Present the split to the user:

Task {id} split into {N} sub-tasks:

{original-id} (epic): {original title}
├── {child-1-id}: {title} [no dependencies]
├── {child-2-id}: {title} [depends on {child-1-id}]
└── {child-3-id}: {title} [depends on {child-1-id}]


3. If `--dry-run`, print the proposed sub-tasks with descriptions without
creating them.

### Verdict: REWORK

Do NOT attempt to patch the task. Present the findings and recommend
re-running `/create-task`:

Task {id} needs rework.
Verdict: REWORK ({N} BLOCKERs found)

BLOCKERs:

1. {finding title}: {problem}
2. {finding title}: {problem}

Recommendation: Re-run /create-task with corrected context. Key corrections:

- {what was wrong in the original task}
- {what the correct information is}


### Compound Verdicts

When multiple verdicts apply (e.g., REVISE + SPLIT):

1. Apply REVISE first (fix the content).
2. Then apply SPLIT (decompose the corrected task).
3. If RESEARCH is part of the compound, research first, then revise, then split.

Order: RESEARCH → REVISE → SPLIT. REWORK supersedes all others.

---

## Convergence Defaults

Use **fast mode** (2 agents) when all of these are true:

- Task modifies <= 3 files
- Task crosses <= 1 module boundary
- No unsafe code, concurrency, or distributed systems concerns
- Task type is not `epic`
- No `[NEEDS ENRICHMENT]` markers

Fast mode agents: **Reference Accuracy** + **Edge Cases & Completeness**.

Use **standard mode** (4 agents) for everything else.
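
The mode selection above, combined with the `--skip` rule that at least 2 specialists must run, can be sketched as follows. This is a hedged Python illustration: the function name, the parameter names, and the choice to raise an error when `--skip` would leave fewer than 2 agents are assumptions. The agent identifiers reuse the names listed for `--skip`.

```python
ALL_AGENTS = ["references", "edge-cases", "feasibility", "scope"]
FAST_AGENTS = ["references", "edge-cases"]


def select_agents(files_modified: int, module_boundaries: int,
                  has_risky_domain: bool, task_type: str,
                  has_enrichment_markers: bool, skip=()) -> list[str]:
    """Choose fast mode (2 agents) or standard mode (4 agents), then apply --skip."""
    fast = (
        files_modified <= 3
        and module_boundaries <= 1
        and not has_risky_domain          # unsafe, concurrency, distributed systems
        and task_type != "epic"
        and not has_enrichment_markers    # no [NEEDS ENRICHMENT] markers
    )
    agents = [a for a in (FAST_AGENTS if fast else ALL_AGENTS) if a not in skip]
    # Assumption: violating the "min 2 must run" rule is reported as an error.
    if len(agents) < 2:
        raise ValueError("at least 2 specialists must run")
    return agents
```

Under this reading, `--skip` cannot be combined with fast mode, because fast mode already runs the minimum of 2 agents.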

---

## Anti-Patterns

| Anti-Pattern | Why It's Wrong | Do This Instead |
|--------------|----------------|-----------------|
| Approving a task with stale file paths | Implementer wastes time finding moved code | Verify every path with Glob/Read |
| Adding findings without codebase evidence | Speculation wastes revision effort | Grep and Read before claiming a problem |
| Splitting a task that's already well-scoped | Creates overhead without benefit | Only split when scope criteria are exceeded |
| Patching a fundamentally broken task | Lipstick on a pig — BLOCKERs compound | REWORK verdict, re-run /create-task |
| Running /deeper-research for simple gaps | Token-expensive overkill | Use /deep-research for standard gaps, codebase reading for simple ones |
| Skipping review for "obvious" tasks | Obvious tasks have the most hidden assumptions | Review everything; fast mode exists for small tasks |
| Revising without showing the user | User loses visibility into what changed | Always present changes before or after applying |
| Dispatching more than 3 domain skills | Context overload, diminishing returns | Cap at 3 domain skills, prioritize by task type |
| Running domain skills on tasks with no domain-specific code | Wasted tokens, noise in the report | Let auto-detection decide; only force with --domain when justified |
| Ignoring domain skill contradictions | Domain expert found a real problem the generalist missed | Always elevate contradictions to findings |

## Related Skills

**Complementary pair:**
- `/create-task` — creates the tasks this skill reviews

**Research (dispatched for knowledge gaps in Phase 3):**
- `/deep-research` — 7-agent research for standard gaps
- `/deeper-research` — 21-agent research for critical gaps

**Domain enrichment (dispatched in Phase 1.5):**
- `/test-strategy` — test type recommendations (unit, rstest, proptest, fuzz, Kani, sim)
- `/unsafe-review` — safety invariant audit, test coverage matrix
- `/safe-over-unsafe` — safe API boundary design for unsafe internals
- `/dist-sys-auditor` — distributed systems citation and invariant verification
- `/sim-review` — deterministic simulation testability compliance
- `/performance-analyzer` — allocation, cache, and CPU hotspot analysis
- `/simd-optimize` — SIMD pattern classification and ISA strategy
- `/asm-forge` — assembly codegen quality audit
- `/interface-design-review` — API misuse-resistance enforcement
- `/security-reviewer` — memory safety and CWE mapping

**Downstream:**
- `/plan-review` — reviews implementation plans (this skill reviews task descriptions)
- `/plan-forge` — creates implementation plans from tasks
- `/review-dispatch` — reviews code after implementation
- `/execute-review-findings` — executes review findings as tasks

Review Task

Core principle: A task that survives this review should require zero research from the implementing developer. They read, they code, they ship.

Invocation

/review-task <task-id>
/review-task <task-id> --deep                 # Dispatch /deep-research for gaps
/review-task <task-id> --deeper               # Dispatch /deeper-research for gaps
/review-task <task-id> --split-only           # Skip verification, assess scope and split
/review-task <task-id> --dry-run              # Print findings without modifying the task
/review-task <task-id> --skip=performance     # Drop a specialist
/review-task <task-id> --focus=distributed    # Adds domain context to all agents

Flags:

Phase 0 — Load & Understand

Fetch the task:
```
bd show <task-id>
```
Parse the task description and extract:
- Title, type, priority, labels
- All file paths and line numbers referenced
- All struct/trait/function/type names mentioned
- All code snippets embedded in the description
- Acceptance criteria
- Related tasks and dependencies
- Any [NEEDS ENRICHMENT] markers from /create-task
Quick pre-flight checks (fail fast):
- If the task has no description or a stub description (< 5 lines), tell the user and recommend running /create-task first. Stop.
- If the task is already closed, warn and ask before proceeding.

Build a Review Brief:

## Review Brief

### Task
- **ID**: {task-id}
- **Title**: {title}
- **Type**: {type} | **Priority**: P{priority}
- **Labels**: {labels}

### Referenced Files
| File | Lines | Exists? |
|------|-------|---------|
{for each file path in the description, check with Glob/ls}

### Referenced Identifiers
| Name | Kind | Found? | Location |
|------|------|--------|----------|
{for each type/function/trait name, Grep the codebase}

### NEEDS ENRICHMENT Markers
{list any sections marked as needing enrichment}

### Scope Indicators
- Files referenced: {count}
- Modules crossed: {count of distinct top-level crate directories}
- Has acceptance criteria: {yes/no}
- Has code snippets: {yes/no}
- Description length: {line count}

### Domain Signals
{detected domain signals — see Domain Skill Dispatch Table below}

Detect domain signals for Phase 1.5 enrichment. Scan the task description AND referenced files for the triggers listed in the Domain Skill Dispatch Table (below). Record which skills should be dispatched.
If --split-only, skip to Phase 3 (Scope Assessment agent only).

Phase 1 — Verification (3-4 Parallel Agents)

Common Preamble (included in every agent's prompt)

You are a specialist task reviewer. You have ONE job: review the task below
through the lens of {SPECIALTY}. Ignore issues outside your specialty — other
specialists cover those.

## Task Under Review

{FULL_TASK_DESCRIPTION}

## Review Brief

{REVIEW_BRIEF}

## Rules

- Only report findings within your specialty. Do NOT stray.
- Only report findings that REQUIRE action. No nits, no "nice to have."
- Be concrete: cite the specific task section, quoted text, file path, or code
  snippet for every finding.
- Explore the codebase (Glob, Grep, Read) to ground findings in reality. The
  most valuable findings come from gaps between what the task says and what the
  codebase actually contains.
- For each finding, state the PROBLEM and the RECOMMENDED FIX (specific text
  to add, remove, or change in the task description).
- Rate each finding:
  - severity: BLOCKER / MAJOR / MINOR
    - BLOCKER: task cannot be implemented as written (wrong file paths, incorrect
      API, missing critical context)
    - MAJOR: task can be implemented but will likely produce incorrect or
      incomplete results (missing edge case, stale pattern, design flaw)
    - MINOR: task is implementable but could be clearer or more complete
  - confidence (0-100): How sure are you this is a real issue?

## Output Format

Return a markdown document starting with:
`# {SPECIALTY} Review`

Then a findings list. For each finding:

### Finding N: {title}

- **Task section**: {which part of the task description}
- **Problem**: {what's wrong or missing}
- **Evidence**: {codebase evidence — file paths, actual code, grep results}
- **Recommended fix**: {specific edit to the task description}
- **Severity**: {BLOCKER / MAJOR / MINOR}
- **Confidence**: {N}%

End with: "Total findings: N" (0 is valid — do not invent issues).
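
The output format above is regular enough to treat each finding as a small record. The sketch below is only an illustration, not part of the skill: the `Finding` class and `render_report` helper are hypothetical names, and the field list simply mirrors the bullets in the preamble.

```python
from dataclasses import dataclass

SEVERITIES = ("BLOCKER", "MAJOR", "MINOR")


@dataclass
class Finding:
    """One specialist finding; fields mirror the preamble's output format."""
    title: str
    task_section: str
    problem: str
    evidence: str
    recommended_fix: str
    severity: str
    confidence: int  # 0-100

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be in 0..100")


def render_report(specialty: str, findings: list[Finding]) -> str:
    """Render findings in the markdown layout the preamble asks for."""
    lines = [f"# {specialty} Review", ""]
    for n, f in enumerate(findings, start=1):
        lines += [
            f"### Finding {n}: {f.title}",
            "",
            f"- **Task section**: {f.task_section}",
            f"- **Problem**: {f.problem}",
            f"- **Evidence**: {f.evidence}",
            f"- **Recommended fix**: {f.recommended_fix}",
            f"- **Severity**: {f.severity}",
            f"- **Confidence**: {f.confidence}%",
            "",
        ]
    # Zero findings is a valid result, so the total is always emitted.
    lines.append(f"Total findings: {len(findings)}")
    return "\n".join(lines)
```

Validating severity and confidence at construction time keeps malformed findings out of the synthesizer's input, which is one possible way to enforce the rating rules.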

### Agent 1 — Reference Accuracy

Your specialty: REFERENCE ACCURACY

Focus exclusively on:
- Do all file paths in the task exist? Have any moved or been renamed?
- Do line numbers match? Read cited files and verify the code at those lines
  matches what the task quotes.
- Do referenced struct names, trait names, function signatures, enum variants,
  and type aliases actually exist in the codebase?
- Are code snippets in the task accurate copies of the current codebase, or
  have they drifted?
- Does the task reference design docs? If so, does the design doc content
  match what the task claims?
- Are dependency/crate references current (Cargo.toml)?
- Do referenced beads task IDs in the "Related Work" section exist?
  Run `bd show <id>` for each.

For every file path: Read the file. For every code snippet: compare character
by character. For every type name: Grep the codebase. Be exhaustive.

Do NOT review edge cases, design quality, or scope — other specialists do that.

### Agent 2 — Edge Cases & Completeness

Your specialty: EDGE CASES & COMPLETENESS

Focus exclusively on:
- What inputs, states, or conditions does the task not address?
  - Empty/zero/nil inputs
  - Boundary values (max capacity, zero-length, single element)
  - Concurrent access patterns if the code is shared across threads
  - Error paths and failure modes not mentioned
  - Rollback/cleanup on partial failure
- Does the "Desired State" cover ALL cases, or only the happy path?
- Are the acceptance criteria specific enough to verify? Could an
  implementer satisfy the criteria while missing the actual intent?
- Does the task account for existing callers/consumers of modified APIs?
  (Grep for call sites the task doesn't mention)
- Are there related invariants documented in design docs or code comments
  that the task should preserve but doesn't mention?
- Does the implementation guidance miss files that will obviously need
  changes? (e.g., mod.rs re-exports, test files, Cargo.toml features)

For each edge case found: describe the scenario, explain what would go wrong,
and propose a specific addition to the task description.

Do NOT review reference accuracy, design alternatives, or scope — other
specialists do that.

### Agent 3 — Feasibility & Design

Your specialty: FEASIBILITY & DESIGN

Focus exclusively on:
- Is the proposed approach actually feasible given the codebase architecture?
  Read the relevant modules and assess whether the task's implementation
  guidance is compatible with existing patterns.
- Are there simpler alternatives the task overlooks? Check for existing
  utilities in gossip-stdx, sibling modules, or trait implementations that
  could be reused (per duplication prevention rules).
- Does the approach introduce unnecessary complexity? (New abstractions,
  generics, indirection that aren't justified)
- Are there performance concerns the task should flag?
  - Hot path allocations (check if touched code is in HOT tier)
  - Lock contention or oversized critical sections
  - Unbounded growth patterns
- Does the approach contradict any project conventions?
  - No-versioning policy (no V1/V2, no deprecated, no compatibility shims)
  - Allocation policy tiers (HOT/WARM/COLD)
  - Comment policy (no tracking IDs, no temporal narration)
- Are there design trade-offs the task should document but doesn't?
- Will the approach compose well with in-flight work? Check `bd list --status=in_progress`
  for potentially conflicting changes.

Do NOT review reference accuracy, edge case enumeration, or scope — other
specialists do that.

### Agent 4 — Scope Assessment

Your specialty: SCOPE ASSESSMENT

Focus exclusively on:
- Is this task appropriately sized for a single implementation session?
  A well-scoped task modifies 1-4 files in 1-2 modules. Flag the task if it
  reaches any of these thresholds:
  - 6+ files modified
  - 3+ modules crossed
  - 3+ distinct behavioral changes
  - Both production code AND test infrastructure changes that could be separate
- Can this task be decomposed into independent sub-tasks that deliver
  incremental value? If so, propose specific splits with:
  - Sub-task title
  - Which files/sections of the original task belong to each
  - Dependency ordering between sub-tasks
  - Whether each sub-task is independently testable
- Does the task mix concerns? Common anti-patterns:
  - Refactor + new feature in one task
  - Bug fix + performance optimization in one task
  - API change + migration of all callers in one task
- Are there prerequisite tasks that should be extracted?
  (e.g., "add trait X" before "implement trait X for types A, B, C")
- Is the task UNDER-scoped? Does it describe a change that won't be useful
  without follow-up work that isn't tracked?

For each scope finding, provide a concrete split recommendation with titles,
file assignments, and dependency ordering.

Do NOT review reference accuracy, edge cases, or design quality — other
specialists do that.
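
The numeric thresholds in the Scope Assessment prompt can be checked mechanically once the counts are known. A minimal sketch, assuming the counts have already been extracted from the task (the `scope_flags` name and its parameters are hypothetical):

```python
def scope_flags(files_modified: int, modules_crossed: int,
                behavioral_changes: int) -> list[str]:
    """Return the scope thresholds a task reaches; an empty list means none."""
    flags = []
    if files_modified >= 6:
        flags.append("6+ files modified")
    if modules_crossed >= 3:
        flags.append("3+ modules crossed")
    if behavioral_changes >= 3:
        flags.append("3+ distinct behavioral changes")
    return flags
```

The fourth criterion (production code and test infrastructure changes that could be separate) and the mixed-concern anti-patterns need judgment, so they are left out of this sketch.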

## Phase 1.5 — Domain Enrichment

After Phase 1 specialists complete but before synthesis, dispatch domain-specific skills to provide specialized depth. Skip this phase if --no-domain is set.

### Domain Skill Dispatch Table

### Detection Logic

1. **File-based signals**: For each referenced file, check which crate and module it belongs to, then map it to a domain skill:
   - `gossip-coordination/` or `gossip-contracts/src/coordination/` → `/dist-sys-auditor`
   - `gossip-stdx/src/` data structures with unsafe → `/unsafe-review`
   - `scanner-engine/src/engine/` hot paths → `/performance-analyzer`
   - Files containing `std::arch::` or SIMD intrinsics → `/simd-optimize`
2. **Content-based signals**: Grep the task description for keywords:
   - "unsafe", "SAFETY:", "raw pointer", "MaybeUninit" → `/unsafe-review`
   - "lease", "epoch", "fence", "shard", "cursor", "coordination" → `/dist-sys-auditor`
   - "SIMD", "vectorize", "intrinsic", "NEON", "AVX" → `/simd-optimize`
   - "benchmark", "latency", "throughput", "hot path", "allocation" → `/performance-analyzer`
   - "test strategy", "property test", "fuzz", "simulation" → `/test-strategy`
   - "public API", "trait", "builder", "interface" → `/interface-design-review`
3. **Phase 1 escalation**: If a Phase 1 specialist flags a domain-specific concern that they cannot fully evaluate (e.g., the feasibility agent says "this touches unsafe code but I can't assess soundness"), escalate to the corresponding domain skill.
4. **Forced dispatch**: If `--domain=<skill>` is set, always include that skill regardless of auto-detection.
5. **Priority when more than 3 skills are triggered**: Rank by relevance to the task type:
   - Bug tasks: prioritize `/test-strategy`, `/unsafe-review`, `/security-reviewer`
   - Feature tasks: prioritize `/interface-design-review`, `/dist-sys-auditor`, `/test-strategy`
   - Performance tasks: prioritize `/performance-analyzer`, `/asm-forge`, `/simd-optimize`
   - Safety tasks: prioritize `/unsafe-review`, `/safe-over-unsafe`, `/security-reviewer`
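
The content-based signals and the priority ranking can be sketched as follows. The keyword and priority tables are copied from the lists above; the function names, the case-insensitive substring matching, and the choice to count forced skills toward the cap of 3 are assumptions made for illustration.

```python
KEYWORD_SIGNALS = {
    "/unsafe-review": ["unsafe", "SAFETY:", "raw pointer", "MaybeUninit"],
    "/dist-sys-auditor": ["lease", "epoch", "fence", "shard", "cursor", "coordination"],
    "/simd-optimize": ["SIMD", "vectorize", "intrinsic", "NEON", "AVX"],
    "/performance-analyzer": ["benchmark", "latency", "throughput", "hot path", "allocation"],
    "/test-strategy": ["test strategy", "property test", "fuzz", "simulation"],
    "/interface-design-review": ["public API", "trait", "builder", "interface"],
}

PRIORITY = {
    "bug": ["/test-strategy", "/unsafe-review", "/security-reviewer"],
    "feature": ["/interface-design-review", "/dist-sys-auditor", "/test-strategy"],
    "performance": ["/performance-analyzer", "/asm-forge", "/simd-optimize"],
    "safety": ["/unsafe-review", "/safe-over-unsafe", "/security-reviewer"],
}

MAX_SKILLS = 3  # cap taken from the Anti-Patterns table


def detect_skills(description: str) -> list[str]:
    """Content-based detection: naive case-insensitive substring match (assumption)."""
    text = description.lower()
    return [skill for skill, keywords in KEYWORD_SIGNALS.items()
            if any(k.lower() in text for k in keywords)]


def select_skills(detected: list[str], task_type: str,
                  forced: tuple[str, ...] = ()) -> list[str]:
    """Forced skills first, then detected skills ranked by task-type priority."""
    selected = list(dict.fromkeys(forced))  # de-duplicate, keep order
    preferred = PRIORITY.get(task_type, [])
    ranked = sorted(
        detected,
        key=lambda s: preferred.index(s) if s in preferred else len(preferred),
    )
    for skill in ranked:
        if len(selected) >= MAX_SKILLS:
            break
        if skill not in selected:
            selected.append(skill)
    return selected
```

Substring matching is deliberately crude: a keyword such as "trait" would also match "portrait", so a real implementation would probably want word-boundary matching and the file-based signals as well.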

### Dispatch Protocol

For each selected domain skill, the orchestrator:

1. Invokes the skill with a scoped prompt. The skill receives:
   - The full task description
   - The specific files/code relevant to its domain (not the entire codebase)
   - A directive to produce an enrichment report (not a full audit):

     You are being invoked as a domain enrichment step during task review. Your job is NOT to do a full audit. Instead, produce a focused report answering: "What domain-specific information should be added to this task description to make it implementable without further research?"

     Specifically:
     - What domain-specific edge cases or gotchas does the task miss?
     - What domain-specific patterns, utilities, or conventions should it reference?
     - What domain-specific acceptance criteria should be added?
     - What domain-specific risks should be called out?

     Keep output concise: aim for 5-15 specific, actionable items.

2. Collects the enrichment report and passes it to the Phase 2 synthesizer alongside the Phase 1 specialist reports.

### Parallel Dispatch

If multiple domain skills are selected, dispatch them in parallel in a single message — they operate on different domains and don't conflict.

If a domain skill invocation fails or times out, proceed without it. Record the failure in the synthesis so the user knows what enrichment was skipped.
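
The "proceed without it, but record the failure" behaviour can be sketched with a thread pool. This is only an illustration under assumptions: `run_skill` stands in for whatever actually invokes a skill, and the 300-second default timeout is a placeholder, not a value from this skill.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def dispatch_domain_skills(skills, run_skill, timeout_s=300.0):
    """Run every skill in parallel; collect reports and record failures.

    `run_skill(skill) -> str` is a hypothetical callable supplied by the caller.
    Returns (reports, failures), both keyed by skill name.
    """
    reports, failures = {}, {}
    if not skills:
        return reports, failures
    with ThreadPoolExecutor(max_workers=len(skills)) as pool:
        futures = {skill: pool.submit(run_skill, skill) for skill in skills}
        for skill, future in futures.items():
            try:
                reports[skill] = future.result(timeout=timeout_s)
            except FutureTimeout:
                # Note: the pool still waits for the worker thread on exit;
                # a real orchestrator would need proper cancellation.
                failures[skill] = "timed out"
            except Exception as exc:
                failures[skill] = f"failed: {exc}"
    return reports, failures
```

The `failures` mapping is what would be handed to the synthesizer so the final report can state which enrichment was skipped.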

## Phase 2 — Synthesize Findings (Single Agent)

After Phase 1 specialists AND Phase 1.5 domain enrichment complete, launch 1 synthesizer agent using the Task tool with subagent_type=general-purpose.

### Synthesizer Prompt

You are the Task Review Synthesizer. Specialist reviewers have independently
audited the same beads task, and domain-specific skills have provided
specialized enrichment. Your job is to merge ALL inputs into one actionable
report and determine whether the task needs revision, research, splitting,
or is ready for implementation.

## Original Task
{FULL_TASK_DESCRIPTION}

## Specialist Reports (Phase 1)
{ALL_SPECIALIST_REPORTS}

## Domain Enrichment Reports (Phase 1.5)
{ALL_DOMAIN_ENRICHMENT_REPORTS}
(or "No domain skills dispatched" if Phase 1.5 was skipped)

## Your Task

### 1. Deduplicate

Multiple specialists may flag the same underlying issue from different angles.
Group these into single findings and note which specialists flagged it.

### 2. Score & Filter

For every unique finding, assess:

- **Severity**: BLOCKER / MAJOR / MINOR
  - BLOCKER: Task cannot be correctly implemented as written
  - MAJOR: Implementation will likely produce incorrect or incomplete results
  - MINOR: Task is implementable but could be clearer
- **Confidence** (0-100): How sure is this a real issue?

Discard any finding with confidence < 50. Every finding in the final report
must require action.

### 3. Integrate Domain Enrichment

For each domain enrichment report, extract actionable items and classify:

- **Add to task**: Domain-specific edge cases, patterns, conventions,
  acceptance criteria, or risk callouts that should be folded into the task
  description.
- **Contradicts task**: Domain skill found something that conflicts with the
  task's proposed approach. Elevate to a MAJOR or BLOCKER finding.
- **Confirms task**: Domain skill validated the approach. Note as supporting
  evidence (no action needed).

### 4. Classify Into Verdicts

Based on the surviving findings (from both specialists and domain enrichment),
assign the task an overall verdict from the list below:

- **READY**: 0 BLOCKERs, 0-2 MAJORs, task is implementable. List MINORs as
  optional improvements.
- **REVISE**: 0 BLOCKERs, but 3+ MAJORs or significant gaps, or domain
  enrichment produced items that should be folded into the task. Task needs
  specific edits before implementation.
- **RESEARCH**: Findings reveal ambiguity or unknowns that cannot be resolved
  by reading the codebase alone — external research is needed. Flag specific
  questions for `/deep-research` or `/deeper-research`.
- **SPLIT**: Task scope is too large for a single implementation session.
  Propose concrete sub-tasks.
- **REWORK**: 2+ BLOCKERs. Task description is fundamentally flawed. Recommend
  re-running `/create-task` with corrected information.

A task can receive multiple verdicts (e.g., REVISE + SPLIT).

### 5. Research Questions (if verdict includes RESEARCH)

For each knowledge gap that requires external research:

- **Question**: {specific question that needs answering}
- **Why it matters**: {impact on the implementation}
- **What we know**: {best available evidence from the codebase}
- **What's missing**: {the specific gap}
- **Recommended skill**: `/deep-research` or `/deeper-research`
- **Suggested problem statement**: {ready-to-paste prompt for the research skill}

### 6. Split Recommendations (if verdict includes SPLIT)

For each proposed sub-task:

- **Title**: {imperative statement}
- **Scope**: {which files and sections from the original task}
- **Dependencies**: {which sub-tasks must complete first}
- **Independently testable**: {yes/no}
- **Priority relative to original**: {same / higher / lower}

### 7. Revision Checklist (if verdict includes REVISE)

A numbered list of specific edits to make to the task description. Include
both specialist-sourced revisions and domain enrichment items:

1. {section} — {what to change and why} — source: {specialist / domain skill}
2. ...

Each item must be concrete enough that the edit can be made mechanically.

### Output Format

```markdown
## Task Review Report

**Task**: {id} — {title}
**Verdict**: {READY / REVISE / RESEARCH / SPLIT / REWORK}
**Findings**: {N total — X BLOCKERs, Y MAJORs, Z MINORs}
**Domain skills dispatched**: {list or "none"}

### Findings

| # | Finding | Severity | Confidence | Source |
|---|---------|----------|------------|--------|

**Details:**

#### 1. {Finding title}
- **Problem**: {description}
- **Evidence**: {codebase evidence}
- **Recommended fix**: {specific edit}
- **Source**: {specialist name and/or domain skill name}

### Domain Enrichment Summary

| Domain Skill | Items to Add | Contradictions | Confirmations |
|--------------|-------------|----------------|---------------|
| /test-strategy | 3 | 0 | 1 |
| /unsafe-review | 2 | 1 | 0 |

{Details of each domain enrichment item}

### {Verdict-specific sections as applicable}
```

### Rules

- Do NOT add your own findings; synthesize, don't review.
- If a specialist's finding seems wrong, lower its confidence. If it drops below 50%, discard it.
- Preserve file paths and codebase citations from specialist reports.
- Be honest about the verdict. Do not inflate READY to avoid work. A task that ships with BLOCKERs wastes more time than revising the description.
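
The numeric parts of the score, filter, and classify steps can be sketched as below. This is a rough illustration, not the synthesizer's actual logic: RESEARCH and SPLIT depend on qualitative judgment and are omitted, and the verdict list does not say what happens with exactly one BLOCKER, so mapping that case to REVISE is purely an assumption.

```python
CONFIDENCE_FLOOR = 50  # findings below this confidence are discarded


def base_verdict(findings, enrichment_items_to_add=0):
    """Classify from (severity, confidence) pairs using the numeric rules only.

    `findings` is a list of tuples such as ("MAJOR", 80).
    """
    kept = [sev for sev, conf in findings if conf >= CONFIDENCE_FLOOR]
    blockers = kept.count("BLOCKER")
    majors = kept.count("MAJOR")

    if blockers >= 2:
        return "REWORK"
    if blockers == 1:
        # ASSUMPTION: the verdict list leaves exactly one BLOCKER undefined.
        return "REVISE"
    if majors >= 3 or enrichment_items_to_add > 0:
        return "REVISE"
    return "READY"
```

For example, three MAJOR findings of which one has confidence 40 leave two surviving MAJORs, which still classifies as READY under these rules.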


---

## Phase 3 — Act on Verdict

Based on the synthesizer's verdict, take the appropriate action. If `--dry-run`
is active, print the proposed actions without executing them.

### Verdict: READY

Present the report to the user. No modifications needed.

    Task {id} passed review.
    Verdict: READY
    Findings: {summary, e.g., "2 MINORs (optional improvements)"}

    {list MINORs if any, as optional suggestions}


### Verdict: REVISE

Apply the revision checklist to the task description:

1. Read the current task description: `bd show <task-id>`
2. Apply each revision from the checklist.
3. If `--dry-run`, print the diff and stop.
4. Update the task: `bd update <task-id> --description="..."`
5. Show the updated task to the user.

    Task {id} revised.
    Verdict: REVISE
    Changes applied: {count}
    {summary of each change}


### Verdict: RESEARCH

Dispatch the appropriate research skill for each identified gap:

1. If `--no-research` is set, present the research questions to the user
   and stop. The user can manually run `/deep-research` or `/deeper-research`.

2. If `--deep` is set (or inferred from the gap severity):
   - Invoke `/deep-research` with the suggested problem statement.
   - After research completes, fold key findings into the task description:
     add them to the Implementation Guidance, Design Notes, or Risk Analysis
     sections as appropriate.
   - Update the task: `bd update <task-id> --description="..."`

3. If `--deeper` is set (for critical/highest-stakes gaps):
   - Invoke `/deeper-research` with the suggested problem statement.
   - Fold findings into the task description as above.
   - Update the task.

4. After research is folded in, re-run Phases 1-2 (without the research
   dispatch) to verify that the enriched task now receives READY or REVISE.

### Verdict: SPLIT

Create sub-tasks from the split recommendations:

1. For each proposed sub-task:
   - Extract the relevant sections from the original task description.
   - Create a new task using `/create-task --quick` with the extracted scope.
     Use `--quick` because the context already exists — no need for fresh
     research.
   - Register dependencies: `bd dep add <child-id> <dependency-id>`
   - Set the parent: if the original task is an epic, use `--parent=<original-id>`.
     Otherwise, convert the original task to an epic first:
     `bd update <original-id> --type=epic`

2. Present the split to the user:

Task {id} split into {N} sub-tasks:


3. If `--dry-run`, print the proposed sub-tasks with descriptions without
creating them.
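
Because `bd dep add <child-id> <dependency-id>` needs both IDs to exist, it can help to create sub-tasks in dependency order. A minimal sketch using Python's standard `graphlib`, assuming the split recommendations have been reduced to a mapping from each sub-task title to the titles it depends on (the `creation_order` helper is hypothetical):

```python
from graphlib import TopologicalSorter


def creation_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order sub-tasks so every dependency is created before its dependents.

    `dependencies` maps a sub-task title to the titles it depends on.
    Raises graphlib.CycleError if the proposed split has circular dependencies.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

A cycle in the proposed split is itself a finding worth surfacing, since sub-tasks that depend on each other cannot be independently testable.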

### Verdict: REWORK

Do NOT attempt to patch the task. Present the findings and recommend
re-running `/create-task`:

    Task {id} needs rework.
    Verdict: REWORK ({N} BLOCKERs found)

    BLOCKERs:
    1. {finding title}: {problem}
    2. {finding title}: {problem}

    Recommendation: Re-run /create-task with corrected context.
    Key corrections:
    - {what was wrong in the original task}
    - {what the correct information is}


### Compound Verdicts

When multiple verdicts apply (e.g., REVISE + SPLIT):

1. Apply REVISE first (fix the content).
2. Then apply SPLIT (decompose the corrected task).
3. If RESEARCH is part of the compound, research first, then revise, then split.

Order: RESEARCH → REVISE → SPLIT. REWORK supersedes all others.
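
The ordering rule is small enough to express directly. A sketch, with `action_order` as a hypothetical helper name:

```python
ACTION_ORDER = ["RESEARCH", "REVISE", "SPLIT"]


def action_order(verdicts: set[str]) -> list[str]:
    """Return the Phase 3 actions to run, in order, for a compound verdict."""
    if "REWORK" in verdicts:
        # REWORK supersedes every other verdict.
        return ["REWORK"]
    # READY carries no modifying action, so it yields an empty list.
    return [v for v in ACTION_ORDER if v in verdicts]
```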

---

## Convergence Defaults

Use **fast mode** (2 agents) when all of these are true:

- Task modifies <= 3 files
- Task crosses <= 1 module boundary
- No unsafe code, concurrency, or distributed systems concerns
- Task type is not `epic`
- No `[NEEDS ENRICHMENT]` markers

Fast mode agents: **Reference Accuracy** + **Edge Cases & Completeness**.

Use **standard mode** (4 agents) for everything else.
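
The mode selection amounts to a conjunction of the five conditions above. A minimal sketch, assuming the Phase 0 review brief has been reduced to a small record (the `TaskProfile` fields and `review_agents` function are hypothetical names):

```python
from dataclasses import dataclass

FAST_AGENTS = ["Reference Accuracy", "Edge Cases & Completeness"]
STANDARD_AGENTS = FAST_AGENTS + ["Feasibility & Design", "Scope Assessment"]


@dataclass
class TaskProfile:
    files_modified: int
    module_boundaries_crossed: int
    has_sensitive_domain: bool   # unsafe, concurrency, or distributed systems
    task_type: str               # e.g. "bug", "feature", "epic"
    has_enrichment_marker: bool  # task contains [NEEDS ENRICHMENT]


def review_agents(profile: TaskProfile) -> list[str]:
    """Pick fast mode only when every fast-mode condition holds."""
    fast = (
        profile.files_modified <= 3
        and profile.module_boundaries_crossed <= 1
        and not profile.has_sensitive_domain
        and profile.task_type != "epic"
        and not profile.has_enrichment_marker
    )
    return list(FAST_AGENTS if fast else STANDARD_AGENTS)
```

Any single failed condition falls through to standard mode, which matches the "everything else" wording.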

---

## Anti-Patterns

| Anti-Pattern | Why It's Wrong | Do This Instead |
|--------------|----------------|-----------------|
| Approving a task with stale file paths | Implementer wastes time finding moved code | Verify every path with Glob/Read |
| Adding findings without codebase evidence | Speculation wastes revision effort | Grep and Read before claiming a problem |
| Splitting a task that's already well-scoped | Creates overhead without benefit | Only split when scope criteria are exceeded |
| Patching a fundamentally broken task | Lipstick on a pig — BLOCKERs compound | REWORK verdict, re-run /create-task |
| Running /deeper-research for simple gaps | Token-expensive overkill | Use /deep-research for standard gaps, codebase reading for simple ones |
| Skipping review for "obvious" tasks | Obvious tasks have the most hidden assumptions | Review everything; fast mode exists for small tasks |
| Revising without showing the user | User loses visibility into what changed | Always present changes before or after applying |
| Dispatching more than 3 domain skills | Context overload, diminishing returns | Cap at 3 domain skills, prioritize by task type |
| Running domain skills on tasks with no domain-specific code | Wasted tokens, noise in the report | Let auto-detection decide; only force with --domain when justified |
| Ignoring domain skill contradictions | Domain expert found a real problem the generalist missed | Always elevate contradictions to findings |

## Related Skills

**Complementary pair:**
- `/create-task` — creates the tasks this skill reviews

**Research (dispatched for knowledge gaps in Phase 3):**
- `/deep-research` — 7-agent research for standard gaps
- `/deeper-research` — 21-agent research for critical gaps

**Domain enrichment (dispatched in Phase 1.5):**
- `/test-strategy` — test type recommendations (unit, rstest, proptest, fuzz, Kani, sim)
- `/unsafe-review` — safety invariant audit, test coverage matrix
- `/safe-over-unsafe` — safe API boundary design for unsafe internals
- `/dist-sys-auditor` — distributed systems citation and invariant verification
- `/sim-review` — deterministic simulation testability compliance
- `/performance-analyzer` — allocation, cache, and CPU hotspot analysis
- `/simd-optimize` — SIMD pattern classification and ISA strategy
- `/asm-forge` — assembly codegen quality audit
- `/interface-design-review` — API misuse-resistance enforcement
- `/security-reviewer` — memory safety and CWE mapping

**Downstream:**
- `/plan-review` — reviews implementation plans (this skill reviews task descriptions)
- `/plan-forge` — creates implementation plans from tasks
- `/review-dispatch` — reviews code after implementation
- `/execute-review-findings` — executes review findings as tasks
