.claude/skills/review-task/SKILL.md
Use when a beads task exists and needs validation before implementation — verifies codebase references, identifies edge cases and design flaws, assesses scope and feasibility, splits oversized tasks, dispatches domain-specific skills (test-strategy, unsafe-review, dist-sys-auditor, simd-optimize, asm-forge, performance-analyzer, security-reviewer, interface-design-review, sim-review, safe-over-unsafe) for specialized enrichment, and dispatches /deep-research or /deeper-research for ambiguous areas. The complement of /create-task — ensures tasks are buttoned up and ready for mechanical implementation.
Audits and enriches beads tasks created by /create-task (or any source) so that
when a developer picks up the work, they can focus purely on implementation. Catches
stale references, missing edge cases, scope bloat, design flaws, and ambiguity —
while changes are still cheap (editing a description vs. reworking code).
Core principle: A task that survives this review should require zero research from the implementing developer. They read, they code, they ship.
Key capability: Domain-specific skills (testing, unsafe, distributed systems, SIMD, performance, security, etc.) are dispatched automatically when the task touches their domain. This adds specialized depth that generalist reviewers cannot provide.
```
/review-task <task-id>
/review-task <task-id> --deep                # Dispatch /deep-research for gaps
/review-task <task-id> --deeper              # Dispatch /deeper-research for gaps
/review-task <task-id> --split-only          # Skip verification, assess scope and split
/review-task <task-id> --dry-run             # Print findings without modifying the task
/review-task <task-id> --skip=feasibility    # Drop a specialist
/review-task <task-id> --focus=distributed   # Adds domain context to all agents
```
Flags:
| Flag | Effect |
|------|--------|
| --deep | Dispatch /deep-research for any identified knowledge gaps |
| --deeper | Dispatch /deeper-research for critical knowledge gaps |
| --split-only | Skip verification phases, assess scope and split if needed |
| --dry-run | Print all findings and proposed edits without modifying the task |
| --skip=<agent> | Drop a specialist (references, edge-cases, feasibility, scope). Min 2 must run. |
| --focus=<domain> | Adds domain context to all agent prompts (e.g., distributed, concurrency, unsafe) |
| --no-research | Never dispatch research skills, even if gaps are found (just flag them) |
| --no-domain | Skip domain skill enrichment (Phase 1.5). Only run generalist verification. |
| --domain=<skill> | Force-dispatch a specific domain skill regardless of auto-detection (e.g., --domain=test-strategy) |
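The `--skip` rule drops specialists but keeps at least 2 running. It can be sketched as a small validation step. This is a minimal illustration that assumes the orchestrator tracks specialists by the short names in the `--skip` row above. `resolve_specialists` and its error messages are hypothetical and not part of the skill.

```python
# Hypothetical sketch: resolving --skip against the specialist names from the
# flags table. The function and messages are illustrative, not the skill's API.

ALL_SPECIALISTS = ("references", "edge-cases", "feasibility", "scope")
MIN_SPECIALISTS = 2


def resolve_specialists(selected: list[str], skips: list[str]) -> list[str]:
    """Remove skipped specialists from `selected`, enforcing the 2-agent minimum."""
    unknown = [name for name in skips if name not in ALL_SPECIALISTS]
    if unknown:
        raise ValueError(f"unknown specialist(s) in --skip: {', '.join(unknown)}")
    remaining = [name for name in selected if name not in skips]
    if len(remaining) < MIN_SPECIALISTS:
        raise ValueError(
            f"--skip leaves {len(remaining)} specialist(s); "
            f"at least {MIN_SPECIALISTS} must run"
        )
    return remaining


print(resolve_specialists(list(ALL_SPECIALISTS), ["scope"]))
# ['references', 'edge-cases', 'feasibility']
```

Under this reading, any `--skip` in fast mode (which runs only 2 specialists) would fall below the minimum. The sketch therefore rejects it rather than silently running a single agent.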
Fetch the task:
```bash
bd show <task-id>
```
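Parse the task description and extract the fields summarized in the Review Brief below (metadata, referenced files, referenced identifiers, scope indicators), including any `[NEEDS ENRICHMENT]` markers from /create-task.

Quick pre-flight checks (fail fast):
- … /create-task first. Stop.

Build a Review Brief: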
```markdown
## Review Brief

### Task
- **ID**: {task-id}
- **Title**: {title}
- **Type**: {type} | **Priority**: P{priority}
- **Labels**: {labels}

### Referenced Files
| File | Lines | Exists? |
|------|-------|---------|
{for each file path in the description, check with Glob/ls}

### Referenced Identifiers
| Name | Kind | Found? | Location |
|------|------|--------|----------|
{for each type/function/trait name, Grep the codebase}

### NEEDS ENRICHMENT Markers
{list any sections marked as needing enrichment}

### Scope Indicators
- Files referenced: {count}
- Modules crossed: {count of distinct top-level crate directories}
- Has acceptance criteria: {yes/no}
- Has code snippets: {yes/no}
- Description length: {line count}

### Domain Signals
{detected domain signals — see Domain Skill Dispatch Table below}
```
Detect domain signals for Phase 1.5 enrichment. Scan the task description AND referenced files for the triggers listed in the Domain Skill Dispatch Table (below). Record which skills should be dispatched.
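The Referenced Files table and the "Modules crossed" indicator can be filled mechanically. The sketch below is a rough approximation that assumes file references appear as repo-relative paths ending in `.rs`, `.toml` or `.md`, optionally followed by `:LINE` or `:START-END`. The regex, `referenced_files` and `modules_crossed` are hypothetical helpers, not part of the skill. Identifier checks still need Grep.

```python
import re
from pathlib import Path

# Assumption: paths appear as repo-relative text ending in .rs/.toml/.md,
# optionally followed by ":LINE" or ":START-END".
PATH_RE = re.compile(r"\b([\w./-]+\.(?:rs|toml|md))\b(?::(\d+(?:-\d+)?))?")


def referenced_files(description: str, repo_root: Path) -> list[tuple[str, str, bool]]:
    """Build (file, lines, exists) rows for the Referenced Files table."""
    rows: list[tuple[str, str, bool]] = []
    seen: set[tuple[str, str]] = set()
    for match in PATH_RE.finditer(description):
        path, lines = match.group(1), match.group(2) or ""
        if (path, lines) in seen:
            continue  # report each path/line pair once
        seen.add((path, lines))
        rows.append((path, lines, (repo_root / path).is_file()))
    return rows


def modules_crossed(paths: list[str]) -> int:
    """Count distinct top-level crate directories among the referenced paths."""
    return len({path.split("/", 1)[0] for path in paths if "/" in path})
```

A row with `exists == False` is exactly the kind of stale reference the Reference Accuracy specialist is expected to chase down.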
If --split-only, skip to Phase 3 (Scope Assessment agent only).
Launch all selected agents in a single message using the Task tool with
subagent_type=general-purpose. Each agent gets the full task description +
Review Brief but a different verification lens.
```markdown
You are a specialist task reviewer. You have ONE job: review the task below
through the lens of {SPECIALTY}. Ignore issues outside your specialty; other
specialists cover those.

## Task Under Review
{FULL_TASK_DESCRIPTION}

## Review Brief
{REVIEW_BRIEF}

## Rules
- Only report findings within your specialty. Do NOT stray.
- Only report findings that REQUIRE action. No nits, no "nice to have."
- Be concrete: cite the specific task section, quoted text, file path, or code
  snippet for every finding.
- Explore the codebase (Glob, Grep, Read) to ground findings in reality. The
  most valuable findings come from gaps between what the task says and what the
  codebase actually contains.
- For each finding, state the PROBLEM and the RECOMMENDED FIX (specific text
  to add, remove, or change in the task description).
- Rate each finding:
  - severity: BLOCKER / MAJOR / MINOR
    - BLOCKER: task cannot be implemented as written (wrong file paths, incorrect
      API, missing critical context)
    - MAJOR: task can be implemented but will likely produce incorrect or
      incomplete results (missing edge case, stale pattern, design flaw)
    - MINOR: task is implementable but could be clearer or more complete
  - confidence (0-100): How sure are you this is a real issue?

## Output Format
Return a markdown document starting with:
`# {SPECIALTY} Review`

Then a findings list. For each finding:

### Finding N: {title}
- **Task section**: {which part of the task description}
- **Problem**: {what's wrong or missing}
- **Evidence**: {codebase evidence: file paths, actual code, grep results}
- **Recommended fix**: {specific edit to the task description}
- **Severity**: {BLOCKER / MAJOR / MINOR}
- **Confidence**: {N}%

End with: "Total findings: N" (0 is valid; do not invent issues).
```
Your specialty: REFERENCE ACCURACY
Focus exclusively on:
- Do all file paths in the task exist? Have any moved or been renamed?
- Do line numbers match? Read cited files and verify the code at those lines
matches what the task quotes.
- Do referenced struct names, trait names, function signatures, enum variants,
and type aliases actually exist in the codebase?
- Are code snippets in the task accurate copies of the current codebase, or
have they drifted?
- Does the task reference design docs? If so, does the design doc content
match what the task claims?
- Are dependency/crate references current (Cargo.toml)?
- Do referenced beads task IDs in the "Related Work" section exist?
Run `bd show <id>` for each.
For every file path: Read the file. For every code snippet: compare character
by character. For every type name: Grep the codebase. Be exhaustive.
Do NOT review edge cases, design quality, or scope — other specialists do that.
Your specialty: EDGE CASES & COMPLETENESS
Focus exclusively on:
- What inputs, states, or conditions does the task not address?
  - Empty/zero/nil inputs
  - Boundary values (max capacity, zero-length, single element)
  - Concurrent access patterns if the code is shared across threads
  - Error paths and failure modes not mentioned
  - Rollback/cleanup on partial failure
- Does the "Desired State" cover ALL cases, or only the happy path?
- Are the acceptance criteria specific enough to verify? Could an
implementer satisfy the criteria while missing the actual intent?
- Does the task account for existing callers/consumers of modified APIs?
(Grep for call sites the task doesn't mention)
- Are there related invariants documented in design docs or code comments
that the task should preserve but doesn't mention?
- Does the implementation guidance miss files that will obviously need
changes? (e.g., mod.rs re-exports, test files, Cargo.toml features)
For each edge case found: describe the scenario, explain what would go wrong,
and propose a specific addition to the task description.
Do NOT review reference accuracy, design alternatives, or scope — other
specialists do that.
Your specialty: FEASIBILITY & DESIGN
Focus exclusively on:
- Is the proposed approach actually feasible given the codebase architecture?
Read the relevant modules and assess whether the task's implementation
guidance is compatible with existing patterns.
- Are there simpler alternatives the task overlooks? Check for existing
utilities in gossip-stdx, sibling modules, or trait implementations that
could be reused (per duplication prevention rules).
- Does the approach introduce unnecessary complexity? (New abstractions,
generics, indirection that aren't justified)
- Are there performance concerns the task should flag?
  - Hot path allocations (check if touched code is in HOT tier)
  - Lock contention or oversized critical sections
  - Unbounded growth patterns
- Does the approach contradict any project conventions?
  - No-versioning policy (no V1/V2, no deprecated, no compatibility shims)
  - Allocation policy tiers (HOT/WARM/COLD)
  - Comment policy (no tracking IDs, no temporal narration)
- Are there design trade-offs the task should document but doesn't?
- Will the approach compose well with in-flight work? Check `bd list --status=in_progress`
for potentially conflicting changes.
Do NOT review reference accuracy, edge case enumeration, or scope — other
specialists do that.
Your specialty: SCOPE ASSESSMENT
Focus exclusively on:
- Is this task appropriately sized for a single implementation session?
  A well-scoped task modifies 1-4 files in 1-2 modules. Flag it if any of these hold:
  - 6+ files modified
  - 3+ modules crossed
  - 3+ distinct behavioral changes
  - Both production code AND test infrastructure changes that could be separate
- Can this task be decomposed into independent sub-tasks that deliver
  incremental value? If so, propose specific splits with:
  - Sub-task title
  - Which files/sections of the original task belong to each
  - Dependency ordering between sub-tasks
  - Whether each sub-task is independently testable
- Does the task mix concerns? Common anti-patterns:
  - Refactor + new feature in one task
  - Bug fix + performance optimization in one task
  - API change + migration of all callers in one task
- Are there prerequisite tasks that should be extracted?
(e.g., "add trait X" before "implement trait X for types A, B, C")
- Is the task UNDER-scoped? Does it describe a change that won't be useful
without follow-up work that isn't tracked?
For each scope finding, provide a concrete split recommendation with titles,
file assignments, and dependency ordering.
Do NOT review reference accuracy, edge cases, or design quality — other
specialists do that.
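The numeric thresholds in the Scope Assessment prompt can be checked mechanically once the specialist has counted files, modules and behavioral changes. The counting itself takes judgment. This is a minimal sketch: `ScopeStats` and `scope_flags` are hypothetical names. A non-empty result only means the task should be examined for a split, not that SPLIT is automatic.

```python
from dataclasses import dataclass


@dataclass
class ScopeStats:
    files_modified: int
    modules_crossed: int
    behavioral_changes: int
    separable_prod_and_test_infra: bool  # could the two halves ship separately?


def scope_flags(s: ScopeStats) -> list[str]:
    """List the Scope Assessment thresholds that this task trips."""
    flags: list[str] = []
    if s.files_modified >= 6:
        flags.append("6+ files modified")
    if s.modules_crossed >= 3:
        flags.append("3+ modules crossed")
    if s.behavioral_changes >= 3:
        flags.append("3+ distinct behavioral changes")
    if s.separable_prod_and_test_infra:
        flags.append("separable production and test-infrastructure changes")
    return flags


print(scope_flags(ScopeStats(7, 3, 1, False)))
# ['6+ files modified', '3+ modules crossed']
```

A 5-file task sits between the "well-scoped" range (1-4) and the 6+ flag. The sketch leaves it unflagged, so the specialist's judgment decides.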
After Phase 1 specialists complete but before synthesis, dispatch domain-specific
skills to provide specialized depth. Skip this phase if --no-domain is set.
The orchestrator detects signals from the task description, referenced files, and Phase 1 findings to determine which domain skills to dispatch. Maximum 3 domain skills per review to keep scope manageable.
| Signal | Skill | What It Adds to the Task |
|--------|-------|--------------------------|
| Task mentions testing strategy, or acceptance criteria lack test type guidance, or task touches code with no test coverage | /test-strategy | Specific test types (unit, rstest, proptest, fuzz, Kani, sim), patterns, and commands |
| Referenced files contain unsafe blocks, or task adds new unsafe code | /unsafe-review | Safety invariant audit, test coverage matrix (Miri/Kani/fuzz/proptest gaps) |
| Task adds unsafe AND needs safe public API wrapping | /safe-over-unsafe | API boundary design, module privacy soundness checklist |
| Referenced files are in gossip-coordination/, gossip-contracts/src/coordination/, or task touches leases/shards/epochs/fencing | /dist-sys-auditor | Citation verification, locked decision compliance, anti-pattern check |
| Task touches coordination AND mentions simulation or fault tolerance | /sim-review | DST compatibility check, sans-IO pattern enforcement |
| Task is labeled performance or touches HOT-tier code paths, or Phase 1 feasibility agent flagged performance concerns | /performance-analyzer | Allocation audit, cache analysis, hot-path verification |
| Task involves SIMD, vectorization, or std::arch intrinsics | /simd-optimize | ISA detection, pattern classification, implementation strategy |
| Task involves assembly-level optimization or codegen quality | /asm-forge | ASM audit scope, codegen red flags to include in task guidance |
| Task modifies public API surface (pub fn, pub struct, pub trait) | /interface-design-review | Misuse-resistance audit, enforcement hierarchy check |
| Task touches parsing, buffer handling, or security-sensitive operations | /security-reviewer | Memory safety audit, CWE mapping, high-risk file identification |
File-based signals: For each referenced file, check which crate and module it belongs to. Map to domain skills:
- `gossip-coordination/` or `gossip-contracts/src/coordination/` → /dist-sys-auditor
- `gossip-stdx/src/` data structures with unsafe → /unsafe-review
- `scanner-engine/src/engine/` hot paths → /performance-analyzer
- `std::arch::` or SIMD intrinsics → /simd-optimize

Content-based signals: Grep the task description for keywords matching the Signal column of the dispatch table, mapping hits to /unsafe-review, /dist-sys-auditor, /simd-optimize, /performance-analyzer, /test-strategy, and /interface-design-review.

Phase 1 escalation: If a Phase 1 specialist flags a domain-specific concern that they cannot fully evaluate (e.g., feasibility agent says "this touches unsafe code but I can't assess soundness"), escalate to the corresponding domain skill.
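The file-based signals amount to a prefix lookup plus two content checks. This is a crude sketch under stated assumptions. "Data structures with unsafe" is approximated by any `unsafe` token in the file, so a comment mentioning unsafe also triggers it. Every file under the engine directory is treated as a hot path. SIMD use is detected only via a literal `std::arch::`. `file_signals` is a hypothetical helper.

```python
import re

UNSAFE_RE = re.compile(r"\bunsafe\b")


def file_signals(path: str, source: str) -> list[str]:
    """Map one referenced file (path + contents) to domain skills."""
    skills: list[str] = []

    def add(skill: str) -> None:
        if skill not in skills:  # keep order, avoid duplicates
            skills.append(skill)

    if path.startswith(("gossip-coordination/", "gossip-contracts/src/coordination/")):
        add("/dist-sys-auditor")
    # Assumption: "data structures with unsafe" ~ the file contains an `unsafe` token.
    if path.startswith("gossip-stdx/src/") and UNSAFE_RE.search(source):
        add("/unsafe-review")
    # Assumption: every file under the engine directory counts as a hot path.
    if path.startswith("scanner-engine/src/engine/"):
        add("/performance-analyzer")
    if "std::arch::" in source:
        add("/simd-optimize")
    return skills
```

Merging the per-file results across all referenced files gives the auto-detected set before the 3-skill cap is applied.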
Forced dispatch: If --domain=<skill> is set, always include that
skill regardless of auto-detection.
Priority when > 3 skills triggered: Rank by relevance to the task type:
- /test-strategy, /unsafe-review, /security-reviewer
- /interface-design-review, /dist-sys-auditor, /test-strategy
- /performance-analyzer, /asm-forge, /simd-optimize
- /unsafe-review, /safe-over-unsafe, /security-reviewer

For each selected domain skill, the orchestrator:
1. Invokes the skill with a scoped prompt. The skill receives:

   ```
   You are being invoked as a domain enrichment step during task review. Your job is NOT to do a full audit. Instead, produce a focused report answering: "What domain-specific information should be added to this task description to make it implementable without further research?"

   Specifically:
   - What domain-specific edge cases or gotchas does the task miss?
   - What domain-specific patterns, utilities, or conventions should it reference?
   - What domain-specific acceptance criteria should be added?
   - What domain-specific risks should be called out?

   Keep output concise: aim for 5-15 specific, actionable items.
   ```

2. Collects the enrichment report and passes it to the Phase 2 synthesizer alongside the Phase 1 specialist reports.
If multiple domain skills are selected, dispatch them in parallel in a single message — they operate on different domains and don't conflict.
If a domain skill invocation fails or times out, proceed without it. Record the failure in the synthesis so the user knows what enrichment was skipped.
After Phase 1 specialists AND Phase 1.5 domain enrichment complete, launch
1 synthesizer agent using the Task tool with subagent_type=general-purpose.
You are the Task Review Synthesizer. Specialist reviewers have independently
audited the same beads task, and domain-specific skills have provided
specialized enrichment. Your job is to merge ALL inputs into one actionable
report and determine whether the task needs revision, research, splitting,
or is ready for implementation.
## Original Task
{FULL_TASK_DESCRIPTION}
## Specialist Reports (Phase 1)
{ALL_SPECIALIST_REPORTS}
## Domain Enrichment Reports (Phase 1.5)
{ALL_DOMAIN_ENRICHMENT_REPORTS}
(or "No domain skills dispatched" if Phase 1.5 was skipped)
## Your Task
### 1. Deduplicate
Multiple specialists may flag the same underlying issue from different angles.
Group these into single findings and note which specialists flagged it.
### 2. Score & Filter
For every unique finding, assess:
- **Severity**: BLOCKER / MAJOR / MINOR
- BLOCKER: Task cannot be correctly implemented as written
- MAJOR: Implementation will likely produce incorrect or incomplete results
- MINOR: Task is implementable but could be clearer
- **Confidence** (0-100): How sure is this a real issue?
Discard any finding with confidence < 50. Every finding in the final report
must require action.
### 3. Integrate Domain Enrichment
For each domain enrichment report, extract actionable items and classify:
- **Add to task**: Domain-specific edge cases, patterns, conventions,
acceptance criteria, or risk callouts that should be folded into the task
description.
- **Contradicts task**: Domain skill found something that conflicts with the
task's proposed approach. Elevate to a MAJOR or BLOCKER finding.
- **Confirms task**: Domain skill validated the approach. Note as supporting
evidence (no action needed).
### 4. Classify Into Verdicts
Based on the surviving findings (from both specialists and domain enrichment),
assign the task ONE overall verdict:
- **READY**: 0 BLOCKERs, 0-2 MAJORs, task is implementable. List MINORs as
optional improvements.
- **REVISE**: 0 BLOCKERs, but 3+ MAJORs or significant gaps, or domain
enrichment produced items that should be folded into the task. Task needs
specific edits before implementation.
- **RESEARCH**: Findings reveal ambiguity or unknowns that cannot be resolved
by reading the codebase alone — external research is needed. Flag specific
questions for `/deep-research` or `/deeper-research`.
- **SPLIT**: Task scope is too large for a single implementation session.
Propose concrete sub-tasks.
- **REWORK**: 2+ BLOCKERs. Task description is fundamentally flawed. Recommend
re-running `/create-task` with corrected information.
A task can receive multiple verdicts (e.g., REVISE + SPLIT).
### 5. Research Questions (if verdict includes RESEARCH)
For each knowledge gap that requires external research:
- **Question**: {specific question that needs answering}
- **Why it matters**: {impact on the implementation}
- **What we know**: {best available evidence from the codebase}
- **What's missing**: {the specific gap}
- **Recommended skill**: `/deep-research` or `/deeper-research`
- **Suggested problem statement**: {ready-to-paste prompt for the research skill}
### 6. Split Recommendations (if verdict includes SPLIT)
For each proposed sub-task:
- **Title**: {imperative statement}
- **Scope**: {which files and sections from the original task}
- **Dependencies**: {which sub-tasks must complete first}
- **Independently testable**: {yes/no}
- **Priority relative to original**: {same / higher / lower}
### 7. Revision Checklist (if verdict includes REVISE)
A numbered list of specific edits to make to the task description. Include
both specialist-sourced revisions and domain enrichment items:
1. {section} — {what to change and why} — source: {specialist / domain skill}
2. ...
Each item must be concrete enough that the edit can be made mechanically.
### Output Format
```markdown
## Task Review Report
**Task**: {id} — {title}
**Verdict**: {READY / REVISE / RESEARCH / SPLIT / REWORK}
**Findings**: {N total — X BLOCKERs, Y MAJORs, Z MINORs}
**Domain skills dispatched**: {list or "none"}
### Findings
| # | Finding | Severity | Confidence | Source |
|---|---------|----------|------------|--------|
**Details:**
#### 1. {Finding title}
- **Problem**: {description}
- **Evidence**: {codebase evidence}
- **Recommended fix**: {specific edit}
- **Source**: {specialist name and/or domain skill name}
### Domain Enrichment Summary
| Domain Skill | Items to Add | Contradictions | Confirmations |
|--------------|-------------|----------------|---------------|
| /test-strategy | 3 | 0 | 1 |
| /unsafe-review | 2 | 1 | 0 |
{Details of each domain enrichment item}
### {Verdict-specific sections as applicable}
```
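The deduplication, confidence filter and verdict rules can be approximated mechanically. The synthesizer is still expected to use judgment, for example in deciding which findings describe the same issue or what counts as "significant gaps". This is a minimal sketch under these assumptions:

- Findings carry a precomputed `issue` key.
- Merged findings keep the most severe rating and the highest confidence.
- A single BLOCKER, which the verdict rules leave unassigned, maps to REVISE.
- REWORK is returned alone because it supersedes the other verdicts.

`Finding`, `merge_and_filter` and `classify_verdicts` are hypothetical names.

```python
from dataclasses import dataclass, field

SEVERITY_RANK = {"MINOR": 0, "MAJOR": 1, "BLOCKER": 2}


@dataclass
class Finding:
    issue: str       # hypothetical key identifying the underlying issue
    severity: str    # BLOCKER / MAJOR / MINOR
    confidence: int  # 0-100
    sources: set[str] = field(default_factory=set)


def merge_and_filter(findings: list[Finding], min_confidence: int = 50) -> list[Finding]:
    """Deduplicate by issue key, union the sources, then drop low-confidence findings."""
    merged: dict[str, Finding] = {}
    for f in findings:
        m = merged.get(f.issue)
        if m is None:
            merged[f.issue] = Finding(f.issue, f.severity, f.confidence, set(f.sources))
            continue
        m.sources |= f.sources
        # Assumption: keep the most severe rating and the highest confidence.
        if SEVERITY_RANK[f.severity] > SEVERITY_RANK[m.severity]:
            m.severity = f.severity
        m.confidence = max(m.confidence, f.confidence)
    return [m for m in merged.values() if m.confidence >= min_confidence]


def classify_verdicts(findings: list[Finding], needs_research: bool,
                      too_large: bool, enrichment_additions: int) -> set[str]:
    """Apply the verdict rules to the surviving findings."""
    blockers = sum(f.severity == "BLOCKER" for f in findings)
    majors = sum(f.severity == "MAJOR" for f in findings)
    if blockers >= 2:
        return {"REWORK"}  # supersedes every other verdict
    verdicts: set[str] = set()
    if needs_research:
        verdicts.add("RESEARCH")
    # Assumption: exactly one BLOCKER is treated as REVISE.
    if blockers == 1 or majors >= 3 or enrichment_additions > 0:
        verdicts.add("REVISE")
    if too_large:
        verdicts.add("SPLIT")
    return verdicts or {"READY"}
```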
---
## Phase 3 — Act on Verdict
Based on the synthesizer's verdict, take the appropriate action. If `--dry-run`
is active, print the proposed actions without executing them.
### Verdict: READY
Present the report to the user. No modifications needed.
```
Task {id} passed review.
Verdict: READY
Findings: {summary, e.g., "2 MINORs (optional improvements)"}

{list MINORs if any, as optional suggestions}
```
### Verdict: REVISE
Apply the revision checklist to the task description:
1. Read the current task description: `bd show <task-id>`
2. Apply each revision from the checklist.
3. If `--dry-run`, print the diff and stop.
4. Update the task: `bd update <task-id> --description="..."`
5. Show the updated task to the user.
```
Task {id} revised.
Verdict: REVISE
Changes applied: {count}
{summary of each change}
```
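For the `--dry-run` step ("print the diff and stop"), a unified diff between the current and revised descriptions is one way to show the edits. This sketch uses Python's standard `difflib`. `description_diff` and the task ID `example-task` are hypothetical, and the sketch never calls `bd update`. It assumes both descriptions end with a newline. Otherwise the last lines of the diff run together.

```python
import difflib


def description_diff(task_id: str, old: str, new: str) -> str:
    """Unified diff between the current and revised task descriptions."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"{task_id} (current)",
            tofile=f"{task_id} (revised)",
        )
    )


before = "## Current State\nSee src/lease.rs:40\n"
after = "## Current State\nSee src/lease.rs:52\n"
print(description_diff("example-task", before, after), end="")
```

An empty string means the revision checklist produced no textual change, which is worth reporting rather than running a no-op update.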
### Verdict: RESEARCH
Dispatch the appropriate research skill for each identified gap:
1. If `--no-research` is set, present the research questions to the user
and stop. The user can manually run `/deep-research` or `/deeper-research`.
2. If `--deep` is set (or inferred from the gap severity):
- Invoke `/deep-research` with the suggested problem statement.
- After research completes, fold key findings into the task description:
add them to the Implementation Guidance, Design Notes, or Risk Analysis
sections as appropriate.
- Update the task: `bd update <task-id> --description="..."`
3. If `--deeper` is set (for critical/highest-stakes gaps):
- Invoke `/deeper-research` with the suggested problem statement.
- Fold findings into the task description as above.
- Update the task.
4. After research is folded in, re-run Phase 1-2 (without the research
dispatch) to verify the enriched task is now READY or REVISE.
### Verdict: SPLIT
Create sub-tasks from the split recommendations:
1. For each proposed sub-task:
- Extract the relevant sections from the original task description.
- Create a new task using `/create-task --quick` with the extracted scope.
Use `--quick` because the context already exists — no need for fresh
research.
- Register dependencies: `bd dep add <child-id> <dependency-id>`
- Set the parent: if the original task is an epic, use `--parent=<original-id>`.
Otherwise, convert the original task to an epic first:
`bd update <original-id> --type=epic`
2. Present the split to the user:
   ```
   Task {id} split into {N} sub-tasks:

   {original-id} (epic): {original title}
   ├── {child-1-id}: {title}  [no dependencies]
   ├── {child-2-id}: {title}  [depends on {child-1-id}]
   └── {child-3-id}: {title}  [depends on {child-1-id}]
   ```
3. If `--dry-run`, print the proposed sub-tasks with descriptions without
creating them.
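Registering dependencies with `bd dep add <child-id> <dependency-id>` presumably needs the dependency's ID to exist already. Creating sub-tasks in topological order guarantees that, and also catches a proposed split whose dependencies form a cycle. This is a minimal sketch using Python's standard `graphlib`. The sub-task keys are hypothetical placeholders for the proposed titles.

```python
from graphlib import TopologicalSorter


def creation_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return sub-task keys so that each one follows everything it depends on.

    `dependencies` maps a sub-task key to the keys it depends on. Raises
    graphlib.CycleError if the proposed split contains a dependency cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


split = {
    "child-1": set(),
    "child-2": {"child-1"},
    "child-3": {"child-1"},
}
print(creation_order(split)[0])  # child-1
```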
### Verdict: REWORK
Do NOT attempt to patch the task. Present the findings and recommend
re-running `/create-task`:
```
Task {id} needs rework.
Verdict: REWORK ({N} BLOCKERs found)

BLOCKERs:

Recommendation: Re-run /create-task with corrected context.
Key corrections:
```
### Compound Verdicts
When multiple verdicts apply (e.g., REVISE + SPLIT):
1. Apply REVISE first (fix the content).
2. Then apply SPLIT (decompose the corrected task).
3. If RESEARCH is part of the compound, research first, then revise, then split.
Order: RESEARCH → REVISE → SPLIT. REWORK supersedes all others.
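The compound-verdict ordering reduces to filtering a fixed sequence. This minimal sketch takes the set of verdicts from synthesis and returns the Phase 3 actions in order. `action_sequence` is a hypothetical name.

```python
PHASE3_ORDER = ("RESEARCH", "REVISE", "SPLIT")


def action_sequence(verdicts: set[str]) -> list[str]:
    """Order Phase 3 actions for a (possibly compound) verdict."""
    if "REWORK" in verdicts:
        return ["REWORK"]  # REWORK supersedes all others
    steps = [v for v in PHASE3_ORDER if v in verdicts]
    return steps or ["READY"]


print(action_sequence({"SPLIT", "REVISE", "RESEARCH"}))
# ['RESEARCH', 'REVISE', 'SPLIT']
```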
---
## Convergence Defaults
Use **fast mode** (2 agents) when all of these are true:
- Task modifies <= 3 files
- Task crosses <= 1 module boundary
- No unsafe code, concurrency, or distributed systems concerns
- Task type is not `epic`
- No `[NEEDS ENRICHMENT]` markers
Fast mode agents: **Reference Accuracy** + **Edge Cases & Completeness**.
Use **standard mode** (4 agents) for everything else.
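The fast-mode conditions are a simple conjunction: fast mode applies only when every condition holds. This is a minimal sketch that assumes the orchestrator has already derived these facts from the Review Brief. `TaskShape` and `convergence_mode` are hypothetical names, and the agent names mirror the `--skip` values.

```python
from dataclasses import dataclass


@dataclass
class TaskShape:
    files_modified: int
    module_boundaries_crossed: int
    has_unsafe_concurrency_or_distributed: bool
    task_type: str
    has_needs_enrichment_markers: bool


FAST_MODE = ["references", "edge-cases"]
STANDARD_MODE = ["references", "edge-cases", "feasibility", "scope"]


def convergence_mode(t: TaskShape) -> list[str]:
    """Return fast-mode agents only when every fast-mode condition holds."""
    fast = (
        t.files_modified <= 3
        and t.module_boundaries_crossed <= 1
        and not t.has_unsafe_concurrency_or_distributed
        and t.task_type != "epic"
        and not t.has_needs_enrichment_markers
    )
    return FAST_MODE if fast else STANDARD_MODE


print(convergence_mode(TaskShape(2, 0, False, "bug", False)))
# ['references', 'edge-cases']
```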
---
## Anti-Patterns
| Anti-Pattern | Why It's Wrong | Do This Instead |
|--------------|----------------|-----------------|
| Approving a task with stale file paths | Implementer wastes time finding moved code | Verify every path with Glob/Read |
| Adding findings without codebase evidence | Speculation wastes revision effort | Grep and Read before claiming a problem |
| Splitting a task that's already well-scoped | Creates overhead without benefit | Only split when scope criteria are exceeded |
| Patching a fundamentally broken task | Lipstick on a pig — BLOCKERs compound | REWORK verdict, re-run /create-task |
| Running /deeper-research for simple gaps | Token-expensive overkill | Use /deep-research for standard gaps, codebase reading for simple ones |
| Skipping review for "obvious" tasks | Obvious tasks have the most hidden assumptions | Review everything; fast mode exists for small tasks |
| Revising without showing the user | User loses visibility into what changed | Always present changes before or after applying |
| Dispatching 5+ domain skills | Context overload, diminishing returns | Cap at 3 domain skills, prioritize by task type |
| Running domain skills on tasks with no domain-specific code | Wasted tokens, noise in the report | Let auto-detection decide; only force with --domain when justified |
| Ignoring domain skill contradictions | Domain expert found a real problem the generalist missed | Always elevate contradictions to findings |
## Related Skills
**Complementary pair:**
- `/create-task` — creates the tasks this skill reviews
**Research (dispatched for knowledge gaps in Phase 3):**
- `/deep-research` — 7-agent research for standard gaps
- `/deeper-research` — 21-agent research for critical gaps
**Domain enrichment (dispatched in Phase 1.5):**
- `/test-strategy` — test type recommendations (unit, rstest, proptest, fuzz, Kani, sim)
- `/unsafe-review` — safety invariant audit, test coverage matrix
- `/safe-over-unsafe` — safe API boundary design for unsafe internals
- `/dist-sys-auditor` — distributed systems citation and invariant verification
- `/sim-review` — deterministic simulation testability compliance
- `/performance-analyzer` — allocation, cache, and CPU hotspot analysis
- `/simd-optimize` — SIMD pattern classification and ISA strategy
- `/asm-forge` — assembly codegen quality audit
- `/interface-design-review` — API misuse-resistance enforcement
- `/security-reviewer` — memory safety and CWE mapping
**Downstream:**
- `/plan-review` — reviews implementation plans (this skill reviews task descriptions)
- `/plan-forge` — creates implementation plans from tasks
- `/review-dispatch` — reviews code after implementation
- `/execute-review-findings` — executes review findings as tasks