.claude/skills/deep-research/SKILL.md
Use when designing safety-critical code, distributed protocols, or novel algorithms where getting the design wrong is expensive. Parallel research agents survey papers, production systems, and prior art, then synthesize into an evidence-backed codebase plan.
A three-phase evidence-gathering workflow for problems where getting the design wrong is expensive: safety-critical code, performance-critical paths, distributed systems protocols, unsafe Rust, concurrency primitives, and novel algorithms.
Research agents independently survey the landscape (papers, production systems, post-mortems, specifications), a synthesizer cross-references and ranks the evidence, and an integrator maps findings to a concrete implementation plan grounded in the codebase.
Use `/design-tournament` instead when the problem is well-understood but
multiple valid approaches exist.

Usage: `/deep-research <problem statement>`
The <problem statement> should describe what you're trying to build or decide,
and why existing knowledge is insufficient. If no argument is given, ask the
user for the problem statement before proceeding.
Before launching Phase 1, the orchestrator (you) performs these steps inline (not a separate agent):
Run `date +%Y-%m-%d` via Bash. Use the returned year for all date-filtered
searches and recency checks. Do NOT assume a year from training data.
Parse the problem statement and identify:
Use Glob, Grep, and Read to gather:
Include a brief summary of this context in every Phase 1 agent's prompt alongside the problem statement.
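The pre-flight steps above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the function names `preflight_values` and `render`, the sample problem statement, and the sample context summary are all hypothetical. The placeholder names `{PROBLEM}`, `{CURRENT_DATE}`, and `{CODEBASE_CONTEXT}` are the ones used by the agent preamble in this document.

```python
from datetime import date
from typing import Dict, Optional


def preflight_values(problem: str, codebase_summary: str,
                     today: Optional[date] = None) -> Dict[str, str]:
    """Gather the values shared by every Phase 1 agent prompt."""
    # Read the clock at run time (the Python counterpart of `date +%Y-%m-%d`)
    # instead of assuming a year.
    today = today or date.today()
    return {
        "PROBLEM": problem.strip(),
        "CURRENT_DATE": today.isoformat(),
        "CODEBASE_CONTEXT": codebase_summary.strip(),
    }


def render(template: str, values: Dict[str, str]) -> str:
    """Substitute {KEY} placeholders and leave every other brace untouched."""
    # str.format would fail on unrelated braces (JSON samples, {N}, ...),
    # so each known key is replaced literally.
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    return template


values = preflight_values(
    "Design a lock-free queue",          # hypothetical problem statement
    "src/queue.rs: mutex-based today",   # hypothetical context summary
    today=date(2026, 4, 8),
)
prompt = render(
    "## Problem\n{PROBLEM}\n## Current Date\n{CURRENT_DATE}\n{N}", values
)
```

Literal replacement is used because the preamble contains braces that are not placeholders for the orchestrator to fill, such as `{N}` in the findings format.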
Agents should weight evidence by domain credibility alongside evidence strength. Higher-credibility domains require less corroboration; lower ones need more.
| Tier (score) | Trust policy | Domains | Treatment |
|--------------|--------------|---------|-----------|
| High (80-100) | Auto-trust for factual claims | arxiv.org, usenix.org, dl.acm.org, ieee.org, nature.com, official project docs, RFC specs | Core evidence; cite directly |
| Moderate (60-79) | Trust with corroboration | Engineering blogs (Cloudflare, AWS, Datadog, Discord), conference talks, arstechnica | Good lead; corroborate with another source |
| Low (40-59) | Leads only | Medium posts, personal blogs, Stack Overflow, forum discussions, unknown domains | Use only when nothing stronger exists; flag as WEAK |
| Suspect (<40) | Verify before citing | Content farms, SEO-optimized listicles, anonymous posts, sensational patterns | Do NOT cite unless verified against primary source |
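As a rough sketch, the score bands in the table can be expressed as a lookup. How a domain is assigned its numeric score is not specified here, so the function below assumes the score is already known; the name `credibility_tier` is hypothetical.

```python
def credibility_tier(score: int) -> str:
    """Map a 0-100 credibility score to the tier bands from the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 80:
        return "high"      # core evidence; cite directly
    if score >= 60:
        return "moderate"  # corroborate with another source
    if score >= 40:
        return "low"       # leads only; flag as WEAK
    return "suspect"       # do not cite unless verified against a primary source


boundary_tiers = [credibility_tier(s) for s in (100, 80, 79, 60, 59, 40, 39, 0)]
```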
These rules apply to ALL agents across ALL phases. Violation makes findings worthless.
Launch 5 research agents in parallel using the Task tool with
subagent_type=general-purpose. Each agent has a distinct research lens but
receives the same problem statement. All agents explore the codebase for context
AND search the web for external evidence.
All 5 agents MUST be launched in a single message (one message, five Task tool calls) so they run concurrently.
Each agent receives the common preamble below, with {PROBLEM} replaced by
the user's problem statement, {AGENT_ID} set to its label, and
{SPECIALTY} / {FOCUS} set per-agent.
You are Research Agent {AGENT_ID} — a {SPECIALTY} specialist.
## Problem Under Investigation
{PROBLEM}
## Current Date
{CURRENT_DATE}
Use this date for all recency checks and date-filtered searches. Do NOT assume
a year from training data.
## Codebase Context
{CODEBASE_CONTEXT}
## Your Research Mission
You are one of 5 independent research agents. Your job is to gather HARD
EVIDENCE — not opinions — about how this problem has been solved before.
Every claim must have a source. Unsourced claims are worthless.
### Research Process
1. **Understand the codebase context**: Use Glob, Grep, and Read to understand
the relevant parts of the codebase. What exists today? What constraints does
the current architecture impose?
2. **Search for external evidence**: Use WebSearch and WebFetch to find:
- Academic papers and technical reports
- Documentation from production systems that solve similar problems
- RFCs, specifications, and formal descriptions
- Post-mortems and failure analyses
- Conference talks, technical blog posts from credible sources
- Existing open-source implementations
Launch ALL searches in parallel when possible (single message, multiple
tool calls). Follow promising results with targeted WebFetch deep-dives.
3. **Evaluate and document**: For each piece of evidence, record:
- Source (URL, paper title, system name)
- Key finding or technique
- Relevance to our specific problem
- Evidence strength (see scale below)
- Source credibility tier (see below)
### Evidence Strength Scale
| Level | Label | Description | Example |
|-------|-------|-------------|---------|
| 5 | **Proven at scale** | Battle-tested in production systems handling similar workloads | FoundationDB's simulation testing, TigerBeetle's storage engine |
| 4 | **Peer-reviewed** | Published in reputable venue with formal analysis | OSDI/SOSP paper with proofs |
| 3 | **Implemented & tested** | Open-source implementation with benchmarks/tests | Well-maintained crate with >1k stars, comprehensive test suite |
| 2 | **Documented practice** | Technical blog from credible engineering org | Blog post from Cloudflare, Datadog, AWS engineering |
| 1 | **Anecdotal** | Forum discussion, personal blog, Stack Overflow answer | Useful for leads but needs corroboration |
### Source Credibility Tiers
| Tier | Domains | Treatment |
|------|---------|-----------|
| High (80-100) | arxiv.org, usenix.org, dl.acm.org, ieee.org, official docs, RFCs | Core evidence; cite directly |
| Moderate (60-79) | Engineering blogs (Cloudflare, AWS, Datadog), conf talks, arstechnica | Corroborate with another source |
| Low (40-59) | Medium posts, personal blogs, SO, forum discussions | Flag as WEAK; use only if nothing stronger exists |
| Suspect (<40) | Content farms, SEO listicles, anonymous posts | Do NOT cite unless verified against primary source |
### Anti-Hallucination Rules
- NEVER fabricate a citation. If no sources address a question, say so.
- Distinguish FACTS (from sources: "According to [source]...") from SYNTHESIS
(your analysis: "This suggests...").
- No vague attributions: NEVER write "research suggests" or "studies show"
without a specific citation.
- Watch for hallucination patterns: generic academic titles without real URLs,
future publication dates, pre-2015 citations mentioning LLMs/transformers.
- When uncertain whether a source says X, note the uncertainty — do not cite.
### Focus Area
{FOCUS}
### Rules
- EVERY finding must have a concrete source. No source = don't include it.
- Prefer primary sources over secondary summaries.
- If you find contradictory evidence, report BOTH sides with sources.
- Distinguish between "X is theoretically optimal" and "X works in production."
- Note when evidence is from a different domain and may not transfer directly.
- Search for COUNTER-evidence too — what are the failure modes of popular approaches?
- If a search returns no useful results, say so. Do not fabricate references.
- Aim for source diversity: mix academic, production, and practitioner sources.
### Output Format
Return a markdown document starting with:
`# Research Report — Agent {AGENT_ID}: {SPECIALTY}`
Then these sections:
#### 1. Codebase Context
What you found in the current codebase that's relevant. File paths and line
numbers for key structures.
#### 2. Findings
For each piece of evidence (aim for 5-15 findings):
**Finding {N}: {title}**
- **Source**: {URL or citation}
- **Source credibility**: {High/Moderate/Low} — {domain}
- **Evidence strength**: {1-5} — {label}
- **Summary**: {2-4 sentences}
- **Key technique/insight**: {the actionable takeaway}
- **Applicability to our problem**: {high/medium/low} — {why}
- **Caveats**: {limitations, different assumptions, scaling concerns}
#### 3. Structured Evidence Summary
After all findings, provide a machine-readable summary for synthesis:
```json
[
  {
    "id": "F1",
    "claim": "specific claim text",
    "source_url": "https://...",
    "source_title": "...",
    "evidence_strength": 4,
    "credibility_tier": "high",
    "applicability": "high"
  }
]
```
This structured summary prevents synthesis fatigue when merging results from 5 agents. Include it AFTER the narrative findings, not instead of them.
What approaches appear repeatedly across sources? Where do experts agree?
Where do sources contradict each other? What remains unresolved?
Top 3-5 sources the team should read, ranked by relevance.
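The structured evidence summary requested in the output format above lends itself to a mechanical check before synthesis. The sketch below is one possible validation, assuming the JSON block has already been extracted from an agent's report; `load_findings` is a hypothetical helper, and the sample URL and title are placeholders.

```python
import json
from typing import Dict, List

# Keys taken from the sample entry in the output format.
REQUIRED_KEYS = {
    "id", "claim", "source_url", "source_title",
    "evidence_strength", "credibility_tier", "applicability",
}


def load_findings(block: str) -> List[Dict]:
    """Parse one agent's JSON summary and reject incomplete entries."""
    findings = json.loads(block)
    if not isinstance(findings, list):
        raise ValueError("summary must be a JSON array")
    for finding in findings:
        missing = REQUIRED_KEYS - set(finding)
        if missing:
            raise ValueError(f"{finding.get('id', '?')}: missing {sorted(missing)}")
        strength = finding["evidence_strength"]
        # The evidence strength scale runs from 1 (anecdotal) to 5 (proven at scale).
        if not isinstance(strength, int) or not 1 <= strength <= 5:
            raise ValueError(f"{finding['id']}: bad evidence_strength {strength!r}")
    return findings


sample = """
[
  {
    "id": "F1",
    "claim": "specific claim text",
    "source_url": "https://example.org/placeholder",
    "source_title": "placeholder title",
    "evidence_strength": 4,
    "credibility_tier": "high",
    "applicability": "high"
  }
]
"""
parsed = load_findings(sample)
```

Rejecting entries without a `source_url` at this stage mirrors the rule that a finding with no source should not be included.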
---
#### Agent 1 — Foundational Theory
{SPECIALTY}: Foundational Theory & Algorithms
{FOCUS}: Search for the THEORETICAL foundations of this problem:
Start with WebSearch queries like:
Look at:
---
#### Agent 2 — Production Systems
{SPECIALTY}: Production Systems & Battle-Tested Implementations
{FOCUS}: Search for how REAL SYSTEMS in production solve this problem:
Start with WebSearch queries like:
For each system found:
---
#### Agent 3 — Failure Modes & Pitfalls
{SPECIALTY}: Failure Modes, Post-Mortems & Anti-Patterns
{FOCUS}: Search for how this problem GOES WRONG:
Start with WebSearch queries like:
For each failure found:
---
#### Agent 4 — Rust Ecosystem & Implementation Patterns
{SPECIALTY}: Rust Ecosystem & Implementation Patterns
{FOCUS}: Search for how this problem is solved IN RUST specifically:
Start with WebSearch queries like:
For each crate or pattern found:
Also check the Rust standard library and popular foundational crates (crossbeam, tokio, rayon, parking_lot, etc.) for relevant patterns.
---
#### Agent 5 — Industry Practice & Architecture
{SPECIALTY}: Industry Practice & System Architecture
{FOCUS}: Search for how ENGINEERING ORGANIZATIONS approach this problem:
Start with WebSearch queries like:
For each practice found:
---
### Collecting Results
After all 5 agents complete, gather their outputs. If any agent fails or times
out, proceed with the agents that succeeded (minimum 3 required for Phase 2).
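
The collection rule above amounts to a simple gate. A minimal sketch, assuming each agent's result is either its report text or `None` when the agent failed or timed out (that representation and the name `reports_for_synthesis` are hypothetical):

```python
from typing import Dict, List, Optional

MIN_REPORTS = 3  # minimum number of successful agents required for Phase 2


def reports_for_synthesis(results: Dict[str, Optional[str]]) -> List[str]:
    """Return the usable reports, or fail if too few agents succeeded."""
    usable = [report for report in results.values() if report]
    if len(usable) < MIN_REPORTS:
        raise RuntimeError(
            f"only {len(usable)} of {len(results)} agents succeeded; "
            f"need at least {MIN_REPORTS}"
        )
    return usable


usable = reports_for_synthesis({
    "1": "# Research Report: theory",
    "2": None,  # timed out
    "3": "# Research Report: failures",
    "4": "# Research Report: rust",
    "5": "",    # failed with empty output
})
```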
---
## Phase 2 — Synthesize (Single Agent)
Launch **1 synthesis agent** using the Task tool with
`subagent_type=general-purpose`.
### Synthesizer Prompt
You are the Research Synthesizer. Five independent research agents have investigated the same problem from different angles. Your job is to cross-reference their findings into a single, evidence-ranked knowledge base.
{PROBLEM}
{ALL_FIVE_REPORTS}
Create a master list of ALL unique findings across all 5 agents. For findings reported by multiple agents, merge them and note corroboration.
Use the structured evidence summaries (JSON blocks) from each agent to bootstrap the inventory, then enrich from the narrative findings.
For each finding:
Credibility multipliers: High=1.0, Moderate=0.8, Low=0.5, Suspect=0.1.
Review all findings for hallucination indicators:
Flag any suspicious findings with: "HALLUCINATION RISK: {reason}". Downgrade their evidence strength to 0 and exclude from the ranking.
Identify the key design decisions for this problem, then for each decision show where the evidence points:
| Decision | Option A | Option B | Evidence For A | Evidence For B | Verdict |
|----------|----------|----------|----------------|----------------|---------|
The verdict should be: STRONG CONSENSUS, LEAN (direction), CONTESTED, or INSUFFICIENT EVIDENCE.
Rank all discovered techniques/approaches by weighted evidence score:
Score = (evidence_strength × credibility_multiplier × applicability × corroboration_count)
| Rank | Technique | Score | Evidence | Credibility | Applicability | Corroboration | Key Source |
|------|-----------|-------|----------|-------------|---------------|---------------|------------|
From the failure modes research, compile a risk register:
| Risk ID | Risk | Likelihood | Impact | Mitigation | Source |
|---------|------|------------|--------|------------|--------|
For each critical gap, provide:
The 5-10 most important things learned from this research that should directly influence the design. Each must cite at least one source.
Return a markdown document starting with:
# Research Synthesis
Include all sections above, plus a final section:
3-5 bullet points capturing the most critical findings for someone who won't read the full report.
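
The ranking formula in the synthesizer prompt above (evidence strength × credibility multiplier × applicability × corroboration count) can be sketched as follows. The credibility multipliers are the ones listed in the prompt. The numeric applicability weights and the `corroboration_count` field are assumptions made for illustration only, since the prompt does not define them.

```python
from typing import Dict, List

# Multipliers as listed in the synthesizer prompt.
CREDIBILITY_MULTIPLIER = {"high": 1.0, "moderate": 0.8, "low": 0.5, "suspect": 0.1}

# ASSUMPTION: the prompt does not define numeric applicability weights;
# these values are placeholders chosen for illustration.
APPLICABILITY_WEIGHT = {"high": 1.0, "medium": 0.6, "low": 0.3}


def weighted_score(finding: Dict) -> float:
    """Score = strength x credibility x applicability x corroboration."""
    return (
        finding["evidence_strength"]
        * CREDIBILITY_MULTIPLIER[finding["credibility_tier"]]
        * APPLICABILITY_WEIGHT[finding["applicability"]]
        * finding["corroboration_count"]  # hypothetical field: agents reporting it
    )


def rank(findings: List[Dict]) -> List[Dict]:
    """Sort findings by score, highest first."""
    # Findings flagged as hallucination risks are downgraded to strength 0
    # and excluded from the ranking.
    eligible = [f for f in findings if f["evidence_strength"] > 0]
    return sorted(eligible, key=weighted_score, reverse=True)


findings = [
    {"id": "F2", "evidence_strength": 5, "credibility_tier": "moderate",
     "applicability": "medium", "corroboration_count": 1},
    {"id": "F1", "evidence_strength": 4, "credibility_tier": "high",
     "applicability": "high", "corroboration_count": 2},
    {"id": "F3", "evidence_strength": 0, "credibility_tier": "low",
     "applicability": "high", "corroboration_count": 1},  # flagged finding
]
ranked_ids = [f["id"] for f in rank(findings)]
```

Because the factors multiply, a peer-reviewed finding corroborated by two agents can outrank a single higher-strength finding from a moderate-credibility source, which is the behaviour the corroboration term is meant to produce.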
### Gap-Triggered Delta-Queries (Between Phase 2 and Phase 3)
After Phase 2 completes, the orchestrator (you) reviews the Contradictions &
Gaps section. If the synthesis identifies a **critical gap**, meaning a question
that load-bearing design decisions depend on but that has no supporting
evidence, trigger targeted retrieval before proceeding to Phase 3.
**Trigger criteria**: A gap is critical when resolving it could change the
#1 or #2 ranked technique, or when INSUFFICIENT EVIDENCE appears in the
Consensus Matrix for a fundamental design decision.
**Process**:
1. Formulate 2-4 narrow, specific WebSearch queries from the synthesizer's
suggested delta-queries.
2. Launch all queries in parallel (single message).
3. Time-box to 3 minutes total.
4. Append results to the synthesis as a "Supplementary Evidence" section
before passing to Phase 3.
5. Maximum 1 delta-query round per research run.
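
The trigger criteria and process limits above can be sketched as two small checks. This is an illustration under assumptions: the `Decision` record, its `fundamental` flag, and the `could_change_top_two` argument stand in for judgments the orchestrator makes while reading the synthesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decision:
    name: str
    verdict: str       # e.g. "STRONG CONSENSUS", "CONTESTED", "INSUFFICIENT EVIDENCE"
    fundamental: bool  # hypothetical flag set by the orchestrator


MAX_QUERIES = 4  # the process asks for 2-4 narrow queries
MAX_ROUNDS = 1   # at most one delta-query round per research run


def gap_is_critical(decisions: List[Decision], could_change_top_two: bool) -> bool:
    """Apply the two trigger criteria from the section above."""
    unresolved = any(
        d.fundamental and d.verdict == "INSUFFICIENT EVIDENCE" for d in decisions
    )
    return could_change_top_two or unresolved


def select_delta_queries(suggested: List[str], rounds_run: int) -> List[str]:
    """Cap the query list and refuse a second round."""
    if rounds_run >= MAX_ROUNDS:
        return []
    return suggested[:MAX_QUERIES]


decisions = [
    Decision("memory reclamation scheme", "INSUFFICIENT EVIDENCE", fundamental=True),
    Decision("logging format", "INSUFFICIENT EVIDENCE", fundamental=False),
]
triggered = gap_is_critical(decisions, could_change_top_two=False)
queries = select_delta_queries(["q1", "q2", "q3", "q4", "q5"], rounds_run=0)
```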
---
## Phase 3 — Integrate (Single Agent)
Launch **1 integration agent** using the Task tool with
`subagent_type=general-purpose`.
This agent maps the synthesized research to a concrete implementation plan
grounded in the actual codebase.
### Integrator Prompt
You are the Research-to-Plan Integrator. You have a comprehensive research synthesis and access to the codebase. Your job is to produce a concrete, evidence-backed implementation plan.
{PROBLEM}
{SYNTHESIS_REPORT}
Thoroughly explore the codebase to understand:
Map each research finding to specific locations in the codebase:
Before planning, review the synthesis for:
Produce a step-by-step implementation plan where EVERY design decision cites evidence from the synthesis:
For each step:
Step {N}: {title}
Create a traceability matrix:
| Plan Step | Research Finding(s) | Evidence Strength | Credibility | Confidence |
|-----------|--------------------|--------------------|-------------|------------|
Confidence levels:
Any step with LOW or NOVEL confidence gets a mandatory note explaining what additional validation is needed (benchmarks, property tests, fuzzing, formal verification, etc.).
For any CONTESTED decisions from the synthesis, describe:
How to verify the implementation is correct:
Return a markdown document starting with:
# Implementation Plan
Include all sections above, plus:
A numbered bibliography of all sources cited in the plan, with URLs. Each citation in the plan body should reference this list: [1], [2], etc.
---
## Final Output Format
After the integrator completes, present the combined output to the user:
```markdown
## Deep Research Results
### Problem
{one-line restatement}
### Current Date
{YYYY-MM-DD used for all recency checks}
### Executive Summary
{from the synthesizer's executive summary}
### Evidence Highlights
| # | Finding | Evidence Strength | Credibility | Sources | Applicability |
|---|---------|-------------------|-------------|---------|---------------|
{top 10 findings from the synthesis, ranked by weighted score}
### Hallucination Flags
{any findings flagged as HALLUCINATION RISK by the synthesizer, with reasons}
{if none: "No hallucination indicators detected."}
### Implementation Plan
{the integrator's full plan}
### Consensus & Contested Decisions
{consensus matrix from the synthesis}
### Risk Register
{risk register from the synthesis}
### Delta-Query Results
{if gap-triggered delta-queries were run: summarize what was found}
{if not triggered: "No critical gaps required additional retrieval."}
### Full Research (collapsed)
<details><summary>Research Agent 1: Foundational Theory</summary>
{full report}
</details>
<details><summary>Research Agent 2: Production Systems</summary>
{full report}
</details>
<details><summary>Research Agent 3: Failure Modes & Pitfalls</summary>
{full report}
</details>
<details><summary>Research Agent 4: Rust Ecosystem</summary>
{full report}
</details>
<details><summary>Research Agent 5: Industry Practice</summary>
{full report}
</details>
<details><summary>Full Research Synthesis</summary>
{synthesis report}
</details>
### References
{consolidated bibliography from integrator}
```
These standards apply to the output from each phase. The orchestrator (you) verifies them before proceeding to the next phase.
For long research runs, persist intermediate results to survive context compaction.
After Phase 2 completes, write a summary file:
.claude/research-state/{run-id}/
phase1-summary.md — Concatenated Phase 1 agent reports
phase2-synthesis.md — Phase 2 output (+ any delta-query supplements)
phase3-plan.md — Phase 3 implementation plan
Use a stable identifier for `{run-id}`: `{date}-{first-5-words-of-problem-slugified}`.
Example: `2026-04-08-lock-free-queue-design`.
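
One way to derive such an identifier is sketched below. The exact slug rules are not spelled out here, so the choices in this sketch (lowercasing, keeping hyphens inside words, dropping other punctuation) are assumptions, and `run_id` is a hypothetical helper name.

```python
import re
from datetime import date


def run_id(problem: str, today: date) -> str:
    """Build {date}-{first-5-words-of-problem-slugified}."""
    words = problem.lower().split()[:5]       # first five whitespace-separated words
    slug = "-".join(words)
    slug = re.sub(r"[^a-z0-9-]+", "", slug)   # drop punctuation and non-ASCII characters
    slug = re.sub(r"-{2,}", "-", slug).strip("-")  # tidy up leftover hyphen runs
    return f"{today.isoformat()}-{slug}"


short_id = run_id("Lock-free queue design", date(2026, 4, 8))
long_id = run_id("Design a lock-free MPSC queue for the scheduler!", date(2026, 4, 8))
```

With these rules the first call yields `2026-04-08-lock-free-queue-design`, matching the example above, and the second is truncated to `2026-04-08-design-a-lock-free-mpsc-queue`.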
Default: 5 researchers + 1 synthesizer + 1 integrator (7 total agents).
The user can reduce the research agents:
/deep-research 3 agents: <problem>
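
Parsing the optional agent-count prefix might look like the sketch below. The bounds are an assumption, not a rule stated here: the upper bound reflects the five research lenses defined above, and the lower bound reflects the three-report minimum for Phase 2. `parse_invocation` is a hypothetical helper name.

```python
import re
from typing import Tuple

DEFAULT_AGENTS = 5
MIN_AGENTS, MAX_AGENTS = 3, 5  # ASSUMPTION: see the note above

_PREFIX = re.compile(r"^\s*(\d+)\s+agents:\s*(.+)$", re.DOTALL)


def parse_invocation(argument: str) -> Tuple[int, str]:
    """Split '<N> agents: <problem>' into (N, problem); default to 5 agents."""
    match = _PREFIX.match(argument)
    if match is None:
        return DEFAULT_AGENTS, argument.strip()
    count = int(match.group(1))
    if not MIN_AGENTS <= count <= MAX_AGENTS:
        raise ValueError(f"agent count must be {MIN_AGENTS}-{MAX_AGENTS}, got {count}")
    return count, match.group(2).strip()


reduced = parse_invocation("3 agents: design a lock-free queue")
default = parse_invocation("design a lock-free queue")
```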
Enforce these limits:
Use `/design-tournament` if you
want to explore multiple implementation approaches after research.