.codex/skills/deep-research/SKILL.md
Deep research before design — 3-5 parallel research agents survey papers, production systems, failure modes, and prior art, then a synthesizer compiles evidence, and an integrator maps findings to a concrete codebase plan with citations
Install with `npx skillsauth add ahrav/gossip-rs deep-research`.
A three-phase evidence-gathering workflow for problems where getting the design wrong is expensive: safety-critical code, performance-critical paths, distributed systems protocols, unsafe Rust, concurrency primitives, and novel algorithms.
Research agents independently survey the landscape (papers, production systems, post-mortems, specifications), a synthesizer cross-references and ranks the evidence, and an integrator maps findings to a concrete implementation plan grounded in the codebase.
Use `/design-tournament` instead when the problem is well-understood but multiple valid approaches exist.

Usage: `/deep-research <problem statement>`

The `<problem statement>` should describe what you're trying to build or decide,
and why existing knowledge is insufficient. If no argument is given, ask the
user for the problem statement before proceeding.
**Phase 1: Research.** Launch 5 research agents in parallel using the Task tool with
`subagent_type=general-purpose`. Each agent has a distinct research lens but
receives the same problem statement. All agents explore the codebase for context
AND search the web for external evidence.
All 5 agents MUST be launched in a single message (one message, five Task tool calls) so they run concurrently.
Each agent receives the common preamble below, with {PROBLEM} replaced by
the user's problem statement, {AGENT_ID} set to its label, and
{SPECIALTY} / {FOCUS} set per-agent.
```
You are Research Agent {AGENT_ID} — a {SPECIALTY} specialist.

## Problem Under Investigation

{PROBLEM}

## Your Research Mission

You are one of 5 independent research agents. Your job is to gather HARD
EVIDENCE — not opinions — about how this problem has been solved before.
Every claim must have a source. Unsourced claims are worthless.

### Research Process

1. **Understand the codebase context**: Use Glob, Grep, and Read to understand
   the relevant parts of the codebase. What exists today? What constraints does
   the current architecture impose?
2. **Search for external evidence**: Use WebSearch and WebFetch to find:
   - Academic papers and technical reports
   - Documentation from production systems that solve similar problems
   - RFCs, specifications, and formal descriptions
   - Post-mortems and failure analyses
   - Conference talks, technical blog posts from credible sources
   - Existing open-source implementations
3. **Evaluate and document**: For each piece of evidence, record:
   - Source (URL, paper title, system name)
   - Key finding or technique
   - Relevance to our specific problem
   - Evidence strength (see scale below)

### Evidence Strength Scale

| Level | Label | Description | Example |
|-------|-------|-------------|---------|
| 5 | **Proven at scale** | Battle-tested in production systems handling similar workloads | FoundationDB's simulation testing, TigerBeetle's storage engine |
| 4 | **Peer-reviewed** | Published in reputable venue with formal analysis | OSDI/SOSP paper with proofs |
| 3 | **Implemented & tested** | Open-source implementation with benchmarks/tests | Well-maintained crate with >1k stars, comprehensive test suite |
| 2 | **Documented practice** | Technical blog from credible engineering org | Blog post from Cloudflare, Datadog, AWS engineering |
| 1 | **Anecdotal** | Forum discussion, personal blog, Stack Overflow answer | Useful for leads but needs corroboration |

### Focus Area

{FOCUS}

### Rules

- EVERY finding must have a concrete source. No source = don't include it.
- Prefer primary sources over secondary summaries.
- If you find contradictory evidence, report BOTH sides with sources.
- Distinguish between "X is theoretically optimal" and "X works in production."
- Note when evidence is from a different domain and may not transfer directly.
- Search for COUNTER-evidence too — what are the failure modes of popular approaches?
- If a search returns no useful results, say so. Do not fabricate references.

### Output Format

Return a markdown document starting with:
`# Research Report — Agent {AGENT_ID}: {SPECIALTY}`

Then these sections:

#### 1. Codebase Context

What you found in the current codebase that's relevant. File paths and line
numbers for key structures.

#### 2. Findings

For each piece of evidence (aim for 5-15 findings):

**Finding {N}: {title}**
- **Source**: {URL or citation}
- **Evidence strength**: {1-5} — {label}
- **Summary**: {2-4 sentences}
- **Key technique/insight**: {the actionable takeaway}
- **Applicability to our problem**: {high/medium/low} — {why}
- **Caveats**: {limitations, different assumptions, scaling concerns}

#### 3. Patterns & Consensus

What approaches appear repeatedly across sources? Where do experts agree?

#### 4. Disagreements & Open Questions

Where do sources contradict each other? What remains unresolved?

#### 5. Recommended Reading

Top 3-5 sources the team should read, ranked by relevance.
```
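Filling the preamble is plain placeholder substitution; the skill does not prescribe code for this step. As a minimal sketch, assuming a hypothetical `render` helper in Rust: only the orchestrator's placeholders (`{PROBLEM}`, `{AGENT_ID}`, `{SPECIALTY}`, `{FOCUS}`) are replaced, while braces such as `{N}` or `{title}` in the output format are copied through unchanged for the research agent to fill in.

```rust
/// Hypothetical helper: replaces `{NAME}` placeholders whose NAME appears in
/// `vars` and copies every other `{...}` span (e.g. `{N}`, `{title}`) verbatim.
fn render(template: &str, vars: &[(&str, &str)]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    // Single left-to-right pass: substituted values are never re-scanned.
    while let Some(start) = rest.find('{') {
        out.push_str(&rest[..start]);
        let candidate = &rest[start..];
        match candidate.find('}') {
            Some(end) => {
                let name = &candidate[1..end];
                match vars.iter().find(|(key, _)| *key == name) {
                    // Known placeholder: substitute its value.
                    Some((_, value)) => out.push_str(value),
                    // Unknown placeholder: keep it, braces included.
                    None => out.push_str(&candidate[..=end]),
                }
                rest = &candidate[end + 1..];
            }
            None => {
                // Unclosed brace: copy the remainder unchanged.
                out.push_str(candidate);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let preamble = "You are Research Agent {AGENT_ID}, a {SPECIALTY} specialist.\n\
                    Problem: {PROBLEM}\n\
                    Report each item as **Finding {N}: {title}**";
    let prompt = render(
        preamble,
        &[
            ("AGENT_ID", "1"),
            ("SPECIALTY", "Foundational Theory & Algorithms"),
            ("PROBLEM", "Design a bounded MPSC queue"),
        ],
    );
    println!("{prompt}");
}
```

Because the pass runs once from left to right, a problem statement that happens to contain the text `{FOCUS}` stays literal instead of being expanded a second time.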
{SPECIALTY}: Foundational Theory & Algorithms
{FOCUS}:
Search for the THEORETICAL foundations of this problem:
- Seminal papers and algorithms (Lamport, Dijkstra, Knuth, etc.)
- Formal correctness proofs or verification approaches
- Complexity bounds — what's provably optimal?
- Mathematical models and invariants
- Type-theoretic or formal methods approaches
Start with WebSearch queries like:
- "{problem keywords} algorithm formal proof"
- "{problem keywords} paper OSDI SOSP VLDB SIGMOD"
- "{problem keywords} correctness verification"
- "{problem keywords} complexity bounds"
Look at:
- arxiv.org, dl.acm.org, usenix.org proceedings
- PhD theses and technical reports
- Textbook chapters (CLRS, TAOCP, etc.)
{SPECIALTY}: Production Systems & Battle-Tested Implementations
{FOCUS}:
Search for how REAL SYSTEMS in production solve this problem:
- Database engines (FoundationDB, TigerBeetle, CockroachDB, SQLite, DuckDB)
- Storage systems (RocksDB, LevelDB, WiscKey)
- Distributed systems (etcd, Raft implementations, Paxos variants)
- High-performance systems (DPDK, SPDK, io_uring users)
- Language runtimes (Go GC, Rust allocators, JVM internals)
- Operating systems (Linux kernel, FreeBSD, Fuchsia)
Start with WebSearch queries like:
- "{problem keywords} implementation production"
- "{system name} {problem keywords} design"
- "{problem keywords} source code github"
- "how does {system} handle {problem}"
For each system found:
- What approach do they use?
- What scale does it operate at?
- What trade-offs did they make and why?
- Link to source code or design docs when available.
{SPECIALTY}: Failure Modes, Post-Mortems & Anti-Patterns
{FOCUS}:
Search for how this problem GOES WRONG:
- Post-mortems from outages caused by similar systems
- CVEs and security advisories in related implementations
- Known anti-patterns and common mistakes
- Performance cliffs and degenerate cases
- Subtle bugs found in production (Jepsen reports, fuzzing results)
- Memory safety issues in similar C/C++/Rust implementations
Start with WebSearch queries like:
- "{problem keywords} bug post-mortem"
- "{problem keywords} vulnerability CVE"
- "{problem keywords} performance regression"
- "{problem keywords} Jepsen analysis"
- "{problem keywords} common mistakes pitfalls"
- "{problem keywords} undefined behavior unsafe"
For each failure found:
- What went wrong?
- Root cause analysis
- How was it detected?
- How was it fixed or mitigated?
- What invariant was violated?
{SPECIALTY}: Rust Ecosystem & Implementation Patterns
{FOCUS}:
Search for how this problem is solved IN RUST specifically:
- Existing crates that address this problem (crates.io, lib.rs)
- Rust-specific patterns (ownership for safety, typestate, const generics)
- Unsafe code patterns and safety proofs in similar Rust projects
- Benchmarks comparing Rust implementations
- Rust RFCs and compiler internals if relevant
Start with WebSearch queries like:
- "{problem keywords} rust crate"
- "{problem keywords} rust implementation"
- "{problem keywords} rust unsafe safe abstraction"
- "{problem keywords} rust performance benchmark"
- "crates.io {problem keywords}"
For each crate or pattern found:
- API design — how is it exposed to users?
- Safety story — how is unsafe (if any) encapsulated?
- Performance characteristics — any benchmarks?
- Maintenance status — actively maintained? Production users?
- Code quality — tests, docs, CI, fuzzing?
Also check the Rust standard library and popular foundational crates
(crossbeam, tokio, rayon, parking_lot, etc.) for relevant patterns.
{SPECIALTY}: Industry Practice & System Architecture
{FOCUS}:
Search for how ENGINEERING ORGANIZATIONS approach this problem:
- Technical blog posts from major engineering orgs (Google, Meta, AWS,
Cloudflare, Datadog, Discord, Figma, Fly.io)
- Conference talks (Strange Loop, RustConf, P99 CONF, QCon)
- Architecture Decision Records (ADRs) in open-source projects
- RFCs and design documents from relevant projects
- Books and practitioner guides
Start with WebSearch queries like:
- "{problem keywords} engineering blog"
- "{problem keywords} architecture design document"
- "{problem keywords} conference talk"
- "{problem keywords} RFC design"
- "{problem keywords} lessons learned"
- "{problem keywords} at scale"
For each practice found:
- What organization or project uses this approach?
- At what scale?
- What alternatives did they evaluate?
- What would they do differently in hindsight?
- Is this approach specific to their constraints or generalizable?
After all 5 agents complete, gather their outputs. If any agent fails or times out, proceed with the agents that succeeded (minimum 3 required for Phase 2).
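The quorum rule above can be sketched as a small check, assuming each agent's outcome arrives as a `Result` carrying either the report text or a failure reason. The `gather_reports` function and its types are hypothetical, and since the skill does not say what to do below quorum, this sketch simply returns an error.

```rust
/// Phase 2 needs at least this many successful research reports.
const MIN_REPORTS: usize = 3;

/// Hypothetical shape of one research agent's outcome:
/// `Ok(report_markdown)` or `Err(failure_reason)`.
type AgentOutcome = Result<String, String>;

/// Keeps the reports from agents that succeeded and applies the quorum rule.
fn gather_reports(outcomes: Vec<AgentOutcome>) -> Result<Vec<String>, String> {
    let total = outcomes.len();
    let mut reports = Vec::new();
    let mut failures = Vec::new();
    for outcome in outcomes {
        match outcome {
            Ok(report) => reports.push(report),
            // Failed or timed-out agents are recorded; the phase proceeds without them.
            Err(reason) => failures.push(reason),
        }
    }
    if reports.len() >= MIN_REPORTS {
        Ok(reports)
    } else {
        Err(format!(
            "only {} of {} research agents succeeded (need {}): {:?}",
            reports.len(),
            total,
            MIN_REPORTS,
            failures
        ))
    }
}

fn main() {
    let outcomes = vec![
        Ok("# Research Report - Agent 1".to_string()),
        Err("timed out".to_string()),
        Ok("# Research Report - Agent 3".to_string()),
        Ok("# Research Report - Agent 4".to_string()),
        Ok("# Research Report - Agent 5".to_string()),
    ];
    match gather_reports(outcomes) {
        Ok(reports) => println!("synthesizing {} reports", reports.len()),
        Err(e) => eprintln!("cannot start Phase 2: {e}"),
    }
}
```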
**Phase 2: Synthesis.** Launch 1 synthesis agent using the Task tool with
`subagent_type=general-purpose`.
```
You are the Research Synthesizer. Five independent research agents have
investigated the same problem from different angles. Your job is to
cross-reference their findings into a single, evidence-ranked knowledge base.

## Original Problem

{PROBLEM}

## Research Reports

{ALL_FIVE_REPORTS}

## Your Task

### 1. Evidence Inventory

Create a master list of ALL unique findings across all 5 agents. For
findings reported by multiple agents, merge them and note corroboration.
For each finding:
- **ID**: F{N}
- **Title**: {descriptive title}
- **Sources**: {all sources citing this finding, with URLs}
- **Corroboration**: {how many agents independently found this}
- **Evidence strength**: {1-5, use the highest-quality source}
- **Applicability**: {high/medium/low for our specific problem}

### 2. Consensus Matrix

Identify the key design decisions for this problem, then for each decision
show where the evidence points:

| Decision | Option A | Option B | Evidence For A | Evidence For B | Verdict |
|----------|----------|----------|----------------|----------------|---------|

The verdict should be: STRONG CONSENSUS, LEAN (direction), CONTESTED, or
INSUFFICIENT EVIDENCE.

### 3. Evidence-Ranked Techniques

Rank all discovered techniques/approaches by weighted evidence score:

Score = (evidence_strength × applicability × corroboration_count)

| Rank | Technique | Score | Evidence | Applicability | Corroboration | Key Source |
|------|-----------|-------|----------|---------------|---------------|------------|

### 4. Risk Register

From the failure modes research, compile a risk register:

| Risk ID | Risk | Likelihood | Impact | Mitigation | Source |
|---------|------|------------|--------|------------|--------|

### 5. Contradictions & Gaps

- Where do sources disagree? What's the strongest evidence on each side?
- What aspects of the problem have NO evidence? Where are we flying blind?
- What evidence exists but doesn't transfer to our specific context?

### 6. Key Insights

The 5-10 most important things learned from this research that should
directly influence the design. Each must cite at least one source.

### Rules

- Do NOT add your own findings — you are synthesizing, not researching.
- If an agent's finding has no source, downgrade it to evidence strength 0
  and flag it as UNVERIFIED.
- Preserve ALL source URLs from the original reports.
- If agents contradict each other, present both sides — do not pick a winner
  unless the evidence clearly favors one side.
- Be explicit about what we DON'T know, not just what we do.

### Output Format

Return a markdown document starting with:
`# Research Synthesis`
Include all sections above, plus a final section:

#### Executive Summary

3-5 bullet points capturing the most critical findings for someone who
won't read the full report.
```
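The synthesizer's score multiplies a 0-5 strength by a categorical applicability, so some numeric mapping is needed. This Rust sketch assumes High/Medium/Low weigh 3/2/1 (an illustrative choice, not defined by the skill) and applies the UNVERIFIED rule by treating a finding with no sources as strength 0. `Finding`, `rank`, and the example URLs are hypothetical.

```rust
/// Applicability labels used in the synthesis.
#[derive(Clone, Copy)]
enum Applicability {
    High,
    Medium,
    Low,
}

impl Applicability {
    /// ASSUMPTION: the skill gives no numeric weights; 3/2/1 is illustrative.
    fn weight(self) -> u32 {
        match self {
            Applicability::High => 3,
            Applicability::Medium => 2,
            Applicability::Low => 1,
        }
    }
}

struct Finding {
    id: &'static str,
    /// Evidence strength on the 1-5 scale from the research preamble.
    strength: u32,
    applicability: Applicability,
    /// How many agents independently reported this finding.
    corroboration: u32,
    sources: Vec<&'static str>,
}

impl Finding {
    /// A finding with no source is UNVERIFIED and counts as strength 0.
    fn effective_strength(&self) -> u32 {
        if self.sources.is_empty() {
            0
        } else {
            self.strength.min(5)
        }
    }

    /// Score = evidence_strength × applicability × corroboration_count
    fn score(&self) -> u32 {
        self.effective_strength() * self.applicability.weight() * self.corroboration
    }
}

/// Returns (id, score) pairs, highest score first. `sort_by` is stable,
/// so tied findings keep their input order.
fn rank(mut findings: Vec<Finding>) -> Vec<(&'static str, u32)> {
    findings.sort_by(|a, b| b.score().cmp(&a.score()));
    findings.iter().map(|f| (f.id, f.score())).collect()
}

fn main() {
    let findings = vec![
        Finding { id: "F1", strength: 4, applicability: Applicability::Medium, corroboration: 3, sources: vec!["https://example.org/a"] },
        Finding { id: "F2", strength: 5, applicability: Applicability::High, corroboration: 2, sources: vec!["https://example.org/b"] },
        Finding { id: "F3", strength: 3, applicability: Applicability::Low, corroboration: 1, sources: vec![] },
    ];
    for (id, score) in rank(findings) {
        println!("{id}: {score}");
    }
}
```

Because the formula is a product, an unsourced finding scores 0 no matter how many agents repeated it, which keeps UNVERIFIED claims at the bottom of the ranking.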
**Phase 3: Integration.** Launch 1 integration agent using the Task tool with
`subagent_type=general-purpose`.
This agent maps the synthesized research to a concrete implementation plan grounded in the actual codebase.
```
You are the Research-to-Plan Integrator. You have a comprehensive research
synthesis and access to the codebase. Your job is to produce a concrete,
evidence-backed implementation plan.

## Original Problem

{PROBLEM}

## Research Synthesis

{SYNTHESIS_REPORT}

## Your Task

### Step 1: Codebase Mapping

Thoroughly explore the codebase to understand:
- Current architecture and module structure (use Glob, Grep, Read)
- Existing patterns and conventions
- What infrastructure already exists that can be leveraged
- What constraints the current architecture imposes
- Dependencies and their versions

Map each research finding to specific locations in the codebase:
- Which files/modules would be affected?
- What existing abstractions can be reused?
- Where do new abstractions need to be introduced?

### Step 2: Implementation Plan

Produce a step-by-step implementation plan where EVERY design decision
cites evidence from the synthesis:

#### Plan Format

For each step:

**Step {N}: {title}**
- **What**: {concrete description — types, signatures, module placement}
- **Why**: {justification citing specific findings by ID: F1, F7, etc.}
- **Evidence**: {the specific technique/paper/system this is based on}
- **Files**: {exact file paths to create or modify}
- **Risks**: {from the risk register, with mitigation}
- **Acceptance criteria**: {how to verify this step is correct}

### Step 3: Evidence Trail

Create a traceability matrix:

| Plan Step | Research Finding(s) | Evidence Strength | Confidence |
|-----------|---------------------|-------------------|------------|

Confidence levels:
- HIGH: Multiple strong sources agree, directly applicable
- MEDIUM: Evidence exists but from different context, or sources disagree
- LOW: Limited evidence, based on extrapolation
- NOVEL: No direct evidence found — this is our own design (flag for extra review)

Any step with LOW or NOVEL confidence gets a mandatory note explaining
what additional validation is needed (benchmarks, property tests, fuzzing,
formal verification, etc.).

### Step 4: Alternative Approaches

For any CONTESTED decisions from the synthesis, describe:
- The alternative approach
- What evidence supports it
- Under what conditions we'd switch to it
- How to structure the code so switching is feasible

### Step 5: Validation Strategy

How to verify the implementation is correct:
- What properties should be tested (unit, property-based, fuzz)?
- What benchmarks should be run?
- What failure modes from the risk register need explicit test cases?
- Are there formal verification opportunities (Kani, Miri)?

### Rules

- Every design decision MUST cite evidence. If there's no evidence for a
  choice, flag it explicitly as NOVEL/UNJUSTIFIED.
- Be concrete: file paths, type signatures, function names. Not hand-waving.
- Respect existing codebase conventions — don't propose patterns alien to
  the project.
- The plan should be implementable by a developer who hasn't read the full
  research — include enough context in each step.
- Include estimated complexity per step (S/M/L) but NOT time estimates.

### Output Format

Return a markdown document starting with:
`# Implementation Plan`
Include all sections above, plus:

#### References

A numbered bibliography of all sources cited in the plan, with URLs.
Each citation in the plan body should reference this list: [1], [2], etc.
```
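The integrator's traceability rules are mechanical enough to check before the plan is shown to the user. A minimal sketch with hypothetical `PlanStep` and `Confidence` types, encoding two rules from the prompt: a step that cites no findings must be flagged NOVEL, and any LOW or NOVEL step must carry a validation note.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Confidence {
    High,
    Medium,
    Low,
    Novel,
}

/// Hypothetical row of the traceability matrix.
struct PlanStep {
    name: &'static str,
    /// Finding IDs from the synthesis, e.g. "F1", "F7".
    findings: Vec<&'static str>,
    confidence: Confidence,
    /// Extra validation required for LOW/NOVEL steps (benchmarks, fuzzing, ...).
    validation_note: Option<&'static str>,
}

/// Names of steps that break the integrator's rules.
fn rule_violations(steps: &[PlanStep]) -> Vec<&'static str> {
    steps
        .iter()
        .filter(|s| {
            // Rule 1: no cited evidence means the step must be flagged NOVEL.
            let uncited_not_flagged =
                s.findings.is_empty() && s.confidence != Confidence::Novel;
            // Rule 2: LOW and NOVEL steps need a validation note.
            let needs_note = matches!(s.confidence, Confidence::Low | Confidence::Novel);
            uncited_not_flagged || (needs_note && s.validation_note.is_none())
        })
        .map(|s| s.name)
        .collect()
}

fn main() {
    let steps = vec![
        PlanStep { name: "Step 1", findings: vec!["F1", "F3"], confidence: Confidence::High, validation_note: None },
        PlanStep { name: "Step 2", findings: vec![], confidence: Confidence::Medium, validation_note: None },
        PlanStep { name: "Step 3", findings: vec!["F4"], confidence: Confidence::Low, validation_note: None },
        PlanStep { name: "Step 4", findings: vec![], confidence: Confidence::Novel, validation_note: Some("property tests + Miri") },
    ];
    for name in rule_violations(&steps) {
        println!("needs attention: {name}");
    }
}
```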
After the integrator completes, present the combined output to the user:
```
## Deep Research Results

### Problem

{one-line restatement}

### Executive Summary

{from the synthesizer's executive summary}

### Evidence Highlights

| # | Finding | Evidence Strength | Sources | Applicability |
|---|---------|-------------------|---------|---------------|
{top 10 findings from the synthesis, ranked by score}

### Implementation Plan

{the integrator's full plan}

### Consensus & Contested Decisions

{consensus matrix from the synthesis}

### Risk Register

{risk register from the synthesis}

### Full Research (collapsed)

<details><summary>Research Agent 1: Foundational Theory</summary>

{full report}

</details>
<details><summary>Research Agent 2: Production Systems</summary>

{full report}

</details>
<details><summary>Research Agent 3: Failure Modes & Pitfalls</summary>

{full report}

</details>
<details><summary>Research Agent 4: Rust Ecosystem</summary>

{full report}

</details>
<details><summary>Research Agent 5: Industry Practice</summary>

{full report}

</details>
<details><summary>Full Research Synthesis</summary>

{synthesis report}

</details>

### References

{consolidated bibliography from integrator}
```
Default: 5 researchers + 1 synthesizer + 1 integrator (7 total agents).
The user can reduce the research agents:
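`/deep-research 3 agents: <problem>`
Enforce these limits:
- Minimum: 3 research agents (Phase 2 requires at least 3 reports).
- Maximum: 5 research agents (one per research lens).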
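One way to honor the override is to parse an optional `<N> agents:` prefix and reject counts outside 3 to 5, the range given in the skill's description. In this sketch the function name, error messages, and the choice to reject rather than clamp are all assumptions.

```rust
const MIN_AGENTS: usize = 3;
const MAX_AGENTS: usize = 5;
const DEFAULT_AGENTS: usize = 5;

/// Parses the text after `/deep-research`, accepting either `<problem>` or
/// `<N> agents: <problem>`. Returns (research agent count, problem statement).
fn parse_args(args: &str) -> Result<(usize, String), String> {
    let args = args.trim();
    if args.is_empty() {
        // The skill says to ask the user for a problem statement in this case.
        return Err("no problem statement given".to_string());
    }
    if let Some((head, problem)) = args.split_once(':') {
        let head = head.trim();
        // Accept "3 agents" (and "1 agent", so it is rejected rather than
        // silently treated as part of the problem statement).
        let count = head
            .strip_suffix("agents")
            .or_else(|| head.strip_suffix("agent"))
            .and_then(|n| n.trim().parse::<usize>().ok());
        if let Some(n) = count {
            if !(MIN_AGENTS..=MAX_AGENTS).contains(&n) {
                return Err(format!(
                    "research agent count must be {}-{}, got {}",
                    MIN_AGENTS, MAX_AGENTS, n
                ));
            }
            let problem = problem.trim();
            if problem.is_empty() {
                return Err("no problem statement given".to_string());
            }
            return Ok((n, problem.to_string()));
        }
    }
    // No count prefix: the whole argument is the problem statement.
    Ok((DEFAULT_AGENTS, args.to_string()))
}

fn main() {
    for input in ["3 agents: choose a WAL format", "choose a WAL format", "7 agents: x"] {
        match parse_args(input) {
            Ok((n, problem)) => println!("{n} researchers -> {problem}"),
            Err(e) => println!("rejected: {e}"),
        }
    }
}
```

Input without a count prefix, including a problem statement that contains its own colon, falls back to the default of 5 research agents.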
Use `/design-tournament` if you
want to explore multiple implementation approaches after research.