skills/agentcpm-report-interleaving-drafting-deepening/SKILL.md
Generate deep research reports by interleaving evidence-based drafting with reasoning-driven deepening. Uses the WARP (Writing As Reasoning Policy) framework from AgentCPM-Report to dynamically evolve outlines during writing instead of rigidly following a static plan. Trigger phrases: "deep research report", "write a comprehensive analysis", "investigate and write up", "research report on", "deep dive report", "analyze this topic thoroughly and produce a report"
npx skillsauth add ndpvt-web/arxiv-claude-skills agentcpm-report-interleaving-drafting-deepening
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to generate deep, insight-rich research reports by alternating between evidence-based drafting (writing grounded in retrieved information) and reasoning-driven deepening (identifying gaps, expanding underdeveloped sections, and evolving the outline). Rather than committing to a fixed outline upfront and filling it in linearly, this approach mirrors how expert human writers work: they start writing, discover what they don't know, research further, restructure, and deepen iteratively. The result is reports with substantially richer insight and more coherent argumentation than plan-then-write methods produce.
Traditional report generation follows a plan-then-write paradigm: first produce a complete outline, then fill each section. This fails for deep research because constructing a good outline itself requires understanding the material — creating a chicken-and-egg problem. WARP breaks this by treating the outline as a living document that co-evolves with the content.
The core insight is that writing reveals what you don't know. When you draft a section on, say, "Performance Characteristics of Column-Store Databases," you discover specific sub-questions (write amplification under concurrent writes, compression ratio vs. query speed trade-offs) that weren't apparent during initial planning. WARP captures this by alternating between two phases:
Evidence-Based Drafting: For each section in the current outline, formulate context-aware queries conditioned on what has already been written, retrieve relevant information, and synthesize it into grounded prose. The key is that queries are informed by the accumulating narrative, not just the section title.
Reasoning-Driven Deepening: After drafting, evaluate the content for logical gaps, superficial arguments, and underdeveloped areas. Decompose weak sections into granular sub-sections, expanding the outline hierarchy. Then trigger a new drafting cycle targeting only the newly created sections. The agent autonomously decides when to stop deepening by evaluating whether additional expansion would yield diminishing returns.
This interleaving is what the AgentCPM-Report paper credits for higher scores on its Insight metric (the ability to surface non-obvious connections and deep analysis) compared to static-outline approaches.
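The alternation between the two phases can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the AgentCPM-Report implementation: `Section`, `undrafted`, `warp_loop`, and the `draft` and `deepen` callables are hypothetical names, and the callables stand in for whatever retrieval and gap analysis is actually available.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Section:
    title: str
    intent: str
    text: str = ""  # empty until the section has been drafted
    children: List["Section"] = field(default_factory=list)


def undrafted(sections: List[Section]) -> List[Section]:
    """Collect every section in the tree that has no prose yet."""
    found: List[Section] = []
    for s in sections:
        if not s.text:
            found.append(s)
        found.extend(undrafted(s.children))
    return found


def warp_loop(
    outline: List[Section],
    draft: Callable[[Section], str],              # evidence-based drafting (hypothetical)
    deepen: Callable[[Section], List[Section]],   # reasoning-driven deepening (hypothetical)
    max_cycles: int = 4,
) -> int:
    """Alternate drafting and deepening; return the number of drafting passes run."""
    cycles = 0
    pending = undrafted(outline)
    while pending:
        for section in pending:
            section.text = draft(section)
        cycles += 1
        if cycles >= max_cycles:
            break  # budget spent: do not open gaps that would stay undrafted
        for section in pending:
            # Only freshly drafted sections are inspected for gaps.
            section.children.extend(deepen(section))
        pending = undrafted(outline)
    return cycles


# Toy usage: one top-level section that yields a single sub-section.
outline = [Section("Core Technical Approach", "Explain the primary mechanism.")]


def toy_draft(s: Section) -> str:
    return f"Draft of {s.title}"


def toy_deepen(s: Section) -> List[Section]:
    if s.title.startswith("Core"):
        return [Section("Replay Costs", "Quantify rebuild time.")]
    return []


cycles = warp_loop(outline, toy_draft, toy_deepen)
print(cycles)  # 2
```

Deepening is skipped on the final permitted pass so that the loop never ends with sub-sections that were added but never written.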
Capture the research question and constraints. Parse the user's request to identify: the core topic, desired depth, target audience, any specific angles or sub-questions, output format preferences, and available sources (codebase, URLs, documents, or web search).
Generate a sparse Level-1 outline. Create an intentionally minimal outline with only 3-5 top-level section titles and one-sentence writing intents for each. Do NOT try to produce a comprehensive outline — leave room for discovery. Example:
## 1. Background and Motivation
Intent: Establish why this topic matters and what problem it addresses.
## 2. Core Technical Approach
Intent: Explain the primary mechanism or architecture.
## 3. Empirical Evidence and Trade-offs
Intent: Present data, benchmarks, and practical considerations.
## 4. Synthesis and Recommendations
Intent: Draw cross-cutting insights and actionable conclusions.
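One way to keep the initial outline honest about being sparse is to check it mechanically before drafting starts. The sketch below is illustrative only: `OutlineEntry` and `check_sparse_outline` are hypothetical names, and the one-sentence test is a crude heuristic (it would misfire on intents containing abbreviations such as "vs.").

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OutlineEntry:
    title: str
    intent: str


def check_sparse_outline(entries: List[OutlineEntry]) -> List[str]:
    """Return a list of problems; an empty list means the outline is sparse enough."""
    problems: List[str] = []
    if not 3 <= len(entries) <= 5:
        problems.append(f"expected 3-5 top-level sections, got {len(entries)}")
    for e in entries:
        body = e.intent.strip()
        # Heuristic: more than one sentence-ending mark suggests several sentences.
        if not body or sum(body.count(p) for p in ".!?") > 1:
            problems.append(f"intent for '{e.title}' should be one sentence")
    return problems


example = [
    OutlineEntry("Background and Motivation",
                 "Establish why this topic matters and what problem it addresses."),
    OutlineEntry("Core Technical Approach",
                 "Explain the primary mechanism or architecture."),
    OutlineEntry("Empirical Evidence and Trade-offs",
                 "Present data, benchmarks, and practical considerations."),
    OutlineEntry("Synthesis and Recommendations",
                 "Draw cross-cutting insights and actionable conclusions."),
]
print(check_sparse_outline(example))  # []
```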
Begin the first Drafting pass. For each Level-1 section, formulate 2-3 specific search queries conditioned on the section intent AND any context already gathered. Retrieve information (via web search, codebase search, file reads, or provided documents). Write a substantive first draft of each section grounded in retrieved evidence, citing sources inline.
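Conditioning queries on accumulated context, rather than on the section title alone, might look like the following sketch. `build_queries` is a hypothetical helper, and `context_terms` is assumed to be a list of key phrases the caller has already pulled from earlier drafted text; how those phrases are extracted is left open.

```python
from typing import List


def build_queries(
    title: str,
    intent: str,
    context_terms: List[str],
    max_queries: int = 3,
) -> List[str]:
    """Combine a baseline query with queries narrowed by prior-draft context."""
    queries = [f"{title}: {intent}"]  # baseline query from the outline alone
    seen = set()
    for term in context_terms:
        key = term.strip().lower()
        if not key or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        if len(queries) >= max_queries:
            break
        queries.append(f"{title} {term.strip()}")
    return queries


queries = build_queries(
    "Performance Characteristics of Column-Store Databases",
    "Present data, benchmarks, and practical considerations.",
    ["write amplification", "Write Amplification", "compression ratio",
     "vectorized execution"],
)
for q in queries:
    print(q)
```

On the very first pass there may be no context yet, in which case only the baseline query comes back and the remaining queries have to be written from the section intent by hand.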
Execute Reasoning-Driven Deepening. Read through the full draft and check each section for logical gaps, superficial arguments, and underdeveloped areas.
For each identified gap, create a new sub-section in the outline (Level-2 or Level-3).
Expand the outline with new sub-sections. Update the working outline to reflect the deeper structure. Mark which new sub-sections need drafting. Typical expansion: a Level-1 section with a gap becomes 2-4 Level-2 sub-sections, each with a specific writing intent.
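A working outline that tracks which sub-sections still need drafting could be kept as a small tree. The sketch below is an assumption about representation, not something the skill prescribes: `Node`, `expand`, `render`, and the `[needs drafting]` marker are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    title: str
    intent: str
    drafted: bool = False
    children: List["Node"] = field(default_factory=list)


def expand(node: Node, gaps: List[Tuple[str, str]]) -> None:
    """Attach one undrafted sub-section per (title, intent) gap."""
    for title, intent in gaps:
        node.children.append(Node(title, intent))


def render(nodes: List[Node], prefix: str = "") -> List[str]:
    """Number the outline hierarchically and flag sections awaiting a draft."""
    lines: List[str] = []
    for i, n in enumerate(nodes, start=1):
        number = f"{prefix}{i}"
        mark = "" if n.drafted else " [needs drafting]"
        lines.append(f"{number}. {n.title}{mark}")
        lines.extend(render(n.children, prefix=f"{number}."))
    return lines


root = Node("Event Sourcing Mechanics", "Explain the primary mechanism.", drafted=True)
expand(root, [
    ("Projection Rebuild Costs at Scale", "Quantify replay and rebuild time."),
    ("Schema Evolution Strategies", "Compare upcasting and lazy migration."),
])
for line in render([root]):
    print(line)
```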
Draft the newly added sub-sections. Formulate new, targeted queries conditioned on the existing draft content (not just section titles). Retrieve additional evidence and write each new sub-section. This is where the deepest insights emerge — queries are now highly specific because they're informed by what was already written.
Repeat deepening (2-4 cycles maximum). After each drafting pass, re-evaluate for gaps. Continue the interleaving loop but apply diminishing-returns logic: stop expanding when new sub-sections would add marginal value, when the outline has reached Level-3 depth across most sections, or when the content adequately addresses the original research question.
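The stopping conditions listed above can be written down as an explicit check. This is a sketch with assumed thresholds: `should_stop` is a hypothetical function, "most sections" is interpreted as more than half, and the judgments behind `new_gaps` and `question_answered` are supplied by the caller rather than computed here.

```python
from typing import List, Tuple


def should_stop(
    cycle: int,
    section_depths: List[int],   # deepest outline level reached per top-level section
    new_gaps: int,               # gaps judged worth a new sub-section this round
    question_answered: bool,     # caller's judgment on the original research question
    max_cycles: int = 4,
    target_depth: int = 3,
) -> Tuple[bool, str]:
    """Return (stop, reason) for the deepening loop."""
    if cycle >= max_cycles:
        return True, "cycle budget reached"
    if question_answered:
        return True, "research question adequately addressed"
    if new_gaps == 0:
        return True, "no gaps worth a new sub-section"
    at_depth = sum(1 for d in section_depths if d >= target_depth)
    if section_depths and at_depth > len(section_depths) / 2:
        return True, "most sections already at target depth"
    return False, "keep deepening"


print(should_stop(1, [1, 2, 1, 1], new_gaps=3, question_answered=False))
print(should_stop(2, [3, 3, 3, 1], new_gaps=2, question_answered=False))
```

Returning the reason alongside the decision makes it easy to record why deepening ended when the outline's evolution is reported to the user.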
Synthesize cross-cutting insights. Review the full expanded draft for themes, patterns, and connections that span multiple sections. Write or rewrite the synthesis/conclusion section to surface these non-obvious insights. This is where the iterative process pays off — connections invisible during initial planning become clear after deep drafting.
Polish and verify coherence. Ensure the final report reads as a unified document, not a collection of independently drafted sections. Check that: the introduction previews what the deep investigation actually found (not what was initially planned), transitions between sections are logical, evidence is consistently cited, and the overall argument flows.
Deliver with a structural summary. Present the final report along with a brief note on how the outline evolved — which sections were added during deepening, what unexpected findings emerged. This transparency helps the user understand the report's depth.
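The note on how the outline evolved can be produced by comparing the initial and final lists of section titles. The helper below is a hypothetical sketch; `outline_evolution_note` and its wording are illustrative, and the titles are borrowed from the event sourcing example in this document.

```python
from typing import List


def outline_evolution_note(initial: List[str], final: List[str]) -> str:
    """Summarize which sections were added between the first and final outline."""
    known = set(initial)
    added = [t for t in final if t not in known]  # preserves final-outline order
    if not added:
        return f"The outline kept its {len(initial)} original sections."
    return (
        f"The outline grew from {len(initial)} to {len(final)} sections. "
        f"Added during deepening: {'; '.join(added)}."
    )


initial = ["Background", "Event Sourcing Mechanics", "CRUD Approach Baseline",
           "Comparative Analysis", "Recommendation"]
final = initial + ["Projection Rebuild Costs at Scale",
                   "Schema Evolution Strategies",
                   "Operational Observability",
                   "Hybrid Architecture: CRUD with Event Audit Trail"]
print(outline_evolution_note(initial, final))
```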
Example 1: Technical Architecture Investigation
User: Write a deep research report on event sourcing vs. traditional CRUD
for our order management system. We process ~50K orders/day.
Approach:
1. Sparse Level-1 outline:
- Background (why this matters for order systems)
- Event Sourcing Mechanics
- CRUD Approach Baseline
- Comparative Analysis
- Recommendation
2. First Drafting pass: Research event sourcing fundamentals,
CRUD patterns in order systems, retrieve benchmark data.
Write initial sections.
3. Deepening round 1 — gaps discovered:
- "Event Sourcing Mechanics" is too shallow on replay/projection cost
- Missing: how event schema evolution works at 50K orders/day
- Missing: operational complexity (debugging, monitoring)
→ Add sub-sections: "Projection Rebuild Costs at Scale",
"Schema Evolution Strategies", "Operational Observability"
4. Second Drafting pass: Targeted queries on projection rebuild
benchmarks, schema versioning patterns (upcasting, lazy migration),
and event store monitoring tools. Write new sub-sections.
5. Deepening round 2 — new insight emerges:
- The operational complexity section reveals that hybrid approaches
(CRUD + event log for audit) may dominate pure event sourcing
for this scale
→ Add sub-section: "Hybrid Architecture: CRUD with Event Audit Trail"
6. Final synthesis surfaces the non-obvious conclusion:
At 50K orders/day, pure event sourcing's replay costs likely
outweigh its benefits unless the team needs full temporal queries.
The hybrid approach captures 80% of the audit/replay value at
20% of the operational cost.
Output: 2500-word report with 5 top-level sections expanded to
12 sub-sections, concrete latency estimates, and a decision matrix.
Example 2: Codebase Investigation Report
User: Investigate why our test suite takes 45 minutes and write
up a report with recommendations.
Approach:
1. Sparse Level-1 outline:
- Current State (test counts, timing breakdown)
- Bottleneck Analysis
- Optimization Opportunities
- Recommended Action Plan
2. First Drafting pass: Search codebase for test configuration,
timing data, CI logs. Identify top-level numbers (test count,
parallelism settings, fixture setup patterns).
3. Deepening round 1 — gaps discovered:
- "Bottleneck Analysis" reveals database fixture setup is 60%
of total time, but WHY is unclear
- Missing: analysis of test isolation strategy (per-test DB
reset vs. transaction rollback)
→ Add: "Database Fixture Teardown Profiling",
"Test Isolation Strategy Analysis"
4. Second Drafting pass: Grep for setUp/tearDown patterns, analyze
fixture factories, check for missing test database pooling.
Write sub-sections with specific file:line references.
5. Deepening round 2:
- Discover that 12 integration tests each spin up a Redis
instance — invisible from top-level timing
→ Add: "Hidden Infrastructure Setup Costs"
6. Synthesis: The 45-minute runtime is 60% DB fixtures (fixable
with transaction rollback), 20% redundant Redis instances
(fixable with shared test container), 20% actual test execution.
Output: Report with specific file references, a prioritized fix
list with estimated impact, and a phased implementation plan.
Example 3: Technology Evaluation
User: Deep dive on whether we should migrate from REST to gRPC
for our internal microservices communication.
Approach:
1. Sparse outline: Background, gRPC Mechanics, Migration Costs,
Performance Comparison, Recommendation.
2. First Drafting pass: Research gRPC streaming, protobuf schema
management, HTTP/2 multiplexing benefits. Write initial sections.
3. Deepening round 1:
- "Migration Costs" is vague — need concrete sub-sections
→ Add: "Client Library Generation Pipeline", "Backward
Compatibility During Rollout", "Observability Tooling Gap"
- Performance section lacks our-context specificity
→ Add: "Latency Profile for <1KB Payloads" (matches our p95)
4. Second Drafting pass: Research protobuf backward compatibility
rules, gRPC-gateway for gradual migration, compare Jaeger/Zipkin
gRPC support vs REST.
5. Deepening round 2:
- Discover that gRPC reflection + server streaming enables a
real-time dashboard pattern impossible with REST polling
→ Add: "Emergent Architectural Possibilities"
6. Synthesis: gRPC wins on latency and type safety, but the
migration's hidden cost is observability tooling. Recommend
gRPC for new services + gRPC-gateway adapter for existing ones.
Output: Structured report with latency benchmarks, migration
checklist, and a "what we'd gain beyond performance" section
that only emerged through iterative deepening.
Do:
Avoid:
AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research — Focus on Section 3 (WARP framework), particularly the five core actions (Initialize, Search, Write, Expand, Terminate) and how Evidence-Based Drafting and Reasoning-Driven Deepening interleave to produce reports that outperform closed-source systems on the Insight metric.