skills/very-long-text-summarization/SKILL.md
Summarizes very long texts (books, handbooks, biographies, codebases) using hierarchical multi-pass extraction with cheap model armies. Produces structured knowledge maps, not just summaries. Use when processing 50+ page documents, professional handbooks, career biographies, or any text too large for a single context window. Activate on "summarize book", "summarize handbook", "long document", "extract knowledge", "distill text", "professional biography". NOT for short text summarization (<10 pages), real-time chat summarization, or code documentation (use technical-writer).
npx skillsauth add curiositech/windags-skills very-long-text-summarization
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Processes texts too large for a single context window using hierarchical multi-pass extraction with armies of cheap models. Produces structured knowledge maps, indexed summaries, and skill drafts — not just prose compression.
1. Output Mode Selection
If user wants quick understanding for meeting prep:
→ Use SUMMARY mode (2-pass: Haiku army → Sonnet synthesis)
If user needs machine-readable knowledge for downstream processing:
→ Use KNOWLEDGE-MAP mode (2-pass with full structured extraction)
If user wants to convert handbook expertise into Claude skill:
→ Use SKILL-DRAFT mode (3-pass: add Opus refinement)
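The mode selection above can be sketched as a small lookup. This is a minimal illustration, not the skill's actual implementation: the goal labels (`meeting-prep`, `downstream-processing`, `skill-authoring`) and the pass names are hypothetical and exist only to make the three rules concrete.

```python
from enum import Enum
from typing import Dict, List


class OutputMode(Enum):
    SUMMARY = "summary"
    KNOWLEDGE_MAP = "knowledge-map"
    SKILL_DRAFT = "skill-draft"


# Pass sequence per mode, as described in the rules above.
# The pass names are hypothetical labels, not identifiers from the skill.
PIPELINES: Dict[OutputMode, List[str]] = {
    OutputMode.SUMMARY: ["haiku-extract", "sonnet-synthesize"],
    OutputMode.KNOWLEDGE_MAP: ["haiku-extract", "sonnet-synthesize"],
    OutputMode.SKILL_DRAFT: ["haiku-extract", "sonnet-synthesize", "opus-refine"],
}

# Hypothetical goal labels standing in for the user's stated intent.
GOAL_TO_MODE: Dict[str, OutputMode] = {
    "meeting-prep": OutputMode.SUMMARY,
    "downstream-processing": OutputMode.KNOWLEDGE_MAP,
    "skill-authoring": OutputMode.SKILL_DRAFT,
}


def select_output_mode(goal: str) -> OutputMode:
    """Map a user goal to an output mode; unknown goals raise ValueError."""
    try:
        return GOAL_TO_MODE[goal]
    except KeyError:
        raise ValueError(f"unrecognized goal: {goal!r}") from None
```

For example, `PIPELINES[select_output_mode("skill-authoring")]` yields the three-pass sequence that ends with the Opus refinement.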
2. Document Size Assessment
If < 20 pages (< 40K tokens):
→ Skip this skill, use direct summarization
If 20-200 pages (40K-400K tokens):
→ Standard 3-pass pipeline, 4K chunks with 500-token overlap
If 200+ pages (400K+ tokens):
→ Large document mode: 8K chunks, 1K overlap, parallel batch processing
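As a rough sketch, the size thresholds above translate into a chunking plan like the one below. The 2,000 tokens-per-page ratio is inferred from the "20 pages, 40K tokens" figures. The function and class names are hypothetical, and treating exactly 400K tokens as a large document is an assumption, since the ranges above overlap at 200 pages.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_PAGE = 2_000  # inferred from "20 pages = 40K tokens" above


@dataclass(frozen=True)
class ChunkPlan:
    chunk_tokens: int
    overlap_tokens: int
    parallel_batches: bool


def plan_chunking(pages: int) -> Optional[ChunkPlan]:
    """Return a chunking plan, or None when direct summarization suffices."""
    tokens = pages * TOKENS_PER_PAGE
    if tokens < 40_000:
        return None  # too small for this skill; summarize directly
    if tokens < 400_000:
        return ChunkPlan(chunk_tokens=4_000, overlap_tokens=500, parallel_batches=False)
    # Assumption: the 200-page boundary itself falls into large document mode.
    return ChunkPlan(chunk_tokens=8_000, overlap_tokens=1_000, parallel_batches=True)
```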
3. Quality vs Cost Trade-off
If budget < $0.05:
→ Haiku-only mode: single pass extraction, no synthesis
If budget $0.05-0.20:
→ Standard mode: Haiku extraction → Sonnet synthesis
If budget > $0.20 AND output is skill-draft:
→ Premium mode: Add Opus refinement pass
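The budget tiers can be expressed the same way. In this sketch the pass names are hypothetical, and a budget above $0.20 for any output other than a skill draft falls back to standard mode, which is an assumption because the rules above do not cover that case.

```python
from typing import List


def select_pipeline(budget_usd: float, output_mode: str) -> List[str]:
    """Pick the pass sequence for a budget, following the three tiers above."""
    if budget_usd < 0.05:
        # Haiku-only mode: single extraction pass, no synthesis.
        return ["haiku-extract"]
    if budget_usd > 0.20 and output_mode == "skill-draft":
        # Premium mode: add the Opus refinement pass.
        return ["haiku-extract", "sonnet-synthesize", "opus-refine"]
    # Standard mode. Assumption: also used above $0.20 when the output
    # is not a skill draft.
    return ["haiku-extract", "sonnet-synthesize"]
```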
4. Iteration Threshold
If knowledge map has < 80% concept coverage on first pass:
→ Run second extraction pass with focused prompts on gaps
If extracted processes have < 3 decision points each:
→ Re-extract with decision-focused template
If < 5 failure modes identified in technical text:
→ Run failure-mining pass with anti-pattern detection
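A minimal sketch of these iteration checks follows. How concept coverage is measured is not specified here, so the function simply takes it as a fraction between 0 and 1. The returned pass names are hypothetical labels.

```python
from typing import List


def followup_passes(
    concept_coverage: float,
    decision_points_per_process: List[int],
    failure_modes_found: int,
    is_technical: bool,
) -> List[str]:
    """Return the follow-up passes triggered by the thresholds above."""
    passes: List[str] = []
    if concept_coverage < 0.80:
        passes.append("gap-focused-extraction")
    # Any process with fewer than 3 decision points triggers re-extraction.
    if any(n < 3 for n in decision_points_per_process):
        passes.append("decision-focused-reextraction")
    # The failure-mode threshold applies to technical texts only.
    if is_technical and failure_modes_found < 5:
        passes.append("failure-mining")
    return passes
```

An empty result means the first pass met all three thresholds and no further iteration is triggered.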
Schema Bloat
Chunking Amnesia
Attention Dilution
Traceability Gap
False Convergence
Example 1: Technical Handbook with Cost/Quality Trade-off
Document: "PostgreSQL Administration Handbook" (450 pages, $0.30 budget)
Pass 1 Decision: 450 pages = ~900K tokens = ~225 chunks. At $0.001/chunk = $0.225 for Haiku army. Budget allows standard mode.
Chunking Strategy: Semantic chunking on section headings (##). Average chunk: 4K tokens, 500 overlap.
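The chunk and cost arithmetic in this example can be checked with a short calculation. The sketch below uses hypothetical helper names and the $0.001 per chunk figure quoted above. The 225-chunk estimate corresponds to dividing 900K tokens by 4K with no overlap. If each chunk advances by only 3,500 new tokens because of the 500-token overlap, the count rises somewhat.

```python
def estimate_chunks(total_tokens: int, chunk_tokens: int, overlap_tokens: int = 0) -> int:
    """Number of chunks needed to cover a document with sliding overlap."""
    if total_tokens <= chunk_tokens:
        return 1
    stride = chunk_tokens - overlap_tokens  # new tokens contributed per chunk
    remaining = total_tokens - chunk_tokens
    return 1 + -(-remaining // stride)  # integer ceiling division


def extraction_cost(chunks: int, usd_per_chunk: float = 0.001) -> float:
    """Pass 1 cost at a flat per-chunk price (the figure from the example)."""
    return chunks * usd_per_chunk


no_overlap = estimate_chunks(900_000, 4_000)          # 225 chunks
with_overlap = estimate_chunks(900_000, 4_000, 500)   # 257 chunks
print(no_overlap, round(extraction_cost(no_overlap), 3))
print(with_overlap, round(extraction_cost(with_overlap), 3))
```

Accounting for overlap matters when the budget is tight, since the extra chunks are billed like any others.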
Pass 1 Extraction (parallel, 3 seconds wall time):
Pass 2 Synthesis (Sonnet, $0.08): Merges 225 chunk extractions into knowledge map with:
Quality Check: Coverage analysis shows 15 failure modes identified, 12 processes have ≥3 decision points each. Meets acceptance criteria.
Total Cost: approximately $0.30 ($0.225 extraction + $0.08 synthesis). Output: 47-page knowledge map with full traceability.
Example 2: Dense Legal Text Showing Chunking Failure
Document: "Federal Tax Code Section 1031" (78 pages of dense legal text)
First Attempt - Fixed Chunking: 4K token chunks on hard boundaries.
Failure Detection: Knowledge map shows "exchange property" and "sale property" as unrelated concepts. Cross-reference validation fails.
Fix - Semantic Chunking: Split on subsection boundaries (numbered paragraphs).
Result: Correctly extracts "Like-kind exchange applies EXCEPT when property held primarily for sale" as single integrated rule.
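The fix can be illustrated with a small splitter that cuts only at lettered subsection markers, so a rule and its numbered exception stay in one chunk. This is a sketch under stated assumptions: the marker format (lines starting with `(a)`, `(b)`, and so on) and the sample text are invented for illustration and are not quoted from the statute.

```python
import re
from typing import List

# Assumed marker format: a lettered subsection such as "(a)" at line start.
# Numbered paragraphs like "(1)" are deliberately NOT split points.
SUBSECTION_START = re.compile(r"^\([a-z]\)\s", re.MULTILINE)


def split_on_subsections(text: str) -> List[str]:
    """Split text at lettered subsection boundaries, keeping each one whole."""
    starts = [m.start() for m in SUBSECTION_START.finditer(text)]
    if not starts:
        return [text]
    # Keep any preamble that precedes the first subsection marker.
    bounds = ([0] if starts[0] != 0 else []) + starts + [len(text)]
    chunks = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]


# Toy text, invented for illustration only.
SAMPLE = """(a) General rule
(1) No gain is recognized on a like-kind exchange.
(2) Exception: paragraph (1) does not apply to property held primarily for sale.
(b) Definitions
(1) Toy definition text.
"""

chunks = split_on_subsections(SAMPLE)
```

Here the first chunk contains both the general rule and its exception, which is what lets the extraction pass see them as a single integrated rule.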
Example 3: Codebase Extraction Result
Document: Ruby on Rails codebase (500 files, 50K lines)
Challenge: Code spans multiple files, architectural patterns emerge across modules.
Approach:
Pass 1 Results:
Pass 2 Synthesis:
Output: Architecture knowledge map showing Rails expertise patterns, not code documentation.
Knowledge Map Completeness
Process Extraction Validation
Traceability Verification
Expertise Pattern Quality
Cost/Quality Optimization
Use other skills instead:
For short documents (<10 pages):
For real-time chat summarization:
conversation-synthesizer: this skill is designed for static documents, not live streams
For code documentation generation:
technical-writer: this skill extracts architectural knowledge, not API docs
For research paper analysis:
research-craft: this skill handles single long documents, not multi-document synthesis
For meeting transcripts:
meeting-synthesizer: this skill assumes structured text, not conversational flow
When NOT to iterate further: