ratacat/documentation-scraper

Name: documentation-scraper
Author: ratacat

skills/documentation-scraper/SKILL.md

npx skillsauth add ratacat/claude-skills documentation-scraper

Documentation Scraper with slurp-ai

Overview

slurp-ai scrapes documentation websites and compiles them into a single markdown file optimized for AI agent context. It is lightweight, fast, and deterministic: it does NOT use AI to scrape; its output is FOR AI consumption.

CRITICAL: Run Outside Sandbox

All commands in this skill MUST be run outside the sandbox. Use dangerouslyDisableSandbox: true for all Bash commands, including:

- which slurp (installation check)
- node analyze-sitemap.js (sitemap analysis)
- slurp (scraping)
- File inspection commands (wc, head, cat, etc.)

The sandbox blocks network access and file operations required for web scraping.

Pre-Flight: Check Installation

Before scraping, verify slurp-ai is installed:

which slurp || echo "NOT INSTALLED"

If not installed, ask the user to run:

npm install -g slurp-ai

Requires: Node.js v20+

Do NOT proceed with scraping until slurp-ai is confirmed installed.
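
The installation check and the Node.js requirement above can be combined into one guard. This is a minimal sketch, not part of the skill: the function names are hypothetical, and it assumes `node --version` prints a string of the form `v20.11.1`.

```shell
# Pre-flight sketch: confirm slurp is on PATH and Node.js is v20 or newer.
# All function names here are hypothetical helpers, not part of slurp-ai.

# Extract the major version from a string such as "v20.11.1".
node_major() {
  local version="${1#v}"   # drop the leading "v"
  echo "${version%%.*}"    # keep everything before the first dot
}

# Succeeds when the given version string meets the v20+ requirement.
node_version_ok() {
  [ "$(node_major "$1")" -ge 20 ]
}

# Print OK and succeed only when both requirements are met.
preflight() {
  if ! command -v slurp >/dev/null 2>&1; then
    echo "NOT INSTALLED: ask the user to run: npm install -g slurp-ai"
    return 1
  fi
  if ! command -v node >/dev/null 2>&1 || ! node_version_ok "$(node --version)"; then
    echo "Node.js v20+ is required"
    return 1
  fi
  echo "OK"
}
```

Calling `preflight || exit 1` at the top of a scraping script stops it before any slurp command runs.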

Commands

| Command | Purpose |
|---------|---------|
| `slurp <url>` | Fetch and compile in one step |
| `slurp fetch <url> [version]` | Download docs to partials only |
| `slurp compile` | Compile partials into single file |
| `slurp read <package> [version]` | Read local documentation |

Output: Creates slurp_compiled/compiled_docs.md from partials in slurp_partials/.

CRITICAL: Analyze Sitemap First

Before running slurp, ALWAYS analyze the sitemap. This reveals the complete site structure and informs your --base-path and --max decisions.

Step 1: Run Sitemap Analysis

Use the included analyze-sitemap.js script:

node analyze-sitemap.js https://docs.example.com

This outputs:

- Total page count (informs --max)
- URLs grouped by section (informs --base-path)
- Suggested slurp commands with appropriate flags
- Sample URLs to understand naming patterns

Step 2: Interpret the Output

Example output:

📊 Total URLs in sitemap: 247

📁 URLs by top-level section:
   /docs                          182 pages
   /api                            45 pages
   /blog                           20 pages

🎯 Suggested --base-path options:
   https://docs.example.com/docs/guides/     (67 pages)
   https://docs.example.com/docs/reference/  (52 pages)
   https://docs.example.com/api/             (45 pages)

💡 Recommended slurp commands:

   # Just "/docs/guides" section (67 pages)
   slurp https://docs.example.com/docs/guides/ --base-path https://docs.example.com/docs/guides/ --max 80
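
To make the "URLs by top-level section" grouping concrete, here is a rough sketch of the same idea using standard Unix tools. It is not the included analyze-sitemap.js script; the function name is hypothetical, and it assumes a plain sitemap whose URLs sit in `<loc>` elements (sitemap index files are not handled).

```shell
# Count sitemap URLs per top-level section.
# Reads sitemap XML on stdin, prints "<count> <section>" sorted by count.
count_sections() {
  grep -o '<loc>[^<]*</loc>' \
    | sed -e 's|<loc>||' -e 's|</loc>||' \
    | sed -E 's|^https?://[^/]+||' \
    | awk -F/ '{ section = ($2 == "" ? "/" : "/" $2); count[section]++ }
               END { for (s in count) print count[s], s }' \
    | sort -rn
}
```

Assuming the site serves its sitemap at /sitemap.xml (common, but not guaranteed), it could be used as `curl -fsSL https://docs.example.com/sitemap.xml | count_sections`.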

Step 3: Choose Scope Based on Analysis

| Sitemap Shows | Action |
|---------------|--------|
| < 50 pages total | Scrape entire site: `slurp <url> --max 60` |
| 50-200 pages | Scope to relevant section with `--base-path` |
| 200+ pages | Must scope down - pick specific subsection |
| No sitemap found | Start with `--max 30`, inspect partials, adjust |
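
The decision table above can be expressed as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, a count of exactly 200 is treated as part of the 50-200 band, and an empty or zero count stands for "no sitemap found".

```shell
# Map a sitemap page count to the scoping action from the table.
# Pass an empty string or 0 when no sitemap was found.
choose_scope() {
  local pages="${1:-0}"
  if [ "$pages" -eq 0 ]; then
    echo "no sitemap: start with --max 30, inspect partials, adjust"
  elif [ "$pages" -lt 50 ]; then
    echo "scrape entire site with --max 60"
  elif [ "$pages" -le 200 ]; then
    echo "scope to relevant section with --base-path"
  else
    echo "must scope down to a specific subsection"
  fi
}
```

For the 247-page example sitemap, `choose_scope 247` reports that the crawl must be scoped down to a subsection.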

Step 4: Frame the Slurp Command

With sitemap data, you can now set accurate parameters:

# From sitemap: /docs/api has 45 pages
slurp https://docs.example.com/docs/api/intro \
  --base-path https://docs.example.com/docs/api/ \
  --max 55

Key insight: the starting URL is where crawling begins, while the base path filters which links get followed. The two can differ, which is useful when the base path itself returns 404.
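
The examples in this skill set --max slightly above the sitemap count (45 pages becomes 55, 67 becomes 80). One way to reproduce that is sketched below. The headroom rule (about 20%, rounded up to a multiple of 5) is an assumption chosen to match those examples, not a rule from slurp-ai, and both function names are hypothetical.

```shell
# Hypothetical helper: page count plus ~20% headroom,
# rounded up to a multiple of 5.
max_with_headroom() {
  local pages="$1"
  local padded=$(( pages * 120 / 100 ))   # integer arithmetic, truncates
  echo $(( (padded + 4) / 5 * 5 ))
}

# Print (not run) a slurp command from a start URL, a base path,
# and the page count reported by the sitemap analysis.
build_slurp_command() {
  local start_url="$1" base_path="$2" pages="$3"
  echo "slurp $start_url --base-path $base_path --max $(max_with_headroom "$pages")"
}
```

Printing the command instead of running it leaves room to review the start URL and base path before the crawl starts.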

Common Scraping Patterns

Library Documentation (versioned)

# Express.js 4.x docs
slurp https://expressjs.com/en/4x/api.html --base-path https://expressjs.com/en/4x/

# React docs (latest)
slurp https://react.dev/learn --base-path https://react.dev/learn

API Reference Only

slurp https://docs.example.com/api/introduction --base-path https://docs.example.com/api/

Full Documentation Site

slurp https://docs.example.com/

CLI Options

| Flag | Default | Purpose |
|------|---------|---------|
| `--max <n>` | 20 | Maximum pages to scrape |
| `--concurrency <n>` | 5 | Parallel page requests |
| `--headless <bool>` | true | Use headless browser |
| `--base-path <url>` | start URL | Filter links to this prefix |
| `--output <dir>` | ./slurp_partials | Output directory for partials |
| `--retry-count <n>` | 3 | Retries for failed requests |
| `--retry-delay <ms>` | 1000 | Delay between retries |
| `--yes` | - | Skip confirmation prompts |

Compile Options

| Flag | Default | Purpose |
|------|---------|---------|
| `--input <dir>` | ./slurp_partials | Input directory |
| `--output <file>` | ./slurp_compiled/compiled_docs.md | Output file |
| `--preserve-metadata` | true | Keep metadata blocks |
| `--remove-navigation` | true | Strip nav elements |
| `--remove-duplicates` | true | Eliminate duplicates |
| `--exclude <json>` | - | JSON array of regex patterns to exclude |

When to Disable Headless Mode

Use --headless false for:

- Static HTML documentation sites
- Faster scraping when JS rendering is not needed

Headless mode is the default (--headless true) and works for most modern doc sites, including SPAs.

Output Structure

slurp_partials/              # Intermediate files
  ├── page1.md
  └── page2.md
slurp_compiled/              # Final output
  └── compiled_docs.md       # Compiled result

Quick Reference

# 1. ALWAYS analyze sitemap first
node analyze-sitemap.js https://docs.example.com

# 2. Scrape with informed parameters (from sitemap analysis)
slurp https://docs.example.com/docs/ --base-path https://docs.example.com/docs/ --max 80

# 3. Skip prompts for automation
slurp https://docs.example.com/ --yes

# 4. Check output
head -n 100 slurp_compiled/compiled_docs.md

Common Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| Wrong `--max` value | Guessing page count | Run analyze-sitemap.js first |
| Too few pages scraped | `--max` limit (default 20) | Set `--max` based on sitemap analysis |
| Missing content | JS not rendering | Ensure `--headless true` (default) |
| Crawl stuck/slow | Rate limiting | Reduce concurrency, e.g. `--concurrency 3` |
| Duplicate sections | Similar content | Use `--remove-duplicates` (default) |
| Wrong pages included | Base path too broad | Use sitemap to find correct `--base-path` |
| Prompts blocking automation | Interactive mode | Add `--yes` flag |

Post-Scrape Usage

The output markdown is designed for AI context injection:

# Check file size (context budget)
wc -c slurp_compiled/compiled_docs.md

# Preview structure
grep "^#" slurp_compiled/compiled_docs.md | head -30

# Use with Claude Code - reference in prompt or via @file
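
The byte count from `wc -c` can be turned into a rough context-budget check. This is a sketch only: the ratio of about 4 bytes per token is a common rule of thumb for English text and an assumption here, and both function names are hypothetical.

```shell
# Rough token estimate from a byte count, assuming about 4 bytes per token.
estimate_tokens() {
  echo $(( $1 / 4 ))
}

# Succeeds when the file's estimated token count fits within the budget.
fits_budget() {
  local file="$1" budget="$2"
  local bytes
  bytes=$(wc -c < "$file")
  [ "$(estimate_tokens "$bytes")" -le "$budget" ]
}
```

For example, `fits_budget slurp_compiled/compiled_docs.md 100000 || echo "too large, narrow --base-path"` flags a compiled file that likely needs a narrower scope.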

When NOT to Use

- API specs in OpenAPI/Swagger: Use dedicated parsers instead
- GitHub READMEs: Fetch directly via raw.githubusercontent.com
- npm package docs: Often better to read source + README
- Frequently updated docs: Consider a caching strategy

Use when needing to scrape documentation websites into markdown for AI context. Triggers on "scrape docs", "download documentation", "get docs for [library]", or creating local copies of online documentation. CRITICAL - always analyze sitemap first before scraping.

Stars: 29
Category: development
Updated: Apr 11, 2026

npx skillsauth add ratacat/claude-skills documentation-scraper

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 11, 2026, 8:36 PM (6.1s, 2 files scanned)

Install manually

# Clone the repo
git clone https://github.com/ratacat/claude-skills.git

# Copy into Claude Code skills folder (global)
cp -r claude-skills/skills/documentation-scraper ~/.claude/skills/

Repository: ratacat/claude-skills (29 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT