skills/jb-docs-scraper/SKILL.md
Use when scraping docs websites into local markdown files, crawling docs, or building AI-readable docs context.
npx skillsauth add bjesuiter/skills jb-docs-scraper
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Scrape any documentation website into local markdown files. Uses crawl4ai for async web crawling.
# Scrape any documentation URL
uv run --with crawl4ai python ./references/scrape_docs.py <URL>
# Examples
uv run --with crawl4ai python ./references/scrape_docs.py https://mediasoup.org/documentation/v3/
uv run --with crawl4ai python ./references/scrape_docs.py https://docs.rombo.co/tailwind
Output goes to ./docs/<auto-detected-name>/ by default.
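The script's actual name-detection logic is not shown here. As a rough sketch, assuming the name is built from the URL's path segments (which would reproduce the documented `documentation_v3` result for the mediasoup URL), it could look like this. The helper name `guess_output_name` and the hostname fallback are hypothetical.

```python
from urllib.parse import urlparse


def guess_output_name(url: str) -> str:
    """Hypothetical helper: derive an output directory name from a docs URL."""
    parsed = urlparse(url)
    # Keep only non-empty path segments: "/documentation/v3/" -> ["documentation", "v3"].
    segments = [segment for segment in parsed.path.split("/") if segment]
    if segments:
        return "_".join(segments)
    # Assumption: fall back to the hostname when the URL has no path.
    return parsed.netloc.replace(".", "_")


print(guess_output_name("https://mediasoup.org/documentation/v3/"))  # documentation_v3
```

When the detected name is not what you want, passing `--output` sidesteps the guess entirely.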
Install the Playwright browser binaries once before the first crawl:
uv run --with crawl4ai playwright install
uv run --with crawl4ai python ./references/scrape_docs.py <URL> [OPTIONS]
| Option | Description | Default |
|--------|-------------|---------|
| -o, --output PATH | Output directory | ./docs/<auto-detected-name> |
| --max-depth N | Maximum link depth | 6 |
| --max-pages N | Maximum pages to scrape | 500 |
| --url-pattern PATTERN | URL filter (glob) | Auto-detected |
| -q, --quiet | Suppress verbose output | False |
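For readers who want to see how such a table maps onto a command-line parser, here is a minimal `argparse` sketch that mirrors the options and defaults listed above. It is not the code of `scrape_docs.py`. The function name `build_parser` is hypothetical, and using `None` to mean "auto-detect" is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options table; not the real script."""
    parser = argparse.ArgumentParser(prog="scrape_docs.py")
    parser.add_argument("url", help="Documentation URL to start crawling from")
    # Assumption: None stands for "auto-detect" for --output and --url-pattern.
    parser.add_argument("-o", "--output", metavar="PATH", default=None,
                        help="Output directory")
    parser.add_argument("--max-depth", type=int, default=6, metavar="N",
                        help="Maximum link depth")
    parser.add_argument("--max-pages", type=int, default=500, metavar="N",
                        help="Maximum pages to scrape")
    parser.add_argument("--url-pattern", metavar="PATTERN", default=None,
                        help="URL filter (glob)")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Suppress verbose output")
    return parser


args = build_parser().parse_args(["https://example.com/docs/", "--max-pages", "50"])
print(args.max_depth, args.max_pages, args.quiet)  # 6 50 False
```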
# Basic - scrape to ./docs/documentation_v3/
uv run --with crawl4ai python ./references/scrape_docs.py \
https://mediasoup.org/documentation/v3/
# Custom output directory
uv run --with crawl4ai python ./references/scrape_docs.py \
https://docs.rombo.co/tailwind \
--output ./my-tailwind-docs
# Limit crawl scope
uv run --with crawl4ai python ./references/scrape_docs.py \
https://tanstack.com/start/latest/docs/framework/react/overview \
--max-pages 50 \
--max-depth 3
# Custom URL pattern filter
uv run --with crawl4ai python ./references/scrape_docs.py \
https://example.com/docs/api/v2/ \
--url-pattern "*api/v2/*"
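To preview which URLs a glob such as `*api/v2/*` would keep, you can test it offline with the standard library. This sketch assumes the script's glob behaves like Python's `fnmatch` rules, which is not confirmed here. The helper `filter_urls` and the sample URLs are made up for illustration.

```python
from fnmatch import fnmatchcase


def filter_urls(urls: list[str], pattern: str) -> list[str]:
    """Hypothetical helper: keep URLs that match a shell-style glob pattern."""
    # fnmatchcase is case-sensitive and behaves the same on every platform.
    return [url for url in urls if fnmatchcase(url, pattern)]


candidates = [
    "https://example.com/docs/api/v2/",
    "https://example.com/docs/api/v2/users",
    "https://example.com/docs/api/v1/users",
    "https://example.com/blog/post",
]

for url in filter_urls(candidates, "*api/v2/*"):
    print(url)
```

Only the two `api/v2` URLs are printed. A pattern without the leading `*` would match nothing here, because the glob has to cover the whole URL, scheme and host included.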
docs/<name>/
  index.md              # Root page
  getting-started.md
  api/
    overview.md
    client.md
  guides/
    installation.md
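The layout above suggests that each page's URL path, taken relative to the start URL, becomes a markdown file path, with the root page saved as `index.md`. A minimal sketch of that mapping follows. It is an assumption about the behavior, not the script's code, and `url_to_markdown_path` is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_to_markdown_path(page_url: str, root_url: str) -> PurePosixPath:
    """Hypothetical mapping from a crawled page URL to a relative .md path."""
    root_path = urlparse(root_url).path.rstrip("/")
    page_path = urlparse(page_url).path.rstrip("/")
    # Strip the crawl root so the remainder is relative to the output directory.
    if page_path.startswith(root_path):
        page_path = page_path[len(root_path):]
    relative = page_path.strip("/")
    # Assumption: the start page itself is written as index.md.
    if not relative:
        return PurePosixPath("index.md")
    return PurePosixPath(relative + ".md")


root = "https://example.com/docs/"
print(url_to_markdown_path("https://example.com/docs/", root))              # index.md
print(url_to_markdown_path("https://example.com/docs/api/overview", root))  # api/overview.md
```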
| Issue | Solution |
|-------|----------|
| Playwright browser binaries are missing | Run: uv run --with crawl4ai playwright install |
| Empty output | Check that the URL pattern matches the actual doc URLs; if the auto-detected pattern does not, set --url-pattern explicitly |
| Missing pages | Increase --max-depth or --max-pages |
| Wrong pages scraped | Use stricter --url-pattern |
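To see why raising `--max-depth` or `--max-pages` recovers missing pages, here is a toy breadth-first crawl over an in-memory link graph. It only illustrates how the two limits interact. The real crawl is handled by crawl4ai, whose depth counting may differ. The `crawl` function and the `links` graph are hypothetical.

```python
from collections import deque


def crawl(links: dict[str, list[str]], start: str,
          max_depth: int, max_pages: int) -> list[str]:
    """Toy breadth-first crawl that stops at a depth limit or a page limit."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])  # Assumption: the start page has depth 0.
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue  # Do not follow links from pages at the depth limit.
        for target in links.get(page, []):
            if target in seen:
                continue
            if len(visited) >= max_pages:
                return visited  # Page budget used up.
            seen.add(target)
            visited.append(target)
            queue.append((target, depth + 1))
    return visited


links = {
    "/": ["/guide", "/api"],
    "/guide": ["/guide/install"],
    "/api": ["/api/client"],
    "/api/client": ["/api/client/options"],
}

print(len(crawl(links, "/", max_depth=1, max_pages=500)))  # 3
print(len(crawl(links, "/", max_depth=3, max_pages=500)))  # 6
print(len(crawl(links, "/", max_depth=6, max_pages=4)))    # 4
```

With a depth limit of 1, the nested pages are never reached. With a page limit of 4, the crawl stops early even though the depth limit would allow more.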
Use --max-pages 10 to verify the configuration before a full crawl.