skills/web-search-tools/nimble-web-expert/references/nimble-crawl/SKILL.md
Reference for nimble crawl command. Load when bulk-crawling many pages asynchronously. Contains: async workflow (create → status → tasks results), all flags, polling guidelines, CRITICAL: use task_id (not crawl_id) for results, crawl vs map comparison.
npx skillsauth add nimbleway/agent-skills nimble-crawl-referenceInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Async bulk crawling — starts a crawl job, returns a crawl_id, then you poll for status and retrieve per-page content via individual task_ids.
Tip: For LLM use, prefer
nimble map+nimble extract --format markdown(faster, cleaner). Use crawl when you need raw HTML archives of many pages at once.
Parameters:
| Parameter | CLI flag | Type | Default | Description |
| ------------------------- | --------------------------- | -------- | --------- | -------------------------------------------- |
| url | --url | string | required | Starting URL |
| limit | --limit | int | 5000 | Max pages (1–10,000) — always set this |
| name | --name | string | — | Label for tracking |
| sitemap | --sitemap | string | include | include | only | skip |
| include_path | --include-path | string[] | — | URL path regex to include (repeatable) |
| exclude_path | --exclude-path | string[] | — | URL path regex to exclude (repeatable) |
| max_discovery_depth | --max-discovery-depth | int | 5 | Max link depth from start (1–20) |
| crawl_entire_domain | --crawl-entire-domain | bool | false | Follow sibling/parent paths |
| allow_subdomains | --allow-subdomains | bool | false | Follow links to subdomains |
| allow_external_links | --allow-external-links | bool | false | Follow links to external domains |
| ignore_query_parameters | --ignore-query-parameters | bool | false | Deduplicate query-param variants |
| callback | --callback | string | — | Webhook URL for real-time page notifications |
| extract_options | --extract-options | object | — | Extraction config applied to each page |
CLI:
nimble crawl run --url "https://docs.example.com" --limit 50 --name "docs-crawl"
Python SDK:
from nimble_python import Nimble
nimble = Nimble(api_key=os.environ["NIMBLE_API_KEY"])
resp = nimble.crawl.run(url="https://docs.example.com", limit=50, name="docs-crawl")
crawl_id = resp.crawl_id
Response fields: crawl_id, status (initial: queued), url, name, crawl_options, created_at
Crawl status values: queued → running → succeeded / failed / canceled
Parameters:
| Parameter | CLI flag | Type | Description |
| --------- | -------- | ------ | --------------------- |
| id | --id | string | crawl_id (required) |
CLI:
nimble crawl status --id "abc-123"
Python SDK:
crawl = nimble.crawl.status(id=crawl_id)
print(crawl.status, crawl.completed, crawl.total)
Response adds: total, pending, completed, failed, tasks[]
| Field | Description |
| ----------------- | ----------------------------------------------------- |
| tasks[].task_id | Use with nimble tasks results to fetch page content |
| tasks[].status | pending / processing / completed / failed |
CRITICAL: Use
tasks[].task_id— NOTcrawl_id— to fetch page content. Usingcrawl_idwith tasks returns 404.
Parameters:
| Parameter | CLI flag | Type | Description |
| --------- | ---------- | ------ | ---------------------------------------------------------------------- |
| status | --status | string | Filter: queued | running | completed | failed | canceled |
| limit | --limit | int | Results per page |
| cursor | --cursor | string | Pagination cursor |
CLI:
nimble crawl list --status running
Python SDK:
result = nimble.crawl.list()
# result.data = list of crawl objects
# result.pagination = { has_next, next_cursor, total }
Parameters:
| Parameter | CLI flag | Type | Description |
| --------- | -------- | ------ | --------------------- |
| id | --id | string | crawl_id (required) |
CLI:
nimble crawl terminate --id "abc-123"
Python SDK:
nimble.crawl.terminate(id=crawl_id)
# Returns: {"status": "canceled"}
CRAWL_ID=$(nimble crawl run --url "https://example.com" --limit 100 --name "my-crawl" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['crawl_id'])")
while true; do
STATUS=$(nimble crawl status --id "$CRAWL_ID" \
| python3 -c "import json,sys; print(json.load(sys.stdin).get('status',''))")
echo "Status: $STATUS"
[ "$STATUS" = "succeeded" ] || [ "$STATUS" = "failed" ] || [ "$STATUS" = "canceled" ] && break
sleep 30
done
| Crawl size | Poll interval | | ------------ | ------------- | | < 50 pages | every 15–30s | | 50–500 pages | every 30–60s | | 500+ pages | every 60–120s |
After crawl succeeded, fetch each completed task — see nimble-tasks reference:
nimble tasks results --task-id "task-456"
| | map | crawl |
| -------- | ------------------------------------------- | ------------------------- |
| Speed | Seconds (sync) | Minutes (async) |
| Output | URL list with metadata | Full page content per URL |
| Best for | Discover URLs, then selectively extract | Archive all pages at once |
| LLM use | ✅ Combine with extract --format markdown | ⚠️ Returns raw HTML |
development
Finds qualified candidates for a role by searching LinkedIn, Indeed, GitHub, and other professional platforms using Nimble Web Search Agents. Accepts a job description, role title, or freeform request and returns a ranked candidate list with profiles, skills, and contact signals. Use this skill when the user wants to find, source, or recruit candidates for a role. Common triggers: "find candidates for", "source engineers in", "who can I hire for", "find me a [role]", "recruiting for", "talent search", "find a [role] in [city]", "build a candidate list", "sourcing for [role]", "who's available for", "find potential hires". Also triggers on a pasted job description followed by a sourcing request. Do NOT use for job market research or salary benchmarking — use market-finder instead. Do NOT use for researching a single known person — use company-deep-dive or meeting-prep instead.
development
Get web data now — fast, incremental, immediately responsive to what the user needs. The only way Claude can access live websites. USE FOR: - Fetching any URL or reading any webpage - Scraping prices, listings, reviews, jobs, stats, docs from any site - Discovering URLs on a site before bulk extraction - Calling public REST/XHR API endpoints - Web search and research (8 focus modes) - Bulk crawling website sections Must be pre-installed and authenticated. Run `nimble --version` to verify. For building reusable extraction workflows to run at scale over time, use nimble-agent-builder instead.
development
A building experience: create, test, validate, refine, and publish extraction workflows based on existing or new Nimble agents. For users who want to invest in a durable, reusable workflow for a specific domain — not get data immediately. Trigger phrases: "set up extraction for X site", "I need to extract from this site regularly", "build an agent for", "create a reusable scraper", "generate a Nimble agent", "refine my agent", "add a field to my agent", or when the user wants to run extraction at scale. For getting data immediately, use nimble-web-expert instead.
tools
SEO intelligence toolkit covering the full lifecycle via live web data: keyword research, rank tracking, site audits, content gap analysis, competitor keyword reverse-engineering, AI visibility across five platforms (ChatGPT, Perplexity, Google AI, Gemini, Grok), and GitHub repo SEO. Crawls real sites and SERPs via Nimble CLI — no fabricated metrics. Triggers: "SEO", "keywords", "rank tracker", "site audit", "content gap", "competitor keywords", "AI visibility", "GitHub SEO", "SERP analysis", "keyword research", "technical SEO", "keyword difficulty", "topic clusters", "ranking delta", "on-page SEO", "AI citation audit". Do NOT use for competitor business signals — use `competitor-intel` instead. Do NOT use for competitor messaging — use `competitor-positioning` instead. Do NOT use for general web scraping — use `nimble-web-expert` instead.