Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

scraperapi/scraperapi-crawler

Name: scraperapi-crawler
Author: scraperapi

skills/scraperapi-crawler/SKILL.md

npx skillsauth add scraperapi/scraperapi-skills scraperapi-crawler

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

ScraperAPI Crawler

The Crawler discovers and scrapes linked pages automatically, starting from a seed URL and following links that match your regex pattern. Use it when you need data from many pages of a site but don't have the URLs upfront.

When NOT to use the Crawler

You have a known URL list → use the Async API instead; it's cheaper and more predictable.
You need a single page → use the Standard API (api.scraperapi.com).
The content is behind login → the Crawler only accesses publicly available pages.
Free plan, need depth > 1 → free accounts are capped at max_depth: 1.

Endpoints

| Action | Method | URL | |--------|--------|-----| | Start a crawl | POST | https://crawler.scraperapi.com/job | | Check status | GET | https://crawler.scraperapi.com/job/<jobId> | | Cancel a crawl | DELETE | https://crawler.scraperapi.com/job/<jobId> |

Auth: include api_key in the JSON body (POST) or as a query parameter (GET/DELETE).

Starting a Crawl

import os, requests

API_KEY = os.environ["SCRAPERAPI_API_KEY"]

job = requests.post(
    "https://crawler.scraperapi.com/job",
    json={
        "api_key":            API_KEY,
        "start_url":          "https://example.com/blog/",
        "url_regexp_include": "(?<full_url>https?://example\\.com/blog/.*)",
        "url_regexp_exclude": ".*\\.(pdf|png|jpg|zip)",
        "max_depth":          3,
        "crawl_budget":       500,
        "api_params": {
            "country_code": "us",
            "render":       False,
        },
        "callback": {
            "type": "webhook",
            "url":  "https://yourapp.com/crawler-results"
        },
    }
).json()

job_id = job["jobId"]
print(f"Started: {job_id}")

Required Parameters

| Parameter | Description | |-----------|-------------| | api_key | Your ScraperAPI key (read from env, never hardcode) | | start_url | Seed URL — where the crawl begins | | url_regexp_include | Regex defining which URLs to crawl (see patterns below) | | max_depth or crawl_budget | Must provide at least one; providing both applies whichever limit is hit first |

URL Regex Patterns

url_regexp_include and url_regexp_exclude are standard regexes. Include patterns must use named capture groups: (?<full_url>...) for absolute URLs, (?<relative_url>...) for relative.

# All pages on a domain
(?<full_url>https?://example\.com/.*)

# Only blog posts
(?<full_url>https?://example\.com/blog/.*)

# Only HTML pages — exclude files by requiring no extension
(?<full_url>https?://example\.com/[^.]*$)

# Relative paths (for sites that use relative links)
(?<relative_url>/products/.*)

Exclusion patterns (url_regexp_exclude — plain regex, no capture group needed):

# Skip binary files
.*\.(pdf|png|jpg|jpeg|gif|svg|zip|mp4)

# Skip auth and admin pages
.*(login|logout|signup|admin|auth).*

Start with a narrow url_regexp_include and validate on a small test run before scaling up — a pattern like .* will follow external links and crawl the entire internet.

Depth vs Budget

| Use max_depth when | Use crawl_budget when | |---------------------|------------------------| | Site structure is known and bounded | Site size is unknown | | You want a specific section (e.g., 3 hops from landing page) | You want to cap credit spend | | Small site | Large site with unpredictable link density |

Use both together for safe exploration: max_depth: 3, crawl_budget: 500 stops at whichever limit is hit first.

Depth 0 = start URL only. Depth 1 = start URL + every page it links to. Depth 2 = one level deeper, and so on.

Per-Page Scraping Parameters (`api_params`)

Controls how each discovered page is fetched. All standard ScraperAPI parameters are supported:

{
  "render":       true,
  "premium":      false,
  "ultra_premium": false,
  "country_code": "us",
  "device_type":  "desktop",
  "output_format": "markdown"
}

Add render: true only if the site uses JavaScript to load content — it multiplies the credit cost of every crawled page. Test without it first.

Checking Status

import time, requests

def wait_for_crawl(job_id, api_key, poll_interval=10):
    url = f"https://crawler.scraperapi.com/job/{job_id}"
    while True:
        data = requests.get(url, params={"api_key": api_key}).json()
        status = data.get("status")
        print(f"Status: {status}")
        if status in ("completed", "failed", "cancelled", "delivered"):
            return data
        time.sleep(poll_interval)

Job states: delayed → running → completed / failed / cancelled → in delivery → delivered.

Use webhooks instead of polling for long crawls — they push results as each page is scraped, rather than requiring you to poll for a final result.

Webhooks

When callback is set, ScraperAPI POSTs results to your endpoint as each page is scraped (not just at the end). Each payload contains the crawled URL, HTML/content, credit cost for that request, and the current depth.

A final summary payload is sent when the crawl completes, listing all succeeded/failed URLs and total cost.

{
  "callback": {
    "type": "webhook",
    "url": "https://yourapp.com/crawler-results"
  }
}

The webhook URL must be publicly reachable. ScraperAPI retries delivery up to 3 times; a failed webhook does not stop the crawl itself. Results are also retained for up to 30 days if you need to re-fetch them.

Scheduling

Omit schedule for a one-time crawl. Add it to run on a recurring interval:

{
  "schedule": {
    "name": "weekly-blog-crawl",
    "interval": "daily"
  }
}

Available intervals: once, hourly, daily, weekly, monthly. Scheduling requires a paid plan — free accounts can only run one-time crawls at max_depth: 1.

Credit Cost

Crawler credit cost = sum of all individual page requests in the crawl. Each page costs the same as a direct Standard API call with the same api_params (render, premium, etc.).

Set crawl_budget to cap total credits. The crawl stops gracefully when the budget is reached — it will not exceed it. Failed requests are not charged.

Check the sa-credit-cost header on each webhook payload to see per-page cost; use this to calibrate crawl_budget for future runs.

Error Handling

| Code | Meaning | Action | |------|---------|--------| | 200 | Job created or status returned | Continue | | 400 | Malformed request | Check required fields and regex syntax | | 401 | Invalid API key | Check SCRAPERAPI_API_KEY | | 403 | Credits exhausted or free plan depth limit | Upgrade or reduce max_depth/crawl_budget | | 429 | Too many concurrent jobs | Wait and retry | | 500 | Transient failure | Retry with backoff |

Documentation

Crawler overview
API reference
Dashboard

scraperapi/scraperapi-crawler

skills/scraperapi-crawler/SKILL.md

Product-usage reference for ScraperAPI's Crawler — crawl an entire site or section by following links automatically. Consult when the user needs to extract data from many pages of a site without knowing the URLs upfront. Use when user asks: "crawl an entire website with ScraperAPI", "scrape all pages on a domain", "follow links and scrape each page", "how do I use the ScraperAPI crawler API", "scrape a site map", "extract data from every product page on a site", "ScraperAPI crawler job API". Covers job creation, URL regex patterns, depth vs budget, per-page scraping parameters, status polling, webhooks, scheduling, and credit costs. Also invoke when the user is building a site-wide scraper and asks which ScraperAPI product to use.

2 stars

development

Updated Jun 2, 2026

$ install --global

skillsauth

npx skillsauth add scraperapi/scraperapi-skills scraperapi-crawler

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jun 2, 2026, 4:39 AM115.5s1 file scanned

SKILL.md

name:: scraperapi-crawler
description:: >
Use when user asks:: crawl an entire website with ScraperAPI", "scrape all pages on a domain",
emoji:: 🕷️
homepage:: https://docs.scraperapi.com/crawler

ScraperAPI Crawler

When NOT to use the Crawler

You have a known URL list → use the Async API instead; it's cheaper and more predictable.
You need a single page → use the Standard API (api.scraperapi.com).
The content is behind login → the Crawler only accesses publicly available pages.
Free plan, need depth > 1 → free accounts are capped at max_depth: 1.

Endpoints

Auth: include api_key in the JSON body (POST) or as a query parameter (GET/DELETE).

Starting a Crawl

import os, requests

API_KEY = os.environ["SCRAPERAPI_API_KEY"]

job = requests.post(
    "https://crawler.scraperapi.com/job",
    json={
        "api_key":            API_KEY,
        "start_url":          "https://example.com/blog/",
        "url_regexp_include": "(?<full_url>https?://example\\.com/blog/.*)",
        "url_regexp_exclude": ".*\\.(pdf|png|jpg|zip)",
        "max_depth":          3,
        "crawl_budget":       500,
        "api_params": {
            "country_code": "us",
            "render":       False,
        },
        "callback": {
            "type": "webhook",
            "url":  "https://yourapp.com/crawler-results"
        },
    }
).json()

job_id = job["jobId"]
print(f"Started: {job_id}")

Required Parameters

URL Regex Patterns

url_regexp_include and url_regexp_exclude are standard regexes. Include patterns must use named capture groups: (?<full_url>...) for absolute URLs, (?<relative_url>...) for relative.

# All pages on a domain
(?<full_url>https?://example\.com/.*)

# Only blog posts
(?<full_url>https?://example\.com/blog/.*)

# Only HTML pages — exclude files by requiring no extension
(?<full_url>https?://example\.com/[^.]*$)

# Relative paths (for sites that use relative links)
(?<relative_url>/products/.*)

Exclusion patterns (url_regexp_exclude — plain regex, no capture group needed):

# Skip binary files
.*\.(pdf|png|jpg|jpeg|gif|svg|zip|mp4)

# Skip auth and admin pages
.*(login|logout|signup|admin|auth).*

Start with a narrow url_regexp_include and validate on a small test run before scaling up — a pattern like .* will follow external links and crawl the entire internet.

Depth vs Budget

Use both together for safe exploration: max_depth: 3, crawl_budget: 500 stops at whichever limit is hit first.

Depth 0 = start URL only. Depth 1 = start URL + every page it links to. Depth 2 = one level deeper, and so on.

Per-Page Scraping Parameters (`api_params`)

Controls how each discovered page is fetched. All standard ScraperAPI parameters are supported:

{
  "render":       true,
  "premium":      false,
  "ultra_premium": false,
  "country_code": "us",
  "device_type":  "desktop",
  "output_format": "markdown"
}

Add render: true only if the site uses JavaScript to load content — it multiplies the credit cost of every crawled page. Test without it first.

Checking Status

import time, requests

def wait_for_crawl(job_id, api_key, poll_interval=10):
    url = f"https://crawler.scraperapi.com/job/{job_id}"
    while True:
        data = requests.get(url, params={"api_key": api_key}).json()
        status = data.get("status")
        print(f"Status: {status}")
        if status in ("completed", "failed", "cancelled", "delivered"):
            return data
        time.sleep(poll_interval)

Job states: delayed → running → completed / failed / cancelled → in delivery → delivered.

Use webhooks instead of polling for long crawls — they push results as each page is scraped, rather than requiring you to poll for a final result.

Webhooks

A final summary payload is sent when the crawl completes, listing all succeeded/failed URLs and total cost.

{
  "callback": {
    "type": "webhook",
    "url": "https://yourapp.com/crawler-results"
  }
}

Scheduling

Omit schedule for a one-time crawl. Add it to run on a recurring interval:

{
  "schedule": {
    "name": "weekly-blog-crawl",
    "interval": "daily"
  }
}

Available intervals: once, hourly, daily, weekly, monthly. Scheduling requires a paid plan — free accounts can only run one-time crawls at max_depth: 1.

Credit Cost

Crawler credit cost = sum of all individual page requests in the crawl. Each page costs the same as a direct Standard API call with the same api_params (render, premium, etc.).

Set crawl_budget to cap total credits. The crawl stops gracefully when the budget is reached — it will not exceed it. Failed requests are not charged.

Check the sa-credit-cost header on each webhook payload to see per-page cost; use this to calibrate crawl_budget for future runs.

Error Handling

Documentation

Crawler overview
API reference
Dashboard

Related Skills

scraperapi/scraperapi-serp-intelligence

development

VerifiedTrustedCommunity

SERP landscape analysis for SEO strategy decisions. Use this skill when the user wants to understand what a search results page actually looks like for their target keywords — including AI Overview presence and attribution, SERP feature composition, how Google is interpreting query intent, which competitors dominate specific keyword sets, and where organic rankings actually translate to visible traffic. Trigger on requests like "analyze the SERP for [keyword]," "why isn't my content getting traffic even though it ranks," "what does Google show for [keyword]," "which keywords are worth targeting," "is [keyword] dominated by AI Overviews," "who owns the SERP for [topic]," "SERP analysis," "keyword landscape," or any request to understand what's happening on a search results page before making a content or SEO strategy decision.

3SKILL.mdUpdated Jun 2, 2026

scraperapi/scraperapi-serp-intelligence

scraperapi/scraperapi-seo-audit

tools

VerifiedTrustedCommunity

Run a comprehensive SEO audit using ScraperAPI's live SERP and scraping tools — no setup required. Use this skill whenever the user wants to: audit SEO for a website, understand why a page isn't ranking, check SEO health, analyze keyword rankings, compare against competitors in search results, find content gaps, review on-page signals (titles, meta, headings, schema), diagnose a traffic drop, check indexation, or get prioritized SEO recommendations. Also trigger when the user says things like "why am I not showing up on Google," "my traffic dropped," "how do I rank for X," "what's wrong with my SEO," "SEO check," or "SEO review." This skill works out of the box — it uses the ScraperAPI MCP tools already connected to this session, with no CLI or API key setup needed.

3SKILL.mdUpdated Jun 2, 2026

scraperapi/scraperapi-seo-audit

scraperapi/scraperapi-scraper-builder

development

VerifiedTrustedCommunity

Build and implement web scrapers using ScraperAPI. Use this skill whenever the user asks to build, write, create, or implement a scraper, or wants runnable code that extracts data from a website. Trigger on: "build me a scraper for [website]", "write a scraper that fetches product pages from [ecommerce site]", "I need to scrape [data] from [website]", "create a script that extracts [fields] from [URL]", "help me scrape [website] — I need [fields]", "write code to scrape [website]", "make a script that scrapes [website]", "implement a scraper for [URL]". Guides architectural decisions (structured endpoint vs. raw HTML, JS rendering, proxy tier, sync vs. async batch), then generates a complete runnable Python or Node.js script with retry logic, error handling, pagination, and credit estimation.

3SKILL.mdUpdated Jun 2, 2026

scraperapi/scraperapi-scraper-builder

scraperapi/scraperapi-price-monitoring

development

VerifiedTrustedCommunity

Use this skill whenever the user wants to check, track, or be alerted about product prices on Amazon, Walmart, or via Google Shopping. Trigger on: "monitor the price of this Amazon product", "did the price drop on [Walmart URL]?", "track these ASINs", "compare today's prices to last week", "alert me if [product] goes below $X", "what's the current price of [product]?", "check my price watchlist", "scrape the price of [URL]", "is [product] cheaper anywhere else?". Accepts ASINs, Amazon/Walmart product URLs, or free-text product queries for Google Shopping. Reads an optional baseline JSON file to detect changes, fetches live prices via ScraperAPI's structured endpoints, and reports increases, decreases, restocks, and out-of-stock transitions in a structured change report. Use this skill even when the user does not say the word "monitor" — any one-shot or recurring price-check request belongs here.

3SKILL.mdUpdated Jun 2, 2026

scraperapi/scraperapi-price-monitoring

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/scraperapi/scraperapi-skills.git

# Copy into Claude Code skills folder (global)
cp -r scraperapi-skills/skills/scraperapi-crawler ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

scraperapi/scraperapi-skills

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

scraperapi/scraperapi-crawler

$ install --global

Security Scan Results

SKILL.md

ScraperAPI Crawler

When NOT to use the Crawler

Endpoints

Starting a Crawl

Required Parameters

URL Regex Patterns

Depth vs Budget

Per-Page Scraping Parameters (api_params)

Checking Status

Webhooks

Scheduling

Credit Cost

Error Handling

Documentation

Related Skills

scraperapi/scraperapi-serp-intelligence

scraperapi/scraperapi-seo-audit

scraperapi/scraperapi-scraper-builder

scraperapi/scraperapi-price-monitoring

scraperapi/scraperapi-crawler

$ install --global

Security Scan Results

SKILL.md

ScraperAPI Crawler

When NOT to use the Crawler

Endpoints

Starting a Crawl

Required Parameters

URL Regex Patterns

Depth vs Budget

Per-Page Scraping Parameters (api_params)

Checking Status

Webhooks

Scheduling

Credit Cost

Error Handling

Documentation

Related Skills

scraperapi/scraperapi-serp-intelligence

scraperapi/scraperapi-seo-audit

scraperapi/scraperapi-scraper-builder

scraperapi/scraperapi-price-monitoring

Per-Page Scraping Parameters (`api_params`)

Per-Page Scraping Parameters (`api_params`)