openclaw/web-scraper-skill

Name: web-scraper-skill
Author: openclaw

abhishekj9621/web-scraper-skill/SKILL.md

npx skillsauth add openclaw/skills web-scraper-skill

Web Scraper Skill (Apify + Firecrawl)

This skill helps Openclaw scrape and extract data from websites using two powerful APIs:

Firecrawl — best for scraping individual pages, crawling entire sites, and getting LLM-ready content (markdown)
Apify — best for specialized scrapers (social media, Google Maps, e-commerce, etc.) via pre-built Actors

Quick Decision Guide: Apify vs Firecrawl

| Use Case | Recommended Tool |
|---|---|
| Scrape a single page into markdown/JSON | Firecrawl /scrape |
| Crawl an entire website (follow links) | Firecrawl /crawl |
| Map all URLs on a site | Firecrawl /map |
| Search web + scrape results | Firecrawl /search |
| Scrape Instagram / TikTok / Twitter | Apify (social actors) |
| Scrape Google Maps / reviews | Apify (compass/crawler-google-places) |
| Scrape Amazon products | Apify (apify/amazon-scraper) |
| Scrape Google Search results | Apify (apify/google-search-scraper) |
| Custom actor / any Apify Store actor | Apify |
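The guide above can be expressed as a simple lookup. This is only a sketch: the task keys (`scrape_page`, `crawl_site`, and so on) and the function name `recommend_tool` are hypothetical labels chosen for illustration, not part of either API.

```python
# Hypothetical task keys mapped to the recommendations in the decision guide.
RECOMMENDED_TOOL = {
    "scrape_page": "Firecrawl /scrape",
    "crawl_site": "Firecrawl /crawl",
    "map_urls": "Firecrawl /map",
    "search_and_scrape": "Firecrawl /search",
    "social_media": "Apify (social actors)",
    "google_maps": "Apify (compass/crawler-google-places)",
    "amazon_products": "Apify (apify/amazon-scraper)",
    "google_search": "Apify (apify/google-search-scraper)",
}


def recommend_tool(task: str) -> str:
    """Return the recommended tool; unknown tasks fall back to a generic Apify Actor."""
    return RECOMMENDED_TOOL.get(task, "Apify")
```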

Authentication

Both APIs authenticate with an API key, normally sent in the Authorization header. Always ask the user for their key if it was not provided.

Firecrawl: Authorization: Bearer fc-YOUR_API_KEY
Apify: Authorization: Bearer YOUR_APIFY_TOKEN (or ?token=YOUR_TOKEN in the URL)
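A minimal sketch of building these headers in Python, reading the key from an environment variable rather than hardcoding it. The helper names and the environment variable names you pass to `key_from_env` are assumptions for illustration; neither service mandates them.

```python
import os


def key_from_env(name: str) -> str:
    """Read an API key from the environment; fail loudly so the user can be asked for it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; ask the user for their key")
    return value


def firecrawl_headers(api_key: str) -> dict:
    """Headers for a Firecrawl JSON request."""
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}


def apify_headers(token: str) -> dict:
    """Headers for an Apify JSON request (alternative to the ?token= query parameter)."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
```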

Firecrawl API Reference

Base URL: https://api.firecrawl.dev (the endpoint paths below already include the /v2 prefix)

1. Scrape a Single Page

POST /v2/scrape
Authorization: Bearer fc-YOUR_API_KEY
Content-Type: application/json

{
  "url": "https://example.com",
  "formats": ["markdown"],          // Options: markdown, html, rawHtml, links, screenshot, json
  "onlyMainContent": true,          // Strips nav/footer/ads
  "waitFor": 0,                     // ms to wait before scraping (for JS-heavy pages)
  "timeout": 30000,                 // ms
  "blockAds": true,
  "proxy": "auto"                   // "auto", "basic", or "stealth"
}

Response: { "success": true, "data": { "markdown": "...", "metadata": {...} } }
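As a rough illustration, the scrape call above could be prepared with the Python standard library as follows. The function names are hypothetical; the request is only built here, not sent, so you would pass it to `urllib.request.urlopen` yourself.

```python
import json
import urllib.request

FIRECRAWL_BASE = "https://api.firecrawl.dev"


def build_scrape_request(api_key, url, formats=("markdown",), only_main_content=True):
    """Build (but do not send) a POST /v2/scrape request."""
    body = {"url": url, "formats": list(formats), "onlyMainContent": only_main_content}
    return urllib.request.Request(
        FIRECRAWL_BASE + "/v2/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def extract_markdown(response: dict) -> str:
    """Pull the markdown out of a parsed scrape response, checking the success flag first."""
    if not response.get("success"):
        raise RuntimeError("Firecrawl scrape did not succeed")
    return response["data"]["markdown"]
```

To send it, something like `json.load(urllib.request.urlopen(req, timeout=60))` should return the parsed response, assuming the key is valid.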

2. Crawl an Entire Website

Crawling is asynchronous: the request starts a job, and you then poll for results.

POST /v2/crawl
{
  "url": "https://docs.example.com",
  "limit": 50,                      // Max pages
  "maxDepth": 3,
  "allowExternalLinks": false,
  "scrapeOptions": {
    "formats": ["markdown"],
    "onlyMainContent": true
  }
}

Response: { "success": true, "id": "crawl-job-id" }

Poll status:

GET /v2/crawl/{crawl-job-id}

Response: { "status": "completed", "total": 50, "data": [...] }
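The start-then-poll pattern can be sketched as below. `get_status` is a hypothetical callable that performs GET /v2/crawl/{crawl-job-id} and returns the parsed JSON; the "failed" status, the 2-second interval, and the poll limit are assumptions, since the text above only shows "completed".

```python
import time
from typing import Callable


def wait_for_crawl(get_status: Callable[[], dict], interval: float = 2.0,
                   max_polls: int = 150, sleep=time.sleep) -> dict:
    """Poll a crawl job until it completes, fails, or the poll budget runs out."""
    for _ in range(max_polls):
        status = get_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":  # assumed failure status, not shown in the text above
            raise RuntimeError("crawl job failed")
        sleep(interval)
    raise TimeoutError("crawl job did not complete within the poll budget")
```

Injecting `sleep` keeps the loop easy to test without real delays.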

3. Map a Website's URLs

POST /v2/map
{ "url": "https://example.com" }

Response: { "success": true, "links": [{ "url": "...", "title": "..." }] }
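A small sketch of consuming that response shape: it collects the URLs and optionally keeps only those under a given prefix. The function name and the `prefix` parameter are illustrative assumptions.

```python
def urls_from_map(response: dict, prefix: str = "") -> list:
    """Return the URLs from a parsed /v2/map response, optionally filtered by prefix."""
    if not response.get("success"):
        raise RuntimeError("Firecrawl map call did not succeed")
    return [
        link["url"]
        for link in response.get("links", [])
        if link["url"].startswith(prefix)
    ]
```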

4. Search + Scrape in One Call

POST /v2/search
{
  "query": "best web scraping tools 2025",
  "limit": 5,
  "scrapeOptions": { "formats": ["markdown"] }
}

Response: { "data": [{ "url": "...", "title": "...", "markdown": "..." }] }

5. Batch Scrape Multiple URLs

POST /v2/batch/scrape
{
  "urls": ["https://a.com", "https://b.com"],
  "formats": ["markdown"]
}

Returns a job ID; poll with GET /v2/batch/scrape/{id}
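When assembling the batch body from a list gathered elsewhere, it may help to drop duplicate URLs first so the same page is not scraped twice. A minimal sketch, with a hypothetical helper name:

```python
def build_batch_body(urls, formats=("markdown",)) -> dict:
    """Build the JSON body for POST /v2/batch/scrape, removing duplicate URLs in order."""
    unique_urls = list(dict.fromkeys(urls))  # dict keys keep first-seen order
    return {"urls": unique_urls, "formats": list(formats)}
```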

Apify API Reference

Base URL: https://api.apify.com (the endpoint paths below already include the /v2 prefix)
Auth: Pass the token as the query parameter ?token=YOUR_TOKEN or in the Authorization header.

Core Workflow

Apify runs "Actors" (pre-built scrapers). The flow is:

Start a run → get a runId and defaultDatasetId
Poll status until SUCCEEDED
Fetch results from the dataset
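The three steps above can be sketched as one function. Here `http` is a hypothetical callable `(method, path, body)` that sends an authenticated request to the Apify API and returns parsed JSON; the extra terminal statuses ABORTED and TIMED-OUT and the poll limit are additions not stated in the text above.

```python
import time
from typing import Callable, Optional


def run_actor(http: Callable[[str, str, Optional[dict]], object], actor_id: str,
              actor_input: dict, interval: float = 5.0, max_polls: int = 120,
              sleep=time.sleep) -> list:
    """Start an Actor run, poll until it finishes, then return the dataset items."""
    run = http("POST", f"/v2/acts/{actor_id}/runs", actor_input)["data"]
    for _ in range(max_polls):
        if run["status"] == "SUCCEEDED":
            return http("GET", f"/v2/datasets/{run['defaultDatasetId']}/items", None)
        if run["status"] in ("FAILED", "ABORTED", "TIMED-OUT"):
            raise RuntimeError(f"Actor run ended with status {run['status']}")
        sleep(interval)
        run = http("GET", f"/v2/acts/{actor_id}/runs/{run['id']}", None)["data"]
    raise TimeoutError("Actor run did not finish within the poll budget")
```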

1. Run an Actor (Async)

POST /v2/acts/{actorId}/runs?token=YOUR_TOKEN
Content-Type: application/json

{ ...actor-specific input... }

Response:

{
  "data": {
    "id": "RUN_ID",
    "status": "RUNNING",
    "defaultDatasetId": "DATASET_ID"
  }
}

Common Actor IDs (when an ID goes into an API path, replace the slash with a tilde, for example apify~web-scraper):

apify/web-scraper — generic JS scraper
apify/google-search-scraper — Google SERPs
compass/crawler-google-places — Google Maps
apify/instagram-scraper — Instagram
clockworks/free-tiktok-scraper — TikTok
apify/amazon-scraper — Amazon products
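Apify API paths take the username and Actor name joined by a tilde rather than a slash, so the IDs listed above need a small conversion before they go into a URL. A sketch with hypothetical helper names:

```python
from urllib.parse import urlencode


def actor_path_id(actor_id: str) -> str:
    """Convert 'username/actor-name' to the 'username~actor-name' form used in API paths."""
    return actor_id.replace("/", "~", 1)


def start_run_url(actor_id: str, token: str) -> str:
    """Full URL for POST /v2/acts/{actorId}/runs with the token as a query parameter."""
    query = urlencode({"token": token})
    return f"https://api.apify.com/v2/acts/{actor_path_id(actor_id)}/runs?{query}"
```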

2. Poll Run Status

GET /v2/acts/{actorId}/runs/{runId}?token=YOUR_TOKEN

Poll until status reaches a terminal value: SUCCEEDED, FAILED, ABORTED, or TIMED-OUT. Recommended interval: 5 seconds.

3. Fetch Results

GET /v2/datasets/{datasetId}/items?token=YOUR_TOKEN&format=json

Optional params: format (json/csv/xlsx/xml), limit, offset
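For example, the dataset URL with its optional parameters might be assembled like this. The function name is hypothetical; `urlencode` takes care of escaping the values.

```python
from urllib.parse import urlencode

APIFY_BASE = "https://api.apify.com"


def dataset_items_url(dataset_id: str, token: str, fmt: str = "json",
                      limit=None, offset=None) -> str:
    """Build the GET /v2/datasets/{datasetId}/items URL, adding limit/offset only if given."""
    params = {"token": token, "format": fmt}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    return f"{APIFY_BASE}/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```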

4. Run Synchronously (≤5 minutes)

For short runs, use the sync endpoint — it waits and returns dataset items directly:

POST /v2/acts/{actorId}/run-sync-get-dataset-items?token=YOUR_TOKEN
Content-Type: application/json

{ ...actor input... }
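A sketch of preparing the synchronous call with the standard library. The function name is hypothetical, and the request is built but not sent; when sending it, a client timeout somewhat above five minutes is a reasonable assumption given the limit in the heading.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_sync_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a run-sync-get-dataset-items request for an Actor."""
    path_id = actor_id.replace("/", "~", 1)  # API paths use username~actor-name
    query = urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{path_id}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it could look like `urllib.request.urlopen(req, timeout=310)`, with the response body being the dataset items.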

Common Actor Inputs

Google Search Scraper:

{ "queries": "web scraping tools", "maxPagesPerQuery": 1, "resultsPerPage": 10 }

Google Maps Scraper:

{ "searchStringsArray": ["restaurants in Mumbai"], "maxCrawledPlaces": 20 }

Web Scraper (generic):

{
  "startUrls": [{ "url": "https://example.com" }],
  "pageFunction": "async function pageFunction(context) { const $ = context.jQuery; return { title: $('title').text() }; }",
  "maxPagesPerCrawl": 10
}
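Because pageFunction is JavaScript carried inside a JSON string, it can be easier to keep it as a multi-line string in Python and let `json.dumps` handle the escaping. A minimal sketch, with a hypothetical helper name:

```python
import json

# The same page function as above, kept readable as a multi-line string.
PAGE_FUNCTION = """
async function pageFunction(context) {
    const $ = context.jQuery;
    return { title: $('title').text() };
}
""".strip()


def web_scraper_input(start_urls, max_pages: int = 10) -> dict:
    """Build the input object for the generic Web Scraper Actor."""
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "pageFunction": PAGE_FUNCTION,
        "maxPagesPerCrawl": max_pages,
    }
```

`json.dumps(web_scraper_input(["https://example.com"]))` then yields a request body with the function correctly escaped.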

Output Handling

Firecrawl returns data directly in the response (or via polling for crawl/batch).
Apify stores results in a dataset; retrieve with GET /v2/datasets/{id}/items.
Both support JSON output. Firecrawl also provides clean markdown ideal for LLMs.
Apify also supports CSV, XLSX, XML output formats.

Code Templates

See references/code-templates.md for ready-to-run Python and JavaScript code for both APIs.

Error Handling

Firecrawl 402 → out of credits; user needs to upgrade plan
Firecrawl 429 → rate limited; add delays between requests
Apify FAILED run → check run logs via GET /v2/acts/{id}/runs/{runId}/log
Always wrap API calls in try/catch and check for success: false in Firecrawl responses
Firecrawl crawls respect robots.txt by default
For JS-heavy pages, increase waitFor (Firecrawl) or use Playwright/Puppeteer actors (Apify)
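One way to apply the 402 and 429 guidance above is a small retry wrapper. This is a sketch under stated assumptions: `send` is a hypothetical callable returning `(status_code, parsed_body)`, and the exponential backoff schedule is an illustrative choice, not something either API prescribes.

```python
import time


class OutOfCredits(Exception):
    """Raised on HTTP 402: the user needs to upgrade their plan."""


def call_with_retry(send, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep) -> dict:
    """Call send(), backing off on 429 and failing fast on 402 or success: false."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 402:
            raise OutOfCredits("Firecrawl is out of credits")
        if status == 429:
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
            continue
        if status >= 400 or body.get("success") is False:
            raise RuntimeError(f"request failed with status {status}")
        return body
    raise RuntimeError("still rate limited after all retries")
```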

Best Practices

Start small — test with 1 URL or a small limit before scaling
Use onlyMainContent: true in Firecrawl to remove nav/footer noise
Choose async for large jobs — don't use sync endpoints for crawls with 50+ pages
Store API keys securely — never hardcode them; use environment variables
Check rate limits — Firecrawl: varies by plan; Apify: 250k requests/min global
Prefer Firecrawl for LLM pipelines — markdown output is clean and ready for RAG/AI
Prefer Apify for social/structured data — specialized actors handle anti-bot better

Use this skill to scrape, crawl, or extract data from websites using Apify or Firecrawl APIs. Trigger whenever the user wants to: scrape a URL, crawl a website, extract structured data from web pages, run an Apify Actor, batch scrape multiple URLs, search and scrape the web, map a site's URLs, collect product/price/review data, or build any web data pipeline. If the user says things like "scrape this site", "get data from this URL", "crawl this website", "run an Apify actor", "use Firecrawl", "extract content from a page", "pull data from the web", or mentions any web data extraction task — always use this skill. Also use it when the user wants to choose between Apify and Firecrawl.

3,729 stars

development

Updated Apr 2, 2026

Install this skill globally with one command:

npx skillsauth add openclaw/skills web-scraper-skill

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report:

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 3, 2026, 7:58 AM (4.2s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/openclaw/skills.git

# Copy into Claude Code skills folder (global)
cp -r skills/abhishekj9621/web-scraper-skill ~/.claude/skills/

Repository

openclaw/skills

3,729 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT