Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

scraperapi/scraperapi-datapipeline

Name: scraperapi-datapipeline
Author: scraperapi

skills/scraperapi-datapipeline/SKILL.md

npx skillsauth add scraperapi/scraperapi-skills scraperapi-datapipeline


ScraperAPI DataPipeline

DataPipeline is a managed scraping product. You define a project (what to scrape, how often, where to send results), and ScraperAPI runs it on your schedule without you managing proxies, retries, or infrastructure.

When NOT to use DataPipeline

One-off scrapes of a known URL list → use the Async API — faster, cheaper, no project setup.
Exploring a site without known URLs → use the Crawler.
Need results in real-time within your code → Async API is programmable; DataPipeline is scheduled.
Free plan, need recurring execution → recurring schedules require a paid plan.

Use DataPipeline when: scraping runs on a fixed schedule, the input list is large (up to 100,000 items), results should flow to a webhook automatically, or you want email notifications on job completion.

Base URL and Auth

Base URL: https://datapipeline.scraperapi.com/api
Auth:     ?api_key=YOUR_KEY  (query parameter on every request)
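The requests examples in this document pass the key as `params={"api_key": ...}`, and requests appends it to the query string. For tools that need the full URL (curl, a browser, a quick log line), here is a minimal stdlib sketch of the same construction; `pipeline_url` is a hypothetical helper name, not part of any ScraperAPI SDK.

```python
from urllib.parse import urlencode

BASE = "https://datapipeline.scraperapi.com/api"

def pipeline_url(path: str, api_key: str) -> str:
    """Return the full DataPipeline URL for `path`, with api_key as a query parameter."""
    # Accept both "projects" and "/projects"; urlencode escapes special characters in the key.
    return f"{BASE}/{path.lstrip('/')}?{urlencode({'api_key': api_key})}"

print(pipeline_url("/projects", "YOUR_KEY"))
# → https://datapipeline.scraperapi.com/api/projects?api_key=YOUR_KEY
```

Because the key travels in the URL, avoid writing full request URLs to shared logs.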

Project Types

Set projectType in the create request to choose what to scrape:

| Type | Input |
|------|-------|
| urls | Any URL (raw HTML returned) |
| urls_with_js | Any URL (HTML returned with JavaScript rendering) |
| google_search | Search queries |
| google_news | Search queries |
| google_jobs | Search queries |
| google_shopping | Search queries |
| google_maps | Search queries |
| amazon_product | ASINs |
| amazon_search | Search queries |
| amazon_offers | ASINs |
| walmart_product | Product IDs |
| walmart_search | Search queries |
| walmart_category | Category IDs |
| walmart_reviews | Product IDs |
| ebay_product | 12-digit product IDs |
| ebay_search | Search queries |
| redfin_listing_for_sale | Listing URLs |
| redfin_listing_for_rent | Listing URLs |
| redfin_listing_search | Search result URLs |
| redfin_agent_details | Agent profile URLs |
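Catching malformed inputs before creating a project avoids spending credits on items that cannot succeed. A minimal client-side sketch, assuming only what the Input column states plus the general fact that ASINs are 10-character alphanumeric codes; `invalid_items` and the pattern table are hypothetical, and ScraperAPI may apply different rules server-side.

```python
import re

# Hypothetical format checks derived from the Input column above.
INPUT_PATTERNS = {
    "amazon_product": re.compile(r"[A-Z0-9]{10}"),  # ASIN
    "amazon_offers": re.compile(r"[A-Z0-9]{10}"),   # ASIN
    "ebay_product": re.compile(r"\d{12}"),          # 12-digit product ID
}

# Types whose input is a URL of some kind.
URL_TYPES = {
    "urls", "urls_with_js", "redfin_listing_for_sale", "redfin_listing_for_rent",
    "redfin_listing_search", "redfin_agent_details",
}

def invalid_items(project_type: str, items: list[str]) -> list[str]:
    """Return the items that do not look like valid input for project_type."""
    if project_type in URL_TYPES:
        return [item for item in items if not item.startswith(("http://", "https://"))]
    pattern = INPUT_PATTERNS.get(project_type)
    if pattern is None:
        return []  # free-text queries and other IDs are not checked here
    return [item for item in items if not pattern.fullmatch(item)]

print(invalid_items("amazon_product", ["B09V3KXJPB", "not-an-asin"]))  # → ['not-an-asin']
```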

Creating a Project

```python
import os

import requests

API_KEY = os.environ["SCRAPERAPI_API_KEY"]
BASE    = "https://datapipeline.scraperapi.com/api"

response = requests.post(
    f"{BASE}/projects",
    params={"api_key": API_KEY},   # auth is a query parameter on every request
    json={
        "name":               "Weekly Amazon price monitor",
        "projectType":        "amazon_product",
        "schedulingEnabled":  True,
        "scrapingInterval":   "weekly",
        "scheduledAt":        "now",
        "projectInput": {
            "type": "list",
            "list": ["B09V3KXJPB", "B08N5WRWNW"]   # ASINs
        },
        "apiParams": {
            "country_code": "us"
        },
        "webhookOutput": {
            "url":             "https://yourapp.com/pipeline-results",
            "webhookEncoding": "multipart_form_data_encoding"
        },
        "notificationConfig": {
            "notifyOnSuccess": "with_every_run",
            "notifyOnFailure": "with_every_run"
        }
    },
    timeout=30,
)
response.raise_for_status()   # surface 4xx/5xx errors instead of parsing an error body
project = response.json()

print(f"Project created: id={project['id']}")
```

Create request fields

| Field | Required | Description |
|-------|----------|-------------|
| name | No | Human-readable project name |
| projectType | Yes | What to scrape (see table above) |
| schedulingEnabled | No | true to enable recurring schedule |
| scrapingInterval | Yes (if scheduled) | See scheduling options below |
| scheduledAt | No | "now" to run immediately on create |
| projectInput | Yes | Input data (see input methods below) |
| apiParams | No | Standard ScraperAPI parameters |
| webhookOutput | No | Webhook delivery config |
| notificationConfig | No | Email notification settings |
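The Required column can be checked locally before sending a create request. A minimal sketch; `missing_fields` is a hypothetical helper that encodes only the rules in the table above.

```python
def missing_fields(payload: dict) -> list[str]:
    """Return required create-request fields that `payload` lacks, per the table above."""
    missing = [field for field in ("projectType", "projectInput") if field not in payload]
    # scrapingInterval is only required when a recurring schedule is enabled.
    if payload.get("schedulingEnabled") and "scrapingInterval" not in payload:
        missing.append("scrapingInterval")
    return missing

payload = {
    "projectType": "amazon_product",
    "schedulingEnabled": True,
    "projectInput": {"type": "list", "list": ["B09V3KXJPB"]},
}
print(missing_fields(payload))  # → ['scrapingInterval']
```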

Input Methods

Direct list (up to 500 items)

```json
{
  "projectInput": {
    "type": "list",
    "list": ["query one", "query two", "B09V3KXJPB"]
  }
}
```
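A direct list must stay within 500 items. A minimal sketch of building the `projectInput` object with that check; trimming blanks and dropping duplicates is an assumption on our part (duplicates would otherwise be scraped and billed twice), and `list_input` is a hypothetical helper.

```python
MAX_LIST_ITEMS = 500  # direct-list limit stated above

def list_input(items: list[str]) -> dict:
    """Build a projectInput of type "list", trimming blanks and duplicates."""
    cleaned: list[str] = []
    seen: set[str] = set()
    for item in items:
        item = item.strip()
        if item and item not in seen:  # keep first occurrence, preserve order
            seen.add(item)
            cleaned.append(item)
    if not cleaned:
        raise ValueError("list input is empty")
    if len(cleaned) > MAX_LIST_ITEMS:
        raise ValueError(
            f"{len(cleaned)} items exceeds the {MAX_LIST_ITEMS}-item list limit; "
            "use a CSV or webhook input instead"
        )
    return {"type": "list", "list": cleaned}

print(list_input([" B09V3KXJPB", "B08N5WRWNW", "B09V3KXJPB", ""]))
```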

CSV file (up to 100,000 items)

Upload a CSV with one URL, query, or ASIN per line, with no header row and no commas. CSV upload is done through the dashboard when creating a project; through the API, supply a list or webhook input instead.
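Preparing the upload file programmatically is straightforward since the format is just one item per line. A minimal sketch that enforces the stated rules (no header, no commas, at most 100,000 items); `to_pipeline_csv` is a hypothetical helper, and the upload itself still happens in the dashboard.

```python
MAX_CSV_ITEMS = 100_000  # per-job input limit

def to_pipeline_csv(items: list[str]) -> str:
    """Render items as DataPipeline CSV text: one item per line, no header, no commas."""
    lines = [item.strip() for item in items if item.strip()]
    with_commas = [item for item in lines if "," in item]
    if with_commas:
        raise ValueError(f"items must not contain commas, e.g. {with_commas[0]!r}")
    if len(lines) > MAX_CSV_ITEMS:
        raise ValueError(f"{len(lines)} items exceeds the {MAX_CSV_ITEMS:,}-item limit")
    return "\n".join(lines) + "\n"

# Example: write the file, then upload it in the dashboard.
# pathlib.Path("asins.csv").write_text(to_pipeline_csv(asins), encoding="utf-8")
```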

Webhook input (dynamic polling)

```json
{
  "projectInput": {
    "type": "webhook",
    "webhookUrl": "https://yourapp.com/input-items"
  }
}
```

ScraperAPI polls your webhook URL for the item list when the job starts. Your endpoint should return one item per line, with no commas. This is useful for dynamically generated lists (e.g., new ASINs added since the last run).
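A minimal stdlib sketch of an input endpoint that returns one item per line. Assumptions: the HTTP method ScraperAPI uses to poll is not stated here, so both GET and POST are answered; a plain-text body is assumed acceptable; and `load_items` is a hypothetical placeholder for your own lookup. In production this would sit behind HTTPS at the `webhookUrl` you register.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def load_items() -> list[str]:
    # Hypothetical: replace with your own query, e.g. ASINs added since the last run.
    return ["B09V3KXJPB", "B08N5WRWNW"]

class InputItemsHandler(BaseHTTPRequestHandler):
    """Answers the DataPipeline input poll with one item per line."""

    def _send_items(self) -> None:
        # Items containing commas are skipped, since the format forbids them.
        items = [item for item in load_items() if "," not in item]
        body = "\n".join(items).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        self._send_items()

    def do_POST(self) -> None:
        # Assumption: the poll method is undocumented here, so POST is answered too.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # drain any body
        self._send_items()

def serve(port: int = 8000) -> None:
    """Run the endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), InputItemsHandler).serve_forever()
```

Call `serve()` to start it locally; a framework route returning the same text body would work just as well.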

Scheduling Options

| scrapingInterval | Description |
|------------------|-------------|
| "once" | Run a single job immediately |
| "hourly" | Every hour |
| "daily" | Once per day |
| "weekly" | Once per week |
| "monthly" | Once per month |
| "cron" | Custom cron expression (use cron field instead of interval) |

Recurring schedules (hourly, daily, weekly, monthly, cron) require a paid plan.

Set "scheduledAt": "now" to trigger the first run immediately when the project is created.
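The scheduling fields can be assembled by a small helper that also flags intervals needing a paid plan. A minimal sketch; `schedule_fields` and `needs_paid_plan` are hypothetical, tying `schedulingEnabled` to recurring intervals only is our assumption, and "cron" is deliberately left out of `schedule_fields` because the exact name of its companion field is not shown here.

```python
INTERVALS = {"once", "hourly", "daily", "weekly", "monthly"}
RECURRING = INTERVALS - {"once"}

def schedule_fields(interval: str, run_now: bool = True) -> dict:
    """Build the scheduling part of a create request for a non-cron interval."""
    if interval not in INTERVALS:
        raise ValueError(f"unsupported scrapingInterval: {interval!r}")
    fields = {
        # Assumption: schedulingEnabled is only turned on for recurring intervals.
        "schedulingEnabled": interval in RECURRING,
        "scrapingInterval": interval,
    }
    if run_now:
        fields["scheduledAt"] = "now"  # trigger the first run on create
    return fields

def needs_paid_plan(interval: str) -> bool:
    """Recurring schedules, including cron, require a paid plan."""
    return interval in RECURRING or interval == "cron"

print(schedule_fields("weekly"))
```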

Output / Delivery

Webhook delivery

Results are POSTed to your webhook URL as they complete. The webhookEncoding field controls the format:

```json
{
  "webhookOutput": {
    "url":             "https://yourapp.com/results",
    "webhookEncoding": "multipart_form_data_encoding"
  }
}
```
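With `multipart_form_data_encoding`, the POST body is presumably a standard `multipart/form-data` payload. The part names and file layout are not documented here, so this sketch simply collects every part by its form-field name. It uses the stdlib `email` parser because the old `cgi` module was removed in Python 3.13; `parse_multipart` is a hypothetical helper you would call from your web framework with the raw body and the request's Content-Type header.

```python
from email.parser import BytesParser
from email.policy import default

def parse_multipart(body: bytes, content_type: str) -> dict[str, bytes]:
    """Split a multipart/form-data body into {form field name: raw bytes}."""
    # Prepend the Content-Type header so the email parser can find the boundary.
    message = BytesParser(policy=default).parsebytes(
        b"Content-Type: " + content_type.encode("latin-1") + b"\r\n\r\n" + body
    )
    fields = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        fields[name] = part.get_payload(decode=True)
    return fields
```

What to do with each part (save a file, parse JSON or CSV) depends on the project's output format.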

Dashboard download

Omit webhookOutput and results are saved for download in the DataPipeline dashboard, where they are retained for 30 days and then automatically deleted.

Output formats by project type:

urls / urls_with_js → HTML wrapped in JSONL
Structured types (Amazon, Google, Walmart, eBay, Redfin) → JSON or CSV
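JSONL (JSON Lines) holds one JSON document per line, so a downloaded `urls` / `urls_with_js` result file can be streamed record by record. A minimal sketch; the `url` and `html` keys in the sample are hypothetical, since the record schema is not documented here.

```python
import io
import json
from collections.abc import Iterable, Iterator

def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line (JSON Lines format)."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

# Hypothetical sample; with a real file use: with open(path, encoding="utf-8") as f: iter_jsonl(f)
sample = io.StringIO(
    '{"url": "https://example.com", "html": "<p>hi</p>"}\n'
    "\n"
    '{"url": "https://example.org", "html": ""}\n'
)
for record in iter_jsonl(sample):
    print(record["url"])
```

Streaming line by line keeps memory flat even for large result files.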

Managing Projects

```python
# Assumes BASE, API_KEY and requests from the "Creating a Project" example.

# List all projects
projects = requests.get(f"{BASE}/projects", params={"api_key": API_KEY}).json()

# Get a single project
project = requests.get(f"{BASE}/projects/525", params={"api_key": API_KEY}).json()

# Update (partial update — only include fields to change)
requests.patch(
    f"{BASE}/projects/525",
    params={"api_key": API_KEY},
    json={
        "scrapingInterval": "daily",
        "apiParams":        {"premium": True},
        "notificationConfig": {"notifyOnSuccess": "never"}
    }
)

# Delete / archive (irreversible without support)
requests.delete(f"{BASE}/projects/525", params={"api_key": API_KEY})
```

Updatable fields: scrapingInterval, scheduledAt, outputFormat, apiParams, notificationConfig.
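Since PATCH is a partial update limited to these fields, a small guard can reject anything else before the request is sent. A minimal sketch; `update_payload` is a hypothetical helper built only from the list above.

```python
UPDATABLE = {"scrapingInterval", "scheduledAt", "outputFormat", "apiParams", "notificationConfig"}

def update_payload(**changes) -> dict:
    """Build a PATCH body, rejecting fields that are not listed as updatable."""
    unknown = set(changes) - UPDATABLE
    if unknown:
        raise ValueError(f"not updatable: {sorted(unknown)}")
    return changes

print(update_payload(scrapingInterval="daily", apiParams={"premium": True}))
```

The result can be passed as `json=` to the `requests.patch` call shown above.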

Managing Jobs

```python
# List jobs for a project
jobs = requests.get(
    f"{BASE}/projects/525/jobs",
    params={"api_key": API_KEY}
).json()

# Cancel a running job
job_id = "YOUR_JOB_ID"  # placeholder: ID of the job to cancel, e.g. taken from `jobs`
requests.delete(
    f"{BASE}/projects/525/jobs/{job_id}",
    params={"api_key": API_KEY}
)
# Running requests within the job finish first; final status becomes "Cancelled"
```

A new job can only start if no other job for that project is currently running.
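If your own tooling triggers work on a project, it can wait until no job is running first. The job-list response shape and status values (other than "Cancelled") are not documented here, so this sketch injects both the fetch and the "is running" test; `wait_until_idle` is hypothetical, and a predicate such as `lambda job: job.get("status") == "Running"` is an assumption to adapt.

```python
import time
from collections.abc import Callable

def wait_until_idle(
    fetch_jobs: Callable[[], list[dict]],
    is_running: Callable[[dict], bool],
    poll_seconds: float = 30,
    timeout: float = 3600,
) -> bool:
    """Poll until no job is running; return False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        if not any(is_running(job) for job in fetch_jobs()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

In practice `fetch_jobs` would wrap the `requests.get(f"{BASE}/projects/525/jobs", ...)` call shown above.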

Notification Config

```json
{
  "notificationConfig": {
    "notifyOnSuccess": "with_every_run",
    "notifyOnFailure": "with_every_run"
  }
}
```

Options for both fields: "never", "with_every_run", "daily", "weekly".
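A minimal sketch that builds the block above while rejecting values outside the four listed options; `notification_config` is a hypothetical helper name.

```python
NOTIFY_OPTIONS = {"never", "with_every_run", "daily", "weekly"}

def notification_config(on_success: str = "with_every_run",
                        on_failure: str = "with_every_run") -> dict:
    """Return a notificationConfig block, validating both options."""
    for value in (on_success, on_failure):
        if value not in NOTIFY_OPTIONS:
            raise ValueError(f"invalid notification option: {value!r}")
    return {"notificationConfig": {"notifyOnSuccess": on_success,
                                   "notifyOnFailure": on_failure}}

print(notification_config(on_success="never"))
```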

`apiParams` Reference

All standard ScraperAPI parameters are supported inside apiParams:

| Parameter | Purpose |
|-----------|---------|
| country_code | Geotarget (e.g. "us", "gb") |
| render | JavaScript rendering |
| premium | Premium residential proxies |
| ultra_premium | Ultra-premium proxies (mutually exclusive with premium) |
| device_type | "desktop" or "mobile" |
| output_format | "text" or "markdown" for LLM pipelines |
| autoparse | Structured JSON extraction for supported sites |
| keep_headers | Forward custom headers |
| follow_redirect | Control redirect handling |
| wait_for_selector | Wait for CSS selector (requires render: true) |
| screenshot | Capture screenshot (auto-enables rendering) |
| retry_404 | Retry 404 responses |
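A few constraints in this table can be checked before a project is created. A minimal sketch covering only those constraints; `api_params_problems` is a hypothetical helper, and the server remains the authority on what is valid.

```python
def api_params_problems(params: dict) -> list[str]:
    """Return human-readable problems with an apiParams dict, per the table above."""
    problems = []
    if params.get("premium") and params.get("ultra_premium"):
        problems.append("premium and ultra_premium are mutually exclusive")
    # screenshot turns rendering on by itself, so only wait_for_selector is checked here.
    if "wait_for_selector" in params and not params.get("render"):
        problems.append("wait_for_selector requires render: true")
    if params.get("device_type", "desktop") not in ("desktop", "mobile"):
        problems.append('device_type must be "desktop" or "mobile"')
    return problems

print(api_params_problems({"premium": True, "ultra_premium": True, "wait_for_selector": "#price"}))
```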

Credit Costs

DataPipeline uses the same underlying credit rates as the Standard API. The cost of a job run is the sum of the credits for all its requests. You can preview the estimated cost in the dashboard before launching a project.

Only requests that return 200 or 404 are charged; other failed requests are not.
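The billing rule reduces to counting 200 and 404 responses. A minimal sketch, assuming a single per-request credit rate; in reality the rate depends on the project type and apiParams (the dashboard preview gives the real figure), so `credits_per_request` is an input you supply and `billed_credits` is a hypothetical helper.

```python
BILLABLE = {200, 404}

def billed_credits(status_codes: list[int], credits_per_request: int) -> int:
    """Credits charged for a run: only 200 and 404 responses count."""
    return sum(credits_per_request for code in status_codes if code in BILLABLE)

print(billed_credits([200, 404, 500, 429, 200], credits_per_request=5))  # → 15
```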

Limits

| Limit | Value |
|-------|-------|
| Max input items | 100,000 per job |
| Direct list input | 500 items |
| Data retention | 30 days |
| Free plan concurrency | 5 connections |
| Free plan scheduling | One-time runs only |

Documentation

DataPipeline overview
Dashboard — manage projects
API reference

scraperapi/scraperapi-datapipeline

skills/scraperapi-datapipeline/SKILL.md

Product-usage reference for ScraperAPI's DataPipeline — managed, scheduled scraping projects that run automatically and deliver results to a webhook or dashboard download. Consult when the user needs recurring scraping, has a large list of URLs/ASINs/queries to process, or wants to avoid building and maintaining their own scraping infrastructure. Use when user asks: "schedule recurring scraping with ScraperAPI", "ScraperAPI DataPipeline", "how do I run a scraping project on a schedule", "scrape 10000 ASINs automatically", "ScraperAPI managed scraping project", "set up a ScraperAPI pipeline", "deliver scraping results to a webhook automatically". Covers project types, input methods, scheduling, output delivery, the DataPipeline API, job management, and credit costs.

2 stars

development

Updated Jun 2, 2026


npx skillsauth add scraperapi/scraperapi-skills scraperapi-datapipeline

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Clean: Trivy, container and dependency vulnerability scanner
Clean: Semgrep, static code analysis for vulnerabilities
Clean: mcp-scan (Snyk), Model Context Protocol security validation
Skipped: Snyk (dep), open source security scanning
Skipped: Socket.dev, supply chain security analysis
Skipped: VirusTotal, multi-engine malware detection
Skipped: CrowdStrike, advanced threat intelligence
Skipped: OSV-Scanner, Open Source Vulnerability database check
Skipped: OWASP Dep-Check

Last scanned: Jun 2, 2026, 4:39 AM (154.3s, 1 file scanned)

SKILL.md

name: scraperapi-datapipeline
description: >
  Use when user asks: "schedule recurring scraping with ScraperAPI", "ScraperAPI DataPipeline", …
emoji: 🔄
homepage: https://docs.scraperapi.com/data-pipeline


Related Skills

scraperapi/scraperapi-serp-intelligence

development

Verified · Trusted · Community

SERP landscape analysis for SEO strategy decisions. Use this skill when the user wants to understand what a search results page actually looks like for their target keywords — including AI Overview presence and attribution, SERP feature composition, how Google is interpreting query intent, which competitors dominate specific keyword sets, and where organic rankings actually translate to visible traffic. Trigger on requests like "analyze the SERP for [keyword]," "why isn't my content getting traffic even though it ranks," "what does Google show for [keyword]," "which keywords are worth targeting," "is [keyword] dominated by AI Overviews," "who owns the SERP for [topic]," "SERP analysis," "keyword landscape," or any request to understand what's happening on a search results page before making a content or SEO strategy decision.

3 · SKILL.md · Updated Jun 2, 2026

scraperapi/scraperapi-seo-audit

tools

Verified · Trusted · Community

Run a comprehensive SEO audit using ScraperAPI's live SERP and scraping tools — no setup required. Use this skill whenever the user wants to: audit SEO for a website, understand why a page isn't ranking, check SEO health, analyze keyword rankings, compare against competitors in search results, find content gaps, review on-page signals (titles, meta, headings, schema), diagnose a traffic drop, check indexation, or get prioritized SEO recommendations. Also trigger when the user says things like "why am I not showing up on Google," "my traffic dropped," "how do I rank for X," "what's wrong with my SEO," "SEO check," or "SEO review." This skill works out of the box — it uses the ScraperAPI MCP tools already connected to this session, with no CLI or API key setup needed.

3 · SKILL.md · Updated Jun 2, 2026

scraperapi/scraperapi-scraper-builder

development

Verified · Trusted · Community

Build and implement web scrapers using ScraperAPI. Use this skill whenever the user asks to build, write, create, or implement a scraper, or wants runnable code that extracts data from a website. Trigger on: "build me a scraper for [website]", "write a scraper that fetches product pages from [ecommerce site]", "I need to scrape [data] from [website]", "create a script that extracts [fields] from [URL]", "help me scrape [website] — I need [fields]", "write code to scrape [website]", "make a script that scrapes [website]", "implement a scraper for [URL]". Guides architectural decisions (structured endpoint vs. raw HTML, JS rendering, proxy tier, sync vs. async batch), then generates a complete runnable Python or Node.js script with retry logic, error handling, pagination, and credit estimation.

3 · SKILL.md · Updated Jun 2, 2026

scraperapi/scraperapi-price-monitoring

development

Verified · Trusted · Community

Use this skill whenever the user wants to check, track, or be alerted about product prices on Amazon, Walmart, or via Google Shopping. Trigger on: "monitor the price of this Amazon product", "did the price drop on [Walmart URL]?", "track these ASINs", "compare today's prices to last week", "alert me if [product] goes below $X", "what's the current price of [product]?", "check my price watchlist", "scrape the price of [URL]", "is [product] cheaper anywhere else?". Accepts ASINs, Amazon/Walmart product URLs, or free-text product queries for Google Shopping. Reads an optional baseline JSON file to detect changes, fetches live prices via ScraperAPI's structured endpoints, and reports increases, decreases, restocks, and out-of-stock transitions in a structured change report. Use this skill even when the user does not say the word "monitor" — any one-shot or recurring price-check request belongs here.

3 · SKILL.md · Updated Jun 2, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually


```bash
# Clone the repo
git clone https://github.com/scraperapi/scraperapi-skills.git

# Copy into Claude Code skills folder (global)
cp -r scraperapi-skills/skills/scraperapi-datapipeline ~/.claude/skills/
```


Repository

scraperapi/scraperapi-skills


Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
