skills/scraperapi-datapipeline/SKILL.md
Product-usage reference for ScraperAPI's DataPipeline — managed, scheduled scraping projects that run automatically and deliver results to a webhook or dashboard download. Consult when the user needs recurring scraping, has a large list of URLs/ASINs/queries to process, or wants to avoid building and maintaining their own scraping infrastructure. Use when user asks: "schedule recurring scraping with ScraperAPI", "ScraperAPI DataPipeline", "how do I run a scraping project on a schedule", "scrape 10000 ASINs automatically", "ScraperAPI managed scraping project", "set up a ScraperAPI pipeline", "deliver scraping results to a webhook automatically". Covers project types, input methods, scheduling, output delivery, the DataPipeline API, job management, and credit costs.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add scraperapi/scraperapi-skills scraperapi-datapipeline
DataPipeline is a managed scraping product. You define a project (what to scrape, how often, where to send results), and ScraperAPI runs it on your schedule without you managing proxies, retries, or infrastructure.
Use DataPipeline when: scraping runs on a fixed schedule, the input list is large (up to 100,000 items), results should flow to a webhook automatically, or you want email notifications on job completion.
Base URL: https://datapipeline.scraperapi.com/api
Auth: ?api_key=YOUR_KEY (query parameter on every request)
Set projectType in the create request to choose what to scrape:
| Type | Input |
|------|-------|
| urls | URLs (returns raw HTML) |
| urls_with_js | URLs (returns HTML after JavaScript rendering) |
| google_search | Search queries |
| google_news | Search queries |
| google_jobs | Search queries |
| google_shopping | Search queries |
| google_maps | Search queries |
| amazon_product | ASINs |
| amazon_search | Search queries |
| amazon_offers | ASINs |
| walmart_product | Product IDs |
| walmart_search | Search queries |
| walmart_category | Category IDs |
| walmart_reviews | Product IDs |
| ebay_product | 12-digit product IDs |
| ebay_search | Search queries |
| redfin_listing_for_sale | Listing URLs |
| redfin_listing_for_rent | Listing URLs |
| redfin_listing_search | Search result URLs |
| redfin_agent_details | Agent profile URLs |
```python
import os

import requests

API_KEY = os.environ["SCRAPERAPI_API_KEY"]
BASE = "https://datapipeline.scraperapi.com/api"

project = requests.post(
    f"{BASE}/projects",
    params={"api_key": API_KEY},
    json={
        "name": "Weekly Amazon price monitor",
        "projectType": "amazon_product",
        "schedulingEnabled": True,
        "scrapingInterval": "weekly",
        "scheduledAt": "now",
        "projectInput": {
            "type": "list",
            "list": ["B09V3KXJPB", "B08N5WRWNW"],  # ASINs
        },
        "apiParams": {
            "country_code": "us",
        },
        "webhookOutput": {
            "url": "https://yourapp.com/pipeline-results",
            "webhookEncoding": "multipart_form_data_encoding",
        },
        "notificationConfig": {
            "notifyOnSuccess": "with_every_run",
            "notifyOnFailure": "with_every_run",
        },
    },
).json()

print(f"Project created: id={project['id']}")
```
| Field | Required | Description |
|-------|----------|-------------|
| name | No | Human-readable project name |
| projectType | Yes | What to scrape (see table above) |
| schedulingEnabled | No | true to enable recurring schedule |
| scrapingInterval | Yes (if scheduled) | See scheduling options below |
| scheduledAt | No | "now" to run immediately on create |
| projectInput | Yes | Input data (see input methods below) |
| apiParams | No | Standard ScraperAPI parameters |
| webhookOutput | No | Webhook delivery config |
| notificationConfig | No | Email notification settings |
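
The Required column above can be checked locally before the create request is sent. The following is a minimal sketch, not part of any ScraperAPI client: `missing_fields` is a hypothetical helper that encodes only the rules stated in the table (projectType and projectInput are always required, scrapingInterval only when scheduling is enabled).

```python
def missing_fields(payload: dict) -> list[str]:
    """Return required create-request fields that are absent from payload.

    Hypothetical helper: it mirrors the Required column of the field table
    and performs no other validation.
    """
    required = ["projectType", "projectInput"]
    # scrapingInterval is required only when a schedule is enabled.
    if payload.get("schedulingEnabled"):
        required.append("scrapingInterval")
    return [field for field in required if field not in payload]


payload = {
    "projectType": "amazon_product",
    "schedulingEnabled": True,
    "projectInput": {"type": "list", "list": ["B09V3KXJPB"]},
}
print(missing_fields(payload))  # → ['scrapingInterval']
```

Running the check before `requests.post` avoids spending a round trip on a request that is missing a required field.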
```json
{
  "projectInput": {
    "type": "list",
    "list": ["query one", "query two", "B09V3KXJPB"]
  }
}
```
Upload a CSV with one URL, query, or ASIN per line: no header row and no commas. Do this through the dashboard when creating a project; CSV upload is not available through the API.
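A file in that shape can be produced with a few lines of Python. This is a sketch under the stated format rules only (one item per line, no header, no commas); `write_input_csv` and the file name `asins.csv` are illustrative names, not part of ScraperAPI.

```python
from pathlib import Path


def write_input_csv(items: list[str], path: str) -> int:
    """Write one item per line with no header row; return the number of lines."""
    cleaned = [item.strip() for item in items if item.strip()]
    for item in cleaned:
        # The upload format forbids commas, so fail early instead of
        # producing a file that would be read as multiple columns.
        if "," in item:
            raise ValueError(f"item contains a comma: {item!r}")
    Path(path).write_text("".join(f"{item}\n" for item in cleaned), encoding="utf-8")
    return len(cleaned)
```

For example, `write_input_csv(["B09V3KXJPB", "B08N5WRWNW"], "asins.csv")` writes a two-line file ready for upload in the dashboard.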
```json
{
  "projectInput": {
    "type": "webhook",
    "webhookUrl": "https://yourapp.com/input-items"
  }
}
```
ScraperAPI polls your webhook URL for the item list when the job starts. One item per line; no commas. Useful for dynamically generated lists (e.g., new ASINs added since the last run).
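An endpoint that serves such a list could look like the sketch below, built on Python's standard `http.server`. Two things here are assumptions rather than documented behavior: that the list is fetched with a GET request, and that a plain-text response body is accepted. `current_items`, `InputItemsHandler`, and `serve` are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_items(items: list[str]) -> bytes:
    """Format items one per line, rejecting commas as the input format requires."""
    for item in items:
        if "," in item:
            raise ValueError(f"item contains a comma: {item!r}")
    return "\n".join(items).encode("utf-8")


def current_items() -> list[str]:
    """Placeholder: replace with a lookup of items added since the last run."""
    return ["B09V3KXJPB", "B08N5WRWNW"]


class InputItemsHandler(BaseHTTPRequestHandler):
    # Assumption: the item list is requested with GET and read as plain text.
    def do_GET(self) -> None:
        body = render_items(current_items())
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the item list until interrupted."""
    HTTPServer(("0.0.0.0", port), InputItemsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; the URL it is reachable at is what goes in `webhookUrl`.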
| scrapingInterval | Description |
|-------------------|-------------|
| "once" | Run a single job immediately |
| "hourly" | Every hour |
| "daily" | Once per day |
| "weekly" | Once per week |
| "monthly" | Once per month |
| "cron" | Custom cron expression (use cron field instead of interval) |
Recurring schedules (hourly, daily, weekly, monthly, cron) require a paid plan.
Set "scheduledAt": "now" to trigger the first run immediately when the project is created.
Results are POSTed to your webhook URL as they complete. The webhookEncoding field controls the format:

```json
{
  "webhookOutput": {
    "url": "https://yourapp.com/results",
    "webhookEncoding": "multipart_form_data_encoding"
  }
}
```
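On the receiving side, a minimal handler only needs to accept the POST and persist what arrived. The sketch below stores each request body unparsed, together with its Content-Type header (which carries the multipart boundary), so parsing can happen later. The multipart field names are not covered in this reference, so the code does not assume any; it does assume the request carries a Content-Length header. `store_delivery`, `ResultsHandler`, and the `deliveries` directory are hypothetical names.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def store_delivery(body: bytes, content_type: str, out_dir: Path) -> Path:
    """Save a raw webhook body and its Content-Type; return the body path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = f"delivery-{time.time_ns()}"
    body_path = out_dir / f"{stem}.bin"
    body_path.write_bytes(body)
    # Keep the header next to the body: it is needed to split the parts later.
    (out_dir / f"{stem}.content-type").write_text(content_type, encoding="utf-8")
    return body_path


class ResultsHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Assumption: the sender provides Content-Length (no chunked encoding).
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        store_delivery(body, self.headers.get("Content-Type", ""), Path("deliveries"))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Accept result deliveries until interrupted."""
    HTTPServer(("0.0.0.0", port), ResultsHandler).serve_forever()
```

Acknowledging quickly and parsing afterwards keeps the handler simple and makes it harder to lose a delivery to a parsing bug.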
Omit webhookOutput and results are saved for download in the DataPipeline dashboard. Results are retained for 30 days, then automatically deleted.
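For urls and urls_with_js projects, whose output is HTML wrapped in JSONL, a downloaded file can be processed one record at a time. The sketch below is a generic JSONL reader; the keys inside each record are not documented in this reference, so it yields the parsed objects as-is and leaves field access to the caller. `read_jsonl` and the file name in the usage note are illustrative.

```python
import json
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_number} is not valid JSON") from exc
```

A first pass such as `for record in read_jsonl("results.jsonl"): print(sorted(record))` shows which keys a given export actually contains before any of them are relied on.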
Output formats by project type:
urls / urls_with_js → HTML wrapped in JSONL

```python
# List all projects
projects = requests.get(f"{BASE}/projects", params={"api_key": API_KEY}).json()

# Get a single project
project = requests.get(f"{BASE}/projects/525", params={"api_key": API_KEY}).json()

# Update (partial update: include only the fields to change)
requests.patch(
    f"{BASE}/projects/525",
    params={"api_key": API_KEY},
    json={
        "scrapingInterval": "daily",
        "apiParams": {"premium": True},
        "notificationConfig": {"notifyOnSuccess": "never"},
    },
)

# Delete / archive (irreversible without support)
requests.delete(f"{BASE}/projects/525", params={"api_key": API_KEY})
```
Updatable fields: scrapingInterval, scheduledAt, outputFormat, apiParams, notificationConfig.
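Because only those fields can be changed, a PATCH body can be screened before it is sent. This is a minimal sketch: `build_patch` is a hypothetical helper that encodes the list above and nothing else.

```python
UPDATABLE_FIELDS = {
    "scrapingInterval",
    "scheduledAt",
    "outputFormat",
    "apiParams",
    "notificationConfig",
}


def build_patch(changes: dict) -> dict:
    """Return a PATCH body, rejecting any field that is not updatable."""
    rejected = sorted(set(changes) - UPDATABLE_FIELDS)
    if rejected:
        raise ValueError(f"not updatable via PATCH: {', '.join(rejected)}")
    return dict(changes)


print(build_patch({"scrapingInterval": "daily"}))  # → {'scrapingInterval': 'daily'}
```

Passing something like `{"projectType": "urls"}` raises `ValueError`, which is a reminder that changing the project type means creating a new project.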
```python
# List jobs for a project
jobs = requests.get(
    f"{BASE}/projects/525/jobs",
    params={"api_key": API_KEY},
).json()

# Cancel a running job
job_id = "YOUR_JOB_ID"  # placeholder: use an id taken from the jobs listing above
requests.delete(
    f"{BASE}/projects/525/jobs/{job_id}",
    params={"api_key": API_KEY},
)
# Running requests within the job finish first; final status becomes "Cancelled"
```
A new job can only start if no other job for that project is currently running.
```json
{
  "notificationConfig": {
    "notifyOnSuccess": "with_every_run",
    "notifyOnFailure": "with_every_run"
  }
}
```
Options for both fields: "never", "with_every_run", "daily", "weekly".
apiParams Reference

All standard ScraperAPI parameters are supported inside apiParams:
| Parameter | Purpose |
|-----------|---------|
| country_code | Geotarget (e.g. "us", "gb") |
| render | JavaScript rendering |
| premium | Premium residential proxies |
| ultra_premium | Ultra-premium proxies (mutually exclusive with premium) |
| device_type | "desktop" or "mobile" |
| output_format | "text" or "markdown" for LLM pipelines |
| autoparse | Structured JSON extraction for supported sites |
| keep_headers | Forward custom headers |
| follow_redirect | Control redirect handling |
| wait_for_selector | Wait for CSS selector (requires render: true) |
| screenshot | Capture screenshot (auto-enables rendering) |
| retry_404 | Retry 404 responses |
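
Two constraints in the table can be checked mechanically: premium and ultra_premium are mutually exclusive, and wait_for_selector requires render. The sketch below covers only those two rules; `check_api_params` is a hypothetical helper, and it deliberately does not guess whether screenshot's automatic rendering also satisfies wait_for_selector.

```python
def check_api_params(params: dict) -> list[str]:
    """Return a list of problems found in an apiParams dict (empty if none)."""
    problems = []
    if params.get("premium") and params.get("ultra_premium"):
        problems.append("premium and ultra_premium are mutually exclusive")
    if "wait_for_selector" in params and not params.get("render"):
        problems.append("wait_for_selector requires render to be true")
    return problems


print(check_api_params({"wait_for_selector": "#price"}))
# → ['wait_for_selector requires render to be true']
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.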
DataPipeline uses the same underlying credit rates as the Standard API. Cost is the sum of all requests in a job run. Preview the estimated cost before launching a project from the dashboard.
Only successful 200 and 404 responses are charged; failed requests are not.
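As a worked example of that billing rule, the sketch below counts only 200 and 404 responses and multiplies by a per-request rate. The rate is supplied by the caller because it depends on the parameters in use; the value 5 in the example is purely illustrative, not a published price.

```python
def billable_credits(status_codes: list[int], credits_per_request: int) -> int:
    """Credits charged for a job run: only 200 and 404 responses count."""
    charged = sum(1 for code in status_codes if code in (200, 404))
    return charged * credits_per_request


# Five requests, of which three are billable (200, 200, 404).
statuses = [200, 200, 404, 500, 429]
print(billable_credits(statuses, credits_per_request=5))  # → 15
```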
| Limit | Value |
|-------|-------|
| Max input items | 100,000 per job |
| Direct list input | 500 items |
| Data retention | 30 days |
| Free plan concurrency | 5 connections |
| Free plan scheduling | One-time runs only |
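
The two input limits suggest a simple decision before creating a project. The sketch below is one way to encode it; the assumption that CSV upload or webhook input is the route for lists above 500 items is an inference from the limits, and `pick_input_method` and `split_batches` are hypothetical helpers.

```python
MAX_ITEMS_PER_JOB = 100_000
MAX_DIRECT_LIST_ITEMS = 500


def pick_input_method(item_count: int) -> str:
    """Suggest an input method for a given number of items."""
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    if item_count <= MAX_DIRECT_LIST_ITEMS:
        return "list"
    if item_count <= MAX_ITEMS_PER_JOB:
        # Assumption: larger inputs go through CSV upload or a webhook.
        return "csv_or_webhook"
    raise ValueError("too many items for one job; split them with split_batches")


def split_batches(items: list[str], size: int = MAX_ITEMS_PER_JOB) -> list[list[str]]:
    """Split items into consecutive batches of at most size elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


print(pick_input_method(2))       # → list
print(pick_input_method(10_000))  # → csv_or_webhook
```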