skills/scraperapi-async/SKILL.md
Product-usage reference for ScraperAPI's Async Jobs API — submit scraping jobs in the background and retrieve results via polling or webhook, including batch jobs up to 50,000 URLs. Consult when the user is scraping many URLs, needs non-blocking requests, or wants webhook delivery. Use when user asks: "how do I scrape 1000 URLs with ScraperAPI", "ScraperAPI async jobs", "batch scraping with ScraperAPI", "submit a scraping job and poll for results", "ScraperAPI webhook callback", "scrape URLs in the background", "ScraperAPI batchjobs endpoint". Covers single jobs, batch jobs (up to 50k URLs), webhook callbacks, all apiParams, async-exclusive parameters, binary response decoding, retention policy, and error handling.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

```bash
npx skillsauth add scraperapi/scraperapi-skills scraperapi-async
```
The Async API submits scraping jobs in the background and retries them for up to 24 hours to maximize success. Results are retrieved by polling a status URL or received automatically via webhook.
Use Async when you are scraping 20+ URLs, the target site is slow or flaky, you want webhook delivery, or you need to scrape PDFs/images. Otherwise, the Standard API (api.scraperapi.com) is simpler and returns results inline.
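The rule of thumb above can be written as a tiny decision helper. should_use_async is a hypothetical name, not part of any ScraperAPI SDK; the 20-URL threshold and the three other conditions come straight from the guidance above.

```python
def should_use_async(
    num_urls: int,
    flaky_target: bool = False,
    want_webhook: bool = False,
    binary_content: bool = False,
) -> bool:
    """Return True when the Async API is the better fit (hypothetical helper)."""
    # Any single condition from the guidance is enough to prefer Async.
    return num_urls >= 20 or flaky_target or want_webhook or binary_content


print(should_use_async(5))                       # small job: Standard API
print(should_use_async(500))                     # many URLs: Async API
print(should_use_async(1, binary_content=True))  # PDF or image: Async API
```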
| Action | Method | URL |
|--------|--------|-----|
| Submit single job | POST | https://async.scraperapi.com/jobs |
| Submit batch (up to 50k) | POST | https://async.scraperapi.com/batchjobs |
| Check / retrieve job | GET | https://async.scraperapi.com/jobs/<jobId> |
| Cancel job | DELETE | https://async.scraperapi.com/jobs/<jobId> |
Auth: pass apiKey in the JSON request body. Note the camelCase apiKey, unlike the Standard API's api_key.
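The endpoint table and the auth rule can be folded into a small request builder. This is a sketch under stated assumptions: build_submission is a hypothetical helper, and it routes a single URL to /jobs as "url" and several URLs to /batchjobs as "urls", mirroring the examples in this guide. It only builds the request; it does not send it.

```python
from __future__ import annotations

from typing import Any

ASYNC_BASE = "https://async.scraperapi.com"
MAX_BATCH = 50_000  # documented per-request limit for /batchjobs


def build_submission(
    api_key: str,
    urls: list[str],
    api_params: dict[str, Any] | None = None,
) -> tuple[str, dict[str, Any]]:
    """Return (endpoint, JSON body) for a single or batch async job.

    Hypothetical helper: one URL goes to /jobs as "url", several URLs go to
    /batchjobs as "urls". The key is always sent as camelCase "apiKey".
    """
    if not urls:
        raise ValueError("need at least one URL")
    if len(urls) > MAX_BATCH:
        raise ValueError(f"/batchjobs accepts at most {MAX_BATCH:,} URLs")
    body: dict[str, Any] = {"apiKey": api_key}
    if len(urls) == 1:
        endpoint = f"{ASYNC_BASE}/jobs"
        body["url"] = urls[0]
    else:
        endpoint = f"{ASYNC_BASE}/batchjobs"
        body["urls"] = list(urls)
    if api_params:
        body["apiParams"] = dict(api_params)
    return endpoint, body


endpoint, body = build_submission("YOUR_KEY", ["https://example.com/"], {"render": True})
print(endpoint)
print(body)
```

The result can be passed straight to requests.post(endpoint, json=body).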
```python
import os, requests, time

API_KEY = os.environ["SCRAPERAPI_API_KEY"]

# Submit
r = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": API_KEY,
        "url": "https://example.com/product/123",
        "apiParams": {
            "render": True,
            "country_code": "us",
        },
    },
)
r.raise_for_status()  # surfaces 401/403/429 submission errors
job = r.json()
# {"id": "...", "status": "running", "statusUrl": "...", "url": "..."}

# Poll
def poll(status_url, interval=5, max_wait=120):
    deadline = time.time() + max_wait
    while time.time() < deadline:
        data = requests.get(status_url).json()
        if data["status"] == "finished":
            return data["response"]["body"]
        if data["status"] == "failed":
            raise RuntimeError(f"Job failed: {data.get('failReason')}")
        time.sleep(interval)
    raise TimeoutError("Job did not finish in time")

html = poll(job["statusUrl"])
```
Finished job response shape:
```json
{
  "id": "...",
  "status": "finished",
  "statusUrl": "...",
  "url": "https://example.com/product/123",
  "response": {
    "headers": { "content-type": "text/html", "sa-final-url": "...", "sa-statuscode": "200" },
    "body": "<!doctype html>...",
    "statusCode": 200
  }
}
```
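A status of "finished" only means the job completed; the troubleshooting table further down lists a finished job whose statusCode is 403 because the target blocked it. A minimal check, assuming the response shape shown above (is_success is a hypothetical helper name):

```python
def is_success(job: dict) -> bool:
    """True only for a finished job whose target answered with HTTP 200."""
    if job.get("status") != "finished":
        return False
    # statusCode is the target's HTTP status, reported inside "response"
    return job.get("response", {}).get("statusCode") == 200


finished_ok = {"status": "finished", "response": {"statusCode": 200, "body": "ok"}}
finished_blocked = {"status": "finished", "response": {"statusCode": 403, "body": ""}}
print(is_success(finished_ok), is_success(finished_blocked))
```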
```python
jobs = requests.post(
    "https://async.scraperapi.com/batchjobs",
    json={
        "apiKey": API_KEY,
        "urls": [
            "https://example.com/page/1",
            "https://example.com/page/2",
            # ... up to 50,000
        ],
        "apiParams": {"country_code": "us"},
    },
).json()
# Returns a list of {id, status, statusUrl, url}, one per submitted URL
results = [poll(job["statusUrl"]) for job in jobs]
```
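The list comprehension above polls jobs one after another. A sketch that polls them in parallel threads with the standard library's ThreadPoolExecutor; poll_all is a hypothetical helper, and the default of 16 workers is an arbitrary assumption, not a documented ScraperAPI limit. The offline demo uses a stand-in for poll().

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def poll_all(
    status_urls: list[str],
    poll_fn: Callable[[str], str],
    max_workers: int = 16,
) -> list[str]:
    """Run poll_fn over every status URL in parallel threads.

    Results come back in the same order as status_urls. If any call raises,
    the exception is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(poll_fn, status_urls))


def fake_poll(status_url: str) -> str:
    """Offline stand-in for the poll() function defined earlier."""
    return f"body of {status_url}"


print(poll_all(["s1", "s2", "s3"], fake_poll))
```

In practice this would be called as poll_all([job["statusUrl"] for job in jobs], poll). Because one failed job aborts the whole list, wrap poll_fn in a try/except if you want partial results.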
For workloads over 50,000 URLs, split into multiple batch requests. Use webhooks (below) instead of polling when batches are large — polling 10,000 status URLs serially is slow.
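Splitting a large URL list into batch-sized pieces is a plain slicing job. A minimal sketch (chunked is a hypothetical helper); each chunk would become the "urls" list of one /batchjobs request:

```python
from __future__ import annotations

from collections.abc import Iterator

MAX_BATCH = 50_000  # documented per-request limit for /batchjobs


def chunked(urls: list[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs, preserving order."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


urls = [f"https://example.com/page/{i}" for i in range(120_000)]
sizes = [len(chunk) for chunk in chunked(urls)]
print(sizes)  # [50000, 50000, 20000]
```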
Use webhooks to receive results without polling. ScraperAPI POSTs the completed job payload to your URL when the scrape finishes.
```python
requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": API_KEY,
        "url": "https://example.com/",
        "callback": {
            "type": "webhook",
            "url": "https://yourapp.com/scraperapi/callback",
        },
    },
)
```
Webhook mechanics:

- Set "expectUnsuccessReport": true to also receive failed job payloads.

Failed job callback payload:
```json
{
  "id": "...",
  "attempts": 50,
  "status": "failed",
  "failReason": "failed_due_to_timeout",
  "url": "https://example.com/"
}
```
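A receiving endpoint only needs to accept a POST with a JSON body and branch on status. This sketch uses the standard library's http.server and rests on two assumptions: that a successful callback carries the same shape as the finished job response shown earlier, and that replying 200 with an empty body is an adequate acknowledgement. handle_callback, CallbackHandler, and run are hypothetical names.

```python
from __future__ import annotations

import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_callback(payload: dict) -> str:
    """Decide what to do with one webhook payload (hypothetical logic)."""
    job_id = payload.get("id")
    if payload.get("status") == "finished":
        body = payload.get("response", {}).get("body", "")
        return f"stored {job_id} ({len(body)} chars)"
    # Failed payloads only arrive if the job set "expectUnsuccessReport": true
    return f"failed {job_id}: {payload.get('failReason')}"


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts the POST that ScraperAPI sends to the callback URL."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print(handle_callback(payload))
        # Reply 200 with an empty body to acknowledge receipt (assumption)
        self.send_response(200)
        self.end_headers()


def run(port: int = 8000) -> None:
    """Serve callbacks on the given port until interrupted."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling run() starts the listener; it must be reachable at the public URL given in callback.url. In an existing web app, the same handle_callback logic would normally live in a route of your framework instead.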
Full request body reference (url is used with /jobs and urls with /batchjobs, as in the examples above):

```json
{
  "apiKey": "YOUR_KEY",
  "url": "https://example.com",
  "urls": ["url1", "url2"],
  "method": "GET",
  "headers": { "Accept-Language": "en-US" },
  "body": "foo=bar",
  "callback": { "type": "webhook", "url": "https://..." },
  "expectUnsuccessReport": false,
  "timeoutSec": 600,
  "meta": { "jobLabel": "batch-42" },
  "apiParams": {
    "autoparse": false,
    "country_code": "us",
    "keep_headers": false,
    "device_type": "desktop",
    "follow_redirect": true,
    "premium": false,
    "ultra_premium": false,
    "render": false,
    "wait_for_selector": ".content",
    "screenshot": false,
    "retry_404": false,
    "output_format": "html",
    "max_cost": 10
  }
}
```
Async-exclusive parameters:

| Parameter | Type | Purpose |
|-----------|------|---------|
| expectUnsuccessReport | boolean | Receive webhook payload for failed jobs too |
| timeoutSec | integer | Override default job timeout (seconds) |
| meta | object | Custom metadata — echoed back in every response/callback for correlation |
meta is especially useful for tracking which batch or workflow a job belongs to:
```json
{ "meta": { "batchId": "run-2024-06", "sourceFile": "urls.csv" } }
```
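Because meta is echoed back, results can be regrouped without keeping a separate lookup table. A sketch that assumes meta appears as a top-level key of each returned job, as it does in the request body; group_by_batch is a hypothetical helper, and jobs without a batchId land under "unlabeled".

```python
from __future__ import annotations

from collections import defaultdict


def group_by_batch(jobs: list[dict]) -> dict[str, list[str]]:
    """Group job ids by the batchId carried in each job's echoed meta."""
    groups: dict[str, list[str]] = defaultdict(list)
    for job in jobs:
        batch_id = job.get("meta", {}).get("batchId", "unlabeled")
        groups[batch_id].append(job["id"])
    return dict(groups)


jobs = [
    {"id": "j1", "meta": {"batchId": "run-2024-06"}},
    {"id": "j2", "meta": {"batchId": "run-2024-06"}},
    {"id": "j3", "meta": {"batchId": "run-2024-07"}},
    {"id": "j4"},
]
print(group_by_batch(jobs))
```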
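To send a POST request to the target, set method, headers, and body at the top level of the job request, next to url:

```python
requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": API_KEY,
        "url": "https://api.example.com/search",
        "method": "POST",
        "headers": {"content-type": "application/x-www-form-urlencoded"},
        "body": "query=scraperapi&page=1",
    },
)
```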
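Hand-written form bodies break as soon as a value contains a space or an ampersand. A sketch that builds the body with the standard library's urllib.parse.urlencode instead; the search term and the job dict are illustrative values only.

```python
from urllib.parse import urlencode

# urlencode escapes spaces and special characters for the form body
form_body = urlencode({"query": "web scraping", "page": 1})
print(form_body)  # query=web+scraping&page=1

job = {
    "apiKey": "YOUR_KEY",
    "url": "https://api.example.com/search",
    "method": "POST",
    "headers": {"content-type": "application/x-www-form-urlencoded"},
    "body": form_body,
}
```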
When the target URL returns binary content, the response body is Base64-encoded in response.base64EncodedBody.
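```python
import base64, time

r = requests.post(
    "https://async.scraperapi.com/jobs",
    json={"apiKey": API_KEY, "url": "https://example.com/report.pdf"},
)
job = r.json()
# Wait until the job finishes (a webhook works too)
while True:
    result = requests.get(job["statusUrl"]).json()
    if result["status"] == "finished":
        break
    if result["status"] == "failed":
        raise RuntimeError(f"Job failed: {result.get('failReason')}")
    time.sleep(5)
# Binary bodies arrive Base64-encoded
pdf_bytes = base64.b64decode(result["response"]["base64EncodedBody"])
with open("report.pdf", "wb") as f:
    f.write(pdf_bytes)
```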
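When one pipeline handles both HTML pages and binary files, a single accessor avoids branching at every call site. A sketch assuming a finished response carries either body or base64EncodedBody; extract_content is a hypothetical helper.

```python
from __future__ import annotations

import base64


def extract_content(job: dict) -> str | bytes:
    """Return text for HTML/JSON jobs and decoded bytes for binary jobs."""
    response = job["response"]
    if "base64EncodedBody" in response:
        # Binary targets (PDFs, images) come back Base64-encoded
        return base64.b64decode(response["base64EncodedBody"])
    return response["body"]


pdf_job = {"response": {"base64EncodedBody": base64.b64encode(b"%PDF-1.7").decode()}}
html_job = {"response": {"body": "hello"}}
print(extract_content(pdf_job), extract_content(html_job))
```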
Job results are stored for up to 72 hours (24 hours guaranteed) after the job finishes. After that, the data is deleted — resubmit the job if you need it again.
Retrieve results before the retention window closes. For long pipelines, prefer webhooks so results are pushed to your system immediately upon completion.
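The responses shown in this guide carry no finish timestamp, so a safe fetch deadline has to come from a time you record yourself. A sketch that uses the submission time: a job cannot finish before it was submitted, so submission time plus the guaranteed 24 hours is never later than the real guaranteed window. fetch_deadline and within_guaranteed_window are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

# Documented minimum retention after a job finishes (it may last up to 72 hours).
GUARANTEED_RETENTION = timedelta(hours=24)


def fetch_deadline(submitted_at: datetime) -> datetime:
    """Conservative time by which a job's result should be fetched."""
    return submitted_at + GUARANTEED_RETENTION


def within_guaranteed_window(submitted_at: datetime, now: datetime) -> bool:
    """True while `now` is still before the conservative deadline."""
    return now < fetch_deadline(submitted_at)


submitted = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
print(fetch_deadline(submitted).isoformat())  # 2024-06-02T12:00:00+00:00
```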
| Status | Meaning | Action |
|--------|---------|--------|
| Job finished, statusCode: 200 | Success | Use response.body |
| Job finished, statusCode: 403 | Target blocked the scrape | Retry with premium: true in apiParams |
| Job failed, failReason: failed_due_to_timeout | Timed out after 24h retries | Check if target is reachable; try render: false |
| HTTP 401 on submission | Bad API key | Check SCRAPERAPI_API_KEY |
| HTTP 403 on submission | Out of credits or plan limit | Check dashboard |
| HTTP 429 on submission | Too many concurrent submissions | Back off and re-submit in batches |
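The table above can be encoded as two lookups: one for a polled job, one for the HTTP status of the submission POST itself. This is a hypothetical helper whose action strings paraphrase the table; treating any status other than finished or failed as still in progress is an assumption.

```python
SUBMISSION_ERRORS = {
    401: "bad API key: check SCRAPERAPI_API_KEY",
    403: "out of credits or plan limit: check the dashboard",
    429: "too many concurrent submissions: back off and re-submit in batches",
}


def next_action(job: dict) -> str:
    """Map a polled job to the action suggested by the table."""
    status = job.get("status")
    if status == "failed":
        if job.get("failReason") == "failed_due_to_timeout":
            return "check the target is reachable; try render: false"
        return f"inspect failReason: {job.get('failReason')}"
    if status != "finished":
        return "keep polling"  # assumption: other statuses mean still in progress
    code = job.get("response", {}).get("statusCode")
    if code == 200:
        return "use response.body"
    if code == 403:
        return "resubmit with premium: true in apiParams"
    return f"unexpected target statusCode {code}"


def submission_action(http_status: int) -> str:
    """Map the HTTP status of the submission POST to an action."""
    if http_status < 400:
        return "submitted"
    return SUBMISSION_ERRORS.get(http_status, "unexpected submission error")


print(next_action({"status": "finished", "response": {"statusCode": 403}}))
print(submission_action(429))
```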
Use max_cost in apiParams to cap per-request credit spend: requests that would exceed the cap return a 403 rather than consuming more credits than expected.
The Async API uses the same credit costs as the Standard API:
| Request type | Credits |
|---|---|
| Standard | 1 |
| render: true | 10 |
| premium: true | 10 |
| ultra_premium: true | 30 |
| Failed requests | 0 |
Async jobs that fail after exhausting all retries are not charged.
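The credit table turns into a quick budget check before submitting a large batch. A sketch using only the per-request costs listed above; since the table gives no price for combining options (for example render with premium), the hypothetical estimate_credits helper refuses such combinations rather than guessing. Failed jobs are free, so the result is an upper bound.

```python
# Per-request costs from the table above; combined options are not listed there.
CREDITS_PER_REQUEST = {"standard": 1, "render": 10, "premium": 10, "ultra_premium": 30}


def estimate_credits(
    num_urls: int,
    render: bool = False,
    premium: bool = False,
    ultra_premium: bool = False,
) -> int:
    """Worst-case credits for a job set, assuming every URL succeeds."""
    options = {"render": render, "premium": premium, "ultra_premium": ultra_premium}
    enabled = [name for name, on in options.items() if on]
    if len(enabled) > 1:
        raise ValueError(f"no documented price for combining {enabled}")
    per_request = CREDITS_PER_REQUEST[enabled[0] if enabled else "standard"]
    return num_urls * per_request


print(estimate_credits(1_000))               # 1000
print(estimate_credits(1_000, render=True))  # 10000
```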