skills/scraperapi-php-sdk/SKILL.md
Best-practices reference for the ScraperAPI PHP SDK (scraperapi/sdk Composer package). Consult whenever the user is writing, debugging, or reviewing PHP code that calls ScraperAPI. Use when user asks: "scrape a website with PHP and ScraperAPI", "ScraperAPI PHP example", "how do I use the ScraperAPI PHP SDK", "PHP ScraperAPI render", "ScraperAPI PHP premium proxy", "ScraperAPI PHP Composer install", "ScraperAPI PHP error handling". Covers Composer setup, all request parameters, the escalation ladder, POST requests, error handling, and credit costs.
Install this skill globally with one command:
npx skillsauth add scraperapi/scraperapi-skills scraperapi-php-sdk
Works with Claude Code, Cursor, and Windsurf.
Requires: PHP 7.0+, Composer, the scraperapi/sdk package (install with composer require scraperapi/sdk), and the SCRAPERAPI_API_KEY environment variable.
<?php
require __DIR__ . '/vendor/autoload.php';
use ScraperAPI\Client;
$client = new Client(getenv('SCRAPERAPI_API_KEY'));
Never hardcode the API key. Read it from the environment every time.
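One way to enforce this rule is to fail fast when the variable is missing, rather than sending requests with an empty key. A minimal sketch; `requireApiKey` is a hypothetical helper name, not part of the SDK.

```php
<?php
/**
 * Hypothetical helper (not part of the SDK): read the API key from the
 * environment and fail loudly if it is missing or empty.
 */
function requireApiKey(string $name = 'SCRAPERAPI_API_KEY'): string
{
    $key = getenv($name); // returns false when the variable is unset
    if ($key === false || trim($key) === '') {
        throw new RuntimeException("Environment variable {$name} is not set");
    }
    return trim($key);
}
```

With this in place, the client can be constructed as `new Client(requireApiKey())`.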
// Simple GET — returns HTML string via ->raw_body
$html = $client->get("https://example.com/")->raw_body;
echo $html;
// With a single parameter
$html = $client->get("https://example.com/", ["render" => true])->raw_body;
// With multiple parameters
$html = $client->get(
    "https://example.com/",
    [
        "render" => true,
        "country_code" => "us",
    ]
)->raw_body;
Parameters are passed as an associative array. ->raw_body extracts the HTML string from the response object.
| Situation | Approach |
|-----------|---------|
| Single URL, synchronous | $client->get($url, $params)->raw_body |
| Page loads content via JavaScript | Add "render" => true |
| Site blocks datacenter proxies | Add "premium" => true |
| Toughest anti-bot protection | Add "ultra_premium" => true |
| Multi-step / paginated flow on same domain | Use "session_number" |
| POST a form or JSON body to target | $client->post($url, $options)->raw_body |
| 20+ URLs or batch jobs | Use async endpoint via cURL or Guzzle |
| Supported platform (Amazon, Google, etc.) | Use structured data endpoint directly |
// Render JavaScript before returning HTML
// Use when: page is a React/Vue/Angular SPA, or scrape returns empty/partial content
// Cost: 10 credits per request (a standard request costs 1)
$html = $client->get("https://spa-site.com/", ["render" => true])->raw_body;
// Wait for a specific DOM element (requires render: true)
$html = $client->get("https://spa-site.com/", [
    "render" => true,
    "wait_for_selector" => ".product-list",
])->raw_body;
// Screenshot (auto-enables rendering)
$html = $client->get("https://example.com/", ["screenshot" => true])->raw_body;
Start without render. Add it only when the response is missing expected content — it increases cost and latency.
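The "start without render" rule can be sketched as a two-step fetch. To keep the example independent of the SDK, the fetcher is passed in as a callable; `fetchWithRenderFallback` and the `$marker` check are illustrative assumptions, not SDK features.

```php
<?php
/**
 * Hypothetical helper: fetch without rendering first, and retry with
 * "render" => true only if an expected marker string is missing.
 *
 * $fetch is any callable taking (string $url, array $params) and returning
 * the HTML string, e.g. a thin wrapper around $client->get(...)->raw_body.
 */
function fetchWithRenderFallback(callable $fetch, string $url, string $marker, array $params = []): string
{
    $html = (string) $fetch($url, $params);
    // An empty marker means "no check": accept the cheap response as-is.
    if ($marker === '' || strpos($html, $marker) !== false) {
        return $html;
    }
    // Expected content is missing: pay for rendering once.
    return (string) $fetch($url, array_merge($params, ['render' => true]));
}
```

The marker should be a string that only appears once the page's dynamic content has loaded, such as a CSS class name from the target list or table.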
// Route through a country-specific proxy — no extra credit cost
$html = $client->get("https://example.com/", ["country_code" => "de"])->raw_body;
// Premium residential/mobile IPs — for sites that block datacenter proxies
// Cost: 10 credits (25 with render)
$html = $client->get("https://hard-site.com/", ["premium" => true])->raw_body;
// Ultra-premium — for the toughest anti-bot protections
// Cost: 30 credits (75 with render)
// Note: incompatible with custom headers — keep_headers is ignored
$html = $client->get("https://hardest-site.com/", ["ultra_premium" => true])->raw_body;
premium and ultra_premium are mutually exclusive — never set both.
Escalation order: standard (1 cr) → render (10 cr) → premium (10 cr) → ultra_premium (30 cr).
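Because the two proxy tiers must never be combined, it can help to validate parameter arrays before they reach the client. A small sketch; `assertValidProxyTier` is a hypothetical guard, not something the SDK provides.

```php
<?php
/**
 * Hypothetical guard: reject parameter arrays that enable both
 * premium and ultra_premium, and return the array unchanged otherwise.
 */
function assertValidProxyTier(array $params): array
{
    if (!empty($params['premium']) && !empty($params['ultra_premium'])) {
        throw new InvalidArgumentException('premium and ultra_premium are mutually exclusive');
    }
    return $params;
}
```

Since it returns its input, the guard can be used inline: `$client->get($url, assertValidProxyTier($params))`.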
// Reuse the same proxy IP across requests — useful for pagination and multi-step flows
// Sessions expire 15 minutes after last use; any integer is a valid session ID
$html1 = $client->get("https://example.com/page1", ["session_number" => 42])->raw_body;
$html2 = $client->get("https://example.com/page2", ["session_number" => 42])->raw_body;
// Forward custom headers to the target site
// Pass headers in the request options array alongside ScraperAPI params
// Note: keep_headers is ignored when ultra_premium is true
$html = $client->get("https://example.com/", [
    "keep_headers" => true,
])->raw_body;
// Emulate a mobile or desktop browser user-agent
$html = $client->get("https://example.com/", ["device_type" => "mobile"])->raw_body;
// Return structured JSON instead of HTML for supported sites (Amazon, Google, etc.)
$json = $client->get("https://amazon.com/dp/B09V3KXJPB", ["autoparse" => true])->raw_body;
$data = json_decode($json, true);
// Markdown output — useful for text pipelines
$md = $client->get("https://docs.example.com/", ["output_format" => "markdown"])->raw_body;
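In the autoparse example above, `json_decode` returns `null` without raising an error when the body is not valid JSON (for example, if an HTML page comes back instead). A hedged sketch of a stricter decode step; `decodeJsonBody` is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: decode a JSON response body into an array and
 * raise an exception instead of silently returning null on bad input.
 */
function decodeJsonBody(string $body): array
{
    $data = json_decode($body, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new UnexpectedValueException('Invalid JSON: ' . json_last_error_msg());
    }
    if (!is_array($data)) {
        // Valid JSON, but a bare scalar rather than an object or array.
        throw new UnexpectedValueException('Expected a JSON object or array');
    }
    return $data;
}
```

It can replace the bare `json_decode($json, true)` call wherever a structured response is expected.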
// POST a JSON body to the target site through ScraperAPI's proxy
$options = [
    "body" => json_encode(["key" => "value"]),
    "headers" => ["Content-Type" => "application/json"],
];
$result = $client->post("https://example.com/api", $options)->raw_body;
Always start with the cheapest option and escalate only when blocked.
function scrapeWithEscalation(Client $client, string $url): ?string
{
    // Tiers are ordered from cheapest to most expensive.
    $tiers = [
        [],
        ["render" => true],
        ["premium" => true],
        ["premium" => true, "render" => true],
        ["ultra_premium" => true],
    ];
    foreach ($tiers as $params) {
        $html = $client->get($url, $params)->raw_body;
        // Accept the first response that looks like an HTML document.
        if ($html && stripos($html, '<html') !== false) {
            return $html;
        }
    }
    return null; // no tier returned an HTML document
}
The SDK is synchronous: each ->get() call blocks until the response arrives (up to 70 seconds). For 20+ URLs, use the async REST endpoint.
$apiKey = getenv('SCRAPERAPI_API_KEY');
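function submitJob(string $url, array $apiParams = []): array
{
    global $apiKey;
    $ch = curl_init('https://async.scraperapi.com/jobs');
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => json_encode(['apiKey' => $apiKey, 'url' => $url, 'apiParams' => $apiParams]),
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    // curl_exec() returns false on transport errors; fail before decoding.
    if ($response === false) {
        throw new \RuntimeException("Job submission failed for {$url}: {$error}");
    }
    $job = json_decode($response, true); // ["id" => "...", "statusUrl" => "..."]
    if (!is_array($job) || !isset($job['statusUrl'])) {
        throw new \RuntimeException("Unexpected response when submitting {$url}");
    }
    return $job;
}
function pollJob(array $job, int $maxWait = 120, int $interval = 5): string
{
    $deadline = time() + $maxWait;
    while (time() < $deadline) {
        $body = file_get_contents($job['statusUrl']);
        // Treat an unreadable or malformed status response as "not ready yet".
        $data = $body === false ? null : json_decode($body, true);
        if (is_array($data)) {
            if ($data['status'] === 'finished') return $data['response']['body'];
            if ($data['status'] === 'failed') throw new \RuntimeException("Job {$job['id']} failed");
        }
        sleep($interval);
    }
    throw new \RuntimeException("Job {$job['id']} timed out");
}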
// Submit and collect
$urls = ['https://example.com/p1', 'https://example.com/p2'];
$jobs = array_map('submitJob', $urls);
$results = array_map('pollJob', $jobs);
For supported platforms, use structured endpoints instead of raw HTML — they return clean JSON without parsing logic.
function structuredGet(string $vertical, array $params = []): array
{
    global $apiKey;
    $query = http_build_query(array_merge(['api_key' => $apiKey], $params));
    $url = "https://api.scraperapi.com/structured/{$vertical}?{$query}";
    $body = file_get_contents($url);
    if ($body === false) throw new \RuntimeException("Request failed for {$vertical}");
    return json_decode($body, true);
}
// Google SERP
$results = structuredGet('google/search', ['query' => 'PHP web scraping']);
// Amazon product details
$product = structuredGet('amazon/product', ['asin' => 'B09V3KXJPB']);
// Walmart search
$items = structuredGet('walmart/search', ['query' => 'standing desk', 'tld' => 'com']);
See structured data docs for all verticals and required fields.
function safeScrape(Client $client, string $url, array $params = []): ?string
{
    try {
        return $client->get($url, $params)->raw_body;
    } catch (\Exception $e) {
        // getCode() exists on every exception; it is assumed here to carry the HTTP status.
        $status = (int) $e->getCode();
        // Keep the status code and original exception so callers can inspect them.
        switch ($status) {
            case 401: throw new \RuntimeException('Invalid API key: check SCRAPERAPI_API_KEY', $status, $e);
            case 403: throw new \RuntimeException('Blocked or out of credits: try premium or ultra_premium', $status, $e);
            case 429: throw new \RuntimeException('Rate limit: reduce concurrency or switch to async', $status, $e);
            case 500:
            case 503: throw new \RuntimeException('Transient error: retry with exponential backoff', $status, $e);
            default: throw $e;
        }
    }
}
Status code reference: 200 success, 401 bad key, 403 blocked/no credits, 404 target not found, 429 rate limit, 500/503 transient (not charged — safe to retry).
Also see retry docs.
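Since 500/503 responses are transient and not charged, a retry wrapper with exponential backoff is a natural companion to the error handling above. This is a sketch under one assumption: that the thrown exception carries the HTTP status in `getCode()`. `retryTransient` and its parameters are hypothetical names.

```php
<?php
/**
 * Hypothetical helper: retry a callable on transient failures
 * (exception code 500 or 503) with exponential backoff.
 */
function retryTransient(callable $operation, int $maxAttempts = 3, int $baseDelayMs = 500)
{
    $attempt = 0;
    while (true) {
        $attempt++;
        try {
            return $operation();
        } catch (Exception $e) {
            $transient = in_array((int) $e->getCode(), [500, 503], true);
            if (!$transient || $attempt >= $maxAttempts) {
                throw $e; // permanent error, or retries exhausted
            }
            // Delay doubles on each attempt: base, 2x base, 4x base, ...
            usleep($baseDelayMs * (2 ** ($attempt - 1)) * 1000);
        }
    }
}
```

A call would wrap the request in a closure, for example `retryTransient(function () use ($client, $url) { return $client->get($url)->raw_body; })`. Errors such as 401 or 403 are rethrown immediately, since retrying them unchanged will not help.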
| Request type | Credits |
|---|---|
| Standard | 1 |
| "render" => true | 10 |
| "premium" => true | 10 |
| "premium" => true, "render" => true | 25 |
| "ultra_premium" => true | 30 |
| "ultra_premium" => true, "render" => true | 75 |
Add "max_cost" => N to any request to cap credit spend — returns 403 if the request would cost more than N credits.
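The credit table can be turned into a small estimator for budgeting before a batch run. A sketch that encodes only the six rows listed above; `estimateCredits` is a hypothetical helper, and other parameters are assumed not to affect the cost.

```php
<?php
/**
 * Hypothetical estimator based on the credit table in this document.
 * Only render, premium, and ultra_premium are considered.
 */
function estimateCredits(array $params): int
{
    $render = !empty($params['render']);
    if (!empty($params['ultra_premium'])) {
        return $render ? 75 : 30;
    }
    if (!empty($params['premium'])) {
        return $render ? 25 : 10;
    }
    return $render ? 10 : 1;
}
```

For example, walking the full escalation ladder (standard, render, premium, premium with render, ultra_premium) would cost 1 + 10 + 10 + 25 + 30 = 76 credits if every tier were charged.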