cli-tool/components/skills/web-data/bright-data-best-practices/SKILL.md
Build production-ready Bright Data integrations with best practices baked in. Reference documentation for developers using coding assistants (Claude Code, Cursor, etc.) to implement web scraping, search, browser automation, and structured data extraction. Covers Web Unlocker API, SERP API, Web Scraper API, and Browser API (Scraping Browser).
npx skillsauth add davila7/claude-code-templates bright-data-best-practicesInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Bright Data provides infrastructure for web data extraction at scale. Four primary APIs cover different use cases — always pick the most specific tool for the job.
| Use Case | API | Why | |----------|-----|-----| | Scrape any webpage by URL (no interaction) | Web Unlocker | HTTP-based, auto-bypasses bot detection, cheapest | | Google / Bing / Yandex search results | SERP API | Specialized for SERP extraction, returns structured data | | Structured data from Amazon, LinkedIn, Instagram, TikTok, etc. | Web Scraper API | Pre-built scrapers, no parsing needed | | Click, scroll, fill forms, run JS, intercept XHR | Browser API | Full browser automation | | Puppeteer / Playwright / Selenium automation | Browser API | Connects via CDP/WebDriver |
All APIs share the same authentication model:
export BRIGHTDATA_API_KEY="your-api-key" # From Control Panel > Account Settings
export BRIGHTDATA_UNLOCKER_ZONE="zone-name" # Web Unlocker zone name
export BRIGHTDATA_SERP_ZONE="serp-zone-name" # SERP API zone name
export BROWSER_AUTH="brd-customer-ID-zone-NAME:PASSWORD" # Browser API credentials
REST API authentication header for Web Unlocker and SERP API:
Authorization: Bearer YOUR_API_KEY
HTTP-based scraping proxy. Best for simple page fetches without browser interaction.
Endpoint: POST https://api.brightdata.com/request
import requests
response = requests.post(
"https://api.brightdata.com/request",
headers={"Authorization": f"Bearer {API_KEY}"},
json={
"zone": "YOUR_ZONE_NAME",
"url": "https://example.com/product/123",
"format": "raw"
}
)
html = response.text
| Parameter | Type | Description |
|-----------|------|-------------|
| zone | string | Zone name (required) |
| url | string | Target URL with http:// or https:// (required) |
| format | string | "raw" (HTML) or "json" (structured wrapper) (required) |
| method | string | HTTP verb, default "GET" |
| country | string | 2-letter ISO for geo-targeting (e.g., "us", "de") |
| data_format | string | Transform: "markdown" or "screenshot" |
| async | boolean | true for async mode |
# Get markdown (best for LLM input)
response = requests.post(
"https://api.brightdata.com/request",
headers={"Authorization": f"Bearer {API_KEY}"},
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "markdown"}
)
# Geo-targeted request
json={"zone": ZONE, "url": url, "format": "raw", "country": "de"}
# Screenshot for debugging
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "screenshot"}
# Async for bulk processing
json={"zone": ZONE, "url": url, "format": "raw", "async": True}
Critical rule: Never use Web Unlocker with Puppeteer, Playwright, Selenium, or anti-detect browsers. Use Browser API instead.
See references/web-unlocker.md for complete reference including proxy interface, special headers, async flow, features, and billing.
Structured search engine result extraction for Google, Bing, Yandex, DuckDuckGo.
Endpoint: POST https://api.brightdata.com/request (same as Web Unlocker)
response = requests.post(
"https://api.brightdata.com/request",
headers={"Authorization": f"Bearer {API_KEY}"},
json={
"zone": "YOUR_SERP_ZONE",
"url": "https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en",
"format": "raw"
}
)
data = response.json()
for result in data.get("organic", []):
print(result["rank"], result["title"], result["link"])
| Parameter | Description | Example |
|-----------|-------------|---------|
| q | Search query | q=python+web+scraping |
| brd_json | Parsed JSON output | brd_json=1 (always use for data pipelines) |
| gl | Country for search | gl=us |
| hl | Language | hl=en |
| start | Pagination offset | start=10 (page 2), start=20 (page 3) |
| tbm | Search type | tbm=nws (news), tbm=isch (images), tbm=vid (videos) |
| brd_mobile | Device | brd_mobile=1 (mobile), brd_mobile=ios |
| brd_browser | Browser | brd_browser=chrome |
| brd_ai_overview | Trigger AI Overview | brd_ai_overview=2 |
| uule | Encoded geo location | for precise location targeting |
Note: num parameter is deprecated as of September 2025. Use start for pagination.
{
"organic": [{"rank": 1, "global_rank": 1, "title": "...", "link": "...", "description": "..."}],
"paid": [],
"people_also_ask": [],
"knowledge_graph": {},
"related_searches": [],
"general": {"results_cnt": 1240000000, "query": "..."}
}
| Parameter | Description |
|-----------|-------------|
| q | Search query |
| setLang | Language (prefer 4-letter: en-US) |
| cc | Country code |
| first | Pagination (increment by 10: 1, 11, 21...) |
| safesearch | off, moderate, strict |
| brd_mobile | Device type |
# Submit
response = requests.post(
"https://api.brightdata.com/request",
params={"async": "1"},
headers={"Authorization": f"Bearer {API_KEY}"},
json={"zone": SERP_ZONE, "url": "https://www.google.com/search?q=test&brd_json=1", "format": "raw"}
)
response_id = response.headers.get("x-response-id")
# Retrieve (retrieve calls are NOT billed)
result = requests.get(
"https://api.brightdata.com/serp/get_result",
params={"response_id": response_id},
headers={"Authorization": f"Bearer {API_KEY}"}
)
Billing: Pay per 1,000 successful requests only. Async retrieve calls are not billed.
See references/serp-api.md for complete reference including Maps, Trends, Reviews, Lens, Hotels, Flights parameters.
Pre-built scrapers for structured data extraction from 100+ platforms. No parsing logic needed.
Sync Endpoint: POST https://api.brightdata.com/datasets/v3/scrape
Async Endpoint: POST https://api.brightdata.com/datasets/v3/trigger
# Sync (up to 20 URLs, returns immediately)
response = requests.post(
"https://api.brightdata.com/datasets/v3/scrape",
params={"dataset_id": "YOUR_DATASET_ID", "format": "json"},
headers={"Authorization": f"Bearer {API_KEY}"},
json={"input": [{"url": "https://www.amazon.com/dp/B09X7M8TBQ"}]}
)
if response.status_code == 200:
data = response.json() # Results ready
elif response.status_code == 202:
snapshot_id = response.json()["snapshot_id"] # Poll for completion
| Parameter | Type | Description |
|-----------|------|-------------|
| dataset_id | string | Scraper identifier from the Scraper Library (required) |
| format | string | json (default), ndjson, jsonl, csv |
| custom_output_fields | string | Pipe-separated fields: url\|title\|price |
| include_errors | boolean | Include error info in results |
{
"input": [
{ "url": "https://www.amazon.com/dp/B09X7M8TBQ" },
{ "url": "https://www.amazon.com/dp/B0B7CTCPKN" }
]
}
import time
# Trigger
snapshot_id = requests.post(
"https://api.brightdata.com/datasets/v3/trigger",
params={"dataset_id": DATASET_ID, "format": "json"},
headers={"Authorization": f"Bearer {API_KEY}"},
json={"input": [{"url": u} for u in urls]}
).json()["snapshot_id"]
# Poll
while True:
status = requests.get(
f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}",
headers={"Authorization": f"Bearer {API_KEY}"}
).json()["status"]
if status == "ready": break
if status == "failed": raise Exception("Job failed")
time.sleep(10)
# Download
data = requests.get(
f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}",
params={"format": "json"},
headers={"Authorization": f"Bearer {API_KEY}"}
).json()
Progress status values: starting → running → ready | failed
Data retention: 30 days.
Billing: Per delivered record. Invalid input URLs that fail are still billable.
See references/web-scraper-api.md for complete reference including scraper types, output formats, delivery options, and billing details.
Full browser automation via CDP/WebDriver. Handles CAPTCHA, fingerprinting, and anti-bot detection automatically.
Connection:
wss://${AUTH}@brd.superproxy.io:9222https://${AUTH}@brd.superproxy.io:9515const { chromium } = require("playwright-core");
const AUTH = process.env.BROWSER_AUTH;
const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
const page = await browser.newPage();
page.setDefaultNavigationTimeout(120000); // Always set to 2 minutes
await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
const html = await page.content();
await browser.close();
from playwright.async_api import async_playwright
async with async_playwright() as p:
browser = await p.chromium.connect_over_cdp(f"wss://{AUTH}@brd.superproxy.io:9222")
page = await browser.new_page()
page.set_default_navigation_timeout(120000)
await page.goto("https://example.com", wait_until="domcontentloaded")
html = await page.content()
await browser.close()
| Function | Purpose |
|----------|---------|
| Captcha.solve | Manually trigger CAPTCHA solving |
| Captcha.setAutoSolve | Enable/disable auto CAPTCHA solving |
| Proxy.setLocation | Set precise geo location (call BEFORE goto) |
| Proxy.useSession | Maintain same IP across sessions |
| Emulation.setDevice | Apply device profile (iPhone 14, etc.) |
| Emulation.getSupportedDevices | List available device profiles |
| Unblocker.enableAdBlock | Block ads to save bandwidth |
| Unblocker.disableAdBlock | Re-enable ads |
| Input.type | Fast text input for bulk form filling |
| Browser.addCertificate | Install client SSL cert for session |
| Page.inspect | Get DevTools debug URL for live session |
// CDP session pattern for custom functions
const client = await page.target().createCDPSession();
// CAPTCHA solve with timeout
const result = await client.send("Captcha.solve", { timeout: 30000 });
// Precise geo location (must be before goto)
await client.send("Proxy.setLocation", {
latitude: 37.7749,
longitude: -122.4194,
distance: 10,
strict: true
});
// Block unnecessary resources
await client.send("Network.setBlockedURLs", { urls: ["*google-analytics*", "*.ads.*"] });
// Device emulation
await client.send("Emulation.setDevice", { deviceName: "iPhone 14" });
-country-us to credentials username-country-eu (routes through 29+ European countries)Proxy.setLocation CDP command (before navigation)| Code | Issue | Fix |
|------|-------|-----|
| 407 | Wrong port | Playwright/Puppeteer → 9222, Selenium → 9515 |
| 403 | Bad auth | Check credentials format and zone type |
| 503 | Service scaling | Wait 1 minute, reconnect |
Billing: Traffic-based only. Block images/CSS/fonts to reduce costs.
See references/browser-api.md for complete reference including all CDP functions, bandwidth optimization, CAPTCHA patterns, and debugging.
tools
No-code automation democratizes workflow building. Zapier and Make (formerly Integromat) let non-developers automate business processes without writing code. But no-code doesn't mean no-complexity - these platforms have their own patterns, pitfalls, and breaking points. This skill covers when to use which platform, how to build reliable automations, and when to graduate to code-based solutions. Key insight: Zapier optimizes for simplicity and integrations (7000+ apps), Make optimizes for power
tools
Use only when the user explicitly asks to stage, commit, push, and open a GitHub pull request in one flow using the GitHub CLI (`gh`).
tools
Workflow automation is the infrastructure that makes AI agents reliable. Without durable execution, a network hiccup during a 10-step payment flow means lost money and angry customers. With it, workflows resume exactly where they left off. This skill covers the platforms (n8n, Temporal, Inngest) and patterns (sequential, parallel, orchestrator-worker) that turn brittle scripts into production-grade automation. Key insight: The platforms make different tradeoffs. n8n optimizes for accessibility
development
Trigger.dev expert for background jobs, AI workflows, and reliable async execution with excellent developer experience and TypeScript-first design. Use when: trigger.dev, trigger dev, background task, ai background job, long running task.