hugging-face-skills/skills/brightdata-web-mcp/SKILL.md
Search the web, scrape websites, extract structured data from URLs, and automate browsers using Bright Data's Web MCP. Use when fetching live web content, bypassing blocks/CAPTCHAs, getting product data from Amazon/eBay, social media posts, or when standard requests fail.
npx skillsauth add patchy631/ai-engineering-hub brightdata-web-mcp
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill for reliable web access in MCP-compatible agents. Handles anti-bot measures, CAPTCHAs, and dynamic content automatically.
Tool: search_engine
Input: { "query": "latest AI news", "engine": "google" }
Returns JSON for Google, Markdown for Bing/Yandex. Use cursor parameter for pagination.
Tool: scrape_as_markdown
Input: { "url": "https://example.com/article" }
Tool: extract
Input: {
  "url": "https://example.com/product",
  "prompt": "Extract: name, price, description, availability"
}
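An MCP client normally sends these inputs for you, but it can help to see what goes over the wire. The sketch below wraps the three inputs above in JSON-RPC 2.0 `tools/call` requests, which is how MCP invokes a tool. The helper name `tool_call` is hypothetical, and session setup (the `initialize` handshake and transport) is left out.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator for this sketch


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP server.

    `tool_call` is a hypothetical helper, not part of any Bright Data SDK.
    """
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The three example inputs shown above, wrapped as MCP requests.
requests_to_send = [
    tool_call("search_engine", {"query": "latest AI news", "engine": "google"}),
    tool_call("scrape_as_markdown", {"url": "https://example.com/article"}),
    tool_call(
        "extract",
        {
            "url": "https://example.com/product",
            "prompt": "Extract: name, price, description, availability",
        },
    ),
]

for req in requests_to_send:
    print(json.dumps(req, indent=2))
```

In practice an MCP client library builds and sends these messages, so the only parts you normally write yourself are the tool name and the arguments object.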
| Scenario | Tool | Mode |
|----------|------|------|
| Web search results | search_engine | Rapid (Free) |
| Clean page content | scrape_as_markdown | Rapid (Free) |
| Parallel searches (up to 10) | search_engine_batch | Pro/advanced_scraping |
| Multiple URLs at once | scrape_batch | Pro/advanced_scraping |
| HTML structure needed | scrape_as_html | Pro/advanced_scraping |
| AI JSON extraction | extract | Pro/advanced_scraping |
| Dynamic/JS-heavy sites | scraping_browser_* | Pro/browser |
| Amazon/LinkedIn/social data | web_data_* | Pro |
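The table can also be read as a lookup from tool name to required mode, which is useful when deciding whether a task fits in the free tier. The following is a minimal sketch that only transcribes the rows above; the function names `mode_for` and `needs_pro` are hypothetical.

```python
# Mode required by each tool, transcribed from the table above.
EXACT = {
    "search_engine": "Rapid (Free)",
    "scrape_as_markdown": "Rapid (Free)",
    "search_engine_batch": "Pro/advanced_scraping",
    "scrape_batch": "Pro/advanced_scraping",
    "scrape_as_html": "Pro/advanced_scraping",
    "extract": "Pro/advanced_scraping",
}

# Tool families that the table lists with a wildcard.
PREFIXES = {
    "scraping_browser_": "Pro/browser",
    "web_data_": "Pro",
}


def mode_for(tool: str) -> str:
    """Return the mode column for a tool name (hypothetical helper)."""
    if tool in EXACT:
        return EXACT[tool]
    for prefix, mode in PREFIXES.items():
        if tool.startswith(prefix):
            return mode
    raise KeyError(f"tool not covered by the table: {tool}")


def needs_pro(tools) -> bool:
    """True if any tool in the plan falls outside the free Rapid mode."""
    return any(mode_for(t) != "Rapid (Free)" for t in tools)


plan = ["search_engine", "scrape_as_markdown", "extract"]
print({t: mode_for(t) for t in plan})
print("needs Pro:", needs_pro(plan))
```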
Remote (recommended) - No installation required:
SSE Endpoint:
https://mcp.brightdata.com/sse?token=YOUR_API_TOKEN
Streamable HTTP Endpoint:
https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN
Local:
API_TOKEN=<token> npx @brightdata/mcp
Rapid mode (free, default):
- Tools: search_engine, scrape_as_markdown

Pro mode:
- Remote: add &pro=1 to URL
- Local: set PRO_MODE=true

Select specific tool bundles instead of all Pro tools:
- Remote: &groups=ecommerce,social
- Local: GROUPS=ecommerce,social

| Group | Description | Featured Tools |
|-------|-------------|----------------|
| ecommerce | Retail & marketplace data | web_data_amazon_product, web_data_walmart_product |
| social | Social media insights | web_data_linkedin_posts, web_data_instagram_profiles |
| browser | Browser automation | scraping_browser_* |
| business | Company intelligence | web_data_crunchbase_company, web_data_zoominfo_company_profile |
| finance | Financial data | web_data_yahoo_finance_business |
| research | News & dev data | web_data_github_repository_file, web_data_reuter_news |
| app_stores | App store data | web_data_google_play_store, web_data_apple_app_store |
| travel | Travel information | web_data_booking_hotel_listings |
| advanced_scraping | Batch & AI extraction | scrape_batch, extract, search_engine_batch |
Cherry-pick individual tools:
- Remote: &tools=scrape_as_markdown,web_data_linkedin_person_profile
- Local: TOOLS=scrape_as_markdown,web_data_linkedin_person_profile

Note: GROUPS or TOOLS override PRO_MODE when specified.
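For the remote endpoint, these options are all query parameters on the same URL. The sketch below assembles the Streamable HTTP URL from a token and the optional `pro`, `groups`, and `tools` settings. The function name `remote_url` is hypothetical. Leaving out `pro=1` when groups or tools are given is a choice made here to mirror the note above. Whether `groups` and `tools` can be combined in one URL is an assumption.

```python
from urllib.parse import urlencode

# Streamable HTTP endpoint listed earlier in this document.
BASE = "https://mcp.brightdata.com/mcp"


def remote_url(token: str, pro: bool = False, groups=(), tools=()) -> str:
    """Build a remote MCP URL (hypothetical helper).

    groups/tools take precedence over pro, mirroring the note that
    GROUPS or TOOLS override PRO_MODE when specified.
    """
    params = {"token": token}
    if groups:
        params["groups"] = ",".join(groups)
    if tools:
        params["tools"] = ",".join(tools)
    if pro and not (groups or tools):
        params["pro"] = "1"
    # Keep commas literal so the URL matches the documented form.
    return f"{BASE}?{urlencode(params, safe=',')}"


print(remote_url("YOUR_API_TOKEN", pro=True))
print(remote_url("YOUR_API_TOKEN", groups=["ecommerce", "social"]))
print(remote_url("YOUR_API_TOKEN", tools=["scrape_as_markdown", "web_data_linkedin_person_profile"]))
```

Because the token is part of the URL, treat the resulting string as a secret and keep it out of logs and version control.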
- search_engine - Google/Bing/Yandex SERP results (JSON for Google, Markdown for others)
- scrape_as_markdown - Clean Markdown from any URL with anti-bot bypass
- search_engine_batch - Up to 10 parallel searches
- scrape_batch - Up to 10 URLs in one request
- scrape_as_html - Full HTML response
- extract - AI-powered JSON extraction with custom prompt
- session_stats - Monitor tool usage during session

For JavaScript-rendered content or user interactions:
| Tool | Description |
|------|-------------|
| scraping_browser_navigate | Open URL in browser session |
| scraping_browser_go_back | Navigate back |
| scraping_browser_go_forward | Navigate forward |
| scraping_browser_snapshot | Get ARIA snapshot with element refs |
| scraping_browser_click_ref | Click element by ref |
| scraping_browser_type_ref | Type into input (optional submit) |
| scraping_browser_screenshot | Capture page image |
| scraping_browser_wait_for_ref | Wait for element visibility |
| scraping_browser_scroll | Scroll to bottom |
| scraping_browser_scroll_to_ref | Scroll element into view |
| scraping_browser_get_text | Get page text content |
| scraping_browser_get_html | Get full HTML |
| scraping_browser_network_requests | List network requests |
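Element refs only exist after a snapshot, so the browser tools are usually chained in a fixed order: navigate, snapshot, interact, capture. The sketch below shows that ordering against an injected `call` function, so it runs without a live session. The argument names (`url`, `ref`), the `run_browser_flow` helper, and the shape of the fake snapshot are all assumptions made for illustration. Check the real input schemas through the server's tool listing before relying on them.

```python
from typing import Any, Callable

# Signature of whatever function your MCP client exposes for calling a tool.
Call = Callable[[str, dict], Any]


def run_browser_flow(call: Call, url: str, pick_ref: Callable[[Any], str]) -> Any:
    """Navigate, snapshot, click one element, then capture the result.

    Argument names below are assumed, not confirmed by this document.
    """
    call("scraping_browser_navigate", {"url": url})
    snapshot = call("scraping_browser_snapshot", {})
    ref = pick_ref(snapshot)  # refs are only known after the snapshot
    call("scraping_browser_click_ref", {"ref": ref})
    return call("scraping_browser_screenshot", {})


# Stand-in for a real MCP client: records calls and returns a made-up snapshot.
log = []


def fake_call(name: str, arguments: dict) -> Any:
    log.append((name, arguments))
    if name == "scraping_browser_snapshot":
        return {"refs": ["e1", "e2"]}  # invented shape, for the demo only
    return None


run_browser_flow(fake_call, "https://example.com", lambda snap: snap["refs"][0])
for name, arguments in log:
    print(name, arguments)
```

Passing `call` in as a parameter keeps the ordering logic testable and independent of any particular MCP client library.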
Pre-built extractors for popular platforms:
E-commerce:
- web_data_amazon_product, web_data_amazon_product_reviews, web_data_amazon_product_search
- web_data_walmart_product, web_data_walmart_seller
- web_data_ebay_product, web_data_google_shopping
- web_data_homedepot_products, web_data_bestbuy_products, web_data_etsy_products, web_data_zara_products

Social Media:
- web_data_linkedin_person_profile, web_data_linkedin_company_profile, web_data_linkedin_job_listings, web_data_linkedin_posts, web_data_linkedin_people_search
- web_data_instagram_profiles, web_data_instagram_posts, web_data_instagram_reels, web_data_instagram_comments
- web_data_facebook_posts, web_data_facebook_marketplace_listings, web_data_facebook_company_reviews, web_data_facebook_events
- web_data_tiktok_profiles, web_data_tiktok_posts, web_data_tiktok_shop, web_data_tiktok_comments
- web_data_x_posts
- web_data_youtube_videos, web_data_youtube_profiles, web_data_youtube_comments
- web_data_reddit_posts

Business & Finance:
- web_data_google_maps_reviews, web_data_crunchbase_company, web_data_zoominfo_company_profile
- web_data_zillow_properties_listing, web_data_yahoo_finance_business

Other:
- web_data_github_repository_file, web_data_reuter_news
- web_data_google_play_store, web_data_apple_app_store
- web_data_booking_hotel_listings

Research flow:
1. search_engine to find relevant URLs
2. scrape_as_markdown to get content
3. extract for structured JSON (if needed)

Amazon product analysis:
1. web_data_amazon_product for structured product data
2. web_data_amazon_product_reviews for review analysis

Platform data:
1. web_data_* tools for structured extraction
2. scrape_as_markdown + extract for unsupported sites

Browser automation:
1. scraping_browser_navigate → open URL
2. scraping_browser_snapshot → get element refs
3. scraping_browser_click_ref / scraping_browser_type_ref → interact
4. scraping_browser_screenshot → capture results

| Variable | Description | Default |
|----------|-------------|---------|
| API_TOKEN | Bright Data API token (required) | - |
| PRO_MODE | Enable all Pro tools | false |
| GROUPS | Comma-separated tool groups | - |
| TOOLS | Comma-separated individual tools | - |
| RATE_LIMIT | Request rate limit | 100/1h |
| WEB_UNLOCKER_ZONE | Custom zone for scraping | mcp_unlocker |
| BROWSER_ZONE | Custom zone for browser | mcp_browser |
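When the server runs locally, these variables are usually passed through the MCP client's server entry. The sketch below builds such an entry from the `npx @brightdata/mcp` command shown earlier and rejects variable names that are not in the table. The `mcpServers` wrapper key is a common client convention and the `brightdata` entry name is arbitrary. Both are assumptions, as is the helper name `local_server_entry`.

```python
import json

# Optional variables from the table above (API_TOKEN is handled separately).
OPTIONAL_VARS = {
    "PRO_MODE",
    "GROUPS",
    "TOOLS",
    "RATE_LIMIT",
    "WEB_UNLOCKER_ZONE",
    "BROWSER_ZONE",
}


def local_server_entry(api_token: str, **overrides) -> dict:
    """Build a client config entry for the local server (hypothetical helper)."""
    unknown = set(overrides) - OPTIONAL_VARS
    if unknown:
        raise ValueError(f"unsupported variables: {sorted(unknown)}")
    env = {"API_TOKEN": api_token}
    env.update({key: str(value) for key, value in overrides.items()})
    return {"command": "npx", "args": ["@brightdata/mcp"], "env": env}


# "mcpServers" and "brightdata" are assumed names; adjust to your client.
config = {
    "mcpServers": {
        "brightdata": local_server_entry(
            "<token>", GROUPS="ecommerce,social", RATE_LIMIT="100/1h"
        )
    }
}
print(json.dumps(config, indent=2))
```

Validating names up front catches typos such as `GROUP` early, since a misspelled variable would otherwise be silently ignored by the environment.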
- Prefer web_data_* tools when available (faster, more reliable)
- Use scrape_as_markdown + extract for unsupported sites
- Batch requests where possible (scrape_batch, search_engine_batch)
- Monitor usage with session_stats

Use the full Node.js path instead of npx:
"command": "/usr/local/bin/node",
"args": ["node_modules/@brightdata/mcp/index.js"]
Try web_data_* tools (often faster).