agency-workflow/protected-site-scraper/SKILL.md
Scrape websites protected by Cloudflare, anti-bot systems, and other WAF protections using undetected-chromedriver. Includes support for complex extraction, CAPTCHA handling, and data export. Works with sahibinden.com and similar protected real estate/classified sites.
npx skillsauth add gbalabanli/agent-skills protected-site-scraper
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill provides tools to scrape websites that use anti-bot protection (Cloudflare, DataDome, PerimeterX, etc.) using undetected-chromedriver - a specialized Selenium wrapper designed to bypass detection.
Perfect for:
Use this skill when:
# Install dependencies
pip install undetected-chromedriver selenium
# Run the sahibinden example
python examples/sahibinden_rentals.py --location istanbul-maltepe-altintepe
from protected_site_scraper import ProtectedSiteScraper
scraper = ProtectedSiteScraper()
driver = scraper.setup_driver()
# Navigate and scrape
driver.get("https://example.com/listings")
listings = scraper.extract_listings(driver)
scraper.save_results(listings, "output.json")
driver.quit()
from protected_site_scraper import ProtectedSiteScraper
from selenium.webdriver.common.by import By
scraper = ProtectedSiteScraper()
driver = scraper.setup_driver()
driver.get("https://example.com")
# Wait for Cloudflare
scraper.wait_for_protection_to_clear(driver)
# Custom extraction
elements = driver.find_elements(By.CSS_SELECTOR, ".listing")
data = []
for elem in elements:
    data.append({
        'title': elem.find_element(By.CSS_SELECTOR, '.title').text,
        'price': elem.find_element(By.CSS_SELECTOR, '.price').text,
    })
scraper.save_results(data, "output.json")
driver.quit()
from examples.sahibinden_rentals import SahibindenScraper
scraper = SahibindenScraper()
listings = scraper.scrape_rentals(
    city="istanbul",
    district="maltepe",
    neighborhood="altintepe"
)
print(f"Found {len(listings)} listings")
scraper.save_to_csv(listings, "rentals.csv")
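The examples call `save_results` and `save_to_csv`, but their bodies are not shown here. A minimal sketch of how such a helper could work, assuming the format is picked from the file extension and every listing is a flat dict. `save_listings` is a hypothetical name used for illustration, not the skill's own function:

```python
import csv
import json
from pathlib import Path


def save_listings(listings, filename):
    """Write a list of flat dicts to JSON or CSV, chosen by file extension."""
    path = Path(filename)
    if path.suffix.lower() == ".csv":
        # Union of keys across rows, in first-seen order, so no field is dropped.
        fieldnames = list(dict.fromkeys(key for row in listings for key in row))
        # utf-8-sig writes a BOM so spreadsheet apps detect the encoding.
        with path.open("w", newline="", encoding="utf-8-sig") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(listings)
    else:
        # ensure_ascii=False keeps Turkish characters readable in the file.
        with path.open("w", encoding="utf-8") as handle:
            json.dump(listings, handle, ensure_ascii=False, indent=2)
    return path
```

Rows that lack a field get an empty cell in the CSV output, which is the default behaviour of `csv.DictWriter`.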
Location: protected_site_scraper/core.py
Key Methods:
- setup_driver(headless=False) - Initialize undetected Chrome
- wait_for_protection_to_clear(driver, timeout=30) - Wait for Cloudflare/anti-bot
- safe_find_elements(driver, selectors) - Try multiple selectors
- extract_with_fallback(element, selectors) - Extract text with fallback
- save_results(data, filename) - Save to JSON/CSV
- safe_print(text) - Handle Turkish/special characters in console

Location: examples/
- sahibinden_rentals.py - Complete working example for sahibinden.com
- template_scraper.py - Template for new sites

Location: prompts/
- find_rentals.md - Prompt to find rentals on sahibinden
- scrape_generic.md - Generic scraping prompt template

# Optional: Set Chrome binary path if not default
CHROME_BINARY_PATH="C:\Program Files\Google\Chrome\Application\chrome.exe"
# Optional: Set specific Chrome version
CHROME_VERSION="120"
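How the skill reads these two variables is not shown. One plausible mapping, sketched under the assumption that they feed the `browser_executable_path` and `version_main` keyword arguments of `uc.Chrome`. `chrome_settings_from_env` is a hypothetical helper, not part of the skill:

```python
import os


def chrome_settings_from_env(env=None):
    """Translate the optional environment variables into uc.Chrome keyword arguments."""
    env = os.environ if env is None else env
    kwargs = {}

    binary = env.get("CHROME_BINARY_PATH", "").strip()
    if binary:
        kwargs["browser_executable_path"] = binary

    version = env.get("CHROME_VERSION", "").strip()
    if version:
        # Accept "120" or a full version string such as "120.0.6099.110".
        kwargs["version_main"] = int(version.split(".")[0])

    return kwargs
```

With `undetected_chromedriver` imported as `uc`, the result could then be unpacked as `uc.Chrome(options=options, **chrome_settings_from_env())`. Unset variables produce no keyword arguments, so the library defaults apply.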
from protected_site_scraper import ProtectedSiteScraper
scraper = ProtectedSiteScraper()
# Custom options
driver = scraper.setup_driver(
    headless=False,  # Show browser window
    window_size=(1920, 1080),
    user_agent="Custom User Agent",
    disable_images=True,  # Faster loading
    proxy="http://proxy:8080"  # Use proxy
)
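The internals of `setup_driver` are not shown, so the following is only a guess at how options like these are commonly translated into Chrome command-line switches. `build_chrome_arguments` is a hypothetical helper, and the mapping from each option to a switch is an assumption:

```python
def build_chrome_arguments(headless=False, window_size=None, user_agent=None,
                           disable_images=False, proxy=None):
    """Return Chrome command-line switches for the given settings."""
    arguments = []
    if headless:
        arguments.append("--headless=new")
    if window_size:
        width, height = window_size
        arguments.append(f"--window-size={width},{height}")
    if user_agent:
        arguments.append(f"--user-agent={user_agent}")
    if disable_images:
        # Blink setting that stops image loading.
        arguments.append("--blink-settings=imagesEnabled=false")
    if proxy:
        arguments.append(f"--proxy-server={proxy}")
    return arguments
```

Each returned string can be passed to `options.add_argument(...)` on a `uc.ChromeOptions()` instance before the driver is created.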
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

def scrape_all_pages(scraper, driver, base_url):
    all_listings = []
    page = 1
    while True:
        url = f"{base_url}?page={page}"
        driver.get(url)
        scraper.wait_for_protection_to_clear(driver)
        listings = scraper.extract_listings(driver)
        if not listings:
            break
        all_listings.extend(listings)
        page += 1
        # Stop when the next button is missing or disabled
        try:
            next_btn = driver.find_element(By.CSS_SELECTOR, ".next")
            if not next_btn.is_enabled():
                break
        except NoSuchElementException:
            break
    return all_listings
def extract_custom_fields(element):
    """Define custom extraction logic"""
    fields = {}
    # Try multiple selectors for each field
    field_selectors = {
        'title': ['.title', 'h1', 'h2', '.listing-title'],
        'price': ['.price', '.cost', '[data-price]'],
        'location': ['.location', '.address', '.place'],
        'rooms': ['.rooms', '.oda', '[data-rooms]'],
        'area': ['.area', '.square-meter', '[data-area]'],
    }
    for field, selectors in field_selectors.items():
        fields[field] = scraper.extract_with_fallback(element, selectors)
    return fields
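The function above relies on `extract_with_fallback`, whose body is not shown. A minimal sketch of what such a helper might do, assuming it returns the first non-empty text found and an empty string otherwise. `extract_text_with_fallback` is a hypothetical stand-in, and `"css selector"` is the string value of Selenium's `By.CSS_SELECTOR`:

```python
def extract_text_with_fallback(element, selectors, default=""):
    """Return the text of the first selector that matches with non-empty text."""
    for selector in selectors:
        # find_elements returns an empty list instead of raising when nothing matches.
        for match in element.find_elements("css selector", selector):
            text = (match.text or "").strip()
            if text:
                return text
    return default
```

Using `find_elements` instead of `find_element` avoids a try/except around every selector, since a miss is just an empty list.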
import time
from selenium.common.exceptions import TimeoutException, WebDriverException

def scrape_with_retry(scraper, driver, url, max_retries=3):
    """Return (listings, driver); the driver is replaced after a browser error."""
    for attempt in range(max_retries):
        try:
            driver.get(url)
            scraper.wait_for_protection_to_clear(driver, timeout=30)
            return scraper.extract_listings(driver), driver
        except TimeoutException:
            print(f"Attempt {attempt + 1} failed, retrying...")
            time.sleep(5)
        except WebDriverException as e:
            print(f"Browser error: {e}")
            # Restart driver
            driver.quit()
            driver = scraper.setup_driver()
    return [], driver
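The retry loop above pauses for a fixed five seconds. A common alternative, offered here as an option and not something the skill is documented to do, is exponential backoff with full jitter, so repeated retries spread out over time. `backoff_delays` and its parameters are hypothetical:

```python
import random


def backoff_delays(max_retries=3, base=5.0, cap=60.0, rng=None):
    """Yield one delay in seconds per retry.

    Full jitter: each delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)], so the ceiling doubles until it hits cap.
    """
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0, ceiling)
```

In a retry loop this would replace the fixed pause: iterate over `backoff_delays()` and call `time.sleep(delay)` after each failed attempt.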
Use time.sleep() between actions
# Chrome version mismatch - specify version
import undetected_chromedriver as uc

options = uc.ChromeOptions()
driver = uc.Chrome(options=options, version_main=120)
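from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Increase wait time
scraper.wait_for_protection_to_clear(driver, timeout=60)
# Or check for specific elements
WebDriverWait(driver, 60).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".content"))
)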
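How `wait_for_protection_to_clear` decides that a challenge has passed is not described. One simple heuristic, sketched here purely as an assumption, polls the page title until it no longer looks like an interstitial. `wait_until_challenge_clears` and the marker strings are hypothetical, and real challenge pages vary by vendor and locale:

```python
import time

# Assumed title fragments for challenge or block pages; adjust per site.
CHALLENGE_MARKERS = ("just a moment", "attention required")


def wait_until_challenge_clears(driver, timeout=30, poll_interval=0.5,
                                clock=time.monotonic, sleep=time.sleep):
    """Poll driver.title until no challenge marker appears; return True on success."""
    deadline = clock() + timeout
    while True:
        title = (driver.title or "").lower()
        # An empty title usually means the page has not loaded yet.
        if title and not any(marker in title for marker in CHALLENGE_MARKERS):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

The function returns `False` on timeout instead of raising, so the caller can decide whether to retry, restart the driver, or give up. Waiting for a known content element, as in the snippet above, is usually the more reliable signal when a stable selector exists.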
# Try multiple selectors
selectors = [
    "#searchResultsTable",
    ".listings",
    "[data-listings]"
]
for selector in selectors:
    elements = driver.find_elements(By.CSS_SELECTOR, selector)
    if elements:
        break
python examples/sahibinden_rentals.py \
    --city istanbul \
    --district maltepe \
    --neighborhood altintepe \
    --output rentals.json
# examples/template_scraper.py
from protected_site_scraper import ProtectedSiteScraper

class MySiteScraper(ProtectedSiteScraper):
    def scrape(self, url):
        driver = self.setup_driver()
        driver.get(url)
        self.wait_for_protection_to_clear(driver)
        # Your extraction logic here
        data = self.extract_listings(driver)
        driver.quit()
        return data

if __name__ == "__main__":
    scraper = MySiteScraper()
    results = scraper.scrape("https://example.com/listings")
    scraper.save_results(results, "output.json")
undetected-chromedriver>=3.5.0
selenium>=4.15.0
webdriver-manager>=4.0.0
MIT - Use responsibly and comply with website terms of service.