gbalabanli/protected-site-scraper

Name: protected-site-scraper
Author: gbalabanli

agency-workflow/protected-site-scraper/SKILL.md

npx skillsauth add gbalabanli/agent-skills protected-site-scraper


Protected Site Scraper

Overview

This skill provides tools to scrape websites that use anti-bot protection (Cloudflare, DataDome, PerimeterX, etc.) using undetected-chromedriver - a specialized Selenium wrapper designed to bypass detection.

Perfect for:

Real estate sites (sahibinden.com, emlakjet, etc.)
Classified listings
E-commerce with bot protection
Any site showing "Just a moment..." or CAPTCHA challenges

When to Use

Use this skill when:

Standard requests/httpx fail with 403 errors
The site shows Cloudflare challenge pages
Selenium without undetected-chromedriver gets blocked
You need to scrape JavaScript-heavy protected sites
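The conditions above can be checked on a plain HTTP response before reaching for a browser. This is a minimal sketch, assuming the only signals are a 403 status or a challenge phrase in the body; `looks_blocked` and `CHALLENGE_MARKERS` are hypothetical names and are not part of the skill.

```python
# Hypothetical helper: decide whether a plain HTTP response looks like an
# anti-bot block, so the browser-based scraper is only used when needed.
CHALLENGE_MARKERS = ("just a moment", "captcha")  # assumed marker phrases


def looks_blocked(status_code: int, body: str) -> bool:
    """Return True if the response resembles a block or challenge page."""
    if status_code == 403:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

The check is a heuristic: a normal page that merely mentions "captcha" would also match, so treat a positive result as a hint rather than proof.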

Quick Start

# Install dependencies
pip install undetected-chromedriver selenium webdriver-manager

# Run the sahibinden example
python examples/sahibinden_rentals.py --location istanbul-maltepe-altintepe

Usage Patterns

Pattern 1: Simple Protected Site Scraper

from protected_site_scraper import ProtectedSiteScraper

scraper = ProtectedSiteScraper()
driver = scraper.setup_driver()

try:
    # Navigate and scrape
    driver.get("https://example.com/listings")
    listings = scraper.extract_listings(driver)
    scraper.save_results(listings, "output.json")
finally:
    # Always release the browser, even if scraping fails
    driver.quit()

Pattern 2: Custom Extraction

from protected_site_scraper import ProtectedSiteScraper
from selenium.webdriver.common.by import By

scraper = ProtectedSiteScraper()
driver = scraper.setup_driver()

try:
    driver.get("https://example.com")

    # Wait for Cloudflare
    scraper.wait_for_protection_to_clear(driver)

    # Custom extraction
    elements = driver.find_elements(By.CSS_SELECTOR, ".listing")
    data = []
    for elem in elements:
        data.append({
            'title': elem.find_element(By.CSS_SELECTOR, '.title').text,
            'price': elem.find_element(By.CSS_SELECTOR, '.price').text,
        })

    scraper.save_results(data, "output.json")
finally:
    # Always release the browser, even if extraction fails
    driver.quit()

Pattern 3: Sahibinden.com Specific

from examples.sahibinden_rentals import SahibindenScraper

scraper = SahibindenScraper()
listings = scraper.scrape_rentals(
    city="istanbul",
    district="maltepe", 
    neighborhood="altintepe"
)

print(f"Found {len(listings)} listings")
scraper.save_to_csv(listings, "rentals.csv")

Core Components

1. ProtectedSiteScraper (Base Class)

Location: protected_site_scraper/core.py

Key Methods:

setup_driver(headless=False) - Initialize undetected Chrome
wait_for_protection_to_clear(driver, timeout=30) - Wait for Cloudflare/anti-bot
safe_find_elements(driver, selectors) - Try multiple selectors
extract_with_fallback(element, selectors) - Extract text with fallback
save_results(data, filename) - Save to JSON/CSV
safe_print(text) - Handle Turkish/special characters in console
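The fallback idea behind `safe_find_elements` and `extract_with_fallback` can be illustrated without a browser. The sketch below is not the skill's implementation; `first_text` and the `lookup` callable are hypothetical, and a dictionary stands in for a parsed listing.

```python
from typing import Callable, Iterable, Optional


def first_text(lookup: Callable[[str], Optional[str]],
               selectors: Iterable[str],
               default: str = "") -> str:
    """Return the first non-empty text produced by any selector."""
    for selector in selectors:
        text = lookup(selector)
        if text and text.strip():
            return text.strip()
    return default


# Stand-in for a parsed listing: selector -> text
fake_listing = {".listing-title": "  2+1 flat  ", ".price": ""}
title = first_text(fake_listing.get, [".title", "h1", ".listing-title"])
price = first_text(fake_listing.get, [".price", ".cost"], default="n/a")
```

With Selenium, `lookup` could wrap `element.find_elements(By.CSS_SELECTOR, selector)` and return the `.text` of the first match, or `None` when the list is empty.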

2. Site-Specific Scrapers

Location: examples/

sahibinden_rentals.py - Complete working example for sahibinden.com
template_scraper.py - Template for new sites

3. Prompts

Location: prompts/

find_rentals.md - Prompt to find rentals on sahibinden
scrape_generic.md - Generic scraping prompt template

Configuration

Environment Variables

# Optional: Set Chrome binary path if not default
CHROME_BINARY_PATH="C:\Program Files\Google\Chrome\Application\chrome.exe"

# Optional: Set specific Chrome version
CHROME_VERSION="120"
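How the skill consumes these variables is not shown here. One plausible reading, sketched under that assumption, maps them onto the `browser_executable_path` and `version_main` arguments of `uc.Chrome`; the helper name `chrome_kwargs_from_env` is hypothetical.

```python
import os
from typing import Mapping, Optional


def chrome_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for uc.Chrome from optional environment variables."""
    env = os.environ if env is None else env
    kwargs = {}

    binary = env.get("CHROME_BINARY_PATH")
    if binary:
        kwargs["browser_executable_path"] = binary

    version = env.get("CHROME_VERSION")
    if version:
        kwargs["version_main"] = int(version)  # e.g. "120" -> 120

    return kwargs
```

Unset or empty variables are skipped, so the driver falls back to its own defaults.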

Driver Options

from protected_site_scraper import ProtectedSiteScraper

scraper = ProtectedSiteScraper()

# Custom options
driver = scraper.setup_driver(
    headless=False,  # Show browser window
    window_size=(1920, 1080),
    user_agent="Custom User Agent",
    disable_images=True,  # Faster loading
    proxy="http://proxy:8080"  # Use proxy
)

Advanced Features

Handling Pagination

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

def scrape_all_pages(scraper, driver, base_url):
    all_listings = []
    page = 1

    while True:
        url = f"{base_url}?page={page}"
        driver.get(url)
        scraper.wait_for_protection_to_clear(driver)

        listings = scraper.extract_listings(driver)
        if not listings:
            break

        all_listings.extend(listings)
        page += 1

        # Stop when the "next" button is missing or disabled
        try:
            next_btn = driver.find_element(By.CSS_SELECTOR, ".next")
            if not next_btn.is_enabled():
                break
        except NoSuchElementException:
            break

    return all_listings
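The pagination loop above builds each URL with an f-string, which produces a malformed URL when `base_url` already carries a query string. A small sketch of a safer builder follows; `with_page` is a hypothetical helper, and the parameter name `page` is taken from the example rather than from any particular site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(base_url: str, page: int, param: str = "page") -> str:
    """Return base_url with the page parameter set, keeping other query fields."""
    parts = urlsplit(base_url)
    # Drop any existing page parameter, keep everything else in order
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_page("https://example.com/listings?sort=price&page=1", 3)` keeps `sort=price` and replaces the page number.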

Custom Field Extraction

def extract_custom_fields(scraper, element):
    """Extract listing fields, trying several selectors per field."""
    fields = {}
    
    # Try multiple selectors for each field
    field_selectors = {
        'title': ['.title', 'h1', 'h2', '.listing-title'],
        'price': ['.price', '.cost', '[data-price]'],
        'location': ['.location', '.address', '.place'],
        'rooms': ['.rooms', '.oda', '[data-rooms]'],
        'area': ['.area', '.square-meter', '[data-area]'],
    }
    
    for field, selectors in field_selectors.items():
        fields[field] = scraper.extract_with_fallback(element, selectors)
    
    return fields

Error Recovery

import time

from selenium.common.exceptions import TimeoutException, WebDriverException

def scrape_with_retry(scraper, driver, url, max_retries=3):
    """Return (listings, driver); the driver is replaced after a browser error."""
    for attempt in range(max_retries):
        try:
            driver.get(url)
            scraper.wait_for_protection_to_clear(driver, timeout=30)
            return scraper.extract_listings(driver), driver
        except TimeoutException:
            print(f"Attempt {attempt + 1} failed, retrying...")
            time.sleep(5)
        except WebDriverException as e:
            print(f"Browser error: {e}")
            # Restart driver
            driver.quit()
            driver = scraper.setup_driver()

    return [], driver
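The fixed five-second wait in the retry loop above can be generalized to an exponential backoff. The following is a sketch, not part of the skill: `retry_with_backoff` is a hypothetical helper, and the `sleep` argument exists only so the waits can be observed or replaced in tests.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(action: Callable[[], T],
                       retry_on: Tuple[Type[BaseException], ...],
                       max_retries: int = 3,
                       base_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call action() up to max_retries times, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return action()
        except retry_on:
            if attempt == max_retries - 1:
                break  # out of attempts; no point in sleeping again
            sleep(base_delay * 2 ** attempt)
    return None
```

With Selenium, `retry_on` would typically be `(TimeoutException,)`, and `action` a small function that loads the page and extracts listings.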

Anti-Detection Tips

Don't use headless mode - It triggers more detection
Add realistic delays - Use time.sleep() between actions
Use a realistic User-Agent - The scraper handles this automatically
Avoid rapid requests - Add delays between page loads
Rotate proxies - For high-volume scraping
Handle cookies - Accept or manage cookies appropriately
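The advice about delays between page loads can be wrapped in one small helper. This is a sketch with assumed default bounds of 2 to 6 seconds; `polite_pause` is a hypothetical name, and the `sleep` and `rng` arguments are there only to make the function testable.

```python
import random
import time


def polite_pause(min_seconds: float = 2.0, max_seconds: float = 6.0,
                 sleep=time.sleep, rng=random.uniform) -> float:
    """Sleep for a random duration within the given bounds and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = rng(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_pause()` after each `driver.get(...)` spaces out page loads and also reduces load on the target server.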

Troubleshooting

"Session not created" Error

import undetected_chromedriver as uc

# Chrome version mismatch - pin version_main to the installed Chrome major version
driver = uc.Chrome(options=uc.ChromeOptions(), version_main=120)

Cloudflare Challenge Not Passing

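from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Increase wait time
scraper.wait_for_protection_to_clear(driver, timeout=60)

# Or wait until an element from the real page is present
WebDriverWait(driver, 60).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".content"))
)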

Elements Not Found

# Try multiple selectors
selectors = [
    "#searchResultsTable",
    ".listings",
    "[data-listings]"
]

for selector in selectors:
    elements = driver.find_elements(By.CSS_SELECTOR, selector)
    if elements:
        break

Examples

Sahibinden.com - Rental Listings

python examples/sahibinden_rentals.py \
    --city istanbul \
    --district maltepe \
    --neighborhood altintepe \
    --output rentals.json

Generic Site Template

# examples/template_scraper.py
from protected_site_scraper import ProtectedSiteScraper

class MySiteScraper(ProtectedSiteScraper):
    def scrape(self, url):
        driver = self.setup_driver()
        try:
            driver.get(url)
            self.wait_for_protection_to_clear(driver)

            # Your extraction logic here
            return self.extract_listings(driver)
        finally:
            # Quit the browser even if extraction raises
            driver.quit()

if __name__ == "__main__":
    scraper = MySiteScraper()
    results = scraper.scrape("https://example.com/listings")
    scraper.save_results(results, "output.json")

Best Practices

Always quit driver - Use try/finally or context managers
Save incrementally - Don't lose data on crashes
Respect robots.txt - Check if scraping is allowed
Rate limiting - Don't overwhelm servers
Legal compliance - Check website terms of service
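Two of the practices above, respecting robots.txt and saving incrementally, can be sketched with the standard library alone. Both helpers are hypothetical and not part of the skill: `is_allowed` takes robots.txt content that has already been fetched, and `append_jsonl` assumes a JSON Lines file is an acceptable output format.

```python
import json
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against already-downloaded robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def append_jsonl(path: str, records) -> int:
    """Append records as JSON Lines so earlier pages survive a later crash."""
    count = 0
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps Turkish and other non-ASCII text readable
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling `append_jsonl` once per scraped page means a crash on a later page loses only that page's results.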

Dependencies

undetected-chromedriver>=3.5.0
selenium>=4.15.0
webdriver-manager>=4.0.0

License

MIT - Use responsibly and comply with website terms of service.


Scrape websites protected by Cloudflare, anti-bot systems, and other WAF protections using undetected-chromedriver. Includes support for complex extraction, CAPTCHA handling, and data export. Works with sahibinden.com and similar protected real estate/classified sites.

Stars: 2
Category: development
Updated: Apr 5, 2026


npx skillsauth add gbalabanli/agent-skills protected-site-scraper

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean.

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy - Container and dependency vulnerability scanner - Clean (95%)
Semgrep - Static code analysis for vulnerabilities - Clean (95%)
mcp-scan (Snyk) - Model Context Protocol security validation - Clean (95%)
Snyk (dep) - Open source security scanning - Skipped (50%)
Socket.dev - Supply chain security analysis - Skipped (50%)
VirusTotal - Multi-engine malware detection - Skipped (50%)
CrowdStrike - Advanced threat intelligence - Skipped (50%)
OSV-Scanner - Open Source Vulnerability database check - Skipped (50%)
OWASP Dep-Check - Skipped (50%)

Last scanned: Apr 24, 2026, 7:51 AM (4.5s, 1 file scanned)




Install manually


# Clone the repo
git clone https://github.com/gbalabanli/agent-skills.git

# Copy into Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r agent-skills/agency-workflow/protected-site-scraper ~/.claude/skills/


Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT