.claude/skills/browser-use-integration/SKILL.md
Self-hosted AI browser automation using Browser Use with any LLM (Claude, GPT, Ollama). Use when building web scraping agents, data extraction pipelines, or self-hosted automation, or when you want to avoid hosted-API rate limits by running local models.
npx skillsauth add adaptationio/skrillz browser-use-integration
Browser Use is an open-source AI browser automation framework that works with any LLM. Unlike cloud-only services, it can be self-hosted. When paired with a local model, it has no per-request API costs or provider rate limits.
Key Advantages:
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install Browser Use
pip install browser-use
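# Install an LLM provider (choose one). These LangChain packages match the
# examples below. Newer browser-use releases may ship their own chat model
# classes instead, so check the docs for the version you installed.
pip install langchain-anthropic # For Claude
pip install langchain-openai # For OpenAI models (GPT-4o, GPT-4 Turbo)
pip install langchain-ollama # For local models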
# For Claude
export ANTHROPIC_API_KEY=your_key_here
# For OpenAI
export OPENAI_API_KEY=your_key_here
# For Ollama (no key needed, just run Ollama locally)
ollama serve
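Sometimes one script should run against whichever provider is configured. A small helper can pick a provider from the environment variables above. This is a hypothetical sketch: `pick_provider` is not part of Browser Use or LangChain, and the Anthropic, then OpenAI, then Ollama preference order is an assumption. It only inspects the environment and leaves construction of the chat model to you.

```python
import os
from typing import Mapping, Optional


def pick_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: choose an LLM provider from environment variables.

    Preference order (an assumption, adjust to taste): Anthropic, then OpenAI,
    then a local Ollama server, which needs no API key.
    """
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return "ollama"


if __name__ == "__main__":
    print(f"Using provider: {pick_provider()}")
```

You would then branch on the returned name to build `ChatAnthropic`, `ChatOpenAI`, or `ChatOllama` as shown in the next section.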
# agent.py
import asyncio

from browser_use import Agent
from langchain_anthropic import ChatAnthropic


async def main():
    agent = Agent(
        task="Go to google.com and search for 'Browser Use AI automation'",
        llm=ChatAnthropic(model="claude-sonnet-4-20250514"),
    )
    result = await agent.run()
    # run() returns the agent's step history; final_result() holds its final answer
    print(result.final_result())


if __name__ == "__main__":
    asyncio.run(main())
python agent.py
import os

from langchain_anthropic import ChatAnthropic

# Claude Sonnet (best balance)
llm = ChatAnthropic(
    model="claude-sonnet-4-20250514",
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
# Claude Opus (highest quality)
llm = ChatAnthropic(model="claude-opus-4-20250514")
# Claude Haiku (fastest, cheapest)
llm = ChatAnthropic(model="claude-3-5-haiku-20241022")
import os

from langchain_openai import ChatOpenAI

# GPT-4o
llm = ChatOpenAI(
    model="gpt-4o",
    api_key=os.environ.get("OPENAI_API_KEY"),
)
# GPT-4 Turbo
llm = ChatOpenAI(model="gpt-4-turbo-preview")
# First, install and run Ollama
ollama serve
# Pull a model
ollama pull llama3.2
from langchain_ollama import ChatOllama
# Local Llama 3.2
llm = ChatOllama(
model="llama3.2",
base_url="http://localhost:11434",
)
# Local Mistral
llm = ChatOllama(model="mistral")
# Local Code Llama
llm = ChatOllama(model="codellama")
| LLM | Approx. cost per 1M input tokens (USD) | Best For |
|-----|----------------------------------------|----------|
| Claude Haiku | ~$0.25 | Simple tasks |
| Claude Sonnet | ~$3.00 | Complex tasks |
| GPT-4o | ~$5.00 | General use |
| Ollama | Free (runs on your hardware) | Unlimited local use |
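The table's figures turn into a budget estimate with one multiplication: cost ≈ price per 1M tokens × tokens used ÷ 1,000,000. For example, a run that consumes 2M tokens on Claude Sonnet at ~$3.00 per 1M costs about $6.00. The sketch below assumes a single flat rate per model taken from the table. Real provider pricing charges output tokens at a higher rate than input tokens, and prices change over time.

```python
# Approximate USD prices per 1M tokens, copied from the table above.
# Treated as a flat rate per model (a simplifying assumption).
PRICE_PER_MILLION = {
    "Claude Haiku": 0.25,
    "Claude Sonnet": 3.00,
    "GPT-4o": 5.00,
    "Ollama": 0.0,  # local inference: no per-token fee
}


def estimate_cost(model: str, tokens: int) -> float:
    """Estimate the USD cost of a run that consumes `tokens` tokens on `model`."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


print(f"${estimate_cost('Claude Sonnet', 2_000_000):.2f}")
```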
agent = Agent(
task="Search for 'Python tutorials' on YouTube and get the top 5 video titles",
llm=llm,
)
result = await agent.run()
agent = Agent(
task="""
1. Go to amazon.com
2. Search for 'wireless mouse'
3. Filter by 4+ star rating
4. Extract the top 5 products with name, price, and rating
5. Return as JSON
""",
llm=llm,
)
result = await agent.run()
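The task is just a string, so numbered steps like these can be generated from a list. That keeps long workflows readable and easy to reorder. The sketch below is minimal, and `numbered_task` is a hypothetical helper, not a Browser Use API.

```python
from typing import Sequence


def numbered_task(steps: Sequence[str]) -> str:
    """Hypothetical helper: join steps into a numbered, multi-line task prompt."""
    if not steps:
        raise ValueError("at least one step is required")
    return "\n".join(f"{i}. {step.strip()}" for i, step in enumerate(steps, start=1))


task = numbered_task([
    "Go to amazon.com",
    "Search for 'wireless mouse'",
    "Filter by 4+ star rating",
    "Extract the top 5 products with name, price, and rating",
    "Return as JSON",
])
print(task)
```

Pass the result as `task=task` when constructing the `Agent`.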
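from typing import List

from browser_use import Agent
from pydantic import BaseModel


class Product(BaseModel):
    name: str
    price: float
    rating: float
    url: str


class ProductList(BaseModel):
    products: List[Product]


agent = Agent(
    task="Find the top 5 laptops on BestBuy under $1000",
    llm=llm,
    # Structured output; the parameter name varies across browser-use versions,
    # so check the Agent signature for the release you installed.
    output_schema=ProductList,
)
result = await agent.run()
# run() returns the step history; parse its final JSON answer into the model
products = ProductList.model_validate_json(result.final_result())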
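Validation is what turns the agent's free-form answer into typed data. You can sketch the same check with the standard library, either to see what it involves or because Pydantic is unavailable. This assumes the agent's final answer is a JSON string shaped like `ProductList` above (`{"products": [...]}`), for example the text returned by `result.final_result()` in versions that return a history object. `parse_products` is a hypothetical helper.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Product:
    name: str
    price: float
    rating: float
    url: str


def parse_products(raw: str) -> List[Product]:
    """Parse a JSON string shaped like {"products": [{...}, ...]} into Products.

    Raises json.JSONDecodeError (a ValueError) for invalid JSON and
    KeyError if a required field is missing.
    """
    data = json.loads(raw)
    return [
        Product(
            name=str(item["name"]),
            price=float(item["price"]),  # accepts 899.99 or "899.99"
            rating=float(item["rating"]),
            url=str(item["url"]),
        )
        for item in data["products"]
    ]
```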
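from browser_use import Agent, Browser

# Constructor options differ between browser-use releases (some wrap them in a
# config object), so check the Browser signature for your installed version.
browser = Browser(
    headless=False,  # Show the browser window
    proxy="http://proxy.example.com:8080",  # Route traffic through a proxy
)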
agent = Agent(
task="Navigate to example.com",
llm=llm,
browser=browser,
)
import asyncio

from browser_use import Agent


async def run_with_retry(task: str, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            # Create a fresh agent each attempt so no state leaks between runs
            agent = Agent(task=task, llm=llm)
            return await agent.run()
        except Exception as e:  # narrow to the specific errors you want to retry
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...


# Usage (inside an async function)
result = await run_with_retry("Search Google for 'AI news'")
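The retry helper above is tied to `Agent`, which makes it hard to test without a browser. A generalized sketch takes any zero-argument factory that returns a fresh awaitable. The same backoff logic (a delay of `base_delay * 2 ** attempt` between attempts) can then be exercised with a stand-in coroutine. Catching every `Exception` is an assumption; narrow it to the errors you actually want to retry.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    make_call: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Retry an async operation with exponential backoff.

    `make_call` must create a *fresh* awaitable on each attempt, because a
    coroutine object can only be awaited once.
    """
    for attempt in range(max_retries):
        try:
            return await make_call()
        except Exception as exc:  # narrow this to the errors you expect
            if attempt == max_retries - 1:
                raise
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ... with the default
            print(f"Attempt {attempt + 1} failed ({exc!r}); retrying in {delay}s")
            await asyncio.sleep(delay)
    raise ValueError("max_retries must be at least 1")
```

With Browser Use this would be called as `await retry_async(lambda: Agent(task=task, llm=llm).run())`.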
async def run_with_timeout(task: str, timeout: int = 60):
    agent = Agent(task=task, llm=llm)
    try:
        # wait_for cancels agent.run() if it exceeds the timeout
        return await asyncio.wait_for(agent.run(), timeout=timeout)
    except asyncio.TimeoutError:
        print(f"Task timed out after {timeout}s")
        return None
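When the timeout fires, `asyncio.wait_for` does more than stop waiting: it cancels the wrapped coroutine, so `finally` blocks inside it still run. That matters if your agent code closes a browser on exit. The sketch below shows the behavior with a stand-in for `agent.run()`. `fake_agent_run` is hypothetical, which lets the sketch run without a browser or LLM.

```python
import asyncio
from typing import Awaitable, List, Optional, TypeVar

T = TypeVar("T")


async def with_timeout(aw: Awaitable[T], timeout: float) -> Optional[T]:
    """Await `aw`, returning None if it does not finish within `timeout` seconds.

    asyncio.wait_for cancels the inner work on timeout, so any `finally`
    blocks inside it (e.g. closing a browser) still run.
    """
    try:
        return await asyncio.wait_for(aw, timeout=timeout)
    except asyncio.TimeoutError:
        return None


async def fake_agent_run(duration: float, log: List[str]) -> str:
    """Stand-in for agent.run(): sleeps, then records that cleanup ran."""
    try:
        await asyncio.sleep(duration)
        return "done"
    finally:
        log.append("cleanup")
```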
# Dockerfile
FROM python:3.11-slim
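# Install Chrome (apt-key is deprecated, so register Google's signing key
# as a dedicated keyring referenced with signed-by)
RUN apt-get update && apt-get install -y --no-install-recommends wget gnupg ca-certificates \
    && wget -q -O - https://dl.google.com/linux/linux_signing_key.pub \
       | gpg --dearmor -o /usr/share/keyrings/google-chrome.gpg \
    && echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-chrome.gpg] http://dl.google.com/linux/chrome/deb/ stable main" \
       > /etc/apt/sources.list.d/google-chrome.list \
    && apt-get update \
    && apt-get install -y google-chrome-stable \
    && rm -rf /var/lib/apt/lists/*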
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "agent.py"]
# requirements.txt
browser-use
langchain-anthropic
langchain-ollama
# docker-compose.yml
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]  # Requires a GPU; remove this deploy block otherwise
  browser-agent:
    build: .
    environment:
      - OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama-data:
# Build and run
docker-compose up -d
# View logs
docker-compose logs -f browser-agent
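The compose file passes the Ollama address to the agent container through `OLLAMA_HOST`. Rather than rely on your client library reading that variable on its own, you can resolve it explicitly and pass it as `base_url`, the parameter used in the `ChatOllama` example earlier. The sketch below is minimal. `ollama_base_url` is a hypothetical helper, and prefixing a bare `host:port` value with `http://` is an assumption about how you set the variable.

```python
import os
from typing import Mapping, Optional

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def ollama_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Ollama URL from OLLAMA_HOST, falling back to localhost.

    A bare host:port value gets an http:// scheme; trailing slashes are dropped.
    """
    env = os.environ if env is None else env
    host = (env.get("OLLAMA_HOST") or "").strip()
    if not host:
        return DEFAULT_OLLAMA_URL
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


# In agent.py inside the container (sketch):
# llm = ChatOllama(model="llama3.2", base_url=ollama_base_url())
```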
agent = Agent(
task="""
Go to news.ycombinator.com
Extract the top 30 stories with: title, points, comments, and URL
Return as JSON array
""",
llm=llm,
)
agent = Agent(
task="""
Go to example.com/contact
Fill the form:
- Name: John Doe
- Email: john@example.com
- Message: I'm interested in your services
Submit the form
""",
llm=llm,
)
agent = Agent(
task="""
Check the price of 'Sony WH-1000XM5' on:
1. Amazon
2. BestBuy
3. Walmart
Return prices from each site
""",
llm=llm,
)
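The agent reports prices as free text, so comparing sites needs a normalization step. This sketch assumes you have already collected the answer into a dict that maps site name to a price string such as `"$348.00"`. That shape, and the `parse_price` and `cheapest` helpers, are hypothetical. The code extracts the first dollar amount from each string and ignores entries without one.

```python
import re
from typing import Dict, Optional, Tuple

# A "$" followed by digits, optional thousands separators, optional decimals.
_PRICE_RE = re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d+)?)")


def parse_price(text: str) -> Optional[float]:
    """Extract the first dollar amount from text like '$1,299.99'; None if absent."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def cheapest(prices: Dict[str, str]) -> Optional[Tuple[str, float]]:
    """Return (site, price) for the lowest parseable price, or None."""
    parsed = {site: parse_price(text) for site, text in prices.items()}
    valid = {site: p for site, p in parsed.items() if p is not None}
    if not valid:
        return None
    site = min(valid, key=lambda s: valid[s])
    return site, valid[site]
```

For example, with made-up prices, `cheapest({"Amazon": "$348.00", "BestBuy": "$399.99", "Walmart": "Out of stock"})` returns `("Amazon", 348.0)`.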
agent = Agent(
task="""
Visit competitor.com
Extract:
- Pricing tiers
- Feature list
- Customer testimonials
Format as structured report
""",
llm=llm,
)
# Batch process data entry
data_entries = [
    {"name": "Product A", "price": 99.99},
    {"name": "Product B", "price": 149.99},
]
for entry in data_entries:
    agent = Agent(
        task=f"""
        Go to admin.example.com/products/new
        Add product: {entry['name']} with price ${entry['price']}
        Save and confirm
        """,
        llm=llm,
    )
    await agent.run()  # one entry at a time (inside an async function)
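In the loop above, a single failed entry raises and stops all the entries after it. The sketch below keeps going and reports a per-entry outcome instead. `run_entry` is a hypothetical callback you supply, for example one that builds the task for an entry and awaits `Agent(...).run()`. Entries are still processed one at a time, since they all write to the same admin page.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def run_batch(
    entries: List[Dict[str, Any]],
    run_entry: Callable[[Dict[str, Any]], Awaitable[Any]],
) -> List[Dict[str, Any]]:
    """Process entries sequentially, recording failures instead of stopping."""
    report: List[Dict[str, Any]] = []
    for entry in entries:
        try:
            result = await run_entry(entry)
            report.append({"entry": entry, "ok": True, "result": result, "error": None})
        except Exception as exc:  # keep going; inspect failures afterwards
            report.append({"entry": entry, "ok": False, "result": None, "error": repr(exc)})
    return report
```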
# BAD - vague
agent = Agent(task="Find products", llm=llm)
# GOOD - specific
agent = Agent(
task="Go to amazon.com, search for 'mechanical keyboard', filter by 4+ stars, extract top 5 with name and price",
llm=llm,
)
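from typing import List

from pydantic import BaseModel


class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str


class SearchResults(BaseModel):
    results: List[SearchResult]  # "top 5" needs a list, not a single result


agent = Agent(
    task="Search Google for 'AI news' and get top 5 results",
    llm=llm,
    output_schema=SearchResults,  # Type-safe output (parameter name varies by version)
)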
# Option 1: Include credentials in the task (simplest, but the password is sent to the LLM provider as part of the prompt)
agent = Agent(
task="""
Go to app.example.com/login
Login with email 'user@example.com' and password 'secure123'
Navigate to dashboard
""",
llm=llm,
)
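# Option 2: Reuse a saved session via cookies (more secure: no password in the prompt)
# load_cookies is illustrative; how cookies or storage state are loaded
# differs between browser-use versions, so check the Browser API you have.
browser = Browser()
await browser.load_cookies("session_cookies.json")
agent = Agent(task="...", llm=llm, browser=browser)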
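Whichever option you choose, keep secrets out of source code and out of logs. In the hypothetical sketch below, the login is read from environment variables, and the names `APP_EMAIL` and `APP_PASSWORD` are assumptions. Those values are then masked in any text you print, such as the agent's final answer. With Option 1 the LLM provider still sees the password; masking only protects your own logs.

```python
import os
from typing import Dict, Mapping, Optional


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read login credentials from (hypothetical) APP_EMAIL / APP_PASSWORD variables."""
    env = os.environ if env is None else env
    missing = [name for name in ("APP_EMAIL", "APP_PASSWORD") if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {"email": env["APP_EMAIL"], "password": env["APP_PASSWORD"]}


def redact(text: str, secrets: Mapping[str, str]) -> str:
    """Replace every secret value in `text` with a <name> placeholder."""
    # Longest values first, so a secret that contains another is masked whole.
    for name, value in sorted(secrets.items(), key=lambda kv: len(kv[1]), reverse=True):
        if value:
            text = text.replace(value, f"<{name}>")
    return text
```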
import asyncio


async def run_with_rate_limit(tasks: list, rate_per_minute: int = 10):
    delay = 60 / rate_per_minute  # seconds to wait after each agent run
    results = []
    for task in tasks:
        agent = Agent(task=task, llm=llm)
        result = await agent.run()
        results.append(result)
        await asyncio.sleep(delay)
    return results
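The helper above sleeps a fixed `60 / rate_per_minute` seconds after every run, including the last one. As a result, a slow run is followed by an unnecessary extra wait. An alternative sketch spaces out *start* times instead, sleeping only for whatever remains of the interval. `run_rate_limited` is hypothetical. It takes zero-argument factories (e.g. `lambda: Agent(task=t, llm=llm).run()`) so that each job creates a fresh coroutine.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def run_rate_limited(
    jobs: Sequence[Callable[[], Awaitable[T]]],
    rate_per_minute: float = 10,
) -> List[T]:
    """Run jobs one after another, starting at most `rate_per_minute` per minute.

    Waits only for the remainder of each interval, so slow jobs are not
    penalised twice and no sleep happens after the last job.
    """
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    interval = 60.0 / rate_per_minute
    results: List[T] = []
    next_start = time.monotonic()
    for job in jobs:
        wait = next_start - time.monotonic()
        if wait > 0:
            await asyncio.sleep(wait)
        next_start = time.monotonic() + interval
        results.append(await job())
    return results
```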
| Feature | Browser Use | Stagehand |
|---------|-------------|-----------|
| Language | Python | TypeScript |
| Self-Hosted | Yes | Yes |
| Local LLM | Yes (Ollama) | Limited |
| Speed | 3-5x optimized | 44% faster (v3) |
| Best For | Python scraping | TypeScript testing |
| Learning Curve | Easy | Medium |
When to use Browser Use:
When to use Stagehand:
- references/browser-use-setup.md: Complete installation guide
- references/llm-configuration.md: LLM setup for all providers

Browser Use gives you AI browser automation with full control. You can self-host it with any LLM and avoid vendor lock-in, and with local models you also avoid API rate limits.