skills/agent-browser/SKILL.md
Automate headless browser interactions using agent-browser CLI. Use when scraping web pages, automating form submissions, testing web apps, or performing browser automation tasks. Works with element refs (@e1, @e2) optimized for AI agent reasoning.
npx skillsauth add ckorhonen/claude-skills agent-browserInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A headless browser automation CLI designed for AI agents, with fast Rust-based execution and element refs optimized for LLM reasoning.
agent-browser provides programmatic browser control through a CLI that's purpose-built for AI agent workflows. It uses deterministic element references (@e1, @e2, etc.) from accessibility trees, making it ideal for LLM-based automation where consistent element targeting is critical.
Key Features:
@e1, @e2) for stable LLM reasoningagent-browser install)# Install globally
bun install -g agent-browser
# Download Chromium
agent-browser install
# Linux with system dependencies
agent-browser install --with-deps
Verify installation:
agent-browser --version
# Navigate to a page
agent-browser open https://example.com
# Get accessibility snapshot with element refs
agent-browser snapshot -i # -i = interactive elements only
# Click an element by ref
agent-browser click @e2
# Fill a form field
agent-browser fill @e3 "[email protected]"
# Take a screenshot
agent-browser screenshot output.png
# Open URL
agent-browser open https://example.com
# Get interactive elements with refs
agent-browser snapshot -i --json
The snapshot returns elements like:
@e1 link "Home"
@e2 textbox "Email"
@e3 button "Submit"
# Click by ref
agent-browser click @e2
# Type into focused element
agent-browser type @e2 "[email protected]"
# Fill (clears first, then types)
agent-browser fill @e2 "[email protected]"
# Press keyboard key
agent-browser press Enter
# Check element visibility
agent-browser is visible @e3
# Get element text
agent-browser get text @e1
# Get page URL
agent-browser get url
| Command | Description |
|---------|-------------|
| open <url> | Navigate to URL |
| back | Go back in history |
| forward | Go forward in history |
| reload | Reload current page |
| close | Close browser |
| Command | Description |
|---------|-------------|
| click <ref> | Click element |
| dblclick <ref> | Double-click element |
| type <ref> <text> | Type text into element |
| fill <ref> <text> | Clear and fill element |
| press <key> | Press keyboard key (Enter, Tab, Control+a) |
| hover <ref> | Hover over element |
| focus <ref> | Focus element |
| check <ref> | Check checkbox |
| uncheck <ref> | Uncheck checkbox |
| select <ref> <val> | Select dropdown option |
| scroll <dir> [px] | Scroll (up/down/left/right) |
| wait <ref\|ms> | Wait for element or milliseconds |
| Command | Description |
|---------|-------------|
| snapshot | Get accessibility tree with refs |
| snapshot -i | Interactive elements only |
| snapshot -c | Compact (remove empty elements) |
| snapshot -d <n> | Limit tree depth |
| get text <ref> | Get element text content |
| get html <ref> | Get element HTML |
| get value <ref> | Get input value |
| get url | Get current page URL |
| get title | Get page title |
| screenshot [path] | Take screenshot |
| screenshot --full | Full page screenshot |
| pdf <path> | Save page as PDF |
| Command | Description |
|---------|-------------|
| is visible <ref> | Check if element is visible |
| is enabled <ref> | Check if element is enabled |
| is checked <ref> | Check if checkbox is checked |
# Find by role and click
agent-browser find role button click --name Submit
# Find by text
agent-browser find text "Sign In" click
# Find by label
agent-browser find label "Email" fill "[email protected]"
# Find by placeholder
agent-browser find placeholder "Search..." type "query"
Sessions provide isolated browser instances for parallel execution:
# Use named session
agent-browser --session login-flow open https://app.com
# Different session for another task
agent-browser --session checkout open https://app.com/cart
# List active sessions
agent-browser session list
# Environment variable (persistent across commands)
export AGENT_BROWSER_SESSION=my-session
agent-browser open https://example.com
Use --json for machine-readable output:
# Snapshot as JSON
agent-browser snapshot -i --json
# Get element info as JSON
agent-browser get text @e1 --json
# Parse in scripts
agent-browser get url --json | jq -r '.url'
This section covers frequent failure modes and how to debug them. Each pitfall includes the symptom you'll see, the underlying cause, and the fix.
Symptom: Error: Browser executable not found at default path or agent-browser: command not found
Cause: Chromium isn't installed, or agent-browser CLI isn't on PATH
Fix:
# Install Chromium
agent-browser install
# For Linux, install system dependencies
agent-browser install --with-deps
# Verify installation
agent-browser --version
Symptom: Error: Element @e2 not found even though you just used it, or clicks fail silently
Cause: Element refs change when the DOM updates (navigation, page reload, dynamic content). Refs are only valid for the current DOM snapshot.
Fix:
# Always re-snapshot after major page changes
agent-browser click @e1 # Click submit
agent-browser wait 2000 # Wait for navigation
agent-browser snapshot -i --json # Get fresh refs
agent-browser click @e1 # Use new ref (was @e1 before? Verify!)
Prevention: In scripts, capture fresh refs after each navigation or dynamic load.
Symptom: Error: Session "default" already in use or Port 9222 already in use
Cause: Browser session is still running from a previous command or crashed script, preventing a new session from starting
Fix:
# List active sessions
agent-browser session list
# Kill a specific session
agent-browser session kill my-session
# Use unique session names to avoid conflicts
agent-browser --session upload-$(date +%s) open https://app.com
# Or set environment variable once
export AGENT_BROWSER_SESSION=task-$(date +%s)
agent-browser open https://example.com
agent-browser close
Symptom: agent-browser click @e5 succeeds but nothing happens, or element is invisible in screenshot
Cause: Element is outside viewport, hidden by CSS (display: none, visibility: hidden), or covered by another element. Accessibility tree may show it, but browser can't interact with it.
Fix:
# Check visibility before interaction
agent-browser is visible @e5
# Scroll to make element visible
agent-browser scroll down 500
# Take screenshot to visually verify
agent-browser screenshot current-state.png
# Try parent element if child is deeply nested
agent-browser click @e4 # Click parent button instead
Prevention: Always take screenshots before and after critical interactions to verify visual state.
Symptom: Error: Timeout waiting for element or script hangs indefinitely after clicking a link
Cause:
Fix:
# Explicit wait before checking for element
agent-browser wait 2000 # Wait 2 seconds
agent-browser snapshot -i --json # Check if element exists
# Wait for specific element to appear
agent-browser wait @e3
# Increase timeout with explicit waits in script
agent-browser click @submit
sleep 3 # Shell script sleep
agent-browser snapshot -i
# Check page title to verify navigation worked
agent-browser get title
Prevention: Add explicit wait commands after form submissions or link clicks that cause navigation.
Symptom: agent-browser snapshot -i --json returns [] or very few elements, but you see them visually in screenshot
Cause:
-i flag filters too aggressivelyFix:
# Wait for content to render
agent-browser wait 2000
agent-browser snapshot -i
# Remove interactive-only filter to see all elements
agent-browser snapshot -c # Compact but includes all elements
# Increase tree depth to find nested elements
agent-browser snapshot -d 5
# Use full snapshot to diagnose
agent-browser snapshot --json | jq . | less
# Check page title to verify page loaded
agent-browser get title
Prevention: Add wait delays after opening pages with heavy JavaScript rendering.
Symptom: Text appears to be cut off, only first few characters typed, or field stays empty
Cause:
Fix:
# Always focus explicitly before filling
agent-browser focus @e2
agent-browser wait 500 # Let field settle
agent-browser fill @e2 "[email protected]"
# For fields with slow JS event handling
agent-browser type @e2 "first part"
agent-browser wait 500
agent-browser type @e2 "second part"
# Verify what was actually entered
agent-browser get value @e2
Prevention: After any form fill, immediately verify the value was entered correctly with get value.
Symptom: Page loads but data doesn't populate, or integration tests fail when hitting real APIs
Cause:
Fix:
# Set custom headers for auth
agent-browser --headers '{"Authorization": "Bearer token123"}' open https://api.example.com
# Mock API responses instead
agent-browser network route "**/api/data" --body '{"result":"mocked"}'
# Add explicit wait for API response
agent-browser wait 3000 # Give API time to respond
agent-browser snapshot -i
# Fallback: Use `--headed` to see error messages
agent-browser --headed open https://app.com
agent-browser click @e1 # See browser console errors
Prevention: For testing, use network route to mock external APIs. For production, verify connectivity before running automation.
Symptom: agent-browser command hangs or process crashes with segmentation fault; Chrome window closes unexpectedly
Cause:
Fix:
# Restart with fresh session
agent-browser session kill all-sessions # Or specify a name
# Use lightweight mode with limited resources
agent-browser --session lite open https://minimal-site.com
# Reduce screenshot frequency
# Instead of: agent-browser screenshot after every click
# Do: agent-browser screenshot only on failure
# Check system resources
top -n 1 | head -20
# Run with --debug for crash diagnostics
agent-browser --debug open https://example.com 2>&1 | tail -50
Prevention:
agent-browser closeSymptom: Multiple elements resolve to the same @e number, clicking @e1 affects the wrong element
Cause: Accessibility tree groups similar elements (buttons, links) and agent-browser assigns refs based on order, not uniqueness
Fix:
# Use full snapshot to understand structure
agent-browser snapshot -d 3 --json > structure.json
cat structure.json | jq . # Examine hierarchy
# Click using find by text if available
agent-browser find text "Specific Button Text" click
# Use CSS selector if accessibility tree isn't clear
agent-browser find selector "button.submit-btn" click
# Take screenshot and count visually
agent-browser screenshot debug.png
# Then manually map which @eN corresponds to which button
# Use context to disambiguate
agent-browser get text @e1 # Check what @e1 actually is
Prevention: Always verify element identity by checking its text or taking screenshots before critical interactions.
Connect to an existing Chrome instance:
# Start Chrome with debugging
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
# Connect agent-browser
agent-browser --cdp 9222 snapshot
# Start recording
agent-browser record start recording.webm https://example.com
# Perform actions...
agent-browser click @e1
agent-browser fill @e2 "test"
# Stop and save
agent-browser record stop
# Block requests to URL pattern
agent-browser network route "**/analytics/*" --abort
# Mock API response
agent-browser network route "**/api/user" --body '{"name": "Test"}'
# View captured requests
agent-browser network requests
# Set auth headers
agent-browser --headers '{"Authorization": "Bearer token123"}' open https://api.example.com
# Load extension
agent-browser --extension /path/to/extension open https://example.com
#!/bin/bash
# Login automation script
agent-browser open https://app.com/login
# Get form elements
agent-browser snapshot -i --json > elements.json
# Fill credentials
agent-browser fill @e2 "[email protected]"
agent-browser fill @e3 "password123"
# Submit
agent-browser click @e4
# Wait for navigation
agent-browser wait 2000
# Verify login
agent-browser get url --json | jq -r '.url'
#!/bin/bash
# Scrape product data
agent-browser open https://shop.com/products
# Get page content
agent-browser snapshot --json > snapshot.json
# Extract specific element text
agent-browser get text '[data-testid="price"]' --json
# Screenshot for verification
agent-browser screenshot products.png
#!/bin/bash
SESSION="checkout-flow"
# Step 1: Add to cart
agent-browser --session $SESSION open https://shop.com/product/123
agent-browser --session $SESSION click @add-to-cart-button
agent-browser --session $SESSION wait 1000
# Step 2: Go to checkout
agent-browser --session $SESSION open https://shop.com/checkout
agent-browser --session $SESSION snapshot -i
# Step 3: Fill shipping
agent-browser --session $SESSION fill @shipping-name "John Doe"
agent-browser --session $SESSION fill @shipping-address "123 Main St"
# Cleanup
agent-browser --session $SESSION close
| Option | Description |
|--------|-------------|
| --session <name> | Isolated browser session |
| --json | JSON output format |
| --headed | Show browser window (not headless) |
| --cdp <port> | Connect via Chrome DevTools Protocol |
| --headers <json> | HTTP headers for requests |
| --proxy <url> | Proxy server |
| --executable-path <path> | Custom browser executable |
| --extension <path> | Load browser extension |
| --full, -f | Full page screenshot |
| --debug | Debug output |
| Variable | Description |
|----------|-------------|
| AGENT_BROWSER_SESSION | Default session name |
| AGENT_BROWSER_EXECUTABLE_PATH | Custom browser path |
| AGENT_BROWSER_STREAM_PORT | WebSocket streaming port |
User request:
"Scrape the product title and price from https://example.com/product/123"
Workflow:
# 1. Navigate to page
agent-browser open https://example.com/product/123
# 2. Get accessibility snapshot
agent-browser snapshot -i --json > page.json
# 3. Identify elements (example output):
# @e5 heading "Premium Widget Pro"
# @e12 text "$49.99"
# 4. Extract data
agent-browser get text @e5 --json # Returns: {"text": "Premium Widget Pro"}
agent-browser get text @e12 --json # Returns: {"text": "$49.99"}
Expected output:
{
"title": "Premium Widget Pro",
"price": "$49.99"
}
Time: 2-3 seconds
User request:
"Fill out the contact form at https://example.com/contact and verify submission"
Workflow:
# 1. Navigate and snapshot
agent-browser open https://example.com/contact
agent-browser snapshot -i
# Example snapshot output:
# @e1 textbox "Name"
# @e2 textbox "Email"
# @e3 textbox "Message"
# @e4 button "Send"
# 2. Fill form fields
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "[email protected]"
agent-browser fill @e3 "I have a question about your product"
# 3. Submit
agent-browser click @e4
# 4. Wait for response
agent-browser wait 2000
# 5. Verify success
agent-browser snapshot -i | grep "Thank you"
# Or take screenshot for manual verification
agent-browser screenshot success.png
Expected outcome:
Time: 4-6 seconds
User request:
"Add item to cart, proceed to checkout, and fill shipping info"
Workflow:
# Use session for state persistence across commands
SESSION="checkout-flow-$(date +%s)"
# 1. Add to cart
agent-browser --session $SESSION open https://shop.example.com/product/widget
agent-browser --session $SESSION snapshot -i | grep "Add to Cart"
agent-browser --session $SESSION click @add-to-cart # Assuming @add-to-cart ref found
agent-browser --session $SESSION wait 1000 # Wait for cart update
# 2. Go to checkout
agent-browser --session $SESSION open https://shop.example.com/checkout
agent-browser --session $SESSION snapshot -i
# 3. Fill shipping form (refs from snapshot)
agent-browser --session $SESSION fill @shipping-name "Jane Smith"
agent-browser --session $SESSION fill @shipping-address "456 Oak Ave"
agent-browser --session $SESSION fill @shipping-city "Portland"
agent-browser --session $SESSION select @shipping-state "OR"
agent-browser --session $SESSION fill @shipping-zip "97201"
# 4. Take screenshot before submission
agent-browser --session $SESSION screenshot checkout-filled.png
# 5. Cleanup
agent-browser --session $SESSION close
Expected outcome:
Time: 8-12 seconds
User request:
"Log into dashboard, navigate to reports, and extract table data"
Workflow:
SESSION="dashboard-scrape"
# 1. Login
agent-browser --session $SESSION open https://app.example.com/login
agent-browser --session $SESSION snapshot -i
# Fill login form
agent-browser --session $SESSION fill @email "[email protected]"
agent-browser --session $SESSION fill @password "secretpass"
agent-browser --session $SESSION click @submit
agent-browser --session $SESSION wait 2000
# 2. Navigate to reports
agent-browser --session $SESSION open https://app.example.com/reports
agent-browser --session $SESSION wait 1000
# 3. Extract table data
agent-browser --session $SESSION snapshot --json > reports.json
# Parse JSON to extract table rows
# @e20 table
# @e21 row "Q1 2024, $50,000, 15% growth"
# @e22 row "Q2 2024, $62,000, 24% growth"
agent-browser --session $SESSION get text @e20 --json
# 4. Cleanup
agent-browser --session $SESSION close
Expected output:
{
"reports": [
{"quarter": "Q1 2024", "revenue": "$50,000", "growth": "15%"},
{"quarter": "Q2 2024", "revenue": "$62,000", "growth": "24%"}
]
}
Time: 6-10 seconds
Symptoms:
Error: Element @e5 not found
Cause: Element ref changed after page update or snapshot was stale
Solution:
# 1. Get fresh snapshot
agent-browser snapshot -i
# 2. Verify element ref in output
# Look for expected element in snapshot output
# 3. If element doesn't appear, try without -i flag
agent-browser snapshot # Shows all elements, not just interactive
# 4. If still missing, check if page loaded completely
agent-browser wait 2000 # Wait for page to settle
agent-browser snapshot -i
Symptoms:
Error: Navigation timeout after 30000ms
Cause: Page takes too long to load or network issues
Solution:
# 1. Check if page is accessible
agent-browser --headed open https://example.com # Visual debugging
# 2. Increase wait time after navigation
agent-browser open https://slow-site.com
agent-browser wait 5000 # Wait 5 seconds
# 3. Check network requests
agent-browser network requests # See what's loading
# 4. Use CDP connection for better control
agent-browser --cdp 9222 open https://example.com
Symptoms: Element click command runs but nothing happens
Cause: Element not visible, disabled, or covered by another element
Solution:
# 1. Check if element is visible
agent-browser is visible @e3
# Returns: true/false
# 2. Check if element is enabled
agent-browser is enabled @e3
# 3. Try scrolling to element first
agent-browser scroll down 500
agent-browser wait 500
agent-browser click @e3
# 4. Take screenshot to see page state
agent-browser screenshot debug.png
# 5. Try double-click instead
agent-browser dblclick @e3
Symptoms: Text typed but form field remains empty
Cause: JavaScript framework requires special events or delays
Solution:
# 1. Focus element first
agent-browser focus @e2
agent-browser wait 200
# 2. Use fill instead of type (clears first)
agent-browser fill @e2 "text content"
# 3. Add delays between keystrokes
agent-browser type @e2 "slow"
agent-browser wait 100
# 4. Try pressing Tab after filling to trigger validation
agent-browser fill @e2 "[email protected]"
agent-browser press Tab
Symptoms: Previous session state not available
Cause: Session wasn't properly named or browser closed unexpectedly
Solution:
# 1. Always use explicit session names
SESSION="my-workflow-$(date +%s)" # Unique session ID
agent-browser --session $SESSION open https://example.com
# 2. Check active sessions
agent-browser --session $SESSION get url
# If error, session doesn't exist
# 3. Don't reuse session names across runs
# Each workflow should have unique session ID
# 4. Cleanup sessions when done
agent-browser --session $SESSION close
Symptoms:
Error: Failed to download Chromium
Cause: Network issues, disk space, or missing system dependencies
Solution:
# 1. Check disk space
df -h # Ensure >2GB free
# 2. Retry installation with verbose output
agent-browser install --debug
# 3. On Linux, install system dependencies first
agent-browser install --with-deps
# 4. Use custom browser if needed
agent-browser --executable-path /usr/bin/chromium open https://example.com
# 5. Verify installation
agent-browser --version
ls ~/.cache/ms-playwright/ # Check if Chromium exists
Symptoms: Can't parse JSON output from commands
Cause: Mixed text/JSON output or errors in JSON mode
Solution:
# 1. Use --json flag consistently
agent-browser get text @e5 --json # Not: agent-browser get text @e5
# 2. Redirect stderr to separate stream
agent-browser snapshot --json 2>errors.log >output.json
# 3. Validate JSON output
agent-browser snapshot --json | jq . # Will error if invalid
# 4. Check for error messages in output
agent-browser snapshot --json | grep -E '^{' | jq .
--json for parsing: Machine-readable output for scripts--session for parallel workflowswait commands after navigation/clicks-i flag to reduce noisedocumentation
Create or expand an Idea.md / IDEA.md file from a rough description, existing repo, conversation history, notes, or other early-stage product inputs. Use when the user asks to "write an Idea.md", "turn this into an idea file", "capture this product idea", "expand this concept", or wants a repo-grounded concept brief before validation, PRD, or implementation work.
development
Write structured implementation plans from specs or requirements before touching code. Use when given a spec, requirements doc, or feature description, when user says "plan this out", "write a plan for", "how should we implement", or before starting any multi-step coding task.
testing
Expert guidance for video editing with ffmpeg, encoding best practices, and quality optimization. Use when working with video files, transcoding, remuxing, encoding settings, color spaces, or troubleshooting video quality issues.
development
Opinionated constraints for building better interfaces with agents. Use when building UI components, implementing animations, designing layouts, reviewing frontend accessibility, or working with Tailwind CSS, motion/react, or accessible primitives like Radix/Base UI.