skills/agent-browser/SKILL.md
Headless browser automation CLI optimized for AI agents. Uses a snapshot + refs system that cuts context overhead by up to 93% compared with Playwright MCP. Purpose-built for web testing, form automation, screenshots, and data extraction.
npx skillsauth add mgiovani/cc-arsenal agent-browser
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
agent-browser is an open-source browser automation CLI from Vercel Labs, purpose-built for AI agents. Unlike traditional browser automation tools, it's designed from the ground up for LLM interaction with a snapshot + refs system that reduces context usage by up to 93% compared to Playwright MCP.
@e1 refs instead of fragile CSS selectors
Three-layer design for performance and reliability:
# Install globally via npm
npm install -g agent-browser
# Install browser dependencies
agent-browser install
# Linux: Install system dependencies
agent-browser install --with-deps
# 1. Navigate to a page
agent-browser open https://example.com
# 2. Get snapshot with refs
agent-browser snapshot -i
# Output shows:
# textbox "Email" [ref=e1]
# textbox "Password" [ref=e2]
# button "Submit" [ref=e3]
# 3. Interact using refs
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
# 4. Wait and verify
agent-browser wait --load networkidle
agent-browser snapshot -i
# Run multiple isolated browsers
agent-browser --session auth open https://app.com/login
agent-browser --session test open https://staging.com
# List all active sessions
agent-browser session list
# Clean up
agent-browser --session auth close
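The session commands above leave cleanup to the caller. Below is a minimal sketch of a wrapper that always closes a session, assuming a POSIX shell. The `AB` variable and the `with_session` helper are hypothetical names introduced for illustration; only the `--session` flag, the `close` command, and the `AGENT_BROWSER_SESSION` variable come from this document.

```shell
# AB is a hypothetical override so the binary can be swapped out (e.g. in tests).
AB="${AB:-agent-browser}"

# with_session NAME CMD [ARGS...]
# Runs CMD with AGENT_BROWSER_SESSION set to NAME, then always closes that
# session, and returns the exit status of CMD itself.
with_session() {
  name="$1"
  shift
  (
    # The subshell keeps the exported variable from leaking into the caller.
    export AGENT_BROWSER_SESSION="$name"
    "$@"
  )
  rc=$?
  # Close the session even when CMD failed.
  "$AB" --session "$name" close
  return "$rc"
}
```

A workflow would typically be a shell function, for example `login() { "$AB" open https://app.com/login; }` followed by `with_session auth login`.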
The snapshot command is the core of agent-browser's AI optimization. It generates an accessibility tree - a structured, semantic representation of interactive elements.
Traditional tools expose full DOM trees with thousands of nodes. An accessibility tree is much smaller and labels each element with its semantic role and name; with -i, the snapshot is narrowed to interactive elements (buttons, inputs, links), which is what AI agents need.
Comparison:
# Interactive elements only (recommended for AI)
agent-browser snapshot -i
# Full accessibility tree
agent-browser snapshot
# Compact format (fewer details)
agent-browser snapshot -c
# Limit tree depth (for large pages)
agent-browser snapshot -d 3
# Scope to specific section
agent-browser snapshot -s "#main-content"
Refs are stable identifiers assigned to interactive elements in snapshots:
textbox "Email address" [ref=e1]
  placeholder: "Enter your email"
  required: true
button "Sign In" [ref=e5]
  role: button
  enabled: true
Use refs in commands: @e1, @e5, etc.
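When a script rather than an agent drives the CLI, the ref for a known element has to be read out of the snapshot text. The sketch below assumes each element appears on one line as `role "name" [ref=eN]`, as in the sample output in this document; the real output format may differ. `ref_for` is a hypothetical helper name.

```shell
# ref_for ROLE NAME
# Reads snapshot text on stdin and prints the ref (e.g. "e1") of the first
# line of the form: ROLE "NAME" [ref=eN]. Returns non-zero if nothing matches.
# ASSUMPTION: the snapshot format matches the sample output in this document.
ref_for() {
  awk -v role="$1" -v name="$2" '
    BEGIN { prefix = role " \"" name "\" [ref=" }
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # ignore indentation
      if (index(line, prefix) == 1) {
        rest = substr(line, length(prefix) + 1)
        sub(/\].*$/, "", rest)             # drop the closing bracket onward
        print rest
        found = 1
        exit
      }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

Possible usage: `ref="$(agent-browser snapshot -i | ref_for button "Sign In")" && agent-browser click "@$ref"`.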
Advantages over CSS selectors:
# Open URL (auto-prepends https://)
agent-browser open example.com
# History control
agent-browser back
agent-browser forward
agent-browser reload
# Close browser
agent-browser close
# Click elements
agent-browser click @e3
agent-browser dblclick @e5
# Fill forms (clears then types)
agent-browser fill @e1 "text"
# Type text (preserves existing content)
agent-browser type @e2 "additional text"
# Press keys
agent-browser press Enter
agent-browser press "Control+A"
# Checkboxes
agent-browser check @e4
agent-browser uncheck @e4
# Dropdowns
agent-browser select @e6 "Option 2"
# Hover (reveals hidden elements)
agent-browser hover @e7
# Scroll
agent-browser scroll 0 500
agent-browser scrollintoview @e8
# File upload
agent-browser upload @e9 /path/to/file.pdf
# Drag and drop
agent-browser drag @e10 @e11
# Get element data
agent-browser get text @e1
agent-browser get html @e2
agent-browser get value @e3 # Input field value
agent-browser get attr @e4 href # Attribute value
# Page metadata
agent-browser get title
agent-browser get url
# Element metrics
agent-browser get count ".product-card"
agent-browser get box @e5 # Bounding box coordinates
agent-browser get styles @e6 # Computed CSS
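For data extraction, the `get text` command above can be looped over several refs. This is a minimal sketch, assuming `get text` prints the element's text on stdout and exits non-zero on failure; `AB` and `texts_for` are hypothetical names introduced here.

```shell
# AB is a hypothetical override so the binary can be swapped out (e.g. in tests).
AB="${AB:-agent-browser}"

# texts_for REF...
# Prints one tab-separated line per ref: the ref, then its text content.
# Stops and returns 1 at the first ref whose lookup fails.
texts_for() {
  for ref in "$@"; do
    text="$("$AB" get text "$ref")" || return 1
    printf '%s\t%s\n' "$ref" "$text"
  done
}
```

Possible usage: `texts_for @e1 @e2 @e3 > page.tsv`. Text that itself contains tabs or newlines would need escaping before it is treated as TSV.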
# Check element state before interaction
agent-browser is visible @e1
agent-browser is enabled @e2
agent-browser is checked @e3
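The state checks above can guard an interaction. The sketch below assumes that `is visible` and `is enabled` print the word `true` when the check passes; that output format is an assumption, not something this document states. `AB` and `click_if_ready` are hypothetical names.

```shell
# AB is a hypothetical override so the binary can be swapped out (e.g. in tests).
AB="${AB:-agent-browser}"

# click_if_ready REF
# Clicks REF only when it is reported as both visible and enabled.
# ASSUMPTION: "is visible" / "is enabled" print "true" when the check passes.
click_if_ready() {
  ref="$1"
  [ "$("$AB" is visible "$ref")" = "true" ] || { echo "not visible: $ref" >&2; return 1; }
  [ "$("$AB" is enabled "$ref")" = "true" ] || { echo "not enabled: $ref" >&2; return 1; }
  "$AB" click "$ref"
}
```

If the CLI signals state through its exit code instead, the two string comparisons would be replaced by plain `"$AB" is visible "$ref" || ...` checks.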
# Wait for element
agent-browser wait @e5
# Wait duration (milliseconds)
agent-browser wait 2000
# Wait for text
agent-browser wait --text "Success"
# Wait for URL pattern (glob)
agent-browser wait --url "**/dashboard"
# Wait for network idle
agent-browser wait --load networkidle
# Wait for JavaScript condition
agent-browser wait --fn "document.readyState === 'complete'"
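A failed wait is easier to debug with a picture of the page at that moment. This is a minimal sketch that combines `wait` with the `screenshot` command, assuming `wait` exits non-zero when its condition is not met; that exit-code behavior is an assumption. `AB` and `wait_or_capture` are hypothetical names.

```shell
# AB is a hypothetical override so the binary can be swapped out (e.g. in tests).
AB="${AB:-agent-browser}"

# wait_or_capture SHOT WAIT_ARGS...
# Runs "agent-browser wait WAIT_ARGS..."; if it fails, saves a screenshot
# to SHOT for debugging and returns the wait's exit status.
# ASSUMPTION: "wait" exits non-zero when the condition is not met.
wait_or_capture() {
  shot="$1"
  shift
  "$AB" wait "$@" && return 0
  rc=$?
  "$AB" screenshot "$shot"
  return "$rc"
}
```

Possible usage: `wait_or_capture failure.png --url "**/dashboard"`.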
# Screenshot (PNG)
agent-browser screenshot page.png
agent-browser screenshot page.png --full # Full page scroll
# PDF export
agent-browser pdf document.pdf
# Video recording (webm)
agent-browser record start demo.webm
agent-browser click @e1
agent-browser record stop
Alternative to refs - use human-readable locators for direct targeting:
# By ARIA role
agent-browser find role button click --name "Submit"
agent-browser find role textbox fill --label "Email" "user@example.com"
# By text content
agent-browser find text "Click here" click
agent-browser find text "Exact Match" click --exact
# By form labels
agent-browser find label "Username" fill "admin"
# By placeholder
agent-browser find placeholder "Search..." fill "query"
# By alt text (images)
agent-browser find alt "Logo" click
# By title attribute
agent-browser find title "Close dialog" click
# By test ID
agent-browser find testid "submit-btn" click
# Position-based
agent-browser find first "button" click
agent-browser find last ".item" click
agent-browser find nth 2 ".card" click
When to use find vs refs:
Use agent-browser when:
✓ AI agent automation - Optimized for LLM workflows
✓ CLI-first workflows - Simple command-line usage
✓ Context efficiency matters - Up to 93% less token overhead
✓ Rapid prototyping - Zero configuration needed
✓ Multiple sessions - Easy session isolation
✓ Semantic targeting - Prefer accessibility tree over DOM
Use Playwright when:
✓ Complex programmatic control - Full JavaScript API
✓ Advanced browser features - Service workers, device emulation
✓ Existing Playwright tests - Reuse test infrastructure
✓ Fine-grained control - Direct access to CDP
✓ TypeScript integration - Type-safe browser automation
Summary: agent-browser excels at AI-driven automation with minimal context. Playwright excels at programmatic control with maximum flexibility.
Detailed information is available in bundled reference files (loaded on-demand):
references/command-reference.md - Complete command documentation including:
eval)
references/advanced-patterns.md - Advanced usage patterns:
references/best-practices.md - Optimization and reliability guidance:
references/examples.md - Real-world scenarios:
opensrc/ directory
AGENT_BROWSER_SESSION # Default session name
AGENT_BROWSER_EXECUTABLE_PATH # Custom browser binary
AGENT_BROWSER_EXTENSIONS # Comma-separated extension paths
AGENT_BROWSER_PROVIDER # Cloud provider (browseruse, browserbase)
AGENT_BROWSER_STREAM_PORT # WebSocket port for streaming
AGENT_BROWSER_HOME # Installation directory
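The variables above can be exported once at the top of a script. This is a config sketch only: every value below is an illustrative assumption, not a default, and the port number in particular is hypothetical.

```shell
# Example values only; none of these are defaults.
export AGENT_BROWSER_SESSION="ci"             # session used when --session is omitted
export AGENT_BROWSER_PROVIDER="browserbase"   # one of the providers listed above
export AGENT_BROWSER_STREAM_PORT="9223"       # hypothetical port for streaming
```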
cli/src/color.rs for colored output (respects NO_COLOR)
# npm packages
npx opensrc <package>
# Python packages
npx opensrc pypi:<package>
# Rust crates
npx opensrc crates:<package>
# GitHub repos
npx opensrc <owner>/<repo>
Quick Reference Card
# Navigate
agent-browser open <url>
# Analyze
agent-browser snapshot -i
# Interact
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser wait @e3
# Verify
agent-browser is visible @e1
# Capture
agent-browser screenshot page.png
# Semantic find
agent-browser find role button click --name "Submit"
Best Practices:
- Run snapshot -i before interacting
- Use refs (@e1) for reliability
- Use explicit waits (--load networkidle, --url patterns)
- Scope snapshots (-s selector) for large pages
- Check state (is visible, is enabled) before interaction
- Use sessions (--session) for isolation