workers/public/qa-tester/skills/agent-browser/SKILL.md
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
npx skillsauth add indigoai-us/hq-core agent-browserInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
100% native Rust — no Node.js or Playwright dependency. 7MB install, 8MB memory. Direct CDP connection to Chromium.
Every browser automation follows this pattern:
agent-browser open <url>agent-browser snapshot -i (get element refs like @e1, @e2)agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "[email protected]"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser scroll down 500 # Scroll page
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser pdf output.pdf # Save as PDF
agent-browser download @e1 ./file # Download file by clicking element
# Chrome DevTools inspector (0.18) — opens DevTools for active page
agent-browser inspect
# CDP WebSocket URL (0.18) — for external debugging tools
agent-browser get cdp-url
# Clipboard access (0.19)
agent-browser clipboard read # Read clipboard
agent-browser clipboard write "text" # Write to clipboard
agent-browser clipboard copy # Ctrl+C
agent-browser clipboard paste # Ctrl+V
# Screenshot configuration (0.19)
agent-browser screenshot --screenshot-dir ./captures
agent-browser screenshot --screenshot-quality 80
agent-browser screenshot --screenshot-format jpeg
# Also: AGENT_BROWSER_SCREENSHOT_DIR, _QUALITY, _FORMAT env vars
# Browserless.io provider (0.19)
agent-browser --provider browserless open https://example.com
# Env: BROWSERLESS_API_KEY, BROWSERLESS_API_URL
# Brave browser auto-discovery (0.20.7)
# Automatically finds Brave for CDP connections on macOS/Linux/Windows
# Annotated screenshots — bounding boxes + ref labels for vision models
agent-browser screenshot --annotate
# Lightpanda browser engine
agent-browser --engine lightpanda open https://example.com
# or: AGENT_BROWSER_ENGINE=lightpanda
# Dialog handling
agent-browser dialog dismiss
# Retina / device scale factor
agent-browser viewport 1440 900 --scale 2
# Headed mode via env var (no --headed flag needed)
# AGENT_BROWSER_HEADED=1
# Electron webview support — pages list shows target type
agent-browser pages # Shows webview targets
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "[email protected]"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5 # Get specific element text
agent-browser get text body > page.txt # Get all page text
# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i
agent-browser session list
agent-browser --headed open https://example.com
agent-browser highlight @e1 # Highlight element
agent-browser record start demo.webm # Record session
# Open local files with file:// URLs
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png
# List available iOS simulators
agent-browser device list
# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
# Same workflow as desktop - snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1 # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up # Mobile-specific gesture
# Take screenshot
agent-browser -p ios screenshot mobile.png
# Close session (shuts down simulator)
agent-browser -p ios close
Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # MUST re-snapshot
agent-browser click @e1 # Use new refs
When refs are unavailable or unreliable, use semantic locators:
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "[email protected]"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
| Reference | When to Use | |-----------|-------------| | references/commands.md | Full command reference with all options | | references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting | | references/session-management.md | Parallel sessions, state persistence, concurrent scraping | | references/authentication.md | Login flows, OAuth, 2FA handling, state reuse | | references/video-recording.md | Recording workflows for debugging and documentation | | references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
| Template | Description | |----------|-------------| | templates/form-automation.sh | Form filling with validation | | templates/authenticated-session.sh | Login once, reuse state | | templates/capture-workflow.sh | Content extraction with screenshots |
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output
tools
Discovery + dispatch entry point for native HQ inside Cowork (or any sandboxed Claude Code plugin host). Enumerates every HQ capability available through hq-pack-cowork's host-side MCP server (identity, sync, qmd/search, secrets, vault files, team & membership, packages & modules, meeting intelligence, feedback, schema-backed runs, guarded long-tail CLI) and routes to the right `mcp__hq__*` tool while preserving default HQ behavior through a different transport. Use when the agent needs HQ but `hq`/`qmd` aren't reachable from its bash sandbox and isn't sure which tool to call.
tools
Run a full HQ sync (all cloud-backed companies, bidirectional) from a sandboxed Claude Code plugin host (Cowork) by calling the host-side `hq_sync` MCP tool. Same engine as AppBar HQ Sync and the `/hq-sync` skill, but routed through the hq-pack-cowork MCP server so it works even when the `hq` binary and `~/.hq` auth are not reachable from the agent's bash sandbox.
tools
Share an HQ vault path from a sandboxed Claude Code plugin host (Cowork) by calling the host-side `hq_share` MCP tool. Without `--with`, mints an encrypted single-use share-session URL (default 15-min expiry). With `--with`, grants direct ACL access to a person, group, or `@all`. Same capability as `/hq-share`, routed through hq-pack-cowork's MCP server so it works from a sandboxed agent.
tools
--- name: hq-cowork-secrets description: Use HQ secrets from a sandboxed Claude Code plugin host (Cowork). The host-side MCP server never returns a secret value itself: `mcp__hq__hq_secrets_exec` runs a command on the host with named secrets injected as env vars (only the command's output returns), and refuses to launch a shell or value-printing binary; `mcp__hq__hq_secrets_list` lists secret NAMES/metadata only. These tools run host commands with the user's privileges, so treat them as host-tru