plugins/vercel/skills/agent-browser/SKILL.md
Browser automation CLI for AI agents. Use when the user needs to interact with websites, verify dev server output, test web apps, navigate pages, fill forms, click buttons, take screenshots, extract data, or automate any browser task. Also triggers when a dev server starts so you can verify it visually.
npx skillsauth add openai/plugins agent-browserInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
When a dev server is running or the user asks to verify, test, or interact with a web page, use agent-browser to automate the browser.
Every browser automation follows this pattern:
agent-browser open <url>agent-browser snapshot -i (get element refs like @e1, @e2)agent-browser open http://localhost:3000
agent-browser wait --load networkidle
agent-browser snapshot -i
When a dev server starts, use agent-browser to verify it's working:
# After starting a dev server (next dev, vite, etc.)
agent-browser open http://localhost:3000
agent-browser wait --load networkidle
agent-browser screenshot dev-check.png
agent-browser snapshot -i
Commands can be chained with &&. The browser persists between commands via a background daemon.
agent-browser open http://localhost:3000 && agent-browser wait --load networkidle && agent-browser snapshot -i
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
# Snapshot
agent-browser snapshot -i # Interactive elements with refs
agent-browser snapshot -i -C # Include cursor-interactive elements
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser scroll down 500 # Scroll page
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered labels
agent-browser pdf output.pdf # Save as PDF
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff screenshot --baseline before.png # Visual pixel diff
agent-browser open http://localhost:3000/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "[email protected]"
agent-browser click @e5
agent-browser wait --load networkidle
# Login once and save state
agent-browser open http://localhost:3000/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Reuse in future sessions
agent-browser state load auth.json
agent-browser open http://localhost:3000/dashboard
agent-browser open http://localhost:3000/products
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get text body > page.txt
agent-browser --headed open http://localhost:3000
agent-browser highlight @e1
agent-browser record start demo.webm
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # MUST re-snapshot
agent-browser click @e1 # Use new refs
Use --annotate for screenshots with numbered labels on interactive elements:
agent-browser screenshot --annotate
# Output: [1] @e1 button "Submit", [2] @e2 link "Home", ...
agent-browser click @e2
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "[email protected]"
agent-browser find role button click --name "Submit"
# Simple expressions
agent-browser eval 'document.title'
# Complex JS: use --stdin with heredoc
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
agent-browser --session site1 open http://localhost:3000
agent-browser --session site2 open http://localhost:3001
agent-browser session list
agent-browser close # Always close when done
agent-browser wait --load networkidle # Best for slow pages
agent-browser wait "#content" # Wait for specific element
agent-browser wait --url "**/dashboard" # Wait for URL pattern
agent-browser wait 5000 # Fixed wait (last resort)
tools
Top-level workflow skill for USD performance diagnosis and optimization. Use for slow loading, high memory, low FPS, or 'optimize my scene' requests; delegates auth/runtime setup to Phase 0 owners.
data-ai
Use when the user mentions MagicPath, designs, UI components, themes, canvas selections, or repo-to-canvas UI work; run magicpath-ai to search, inspect, install, or author components.
documentation
Use as the top-level router for Omniverse Realtime Viewer USD app requests and focused viewer reference documents.
tools
Turn Notion specs into implementation plans, tasks, and progress tracking; use when implementing PRDs/feature specs and creating Notion plans + tasks from them.