skills/agent-browser/SKILL.md
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
npx skillsauth add jim60105/copilot-prompt agent-browserInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Every browser automation follows this pattern:
agent-browser open <url>agent-browser snapshot -i (get element refs like @e1, @e2)agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "[email protected]"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser scroll down 500 # Scroll page
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser pdf output.pdf # Save as PDF
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "[email protected]"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5 # Get specific element text
agent-browser get text body > page.txt # Get all page text
# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i
agent-browser session list
agent-browser --headed open https://example.com
agent-browser highlight @e1 # Highlight element
agent-browser record start demo.webm # Record session
# Open local files with file:// URLs
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png
# List available iOS simulators
agent-browser device list
# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
# Same workflow as desktop - snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1 # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up # Mobile-specific gesture
# Take screenshot
agent-browser -p ios screenshot mobile.png
# Close session (shuts down simulator)
agent-browser -p ios close
Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # MUST re-snapshot
agent-browser click @e1 # Use new refs
When refs are unavailable or unreliable, use semantic locators:
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "[email protected]"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
| Reference | When to Use | | -------------------------------------------------------------------- | --------------------------------------------------------- | | references/commands.md | Full command reference with all options | | references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting | | references/session-management.md | Parallel sessions, state persistence, concurrent scraping | | references/authentication.md | Login flows, OAuth, 2FA handling, state reuse | | references/video-recording.md | Recording workflows for debugging and documentation | | references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
| Template | Description | | ------------------------------------------------------------------------ | ----------------------------------- | | templates/form-automation.sh | Form filling with validation | | templates/authenticated-session.sh | Login once, reuse state | | templates/capture-workflow.sh | Content extraction with screenshots |
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output
development
Diátaxis Documentation Expert. An expert technical writer specializing in creating high-quality software documentation, guided by the principles and structure of the Diátaxis technical documentation authoring framework.
testing
Guide users through a structured workflow for co-authoring documentation. Use when user wants to write documentation, proposals, technical specs, decision docs, or similar structured content. This workflow helps users efficiently transfer context, refine content through iteration, and verify the doc works for readers. Trigger when user mentions writing docs, creating proposals, drafting specs, or similar documentation tasks.
tools
Comprehensive guide for building, configuring, customizing, and deploying Docsify documentation sites. Use when the user wants to (1) initialize a new Docsify site, (2) add or organize Markdown pages, sidebars, navbars, or cover pages, (3) configure `window.$docsify` options, (4) customize themes / CSS variables / fonts, (5) install built-in or third-party Docsify plugins (search, GA, emoji, zoom, copy-code, comments, pagination, tabs, etc.), (6) write a custom Docsify plugin using lifecycle hooks, (7) use Docsify Markdown helpers (callouts, link attributes, image attributes, heading IDs, task lists, embed files with `:include`), (8) deploy to GitHub Pages, GitLab Pages, Netlify, Vercel, Firebase, Docker, Nginx, etc., (9) enable PWA / offline mode, virtual routes, or Vue compatibility, or (10) upgrade a Docsify site from v4 to v5. Triggers on mentions of "docsify", "_sidebar.md", "_navbar.md", "_coverpage.md", "$docsify", or `docsify-cli`.
testing
Writing guidelines for producing high-quality Traditional Chinese (zh-TW) content. Use when writing any kind of content. Including blog posts, notes, technical articles, technical writing, chitchat, social media posts, etc., even when you are just sending a text message. Also use when reviewing or editing existing Chinese content for tone, style, and terminology compliance.