agent-browser/SKILL.md
Headless browser automation for AI agents using agent-browser CLI. Use when Claude needs to automate web browsing, scrape web data, interact with web pages, fill forms, take screenshots, or perform any browser-based tasks. Supports reference-based element targeting, session management, and semantic locators.
npx skillsauth add hhu3637kr/skills agent-browser
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guide for using agent-browser CLI to automate web browsing tasks in Claude Code.
Before using agent-browser, verify installation:
# Check if installed
agent-browser --version
# If not installed, install globally
npm install -g agent-browser
agent-browser install # Download Chromium
Windows Note: If you encounter /bin/sh errors on Windows, use:
npx agent-browser <command>
See troubleshooting.md for platform-specific issues.
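The fallback described in the Windows note can be sketched as a small shell helper. This is a minimal sketch, not part of agent-browser itself: the function name `resolve_agent_browser` and the variable `AB` are hypothetical, and it only assumes that `npx agent-browser` is an acceptable substitute whenever the global command is not on `PATH`.

```shell
# Hypothetical helper: prefer a global install, otherwise fall back to npx.
resolve_agent_browser() {
  if command -v agent-browser >/dev/null 2>&1; then
    echo "agent-browser"
  else
    echo "npx agent-browser"
  fi
}

# AB holds the chosen invocation. It is left unquoted when used so that
# "npx agent-browser" splits into two words.
AB=$(resolve_agent_browser)
echo "Using: $AB"
# $AB --version
```

Later commands can then be written as `$AB open example.com` regardless of how the CLI was installed.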
agent-browser uses a refs-based system where page elements get unique identifiers (like @e1, @e2) that you can use for interactions.
# 1. Navigate to page
agent-browser open example.com
# 2. Get page snapshot with interactive elements
agent-browser snapshot -i --json
# 3. Use refs from snapshot to interact
agent-browser click @e5
agent-browser fill @e3 "search query"
# 4. Take screenshot or get results
agent-browser screenshot result.png
agent-browser open <url> # Open URL
agent-browser goto <url> # Navigate to URL
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser snapshot # Get accessibility tree
agent-browser snapshot -i # Interactive elements only
agent-browser snapshot -i --json # JSON format (best for AI)
agent-browser screenshot <file> # Take screenshot
agent-browser get text @e1 # Get element text
agent-browser get html # Get page HTML
agent-browser get url # Get current URL
agent-browser click @e2 # Click element by ref
agent-browser dblclick @e2 # Double click
agent-browser fill @e3 "text" # Fill input field
agent-browser type @e3 "text" # Type text (slower, more realistic)
agent-browser press Enter # Press keyboard key
agent-browser check @e4 # Check checkbox
agent-browser select @e5 "option" # Select dropdown option
agent-browser upload @e6 file.pdf # Upload file
When you don't have refs, use semantic locators:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." type "query"
agent-browser wait @e1 # Wait for element
agent-browser wait --text "Done" # Wait for text
agent-browser wait --url /success # Wait for URL change
agent-browser wait --load # Wait for page load
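When a wait fails, a screenshot of the page at that moment is often the quickest clue. The sketch below combines the `wait --text` and `screenshot` commands listed above. It assumes that `agent-browser wait` exits with a non-zero status when it gives up, which the original text does not confirm; the function name `wait_text_or_capture`, the `AB` variable, and the default file name are hypothetical.

```shell
# AB is the command used to invoke the CLI (override it if you use npx).
AB="${AB:-agent-browser}"

# Wait for text to appear. If the wait fails, capture a screenshot for
# debugging and return a non-zero status.
# Assumption: `wait` exits non-zero when the text never appears.
wait_text_or_capture() {
  text=$1
  shot=${2:-wait-failure.png}
  if $AB wait --text "$text"; then
    return 0
  fi
  $AB screenshot "$shot"
  return 1
}

# Example (requires agent-browser to be installed):
# wait_text_or_capture "Done" done-missing.png
```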
Use sessions to run multiple isolated browser instances:
# Start different sessions
agent-browser --session task1 open site-a.com
agent-browser --session task2 open site-b.com
# Each session has separate:
# - Cookies and storage
# - Authentication state
# - Navigation history
# List active sessions
agent-browser session list
# Close specific session
agent-browser --session task1 close
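For more than a couple of sites, the per-session commands above can be wrapped in a loop. This is a minimal sketch using only the `--session <name> open` and `--session <name> close` forms shown above; the helper names `open_in_sessions` and `close_sessions`, the `AB` variable, and the `task1`, `task2`, ... naming scheme are illustrative choices, not part of the CLI.

```shell
# AB is the command used to invoke the CLI (override it if you use npx).
AB="${AB:-agent-browser}"

# Open each URL in its own session, named task1, task2, ...
open_in_sessions() {
  i=1
  for url in "$@"; do
    $AB --session "task$i" open "$url"
    i=$((i + 1))
  done
}

# Close sessions task1..taskN, where N is the first argument.
close_sessions() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    $AB --session "task$i" close
    i=$((i + 1))
  done
}

# Example (requires agent-browser to be installed):
# open_in_sessions site-a.com site-b.com
# $AB session list
# close_sessions 2
```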
For AI-driven automation, follow this pattern:
agent-browser open https://example.com
agent-browser snapshot -i --json > page.json
# Parse JSON to understand page structure
# Execute actions using refs
agent-browser click @e2
agent-browser fill @e5 "input data"
agent-browser snapshot -i --json > updated.json
See workflows.md for detailed AI workflow patterns.
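The snapshot, act, snapshot-again rhythm of this pattern can be captured in two small helpers. This is a sketch under stated assumptions: the function names `snapshot_to` and `act_and_resnapshot` and the `AB` variable are hypothetical, and the code assumes the CLI returns a non-zero exit status when an action fails. How the JSON is parsed is left out, since the snapshot schema is not described here.

```shell
# AB is the command used to invoke the CLI (override it if you use npx).
AB="${AB:-agent-browser}"

# Save a JSON snapshot of the interactive elements to the given file.
snapshot_to() {
  $AB snapshot -i --json > "$1"
}

# Run one action (for example: click @e2), then take a fresh snapshot so
# the next decision is based on the page as it looks after the action.
# The snapshot is skipped if the action itself fails.
act_and_resnapshot() {
  out_file=$1
  shift
  $AB "$@" && snapshot_to "$out_file"
}

# Example (requires agent-browser to be installed):
# $AB open https://example.com
# snapshot_to page.json
# act_and_resnapshot updated.json click @e2
```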
# Block requests
agent-browser route --block "*.ads.com/*"
# Mock responses
agent-browser route --mock "/api/data" response.json
# Save authentication state
agent-browser save-state auth.json
# Load state in new session
agent-browser load-state auth.json
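One way to use these two commands together is to log in only when no saved state exists yet. The sketch below is an illustration, not documented behavior: `ensure_auth` and the `AB` variable are hypothetical names, the login steps are supplied by the caller, and whether `load-state` must run before or after `open` is an assumption you should check against the CLI's own help.

```shell
# AB is the command used to invoke the CLI (override it if you use npx).
AB="${AB:-agent-browser}"

# Reuse saved authentication state when the file exists. Otherwise run the
# caller-supplied login command and, if it succeeds, save the new state.
ensure_auth() {
  state_file=$1
  shift
  if [ -f "$state_file" ]; then
    $AB load-state "$state_file"
  else
    "$@" && $AB save-state "$state_file"
  fi
}

# Example (requires agent-browser; do_login is a function you define
# containing the open/fill/click steps for your site):
# ensure_auth auth.json do_login
```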
# Enable console logs
agent-browser --console open example.com
# Highlight elements
agent-browser highlight @e3
# Enable tracing
agent-browser --trace trace.zip open example.com
Use -i --json for snapshots: reduces noise and is easier for AI to parse
Use wait commands to handle dynamic content
agent-browser open https://form.example.com
agent-browser snapshot -i --json
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "user@example.com"
agent-browser click @e3 # Submit button
agent-browser wait --url /success
agent-browser open https://data.example.com
agent-browser snapshot -i --json > structure.json
agent-browser get text @e5 > data.txt
agent-browser screenshot evidence.png
# Login
agent-browser open https://app.example.com/login
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
# Navigate to target
agent-browser wait --url /dashboard
agent-browser goto https://app.example.com/data
# Extract data
agent-browser snapshot -i --json > results.json
Use npx agent-browser if the global command fails with /bin/sh errors
agent-browser install --with-deps
npm install -g agent-browser
agent-browser is built on Playwright with:
Refs (@e1, @e2) for stable element targeting
✅ Use agent-browser when:
❌ Don't use agent-browser when: