hhu3637kr/agent-browser

Name: agent-browser
Author: hhu3637kr

agent-browser/SKILL.md

npx skillsauth add hhu3637kr/skills agent-browser


Agent Browser Automation

Guide for using agent-browser CLI to automate web browsing tasks in Claude Code.

Quick Start

Installation Check

Before using agent-browser, verify installation:

# Check if installed
agent-browser --version

# If not installed, install globally
npm install -g agent-browser
agent-browser install  # Download Chromium

Windows Note: If you encounter /bin/sh errors on Windows, use:

npx agent-browser <command>

See troubleshooting.md for platform-specific issues.
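The check above can be folded into a small helper that decides how to invoke the CLI. This is a minimal sketch, not part of agent-browser itself: the function name `ab_cmd` is hypothetical, and it assumes that falling back to `npx agent-browser` is acceptable whenever the global command is not on `PATH`.

```shell
# ab_cmd (hypothetical helper): print the command prefix to use for agent-browser.
# Prints "agent-browser" when the global command is on PATH,
# otherwise the "npx agent-browser" form mentioned in the Windows note.
ab_cmd() {
  if command -v agent-browser >/dev/null 2>&1; then
    printf '%s\n' "agent-browser"
  else
    printf '%s\n' "npx agent-browser"
  fi
}
```

A script could then run `$(ab_cmd) --version`, leaving the substitution unquoted so that the npx form splits into two words.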

Core Workflow

agent-browser uses a refs-based system where page elements get unique identifiers (like @e1, @e2) that you can use for interactions.

Basic Pattern

1. Open a page
2. Get snapshot with refs
3. Interact using refs
4. Repeat as needed

# 1. Navigate to page
agent-browser open example.com

# 2. Get page snapshot with interactive elements
agent-browser snapshot -i --json

# 3. Use refs from snapshot to interact
agent-browser click @e5
agent-browser fill @e3 "search query"

# 4. Take screenshot or get results
agent-browser screenshot result.png
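The open, snapshot, and screenshot steps can be wrapped in one function so that a failure stops the sequence early. This is a sketch under assumptions: the `capture_page` name, the `AB` override variable, and the output file names are all hypothetical, and only commands shown above are used.

```shell
# Command used to invoke the CLI; override with AB="npx agent-browser" if needed.
AB="${AB:-agent-browser}"

# capture_page URL OUT_DIR (hypothetical helper)
# Opens the page, saves the interactive snapshot as JSON, then takes a screenshot.
# Returns non-zero as soon as one step fails.
capture_page() {
  cp_url="$1"
  cp_dir="$2"
  mkdir -p "$cp_dir" || return 1
  # $AB is left unquoted on purpose so "npx agent-browser" splits into two words.
  $AB open "$cp_url" || return 1
  $AB snapshot -i --json > "$cp_dir/snapshot.json" || return 1
  $AB screenshot "$cp_dir/page.png"
}
```

For example, `capture_page example.com ./out` would leave `snapshot.json` and `page.png` in `./out`, ready for the interaction step.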

Essential Commands

Navigation

agent-browser open <url>           # Open URL
agent-browser goto <url>           # Navigate to URL
agent-browser back                 # Go back
agent-browser forward              # Go forward
agent-browser reload               # Reload page

Getting Page Information

agent-browser snapshot             # Get accessibility tree
agent-browser snapshot -i          # Interactive elements only
agent-browser snapshot -i --json   # JSON format (best for AI)
agent-browser screenshot <file>    # Take screenshot
agent-browser get text @e1         # Get element text
agent-browser get html             # Get page HTML
agent-browser get url              # Get current URL

Interacting with Elements

agent-browser click @e2            # Click element by ref
agent-browser dblclick @e2         # Double click
agent-browser fill @e3 "text"      # Fill input field
agent-browser type @e3 "text"      # Type text (slower, more realistic)
agent-browser press Enter          # Press keyboard key
agent-browser check @e4            # Check checkbox
agent-browser select @e5 "option"  # Select dropdown option
agent-browser upload @e6 file.pdf  # Upload file

Semantic Locators (Find Commands)

When you don't have refs, use semantic locators:

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." type "query"

Waiting

agent-browser wait @e1             # Wait for element
agent-browser wait --text "Done"   # Wait for text
agent-browser wait --url /success  # Wait for URL change
agent-browser wait --load          # Wait for page load

Session Management

Use sessions to run multiple isolated browser instances:

# Start different sessions
agent-browser --session task1 open site-a.com
agent-browser --session task2 open site-b.com

# Each session has separate:
# - Cookies and storage
# - Authentication state
# - Navigation history

# List active sessions
agent-browser session list

# Close specific session
agent-browser --session task1 close
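Because each session is isolated, several pages can be opened at the same time from one script. The sketch below starts one background job per session and waits for all of them. The `open_in_sessions` name, the `name=url` argument format, and the `AB` override variable are hypothetical, and whether concurrent invocations behave well on your platform is an assumption to verify.

```shell
# Command used to invoke the CLI; override with AB="npx agent-browser" if needed.
AB="${AB:-agent-browser}"

# open_in_sessions NAME=URL... (hypothetical helper)
# Opens each URL in its own named session, running the opens in parallel.
open_in_sessions() {
  for os_pair in "$@"; do
    os_name="${os_pair%%=*}"   # text before the first "="
    os_url="${os_pair#*=}"     # text after the first "="
    $AB --session "$os_name" open "$os_url" &
  done
  wait   # block until every background open has finished
}
```

Usage could look like `open_in_sessions task1=site-a.com task2=site-b.com`, followed by per-session commands such as `agent-browser --session task1 snapshot -i --json`.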

AI Agent Workflow

For AI-driven automation, follow this pattern:

1. Navigate and snapshot

agent-browser open https://example.com
agent-browser snapshot -i --json > page.json

2. Parse JSON to understand page structure
- Identify interactive elements and their refs
- Understand page layout and available actions

3. Execute actions using refs

agent-browser click @e2
agent-browser fill @e5 "input data"

4. Get new snapshot after page changes

agent-browser snapshot -i --json > updated.json

5. Repeat until task complete

See workflows.md for detailed AI workflow patterns.
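For the parsing step, a quick way to see which refs a saved snapshot contains is to pull them out with standard text tools. This is only a sketch: the JSON layout is not documented here, so the helper assumes that refs appear as quoted keys such as "e1", and the `refs_from_snapshot` name is hypothetical. A real JSON parser is the safer choice once the schema is known.

```shell
# refs_from_snapshot FILE (hypothetical helper)
# Prints each ref found in a saved JSON snapshot as @eN, one per line,
# in order of first appearance.
# Assumption: refs appear in the JSON as quoted keys such as "e1".
refs_from_snapshot() {
  grep -o '"e[0-9][0-9]*"' "$1" | tr -d '"' | awk '!seen[$0]++ { print "@" $0 }'
}
```

Running `refs_from_snapshot page.json` before and after an action gives a rough view of which refs appeared or disappeared when the page changed.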

Advanced Features

Network Interception

# Block requests
agent-browser route --block "*.ads.com/*"

# Mock responses
agent-browser route --mock "/api/data" response.json

State Persistence

# Save authentication state
agent-browser save-state auth.json

# Load state in new session
agent-browser load-state auth.json

Debugging

# Enable console logs
agent-browser --console open example.com

# Highlight elements
agent-browser highlight @e3

# Enable tracing
agent-browser --trace trace.zip open example.com

Best Practices

Use -i --json for snapshots - Reduces noise, easier for AI to parse
Prefer refs over selectors - More reliable than CSS/XPath
Use sessions for parallel tasks - Isolate different workflows
Wait for elements - Use wait commands to handle dynamic content
Take screenshots - Visual confirmation of state
Use semantic locators as fallback - When refs aren't available

Common Patterns

Form Filling

agent-browser open https://form.example.com
agent-browser snapshot -i --json
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "user@example.com"
agent-browser click @e3  # Submit button
agent-browser wait --url /success

Data Extraction

agent-browser open https://data.example.com
agent-browser snapshot -i --json > structure.json
agent-browser get text @e5 > data.txt
agent-browser screenshot evidence.png
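When more than one element needs to be read, the `get text` call can be looped over a list of refs. A minimal sketch, assuming the refs were taken from a fresh snapshot; the `extract_texts` name and the `AB` override variable are hypothetical.

```shell
# Command used to invoke the CLI; override with AB="npx agent-browser" if needed.
AB="${AB:-agent-browser}"

# extract_texts REF... (hypothetical helper)
# Prints one tab-separated line per ref: the ref, then the element's text.
# Stops with a non-zero status at the first ref that cannot be read.
extract_texts() {
  for et_ref in "$@"; do
    et_text="$($AB get text "$et_ref")" || return 1
    printf '%s\t%s\n' "$et_ref" "$et_text"
  done
}
```

For instance, `extract_texts @e5 @e7 > data.tsv` would write two rows that can be loaded into a spreadsheet or parsed by a later step.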

Multi-Step Workflow

# Login
agent-browser open https://app.example.com/login
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3

# Navigate to target
agent-browser wait --url /dashboard
agent-browser goto https://app.example.com/data

# Extract data
agent-browser snapshot -i --json > results.json

Platform-Specific Notes

Windows

Use npx agent-browser if global command fails
PowerShell may require quotes around URLs with special characters
See troubleshooting.md for /bin/sh errors

Linux

Install with dependencies: agent-browser install --with-deps
May need to install Playwright system dependencies manually

macOS

Works out of the box after npm install -g agent-browser

Reference Documentation

commands.md - Complete command reference
workflows.md - AI workflow patterns and examples
troubleshooting.md - Common issues and solutions

Architecture

agent-browser is built on Playwright with:

Fast Rust CLI implementation (with Node.js fallback)
Accessibility tree parsing for AI-friendly page representation
Reference system (@e1, @e2) for stable element targeting
Chrome DevTools Protocol (CDP) for persistent sessions

When to Use agent-browser

✅ Use agent-browser when:

Automating web browsing tasks
Scraping data from websites
Filling and submitting forms
Testing web applications
Interacting with dynamic web pages
Needing AI-friendly element targeting

❌ Don't use agent-browser when:

Simple HTTP requests suffice (use curl/fetch instead)
API endpoints are available (use API directly)
Task doesn't require browser rendering

Headless browser automation for AI agents using agent-browser CLI. Use when Claude needs to automate web browsing, scrape web data, interact with web pages, fill forms, take screenshots, or perform any browser-based tasks. Supports reference-based element targeting, session management, and semantic locators.

Stars: 131
Category: tools
Updated: Apr 17, 2026

npx skillsauth add hhu3637kr/skills agent-browser

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. The remaining six were skipped; review each row below.

Trivy - Container and dependency vulnerability scanner - Clean (95%)
Semgrep - Static code analysis for vulnerabilities - Clean (95%)
mcp-scan (Snyk) - Model Context Protocol security validation - Clean (95%)
Snyk (dep) - Open source security scanning - Skipped (50%)
Socket.dev - Supply chain security analysis - Skipped (50%)
VirusTotal - Multi-engine malware detection - Skipped (50%)
CrowdStrike - Advanced threat intelligence - Skipped (50%)
OSV-Scanner - Open Source Vulnerability database check - Skipped (50%)
OWASP Dep-Check - Skipped (50%)

Last scanned: Apr 17, 2026, 11:28 AM (8.2s, 5 files scanned)


Related Skills

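hhu3637kr/zhang-yiming-perspective

tools

Verified / Trusted / Community

The thinking framework and style of expression of Zhang Yiming (founder of ByteDance/TikTok). Based on research across six dimensions (writings, in-depth interviews, expression DNA, others' perspectives, decision records, timeline), covering 32 interview excerpts and 12 major decision cases, it distills 5 core mental models, 7 decision heuristics, and a complete expression DNA. Purpose: to act as a thinking advisor that analyzes product, organization, globalization, talent, and personal-growth questions from Zhang Yiming's perspective. Use when the user mentions "from Zhang Yiming's perspective", "how would Zhang Yiming see this", "Yiming's line of thinking", or "zhang yiming perspective". Also trigger when the user merely says "help me think about this from Zhang Yiming's angle", "what would ByteDance do", or "switch to Zhang Yiming", or says "how does ByteDance see it", "Toutiao's logic", "how would Yiming choose", or "Yiming".

131 stars, SKILL.md, updated Apr 17, 2026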

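hhu3637kr/x-mastery-mentor

documentation

Verified / Trusted / Community

A $10K/hr-level X/Twitter growth mentor. Built on the methodologies of six top creators (Nicolas Cole, Dickie Bush, Sahil Bloom, Justin Welsh, Dan Koe, Alex Hormozi), plus in-depth analysis of X's open-source algorithm and strategies specialized for the AI/tech niche, it distills 6 core mental models, 10 decision heuristics, and a complete playbook for topic selection, writing, and growth. General methodology is the foundation; the AI/tech niche is the specialty. Use when the user mentions "X operations", "Twitter", "how to write tweets", "how to gain followers", "X strategy", "tweet topic selection", "tweet", "thread", or "X algorithm". Also trigger when the user merely says "how should I write this tweet", "help me come up with X content", "Twitter growth", "post a tweet", "write a tweet", "X account", or "grow on X".

131 stars, SKILL.md, updated Apr 17, 2026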

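hhu3637kr/trump-perspective

tools

Verified / Trusted / Community

Donald Trump's thinking framework and behavioral logic. Based on in-depth research across six dimensions (writings, long-form interviews, debates, psychological analyses, memoirs of former aides, and records of major decisions; 320KB+ of source material), it distills 6 core mental models, 8 decision heuristics, and a complete expression DNA. Purposes: (1) thinking advisor: analyze negotiation, power, and communication questions from Trump's perspective; (2) behavior prediction: interpret the logic behind his public actions and anticipate his next move; (3) role play: simulate Trump's decisions and expression in specific scenarios. Trigger when the user mentions "use the Dong Wang perspective" (懂王, "Know-It-All King", is a Chinese nickname for Trump), "how would Trump see this", "Dong Wang logic", "trump perspective", "what would Dong Wang do", "analyze from Trump's angle", or "predict Trump".

131 stars, SKILL.md, updated Apr 17, 2026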

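hhu3637kr/taleb-perspective

tools

Verified / Trusted / Community

Nassim Nicholas Taleb's thinking framework and style of expression. Based on in-depth research across 40+ sources, it distills 6 core mental models, 9 decision heuristics, and a complete expression DNA. Purpose: to act as a thinking advisor that analyzes problems, scrutinizes decisions, and questions mainstream narratives from Taleb's perspective. Use when the user mentions "from Taleb's perspective", "how would Taleb see this", "Taleb mode", "antifragile perspective", or "taleb perspective". May also trigger when the user merely says "could this be a black swan", "is there tail risk here", "skin in the game", "is there an antifragile approach", or "how do I use the barbell strategy". Do not trigger when the user is only doing a general risk assessment or asking "is this reliable"; activate only when Taleb's core concepts are involved, such as extreme risk, antifragility, or the precautionary principle.

131 stars, SKILL.md, updated Apr 17, 2026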

Download

For Claude Desktop: download once, then upload the file in the app. No terminal needed.

Install manually

# Clone the repo
git clone https://github.com/hhu3637kr/skills.git

# Copy into Claude Code skills folder (global)
cp -r skills/agent-browser ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository: hhu3637kr/skills (131 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT