skills/browser/SKILL.md
Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.
npx skillsauth add infquest/vibe-ops-plugin browser
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Browser automation that maintains page state across command executions. Write small, focused commands to accomplish tasks incrementally.
- `snapshot` to discover elements, then `select-ref` to interact
- `screenshot` to see current page state
# Check browser server running (Max must be open)
(curl -s http://localhost:9222/ || echo "SERVER_NOT_RUNNING") | head -1
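The same check can also be made from Python, which may be convenient inside a longer script. This is a minimal sketch that assumes the server answers plain HTTP at the URL used above; the function name `server_running` and the timeout value are illustrative, not part of the skill.

```python
import urllib.error
import urllib.request


def server_running(url: str = "http://localhost:9222/", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is still running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: treat as not running.
        return False


if __name__ == "__main__":
    print("SERVER_RUNNING" if server_running() else "SERVER_NOT_RUNNING")
```

Any HTTP reply, including an error status, counts as "running" here, since the point is only to detect whether the server process is listening.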
All commands use client.py from the skill directory:
uv run skills/browser/client.py <command> [arguments]
⚠️ IMPORTANT: Always use `uv run client.py`, NOT `uv run python client.py`. The `uv run` command automatically handles Python and dependencies from `pyproject.toml`. Adding `python` breaks dependency resolution.
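When driving the client from a script, the invocation convention above can be wrapped in a small helper. The sketch below is an assumption about how one might do this, not part of the skill itself: `build_command`, `run_command`, and the `CLIENT` constant are hypothetical names, and only the `uv run skills/browser/client.py <command> [arguments]` shape comes from this document.

```python
import subprocess
from typing import List

# Path to the client as written in this document (relative to the repo root).
CLIENT = "skills/browser/client.py"


def build_command(command: str, *args: str) -> List[str]:
    """Build the argv for one client.py call, with no `python` after `uv run`."""
    return ["uv", "run", CLIENT, command, *args]


def run_command(command: str, *args: str) -> str:
    """Run one client.py command and return its stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        build_command(command, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_command("goto", "main", "https://example.com")` would execute `uv run skills/browser/client.py goto main https://example.com`.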
Follow this pattern for complex tasks:
Code passed to page.evaluate() runs in the browser, which doesn't understand TypeScript:
// ✅ Correct: plain JavaScript
const text = await page.evaluate(() => {
return document.body.innerText;
});
// ❌ Wrong: TypeScript syntax will fail at runtime
const text = await page.evaluate(() => {
const el: HTMLElement = document.body; // Type annotation breaks in browser!
return el.innerText;
});
uv run skills/browser/client.py wait-load main # After navigation
uv run skills/browser/client.py wait-selector main ".results" # For specific elements
uv run skills/browser/client.py wait-url main "**/success" # For specific URL
For large datasets, intercept and replay API requests rather than scrolling DOM. See refs/scraping.md for the complete guide covering request capture, schema discovery, and paginated API replay.
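To make the replay idea concrete, here is a rough sketch of the pagination loop only. It is not taken from refs/scraping.md: `fetch_page` is a hypothetical callable that re-issues the captured request for a given cursor, and the `items` and `next_cursor` keys are assumed field names that a real API will almost certainly spell differently.

```python
from typing import Any, Callable, Dict, Iterator, Optional

JsonBody = Dict[str, Any]


def replay_paginated(
    fetch_page: Callable[[Optional[str]], JsonBody],
    max_pages: int = 100,
) -> Iterator[Any]:
    """Yield items from a cursor-paginated API by replaying a captured request.

    `fetch_page(cursor)` (hypothetical) replays the request with the given
    cursor and returns the decoded JSON body. "items" and "next_cursor" are
    assumed key names, not a real schema.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a repeating cursor cannot loop forever
        body = fetch_page(cursor)
        yield from body.get("items", [])
        cursor = body.get("next_cursor")
        if not cursor:
            break
```

Passing the fetch step in as a callable keeps the loop independent of how the request was captured, so it can be exercised with canned responses before pointing it at a live endpoint.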
uv run skills/browser/client.py screenshot main screenshot.png
uv run skills/browser/client.py screenshot main full.png --full-page # Capture entire scrollable page
Use snapshot to discover page elements. Returns YAML-formatted accessibility tree:
- banner:
  - link "Hacker News" [ref=e1]
  - navigation:
    - link "new" [ref=e2]
- main:
  - heading "Products" [ref=e3] [level=1]
  - list:
    - listitem:
      - link "Article Title" [ref=e4]
      - button "Add to Cart" [ref=e5]
    - listitem:
      - link "Another Article" [ref=e6]
      - button "Add to Cart" [ref=e7] [nth=1]
- contentinfo:
  - textbox [ref=e8]
    - /placeholder: "Search"
Interpreting refs:
- `[ref=eN]` - Element reference for interaction
- `[nth=N]` - Nth duplicate element with same role+name (0-indexed, first one omitted)
- `[checked]`, `[disabled]`, `[expanded]` - Element states
- `[level=N]` - Heading level
- `/url:`, `/placeholder:` - Element properties
Interacting with refs:
# Get snapshot to find refs
uv run skills/browser/client.py snapshot main
# Only show interactive elements (buttons, links, inputs, etc.)
uv run skills/browser/client.py snapshot main -i
# Use ref to interact
uv run skills/browser/client.py select-ref main e2 click
uv run skills/browser/client.py select-ref main e7 click # Click second "Add to Cart"
uv run skills/browser/client.py select-ref main e8 fill "search term"
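If a script needs to pick refs out of a snapshot rather than reading them by eye, the annotations described above can be extracted with a couple of regular expressions. This is a sketch that assumes the line format shown in the example snapshot; `RefEntry` and `parse_refs` are illustrative names, and names containing escaped quotes are not handled.

```python
import re
from typing import List, NamedTuple, Optional


class RefEntry(NamedTuple):
    ref: str             # e.g. "e7"
    role: str            # e.g. "button"
    name: Optional[str]  # accessible name, or None when absent
    nth: int             # 0 when the [nth=N] annotation is omitted


# Matches lines such as:  - button "Add to Cart" [ref=e7] [nth=1]
LINE_RE = re.compile(
    r'^\s*-\s+(?P<role>[\w-]+)'        # role
    r'(?:\s+"(?P<name>[^"]*)")?'       # optional quoted name
    r'(?P<attrs>(?:\s+\[[^\]]*\])*)'   # zero or more [key=value] annotations
    r'\s*:?\s*$'                       # optional trailing colon
)
REF_RE = re.compile(r"\[ref=(e\d+)\]")
NTH_RE = re.compile(r"\[nth=(\d+)\]")


def parse_refs(snapshot: str) -> List[RefEntry]:
    """Return one RefEntry per snapshot line that carries a [ref=eN] annotation."""
    entries: List[RefEntry] = []
    for line in snapshot.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # property lines such as "/placeholder:" are skipped
        attrs = match.group("attrs")
        ref = REF_RE.search(attrs)
        if not ref:
            continue  # structural nodes without a ref (banner, list, ...)
        nth = NTH_RE.search(attrs)
        entries.append(
            RefEntry(
                ref=ref.group(1),
                role=match.group("role"),
                name=match.group("name"),
                nth=int(nth.group(1)) if nth else 0,
            )
        )
    return entries
```

Filtering the result by `role`, `name`, and `nth` then gives the ref to pass to `select-ref`, for instance the entry with `nth == 1` for the second "Add to Cart" button.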
Page state persists after failures. Debug with:
# Take screenshot to see current state
uv run skills/browser/client.py screenshot main debug.png
# Get page info
uv run skills/browser/client.py info main
# Get text content
uv run skills/browser/client.py text main "body"
uv run skills/browser/client.py list # List all pages
uv run skills/browser/client.py create main # Create a new page
uv run skills/browser/client.py create main "https://..." # Create and navigate
uv run skills/browser/client.py goto main "https://..." # Navigate existing page
uv run skills/browser/client.py close main # Close a page
uv run skills/browser/client.py info main # Get page URL and title
uv run skills/browser/client.py click main "button.submit" # Click element
uv run skills/browser/client.py fill main "input#email" "user@example.com" # Fill input
uv run skills/browser/client.py hover main ".dropdown" # Hover over element
uv run skills/browser/client.py keyboard main "Enter" # Press key
uv run skills/browser/client.py text main "h1" # Get element text
uv run skills/browser/client.py evaluate main "document.title"
uv run skills/browser/client.py evaluate main "document.querySelectorAll('.item').length"
For complex tasks requiring loops or page.on() event handlers, use heredoc with BrowserClient:
cd skills/browser && uv run python <<'EOF'
from client import BrowserClient
client = BrowserClient()
page = client.get_playwright_page("main")
# Full Playwright API available
page.goto("https://example.com")
page.click("button")
# Event handlers for request interception
page.on("response", lambda r: print(r.url))
EOF
The page object is a standard Playwright Page.
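Because it is a standard Playwright `Page`, a `page.on("response", ...)` handler can do more than print. The sketch below collects the URLs of JSON responses into a list; `collect_json_urls` is an illustrative name, and the content-type filter is only one assumption about which responses are interesting. It relies on the `url` and `headers` attributes of Playwright's Python `Response` object.

```python
from typing import Any, List


def collect_json_urls(page: Any) -> List[str]:
    """Register a "response" handler on `page` and return the list it fills.

    `page` is expected to be a Playwright Page, such as the one returned by
    client.get_playwright_page("main") in the heredoc above.
    """
    seen: List[str] = []

    def on_response(response: Any) -> None:
        # Playwright exposes response headers with lower-cased names.
        content_type = response.headers.get("content-type", "")
        if "application/json" in content_type:
            seen.append(response.url)

    page.on("response", on_response)
    return seen
```

Register the handler before navigating, for example `urls = collect_json_urls(page)` followed by `page.goto(...)`, so that responses triggered by the page load are captured.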