skills/browser/SKILL.md
Interactive browser automation via Chrome DevTools Protocol. Use when you need to interact with web pages, test frontends, or when user interaction with a visible browser is required.
npx skillsauth add hayeah/dotfiles browserInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Chrome DevTools Protocol tools for agent-assisted web automation. Connects to Chrome running on :9222 with remote debugging enabled.
Run once before first use:
cd {baseDir}
pnpm install
pnpm link --global
After linking, the browser command is available globally.
browser start
browser start --profile
Launch Chrome with remote debugging on :9222. Use --profile to preserve user's authentication state (cookies, logins).
A context spec describes what browser window to open. It can be:
device = "iPhone 15 Pro"
url = "https://example.com"
'{"device":"iPhone 15 Pro","url":"https://example.com"}'
https://example.com
Spec fields: url, device, width, height, dpr, mobile, ua, viewport (WxH[@DPR]).
The device field resolves against Puppeteer's KnownDevices (case-insensitive prefix match). Explicit fields override device defaults.
--open <spec> (-O)Opens a window, runs the command, closes the window. No session management needed. Use this for quick, single-command tasks.
browser screenshot --open https://example.com
browser screenshot --open '{"device":"iPhone 15 Pro","url":"https://example.com"}'
browser screenshot --open mobile.toml
browser content --open https://example.com
browser eval --open https://example.com 'document.title'
browser network --open https://api.example.com/dashboard --type xhr
browser a11y --open https://example.com
browser cookies --open https://example.com
browser fetch --open https://example.com https://example.com/api/data
Network capture in one-shot mode automatically captures from page load — no --reload needed.
One-shot navigation waits for domcontentloaded (not network idle), then sleeps 3s by default to let JS-rendered content paint. This avoids hangs on pages with long-lived streams (SSE, websockets) where network idle never fires. Pass --wait <expr|ms> to override the default settle with a precise condition.
For multi-step workflows (debugging, exploration, discovery), use persistent sessions. The browser open command holds the CDP connection alive in the foreground, preserving device emulation across commands.
IMPORTANT: Run browser open in the background. The process lifetime IS the session lifetime — when it exits, the window auto-closes.
browser open https://example.com
browser open '{"device":"iPhone 15 Pro","url":"https://example.com"}'
browser open mobile.toml
Prints a session key (e.g. a3f2) to stdout. Use this key with -s to target the session.
browser eval -s a3f2 'document.title'
browser screenshot -s a3f2
browser nav -s a3f2 https://example.com/other
browser reload -s a3f2
browser network -s a3f2 --type xhr
browser a11y -s a3f2
browser cookies -s a3f2
browser fetch -s a3f2 https://example.com/api/data
browser pick -s a3f2 "Select the login button"
browser close -s a3f2
Kills the background process and closes the window.
browser list
Output:
0 [a3f2] 8CFF852B https://example.com/ Example Domain
1 7EFBDCDA https://google.com Google
*2 789D1532 https://other.com Other
Sessions with keys show [key] labels. The * marks the default (latest) session.
All commands also accept -s with:
-s 0, -s 2-s 789D, -s 89C7browser nav https://example.com
browser nav https://example.com -s a3f2
Navigate the current (or specified) session to a URL in-place.
browser eval 'document.title'
browser eval 'document.querySelectorAll("a").length' -s a3f2
browser eval script.js
Execute JavaScript in a session. Pass inline code or a .js/.mjs/.ts file path. Code runs inside an async function, so await is available at the top level.
Wrapping rule:
;, doesn't start with const/let/var/if/for/while/function/…) → wrapped as return (<code>). Trailing ; is stripped. One-liners like document.title and IIFEs like (() => { return 1 })() keep working.; outside of strings/comments, or starts with a statement keyword) → runs as-is. You write your own return.File inputs (.js/.mjs/.ts) follow the same rule on their contents. Comments and string/template-literal contents are skipped when scanning for ;, so const s = 'a;b;c'; return s is correctly detected as multi-statement (not tripped by the ; inside the string).
IMPORTANT: For scripts longer than 5–10 lines, write to a file using the tmpfile convention and pass the path:
# 1. generate the path
tmpfile scrape.js
# => $MDNOTES_ROOT/2026-03-29/tmp/143052_283-scrape.js
# 2. write your script to that path (use the Write tool)
# 3. eval it
browser eval $MDNOTES_ROOT/2026-03-29/tmp/143052_283-scrape.js -s a3f2
browser fetch https://api.example.com/data
browser fetch https://api.example.com/data -o response.json
browser fetch https://api.example.com/submit -X POST -d '{"query":"test"}' -H 'Content-Type: application/json'
Fetch a URL using the session's browser context (cookies, auth, origin). Runs fetch() inside the page, so requests inherit the session's credentials and CORS context.
Options:
-X <method>: HTTP method (default: GET)-H <header>: Request header, repeatable-d <body>: Request body-o <file>: Write response body to file instead of stdoutbrowser screenshot
browser screenshot --open '{"device":"iPhone 15 Pro","url":"https://myapp.com"}'
browser screenshot -o desktop.png
browser screenshot --full
Capture current viewport and return file path.
Options:
-o, --output <path>: Save to specific path instead of temp file--full: Capture full scrollable page-w, --wait <expr>: JS expression to poll until truthy, or a plain number for sleep in ms (e.g. --wait 5000)--timeout <ms>: Max wait time for --wait (default: 10000)-m, --max-size <px>: Constrain longest side and output as JPEG--quality <1-100>: JPEG quality (with --max-size, default: 85)DPR matters: Puppeteer defaults to DPR=1. Subtle effects (low-opacity canvas text, thin fonts) may be invisible at 1x. Use --open with a viewport spec to set DPR=2 for retina-accurate screenshots:
browser screenshot --open '{"url":"http://localhost:5173","viewport":"1280x800@2"}'
--full caveat: Full-page capture scrolls the document and stitches the result. Fixed/sticky sidebars and headers may not render correctly — they can appear cut off or missing. Use viewport capture (without --full) when the layout has fixed positioning.
Legacy device flags (--device, --viewport, --mobile) still work on screenshot for backward compatibility, but prefer --open with a context spec.
--steps)Capture multiple screenshots from a single page load. Each step can eval JS and/or wait for a condition before capturing. Steps run sequentially on the same page.
browser screenshot --open http://localhost:5173/dashboard \
-o "$(tmpfile debug.png)" \
--steps '
- wait: __agent?.store.rows.length > 0
- eval: __agent.openDetail(2)
wait: __agent.store.modalOpen && __agent.$modal
- eval: __agent.store.searchQuery = "alice"
wait: __agent?.store.rows.length > 0
'
Each step is a YAML map with optional eval and wait keys. Execution order per step: wait → eval → screenshot.
With multiple steps, the -o path gets an index injected before the extension:
-o debug.png → debug.1.png, debug.2.png, debug.3.png-o "$(tmpfile debug.png)" → $MDNOTES_ROOT/2026-03-25/tmp/143052_283-debug.1.png, etc.With a single step, -o works exactly as before (no index).
Record video from a browser page. Takes rapid screenshots and pipes them to ffmpeg for video output.
browser screencap -d 5 --fps 15 -o recording.mp4
browser screencap -S '#my-widget' -d 10 -o widget.mp4
browser screencap -t '__epub.anim.start()' -S '#canvas' -d 5 -o anim.mp4
browser screencap --screenshots -d 5 --fps 10 -o fallback.mp4
Default mode uses CDP screencast (high fps, efficient). Falls back to --screenshots mode if screencast returns 0 frames.
Options:
-d, --duration <sec>: Recording duration in seconds (default: 5)--fps <N>: Target frame rate (default: 15)-S, --selector <css>: CSS selector to crop to a specific element-t, --trigger <expr>: JS expression to evaluate right before recording (e.g. start an animation)-o, --output <path>: Output video file path (default: ./recording.mp4)--quality <1-100>: JPEG quality for frames (default: 80)--screenshots: Use screenshot loop instead of CDP screencast (slower, ~15fps ceiling, but works everywhere)-w, --wait <expr>: JS expression to poll until truthy before recording--timeout <ms>: Max wait time for --wait (default: 10000)Requires ffmpeg. Uses h264_videotoolbox (macOS hardware encoder) when available, falls back to libx264.
Capture CPU profile and optionally a heap snapshot from a page. Opens a fresh tab for clean profiling baseline.
browser profile --open https://myapp.com -d 5 -o ./profile
browser profile -s 3 -t '__app.start()' -d 10 --mem -o ./profile
Outputs <path>.cpuprofile (loadable in Chrome DevTools Performance tab) and optionally <path>.heapsnapshot (Memory tab).
Options:
-d, --duration <sec>: Profiling duration in seconds (default: 5)--mem: Also capture a heap snapshot after CPU profiling-t, --trigger <expr>: JS expression to evaluate before profiling starts-o, --output <path>: Output base path without extension (default: ./profile)-w, --wait <expr>: JS expression to poll until truthy before profiling--timeout <ms>: Max wait time for --wait (default: 10000)browser pick "Click the submit button"
IMPORTANT: Use this when the user wants to select specific DOM elements on the page. Launches an interactive picker — the user clicks elements to select them (Cmd/Ctrl+Click for multiple), then presses Enter to confirm. Returns element info including tag, id, class, text, and parent hierarchy.
browser cookies
browser cookies --open https://example.com
Display all cookies for a session including domain, path, httpOnly, and secure flags.
browser network --open https://api.example.com/dashboard --type xhr
browser network --reload --type xhr --filter 'api !analytics'
browser network --reload --type xhr --dump ./responses
browser network -d 30 -s a3f2
Capture network requests via CDP Network domain. Listens for the specified duration (default 10s), then prints a summary. Ctrl+C stops early and still prints results.
In one-shot mode (--open), captures from page load automatically — no --reload needed.
Options:
--reload / -r: Reload the page after starting capture--duration <seconds> / -d: How long to listen (default: 10)--filter <string> / -f: fzf-style filter on URLs. Space-separated AND, prefix ! to negate--type <type> / -t: Filter by resource type: xhr, doc, css, js, img, font, all--dump <dir> / -o: Save response bodies to a directorybrowser a11y
browser a11y --open https://example.com --depth 3
browser a11y -s a3f2
Dump the accessibility tree of a session. Returns a compact indented tree with roles, names, values, and key properties. Prefer this over DOM inspection when you need to understand page structure.
browser content --open https://example.com
browser content --open https://chatgpt.com/share/<share-id>
browser content --open https://chatgpt.com/s/<post-id>
browser content https://example.com
Extract readable content as markdown. Uses Mozilla Readability for article extraction. Works on JavaScript-rendered pages. URL is optional with --open (extracts from the already-loaded page).
Auto-detects site-specific extractors:
chatgpt.com/share/...: exports the full conversation as markdown from hydrated share datachatgpt.com/s/...: exports ChatGPT post conversations as markdownFor multi-statement code, just paste it directly — you own the return:
const data = document.querySelector('#target').textContent;
const buttons = document.querySelectorAll('button');
buttons[0].click();
return JSON.stringify({ data, buttonCount: buttons.length });
IIFEs still work if you prefer that style:
(() => {
const data = document.querySelector('#target').textContent;
return data;
})()
Don't make separate calls for each click. Do batch them:
const actions = ["btn1", "btn2", "btn3"];
actions.forEach(id => document.getElementById(id).click());
return "Done";
Always start by understanding the page structure. Single-expression form works for simple probes:
({
title: document.title,
forms: document.forms.length,
buttons: document.querySelectorAll('button').length,
inputs: document.querySelectorAll('input').length,
mainContent: document.body.innerHTML.slice(0, 3000)
})
Then target specific elements based on what you find.
tools
Web UI development — Vite+ toolchain setup and browser-based E2E testing workflow.
tools
Tooling and style guide for TypeScript projects.
development
Capture tmux pane content and export as text, HTML, SVG, PNG, or JPG. Use when you need a screenshot or text dump of a tmux pane for sharing, feeding to AI, or archiving terminal state.
testing
Copy-edit text. Fix grammar and/or tidy text into a concise listicle.