skills/chrome-control/SKILL.md
Control Chrome browser via CLI for browser automation tasks. Use when you need to interact with web pages, take screenshots, click elements, or automate browser workflows. ALWAYS use this skill when controlling Chrome - NEVER use osascript, cliclick, or AppleScript.
Control the Chrome browser from the command line via a native messaging extension.
Before using chrome-control for web scraping, try the /scraping skill first!
# Try this first (faster, no browser needed):
~/.claude/skills/webfetch/scripts/webfetch "https://example.com"
The scraping skill uses scrapling which is:
Only use chrome-control when:
CRITICAL: ALWAYS use this CLI tool for Chrome automation. NEVER use osascript, cliclick, AppleScript, or any other method to interact with Chrome. This extension provides reliable, precise control.
ALWAYS close tabs you created when you're done with a task. Leaving tabs open causes memory leaks that accumulate over time and degrade system performance.
Rules:
- Note the tab ID printed by chrome open
- Run chrome close <tab_id> after completing your task
Example workflow:
# Open tab and note the ID
chrome open "https://example.com"
# Output: Opened tab 123456
# ... do your work ...
# ALWAYS clean up when done
chrome close 123456
Why this matters: Chrome tabs consume significant memory. An assistant that opens tabs without closing them will cause the system to slow down and eventually crash. This is non-negotiable.
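The cleanup rule above can be enforced in code rather than remembered. The following is a minimal sketch, assuming the `chrome` CLI is on `PATH` and that `chrome open` prints a line of the form `Opened tab 123456` as in the example workflow. The names `parse_tab_id`, `run_chrome`, and `with_tab` are hypothetical helpers, not part of the skill.

```python
import re
import subprocess

CHROME = "chrome"  # assumption: the CLI is reachable under this name


def parse_tab_id(output: str) -> int:
    """Extract the tab ID from output such as 'Opened tab 123456'."""
    match = re.search(r"Opened tab (\d+)", output)
    if match is None:
        raise ValueError(f"no tab ID in output: {output!r}")
    return int(match.group(1))


def run_chrome(*args: str) -> str:
    """Run one chrome CLI command and return its stdout."""
    done = subprocess.run([CHROME, *args], capture_output=True, text=True, check=True)
    return done.stdout


def with_tab(url, work, run=run_chrome):
    """Open url, call work(tab_id), and close the tab even if work raises."""
    tab_id = parse_tab_id(run("open", url))
    try:
        return work(tab_id)
    finally:
        run("close", str(tab_id))
```

Passing `run` as a parameter keeps the helper testable without a browser; in normal use, `with_tab("https://example.com", lambda tab: run_chrome("text", str(tab)))` would open, read, and close in one call.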
Requires Chrome with the Chrome Control extension loaded and native messaging host running.
The extension supports multiple Chrome profiles simultaneously. Each profile is identified by a unique UUID stored in chrome.storage.local.
chrome profiles
# Output:
# 0: assistant (see config.local.yaml chrome.profiles.0)
# 1: owner (see config.local.yaml chrome.profiles.1)
Use the -p or --profile flag with a profile index or name:
chrome -p 0 tabs # Profile by index
chrome -p assistant tabs # Profile by name
chrome -p 1 screenshot 123 # Different profile
If no profile is specified, the first available profile (index 0) is used.
Downloads are automatically saved to profile-specific directories:
| Profile | Download Location |
|---------|-------------------|
| assistant (profile 0) | ~/Downloads/assistant/ |
| owner-profile (profile 1) | ~/Downloads/owner-profile/ |
CRITICAL: Different profiles have different permission levels.
This is YOUR account (Claude's). You have full autonomy to:
EXCEPTION: Payments require explicit permission from the admin via text before proceeding.
This is the owner's personal account. Requires explicit consent for ANY changes:
Rules:
~/.claude/skills/chrome-control/scripts/chrome
# List all open tabs (shows tab_id, title, url)
chrome tabs
# Open new tab with URL
chrome open <url>
chrome open chess.com
# Close a tab
chrome close <tab_id>
# Focus/activate a tab
chrome focus <tab_id>
# Navigate tab to URL (or 'back'/'forward')
chrome navigate <tab_id> <url>
chrome nav <tab_id> back
# Read interactive elements (buttons, links, inputs)
# Returns: ref_id, role, tag, label
chrome read <tab_id>
chrome read <tab_id> all # all elements
chrome read <tab_id> forms # form elements only
chrome read <tab_id> links # links only
# Get page text content
chrome text <tab_id>
# Get page HTML
chrome html <tab_id>
# Find elements containing text
chrome find <tab_id> <query>
chrome find 123456 "Sign In"
# Click element by ref (e.g., ref_1, ref_23)
chrome click <tab_id> <ref>
chrome click 123456 ref_5
# Click at screen coordinates
chrome click-at <tab_id> <x> <y>
# Type text into element
chrome type <tab_id> <ref> <text>
chrome type 123456 ref_3 "hello world"
# Set form input value
chrome input <tab_id> <ref> <value>
# Send key press
chrome key <tab_id> <key> [modifiers]
chrome key 123456 Enter
chrome key 123456 a ctrl # Ctrl+A
chrome key 123456 c ctrl,meta # Cmd+Ctrl+C
# Keys: Enter, Tab, Escape, Backspace, Delete, ArrowUp, ArrowDown, ArrowLeft, ArrowRight
# Modifiers: ctrl, alt, shift, meta (comma-separated)
# Scroll the page
chrome scroll <tab_id> <direction> [amount]
chrome scroll 123456 down
chrome scroll 123456 up 5
# Hover at coordinates
chrome hover <tab_id> <x> <y>
# Take screenshot (saves to ~/Pictures/chrome-screenshots/)
chrome screenshot <tab_id>
chrome shot <tab_id>
# Returns: ~/Pictures/chrome-screenshots/screenshot_20260124_123456.jpg
# Execute JavaScript in page
chrome js <tab_id> <code>
chrome js 123456 "document.title"
chrome js 123456 "document.querySelector('button').click()"
# Read console messages
chrome console <tab_id>
chrome console <tab_id> error # filter by pattern
chrome console <tab_id> --clear # clear after reading
# Read network requests
chrome network <tab_id>
chrome network <tab_id> api.example.com # filter by URL pattern
# Test connection to extension
chrome ping
# 1. List tabs to get tab_id
chrome tabs
# Output: 123456 Google - www.google.com
# 2. Read interactive elements
chrome read 123456
# Output: ref_1 textbox input Search
# ref_2 button button Google Search
# 3. Type into search box
chrome type 123456 ref_1 "chess strategy"
# 4. Click search button
chrome click 123456 ref_2
# 5. Take screenshot of results
chrome screenshot 123456
# Output: ~/Pictures/chrome-screenshots/screenshot_20260124_123456.jpg
# 6. ALWAYS clean up tabs you created
chrome close 123456
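When scripting the workflow above, the output of `chrome read` has to be turned into refs. The sketch below assumes the output is one element per line with whitespace-separated columns in the order ref_id, role, tag, label (as the sample output suggests), and that the label is everything after the third column. `Element`, `parse_read_output`, and `find_ref` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Element(NamedTuple):
    ref: str
    role: str
    tag: str
    label: str


def parse_read_output(output: str) -> list[Element]:
    """Parse lines like 'ref_2 button button Google Search' into Elements."""
    elements = []
    for line in output.splitlines():
        parts = line.split(None, 3)  # label may contain spaces, so split at most 3 times
        if len(parts) < 3 or not parts[0].startswith("ref_"):
            continue  # skip blank lines and anything that is not an element row
        label = parts[3].strip() if len(parts) == 4 else ""
        elements.append(Element(parts[0], parts[1], parts[2], label))
    return elements


def find_ref(elements: list[Element], role: str, label_contains: str) -> Optional[str]:
    """Return the first ref whose role matches and whose label contains the text."""
    needle = label_contains.lower()
    for el in elements:
        if el.role == role and needle in el.label.lower():
            return el.ref
    return None
```

If the real output format differs (for example, tab-separated or with extra columns), only `parse_read_output` would need to change.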
Some sites have strict Content Security Policy (CSP) that blocks normal JS injection:
eval() via Trusted Types)
Good news: The text, html, iframe-click, and insert-text commands all use the Chrome Debugger API with Page.createIsolatedWorld to bypass CSP restrictions automatically.
All these commands work on CSP-protected pages:
# Text/HTML extraction (bypasses CSP automatically)
chrome text <tab_id> # Get page text - works on discord.com, etc.
chrome html <tab_id> # Get page HTML - works on CSP-protected sites
# Click element (works on main frame OR iframes)
chrome iframe-click <tab_id> '<css-selector>'
chrome iframe-click 123456 'input[type="password"]'
chrome iframe-click 123456 'button#sign-in'
chrome iframe-click 123456 'text:Desktop client 1' # Click by text content
# Insert text at current focus
chrome insert-text <tab_id> '<text>'
chrome insert-text 123456 'mypassword123'
When to use iframe-click: If normal chrome click fails with CSP/Trusted Types errors, use iframe-click instead.
# 1. Navigate to App Store Connect (or any Apple sign-in page)
chrome navigate 123456 "https://appstoreconnect.apple.com"
# 2. Click email field and insert email
chrome iframe-click 123456 'input[type="text"]'
chrome insert-text 123456 'your-email@example.com'
# 3. Click Continue button
chrome iframe-click 123456 'button#sign-in'
# 4. Click "Continue with Password" (Apple shows password vs passkey options)
chrome iframe-click 123456 'text:Continue with Password'
# 5. Insert password (field is auto-focused after step 4)
chrome insert-text 123456 'yourpassword'
# 6. Click Sign In button
chrome iframe-click 123456 'button#sign-in'
# 7. Handle 2FA if needed (code sent to trusted devices)
# Click first input box to focus it
chrome iframe-click 123456 'input[type="text"]'
# Insert the 6-digit code
chrome insert-text 123456 '123456'
The iframe-click command supports text:XXX selectors to find buttons by their visible text:
# Click by exact or partial text match (case-insensitive)
chrome iframe-click 123456 'text:Continue with Password'
chrome iframe-click 123456 'text:Sign In'
chrome iframe-click 123456 'text:Resend'
This is useful when buttons don't have stable IDs or CSS classes.
How it works: The iframe-click command uses Page.createIsolatedWorld with grantUniversalAccess:true to execute JS inside cross-origin iframes, bypassing CSP restrictions. It dispatches a full mouse event sequence (mouseenter → mouseover → mousemove → mousedown → mouseup → click) which is required for modern web frameworks that listen for the complete event chain.
Google's OAuth sign-in pages have aggressive bot detection that blocks normal chrome click commands. The iframe-click command bypasses this protection.
This works for any site using Google OAuth: ElevenLabs, Figma, Notion, etc.
# 1. Navigate to site that uses Google OAuth
chrome open "https://elevenlabs.io/app/sign-up"
# Get tab_id from output
# 2. Click "Sign in with Google" (or similar button)
chrome iframe-click <tab_id> 'text:Google'
# 3. On Google sign-in page, email may already be filled
# If not, click email field and insert
chrome iframe-click <tab_id> 'input[type="email"]'
chrome insert-text <tab_id> 'your-email@example.com'
# 4. Click Next button - THIS IS THE KEY STEP
# Regular chrome click FAILS here, iframe-click WORKS
chrome iframe-click <tab_id> 'text:Next'
# 5. Wait for password page to load, then enter password
chrome iframe-click <tab_id> 'input[type="password"]'
chrome insert-text <tab_id> 'yourpassword'
# 6. Click Next to submit password
chrome iframe-click <tab_id> 'text:Next'
# 7. Handle 2FA if required (varies by account settings)
Why this works: Google's bot detection looks for synthetic click events that lack the full mouse event sequence. The iframe-click command dispatches the complete sequence (mouseenter → mouseover → mousemove → mousedown → mouseup → click) that real user clicks generate, bypassing the detection.
Note: You still need valid credentials. Get passwords from keychain:
security find-generic-password -s "service-name" -w
security find-generic-password -a "your-email@example.com" -w
To reload the Chrome Control extension after making changes:
echo '{"command": "_reload_extension"}' | nc -U /tmp/chrome_control_*.sock
debugger_eval (Direct Socket)
When a site blocks chrome js (which uses chrome.scripting.executeScript with eval() in the MAIN world and is therefore blocked by CSP) and iframe-click can't find the element, you can call the debugger_eval command directly over the Unix socket. It uses CDP Runtime.evaluate in an isolated world with universal access, completely bypassing CSP.
When to use: Sites like Bandcamp, PayPal, or any site where:
- chrome js fails with a CSP error ("unsafe-eval not allowed")
- chrome iframe-click can't find the target element
- chrome click-by-name clicks the wrong element (e.g., a text node instead of the button)
How to call it (Python):
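import json, socket

SOCK_PATH = "/tmp/chrome_control_PROFILE_ID.sock"  # Find via: ls /tmp/chrome_control_*.sock

def debugger_eval(tab_id, code):
    """Send one debugger_eval request over the Unix socket and return its result."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(15)  # set before connect so the connect is bounded too
    try:
        sock.connect(SOCK_PATH)
        msg = json.dumps({
            "command": "debugger_eval",
            "params": {"tabId": tab_id, "code": code}
        }) + "\n"
        sock.sendall(msg.encode())
        data = b""
        while b"\n" not in data:  # read until the newline that ends the reply
            chunk = sock.recv(65536)
            if not chunk:  # peer closed early; stop instead of looping forever
                break
            data += chunk
    finally:
        sock.close()
    result = json.loads(data.decode().strip())
    return result.get("result", {})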
# Example: Click a button on a CSP-protected page
r = debugger_eval(TAB_ID, """
var btn = document.querySelector('button.buy-link');
if (btn) { btn.click(); return 'clicked'; }
return 'not found';
""")
print(r) # {'success': True, 'result': 'clicked'}
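The same request shape can be generalized to other commands. The sketch below assumes every command uses the `{"command": ..., "params": {...}}` newline-terminated JSON framing that `debugger_eval` uses above. `encode_command`, `read_reply`, and this `send_command` signature (with a keyword-only `sock_path`) are illustrative, not the skill's own API.

```python
import json
import socket


def encode_command(command: str, **params) -> bytes:
    """Serialize one request as a single newline-terminated JSON line."""
    return (json.dumps({"command": command, "params": params}) + "\n").encode()


def read_reply(sock: socket.socket) -> dict:
    """Read until the first newline and decode the JSON line before it."""
    data = b""
    while b"\n" not in data:
        chunk = sock.recv(65536)
        if not chunk:
            raise ConnectionError("socket closed before a full reply arrived")
        data += chunk
    line, _, _ = data.partition(b"\n")
    return json.loads(line.decode())


def send_command(command: str, *, sock_path: str, timeout: float = 15.0, **params) -> dict:
    """Open the Unix socket, send one command, and return the decoded reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        sock.sendall(encode_command(command, **params))
        return read_reply(sock)
```

A call might look like `send_command("debugger_eval", sock_path=SOCK_PATH, tabId=TAB_ID, code="return document.title;")`. Parameter names other than `tabId` and `code` would need to be checked against the extension.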
Key details:
- Page JavaScript variables (e.g., window.Cart) are NOT accessible
- return statements work - the code is wrapped in a function
- {}. Use XMLHttpRequest (sync mode) instead of fetch for HTTP calls
- The socket is at /tmp/chrome_control_*.sock - find it with ls /tmp/chrome_control_*.sock
- navigate, screenshot, and other commands go through the same socket
Example: Bandcamp add-to-cart (CSP blocks all normal JS):
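# Assumes `import time` and a send_command(command, **params) helper that writes
# {"command": ..., "params": {...}} to the same socket debugger_eval uses.
# 1. Navigate to track page
send_command("navigate", tabId=TAB, url="https://artist.bandcamp.com/track/name")
time.sleep(4)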
# 2. Click "Buy Digital Track" button
debugger_eval(TAB, "document.querySelector('button.download-link.buy-link').click(); return 'ok';")
time.sleep(1.5)
# 3. Set price in dialog
debugger_eval(TAB, """
var pi = document.querySelector('#userPrice');
pi.value = '1';
pi.dispatchEvent(new Event('input', {bubbles: true}));
pi.dispatchEvent(new Event('change', {bubbles: true}));
return 'price set';
""")
# 4. Click "Add to cart"
debugger_eval(TAB, """
var btns = document.querySelectorAll('button');
for (var i = 0; i < btns.length; i++) {
if (btns[i].textContent.trim() === 'Add to cart') {
btns[i].click(); return 'added';
}
}
return 'not found';
""")
iframe-type double-typing bug (fixed): The iframeType function previously sent a keyDown event with text properties AND a separate char event, both of which caused character insertion. As a result, every character was typed twice. The fix changes keyDown to rawKeyDown and removes text/unmodifiedText from it, so only the char event inserts text.
insert-text for cross-origin payment iframes: When typing into cross-origin iframes (e.g., payment forms), use chrome insert-text instead of iframe-type. The insert-text command uses Input.insertText, which is more reliable in these contexts.
The native messaging protocol has an ~8KB message size limit, which means there is no built-in file upload command.
Workarounds:
Chunked base64 upload (JS injection): Encode the file as base64, split into <5KB chunks, and send each chunk via chrome js to reconstruct the file in the page's JS context. This works but is slow and token-heavy.
axctl drag-and-drop: Use the accessibility automation tool to simulate a file drop onto file input elements:
~/.claude/skills/axctl/scripts/axctl ...
Direct form submission: If the site has a standard <input type="file">, you may be able to set its value via Chrome Debugger protocol or use chrome js to create a DataTransfer object.
Recommendation: For simple file uploads, try axctl first. For complex multi-file uploads, use the chunked base64 approach. Both are significantly more work than a simple command — plan accordingly.
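The chunked base64 workaround can be sketched as follows. This is a minimal illustration, assuming a 4096-character chunk size (under the 5KB guideline above) and a hypothetical page-side accumulator named `window.__uploadB64`. How the page then turns the accumulated string into a File is site-specific and not shown.

```python
import base64


def base64_chunks(data: bytes, max_chunk: int = 4096) -> list[str]:
    """Base64-encode data, then split the encoded text into pieces of at most max_chunk chars."""
    encoded = base64.b64encode(data).decode("ascii")
    # Split the encoded string (not the raw bytes) so concatenating the
    # chunks in order yields one valid base64 payload.
    return [encoded[i:i + max_chunk] for i in range(0, len(encoded), max_chunk)]


def js_append_chunk(chunk: str, var: str = "__uploadB64") -> str:
    """Build a JS statement that appends one chunk to a page-side string.

    The standard base64 alphabet contains no quotes or backslashes, so the
    chunk can be embedded in a single-quoted JS literal without escaping.
    """
    return f"window.{var} = (window.{var} || '') + '{chunk}';"
```

Each statement from `js_append_chunk` would be sent with one `chrome js <tab_id> <code>` call, which is why the approach is slow for large files.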
- Tab IDs come from chrome tabs
- Element refs come from chrome read
- Screenshots are saved to ~/Pictures/chrome-screenshots/ with timestamps
Some sites (Amazon, banks, payment processors) use cross-origin secure iframes (like Amazon's apx-secure-iframe) that block all standard automation: JS injection, CDP mouse events, chrome click, iframe-click, and cliclick. These require a combination of tools to interact with.
Use the accessibility automation tool axctl to type into text fields inside cross-origin secure iframes where all other methods fail:
~/.claude/skills/axctl/scripts/axctl type "Google Chrome" --title "Field Name" "value"
Example for Amazon payment fields:
axctl type "Google Chrome" --title "Card number" "4111111111111111"
axctl type "Google Chrome" --title "Expiration date" "12/28"
Regular JS, Chrome key commands, insert-text, and cliclick all fail because the secure iframe blocks them. axctl works because it operates at the macOS accessibility layer, bypassing browser security entirely.
<select> Dropdowns
Secure iframes often render native <select> dropdowns that can't be changed via JS. Use axctl to find the dropdown's screen position, then cliclick to interact:
# Get the dropdown's position via accessibility
axctl get "Google Chrome" --title "Dropdown Label" AXPosition
# Click to open the native popup
cliclick c:<x>,<y>
# Click on the desired option at calculated coordinates
cliclick c:<x>,<option_y>
Coordinate mapping from Chrome screenshot pixels to screen points:
x_screen = x_screenshot * 1.2
y_screen = y_screenshot * 1.2 + 139
The + 139 offset accounts for Chrome's toolbar height (title bar + tab bar + address bar). Adjust if your toolbar configuration differs.
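The mapping above is simple enough to wrap in a helper. This sketch uses the 1.2 scale and 139-point toolbar offset from the formulas as defaults; both are specific to one display and toolbar setup, so treat them as assumptions to re-measure. `screenshot_to_screen` and `cliclick_arg` are hypothetical names.

```python
def screenshot_to_screen(x_shot: float, y_shot: float,
                         scale: float = 1.2, toolbar_offset: float = 139) -> tuple[int, int]:
    """Map Chrome screenshot pixels to screen points, rounded to whole points."""
    x_screen = x_shot * scale
    y_screen = y_shot * scale + toolbar_offset  # offset covers title bar + tab bar + address bar
    return round(x_screen), round(y_screen)


def cliclick_arg(x_shot: float, y_shot: float) -> str:
    """Format the click argument in the c:<x>,<y> form used above."""
    x, y = screenshot_to_screen(x_shot, y_shot)
    return f"c:{x},{y}"
```

For example, a dropdown seen at screenshot pixel (100, 50) maps to `c:120,199` under the default values.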
When buttons inside secure iframes can't be clicked by any method (JS, CDP mouse events, axctl AXPress, cliclick), use keyboard navigation via CDP:
# Tab repeatedly to move focus into the iframe and onto the target button
chrome key <tab_id> "Tab"
chrome key <tab_id> "Tab"
chrome key <tab_id> "Tab"
# ... keep tabbing until the button is focused
# Activate the focused button
chrome key <tab_id> "Space"
Why this works: CDP Input.dispatchKeyEvent for Tab and Space crosses iframe boundaries, unlike mouse events which get blocked by cross-origin restrictions. Take screenshots between tabs to verify focus position.
The chrome click-at command dispatches CDP Input.dispatchMouseEvent, which takes CSS pixel viewport coordinates (not screenshot pixels):
# Get actual viewport dimensions
chrome js <tab_id> "JSON.stringify({w: window.innerWidth, h: window.innerHeight})"
# click-at uses these CSS viewport coordinates
chrome click-at <tab_id> <css_x> <css_y>
Important: Screenshot resolution often differs from viewport dimensions (e.g., a 1600px-wide screenshot may represent a 1920px-wide viewport on Retina displays). Always check window.innerWidth/innerHeight to understand the coordinate space.
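To go from a point picked in a screenshot to click-at coordinates, scale by the ratio of viewport size to screenshot size. The sketch below assumes the screenshot covers exactly the viewport and is scaled uniformly. `screenshot_to_css` is a hypothetical helper, and the viewport values would come from the `window.innerWidth`/`innerHeight` query shown above.

```python
def screenshot_to_css(x_shot: float, y_shot: float,
                      shot_width: float, shot_height: float,
                      inner_width: float, inner_height: float) -> tuple[float, float]:
    """Convert screenshot pixel coordinates to CSS viewport coordinates."""
    if shot_width <= 0 or shot_height <= 0:
        raise ValueError("screenshot dimensions must be positive")
    css_x = x_shot * inner_width / shot_width
    css_y = y_shot * inner_height / shot_height
    return css_x, css_y
```

With a 1600x1000 screenshot of a 1920x1200 viewport, the screenshot's center (800, 500) maps to CSS (960.0, 600.0), which is what `chrome click-at` would expect.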
The iframe-click command has been updated to detect Amazon payment iframes (apx-secure-iframe) and fall back to the first child frame if no known iframe pattern matches. This means iframe-click may work for some elements in payment iframes, but for text input and dropdowns, use axctl and cliclick as described above.