MUST be invoked before any work involving: Chrome DevTools Protocol, ov eval cdp commands, browser automation, clicking elements, taking screenshots, or OAuth flows inside containers.
ov eval cdp commands connect to the Chrome DevTools Protocol (CDP) on port 9222 inside running containers. They provide HTTP API operations (open, list, and close tabs) and WebSocket CDP operations (click, type, eval, wait, text, html, screenshot) for headless browser automation.
Every ov eval cdp <method> can be authored as a cdp: verb inside an eval: block. The method name becomes the verb's YAML value; method-specific args are sibling fields (tab:, expression:, url:, selector:, etc.). Shared matchers (stdout:, stderr:, exit_status:, artifact_min_bytes:) work as they do for other verbs. See /ov-build:eval for the full method allowlist and YAML shape. Example:

```yaml
- cdp: eval
  tab: "1"
  expression: "document.title"
  stdout: "Dashboard"
```
| Action | Command | Description |
|--------|---------|-------------|
| Open URL | ov eval cdp open <image> <url> | Open URL in new Chrome tab |
| List tabs | ov eval cdp list <image> | List all open tabs (id, title, url) |
| Close tab | ov eval cdp close <image> <tab-id> | Close a tab by ID |
| Get text | ov eval cdp text <image> <tab-id> | Get page text content |
| Get HTML | ov eval cdp html <image> <tab-id> | Get page HTML source |
| Get URL | ov eval cdp url <image> <tab-id> | Get page title and URL |
| Screenshot | ov eval cdp screenshot <image> <tab-id> [file] | Capture PNG screenshot |
| Click | ov eval cdp click <image> <tab-id> <selector> [--vnc] | Click element by CSS selector |
| Coords | ov eval cdp coords <image> <tab-id> <selector> | Show element coords in viewport + desktop |
| Type | ov eval cdp type <image> <tab-id> <selector> <text> | Type into input field |
| Eval JS | ov eval cdp eval <image> <tab-id> <expression> | Evaluate JavaScript |
| Wait | ov eval cdp wait <image> <tab-id> <selector> | Wait for element (default --timeout 30s) |
| Raw CDP | ov eval cdp raw <image> <tab-id> <method> [json] | Send raw CDP command |
| Status | ov eval cdp status <image> | Check CDP availability, show port and tab count |
| SPA click | ov eval cdp spa click <image> <tab> <x> <y> [--scale] | Click at canvas coords with SPA scale correction |
| SPA type | ov eval cdp spa type <image> <tab> <text> | Type text via SPA (bypasses local compositor/Chrome) |
| SPA key | ov eval cdp spa key <image> <tab> <key> | Send key press via SPA (Return, Escape, F1-F12, etc.) |
| SPA key-combo | ov eval cdp spa key-combo <image> <tab> <combo> | Send modifier combo via SPA (super+e, ctrl+t, alt+F4) |
| SPA mouse | ov eval cdp spa mouse <image> <tab> <x> <y> [--scale] | Move pointer with SPA scale correction |
| SPA status | ov eval cdp spa status <image> <tab> | Show SPA state (canvas, overlay, decoders) |
All commands accept -i INSTANCE for multi-instance support.
- Container name: ov-<image>[-<instance>]
- Host port lookup: podman port / docker port
- HTTP endpoints (/json/list, /json/new?url=, /json/close/<id>) for list, open, and close
- cdp-proxy supervisord service
- Chrome flags: --remote-allow-origins='*' and --remote-debugging-port=9223 (internal port)
- … ov start)

The cdp-proxy is essential because Chrome 146+ binds DevTools only to 127.0.0.1 and rejects connections whose Host header is not localhost. Chrome listens internally on 127.0.0.1:9223. The cdp-proxy Python script listens on 0.0.0.0:9222 and forwards to Chrome, rewriting the Host header. It also rewrites URLs in responses (webSocketDebuggerUrl: ws://localhost:9223/... becomes ws://<client-host>:9222/...) and corrects Content-Length, so CDP WebSocket connections work from the host.
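To make the proxy's two rewrites concrete, here is a minimal Python sketch of the header and body transformations as pure functions, with no networking. It is not the actual cdp-proxy script; the Host value presented to Chrome (localhost:9223) and the helper names are assumptions.

```python
import json

# Assumption: the Host value the proxy presents to Chrome. The text above only
# says the Host header is rewritten so Chrome accepts the connection.
CHROME_HOST = "localhost:9223"
INTERNAL_WS_PREFIX = "ws://localhost:9223/"


def set_header(headers: dict[str, str], name: str, value: str) -> dict[str, str]:
    """Return a copy of headers with name set, dropping case-variant duplicates."""
    out = {k: v for k, v in headers.items() if k.lower() != name.lower()}
    out[name] = value
    return out


def rewrite_request_headers(headers: dict[str, str]) -> dict[str, str]:
    """Make an incoming request look local before forwarding it to Chrome."""
    return set_header(headers, "Host", CHROME_HOST)


def rewrite_response(
    body: bytes, headers: dict[str, str], client_host: str
) -> tuple[bytes, dict[str, str]]:
    """Point WebSocket URLs at the public proxy port and fix Content-Length."""
    text = body.decode("utf-8")
    text = text.replace(INTERNAL_WS_PREFIX, f"ws://{client_host}:9222/")
    new_body = text.encode("utf-8")
    # The body length usually changes, so the original Content-Length is stale.
    return new_body, set_header(headers, "Content-Length", str(len(new_body)))
```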
```bash
ov eval cdp open my-app "https://example.com"
```
Uses HTTP API: PUT /json/new?url=<encoded-url>. Returns the new tab ID.
```bash
ov eval cdp list my-app
# ID TITLE URL
# 7F8A3B2C... Example Domain https://example.com/
```
Uses HTTP API: GET /json/list.
```bash
ov eval cdp close my-app 7F8A3B2C...
```
Uses HTTP API: GET /json/close/<id>.
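The three HTTP operations map onto plain requests against the DevTools HTTP endpoints. Here is a minimal standard-library Python sketch that follows the endpoint shapes quoted above. The function names, the default port, and the filter that hides non-page targets are assumptions, not ov's code.

```python
import json
import urllib.parse
import urllib.request


def _base(host: str, port: int) -> str:
    return f"http://{host}:{port}"


def open_tab_request(host: str, url: str, port: int = 9222) -> urllib.request.Request:
    """PUT /json/new?url=<encoded-url> (shape taken from the text above)."""
    target = urllib.parse.quote(url, safe="")
    return urllib.request.Request(f"{_base(host, port)}/json/new?url={target}", method="PUT")


def list_tabs_request(host: str, port: int = 9222) -> urllib.request.Request:
    """GET /json/list."""
    return urllib.request.Request(f"{_base(host, port)}/json/list")


def close_tab_request(host: str, tab_id: str, port: int = 9222) -> urllib.request.Request:
    """GET /json/close/<id>."""
    return urllib.request.Request(f"{_base(host, port)}/json/close/{urllib.parse.quote(tab_id)}")


def parse_tab_list(payload: str) -> list[tuple[str, str, str]]:
    """Reduce a /json/list reply to (id, title, url) rows."""
    return [
        (t["id"], t.get("title", ""), t.get("url", ""))
        for t in json.loads(payload)
        if t.get("type") == "page"  # assumption: workers and other targets are hidden
    ]
```

To send one of these requests, pass it to urllib.request.urlopen(req, timeout=5) with the host port that podman port reports.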
```bash
ov eval cdp text my-app $TAB # Plain text
ov eval cdp html my-app $TAB # HTML source
ov eval cdp url my-app $TAB # Title and URL
```
Uses CDP WebSocket: Runtime.evaluate with document.body.innerText / document.documentElement.outerHTML, Target.getTargetInfo.
```bash
ov eval cdp screenshot my-app $TAB # Prints base64 to stdout
ov eval cdp screenshot my-app $TAB page.png # Saves to file
```
Uses CDP: Page.captureScreenshot.
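Page.captureScreenshot returns the image as base64 in its data field, which is why the command prints base64 when no file is given. Here is a minimal Python sketch of the save-to-file path, assuming PNG output. save_screenshot is a hypothetical helper.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(result: dict, path: str) -> int:
    """Decode a Page.captureScreenshot result and write it to path; returns byte count."""
    png = base64.b64decode(result["data"])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot payload is not a PNG")
    with open(path, "wb") as fh:
        fh.write(png)
    return len(png)
```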
```bash
ov eval cdp click my-app $TAB 'button[type="submit"]'
ov eval cdp type my-app $TAB 'input[name="email"]' "user@example.com"
```
Click uses CDP: Runtime.evaluate with deepQuery() to find element (piercing shadow DOM), scrollIntoViewIfNeeded() + getBoundingClientRect() for coordinates, Input.dispatchMouseEvent for click. Type uses deepQuery() + scrollIntoViewIfNeeded() + focus() to select the element, then Input.dispatchKeyEvent for each character (keyDown, char, keyUp — matching Puppeteer behavior).
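The click and type sequences can be sketched as the Input.dispatchMouseEvent and Input.dispatchKeyEvent parameter lists they would produce. This is a Python illustration, not ov's Go code; the leading mouseMoved event and the exact field set are assumptions. The rectangle reuses the #sync-button geometry from the coords example below, whose center is (1220, 328).

```python
def center(rect: dict) -> tuple[float, float]:
    """Center of a getBoundingClientRect()-style rectangle."""
    return rect["x"] + rect["width"] / 2, rect["y"] + rect["height"] / 2


def click_events(rect: dict) -> list[dict]:
    """Mouse events for a single left click at the element's center."""
    x, y = center(rect)
    press = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        {"type": "mouseMoved", "x": x, "y": y},  # assumption: hover first
        {"type": "mousePressed", **press},
        {"type": "mouseReleased", **press},
    ]


def type_events(text: str) -> list[dict]:
    """keyDown, char, keyUp for every character, as described above."""
    events = []
    for ch in text:
        events.append({"type": "keyDown", "key": ch})
        events.append({"type": "char", "text": ch})
        events.append({"type": "keyUp", "key": ch})
    return events
```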
Shadow DOM support: All selector-based commands (click, type, wait) automatically pierce shadow DOM boundaries via recursive deepQuery(). This means selectors work on Chrome's internal pages (chrome://settings/*), Polymer/Lit web components, and any page using Web Components with shadow DOM. Hidden/zero-sized elements are skipped — only visible matches are returned.
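The real deepQuery() runs as JavaScript inside the page. To show the traversal it performs, here is a Python model over a toy element tree: check the node, descend into its shadow root, then into its light-DOM children, and skip matches with zero width or height. The Node type, the visit order, and the names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """A toy DOM element: light-DOM children plus an optional shadow root."""
    tag: str
    elem_id: str = ""
    width: float = 0.0
    height: float = 0.0
    children: list["Node"] = field(default_factory=list)
    shadow_root: Optional["Node"] = None


def deep_query(node: Node, match: Callable[[Node], bool]) -> Optional[Node]:
    """Depth-first search that descends into shadow roots; skips zero-sized matches."""
    if match(node) and node.width > 0 and node.height > 0:
        return node
    subtrees = ([node.shadow_root] if node.shadow_root is not None else []) + node.children
    for child in subtrees:
        found = deep_query(child, match)
        if found is not None:
            return found
    return None
```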
Note on Chrome internal dialogs: Some Chrome UI elements (e.g., the "Turn on sync" confirmation dialog) are rendered as native browser chrome, invisible to CDP. Use VNC keyboard (ov eval vnc key ... Tab, ov eval vnc key ... Return) to interact with these dialogs. VNC screenshots (ov eval vnc screenshot) show the full desktop including these dialogs, while CDP screenshots only show the page viewport.
CDP coordinates are viewport-relative (relative to Chrome's content area). VNC coordinates are desktop-absolute (the full Wayland framebuffer). The offset between them includes Chrome's window position on the desktop plus Chrome's UI chrome (title bar, tab bar, address bar — typically ~107px).
ov eval cdp coords — Shows an element's coordinates in both systems:
```bash
ov eval cdp coords my-app $TAB '#sync-button'
# Element: #sync-button (108x36)
# Viewport: x=1166 y=310 center=(1220, 328)
# Desktop: x=1166 y=421 center=(1220, 439) (via window.screenX/screenY, chromeHeight=107)
# Sway: window at (4, 4) size 1912x1032 (app_id=google-chrome)
```
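The viewport-to-desktop translation is plain addition. Here is a Python sketch of the formula, assuming desktop x = viewport x + window.screenX and desktop y = viewport y + window.screenY + chromeHeight. The output above does not print screenX/screenY, so the values 0 and 4 in the check were chosen only because they reproduce the printed numbers.

```python
def viewport_to_desktop(
    x: float, y: float, screen_x: float, screen_y: float, chrome_height: float
) -> tuple[float, float]:
    """Translate a viewport point to desktop coordinates."""
    # Horizontal: only the window's screen position matters.
    # Vertical: window position plus Chrome's toolbar height (0 on popups).
    return x + screen_x, y + screen_y + chrome_height
```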
ov eval cdp click --vnc — Finds element via CDP selector, delivers click via VNC:
```bash
ov eval cdp click my-app $TAB '#sync-button' --vnc
# Clicked element at viewport (1220, 328) → desktop (1220, 439) via VNC
```
ov eval vnc click --from-cdp — Translates viewport coords to desktop coords:
```bash
ov eval vnc click my-app 1220 328 --from-cdp $TAB
# Translated viewport (1220, 328) → desktop (1220, 439) via CDP tab ...
```
```bash
ov eval cdp eval my-app $TAB 'document.title'
ov eval cdp eval my-app $TAB 'JSON.stringify(localStorage)'
```
Uses CDP: Runtime.evaluate. Returns the result value.
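Runtime.evaluate is also what the text and html commands use. Here is a Python sketch of the request parameters and of unpacking the reply. Setting returnByValue and the error handling are assumptions about how a client like ov would behave.

```python
from typing import Any

# Expressions the text/html subcommands evaluate, per the description above.
EXPRESSIONS = {
    "text": "document.body.innerText",
    "html": "document.documentElement.outerHTML",
}


def evaluate_params(expression: str) -> dict[str, Any]:
    # returnByValue asks CDP to serialize the result instead of returning a handle.
    return {"expression": expression, "returnByValue": True}


def evaluate_result(response: dict[str, Any]) -> Any:
    if "error" in response:  # protocol-level failure (bad method, bad params)
        raise RuntimeError(response["error"].get("message", "CDP error"))
    result = response["result"]
    if "exceptionDetails" in result:  # the JavaScript itself threw
        raise RuntimeError(result["exceptionDetails"].get("text", "evaluation failed"))
    return result["result"].get("value")
```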
```bash
ov eval cdp wait my-app $TAB 'h1' # Default 30s timeout
ov eval cdp wait my-app $TAB '.loaded' --timeout 60s # Custom timeout
```
Polls with CDP until the CSS selector matches an element.
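The wait loop amounts to re-running a probe until it succeeds or the deadline passes. Here is a minimal Python sketch. In ov the probe would be a CDP evaluation of the selector, and the 0.25 s poll interval is an assumption.

```python
import time
from typing import Callable


def wait_for(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Call probe() until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```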
```bash
ov eval cdp raw my-app $TAB 'Page.navigate' '{"url":"https://example.com"}'
ov eval cdp raw my-app $TAB 'Runtime.evaluate' '{"expression":"1+1"}'
```
Sends arbitrary CDP method with optional JSON params. Returns raw CDP response.
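raw is a thin wrapper over CDP's JSON message format. Each request carries an incrementing id, and the reply echoes that id, while unsolicited events arrive with a method and no id. Here is a Python sketch of that framing; CdpFramer and find_reply are hypothetical names.

```python
import itertools
import json
from typing import Any, Iterable, Optional


class CdpFramer:
    """Builds numbered CDP requests."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def request(self, method: str, params_json: Optional[str] = None) -> tuple[int, str]:
        params = json.loads(params_json) if params_json else {}
        msg_id = next(self._ids)
        return msg_id, json.dumps({"id": msg_id, "method": method, "params": params})


def find_reply(msg_id: int, frames: Iterable[str]) -> Optional[dict[str, Any]]:
    """Skip events (they carry "method" but no "id") until the matching reply."""
    for frame in frames:
        msg = json.loads(frame)
        if msg.get("id") == msg_id:
            return msg
    return None
```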
When ov eval cdp commands fail to connect, the diagnoseCDP() function runs automatically and provides targeted hints:
- Chrome process running (pgrep chrome)
- cdp-proxy service status (supervisorctl status cdp-proxy)
- Listening ports (ss -tlnp)

Hints direct users to ov eval wl sway exec <image> chrome-wrapper for a manual Chrome restart (not ov shell with bare swaymsg, which may lack the correct SWAYSOCK path).
Images with Chrome include a browser-open script and set BROWSER=browser-open in the environment. When CLI tools inside the container call xdg-open or use the $BROWSER variable to open a URL, it routes through CDP to open the URL in the running Chrome instance.
Complete flow for deploying openclaw with Codex OAuth. All browser interactions must be VNC-visible — use --vnc flag on ov eval cdp click.
Critical: The openclaw models auth login TUI requires a real terminal. Do NOT pipe through tee or redirect stdout. Use ov tmux (see /ov-advanced:tmux).
```bash
IMG=openclaw-sway-browser # or openclaw-ollama-sway-browser
# 1. Prerequisites: Chrome signed into Google with sync enabled
# See /ov-openclaw:openclaw-ollama-sway-browser for full Chrome sign-in procedure
# 2. Start OAuth in a tmux session (real terminal)
ov tmux run $IMG -s oauth "openclaw models auth login --provider openai-codex --set-default"
# 3. Read OAuth URL from tmux output
sleep 5
OAUTH_URL=$(ov tmux capture $IMG -s oauth | grep -o 'https://auth.openai.com/[^ ]*')
ov eval cdp open $IMG "$OAUTH_URL"
# 4. Click "Continue with Google" (VNC-visible)
sleep 5
TAB=$(ov eval cdp list $IMG | grep -i "openai\|auth" | head -1 | awk '{print $1}')
ov eval cdp click $IMG $TAB 'button._buttonStyleFix_wvuha_65' --vnc
# 5. Click "Continue" on Codex consent page (VNC-visible)
sleep 5
ov eval cdp click $IMG $TAB 'button._primary_3rdp0_107' --vnc
# 6. Verify token exchange completed
sleep 10
ov tmux capture $IMG -s oauth
# Expected: "OpenAI OAuth complete", "Default model set to openai-codex/gpt-5.4"
# 7. Restart gateway
ov shell $IMG -c "supervisorctl restart openclaw"
ov shell $IMG -c "openclaw models status"
```
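Step 3 relies on grep -o to pull the URL out of the pane. Here is the same extraction in Python as a sketch; the sample pane text in the check is invented. One caveat applies to either version: if the captured output wraps a long URL across lines, a line-based pattern returns only the first part.

```python
import re
from typing import Optional

# Same pattern as the grep -o in step 3; "\n" is excluded because grep works line by line.
OAUTH_URL = re.compile(r"https://auth\.openai\.com/[^ \n]*")


def extract_oauth_url(captured: str) -> Optional[str]:
    """Return the first OpenAI auth URL in the captured pane text, if any."""
    match = OAUTH_URL.search(captured)
    return match.group(0) if match else None
```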
Tested selectors (OpenAI auth page as of 2026-03-21):

- button._buttonStyleFix_wvuha_65 (first matching social button)
- button._primary_3rdp0_107 (black primary button)

Key enablers:

- ov tmux run provides a real terminal for the TUI to complete the token exchange (see /ov-advanced:tmux)
- ov eval cdp click --vnc finds elements via CDP and clicks via VNC (visible to the user)
- cdp-proxy makes Chrome DevTools accessible from the host through podman bridge networking (with Host header rewriting)
- shm_size: 1g prevents Chrome from crashing due to /dev/shm exhaustion
- localhost:1455 is container-internal (no port mapping needed)

Stale port 1455: if a previous OAuth attempt left port 1455 occupied, kill the process holding it:

```bash
ov shell $IMG -c 'kill -9 $(ss -tlnp sport = :1455 | grep -oP "pid=\K\d+")'
```
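The grep -oP "pid=\K\d+" step picks PIDs out of the users:(...) column of ss -tlnp. Here is a Python equivalent as a sketch. It also removes duplicates, because a process can appear more than once in that column. The sample line in the check is illustrative.

```python
import re


def pids_from_ss(output: str) -> list[int]:
    """Unique PIDs from ss -tlnp output, in first-seen order."""
    return list(dict.fromkeys(int(p) for p in re.findall(r"pid=(\d+)", output)))
```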
Source: ov/cdp.go, ov/vnc.go.
Sign into a Google account inside a running container. Requires GMAIL_USER and GMAIL_PASSWORD environment variables (set in .env or passed via -e).
App Passwords required: Google accounts with 2FA (now mandatory for most accounts) require a 16-character App Password. App Passwords bypass 2FA prompts and most verification challenges, so use them by default for automated sign-in.
Fresh profile prerequisite: A fresh chrome-data volume triggers Chrome's first-run flow. Use ov remove <image> --purge before ov config to ensure a clean start. Just rebuilding the image does not reset named volumes.
On a fresh profile, Chrome opens a first-run dialog ("Make Google Chrome the default browser") as a separate window that CDP cannot see (no debuggable tabs). It tiles alongside any CDP-opened tabs in sway, breaking coordinate translation.
```bash
# Focus the first-run dialog and dismiss it
ov eval wl sway msg my-app 'focus left' # first-run dialog is typically the left window
ov eval vnc key my-app Return # press OK
```
After dismissal, Chrome shows chrome://intro/ — "Sign in to Chrome" with shadow DOM buttons.
chrome:// pages block CDP mouse events and JS .click(). Use --vnc click (CDP selector targeting + VNC pointer delivery):
```bash
TAB=$(ov eval cdp list my-app | grep intro | head -1 | awk '{print $1}')
ov eval cdp click my-app $TAB '#acceptSignInButton' --vnc
```
Shadow DOM path: intro-app > sign-in-promo > #acceptSignInButton. The --vnc flag uses deepQuery to find the element, translates viewport coords to desktop coords via window.screenX/screenY, and delivers the click through VNC.
This opens a new tab with the Google sign-in page. Capture the new tab ID:
```bash
sleep 3
TAB=$(ov eval cdp list my-app | grep -i "sign in" | head -1 | awk '{print $1}')
```
Note: The tab ID survives Google's same-tab navigations (email → password → result).
```bash
ov eval cdp wait my-app $TAB '#identifierId' --timeout 30s
ov eval cdp click my-app $TAB '#identifierId' --vnc # focus field via VNC pointer
sleep 0.5 # let compositor process focus
ov eval vnc type my-app "$GMAIL_USER" # real keysym events
```
Use ov eval cdp coords my-app $TAB '#identifierId' to inspect element position in all three coordinate systems (viewport, desktop via CDP, desktop via sway) for debugging.
```bash
ov eval cdp click my-app $TAB '#identifierNext' --vnc
sleep 5 # page transition
ov eval cdp url my-app $TAB # expect /challenge/pwd
ov eval cdp screenshot my-app $TAB step3.png # verification checkpoint
```
```bash
ov eval cdp wait my-app $TAB 'input[type="password"]' --timeout 15s
ov eval cdp click my-app $TAB 'input[type="password"]' --vnc
sleep 0.5
ov eval vnc type my-app "$GMAIL_PASSWORD"
ov eval cdp click my-app $TAB '#passwordNext' --vnc
sleep 7 # backend verification
ov eval cdp screenshot my-app $TAB step5.png
ov eval vnc screenshot my-app step5-desktop.png # catches native dialogs
```
After successful sign-in, Chrome navigates to chrome://sync-confirmation/ — a chrome:// page (NOT a native dialog). CDP can see it but --vnc click is required:
```bash
TAB=$(ov eval cdp list my-app | grep -i sync-confirmation | head -1 | awk '{print $1}')
# If no sync-confirmation tab, it may already be on the current tab:
# TAB stays the same from step 5
ov eval cdp click my-app $TAB '#confirmButton' --vnc # "Yes, I'm in"
```
Shadow DOM path: sync-confirmation-app > #confirmButton. Other buttons: #notNowButton ("No thanks"), #settingsButton ("Settings").
2FA/CAPTCHA: Take a VNC screenshot (ov eval vnc screenshot my-app challenge.png) and complete manually via a VNC client. App Passwords bypass most challenges.
Search engine choice: may appear as a new tab. Handle it via CDP eval if present:

```bash
STAB=$(ov eval cdp list my-app | grep search-engine | head -1 | awk '{print $1}')
# If STAB is non-empty, select Google via shadow DOM eval
```
Cookies and sync state are stored in the chrome-data volume (~/.chrome-debug), persisting across container restarts. Use ov remove <image> --purge to clear for a fresh start.
The --vnc flag on ov eval cdp click is essential for the sign-in flow:
- chrome:// pages (intro, sync-confirmation): CDP mouse events and JS .click() are blocked, so --vnc is the only way to click.
- --vnc delivers real pointer events that bypass anti-automation detection.
- Coordinate translation: desktop x = viewport x + window.screenX and desktop y = viewport y + window.screenY + chromeHeight. On popup windows (no toolbar), chromeHeight=0.

Use ov eval cdp coords my-app $TAB '<selector>' to debug coordinate translation. It shows the element position in the viewport, desktop (via CDP), and desktop (via sway) systems.
The ov eval cdp spa subcommands provide first-class support for interacting with Selkies-style remote desktop SPAs. They bypass the local compositor and Chrome's shortcut handlers, which makes them the only way to send Super+e, Ctrl+T, or Alt+F4 to the remote desktop.
- input#overlayInput (z-index 3, opacity 0, pointer-events: auto): invisible input overlay that captures all events
- canvas#videoCanvas (z-index 2, pointer-events: none): H.264 video render surface

```bash
IMG=sway-browser-vnc
TAB=$(ov eval cdp list $IMG | grep -i selkies | head -1 | awk '{print $1}')
# Check SPA state
ov eval cdp spa status $IMG $TAB
# Click at canvas coordinates (where elements appear in CDP screenshots)
ov eval cdp spa click $IMG $TAB 990 375 --scale 0.824,0.836
# Type text (bypasses the local compositor, so no double-char issue)
ov eval cdp spa type $IMG $TAB "hello world"
# Send modifier combos that normally can't reach the remote desktop:
ov eval cdp spa key-combo $IMG $TAB super+e # Open foot terminal in labwc
ov eval cdp spa key-combo $IMG $TAB ctrl+t # New tab in REMOTE Chrome
ov eval cdp spa key-combo $IMG $TAB alt+f4 # Close window in labwc
# Send special keys
ov eval cdp spa key $IMG $TAB return
ov eval cdp spa key $IMG $TAB escape
```
The SPA maps mouse events from the canvas to the remote desktop with an internal scaling factor. Use --scale scaleX,scaleY to correct for it: a click at canvas position (x, y) is sent to (x/scaleX, y/scaleY). Determine the scale empirically: click with ov eval cdp spa click, check where the cursor lands in an ov eval cdp screenshot, and compare that position with the target.
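The --scale correction is a per-axis division. Here is a Python sketch using the 990, 375 and 0.824,0.836 values from the example above. parse_scale and corrected_point are hypothetical helpers, and whether ov rounds the result to whole pixels is not stated.

```python
def parse_scale(spec: str) -> tuple[float, float]:
    """Parse a --scale value of the form "scaleX,scaleY"."""
    parts = spec.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected scaleX,scaleY, got {spec!r}")
    sx, sy = (float(p) for p in parts)
    if sx <= 0 or sy <= 0:
        raise ValueError("scale factors must be positive")
    return sx, sy


def corrected_point(x: float, y: float, scale: tuple[float, float]) -> tuple[float, float]:
    """Canvas point (x, y) is sent as (x / scaleX, y / scaleY)."""
    sx, sy = scale
    return x / sx, y / sy
```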
ov eval cdp spa type/key/key-combo sends Input.dispatchKeyEvent directly to the page. The SPA's onkeydown handler on #overlayInput (with stopImmediatePropagation) captures these and forwards to the remote compositor via WebSocket. Only keyDown + keyUp are sent (no "char" event) to prevent double input.
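These event shapes differ from the regular type command in two ways. There is no char event, and modifier combos ride on the modifiers bitmask of Input.dispatchKeyEvent (Alt=1, Ctrl=2, Meta=4, Shift=8 in the CDP spec). Here is a Python sketch. Mapping super to the Meta bit, and sending no separate events for the modifier keys themselves, are assumptions.

```python
# CDP Input.dispatchKeyEvent modifier bits: Alt=1, Ctrl=2, Meta=4, Shift=8.
# Assumption: "super" is sent as Meta.
MODIFIER_BITS = {"alt": 1, "ctrl": 2, "meta": 4, "super": 4, "shift": 8}


def spa_type_events(text: str) -> list[dict]:
    """keyDown + keyUp per character; no char event, so nothing is inserted twice."""
    events = []
    for ch in text:
        events.append({"type": "keyDown", "key": ch})
        events.append({"type": "keyUp", "key": ch})
    return events


def key_combo_events(combo: str) -> list[dict]:
    """Map e.g. "super+e" to keyDown/keyUp for "e" with the Meta bit set."""
    *mods, key = combo.split("+")
    bits = 0
    for mod in mods:
        bits |= MODIFIER_BITS[mod.lower()]
    return [
        {"type": "keyDown", "key": key, "modifiers": bits},
        {"type": "keyUp", "key": key, "modifiers": bits},
    ]
```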
Choosing between spa, regular CDP, and VNC/WL:

| Scenario | Command |
|----------|---------|
| Click/type in a web page | ov eval cdp click/type (CSS selector targeting) |
| Click/type in a remote desktop via SPA | ov eval cdp spa click/type (canvas coordinates) |
| Send Super+key or Ctrl+T to remote desktop | ov eval cdp spa key-combo (only option that works) |
| Click in local compositor | ov eval wl click or ov eval vnc click |
| Take screenshot of stream content | ov eval cdp screenshot (captures canvas) |
| Take screenshot of full client desktop | ov eval vnc screenshot or ov eval wl screenshot |
Use cdp status → cdp open → cdp eval to verify proxy connectivity on instances:
```bash
# Check CDP is available
ov eval cdp status selkies-desktop -i 198.145.102.110
# Open a test page
ov eval cdp open selkies-desktop -i 198.145.102.110 "https://ip.me"
# Extract the detected IP (ip.me stores it in an input field)
ov eval cdp eval selkies-desktop -i 198.145.102.110 <tab-id> \
  "document.querySelector('#ip-lookup').value"
# → Should return 198.145.102.110 (the proxy IP)
```
This pattern works for any page content extraction via JS. The cdp eval command returns the expression's result directly.
- /ov-build:eval -- parent router; ov eval cdp … is how every invocation is dispatched.
- /ov-advanced:wl -- Wayland desktop automation (sibling verb under ov eval); also sway subgroup for compositor control.
- /ov-advanced:vnc -- VNC desktop automation (sibling verb; same container, pixel-level interaction).
- /ov-advanced:dbus -- D-Bus calls and notifications (sibling verb under ov eval).
- /ov-core:shell -- Running commands in containers (--tty for OAuth flows)
- /ov-core:config -- Instance deployment, proxy configuration, removal workflow
- /ov-build:layer -- Chrome layer configuration (cdp-proxy service, port declarations)
- /ov-selkies:selkies-desktop -- Full SPA DOM structure, coordinate mapping, session resilience
- /ov-selkies:chrome-devtools-mcp -- MCP-based browser automation (29 tools via Streamable HTTP)
- /ov-selkies:chrome -- Chrome layer with cdp-proxy, env_accepts (HTTP_PROXY)

MUST be invoked when the task involves Chrome DevTools Protocol, ov eval cdp commands, browser automation, clicking elements, taking screenshots, or OAuth flows inside containers. Invoke this skill BEFORE reading source code or launching Explore agents.
Workflow position: Desktop automation. Use after a desktop container is running. Preferred over VNC for structured interaction. See also /ov-advanced:vnc (pixel) and /ov-advanced:wl (window; includes the sway subgroup).