ov/skills/cdp/SKILL.md
MUST be invoked before any work involving: Chrome DevTools Protocol, ov test cdp commands, browser automation, clicking elements, taking screenshots, or OAuth flows inside containers.
ov test cdp commands connect to the Chrome DevTools Protocol (CDP) on port 9222 inside running containers. They provide HTTP API operations (open, list, close tabs) and WebSocket CDP operations (click, type, eval, wait, text, html, screenshot) for headless browser automation.
Every ov test cdp <method> is authorable as a cdp: verb inside a tests: block. The method name becomes the verb's YAML value; method-specific args are sibling fields (tab:, expression:, url:, selector:, etc.). Shared matchers (stdout:, stderr:, exit_status:, artifact_min_bytes:) work like other verbs. See /ov:test for the full method allowlist and YAML shape. Example: - cdp: eval\n tab: "1"\n expression: "document.title"\n stdout: "Dashboard".
| Action | Command | Description |
|--------|---------|-------------|
| Open URL | ov test cdp open <image> <url> | Open URL in new Chrome tab |
| List tabs | ov test cdp list <image> | List all open tabs (id, title, url) |
| Close tab | ov test cdp close <image> <tab-id> | Close a tab by ID |
| Get text | ov test cdp text <image> <tab-id> | Get page text content |
| Get HTML | ov test cdp html <image> <tab-id> | Get page HTML source |
| Get URL | ov test cdp url <image> <tab-id> | Get page title and URL |
| Screenshot | ov test cdp screenshot <image> <tab-id> [file] | Capture PNG screenshot |
| Click | ov test cdp click <image> <tab-id> <selector> [--vnc] | Click element by CSS selector |
| Coords | ov test cdp coords <image> <tab-id> <selector> | Show element coords in viewport + desktop |
| Type | ov test cdp type <image> <tab-id> <selector> <text> | Type into input field |
| Eval JS | ov test cdp eval <image> <tab-id> <expression> | Evaluate JavaScript |
| Wait | ov test cdp wait <image> <tab-id> <selector> | Wait for element to match (default timeout 30s; override with --timeout) |
| Raw CDP | ov test cdp raw <image> <tab-id> <method> [json] | Send raw CDP command |
| Status | ov test cdp status <image> | Check CDP availability, show port and tab count |
| SPA click | ov test cdp spa click <image> <tab> <x> <y> [--scale] | Click at canvas coords with SPA scale correction |
| SPA type | ov test cdp spa type <image> <tab> <text> | Type text via SPA (bypasses local compositor/Chrome) |
| SPA key | ov test cdp spa key <image> <tab> <key> | Send key press via SPA (Return, Escape, F1-F12, etc.) |
| SPA key-combo | ov test cdp spa key-combo <image> <tab> <combo> | Send modifier combo via SPA (super+e, ctrl+t, alt+F4) |
| SPA mouse | ov test cdp spa mouse <image> <tab> <x> <y> [--scale] | Move pointer with SPA scale correction |
| SPA status | ov test cdp spa status <image> <tab> | Show SPA state (canvas, overlay, decoders) |
All commands accept -i INSTANCE for multi-instance support.
- Container name: ov-<image>[-<instance>]
- Port lookup: podman port / docker port
- HTTP API (/json/list, /json/new?url=, /json/close/<id>) for list, open, and close
- cdp-proxy supervisord service
- Chrome flags: --remote-allow-origins='*' and --remote-debugging-port=9223 (internal port)
- ov start

The cdp-proxy is essential because Chrome 146+ binds DevTools only to 127.0.0.1 and rejects connections with non-localhost Host headers. Chrome binds to 127.0.0.1:9223 internally. The cdp-proxy Python script listens on 0.0.0.0:9222 and forwards to Chrome with Host header rewriting. It also rewrites response URLs (webSocketDebuggerUrl: ws://localhost:9223/... to ws://<client-host>:9222/...) with Content-Length correction, ensuring CDP WebSocket connections work correctly from the host.
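The rewriting steps described above can be sketched in Python. This is a minimal illustration and not the actual cdp-proxy source: the function names, the header dictionary shape, and the Host value sent to Chrome are assumptions. Only the two ports and the webSocketDebuggerUrl rewrite come from the text.

```python
import re

INTERNAL_PORT = 9223  # Chrome's loopback-only DevTools port (from the text)
EXTERNAL_PORT = 9222  # port the proxy exposes (from the text)


def rewrite_request_headers(headers: dict) -> dict:
    """Return a copy of the request headers with a localhost Host value.

    Assumption: the proxy sends "localhost:<internal port>" as the Host header.
    """
    rewritten = dict(headers)
    rewritten["Host"] = f"localhost:{INTERNAL_PORT}"
    return rewritten


def rewrite_response(headers: dict, body: bytes, client_host: str):
    """Point WebSocket URLs at the proxy and fix Content-Length to match."""
    pattern = re.compile(
        rf"ws://(?:localhost|127\.0\.0\.1):{INTERNAL_PORT}/".encode()
    )
    replacement = f"ws://{client_host}:{EXTERNAL_PORT}/".encode()
    # A callable replacement avoids backslash processing in the new text.
    new_body = pattern.sub(lambda _match: replacement, body)
    new_headers = dict(headers)
    new_headers["Content-Length"] = str(len(new_body))
    return new_headers, new_body


body = b'{"webSocketDebuggerUrl": "ws://localhost:9223/devtools/page/ABC"}'
headers, new_body = rewrite_response(
    {"Content-Length": str(len(body))}, body, client_host="192.168.1.50"
)
print(new_body.decode())
print(headers["Content-Length"], len(new_body))
```

The body length changes whenever the client host differs in length from localhost, which is why the Content-Length header has to be recomputed after the substitution.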
ov test cdp open my-app "https://example.com"
Uses HTTP API: PUT /json/new?url=<encoded-url>. Returns the new tab ID.
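As a rough sketch of the request shape, the following Python builds (but does not send) the PUT request with a percent-encoded target URL. The helper name and the default host and port are assumptions for illustration; in practice the host port is whatever the container runtime maps to 9222.

```python
from urllib.parse import quote
from urllib.request import Request


def new_tab_request(target_url: str, host: str = "127.0.0.1", port: int = 9222) -> Request:
    """Build the PUT /json/new?url=<encoded-url> request without sending it."""
    encoded = quote(target_url, safe="")  # encode ':' '/' '?' '=' and spaces too
    return Request(f"http://{host}:{port}/json/new?url={encoded}", method="PUT")


req = new_tab_request("https://example.com/?q=a b")
print(req.get_method(), req.full_url)
# PUT http://127.0.0.1:9222/json/new?url=https%3A%2F%2Fexample.com%2F%3Fq%3Da%20b
```

Passing the request to urllib.request.urlopen would perform the call against a reachable endpoint.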
ov test cdp list my-app
# ID TITLE URL
# 7F8A3B2C... Example Domain https://example.com/
Uses HTTP API: GET /json/list.
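Workflows in this document pick a tab ID out of the list output with grep and awk. A Python equivalent might look like the sketch below. The function name and the sample IDs are hypothetical, and it assumes the ID is the first whitespace-separated column, as in the sample output above.

```python
from typing import Optional


def find_tab_id(list_output: str, needle: str) -> Optional[str]:
    """Return the ID of the first tab whose line contains needle (case-insensitive)."""
    for line in list_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "ID":  # skip blank lines and the header row
            continue
        if needle.lower() in line.lower():
            return fields[0]
    return None


sample = """ID TITLE URL
7F8A3B2C Example Domain https://example.com/
9D1E4F60 Sign in - Google Accounts https://accounts.google.com/
"""
print(find_tab_id(sample, "sign in"))  # 9D1E4F60
print(find_tab_id(sample, "missing"))  # None
```

Titles can contain spaces, so only the first column is treated as a reliable field here.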
ov test cdp close my-app 7F8A3B2C...
Uses HTTP API: GET /json/close/<id>.
ov test cdp text my-app $TAB # Plain text
ov test cdp html my-app $TAB # HTML source
ov test cdp url my-app $TAB # Title and URL
Uses CDP WebSocket: Runtime.evaluate with document.body.innerText / document.documentElement.outerHTML, Target.getTargetInfo.
ov test cdp screenshot my-app $TAB # Prints base64 to stdout
ov test cdp screenshot my-app $TAB page.png # Saves to file
Uses CDP: Page.captureScreenshot.
ov test cdp click my-app $TAB 'button[type="submit"]'
ov test cdp type my-app $TAB 'input[name="email"]' "user@example.com"
Click uses CDP: Runtime.evaluate with deepQuery() to find element (piercing shadow DOM), scrollIntoViewIfNeeded() + getBoundingClientRect() for coordinates, Input.dispatchMouseEvent for click. Type uses deepQuery() + scrollIntoViewIfNeeded() + focus() to select the element, then Input.dispatchKeyEvent for each character (keyDown, char, keyUp — matching Puppeteer behavior).
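The per-character key sequence used by type can be illustrated with a short Python sketch. It only builds the Input.dispatchKeyEvent messages and sends nothing. The helper names are hypothetical, and a real implementation would likely set more fields (key codes, modifiers) than shown here.

```python
def key_events_for_text(text: str) -> list:
    """Build Input.dispatchKeyEvent params: keyDown, char, keyUp per character."""
    events = []
    for ch in text:
        events.append({"type": "keyDown", "key": ch})
        events.append({"type": "char", "text": ch})
        events.append({"type": "keyUp", "key": ch})
    return events


def as_cdp_messages(events: list, first_id: int = 1) -> list:
    """Wrap each params dict in a CDP message with a sequential id."""
    return [
        {"id": first_id + i, "method": "Input.dispatchKeyEvent", "params": params}
        for i, params in enumerate(events)
    ]


messages = as_cdp_messages(key_events_for_text("hi"))
print(len(messages))                                # 6
print([m["params"]["type"] for m in messages[:3]])  # ['keyDown', 'char', 'keyUp']
```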
Shadow DOM support: All selector-based commands (click, type, wait) automatically pierce shadow DOM boundaries via recursive deepQuery(). This means selectors work on Chrome's internal pages (chrome://settings/*), Polymer/Lit web components, and any page using Web Components with shadow DOM. Hidden/zero-sized elements are skipped — only visible matches are returned.
Note on Chrome internal dialogs: Some Chrome UI elements (e.g., the "Turn on sync" confirmation dialog) are rendered as native browser chrome, invisible to CDP. Use VNC keyboard (ov test vnc key ... Tab, ov test vnc key ... Return) to interact with these dialogs. VNC screenshots (ov test vnc screenshot) show the full desktop including these dialogs, while CDP screenshots only show the page viewport.
CDP coordinates are viewport-relative (relative to Chrome's content area). VNC coordinates are desktop-absolute (the full Wayland framebuffer). The offset between them includes Chrome's window position on the desktop plus Chrome's UI chrome (title bar, tab bar, address bar — typically ~107px).
ov test cdp coords — Shows an element's coordinates in both systems:
ov test cdp coords my-app $TAB '#sync-button'
# Element: #sync-button (108x36)
# Viewport: x=1166 y=310 center=(1220, 328)
# Desktop: x=1166 y=421 center=(1220, 439) (via window.screenX/screenY, chromeHeight=107)
# Sway: window at (4, 4) size 1912x1032 (app_id=google-chrome)
ov test cdp click --vnc — Finds element via CDP selector, delivers click via VNC:
ov test cdp click my-app $TAB '#sync-button' --vnc
# Clicked element at viewport (1220, 328) → desktop (1220, 439) via VNC
ov test vnc click --from-cdp — Translates viewport coords to desktop coords:
ov test vnc click my-app 1220 328 --from-cdp $TAB
# Translated viewport (1220, 328) → desktop (1220, 439) via CDP tab ...
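The viewport-to-desktop translation can be written out as a small Python sketch. The class and function names are hypothetical. The values screen_x=0 and screen_y=4 are assumptions chosen so that the result matches the sample output above, while chromeHeight=107 is taken from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowGeometry:
    screen_x: int       # window.screenX
    screen_y: int       # window.screenY
    chrome_height: int  # title bar + tab bar + address bar; 0 for popups


def viewport_to_desktop(vx: int, vy: int, win: WindowGeometry) -> tuple:
    """Translate viewport-relative coords to desktop-absolute coords."""
    return (win.screen_x + vx, win.screen_y + win.chrome_height + vy)


# Assumed geometry that reproduces the sample: viewport (1220, 328) -> desktop (1220, 439)
win = WindowGeometry(screen_x=0, screen_y=4, chrome_height=107)
print(viewport_to_desktop(1220, 328, win))  # (1220, 439)
```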
ov test cdp eval my-app $TAB 'document.title'
ov test cdp eval my-app $TAB 'JSON.stringify(localStorage)'
Uses CDP: Runtime.evaluate. Returns the result value.
ov test cdp wait my-app $TAB 'h1' # Default 30s timeout
ov test cdp wait my-app $TAB '.loaded' --timeout 60s # Custom timeout
Polls with CDP until the CSS selector matches an element or the timeout elapses.
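A poll-until-match loop of this kind can be sketched generically in Python. The function name, the polling interval, and the predicate callback are assumptions for illustration. In the real command the predicate would be a selector query over CDP.

```python
import time
from typing import Callable


def wait_for(predicate: Callable[[], bool], timeout_s: float = 30.0,
             interval_s: float = 0.25) -> bool:
    """Call predicate until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Simulated element that appears on the third poll.
attempts = iter([False, False, True])
print(wait_for(lambda: next(attempts), timeout_s=1.0, interval_s=0.01))  # True
```

Using a monotonic clock keeps the deadline stable even if the system clock is adjusted during the wait.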
ov test cdp raw my-app $TAB 'Page.navigate' '{"url":"https://example.com"}'
ov test cdp raw my-app $TAB 'Runtime.evaluate' '{"expression":"1+1"}'
Sends arbitrary CDP method with optional JSON params. Returns raw CDP response.
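For reference, a CDP command sent over the WebSocket is a JSON object with id, method, and params fields. The Python sketch below builds such a message from a method name and an optional JSON string, mirroring the command-line arguments. The helper name is hypothetical and nothing is sent.

```python
import json
from typing import Optional


def build_cdp_message(message_id: int, method: str,
                      params_json: Optional[str] = None) -> str:
    """Serialize a CDP command; params default to an empty object."""
    params = json.loads(params_json) if params_json else {}
    if not isinstance(params, dict):
        raise ValueError("CDP params must be a JSON object")
    return json.dumps({"id": message_id, "method": method, "params": params})


print(build_cdp_message(1, "Page.navigate", '{"url":"https://example.com"}'))
# {"id": 1, "method": "Page.navigate", "params": {"url": "https://example.com"}}
```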
When ov test cdp commands fail to connect, the diagnoseCDP() function runs automatically and provides targeted hints:
- Chrome process (pgrep chrome)
- cdp-proxy service (supervisorctl status cdp-proxy)
- Listening ports (ss -tlnp)

Hints direct users to ov test wl sway exec <image> chrome-wrapper for manual Chrome restart (not ov shell with bare swaymsg, which may lack the correct SWAYSOCK path).
Images with Chrome include a browser-open script and set BROWSER=browser-open in the environment. When CLI tools inside the container call xdg-open or use the $BROWSER variable to open a URL, it routes through CDP to open the URL in the running Chrome instance.
Complete flow for deploying openclaw with Codex OAuth. All browser interactions must be VNC-visible — use --vnc flag on ov test cdp click.
Critical: The openclaw models auth login TUI requires a real terminal. Do NOT pipe through tee or redirect stdout. Use ov tmux (see /ov:tmux).
IMG=openclaw-sway-browser # or openclaw-ollama-sway-browser
# 1. Prerequisites: Chrome signed into Google with sync enabled
# See /ov-images:openclaw-ollama-sway-browser for full Chrome sign-in procedure
# 2. Start OAuth in a tmux session (real terminal)
ov tmux run $IMG -s oauth "openclaw models auth login --provider openai-codex --set-default"
# 3. Read OAuth URL from tmux output
sleep 5
OAUTH_URL=$(ov tmux capture $IMG -s oauth | grep -o 'https://auth.openai.com/[^ ]*')
ov test cdp open $IMG "$OAUTH_URL"
# 4. Click "Continue with Google" (VNC-visible)
sleep 5
TAB=$(ov test cdp list $IMG | grep -i "openai\|auth" | head -1 | awk '{print $1}')
ov test cdp click $IMG $TAB 'button._buttonStyleFix_wvuha_65' --vnc
# 5. Click "Continue" on Codex consent page (VNC-visible)
sleep 5
ov test cdp click $IMG $TAB 'button._primary_3rdp0_107' --vnc
# 6. Verify token exchange completed
sleep 10
ov tmux capture $IMG -s oauth
# Expected: "OpenAI OAuth complete", "Default model set to openai-codex/gpt-5.4"
# 7. Restart gateway
ov shell $IMG -c "supervisorctl restart openclaw"
ov shell $IMG -c "openclaw models status"
Tested selectors (OpenAI auth page as of 2026-03-21):
- "Continue with Google": button._buttonStyleFix_wvuha_65 (first matching social button)
- "Continue" on the Codex consent page: button._primary_3rdp0_107 (black primary button)

Key enablers:

- ov tmux run provides a real terminal for the TUI to complete the token exchange (see /ov:tmux)
- ov test cdp click --vnc finds elements via CDP, clicks via VNC (visible to user)
- cdp-proxy makes Chrome DevTools accessible from host through podman bridge networking (with Host header rewriting)
- shm_size: 1g prevents Chrome from crashing due to /dev/shm exhaustion
- localhost:1455 is container-internal (no port mapping needed)

Stale port 1455: If a previous OAuth attempt left port 1455 occupied:
ov shell $IMG -c 'kill -9 $(ss -tlnp sport = :1455 | grep -oP "pid=\K\d+")'
Source: ov/cdp.go, ov/vnc.go.
Sign into a Google account inside a running container. Requires GMAIL_USER and GMAIL_PASSWORD environment variables (set in .env or passed via -e).
App Passwords required: Google accounts with 2FA (now mandatory for most accounts) require a 16-character App Password. App Passwords bypass all verification challenges and 2FA prompts — use them by default for automated sign-in.
Fresh profile prerequisite: A fresh chrome-data volume triggers Chrome's first-run flow. Use ov remove <image> --purge before ov config to ensure a clean start. Just rebuilding the image does not reset named volumes.
On a fresh profile, Chrome opens a first-run dialog ("Make Google Chrome the default browser") as a separate window that CDP cannot see (no debuggable tabs). It tiles alongside any CDP-opened tabs in sway, breaking coordinate translation.
# Focus the first-run dialog and dismiss it
ov test wl sway msg my-app 'focus left' # first-run dialog is typically the left window
ov test vnc key my-app Return # press OK
After dismissal, Chrome shows chrome://intro/ — "Sign in to Chrome" with shadow DOM buttons.
chrome:// pages block CDP mouse events and JS .click(). Use --vnc click (CDP selector targeting + VNC pointer delivery):
TAB=$(ov test cdp list my-app | grep intro | head -1 | awk '{print $1}')
ov test cdp click my-app $TAB '#acceptSignInButton' --vnc
Shadow DOM path: intro-app > sign-in-promo > #acceptSignInButton. The --vnc flag uses deepQuery to find the element, translates viewport coords to desktop coords via window.screenX/screenY, and delivers the click through VNC.
This opens a new tab with the Google sign-in page. Capture the new tab ID:
sleep 3
TAB=$(ov test cdp list my-app | grep -i "sign in" | head -1 | awk '{print $1}')
Note: The tab ID survives Google's same-tab navigations (email → password → result).
ov test cdp wait my-app $TAB '#identifierId' --timeout 30s
ov test cdp click my-app $TAB '#identifierId' --vnc # focus field via VNC pointer
sleep 0.5 # let compositor process focus
ov test vnc type my-app "$GMAIL_USER" # real keysym events
Use ov test cdp coords my-app $TAB '#identifierId' to inspect element position in all three coordinate systems (viewport, desktop via CDP, desktop via sway) for debugging.
ov test cdp click my-app $TAB '#identifierNext' --vnc
sleep 5 # page transition
ov test cdp url my-app $TAB # expect /challenge/pwd
ov test cdp screenshot my-app $TAB step3.png # verification checkpoint
ov test cdp wait my-app $TAB 'input[type="password"]' --timeout 15s
ov test cdp click my-app $TAB 'input[type="password"]' --vnc
sleep 0.5
ov test vnc type my-app "$GMAIL_PASSWORD"
ov test cdp click my-app $TAB '#passwordNext' --vnc
sleep 7 # backend verification
ov test cdp screenshot my-app $TAB step5.png
ov test vnc screenshot my-app step5-desktop.png # catches native dialogs
After successful sign-in, Chrome navigates to chrome://sync-confirmation/ — a chrome:// page (NOT a native dialog). CDP can see it but --vnc click is required:
TAB=$(ov test cdp list my-app | grep -i sync-confirmation | head -1 | awk '{print $1}')
# If no sync-confirmation tab, it may already be on the current tab:
# TAB stays the same from step 5
ov test cdp click my-app $TAB '#confirmButton' --vnc # "Yes, I'm in"
Shadow DOM path: sync-confirmation-app > #confirmButton. Other buttons: #notNowButton ("No thanks"), #settingsButton ("Settings").
2FA/CAPTCHA: Take a VNC screenshot (ov test vnc screenshot my-app challenge.png) and complete manually via a VNC client. App Passwords bypass most challenges.
Search engine choice: May appear as a new tab. Handle via CDP eval if present:
STAB=$(ov test cdp list my-app | grep search-engine | head -1 | awk '{print $1}')
# If STAB is non-empty, select Google via shadow DOM eval
Cookies and sync state are stored in the chrome-data volume (~/.chrome-debug), persisting across container restarts. Use ov remove <image> --purge to clear for a fresh start.
The --vnc flag on ov test cdp click is essential for the sign-in flow:
- chrome:// pages (intro, sync-confirmation): CDP mouse events and JS .click() are blocked. --vnc is the only way to click.
- --vnc delivers real pointer events that bypass anti-automation detection.
- Coordinate translation: desktop x = window.screenX + viewport x, and desktop y = window.screenY + chromeHeight + viewport y. On popup windows (no toolbar), chromeHeight=0.

Use ov test cdp coords my-app $TAB '<selector>' to debug coordinate translation. It shows the element position in the viewport, desktop (via CDP), and desktop (via sway) systems.
ov test cdp spa

The ov test cdp spa subcommands provide first-class support for interacting with Selkies-style remote desktop SPAs. These bypass the local compositor and Chrome shortcut handlers, which makes them the only way to send Super+e, Ctrl+T, or Alt+F4 to the remote desktop.

- input#overlayInput (z-index 3, opacity 0, pointer-events: auto): invisible input overlay capturing all events
- canvas#videoCanvas (z-index 2, pointer-events: none): H.264 video render surface

IMG=sway-browser-vnc
TAB=$(ov test cdp list $IMG | grep -i selkies | awk '{print $1}')
# Check SPA state
ov test cdp spa status $IMG $TAB
# Click at canvas coordinates (where elements appear in CDP screenshots)
ov test cdp spa click $IMG $TAB 990 375 --scale 0.824,0.836
# Type text (bypasses local compositor — no double-char issue)
ov test cdp spa type $IMG $TAB "hello world"
# Send modifier combos that normally can't reach the remote desktop:
ov test cdp spa key-combo $IMG $TAB super+e # Open foot terminal in labwc
ov test cdp spa key-combo $IMG $TAB ctrl+t # New tab in REMOTE Chrome
ov test cdp spa key-combo $IMG $TAB alt+f4 # Close window in labwc
# Send special keys
ov test cdp spa key $IMG $TAB return
ov test cdp spa key $IMG $TAB escape
The SPA maps mouse events from canvas to remote desktop with an internal scaling factor. Use --scale scaleX,scaleY to correct: a click at canvas position (x, y) is sent to (x/scaleX, y/scaleY). Determine the scale empirically by comparing ov test cdp spa click cursor position (via ov test cdp screenshot) with the target.
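The scale correction is plain division per axis, as the Python sketch below shows using the values from the example command. The function name is hypothetical, and rounding to whole pixels is an assumption made only for display.

```python
def canvas_to_remote(x: float, y: float, scale_x: float, scale_y: float) -> tuple:
    """Apply --scale correction: canvas (x, y) is sent as (x/scaleX, y/scaleY)."""
    if scale_x <= 0 or scale_y <= 0:
        raise ValueError("scale factors must be positive")
    return (x / scale_x, y / scale_y)


# Values from: ov test cdp spa click $IMG $TAB 990 375 --scale 0.824,0.836
rx, ry = canvas_to_remote(990, 375, 0.824, 0.836)
print(round(rx), round(ry))  # 1201 449
```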
ov test cdp spa type/key/key-combo sends Input.dispatchKeyEvent directly to the page. The SPA's onkeydown handler on #overlayInput (with stopImmediatePropagation) captures these and forwards to the remote compositor via WebSocket. Only keyDown + keyUp are sent (no "char" event) to prevent double input.
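A key combo can be modeled as a modifier bit field plus a final key, sent as a keyDown and keyUp pair with no char event. The Python sketch below uses the modifier bit values documented for Input.dispatchKeyEvent (Alt=1, Ctrl=2, Meta=4, Shift=8). Mapping super to the Meta bit, the helper names, and the simplified lowercase key names are assumptions, not the tool's confirmed behavior.

```python
# Modifier bits as documented for CDP Input.dispatchKeyEvent.
# Assumption: "super" is sent as the Meta modifier.
MODIFIER_BITS = {"alt": 1, "ctrl": 2, "meta": 4, "super": 4, "shift": 8}


def parse_combo(combo: str) -> tuple:
    """Split 'ctrl+shift+t' into (modifier bit field, final key)."""
    *modifiers, key = combo.lower().split("+")
    bits = 0
    for name in modifiers:
        bits |= MODIFIER_BITS[name]  # KeyError for an unknown modifier
    return bits, key


def combo_events(combo: str) -> list:
    """Build keyDown + keyUp params only (no 'char'), to avoid double input."""
    bits, key = parse_combo(combo)
    return [
        {"type": "keyDown", "modifiers": bits, "key": key},
        {"type": "keyUp", "modifiers": bits, "key": key},
    ]


print(parse_combo("super+e"))  # (4, 'e')
print(parse_combo("alt+f4"))   # (1, 'f4')
print([e["type"] for e in combo_events("ctrl+t")])  # ['keyDown', 'keyUp']
```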
spa vs regular CDP vs VNC/WL

| Scenario | Command |
|----------|---------|
| Click/type in a web page | ov test cdp click/type (CSS selector targeting) |
| Click/type in a remote desktop via SPA | ov test cdp spa click/type (canvas coordinates) |
| Send Super+key or Ctrl+T to remote desktop | ov test cdp spa key-combo (only option that works) |
| Click in local compositor | ov test wl click or ov test vnc click |
| Take screenshot of stream content | ov test cdp screenshot (captures canvas) |
| Take screenshot of full client desktop | ov test vnc screenshot or ov test wl screenshot |
Use cdp status → cdp open → cdp eval to verify proxy connectivity on instances:
# Check CDP is available
ov test cdp status selkies-desktop -i 198.145.102.110
# Open a test page
ov test cdp open selkies-desktop -i 198.145.102.110 "https://ip.me"
# Extract the detected IP (ip.me stores it in an input field)
ov test cdp eval selkies-desktop -i 198.145.102.110 <tab-id> \
"document.querySelector('#ip-lookup').value"
# → Should return 198.145.102.110 (the proxy IP)
This pattern works for any page content extraction via JS. The cdp eval command returns the expression's result directly.
- /ov:test -- parent router; ov test cdp … is how every invocation is dispatched.
- /ov:wl -- Wayland desktop automation (sibling verb under ov test); also sway subgroup for compositor control.
- /ov:vnc -- VNC desktop automation (sibling verb; same container, pixel-level interaction).
- /ov:dbus -- D-Bus calls and notifications (sibling verb under ov test).
- /ov:shell -- Running commands in containers (--tty for OAuth flows)
- /ov:config -- Instance deployment, proxy configuration, removal workflow
- /ov:layer -- Chrome layer configuration (cdp-proxy service, port declarations)
- /ov-images:selkies-desktop -- Full SPA DOM structure, coordinate mapping, session resilience
- /ov-layers:chrome-devtools-mcp -- MCP-based browser automation (29 tools via Streamable HTTP)
- /ov-layers:chrome -- Chrome layer with cdp-proxy, env_accepts (HTTP_PROXY)

MUST be invoked when the task involves Chrome DevTools Protocol, ov test cdp commands, browser automation, clicking elements, taking screenshots, or OAuth flows inside containers. Invoke this skill BEFORE reading source code or launching Explore agents.
Workflow position: Desktop automation. Use after a desktop container is running. Preferred over VNC for structured interaction. See also /ov:vnc (pixel), /ov:wl (sway subgroup) (window).