CDP - Chrome DevTools Protocol

Overview

The ov eval cdp commands connect to the Chrome DevTools Protocol (CDP) on port 9222 inside running containers. They provide HTTP API operations (open, list, close tabs) and WebSocket CDP operations (click, type, eval, wait, text, html, screenshot) for headless browser automation.

Also as a declarative verb

Every ov eval cdp <method> is authorable as a cdp: verb inside an eval: block. The method name becomes the verb's YAML value; method-specific args are sibling fields (tab:, expression:, url:, selector:, etc.). Shared matchers (stdout:, stderr:, exit_status:, artifact_min_bytes:) work as they do for other verbs. See /ov-build:eval for the full method allowlist and YAML shape. Example:

- cdp: eval
  tab: "1"
  expression: "document.title"
  stdout: "Dashboard"

Quick Reference

| Action | Command | Description |
|--------|---------|-------------|
| Open URL | ov eval cdp open <image> <url> | Open URL in new Chrome tab |
| List tabs | ov eval cdp list <image> | List all open tabs (id, title, url) |
| Close tab | ov eval cdp close <image> <tab-id> | Close a tab by ID |
| Get text | ov eval cdp text <image> <tab-id> | Get page text content |
| Get HTML | ov eval cdp html <image> <tab-id> | Get page HTML source |
| Get URL | ov eval cdp url <image> <tab-id> | Get page title and URL |
| Screenshot | ov eval cdp screenshot <image> <tab-id> [file] | Capture PNG screenshot |
| Click | ov eval cdp click <image> <tab-id> <selector> [--vnc] | Click element by CSS selector |
| Coords | ov eval cdp coords <image> <tab-id> <selector> | Show element coords in viewport + desktop |
| Type | ov eval cdp type <image> <tab-id> <selector> <text> | Type into input field |
| Eval JS | ov eval cdp eval <image> <tab-id> <expression> | Evaluate JavaScript |
| Wait | ov eval cdp wait <image> <tab-id> <selector> | Wait for element (--timeout 30s) |
| Raw CDP | ov eval cdp raw <image> <tab-id> <method> [json] | Send raw CDP command |
| Status | ov eval cdp status <image> | Check CDP availability, show port and tab count |
| SPA click | ov eval cdp spa click <image> <tab> <x> <y> [--scale] | Click at canvas coords with SPA scale correction |
| SPA type | ov eval cdp spa type <image> <tab> <text> | Type text via SPA (bypasses local compositor/Chrome) |
| SPA key | ov eval cdp spa key <image> <tab> <key> | Send key press via SPA (Return, Escape, F1-F12, etc.) |
| SPA key-combo | ov eval cdp spa key-combo <image> <tab> <combo> | Send modifier combo via SPA (super+e, ctrl+t, alt+F4) |
| SPA mouse | ov eval cdp spa mouse <image> <tab> <x> <y> [--scale] | Move pointer with SPA scale correction |
| SPA status | ov eval cdp spa status <image> <tab> | Show SPA state (canvas, overlay, decoders) |

All commands accept -i INSTANCE for multi-instance support.

Architecture

Resolves the container name from image + instance (ov-<image>[-<instance>])
Discovers the mapped port 9222 via podman port / docker port
HTTP API (/json/list, /json/new?url=, /json/close/<id>) for list, open, and close
CDP WebSocket for interactive operations (click, type, eval, wait, text, html, screenshot, raw)
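The naming rule in the first step can be sketched as a small shell function. The name container_name is hypothetical and not part of ov; only the ov-<image>[-<instance>] pattern comes from the list above.

```shell
# Hypothetical helper (not part of ov): build the container name from an
# image and an optional instance, following ov-<image>[-<instance>].
container_name() {
  image=$1
  instance=${2:-}
  if [ -n "$instance" ]; then
    printf 'ov-%s-%s\n' "$image" "$instance"
  else
    printf 'ov-%s\n' "$image"
  fi
}

container_name my-app            # prints: ov-my-app
container_name my-app staging    # prints: ov-my-app-staging
```

The resulting name is what a port lookup such as podman port <name> 9222 would presumably be run against.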

Requirements

A Chrome layer with the cdp-proxy supervisord service
Chrome launched with --remote-allow-origins='*' and --remote-debugging-port=9223 (internal port)
Container must be running (ov start)

The cdp-proxy is essential because Chrome 146+ binds DevTools only to 127.0.0.1 and rejects connections with non-localhost Host headers. Chrome binds to 127.0.0.1:9223 internally. The cdp-proxy Python script listens on 0.0.0.0:9222 and forwards to Chrome with Host header rewriting. It also rewrites response URLs (webSocketDebuggerUrl: ws://localhost:9223/... to ws://<client-host>:9222/...) with Content-Length correction, ensuring CDP WebSocket connections work correctly from the host.
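The URL rewriting and Content-Length correction can be illustrated with a short sketch. This is not the cdp-proxy script itself (which is written in Python); the function names, the sample body, and the client host 192.168.1.20 are assumptions made for illustration.

```shell
# Hypothetical sketch of the response rewriting, not the real cdp-proxy code.
# Reads a JSON body on stdin and points the WebSocket URL at the
# host-visible port 9222 instead of Chrome's internal port 9223.
rewrite_ws_url() {
  client_host=$1
  sed "s|ws://localhost:9223/|ws://${client_host}:9222/|g"
}

# Byte length of a body: the value a corrected Content-Length header needs.
content_length() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

# Made-up sample body in the shape of a /json/list entry.
body='{"webSocketDebuggerUrl": "ws://localhost:9223/devtools/page/ABC"}'
rewritten=$(printf '%s' "$body" | rewrite_ws_url 192.168.1.20)

printf '%s\n' "$rewritten"
printf 'Content-Length: %s -> %s\n' \
  "$(content_length "$body")" "$(content_length "$rewritten")"
```

Because the replacement host is usually not the same length as localhost, the body length changes, which is why the header has to be recomputed after rewriting.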

Commands

Open a URL

ov eval cdp open my-app "https://example.com"

Uses HTTP API: PUT /json/new?url=<encoded-url>. Returns the new tab ID.
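A minimal sketch of how the <encoded-url> part of that request path could be produced. The helpers urlencode and new_tab_path are hypothetical, and the encoder only handles ASCII input; ov's own encoding may differ.

```shell
# Hypothetical ASCII-only percent-encoder: keeps unreserved characters
# (letters, digits, . _ ~ -) and encodes everything else as %XX.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}    # first character of s
    s=${s#?}           # rest of s
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Request path for opening a new tab, as described above.
new_tab_path() {
  printf '/json/new?url=%s\n' "$(urlencode "$1")"
}

new_tab_path 'https://example.com'
# prints: /json/new?url=https%3A%2F%2Fexample.com
```

With the port mapped on the host, the path would be sent with an HTTP PUT, for example via curl -X PUT against the discovered host port.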

List Tabs

ov eval cdp list my-app
# ID                                    TITLE                URL
# 7F8A3B2C...                          Example Domain       https://example.com/

Uses HTTP API: GET /json/list.
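Later sections repeatedly pick a tab ID out of this listing with grep, head, and awk. That pattern can be wrapped in a helper; find_tab is a hypothetical name, and the sample rows below are made up in the column layout shown above.

```shell
# Hypothetical helper: read tab-list output on stdin and print the ID
# (first column) of the first row matching a case-insensitive pattern.
find_tab() {
  grep -i -- "$1" | head -n 1 | awk '{print $1}'
}

# Made-up sample rows; real IDs come from ov eval cdp list.
sample='ID        TITLE            URL
7F8A3B2C  Example Domain   https://example.com/
9D1E4F60  Sign in          https://accounts.google.com/'

printf '%s\n' "$sample" | find_tab 'sign in'    # prints: 9D1E4F60
```

In real use the input would be piped in, as in TAB=$(ov eval cdp list my-app | find_tab 'sign in'). An empty result means no tab matched.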

Close a Tab

ov eval cdp close my-app 7F8A3B2C...

Uses HTTP API: GET /json/close/<id>.

Get Page Content

ov eval cdp text my-app $TAB      # Plain text
ov eval cdp html my-app $TAB      # HTML source
ov eval cdp url my-app $TAB       # Title and URL

Uses CDP WebSocket: Runtime.evaluate with document.body.innerText / document.documentElement.outerHTML, Target.getTargetInfo.

Screenshot

ov eval cdp screenshot my-app $TAB              # Prints base64 to stdout
ov eval cdp screenshot my-app $TAB page.png     # Saves to file

Uses CDP: Page.captureScreenshot.
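When the base64 form is captured in a script, a cheap sanity check is possible before decoding. A valid PNG begins with a fixed 8-byte signature, so its base64 text begins with iVBORw0KGgo. The function name is_png_base64 is hypothetical.

```shell
# Hypothetical check: succeed if the argument looks like base64-encoded PNG
# data, i.e. starts with the encoding of the 8-byte PNG signature.
is_png_base64() {
  case $1 in
    iVBORw0KGgo*) return 0 ;;
    *) return 1 ;;
  esac
}

# Truncated sample prefix, not a complete image.
if is_png_base64 'iVBORw0KGgoAAAANSUhEUgAA'; then
  echo "looks like a PNG"
fi
```

This only inspects the prefix; it does not validate the rest of the image.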

Click and Type

ov eval cdp click my-app $TAB 'button[type="submit"]'
ov eval cdp type my-app $TAB 'input[name="email"]' "user@example.com"

Click uses CDP: Runtime.evaluate with deepQuery() to find element (piercing shadow DOM), scrollIntoViewIfNeeded() + getBoundingClientRect() for coordinates, Input.dispatchMouseEvent for click. Type uses deepQuery() + scrollIntoViewIfNeeded() + focus() to select the element, then Input.dispatchKeyEvent for each character (keyDown, char, keyUp — matching Puppeteer behavior).

Shadow DOM support: All selector-based commands (click, type, wait) automatically pierce shadow DOM boundaries via recursive deepQuery(). This means selectors work on Chrome's internal pages (chrome://settings/*), Polymer/Lit web components, and any page using Web Components with shadow DOM. Hidden/zero-sized elements are skipped — only visible matches are returned.

Note on Chrome internal dialogs: Some Chrome UI elements (e.g., the "Turn on sync" confirmation dialog) are rendered as native browser chrome, invisible to CDP. Use VNC keyboard (ov eval vnc key ... Tab, ov eval vnc key ... Return) to interact with these dialogs. VNC screenshots (ov eval vnc screenshot) show the full desktop including these dialogs, while CDP screenshots only show the page viewport.

Coordinate Systems

CDP coordinates are viewport-relative (relative to Chrome's content area). VNC coordinates are desktop-absolute (the full Wayland framebuffer). The offset between them includes Chrome's window position on the desktop plus Chrome's UI chrome (title bar, tab bar, address bar — typically ~107px).

ov eval cdp coords — Shows an element's coordinates in both systems:

ov eval cdp coords my-app $TAB '#sync-button'
# Element:  #sync-button (108x36)
# Viewport: x=1166 y=310  center=(1220, 328)
# Desktop:  x=1166 y=421  center=(1220, 439)  (via window.screenX/screenY, chromeHeight=107)
# Sway:     window at (4, 4) size 1912x1032  (app_id=google-chrome)

ov eval cdp click --vnc — Finds element via CDP selector, delivers click via VNC:

ov eval cdp click my-app $TAB '#sync-button' --vnc
# Clicked element at viewport (1220, 328) → desktop (1220, 439) via VNC

ov eval vnc click --from-cdp — Translates viewport coords to desktop coords:

ov eval vnc click my-app 1220 328 --from-cdp $TAB
# Translated viewport (1220, 328) → desktop (1220, 439) via CDP tab ...
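The translation in the outputs above reduces to simple per-axis addition. The sketch below is an assumption about the arithmetic, not ov's source: the function name is hypothetical, and screenX=0 and screenY=4 are values chosen so that the result reproduces the example numbers.

```shell
# Hypothetical sketch of the viewport-to-desktop translation. screen_x and
# screen_y stand in for window.screenX and window.screenY; chrome_height is
# the height of Chrome's own UI above the page content.
viewport_to_desktop() {
  vx=$1
  vy=$2
  screen_x=$3
  screen_y=$4
  chrome_height=$5
  printf '%d %d\n' "$((vx + screen_x))" "$((vy + screen_y + chrome_height))"
}

# Element center from the example, with assumed screenX=0, screenY=4.
viewport_to_desktop 1220 328 0 4 107    # prints: 1220 439
```

Only the vertical axis includes the browser UI height, since the title bar, tab bar, and address bar sit above the page.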

Evaluate JavaScript

ov eval cdp eval my-app $TAB 'document.title'
ov eval cdp eval my-app $TAB 'JSON.stringify(localStorage)'

Uses CDP: Runtime.evaluate. Returns the result value.

Wait for Element

ov eval cdp wait my-app $TAB 'h1'                    # Default 30s timeout
ov eval cdp wait my-app $TAB '.loaded' --timeout 60s  # Custom timeout

Polls with CDP until the CSS selector matches an element.
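The same poll-until-timeout idea can replace fixed sleep calls in scripts. The sketch below is a generic shell loop, not how ov implements wait; wait_until is a hypothetical name and the timeout is in whole seconds.

```shell
# Hypothetical polling loop: run a command once per second until it
# succeeds or the timeout (in whole seconds) has elapsed.
wait_until() {
  timeout=$1
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Trivial condition for illustration: the root directory exists.
wait_until 5 test -d / && echo "condition met"
```

Any command that exits 0 on success can serve as the condition, so a check built on ov eval cdp eval could be polled the same way.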

Raw CDP Command

ov eval cdp raw my-app $TAB 'Page.navigate' '{"url":"https://example.com"}'
ov eval cdp raw my-app $TAB 'Runtime.evaluate' '{"expression":"1+1"}'

Sends arbitrary CDP method with optional JSON params. Returns raw CDP response.
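On the wire, a CDP command is a JSON object with an id, a method, and optional params. The helper below sketches that framing; cdp_message is a hypothetical name, and how ov assigns message IDs is not stated in this document.

```shell
# Hypothetical helper: frame a CDP command as a JSON message.
# Params default to an empty object when omitted.
cdp_message() {
  id=$1
  method=$2
  params=${3:-"{}"}
  printf '{"id":%d,"method":"%s","params":%s}\n' "$id" "$method" "$params"
}

cdp_message 1 Page.navigate '{"url":"https://example.com"}'
# prints: {"id":1,"method":"Page.navigate","params":{"url":"https://example.com"}}
```

The response carries the same id, which is how a client matches replies to requests.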

CDP Connection Diagnostics

When ov eval cdp commands fail to connect, the diagnoseCDP() function runs automatically and provides targeted hints:

Chrome process check: Is Chrome running inside the container? (pgrep chrome)
Proxy status: Is the cdp-proxy forwarding to Chrome? (supervisorctl status cdp-proxy)
Port binding: Is Chrome listening on 127.0.0.1:9223? Is cdp-proxy listening on 0.0.0.0:9222? (ss -tlnp)

Hints direct users to ov eval wl sway exec <image> chrome-wrapper for manual Chrome restart (not ov shell with bare swaymsg, which may lack the correct SWAYSOCK path).
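The port-binding check can be reproduced by hand against ss output. This is a sketch, not the diagnoseCDP() implementation: is_listening is a hypothetical name, and the sample output (process names, PIDs, queue sizes) is invented for illustration.

```shell
# Hypothetical check: read ss -tln style output on stdin and succeed if a
# socket is listening on the given local address:port (4th column).
is_listening() {
  awk -v want="$1" \
    '$1 == "LISTEN" && $4 == want { found = 1 } END { exit !found }'
}

# Invented sample; real data comes from ss -tlnp inside the container.
sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      10     127.0.0.1:9223     0.0.0.0:*         users:(("chrome",pid=42,fd=60))
LISTEN 0      5      0.0.0.0:9222       0.0.0.0:*         users:(("python3",pid=17,fd=3))'

printf '%s\n' "$sample" | is_listening 127.0.0.1:9223 && echo "chrome ok"
printf '%s\n' "$sample" | is_listening 0.0.0.0:9222 && echo "proxy ok"
```

If Chrome is listening but the proxy is not, the problem is in the cdp-proxy service, not in Chrome.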

browser-open Script and BROWSER Env

Images with Chrome include a browser-open script and set BROWSER=browser-open in the environment. When CLI tools inside the container call xdg-open or use the $BROWSER variable to open a URL, it routes through CDP to open the URL in the running Chrome instance.

OAuth Automation Example

Complete flow for deploying openclaw with Codex OAuth. All browser interactions must be VNC-visible — use --vnc flag on ov eval cdp click.

Critical: The openclaw models auth login TUI requires a real terminal. Do NOT pipe through tee or redirect stdout. Use ov tmux (see /ov-advanced:tmux).

IMG=openclaw-sway-browser  # or openclaw-ollama-sway-browser

# 1. Prerequisites: Chrome signed into Google with sync enabled
#    See /ov-openclaw:openclaw-ollama-sway-browser for full Chrome sign-in procedure

# 2. Start OAuth in a tmux session (real terminal)
ov tmux run $IMG -s oauth "openclaw models auth login --provider openai-codex --set-default"

# 3. Read OAuth URL from tmux output
sleep 5
OAUTH_URL=$(ov tmux capture $IMG -s oauth | grep -o 'https://auth.openai.com/[^ ]*')
ov eval cdp open $IMG "$OAUTH_URL"

# 4. Click "Continue with Google" (VNC-visible)
sleep 5
TAB=$(ov eval cdp list $IMG | grep -i "openai\|auth" | head -1 | awk '{print $1}')
ov eval cdp click $IMG $TAB 'button._buttonStyleFix_wvuha_65' --vnc

# 5. Click "Continue" on Codex consent page (VNC-visible)
sleep 5
ov eval cdp click $IMG $TAB 'button._primary_3rdp0_107' --vnc

# 6. Verify token exchange completed
sleep 10
ov tmux capture $IMG -s oauth
# Expected: "OpenAI OAuth complete", "Default model set to openai-codex/gpt-5.4"

# 7. Restart gateway
ov shell $IMG -c "supervisorctl restart openclaw"
ov shell $IMG -c "openclaw models status"

Tested selectors (OpenAI auth page as of 2026-03-21):

"Continue with Google": button._buttonStyleFix_wvuha_65 (first matching social button)
"Continue" (consent): button._primary_3rdp0_107 (black primary button)
These are CSS class selectors specific to OpenAI's auth UI — may change over time

Key enablers:

ov tmux run provides a real terminal for the TUI to complete the token exchange (see /ov-advanced:tmux)
ov eval cdp click --vnc finds elements via CDP, clicks via VNC (visible to user)
cdp-proxy makes Chrome DevTools accessible from host through podman bridge networking (with Host header rewriting)
shm_size: 1g prevents Chrome from crashing due to /dev/shm exhaustion
Callback at localhost:1455 is container-internal (no port mapping needed)

Stale port 1455: If a previous OAuth attempt left port 1455 occupied: ov shell $IMG -c 'kill -9 $(ss -tlnp sport = :1455 | grep -oP "pid=\K\d+")'
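The one-liner above relies on grep -P, which not every grep build supports. A sed-based alternative is sketched below; listener_pids is a hypothetical name and the sample line is invented.

```shell
# Hypothetical helper: pull PIDs out of ss -tlnp output read on stdin.
# Prints one PID per matching line (the last pid= on that line).
listener_pids() {
  sed -n 's/.*pid=\([0-9][0-9]*\).*/\1/p'
}

# Invented sample line in the shape ss -tlnp prints.
sample='LISTEN 0 511 127.0.0.1:1455 0.0.0.0:* users:(("node",pid=1234,fd=21))'
printf '%s\n' "$sample" | listener_pids    # prints: 1234
```

Lines without a pid= field, such as the header row, produce no output.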

Source: ov/cdp.go, ov/vnc.go.

Google Sign-In Automation

Sign into a Google account inside a running container. Requires GMAIL_USER and GMAIL_PASSWORD environment variables (set in .env or passed via -e).

App Passwords required: Google accounts with 2FA (now mandatory for most accounts) require a 16-character App Password. App Passwords bypass most verification challenges and 2FA prompts, so use them by default for automated sign-in.

Fresh profile prerequisite: A fresh chrome-data volume triggers Chrome's first-run flow. Use ov remove <image> --purge before ov config to ensure a clean start. Just rebuilding the image does not reset named volumes.

Step 0: Dismiss Chrome First-Run Dialog

On a fresh profile, Chrome opens a first-run dialog ("Make Google Chrome the default browser") as a separate window that CDP cannot see (no debuggable tabs). It tiles alongside any CDP-opened tabs in sway, breaking coordinate translation.

# Focus the first-run dialog and dismiss it
ov eval wl sway msg my-app 'focus left'     # first-run dialog is typically the left window
ov eval vnc key my-app Return            # press OK

After dismissal, Chrome shows chrome://intro/ — "Sign in to Chrome" with shadow DOM buttons.

Step 1: Click "Sign in" on chrome://intro

chrome:// pages block CDP mouse events and JS .click(). Use --vnc click (CDP selector targeting + VNC pointer delivery):

TAB=$(ov eval cdp list my-app | grep intro | head -1 | awk '{print $1}')
ov eval cdp click my-app $TAB '#acceptSignInButton' --vnc

Shadow DOM path: intro-app > sign-in-promo > #acceptSignInButton. The --vnc flag uses deepQuery to find the element, translates viewport coords to desktop coords via window.screenX/screenY, and delivers the click through VNC.

This opens a new tab with the Google sign-in page. Capture the new tab ID:

sleep 3
TAB=$(ov eval cdp list my-app | grep -i "sign in" | head -1 | awk '{print $1}')

Note: The tab ID survives Google's same-tab navigations (email → password → result).

Step 2: Enter Email (--vnc click + VNC type)

ov eval cdp wait my-app $TAB '#identifierId' --timeout 30s
ov eval cdp click my-app $TAB '#identifierId' --vnc    # focus field via VNC pointer
sleep 0.5                                          # let compositor process focus
ov eval vnc type my-app "$GMAIL_USER"                   # real keysym events

Use ov eval cdp coords my-app $TAB '#identifierId' to inspect element position in all three coordinate systems (viewport, desktop via CDP, desktop via sway) for debugging.

Step 3: Submit Email

ov eval cdp click my-app $TAB '#identifierNext' --vnc
sleep 5                                            # page transition
ov eval cdp url my-app $TAB                             # expect /challenge/pwd
ov eval cdp screenshot my-app $TAB step3.png            # verification checkpoint

Step 4: Enter Password

ov eval cdp wait my-app $TAB 'input[type="password"]' --timeout 15s
ov eval cdp click my-app $TAB 'input[type="password"]' --vnc
sleep 0.5
ov eval vnc type my-app "$GMAIL_PASSWORD"

Step 5: Submit Password

ov eval cdp click my-app $TAB '#passwordNext' --vnc
sleep 7                                            # backend verification
ov eval cdp screenshot my-app $TAB step5.png
ov eval vnc screenshot my-app step5-desktop.png         # catches native dialogs

Step 6: Enable Sync (chrome://sync-confirmation)

After successful sign-in, Chrome navigates to chrome://sync-confirmation/ — a chrome:// page (NOT a native dialog). CDP can see it but --vnc click is required:

TAB=$(ov eval cdp list my-app | grep -i sync-confirmation | head -1 | awk '{print $1}')
# If no sync-confirmation tab, it may already be on the current tab:
# TAB stays the same from step 5
ov eval cdp click my-app $TAB '#confirmButton' --vnc    # "Yes, I'm in"

Shadow DOM path: sync-confirmation-app > #confirmButton. Other buttons: #notNowButton ("No thanks"), #settingsButton ("Settings").

Handling Challenges

2FA/CAPTCHA: Take a VNC screenshot (ov eval vnc screenshot my-app challenge.png) and complete manually via a VNC client. App Passwords bypass most challenges.

Search engine choice: May appear as a new tab. Handle via CDP eval if present:

STAB=$(ov eval cdp list my-app | grep search-engine | head -1 | awk '{print $1}')
# If STAB is non-empty, select Google via shadow DOM eval

Sign-In Persistence

Cookies and sync state are stored in the chrome-data volume (~/.chrome-debug), persisting across container restarts. Use ov remove <image> --purge to clear for a fresh start.

Coordinate Translation for Sign-In

The --vnc flag on ov eval cdp click is essential for the sign-in flow:

chrome:// pages (intro, sync-confirmation): CDP mouse events and JS .click() are blocked. --vnc is the only way to click.
Google sign-in pages: --vnc delivers real pointer events that bypass anti-automation detection.
Coordinate math: desktop x = viewport x + window.screenX; desktop y = viewport y + window.screenY + chromeHeight. On popup windows (no toolbar), chromeHeight=0.

Use ov eval cdp coords my-app $TAB '<selector>' to debug coordinate translation. It shows element position in viewport, desktop (via CDP), and desktop (via sway) systems.

SPA Remote Desktop Interaction (`ov eval cdp spa`)

The ov eval cdp spa subcommands provide first-class support for interacting with Selkies-style remote desktop SPAs. These bypass the local compositor and Chrome shortcut handlers — the only way to send Super+e, Ctrl+T, or Alt+F4 to the remote desktop.

SPA DOM Structure

input#overlayInput (z-index 3, opacity 0, pointer-events: auto) — invisible input overlay capturing all events
canvas#videoCanvas (z-index 2, pointer-events: none) — H.264 video render surface
Header controls (fullscreen, gaming mode) — hidden at left=-132px, slide in on mouse hover

Usage Example

IMG=sway-browser-vnc
TAB=$(ov eval cdp list $IMG | grep -i selkies | awk '{print $1}')

# Check SPA state
ov eval cdp spa status $IMG $TAB

# Click at canvas coordinates (where elements appear in CDP screenshots)
ov eval cdp spa click $IMG $TAB 990 375 --scale 0.824,0.836

# Type text (bypasses local compositor — no double-char issue)
ov eval cdp spa type $IMG $TAB "hello world"

# Send modifier combos that normally can't reach the remote desktop:
ov eval cdp spa key-combo $IMG $TAB super+e    # Open foot terminal in labwc
ov eval cdp spa key-combo $IMG $TAB ctrl+t     # New tab in REMOTE Chrome
ov eval cdp spa key-combo $IMG $TAB alt+f4     # Close window in labwc

# Send special keys
ov eval cdp spa key $IMG $TAB return
ov eval cdp spa key $IMG $TAB escape

Coordinate Scaling

The SPA maps mouse events from canvas to remote desktop with an internal scaling factor. Use --scale scaleX,scaleY to correct: a click at canvas position (x, y) is sent to (x/scaleX, y/scaleY). Determine the scale empirically: issue an ov eval cdp spa click, capture the result with ov eval cdp screenshot, and compare where the cursor landed with the intended target.
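The division described above can be checked by hand. The sketch below assumes the result is rounded to whole pixels, which this document does not confirm; scale_point is a hypothetical name.

```shell
# Hypothetical sketch of the scale correction: divide canvas coordinates
# by the per-axis scale and round to whole pixels (rounding is assumed).
scale_point() {
  awk -v x="$1" -v y="$2" -v sx="$3" -v sy="$4" \
    'BEGIN { printf "%.0f %.0f\n", x / sx, y / sy }'
}

# Values from the usage example: --scale 0.824,0.836 at canvas (990, 375).
scale_point 990 375 0.824 0.836    # prints: 1201 449
```

A scale below 1 therefore moves the delivered point further from the origin than the canvas point.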

Keyboard Architecture

ov eval cdp spa type/key/key-combo sends Input.dispatchKeyEvent directly to the page. The SPA's onkeydown handler on #overlayInput (with stopImmediatePropagation) captures these and forwards to the remote compositor via WebSocket. Only keyDown + keyUp are sent (no "char" event) to prevent double input.
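Input.dispatchKeyEvent carries modifier state as a bit field (Alt=1, Ctrl=2, Meta=4, Shift=8). The sketch below shows how a combo string could map onto that field. It is an assumption about the approach, not ov's code: combo_modifiers is a hypothetical name, and treating super as Meta is a guess.

```shell
# Hypothetical parser: split a combo such as ctrl+shift+t into a CDP
# modifiers bit field and the final key. Prints "<mask> <key>".
combo_modifiers() {
  mask=0
  rest=$1
  while [ "${rest#*+}" != "$rest" ]; do    # loop while a '+' remains
    mod=${rest%%+*}
    rest=${rest#*+}
    case $mod in
      alt) mask=$((mask | 1)) ;;
      ctrl|control) mask=$((mask | 2)) ;;
      super|meta) mask=$((mask | 4)) ;;    # assumed mapping for super
      shift) mask=$((mask | 8)) ;;
      *) echo "unknown modifier: $mod" >&2; return 1 ;;
    esac
  done
  printf '%s %s\n' "$mask" "$rest"
}

combo_modifiers super+e         # prints: 4 e
combo_modifiers ctrl+shift+t    # prints: 10 t
```

The mask would accompany both the keyDown and keyUp events for the final key.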

When to use `spa` vs regular CDP vs VNC/WL

| Scenario | Command |
|----------|---------|
| Click/type in a web page | ov eval cdp click/type (CSS selector targeting) |
| Click/type in a remote desktop via SPA | ov eval cdp spa click/type (canvas coordinates) |
| Send Super+key or Ctrl+T to remote desktop | ov eval cdp spa key-combo (only option that works) |
| Click in local compositor | ov eval wl click or ov eval vnc click |
| Take screenshot of stream content | ov eval cdp screenshot (captures canvas) |
| Take screenshot of full client desktop | ov eval vnc screenshot or ov eval wl screenshot |

CDP Proxy Verification

Use cdp status → cdp open → cdp eval to verify proxy connectivity on instances:

# Check CDP is available
ov eval cdp status selkies-desktop -i 198.145.102.110

# Open a test page
ov eval cdp open selkies-desktop -i 198.145.102.110 "https://ip.me"

# Extract the detected IP (ip.me stores it in an input field)
ov eval cdp eval selkies-desktop -i 198.145.102.110 <tab-id> \
  "document.querySelector('#ip-lookup').value"
# → Should return 198.145.102.110 (the proxy IP)

This pattern works for any page content extraction via JS. The cdp eval command returns the expression's result directly.

Cross-References

/ov-build:eval -- parent router; ov eval cdp … is how every invocation is dispatched.
/ov-advanced:wl -- Wayland desktop automation (sibling verb under ov eval); also sway subgroup for compositor control.
/ov-advanced:vnc -- VNC desktop automation (sibling verb; same container, pixel-level interaction).
/ov-advanced:dbus -- D-Bus calls and notifications (sibling verb under ov eval).
/ov-core:shell -- Running commands in containers (--tty for OAuth flows)
/ov-core:config -- Instance deployment, proxy configuration, removal workflow
/ov-build:layer -- Chrome layer configuration (cdp-proxy service, port declarations)
/ov-selkies:selkies-desktop -- Full SPA DOM structure, coordinate mapping, session resilience
/ov-selkies:chrome-devtools-mcp -- MCP-based browser automation (29 tools via Streamable HTTP)
/ov-selkies:chrome -- Chrome layer with cdp-proxy, env_accepts (HTTP_PROXY)

When to Use This Skill

MUST be invoked when the task involves Chrome DevTools Protocol, ov eval cdp commands, browser automation, clicking elements, taking screenshots, or OAuth flows inside containers. Invoke this skill BEFORE reading source code or launching Explore agents.

Workflow position: Desktop automation. Use after a desktop container is running. Preferred over VNC for structured interaction. See also /ov-advanced:vnc (pixel-level interaction) and /ov-advanced:wl (window-level control, including the sway subgroup).

