MUST be invoked before any work involving: Chrome DevTools Protocol, charly eval cdp commands, browser automation, clicking elements, taking screenshots, or OAuth flows inside containers.
The charly eval cdp commands connect to the Chrome DevTools Protocol (CDP) endpoint on port 9222 inside running containers. They provide HTTP API operations (open, list, close tabs) and WebSocket CDP operations (click, type, eval, wait, text, html, screenshot) for headless browser automation.
Every charly eval cdp <method> is authorable as a cdp: verb inside an eval: block. The method name becomes the verb's YAML value; method-specific args are sibling fields (tab:, expression:, url:, selector:, etc.). Shared matchers (stdout:, stderr:, exit_status:, artifact_min_bytes:) work as they do for other verbs. See /charly-eval:eval for the full method allowlist and YAML shape. Example: a step written as - cdp: eval with the sibling fields tab: "1", expression: "document.title", and stdout: "Dashboard" (one field per line) evaluates document.title in tab 1 and checks its output against "Dashboard".
| Action | Command | Description |
|--------|---------|-------------|
| Open URL | charly eval cdp open <image> <url> | Open URL in new Chrome tab |
| List tabs | charly eval cdp list <image> | List all open tabs (id, title, url) |
| Close tab | charly eval cdp close <image> <tab-id> | Close a tab by ID |
| Get text | charly eval cdp text <image> <tab-id> | Get page text content |
| Get HTML | charly eval cdp html <image> <tab-id> | Get page HTML source |
| Get URL | charly eval cdp url <image> <tab-id> | Get page title and URL |
| Screenshot | charly eval cdp screenshot <image> <tab-id> [file] | Capture PNG screenshot |
| Click | charly eval cdp click <image> <tab-id> <selector> [--vnc] | Click element by CSS selector |
| Coords | charly eval cdp coords <image> <tab-id> <selector> | Show element coords in viewport + desktop |
| Type | charly eval cdp type <image> <tab-id> <selector> <text> | Type into input field |
| Eval JS | charly eval cdp eval <image> <tab-id> <expression> | Evaluate JavaScript |
| Wait | charly eval cdp wait <image> <tab-id> <selector> | Wait for element (--timeout 30s) |
| Raw CDP | charly eval cdp raw <image> <tab-id> <method> [json] | Send raw CDP command |
| Status | charly eval cdp status <image> | Check CDP availability, show port and tab count |
| SPA click | charly eval cdp spa click <image> <tab> <x> <y> [--scale] | Click at canvas coords with SPA scale correction |
| SPA type | charly eval cdp spa type <image> <tab> <text> | Type text via SPA (bypasses local compositor/Chrome) |
| SPA key | charly eval cdp spa key <image> <tab> <key> | Send key press via SPA (Return, Escape, F1-F12, etc.) |
| SPA key-combo | charly eval cdp spa key-combo <image> <tab> <combo> | Send modifier combo via SPA (super+e, ctrl+t, alt+F4) |
| SPA mouse | charly eval cdp spa mouse <image> <tab> <x> <y> [--scale] | Move pointer with SPA scale correction |
| SPA status | charly eval cdp spa status <image> <tab> | Show SPA state (canvas, overlay, decoders) |
All commands accept -i INSTANCE for multi-instance support.
How it works:

- Container name: charly-<image>[-<instance>]
- Host port resolved via podman port / docker port
- HTTP API (/json/list, /json/new?url=, /json/close/<id>) for list, open, and close
- CDP WebSocket for all other operations
- Port 9222 served by the cdp-proxy supervisord service
- Chrome launched with --remote-allow-origins='*' and --remote-debugging-port=9223 (internal port)
- Requires a running container (charly start)

The cdp-proxy is essential because Chrome 146+ binds DevTools only to 127.0.0.1 and rejects connections whose Host header is not localhost. Chrome therefore listens internally on 127.0.0.1:9223, while the cdp-proxy Python script listens on 0.0.0.0:9222 and forwards requests to Chrome with the Host header rewritten. It also rewrites URLs in responses (webSocketDebuggerUrl: ws://localhost:9223/... becomes ws://<client-host>:9222/...) and corrects Content-Length to match, so CDP WebSocket connections opened from the host reach the proxy instead of Chrome's unreachable internal port.
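A minimal Python sketch of the two rewrites described above, not the actual cdp-proxy script. The function names are hypothetical, and two details are assumptions: that the proxy sets Host to localhost:9223, and that client_host is taken from the host part of the client's original Host header.

```python
import re

# Ports from the text above; the Host value used below is an assumption.
CHROME_PORT = 9223
PROXY_PORT = 9222
_INTERNAL_WS = re.compile(
    rb"ws://(?:localhost|127\.0\.0\.1):" + str(CHROME_PORT).encode() + rb"/"
)


def rewrite_request_headers(headers: dict) -> dict:
    """Return a copy of the client's headers with Host pointing at Chrome's loopback port."""
    out = dict(headers)
    out["Host"] = f"localhost:{CHROME_PORT}"
    return out


def rewrite_response(body: bytes, headers: dict, client_host: str) -> tuple:
    """Point debugger URLs at the proxy and make Content-Length match the new body."""
    public = f"ws://{client_host}:{PROXY_PORT}/".encode()
    # A callable replacement avoids re's backslash handling in replacement strings.
    new_body = _INTERNAL_WS.sub(lambda _m: public, body)
    out = dict(headers)
    out["Content-Length"] = str(len(new_body))
    return new_body, out
```

Recomputing Content-Length matters because the replacement host is usually longer or shorter than localhost:9223; a stale length would truncate or stall the JSON response on the client side.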
charly eval cdp open my-app "https://example.com"
Uses HTTP API: PUT /json/new?url=<encoded-url>. Returns the new tab ID.
charly eval cdp list my-app
# ID TITLE URL
# 7F8A3B2C... Example Domain https://example.com/
Uses HTTP API: GET /json/list.
charly eval cdp close my-app 7F8A3B2C...
Uses HTTP API: GET /json/close/<id>.
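The three HTTP endpoints can be sketched with the Python standard library. This builds the requests without sending them; BASE is an assumed host-side address (the real port is whatever podman port / docker port reports), and filtering list results to type == "page" is an assumption about what charly shows.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://127.0.0.1:9222"  # assumed host-side address of the published CDP port


def open_request(url: str) -> urllib.request.Request:
    # The text above specifies PUT for /json/new, with the target URL percent-encoded.
    encoded = urllib.parse.quote(url, safe="")
    return urllib.request.Request(f"{BASE}/json/new?url={encoded}", method="PUT")


def close_request(tab_id: str) -> urllib.request.Request:
    return urllib.request.Request(f"{BASE}/json/close/{tab_id}", method="GET")


def parse_list(payload: str) -> list:
    """Return (id, title, url) rows for page targets in a /json/list response."""
    return [
        (t["id"], t.get("title", ""), t.get("url", ""))
        for t in json.loads(payload)
        if t.get("type") == "page"
    ]
```

To actually send a request against a live container, pass it to urllib.request.urlopen; the /json/new reply is a JSON object whose id field is the new tab ID.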
charly eval cdp text my-app $TAB # Plain text
charly eval cdp html my-app $TAB # HTML source
charly eval cdp url my-app $TAB # Title and URL
Uses CDP WebSocket: Runtime.evaluate with document.body.innerText / document.documentElement.outerHTML, Target.getTargetInfo.
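A sketch of the CDP messages behind text, html, and url, under the assumption that charly sends them as ordinary JSON-RPC style CDP requests over the tab's WebSocket (the WebSocket client itself is not shown, and the helper names are hypothetical). Each message would be serialized with json.dumps before sending.

```python
import itertools

# CDP replies echo the request id, so every outgoing message needs a unique one.
_next_id = itertools.count(1)


def _message(method: str, params: dict) -> dict:
    return {"id": next(_next_id), "method": method, "params": params}


def text_message() -> dict:
    return _message("Runtime.evaluate",
                    {"expression": "document.body.innerText", "returnByValue": True})


def html_message() -> dict:
    return _message("Runtime.evaluate",
                    {"expression": "document.documentElement.outerHTML", "returnByValue": True})


def url_message(target_id: str) -> dict:
    return _message("Target.getTargetInfo", {"targetId": target_id})


def title_and_url(result: dict) -> tuple:
    """Extract (title, url) from a Target.getTargetInfo result."""
    info = result["targetInfo"]
    return info["title"], info["url"]
```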
charly eval cdp screenshot my-app $TAB # Prints base64 to stdout
charly eval cdp screenshot my-app $TAB page.png # Saves to file
Uses CDP: Page.captureScreenshot.
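Page.captureScreenshot returns the image as base64 in its data field, which is what the no-file form prints. A minimal sketch of the save-to-file path, assuming the default PNG format; save_screenshot is a hypothetical helper, not charly's code.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(result: dict, path: str) -> int:
    """Decode a Page.captureScreenshot result, check it is a PNG, write it, return its size."""
    png = base64.b64decode(result["data"])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot data is not a PNG")
    Path(path).write_bytes(png)
    return len(png)
```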
charly eval cdp click my-app $TAB 'button[type="submit"]'
charly eval cdp type my-app $TAB 'input[name="email"]' "[email protected]"
Click uses CDP: Runtime.evaluate with deepQuery() to find element (piercing shadow DOM), scrollIntoViewIfNeeded() + getBoundingClientRect() for coordinates, Input.dispatchMouseEvent for click. Type uses deepQuery() + scrollIntoViewIfNeeded() + focus() to select the element, then Input.dispatchKeyEvent for each character (keyDown, char, keyUp — matching Puppeteer behavior).
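The event sequences described above can be sketched as (method, params) pairs. Clicking the centre of the bounding rect and the leading mouseMoved event are assumptions; the keyDown/char/keyUp triple per character follows the text. The sample rect reproduces the #sync-button element shown in the coords output further down (108x36 at 1166,310, centre 1220,328).

```python
def click_events(rect: dict) -> list:
    """Mouse events for a left click at the centre of a getBoundingClientRect() result."""
    x = rect["x"] + rect["width"] / 2
    y = rect["y"] + rect["height"] / 2
    common = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        ("Input.dispatchMouseEvent", {"type": "mouseMoved", "x": x, "y": y}),  # assumed hover step
        ("Input.dispatchMouseEvent", {"type": "mousePressed", **common}),
        ("Input.dispatchMouseEvent", {"type": "mouseReleased", **common}),
    ]


def type_events(text: str) -> list:
    """keyDown, char, keyUp for each character of the text."""
    events = []
    for ch in text:
        events.append(("Input.dispatchKeyEvent", {"type": "keyDown", "key": ch}))
        events.append(("Input.dispatchKeyEvent", {"type": "char", "text": ch}))
        events.append(("Input.dispatchKeyEvent", {"type": "keyUp", "key": ch}))
    return events
```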
Shadow DOM support: All selector-based commands (click, type, wait) automatically pierce shadow DOM boundaries via recursive deepQuery(). This means selectors work on Chrome's internal pages (chrome://settings/*), Polymer/Lit web components, and any page using Web Components with shadow DOM. Hidden/zero-sized elements are skipped — only visible matches are returned.
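The idea behind a recursive deepQuery() can be modelled in Python on a toy tree. Node is a hypothetical stand-in for a DOM element, selector matching is reduced to a predicate, and searching a node's shadow tree before its light-DOM children is an assumption; the real function is JavaScript running in the page.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """Toy stand-in for a DOM element; not a real browser API."""
    tag: str
    id: str = ""
    width: float = 0.0
    height: float = 0.0
    children: list = field(default_factory=list)
    shadow_root: Optional[list] = None  # children of an attached shadow root, if any


def deep_query(nodes: list, matches: Callable[[Node], bool]) -> Optional[Node]:
    """Depth-first search that descends into shadow roots and skips zero-sized matches."""
    for node in nodes:
        if matches(node) and node.width > 0 and node.height > 0:
            return node
        for subtree in (node.shadow_root or [], node.children):
            found = deep_query(subtree, matches)
            if found is not None:
                return found
    return None
```

For example, a tree shaped like intro-app > sign-in-promo > #acceptSignInButton (each level inside a shadow root) is found by deep_query([app], lambda n: n.id == "acceptSignInButton"), while an identically named zero-sized button earlier in the tree is skipped.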
Note on Chrome internal dialogs: Some Chrome UI elements (e.g., the "Turn on sync" confirmation dialog) are rendered as native browser chrome, invisible to CDP. Use VNC keyboard (charly eval vnc key ... Tab, charly eval vnc key ... Return) to interact with these dialogs. VNC screenshots (charly eval vnc screenshot) show the full desktop including these dialogs, while CDP screenshots only show the page viewport.
CDP coordinates are viewport-relative (relative to Chrome's content area). VNC coordinates are desktop-absolute (the full Wayland framebuffer). The offset between them includes Chrome's window position on the desktop plus Chrome's UI chrome (title bar, tab bar, address bar — typically ~107px).
charly eval cdp coords — Shows an element's coordinates in both systems:
charly eval cdp coords my-app $TAB '#sync-button'
# Element: #sync-button (108x36)
# Viewport: x=1166 y=310 center=(1220, 328)
# Desktop: x=1166 y=421 center=(1220, 439) (via window.screenX/screenY, chromeHeight=107)
# Sway: window at (4, 4) size 1912x1032 (app_id=google-chrome)
charly eval cdp click --vnc — Finds element via CDP selector, delivers click via VNC:
charly eval cdp click my-app $TAB '#sync-button' --vnc
# Clicked element at viewport (1220, 328) → desktop (1220, 439) via VNC
charly eval vnc click --from-cdp — Translates viewport coords to desktop coords:
charly eval vnc click my-app 1220 328 --from-cdp $TAB
# Translated viewport (1220, 328) → desktop (1220, 439) via CDP tab ...
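The translation used by coords, click --vnc, and vnc click --from-cdp can be written as one function, assuming x shifts only by window.screenX while y also skips Chrome's toolbar height. The test values screen_x=0 and screen_y=4 are assumptions chosen because they reproduce the sample output above (viewport (1220, 328) to desktop (1220, 439) with chromeHeight=107); the function name is hypothetical.

```python
def viewport_to_desktop(x: float, y: float, screen_x: float, screen_y: float,
                        chrome_height: float) -> tuple:
    """Map a viewport point to desktop coordinates.

    x shifts only by the window position; y additionally skips Chrome's
    title bar, tab bar, and address bar. Use chrome_height=0 for popups.
    """
    return x + screen_x, y + screen_y + chrome_height
```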
charly eval cdp eval my-app $TAB 'document.title'
charly eval cdp eval my-app $TAB 'JSON.stringify(localStorage)'
Uses CDP: Runtime.evaluate. Returns the result value.
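A sketch of how a Runtime.evaluate reply could be turned into the returned value, distinguishing protocol errors, page-side exceptions, and normal results. The helper name and the fallback to the description field are assumptions, not charly's implementation.

```python
def evaluation_value(response: dict):
    """Return the value from a Runtime.evaluate reply, raising on errors."""
    if "error" in response:  # protocol-level failure, e.g. bad params or closed target
        raise RuntimeError(response["error"]["message"])
    result = response["result"]
    if "exceptionDetails" in result:  # the JavaScript expression threw in the page
        details = result["exceptionDetails"]
        raise RuntimeError(details.get("exception", {}).get("description", details.get("text", "")))
    remote = result["result"]
    # Primitives (and objects when returnByValue is set) arrive in "value";
    # otherwise fall back to the description string; undefined yields None.
    return remote.get("value", remote.get("description"))
```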
charly eval cdp wait my-app $TAB 'h1' # Default 30s timeout
charly eval cdp wait my-app $TAB '.loaded' --timeout 60s # Custom timeout
Polls with CDP until the CSS selector matches an element or the timeout expires.
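A minimal polling loop for wait, assuming a fixed poll interval and a check callable that would, in practice, evaluate the selector through CDP. parse_timeout accepts only a simple subset of duration strings (ms, s, m); charly's real --timeout parser may accept more. The clock and sleep parameters exist only so the loop can be tested.

```python
import re
import time
from typing import Callable


def parse_timeout(spec: str) -> float:
    """Parse '500ms', '30s', or '2m' into seconds."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m)", spec)
    if not m:
        raise ValueError(f"unsupported timeout: {spec!r}")
    value, unit = float(m.group(1)), m.group(2)
    if unit == "ms":
        return value / 1000
    return value * (60 if unit == "m" else 1)


def wait_for(check: Callable[[], bool], timeout: str = "30s", interval: float = 0.25,
             sleep=time.sleep, clock=time.monotonic) -> bool:
    """Call check() until it returns True or the timeout elapses."""
    deadline = clock() + parse_timeout(timeout)
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```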
charly eval cdp raw my-app $TAB 'Page.navigate' '{"url":"https://example.com"}'
charly eval cdp raw my-app $TAB 'Runtime.evaluate' '{"expression":"1+1"}'
Sends arbitrary CDP method with optional JSON params. Returns raw CDP response.
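A sketch of what raw does with its arguments: parse the optional JSON params, wrap them in a CDP request, and pick the matching reply out of the WebSocket stream, where unsolicited events carry no id. Both helper names are hypothetical.

```python
import json
from typing import Optional


def raw_message(msg_id: int, method: str, params_json: Optional[str] = None) -> str:
    """Build a CDP request; params must be a JSON object if given."""
    params = json.loads(params_json) if params_json else {}
    if not isinstance(params, dict):
        raise ValueError("CDP params must be a JSON object")
    return json.dumps({"id": msg_id, "method": method, "params": params})


def match_response(frames, msg_id: int) -> Optional[dict]:
    """Return the first frame whose id matches, skipping CDP events."""
    for frame in frames:
        data = json.loads(frame)
        if data.get("id") == msg_id:
            return data
    return None
```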
When charly eval cdp commands fail to connect, the diagnoseCDP() function runs automatically and provides targeted hints:
- Chrome process (pgrep chrome)
- cdp-proxy service status (supervisorctl status cdp-proxy)
- Listening ports (ss -tlnp)

Hints direct users to charly eval wl sway exec <image> chrome-wrapper for a manual Chrome restart (not charly shell with bare swaymsg, which may lack the correct SWAYSOCK path).
Images with Chrome include a browser-open script and set BROWSER=browser-open in the environment. When CLI tools inside the container call xdg-open or use the $BROWSER variable to open a URL, it routes through CDP to open the URL in the running Chrome instance.
Complete flow for deploying openclaw with Codex OAuth. All browser interactions must be VNC-visible: use the --vnc flag on charly eval cdp click.
Critical: The openclaw models auth login TUI requires a real terminal. Do NOT pipe through tee or redirect stdout. Use charly tmux (see /charly-automation:tmux).
IMG=sway-browser-vnc # any image composing chrome-cdp + a Wayland desktop + VNC
# 1. Prerequisites: Chrome signed into Google with sync enabled
# See /charly-automation:openclaw-deploy for full Chrome sign-in procedure
# 2. Start OAuth in a tmux session (real terminal)
charly tmux run $IMG -s oauth "openclaw models auth login --provider openai-codex --set-default"
# 3. Read OAuth URL from tmux output
sleep 5
OAUTH_URL=$(charly tmux capture $IMG -s oauth | grep -o 'https://auth.openai.com/[^ ]*')
charly eval cdp open $IMG "$OAUTH_URL"
# 4. Click "Continue with Google" (VNC-visible)
sleep 5
TAB=$(charly eval cdp list $IMG | grep -i "openai\|auth" | head -1 | awk '{print $1}')
charly eval cdp click $IMG $TAB 'button._buttonStyleFix_wvuha_65' --vnc
# 5. Click "Continue" on Codex consent page (VNC-visible)
sleep 5
charly eval cdp click $IMG $TAB 'button._primary_3rdp0_107' --vnc
# 6. Verify token exchange completed
sleep 10
charly tmux capture $IMG -s oauth
# Expected: "OpenAI OAuth complete", "Default model set to openai-codex/gpt-5.4"
# 7. Restart gateway
charly service restart $IMG openclaw
charly shell $IMG -c "openclaw models status"
Tested selectors (OpenAI auth page):

- button._buttonStyleFix_wvuha_65 (first matching social button)
- button._primary_3rdp0_107 (black primary button)

Key enablers:

- charly tmux run provides a real terminal for the TUI to complete the token exchange (see /charly-automation:tmux)
- charly eval cdp click --vnc finds elements via CDP and clicks via VNC (visible to the user)
- cdp-proxy makes Chrome DevTools accessible from the host through podman bridge networking (with Host header rewriting)
- shm_size: 1g prevents Chrome from crashing due to /dev/shm exhaustion
- localhost:1455 is container-internal (no port mapping needed)

Stale port 1455: if a previous OAuth attempt left port 1455 occupied, kill the listener: charly shell $IMG -c 'kill -9 $(ss -tlnp sport = :1455 | grep -oP "pid=\K\d+")'
Source: charly/cdp.go, charly/vnc.go.
Sign into a Google account inside a running container. Requires GMAIL_USER and GMAIL_PASSWORD environment variables (set in .env or passed via -e).
App Passwords required: Google accounts with 2FA (now mandatory for most accounts) require a 16-character App Password. App Passwords bypass 2FA prompts and most verification challenges; use them by default for automated sign-in.
Fresh profile prerequisite: A fresh chrome-data volume triggers Chrome's first-run flow. Use charly remove <image> --purge before charly config to ensure a clean start. Rebuilding the image alone does not reset named volumes.
On a fresh profile, Chrome opens a first-run dialog ("Make Google Chrome the default browser") as a separate window that CDP cannot see (no debuggable tabs). It tiles alongside any CDP-opened tabs in sway, breaking coordinate translation.
# Focus the first-run dialog and dismiss it
charly eval wl sway msg my-app 'focus left' # first-run dialog is typically the left window
charly eval vnc key my-app Return # press OK
After dismissal, Chrome shows chrome://intro/ — "Sign in to Chrome" with shadow DOM buttons.
chrome:// pages block CDP mouse events and JS .click(). Use --vnc click (CDP selector targeting + VNC pointer delivery):
TAB=$(charly eval cdp list my-app | grep intro | head -1 | awk '{print $1}')
charly eval cdp click my-app $TAB '#acceptSignInButton' --vnc
Shadow DOM path: intro-app > sign-in-promo > #acceptSignInButton. The --vnc flag uses deepQuery to find the element, translates viewport coords to desktop coords via window.screenX/screenY, and delivers the click through VNC.
This opens a new tab with the Google sign-in page. Capture the new tab ID:
sleep 3
TAB=$(charly eval cdp list my-app | grep -i "sign in" | head -1 | awk '{print $1}')
Note: The tab ID survives Google's same-tab navigations (email → password → result).
charly eval cdp wait my-app $TAB '#identifierId' --timeout 30s
charly eval cdp click my-app $TAB '#identifierId' --vnc # focus field via VNC pointer
sleep 0.5 # let compositor process focus
charly eval vnc type my-app "$GMAIL_USER" # real keysym events
Use charly eval cdp coords my-app $TAB '#identifierId' to inspect element position in all three coordinate systems (viewport, desktop via CDP, desktop via sway) for debugging.
charly eval cdp click my-app $TAB '#identifierNext' --vnc
sleep 5 # page transition
charly eval cdp url my-app $TAB # expect /challenge/pwd
charly eval cdp screenshot my-app $TAB step3.png # verification checkpoint
charly eval cdp wait my-app $TAB 'input[type="password"]' --timeout 15s
charly eval cdp click my-app $TAB 'input[type="password"]' --vnc
sleep 0.5
charly eval vnc type my-app "$GMAIL_PASSWORD"
charly eval cdp click my-app $TAB '#passwordNext' --vnc
sleep 7 # backend verification
charly eval cdp screenshot my-app $TAB step5.png
charly eval vnc screenshot my-app step5-desktop.png # catches native dialogs
After successful sign-in, Chrome navigates to chrome://sync-confirmation/ — a chrome:// page (NOT a native dialog). CDP can see it but --vnc click is required:
TAB=$(charly eval cdp list my-app | grep -i sync-confirmation | head -1 | awk '{print $1}')
# If no sync-confirmation tab, it may already be on the current tab:
# TAB stays the same from step 5
charly eval cdp click my-app $TAB '#confirmButton' --vnc # "Yes, I'm in"
Shadow DOM path: sync-confirmation-app > #confirmButton. Other buttons: #notNowButton ("No thanks"), #settingsButton ("Settings").
2FA/CAPTCHA: Take a VNC screenshot (charly eval vnc screenshot my-app challenge.png) and complete manually via a VNC client. App Passwords bypass most challenges.
Search engine choice: May appear as a new tab. Handle via CDP eval if present:
STAB=$(charly eval cdp list my-app | grep search-engine | head -1 | awk '{print $1}')
# If STAB is non-empty, select Google via shadow DOM eval
Cookies and sync state are stored in the chrome-data volume (~/.chrome-debug), persisting across container restarts. Use charly remove <image> --purge to clear for a fresh start.
The --vnc flag on charly eval cdp click is essential for the sign-in flow:

- chrome:// pages (intro, sync-confirmation): CDP mouse events and JS .click() are blocked, so --vnc is the only way to click.
- Real pointer events: --vnc delivers real pointer events that bypass anti-automation detection.
- Coordinate translation: desktop x = window.screenX + viewport x, and desktop y = window.screenY + chromeHeight + viewport y. On popup windows (no toolbar), chromeHeight=0.

Use charly eval cdp coords my-app $TAB '<selector>' to debug coordinate translation. It shows the element position in the viewport, desktop (via CDP), and desktop (via sway) systems.
The charly eval cdp spa subcommands provide first-class support for interacting with Selkies-style remote desktop SPAs. They bypass the local compositor and Chrome's shortcut handlers, which makes them the only way to send Super+e, Ctrl+T, or Alt+F4 to the remote desktop.
Key SPA elements:

- input#overlayInput (z-index 3, opacity 0, pointer-events: auto): invisible input overlay capturing all events
- canvas#videoCanvas (z-index 2, pointer-events: none): H.264 video render surface

IMG=sway-browser-vnc
TAB=$(charly eval cdp list $IMG | grep -i selkies | head -1 | awk '{print $1}')
# Check SPA state
charly eval cdp spa status $IMG $TAB
# Click at canvas coordinates (where elements appear in CDP screenshots)
charly eval cdp spa click $IMG $TAB 990 375 --scale 0.824,0.836
# Type text (bypasses local compositor — no double-char issue)
charly eval cdp spa type $IMG $TAB "hello world"
# Send modifier combos that normally can't reach the remote desktop:
charly eval cdp spa key-combo $IMG $TAB super+e # Open foot terminal in labwc
charly eval cdp spa key-combo $IMG $TAB ctrl+t # New tab in REMOTE Chrome
charly eval cdp spa key-combo $IMG $TAB alt+f4 # Close window in labwc
# Send special keys
charly eval cdp spa key $IMG $TAB return
charly eval cdp spa key $IMG $TAB escape
The SPA maps mouse events from the canvas to the remote desktop with an internal scaling factor. Use --scale scaleX,scaleY to correct for it: a click at canvas position (x, y) is sent to (x/scaleX, y/scaleY). Determine the scale empirically: send a charly eval cdp spa click, take a charly eval cdp screenshot, and compare where the cursor landed with the intended target.
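The --scale correction stated above, as a small Python sketch. Returning unrounded floats is an assumption (charly may round before sending), and both function names are hypothetical.

```python
def parse_scale(spec: str) -> tuple:
    """Parse 'scaleX,scaleY' into two positive floats."""
    sx, sy = (float(part) for part in spec.split(","))
    if sx <= 0 or sy <= 0:
        raise ValueError("scale factors must be positive")
    return sx, sy


def corrected_point(x: float, y: float, spec: str) -> tuple:
    """Canvas position (x, y) is sent as (x/scaleX, y/scaleY)."""
    sx, sy = parse_scale(spec)
    return x / sx, y / sy
```

With the example above, corrected_point(990, 375, "0.824,0.836") sends roughly (1201.46, 448.56), which the SPA's internal scaling then maps back onto the intended canvas position.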
charly eval cdp spa type/key/key-combo sends Input.dispatchKeyEvent directly to the page. The SPA's onkeydown handler on #overlayInput (with stopImmediatePropagation) captures these and forwards to the remote compositor via WebSocket. Only keyDown + keyUp are sent (no "char" event) to prevent double input.
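A sketch of the SPA key events, contrasted with the keyDown/char/keyUp triple used by regular cdp type. The modifier bits are CDP's Input.dispatchKeyEvent values (Alt=1, Ctrl=2, Meta=4, Shift=8); mapping super to Meta is an assumption, and a real implementation would likely also set code and windowsVirtualKeyCode, which this sketch omits.

```python
# CDP modifier bitmask; "super" -> Meta is an assumed mapping.
MODIFIER_BITS = {"alt": 1, "ctrl": 2, "super": 4, "meta": 4, "shift": 8}


def key_combo_events(combo: str) -> list:
    """keyDown + keyUp for a combo such as 'super+e' or 'alt+F4'."""
    *mods, key = combo.split("+")
    bits = 0
    for mod in mods:
        bits |= MODIFIER_BITS[mod.lower()]
    return [
        {"type": "keyDown", "key": key, "modifiers": bits},
        {"type": "keyUp", "key": key, "modifiers": bits},
    ]


def spa_type_events(text: str) -> list:
    """keyDown + keyUp only: no 'char' event, so each key is forwarded once."""
    events = []
    for ch in text:
        events += [{"type": "keyDown", "key": ch}, {"type": "keyUp", "key": ch}]
    return events
```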
Choosing between spa, regular CDP, and VNC/WL:

| Scenario | Command |
|----------|---------|
| Click/type in a web page | charly eval cdp click/type (CSS selector targeting) |
| Click/type in a remote desktop via SPA | charly eval cdp spa click/type (canvas coordinates) |
| Send Super+key or Ctrl+T to remote desktop | charly eval cdp spa key-combo (only option that works) |
| Click in local compositor | charly eval wl click or charly eval vnc click |
| Take screenshot of stream content | charly eval cdp screenshot (captures canvas) |
| Take screenshot of full client desktop | charly eval vnc screenshot or charly eval wl screenshot |
Use cdp status → cdp open → cdp eval to verify proxy connectivity on instances:
# Check CDP is available
charly eval cdp status selkies-desktop -i 198.145.102.110
# Open a test page
charly eval cdp open selkies-desktop -i 198.145.102.110 "https://ip.me"
# Extract the detected IP (ip.me stores it in an input field)
charly eval cdp eval selkies-desktop -i 198.145.102.110 <tab-id> \
"document.querySelector('#ip-lookup').value"
# → Should return 198.145.102.110 (the proxy IP)
This pattern works for any page content extraction via JS. The cdp eval command returns the expression's result directly.
Related skills:

- /charly-eval:eval -- parent router; charly eval cdp … is how every invocation is dispatched.
- /charly-eval:wl -- Wayland desktop automation (sibling verb under charly eval); also sway subgroup for compositor control.
- /charly-eval:vnc -- VNC desktop automation (sibling verb; same container, pixel-level interaction).
- /charly-eval:dbus -- D-Bus calls and notifications (sibling verb under charly eval).
- /charly-core:shell -- Running commands in containers (--tty for OAuth flows)
- /charly-core:charly-config -- Instance deployment, proxy configuration, removal workflow
- /charly-image:layer -- Chrome candy configuration (cdp-proxy service, port declarations)
- /charly-selkies:selkies-labwc -- Full SPA DOM structure, coordinate mapping, session resilience
- /charly-selkies:chrome-devtools-mcp -- MCP-based browser automation (29 tools via Streamable HTTP)
- /charly-selkies:chrome -- Chrome candy with cdp-proxy, env_accept (HTTP_PROXY)

MUST be invoked when the task involves Chrome DevTools Protocol, charly eval cdp commands, browser automation, clicking elements, taking screenshots, or OAuth flows inside containers. Invoke this skill BEFORE reading source code or launching Explore agents.
Workflow position: Desktop automation. Use after a desktop container is running. Preferred over VNC for structured interaction. See also /charly-eval:vnc (pixel-level) and /charly-eval:wl (window-level, including the sway subgroup).