ov/skills/vnc/SKILL.md
MUST be invoked before any work involving: VNC automation, ov test vnc commands, RFB protocol desktop interaction, VNC screenshots, clicking coordinates, or VNC authentication.
ov test vnc commands connect to VNC servers (RFB protocol on port tcp:5900) inside running containers. Provides screenshot capture, keyboard/mouse input, and VNC password management for Wayland desktop automation via wayvnc.
Every ov test vnc <method> (status/screenshot/click/mouse/type/key/rfb/passwd) is authorable as a vnc: verb inside a tests: block. Method-specific fields (x, y, text, key, artifact, artifact_min_bytes) are siblings of the verb line. See /ov:test for the full YAML shape. Example: - vnc: screenshot\n artifact: /tmp/vnc.png\n artifact_min_bytes: 5000.
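Outside a tests: block, the artifact_min_bytes idea can be approximated in plain shell. The following is a minimal sketch, not part of ov; the function name `artifact_ok` is hypothetical.

```shell
# Hypothetical helper, not part of ov: succeed only if FILE exists and is
# at least MIN_BYTES long (the same idea as artifact_min_bytes).
artifact_ok() {
    file=$1
    min_bytes=$2
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips any padding that wc adds around the count.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -ge "$min_bytes" ]
}
```

A possible use is `ov test vnc screenshot my-app /tmp/vnc.png && artifact_ok /tmp/vnc.png 5000`, which fails when the capture is missing or suspiciously small.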
| Action | Command | Description |
|--------|---------|-------------|
| Screenshot | ov test vnc screenshot <image> [file] | Capture VNC framebuffer as PNG |
| Click | ov test vnc click <image> <x> <y> | Click at x,y coordinates |
| Type text | ov test vnc type <image> <text> | Send keyboard input as key events |
| Send key | ov test vnc key <image> <key-name> | Press a special key (Return, Escape, etc.) |
| Move mouse | ov test vnc mouse <image> <x> <y> | Move mouse without clicking |
| Status | ov test vnc status <image> | Check VNC server, show resolution and desktop name |
| Set password | ov test vnc passwd <image> | Set VNC auth password for deployment |
| Raw RFB | ov test vnc rfb <image> <method> [json] | Send raw RFB protocol message |
CLI command -> resolveVNCContainer (engine + container name)
-> resolveVNCAddress (docker/podman port <name> 5900)
-> resolveVNCPassword (ov settings + VNC_PASSWORD env)
-> NewVNCClient(address, password) -> RFB handshake -> operation
Custom RFC 6143 VNC client implementation (no external dependency). Supports None, VNC auth (DES), and VeNCrypt (TLS + sub-auth) security types.
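The address-resolution step above relies on the container engine's port command. As a rough illustration only (this is not ov's Go implementation), the sketch below parses that output in shell. It assumes the engine prints one mapping per line in the form `0.0.0.0:49153` or `[::]:49153`, and the function name is hypothetical.

```shell
# Hypothetical helper: read "<engine> port <name> 5900" output on stdin and
# print a loopback address for the first published IPv4 mapping.
vnc_address_from_port_output() {
    while IFS= read -r line; do
        case $line in
            \[*) continue ;;          # skip IPv6 mappings such as [::]:49153
            *:*)
                port=${line##*:}      # text after the last colon
                printf '127.0.0.1:%s\n' "$port"
                return 0
                ;;
        esac
    done
    return 1                          # no usable mapping found
}
```

For example, `podman port <name> 5900 | vnc_address_from_port_output` would print an address such as `127.0.0.1:49153` when the port is published.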
- Image with the wayvnc layer (port tcp:5900)
- Running container (ov start)

ov test vnc screenshot openclaw-sway-browser # saves screenshot.png
ov test vnc screenshot openclaw-sway-browser desktop.png # custom filename
ov test vnc screenshot openclaw-sway-browser -i prod # specific instance
ov test vnc click openclaw-sway-browser 960 540 # left click at center of 1920x1080
ov test vnc click openclaw-sway-browser 100 200 --button right # right click
ov test vnc click openclaw-sway-browser 100 200 --button middle # middle click
ov test vnc click openclaw-sway-browser 100 200 --from-cdp $TAB # translate from CDP viewport
ov test vnc click openclaw-sway-browser 100 200 --from-sway google-chrome # translate from sway window
ov test vnc click openclaw-sway-browser 100 200 --from-x11 Steam # translate from X11 window (XWayland)
--from-x11 <class-or-title> translates coordinates from X11 window-internal space to desktop-absolute VNC coordinates. Works the same as ov test wl click --from-x11 -- queries X11 geometry via xdotool, finds the sway node, and scales to desktop coordinates. Essential for XWayland windows (Steam, Heroic) where the X11 resolution differs from the compositor resolution.
ov test vnc type openclaw-sway-browser "hello world" # types each character as key events
Only supports ASCII/Latin-1 characters. For special keys, use ov test vnc key.
ov test vnc key openclaw-sway-browser Return # press Enter
ov test vnc key openclaw-sway-browser Escape # press Escape
ov test vnc key openclaw-sway-browser Tab # press Tab
ov test vnc key openclaw-sway-browser F5 # press F5
ov test vnc key openclaw-sway-browser Control_L # press left Ctrl
Valid key names: Return, Escape, Tab, BackSpace, Delete, Home, End, Page_Up, Page_Down, Up, Down, Left, Right, Insert, F1-F12, Shift_L, Shift_R, Control_L, Control_R, Alt_L, Alt_R, Super_L, Super_R, Meta_L, Meta_R, Caps_Lock, space.
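A script can check a key name against the list above before calling ov test vnc key. This is a minimal sketch with a hypothetical function name. It treats names as case-sensitive, which is an assumption about how ov matches them.

```shell
# Hypothetical pre-flight check against the key names listed above.
is_valid_vnc_key() {
    case $1 in
        Return|Escape|Tab|BackSpace|Delete|Home|End|Page_Up|Page_Down) return 0 ;;
        Up|Down|Left|Right|Insert|Caps_Lock|space) return 0 ;;
        Shift_[LR]|Control_[LR]|Alt_[LR]|Super_[LR]|Meta_[LR]) return 0 ;;
        F[1-9]|F1[0-2]) return 0 ;;   # F1 through F12
        *) return 1 ;;                # anything else is rejected
    esac
}
```

For instance, `is_valid_vnc_key F5 && ov test vnc key openclaw-sway-browser F5` skips the call when the name is not on the list.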
ov test vnc mouse openclaw-sway-browser 500 300 # move mouse to (500, 300)
ov test vnc status openclaw-sway-browser
# Output:
# Desktop: sway
# Resolution: 1920x1080
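The resolution reported by status can be used to compute click targets. The sketch below derives the desktop center from a WIDTHxHEIGHT string. The helper name `center_of` is hypothetical and the resolution is passed in by hand rather than parsed from ov output.

```shell
# Hypothetical helper: print the center "X Y" of a WIDTHxHEIGHT resolution.
center_of() {
    width=${1%x*}     # text before the "x"
    height=${1#*x}    # text after the "x"
    printf '%d %d\n' "$((width / 2))" "$((height / 2))"
}
```

With a 1920x1080 desktop, `ov test vnc click openclaw-sway-browser $(center_of 1920x1080)` clicks at 960 540, matching the center-click example earlier.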
ov test vnc passwd openclaw-sway-browser # prompts for password
ov test vnc passwd openclaw-sway-browser --generate # generates random password, prints to stdout
Sets up VNC authentication (VeNCrypt/TLS):
1. Stores the password (backend per the secret_backend setting) as vnc.password.<image>
2. Resolves $HOME inside container for absolute config paths
3. Generates the TLS certificate and RSA key (with the -traditional flag for OpenSSL 3.x) if not present
4. Writes ~/.config/wayvnc/config with enable_auth=true (wayvnc reads this automatically)

After setting a password, all ov test vnc commands authenticate transparently via VeNCrypt/TLS.
When connecting, password is resolved in this order:
1. VNC_PASSWORD environment variable (CI/automation override)
2. vnc.password.<image>-<instance> (when secret_backend=auto or keyring)
3. vnc.password.<image>-<instance> (instance-specific)
4. vnc.password.<image> (image-level)

# One-off password override via env
VNC_PASSWORD=secret ov test vnc screenshot openclaw-sway-browser out.png
# Set password programmatically (alternative to ov test vnc passwd)
ov settings set vnc.password.openclaw-sway-browser mysecret
# Instance-specific password
ov settings set vnc.password.openclaw-sway-browser-prod prodpassword
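The resolution order described above amounts to "first non-empty value wins". The sketch below models that in shell under simplifying assumptions: the stored values are passed in as arguments instead of being read from a keyring or from ov settings, so steps 2 and 3 collapse into one argument. The function name is hypothetical and the real resolver lives in ov.

```shell
# Hypothetical sketch of the lookup order; not ov's actual resolver.
# Arguments: instance-specific value, image-level value (either may be empty).
resolve_vnc_password() {
    instance_value=$1
    image_value=$2
    if [ -n "${VNC_PASSWORD:-}" ]; then
        printf '%s\n' "$VNC_PASSWORD"      # 1. environment override
    elif [ -n "$instance_value" ]; then
        printf '%s\n' "$instance_value"    # 2-3. vnc.password.<image>-<instance>
    elif [ -n "$image_value" ]; then
        printf '%s\n' "$image_value"       # 4. vnc.password.<image>
    else
        return 1                           # nothing configured
    fi
}
```

Called as `resolve_vnc_password prodpassword mysecret`, it prints the instance value unless VNC_PASSWORD is set, in which case the environment value takes precedence.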
Requires openssl inside the container for TLS cert and RSA key generation.
ov test vnc rfb openclaw-sway-browser key '{"key": 65293, "down": true}' # raw key event
ov test vnc rfb openclaw-sway-browser pointer '{"x": 100, "y": 200, "button": 1}' # raw pointer
ov test vnc rfb openclaw-sway-browser cut-text '{"text": "clipboard"}' # clipboard
ov test vnc rfb openclaw-sway-browser fbupdate-request # get dimensions
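In the raw key example, 65293 is the X11 keysym for Return (0xFF0D). For printable ASCII characters the keysym equals the character code, so payloads can be built in shell. This is a sketch with hypothetical helper names. It assumes a release is sent as `"down": false`, which the examples above do not show.

```shell
# Print the X11 keysym for one ASCII character; for printable ASCII the
# keysym equals the character code (for example "a" is 97).
keysym_of_char() {
    printf '%d\n' "'$1"
}

# Build the JSON payload used by the raw key example above.
# "down" is true for a press; false for a release is an assumption.
rfb_key_json() {
    printf '{"key": %d, "down": %s}\n' "$1" "$2"
}
```

A press of the letter a could then be sent with `ov test vnc rfb openclaw-sway-browser key "$(rfb_key_json "$(keysym_of_char a)" true)"`.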
| Aspect | ov test cdp (CDP) | ov test vnc (RFB) |
|--------|----------------|----------------|
| Protocol | WebSocket JSON | Binary TCP |
| Scope | Browser tabs | Whole desktop |
| Click | CSS selector (viewport-relative) | x,y coordinates (desktop-absolute) |
| Type | CDP key events | Key events (keysyms) |
| Screenshot | Browser page only | Full desktop |
| JavaScript | Yes (eval/wait) | No |
| Use case | Web automation | Desktop automation |
Source: ov/vnc_client.go, ov/vnc.go.
Some websites (notably Google sign-in) detect and block CDP-based input. VNC provides a reliable fallback because ov test vnc type sends real X11 keysym events through the Wayland compositor — indistinguishable from physical keyboard input.
CDP + VNC Hybrid Pattern: Use ov test cdp click --vnc for clicking (CDP selector precision + VNC pointer delivery) and ov test vnc type for typing credentials:
# --vnc click: CDP finds element by selector, delivers click via VNC pointer
ov test cdp click my-app $TAB '#identifierId' --vnc
sleep 0.5 # let compositor process focus
# VNC type sends real key events through the compositor
ov test vnc type my-app "$GMAIL_USER"
Tested timing: 500ms sleep between --vnc click and VNC type is sufficient. No characters were dropped at this timing during Google sign-in testing.
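The click-then-type sequence can be wrapped in a small function so the delay stays in one place. This is a sketch only: the function name and the OV_CMD override are hypothetical conveniences, not ov features. Setting OV_CMD=echo gives a dry run that prints the commands instead of running them.

```shell
# Hypothetical wrapper around the two commands shown above.
# OV_CMD defaults to "ov"; set it to "echo" for a dry run.
click_then_type() {
    image=$1 tab=$2 selector=$3 text=$4
    delay=${5:-0.5}                       # seconds; 0.5 matches the tested timing
    "${OV_CMD:-ov}" test cdp click "$image" "$tab" "$selector" --vnc || return 1
    sleep "$delay"                        # let the compositor process focus
    "${OV_CMD:-ov}" test vnc type "$image" "$text"
}
```

Usage would look like `click_then_type my-app "$TAB" '#identifierId' "$GMAIL_USER"`. The optional fifth argument overrides the delay if a slower compositor drops characters.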
When to use --vnc click and VNC type:
- chrome:// pages (required): CDP mouse events and JS .click() are blocked on Chrome's privileged pages (chrome://intro/, chrome://sync-confirmation/, chrome://settings/). --vnc is the only way to click.
- Chrome first-run dialogs: On fresh profiles, Chrome opens a first-run dialog as a separate window invisible to CDP. Dismiss with ov test wl sway msg my-app 'focus left' then ov test vnc key my-app Return.
See /ov:cdp for the full Google sign-in recipe.
VNC uses desktop-absolute coordinates, while CDP returns viewport-relative coordinates. Use the --from-cdp or --from-sway flags to explicitly translate:
--from-cdp <tab-id> — Translates viewport coords to desktop coords via CDP's window.screenX/screenY:
# Get viewport coords from ov test cdp coords, then click via VNC
ov test vnc click my-app 1220 328 --from-cdp $TAB
# Translated viewport (1220, 328) → desktop (1220, 439) via CDP tab ...
--from-sway <app-id> — Translates window-relative coords to desktop coords via sway tree:
ov test vnc click my-app 500 200 --from-sway google-chrome
# Translated window-relative (500, 200) → desktop (504, 204) via sway app_id=google-chrome
Without flags, X and Y are desktop-absolute coordinates (the default, unchanged behavior).
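Both sample translations above are consistent with adding a window or viewport origin to the local coordinates: an origin of (4, 4) for the sway example and (0, 111) for the CDP example. Those origins are inferred from the sample output, not taken from ov. The sketch below shows only that offset arithmetic, with a hypothetical function name, and does not model any scaling ov may apply.

```shell
# Hypothetical illustration: add a window (or viewport) origin to local
# coordinates. Pure offset only; any scaling done by ov is not modelled.
to_desktop_coords() {
    x=$1 y=$2 origin_x=$3 origin_y=$4
    printf '%d %d\n' "$((x + origin_x))" "$((y + origin_y))"
}
```

In practice the --from-cdp and --from-sway flags perform the lookup and translation themselves, so a helper like this is mainly useful for sanity-checking the translated values that ov reports.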
VNC screenshots work correctly on NVIDIA headless for images using sway-desktop-vnc (the standard VNC composition). Two fixes enable this:
- sway-desktop-vnc forces WLR_RENDERER=pixman (software rendering), producing buffers wayvnc can reliably capture
- wayvnc-wrapper triggers the missing headless power event that wayvnc 0.9.1 waits for before starting capture

Both ov test vnc screenshot and ov test wl screenshot work on NVIDIA headless:
ov test vnc screenshot <image> out.png # VNC screenshot (works with pixman + DPMS fix)
ov test wl screenshot <image> out.png # Wayland screenshot (grim, always works)
- /ov:test - parent router; ov test vnc … is how every invocation is dispatched.
- /ov:wl - Wayland-native desktop automation (sibling verb; works on NVIDIA headless).
- /ov:cdp - Chrome DevTools Protocol automation (sibling verb; same container, different protocol).
- /ov:dbus - D-Bus calls and desktop notifications (sibling verb under ov test).
- /ov:wl (sway subgroup) - Sway compositor control (window management, workspaces)
- /ov:config - VNC password storage, secret_backend setting, migrate-secrets command
- /ov:service - Managing wayvnc supervisord service
- /ov:deploy - VNC password setup in deployment workflows
- /ov:shell - Executing commands inside containers
- /ov:layer - wayvnc layer configuration (port tcp:5900)

MUST be invoked when the task involves VNC automation, ov test vnc commands, RFB protocol desktop interaction, VNC screenshots, clicking coordinates, or VNC authentication. Invoke this skill BEFORE reading source code or launching Explore agents.
Workflow position: Desktop automation. Use for pixel-level interaction when CDP can't reach the element. See also /ov:cdp (DOM, preferred), /ov:wl (sway subgroup) (window).