eval/skills/vnc/SKILL.md
MUST be invoked before any work involving: VNC automation, charly eval vnc commands, RFB protocol desktop interaction, VNC screenshots, clicking coordinates, or VNC authentication.
charly eval vnc commands connect to VNC servers (RFB protocol on port tcp:5900) inside running containers. They provide screenshot capture, keyboard/mouse input, and VNC password management for Wayland desktop automation via wayvnc.
Every charly eval vnc <method> (status/screenshot/click/mouse/type/key/rfb/passwd) is authorable as a vnc: verb inside an eval: block. Method-specific fields (x, y, text, key, artifact, artifact_min_bytes) are siblings of the verb line. See /charly-eval:eval for the full YAML shape. Example:

    - vnc: screenshot
      artifact: /tmp/vnc.png
      artifact_min_bytes: 5000
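The artifact_min_bytes field suggests a size floor on the captured file. The following is a minimal shell sketch of that kind of check, assuming the field means "the artifact must be at least this many bytes". The helper name artifact_ok is hypothetical and is not part of charly.

```shell
#!/bin/sh
# Hypothetical helper, not part of charly: succeed only if FILE exists
# and is at least MIN_BYTES long (our reading of artifact_min_bytes).
artifact_ok() {
    file=$1
    min_bytes=$2
    [ -f "$file" ] || return 1
    # wc -c may pad its output with spaces; arithmetic expansion
    # normalizes it to a plain integer.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -ge "$min_bytes" ]
}
```

After a screenshot step, a call such as artifact_ok /tmp/vnc.png 5000 would flag an empty or truncated capture before later steps rely on it.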
| Action | Command | Description |
|--------|---------|-------------|
| Screenshot | charly eval vnc screenshot <image> [file] | Capture VNC framebuffer as PNG |
| Click | charly eval vnc click <image> <x> <y> | Click at x,y coordinates |
| Type text | charly eval vnc type <image> <text> | Send keyboard input as key events |
| Send key | charly eval vnc key <image> <key-name> | Press a special key (Return, Escape, etc.) |
| Move mouse | charly eval vnc mouse <image> <x> <y> | Move mouse without clicking |
| Status | charly eval vnc status <image> | Check VNC server, show resolution and desktop name |
| Set password | charly eval vnc passwd <image> | Set VNC auth password for deployment |
| Raw RFB | charly eval vnc rfb <image> <method> [json] | Send raw RFB protocol message |
CLI command -> resolveVNCContainer (engine + container name)
-> resolveVNCAddress (docker/podman port <name> 5900)
-> resolveVNCPassword (charly settings + VNC_PASSWORD env)
-> NewVNCClient(address, password) -> RFB handshake -> operation
Custom RFC 6143 VNC client implementation (no external dependency). Supports None, VNC auth (DES), and VeNCrypt (TLS + sub-auth) security types.
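The address step in the flow above relies on docker/podman port <name> 5900, which prints one host:port mapping per line. A rough sketch of turning that output into a dialable address follows. The function name is hypothetical, and the choice to replace a wildcard bind address with loopback is an assumption, not a description of what resolveVNCAddress actually does.

```shell
#!/bin/sh
# Hypothetical helper, not charly's resolveVNCAddress: read the output of
# `docker port <name> 5900` (or `podman port ...`) on stdin and print a
# dialable host:port. Assumes one "host:port" mapping per line.
vnc_address_from_port_output() {
    # Keep only the first mapping; a second line is often the IPv6 twin.
    mapping=$(sed -n '1p')
    [ -n "$mapping" ] || return 1
    host=${mapping%:*}   # everything before the last colon
    port=${mapping##*:}  # everything after the last colon
    # Assumption: a wildcard bind address is dialed via loopback.
    case $host in
        0.0.0.0|'[::]'|'') host=127.0.0.1 ;;
    esac
    printf '%s:%s\n' "$host" "$port"
}
```

Usage would look like docker port <name> 5900 | vnc_address_from_port_output, which fails with a non-zero status when no mapping is published.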
- wayvnc layer (port tcp:5900)
- Running container (charly start)

charly eval vnc screenshot sway-browser-vnc # saves screenshot.png
charly eval vnc screenshot sway-browser-vnc desktop.png # custom filename
charly eval vnc screenshot sway-browser-vnc -i prod # specific instance
charly eval vnc click sway-browser-vnc 960 540 # left click at center of 1920x1080
charly eval vnc click sway-browser-vnc 100 200 --button right # right click
charly eval vnc click sway-browser-vnc 100 200 --button middle # middle click
charly eval vnc click sway-browser-vnc 100 200 --from-cdp $TAB # translate from CDP viewport
charly eval vnc click sway-browser-vnc 100 200 --from-sway google-chrome # translate from sway window
charly eval vnc click sway-browser-vnc 100 200 --from-x11 Steam # translate from X11 window (XWayland)
--from-x11 <class-or-title> translates coordinates from X11 window-internal space to desktop-absolute VNC coordinates. It works the same as charly eval wl click --from-x11: it queries X11 geometry via xdotool, finds the sway node, and scales to desktop coordinates. This is essential for XWayland windows (Steam, Heroic) where the X11 resolution differs from the compositor resolution.
charly eval vnc type sway-browser-vnc "hello world" # types each character as key events
Only supports ASCII/Latin-1 characters. For special keys, use charly eval vnc key.
charly eval vnc key sway-browser-vnc Return # press Enter
charly eval vnc key sway-browser-vnc Escape # press Escape
charly eval vnc key sway-browser-vnc Tab # press Tab
charly eval vnc key sway-browser-vnc F5 # press F5
charly eval vnc key sway-browser-vnc Control_L # press left Ctrl
Valid key names: Return, Escape, Tab, BackSpace, Delete, Home, End, Page_Up, Page_Down, Up, Down, Left, Right, Insert, F1-F12, Shift_L, Shift_R, Control_L, Control_R, Alt_L, Alt_R, Super_L, Super_R, Meta_L, Meta_R, Caps_Lock, space.
charly eval vnc mouse sway-browser-vnc 500 300 # move mouse to (500, 300)
charly eval vnc status sway-browser-vnc
# Output:
# Desktop: sway
# Resolution: 1920x1080
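Because click coordinates are desktop-absolute, the reported resolution is a convenient way to derive targets such as the screen center. The sketch below parses the status output and prints the center point. It assumes a "Resolution: WxH" line shaped like the sample above, and the helper name center_from_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `charly eval vnc status` output on stdin and
# print the desktop center as "x y". Assumes a "Resolution: WxH" line
# like the one in the sample output.
center_from_status() {
    res=$(sed -n 's/.*Resolution: *\([0-9][0-9]*\)x\([0-9][0-9]*\).*/\1 \2/p' | sed -n '1p')
    [ -n "$res" ] || return 1
    # Split "W H" into positional parameters local to this function.
    set -- $res
    echo "$(( $1 / 2 )) $(( $2 / 2 ))"
}
```

For a 1920x1080 desktop this yields 960 540, the same center used in the click example earlier.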
charly eval vnc passwd sway-browser-vnc # prompts for password
charly eval vnc passwd sway-browser-vnc --generate # generates random password, prints to stdout
Sets up VNC authentication (VeNCrypt/TLS):
1. Stores the password (per the secret_backend setting) as vnc.password.<image>
2. Resolves $HOME inside container for absolute config paths
3. Generates the TLS cert and RSA key (with the -traditional flag for OpenSSL 3.x) if not present
4. Writes ~/.config/wayvnc/config with enable_auth=true (wayvnc reads this automatically)

After setting a password, all charly eval vnc commands authenticate transparently via VeNCrypt/TLS.
When connecting, the password is resolved in this order:
1. VNC_PASSWORD environment variable (CI/automation override)
2. vnc.password.<image>-<instance> (when secret_backend=auto or keyring)
3. vnc.password.<image>-<instance> (instance-specific)
4. vnc.password.<image> (image-level)

# One-off password override via env
VNC_PASSWORD=secret charly eval vnc screenshot sway-browser-vnc out.png
# Set password programmatically (alternative to charly eval vnc passwd)
charly settings set vnc.password.sway-browser-vnc mysecret
# Instance-specific password
charly settings set vnc.password.sway-browser-vnc-prod prodpassword
Requires openssl inside the container for TLS cert and RSA key generation.
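The resolution order described above amounts to "first source that yields a value wins". A small sketch of that precedence rule follows. It only models the ordering: how each candidate is actually fetched (environment, keyring, settings) is left to the caller, and first_nonempty is a hypothetical name rather than anything in charly.

```shell
#!/bin/sh
# Print the first non-empty argument; fail if every candidate is empty.
# Callers pass candidates in precedence order, highest priority first.
first_nonempty() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A caller would pass the VNC_PASSWORD value first, followed by the instance-specific and image-level values it has looked up. An empty result then signals that no password is configured at any level.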
charly eval vnc rfb sway-browser-vnc key '{"key": 65293, "down": true}' # raw key event
charly eval vnc rfb sway-browser-vnc pointer '{"x": 100, "y": 200, "button": 1}' # raw pointer
charly eval vnc rfb sway-browser-vnc cut-text '{"text": "clipboard"}' # clipboard
charly eval vnc rfb sway-browser-vnc fbupdate-request # get dimensions
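The key value 65293 in the raw key example is the X11 keysym for Return (0xFF0D). The sketch below maps a handful of key names to their keysym values and builds a JSON payload in the same shape as that example. The helper names are hypothetical, only a small subset of keys is covered, and whether a matching "down": false message must follow each press is an assumption to verify against the server's behavior.

```shell
#!/bin/sh
# Map a few key names to their X11 keysym values (from keysymdef.h).
# Only a small subset is covered; unknown names fail.
keysym_for() {
    case $1 in
        BackSpace) echo 65288 ;;  # 0xFF08
        Tab)       echo 65289 ;;  # 0xFF09
        Return)    echo 65293 ;;  # 0xFF0D
        Escape)    echo 65307 ;;  # 0xFF1B
        Delete)    echo 65535 ;;  # 0xFFFF
        *)         return 1 ;;
    esac
}

# Build a JSON payload shaped like the raw key example.
# $1 = key name, $2 = true|false (pressed or released).
raw_key_json() {
    sym=$(keysym_for "$1") || return 1
    printf '{"key": %s, "down": %s}\n' "$sym" "$2"
}
```

The output of raw_key_json Return true could then be passed as the JSON argument to charly eval vnc rfb <image> key.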
| Aspect | charly eval cdp (CDP) | charly eval vnc (RFB) |
|--------|----------------|----------------|
| Protocol | WebSocket JSON | Binary TCP |
| Scope | Browser tabs | Whole desktop |
| Click | CSS selector (viewport-relative) | x,y coordinates (desktop-absolute) |
| Type | CDP key events | Key events (keysyms) |
| Screenshot | Browser page only | Full desktop |
| JavaScript | Yes (eval/wait) | No |
| Use case | Web automation | Desktop automation |
Source: charly/vnc_client.go, charly/vnc.go.
Some websites (notably Google sign-in) detect and block CDP-based input. VNC provides a reliable fallback because charly eval vnc type sends real X11 keysym events through the Wayland compositor — indistinguishable from physical keyboard input.
CDP + VNC Hybrid Pattern: Use charly eval cdp click --vnc for clicking (CDP selector precision + VNC pointer delivery) and charly eval vnc type for typing credentials:
# --vnc click: CDP finds element by selector, delivers click via VNC pointer
charly eval cdp click my-app $TAB '#identifierId' --vnc
sleep 0.5 # let compositor process focus
# VNC type sends real key events through the compositor
charly eval vnc type my-app "$GMAIL_USER"
Tested timing: 500ms sleep between --vnc click and VNC type is sufficient. No characters were dropped at this timing during Google sign-in testing.
When to use --vnc click and VNC type:
- chrome:// pages (required): CDP mouse events and JS .click() are blocked on Chrome's privileged pages (chrome://intro/, chrome://sync-confirmation/, chrome://settings/). --vnc is the only way to click.
- Chrome first-run dialogs: On fresh profiles, Chrome opens a first-run dialog as a separate window invisible to CDP. Dismiss with charly eval wl sway msg my-app 'focus left' then charly eval vnc key my-app Return.
See /charly-eval:cdp for the full Google sign-in recipe.
VNC uses desktop-absolute coordinates, while CDP returns viewport-relative coordinates. Use the --from-cdp or --from-sway flags to explicitly translate:
--from-cdp <tab-id> — Translates viewport coords to desktop coords via CDP's window.screenX/screenY:
# Get viewport coords from charly eval cdp coords, then click via VNC
charly eval vnc click my-app 1220 328 --from-cdp $TAB
# Translated viewport (1220, 328) → desktop (1220, 439) via CDP tab ...
--from-sway <app-id> — Translates window-relative coords to desktop coords via sway tree:
charly eval vnc click my-app 500 200 --from-sway google-chrome
# Translated window-relative (500, 200) → desktop (504, 204) via sway app_id=google-chrome
Without flags, X and Y are desktop-absolute coordinates (the default, unchanged behavior).
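In both sample translations above, the desktop point is the input point shifted by the origin of the viewport or window: (0, 111) in the CDP example and (4, 4) in the sway example. Those offsets are inferred from the sample output. A minimal sketch of that arithmetic follows. It ignores any scaling, and to_desktop is a hypothetical helper, not a charly command.

```shell
#!/bin/sh
# Hypothetical helper: shift a point by an origin offset.
# $1,$2 = point relative to the viewport or window
# $3,$4 = desktop position of that viewport's or window's top-left corner
to_desktop() {
    echo "$(( $1 + $3 )) $(( $2 + $4 ))"
}
```

For example, to_desktop 1220 328 0 111 prints 1220 439, matching the CDP sample, and to_desktop 500 200 4 4 prints 504 204, matching the sway sample.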
VNC screenshots work correctly on NVIDIA headless for images using sway-desktop-vnc (the standard VNC composition). Two fixes enable this:
1. sway-desktop-vnc forces WLR_RENDERER=pixman (software rendering), producing buffers wayvnc can reliably capture
2. wayvnc-wrapper triggers the missing headless power event that wayvnc 0.9.1 waits for before starting capture

Both charly eval vnc screenshot and charly eval wl screenshot work on NVIDIA headless:
charly eval vnc screenshot <image> out.png # VNC screenshot (works with pixman + DPMS fix)
charly eval wl screenshot <image> out.png # Wayland screenshot (grim, always works)
- /charly-eval:eval: parent router; charly eval vnc … is how every invocation is dispatched.
- /charly-eval:wl: Wayland-native desktop automation (sibling verb; works on NVIDIA headless).
- /charly-eval:cdp: Chrome DevTools Protocol automation (sibling verb; same container, different protocol).
- /charly-eval:dbus: D-Bus calls and desktop notifications (sibling verb under charly eval).
- /charly-eval:wl (sway subgroup): Sway compositor control (window management, workspaces)
- /charly-core:charly-config: VNC password storage, secret_backend setting, migrate-secrets command
- /charly-core:service: Managing wayvnc supervisord service
- /charly-core:deploy: VNC password setup in deployment workflows
- /charly-core:shell: Executing commands inside containers
- /charly-image:layer: wayvnc layer configuration (port tcp:5900)

MUST be invoked when the task involves VNC automation, charly eval vnc commands, RFB protocol desktop interaction, VNC screenshots, clicking coordinates, or VNC authentication. Invoke this skill BEFORE reading source code or launching Explore agents.
Workflow position: Desktop automation. Use for pixel-level interaction when CDP can't reach the element. See also /charly-eval:cdp (DOM, preferred), /charly-eval:wl (sway subgroup) (window).