eval/skills/wl/SKILL.md
MUST be invoked before any work involving: Wayland / wlroots desktop automation — `ov eval wl` commands (screenshots, click/type/scroll/drag, window management via wlrctl, clipboard, resolution control, AT-SPI2 introspection, window geometry), nested `wl sway` / `wl overlay` subcommands, or `wl:` declarative verbs inside `eval:` blocks. Covers sway-desktop and selkies-desktop image automation on both sway and labwc compositors.
npx skillsauth add overthinkos/overthink-plugins wlInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
ov eval wl is the unified desktop automation command for all wlroots-based compositors (sway, labwc). It provides screenshots, input (click, type, key combos, scroll, drag), window management (via wlrctl toplevel), clipboard, resolution control, accessibility introspection (AT-SPI2), and window geometry queries. Works on both sway-desktop and selkies-desktop images.
Every ov eval wl <method> (including nested wl overlay <method> and wl sway <method>) is authorable as a wl: verb inside a eval: block. Nested subcommands are hyphenated in YAML: wl: overlay-show, wl: sway-tree, wl: sway-workspaces. Method-specific fields (x, y, text, key, combo, target, action, artifact) are siblings of the verb line. See /ov-eval:eval for the full method allowlist. Example: - wl: screenshot\n artifact: /tmp/desktop.png\n artifact_min_bytes: 10000.
| Action | Command | Description |
|--------|---------|-------------|
| Screenshot | ov eval wl screenshot <image> [file] | Capture desktop as PNG via grim |
| Click | ov eval wl click <image> <x> <y> | Click at absolute coordinates via wlrctl |
| Double-click | ov eval wl double-click <image> <x> <y> | Double-click with configurable delay |
| Type text | ov eval wl type <image> <text> | Send keyboard input via wtype |
| Send key | ov eval wl key <image> <key-name> | Press a named key via wtype |
| Key combo | ov eval wl key-combo <image> <keys> | Send key combination (ctrl+c, alt+tab) |
| Move mouse | ov eval wl mouse <image> <x> <y> | Move pointer to absolute coordinates |
| Scroll | ov eval wl scroll <image> <x> <y> <dir> | Scroll at coordinates (up/down/left/right) |
| Drag | ov eval wl drag <image> <x1> <y1> <x2> <y2> | Drag between coordinates (experimental) |
| List windows | ov eval wl windows <image> | List windows (wlrctl toplevel, xdotool fallback) |
| List toplevel | ov eval wl toplevel <image> | List Wayland toplevel windows via wlrctl |
| Focus window | ov eval wl focus <image> <title> | Focus window (wlrctl toplevel, xdotool fallback) |
| Close window | ov eval wl close <image> <title> | Close window via wlrctl toplevel |
| Fullscreen | ov eval wl fullscreen <image> <title> | Toggle fullscreen via wlrctl toplevel |
| Minimize | ov eval wl minimize <image> <title> | Toggle minimize via wlrctl toplevel |
| Launch app | ov eval wl exec <image> <command> | Launch application in container |
| Resolution | ov eval wl resolution <image> <WxH> | Set output resolution via wlr-randr |
| Clipboard | ov eval wl clipboard <image> get/set/clear | Read/write Wayland clipboard |
| Window props | ov eval wl xprop <image> [target] | Query X11 window properties |
| Window rect | ov eval wl geometry <image> <target> | Get window position/size as JSON |
| A11y tree | ov eval wl atspi <image> tree | Dump accessibility tree as JSON |
| A11y find | ov eval wl atspi <image> find <query> | Find elements by name/role |
| A11y click | ov eval wl atspi <image> click <query> | Click element by name/role |
| Status | ov eval wl status <image> | Check all tool availability |
| Overlay show | ov eval wl overlay show <image> --type text --text "Hello" | Show recording overlay (see /ov-eval:wl-overlay) |
| Overlay hide | ov eval wl overlay hide <image> --all | Remove all overlays |
All commands work on any wlroots-based compositor:
| Tool | Protocol | sway | labwc (selkies) | |------|----------|------|-----------------| | grim | wlr-screencopy | YES | NO (nested compositor) | | pixelflux-screenshot | pixelflux API | NO | YES | | wtype | zwp_virtual_keyboard_v1 | YES | YES | | wlrctl pointer | wlr-virtual-pointer | YES | YES | | wlrctl toplevel | wlr-foreign-toplevel-management | YES | YES | | wlr-randr | wlr-output-management | YES | YES | | wl-copy/paste | wlr-data-control | YES | YES | | xdotool | X11 (XWayland) | YES | YES (on-demand) | | swaymsg | i3 IPC | YES | NO |
Coordinate translation flags:
--from-cdp — Works on all compositors (uses window.screenX/Y)--from-sway — Sway only (uses swaymsg -t get_tree)--from-x11 — Works on all compositors with XWaylandCLI command -> resolveContainer (engine + container name)
-> exec sh -c "export WAYLAND_DISPLAY=... && <tool command>"
-> capture stdout (screenshot) or run silently (input)
Uses exec into the container. All tools use native Wayland protocols — no daemon, no /dev/uinput, no VNC server required.
wl-tools layer (wtype, wlrctl, wl-clipboard, wlr-randr, xdotool, ydotool)wl-screenshot-grim (sway) or wl-screenshot-pixelflux (selkies)a11y-tools layer (python3-pyatspi, python3-gobject) + dbus layerxterm must be running to trigger XWayland start on labwcsway-desktop and selkies-desktop metalayersov eval wl key-combo my-app ctrl+c # Ctrl+C
ov eval wl key-combo my-app alt+tab # Alt+Tab
ov eval wl key-combo my-app ctrl+shift+t # Ctrl+Shift+T
ov eval wl key-combo my-app super+l # Super+L (lock)
Modifiers: ctrl/control, alt, shift, super/win/logo, meta. Uses wtype -M.
ov eval wl scroll my-app 960 540 down # scroll down 3 steps at center
ov eval wl scroll my-app 960 540 up --amount 10 # scroll up 10 steps
Uses xdotool click 4/5/6/7 (X11 scroll buttons) for XWayland windows. Falls back to wtype Page_Up/Page_Down.
ov eval wl drag my-app 100 100 500 500 # drag left to right
ov eval wl drag my-app 100 100 500 500 --duration 500 # slower drag (500ms)
Requires XWayland (uses xdotool mousemove + mousedown/mouseup).
ov eval wl toplevel my-app # list all windows
ov eval wl focus my-app "Chrome" # focus by title
ov eval wl close my-app "Chrome" # close by title
ov eval wl fullscreen my-app "Chrome" # toggle fullscreen
ov eval wl minimize my-app "Chrome" # toggle minimize
ov eval wl exec my-app foot # launch terminal
ov eval wl resolution my-app 1920x1080 # auto-detect output
ov eval wl resolution my-app 2560x1440 -o WL-1 # specific output
ov eval wl clipboard my-app get # read clipboard
ov eval wl clipboard my-app set "hello" # write clipboard
ov eval wl clipboard my-app clear # clear clipboard
ov eval wl clipboard my-app get --primary # read primary selection
ov eval wl geometry my-app "Chrome" # returns JSON: {"x":0,"y":0,"width":1920,"height":1080}
ov eval wl xprop my-app # active window properties
ov eval wl xprop my-app "Chrome" # specific window properties
ov eval wl atspi my-app tree # dump full accessibility tree
ov eval wl atspi my-app find "Save" # find elements named "Save"
ov eval wl atspi my-app find "button" # find elements with role "button"
ov eval wl atspi my-app find "Save:button" # find by name AND role
ov eval wl atspi my-app click "Save:button" # click element by name/role
Requires a11y-tools layer. Chrome needs --force-renderer-accessibility flag.
Use ov eval cdp click --wl to find elements by CSS selector in Chrome and deliver clicks via wlrctl (critical for selkies-desktop which has no VNC):
ov eval cdp click selkies-desktop $TAB '#submit-button' --wl
ov eval wl sway)Sway IPC commands are grouped under ov eval wl sway <subcommand>. These require a sway compositor and use swaymsg. They will error on labwc.
ov eval wl sway tree <image> # Get sway window tree (JSON)
ov eval wl sway msg <image> 'focus left' # Run any swaymsg command
ov eval wl sway workspaces <image> # List workspaces (JSON)
ov eval wl sway outputs <image> # List outputs (JSON)
ov eval wl sway focus <image> left # Focus by direction or [criteria]
ov eval wl sway move <image> right # Move focused window
ov eval wl sway resize <image> width 10px # Resize focused window
ov eval wl sway kill <image> # Close focused window
ov eval wl sway floating <image> # Toggle floating
ov eval wl sway layout <image> tabbed # Set layout mode
ov eval wl sway workspace <image> 2 # Switch workspace
ov eval wl sway reload <image> # Reload sway config
| Aspect | ov eval wl | ov eval vnc |
|--------|---------|----------|
| Compositors | All wlroots (+ sway subgroup) | Requires wayvnc |
| Transport | exec into container | TCP port 5900 |
| Window mgmt | wlrctl toplevel + sway IPC | No |
| Clipboard | wl-copy/paste | rfb cut-text |
| Remote access | No | Yes (TCP) |
| NVIDIA headless | Works | Works (pixman + DPMS fix) |
Source: ov/wl.go.
/ov-eval:eval — parent router; ov eval wl … is how every invocation is dispatched./ov-eval:vnc — VNC/RFB protocol alternative (sibling verb; TCP-based, works remotely)./ov-eval:cdp — Chrome DevTools Protocol (sibling verb; DOM-level interaction, --wl flag for click, axtree for accessibility)./ov-eval:dbus — D-Bus calls and desktop notifications (sibling verb under ov eval)./ov-selkies:wl-tools — Compositor-agnostic tools (wtype, wlrctl, wl-clipboard, wlr-randr, xdotool, ydotool)/ov-eval:wl-overlay — Fullscreen overlays for recordings (title cards, lower-thirds, countdowns, highlights, fades)/ov-selkies:wl-overlay-layer — Overlay layer (gtk4-layer-shell, python3-gobject)/ov-selkies:wl-screenshot-grim — Screenshot layer for sway (grim, wlr-screencopy)/ov-selkies:wl-screenshot-pixelflux — Screenshot layer for selkies (pixelflux rendering pipeline)/ov-selkies:a11y-tools — AT-SPI2 accessibility (python3-pyatspi, python3-gobject)/ov-selkies:xterm — X11 terminal for XWayland testing/ov-selkies:sway-desktop — Desktop metalayer (wl-tools + wl-screenshot-grim)/ov-selkies:selkies-desktop-layer — Desktop metalayer (wl-tools + wl-screenshot-pixelflux + a11y-tools + xterm)development
Claude Code multi-agent support in Overthink — sub-agents, dynamic workflows, and agent teams, and how each drives the existing `ov eval` disposable beds to test and verify. MUST be invoked before authoring or invoking an ov sub-agent / dynamic workflow / agent team, wiring agent-lifecycle hooks, or asking "which primitive should drive the R10 beds?".
tools
Mounts a virtiofs share tagged `workspace` at /workspace inside a VM guest via a systemd .mount unit. Use when a kind:vm entity shares a host directory into the guest and you need it auto-mounted (and re-mounted at every boot).
development
MUST be invoked before any work involving: the `kind: android` schema kind, a `target: android` deploy, the `apk:` layer package format (installing Android apps declaratively), AndroidDeployTarget, an in-pod emulator OR a remote/physical adb-endpoint device, or nested `pod → android` deployment. The first-class Android device + app surface that sits above `ov eval adb`/`appium`.
tools
Use when committing, branching, pushing, merging, tagging, creating PRs, or approving/merging PRs with gh — the feat/-branch, R10-gated, never-force-push landing workflow across the main repo + the plugins submodule + image/<distro> submodules. Covers sync-to-upstream, branch/worktree pruning, the fork+PR path for contributors without write access, and cross-repo @github landing order.