prompts/skills/wayland-automation/SKILL.md
Automate the niri Wayland compositor - query and manage windows via niri IPC (JSON + jq), take screenshots of specific windows via OBS dynamic cast + obs-cmd, send keyboard/text input via wtype, control the mouse with wlrctl, and manipulate the clipboard with wl-clipboard. Only works on niri/Wayland, not X11.
Patterns for automating a niri Wayland desktop: querying windows, taking screenshots, sending input, and clipboard operations.
Tools: niri msg (IPC), obs-cmd (OBS WebSocket), wtype (virtual keyboard), wlrctl (virtual pointer/mouse), wl-copy/wl-paste (clipboard), jq (JSON filtering).
echo $WAYLAND_DISPLAY # should print wayland-1 or similar
pgrep niri # should return a PID
If either fails, this skill does not apply.
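The two checks above can be folded into a single guard that a script calls before doing anything else. This is a minimal sketch; the function name `niri_available` is hypothetical and not part of niri or this skill.

```bash
# Hypothetical helper: succeed only when a Wayland display is set
# and a niri process is running (the same two checks as above).
niri_available() {
  if [ -z "${WAYLAND_DISPLAY:-}" ]; then
    echo "WAYLAND_DISPLAY is not set" >&2
    return 1
  fi
  if ! pgrep niri >/dev/null; then
    echo "niri is not running" >&2
    return 1
  fi
}
```

A script would typically start with `niri_available || exit 1`.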
Running a nested niri-inside-niri session allows the agent to interact with GUI applications without disrupting the user's desktop. The nested session is an isolated Wayland compositor running inside a window on the parent desktop.
Ask the user whether they want a nested session. Use nested when:
Use the parent session directly when:
Ask the user if they already have a nested niri running. If so, they will provide the socket path. If not, start one:
# Start nested niri in the background
niri &
niri will print its IPC socket path on startup, e.g.:
IPC listening on: /run/user/1000/niri.wayland-2.1708311.sock
Extract the WAYLAND_DISPLAY name and NIRI_SOCKET path from the output. The WAYLAND_DISPLAY is the wayland-N portion (e.g., wayland-2).
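The extraction can be done with plain parameter expansion. The sketch below assumes the startup line and socket name follow the format shown above (`niri.<display>.<number>.sock`); both function names are hypothetical.

```bash
# Hypothetical helper: pull the socket path out of niri's startup line.
socket_from_startup_line() {
  printf '%s\n' "${1#*IPC listening on: }"
}

# Hypothetical helper: derive the wayland-N name from a socket path.
display_from_socket() {
  local base="${1##*/}"        # niri.wayland-2.1708311.sock
  base="${base#niri.}"         # wayland-2.1708311.sock
  printf '%s\n' "${base%%.*}"  # wayland-2
}
```

With these, `export NIRI_SOCKET="$sock"` and `export WAYLAND_DISPLAY="$(display_from_socket "$sock")"` target the nested session for the rest of the shell.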
All CLI tools must be told to target the nested session via environment variables:
# For niri IPC commands:
NIRI_SOCKET=/run/user/1000/niri.wayland-2.XXXXXXX.sock niri msg --json windows
# For wlrctl (mouse control):
WAYLAND_DISPLAY=wayland-2 wlrctl pointer move 100 50
# For wtype (keyboard input):
WAYLAND_DISPLAY=wayland-2 wtype "Hello"
# For wl-clipboard:
WAYLAND_DISPLAY=wayland-2 wl-copy "text"
WAYLAND_DISPLAY=wayland-2 wl-paste
# For launching GUI apps inside the nested session:
WAYLAND_DISPLAY=wayland-2 some-app &
You can set both variables in a shell for convenience:
export NIRI_SOCKET=/run/user/1000/niri.wayland-2.XXXXXXX.sock
export WAYLAND_DISPLAY=wayland-2
The nested niri appears as a window in the parent niri (title "niri", app_id may be null). Use the parent niri's OBS dynamic cast to capture it:
# Find the nested niri window in the PARENT niri (no NIRI_SOCKET override)
niri msg --json windows | jq '[.[] | select(.title == "niri")] | first | .id'
# Set dynamic cast target on the parent
niri msg action set-dynamic-cast-window --id <NESTED_NIRI_WINDOW_ID>
sleep 1
# Screenshot via OBS (obs-cmd always talks to OBS on the parent)
obs-cmd save-screenshot "Desktop" "png" "/tmp/nested_screenshot.png"
The corner-anchor technique works cleanly in nested sessions because the nested niri has a single output starting at (0, 0):
# Anchor to top-left of nested output
WAYLAND_DISPLAY=wayland-2 wlrctl pointer move -9999 -9999
# Move to target position (absolute within nested output)
WAYLAND_DISPLAY=wayland-2 wlrctl pointer move <X> <Y>
# Click
WAYLAND_DISPLAY=wayland-2 wlrctl pointer click
- Launch apps: WAYLAND_DISPLAY=wayland-2 some-app &
- Query windows: NIRI_SOCKET=... niri msg --json windows
- Send input: WAYLAND_DISPLAY=wayland-2 with wlrctl/wtype

niri exposes window and workspace state via niri msg. Use --json for machine-readable output.
niri msg --json windows
Returns a JSON array. Each object has: id (int), title (string), app_id (string), pid (int), workspace_id (int), is_focused (bool), is_floating (bool), is_urgent (bool), and a layout object.
The layout object contains:
- tile_pos_in_workspace_view: [x, y] position of the tile in logical coordinates relative to the workspace viewport, or null if the window is scrolled out of view
- tile_size: [width, height] of the tile in logical pixels
- window_size: [width, height] of the window content
- window_offset_in_tile: [x, y] offset of the window within its tile (usually [0, 0])

# By app_id (exact match)
niri msg --json windows | jq '[.[] | select(.app_id == "firefox")] | first | .id'
# By app_id (pattern, case-insensitive; // "" guards against a null app_id)
niri msg --json windows | jq '[.[] | select((.app_id // "") | test("ghostty"; "i"))] | first | .id'
# By title substring (case-insensitive; // "" guards against a null title)
niri msg --json windows | jq '[.[] | select((.title // "") | test("some pattern"; "i"))] | first | .id'
A result of null means no matching window is open.
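Because jq prints the literal string `null` rather than failing, a script has to check the result explicitly before using it. A minimal sketch of such a check follows; `require_window_id` is a hypothetical name, and the rule that an id must be all digits is an assumption based on the id field being an int.

```bash
# Hypothetical helper: validate the output of one of the jq queries above.
# Usage:
#   ID=$(niri msg --json windows | jq '[.[] | select(.app_id == "firefox")] | first | .id')
#   require_window_id "$ID" || exit 1
require_window_id() {
  case "$1" in
    '' | null)
      echo "no matching window is open" >&2
      return 1
      ;;
    *[!0-9]*)
      echo "unexpected window id: $1" >&2
      return 1
      ;;
  esac
}
```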
niri msg focused-window # active window info
niri msg --json focused-window # same, as JSON
niri msg workspaces # list workspaces
niri msg --json focused-output # focused monitor info
niri msg action focus-workspace 2 # switch workspace
niri msg action close-window # close focused window
niri msg action focus-window --id <ID> # focus a specific window
Captures a single window by routing it through niri's dynamic cast target and taking a screenshot via OBS Studio.
Requirements: niri, OBS Studio running with WebSocket server enabled, a Screen Capture source configured to capture "niri Dynamic Cast Target", and obs-cmd.
obs-cmd scene current
If this errors, ABORT. Tell the human: "OBS must be open with WebSocket server enabled, and a Screen Capture source configured to capture the 'niri Dynamic Cast Target' window."
obs-cmd scene-item list <SCENE_NAME>
Use the scene name from step 1. Look for the Screen Capture source name in the output. Note the exact name for step 6.
If no suitable source is found, ABORT and ask the human to configure a Screen Capture source in OBS that captures "niri Dynamic Cast Target".
The target application must already be open. Use the jq patterns from the "niri IPC" section above.
If the result is null, ABORT and tell the human the application must be open first.
niri msg action set-dynamic-cast-window --id <WINDOW_ID>
sleep 1
obs-cmd save-screenshot "<SOURCE_NAME>" "png" "<OUTPUT_PATH>"
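Use an output path ending in .png to match the format argument. Add --width <N> and --height <N> for custom dimensions.
When finished, clear the dynamic cast target:
niri msg action clear-dynamic-cast-target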
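The cast, capture, and cleanup commands above can be wrapped so the cast target is cleared even when the screenshot fails. This is a sketch only: `screenshot_window` is a hypothetical name, and it assumes the window ID and OBS source name were already found in the earlier steps.

```bash
# Hypothetical helper: capture window $1 through OBS source $2 into file $3.
screenshot_window() {
  local window_id="$1" source_name="$2" output_path="$3" status=0
  niri msg action set-dynamic-cast-window --id "$window_id" || return 1
  sleep 1  # give OBS a moment to pick up the new cast target
  obs-cmd save-screenshot "$source_name" "png" "$output_path" || status=$?
  # Always clear the cast target, then report the screenshot's exit status.
  niri msg action clear-dynamic-cast-target
  return "$status"
}
```

For example, `screenshot_window "$ID" "Desktop" /tmp/window.png`, where "Desktop" stands in for whatever source name OBS reports.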
wtype simulates keyboard input on Wayland via the virtual-keyboard protocol. It is like xdotool type for Wayland.
wtype "Hello World"
wtype -k Return # Enter
wtype -k Tab
wtype -k Escape
wtype -k BackSpace
wtype -k Delete
wtype -k space
wtype -k Up
wtype -k Down
wtype -k Left
wtype -k Right
wtype -k Home
wtype -k End
wtype -k Page_Up
wtype -k Page_Down
wtype -k F1 # through F12
Key names are resolved by libxkbcommon.
Valid modifiers: shift, capslock, ctrl, logo (Super/Win), alt, altgr.
wtype -M ctrl -k c # Ctrl+C
wtype -M ctrl -k v # Ctrl+V
wtype -M ctrl -k a # Ctrl+A
wtype -M ctrl -k s # Ctrl+S
wtype -M ctrl -k z # Ctrl+Z
wtype -M alt -k Tab # Alt+Tab
wtype -M alt -k F4 # Alt+F4
wtype -M shift -k Tab # Shift+Tab
wtype -M logo -k d # Super+D
wtype -M ctrl -M shift -k t # Ctrl+Shift+T
Modifiers are released automatically when wtype exits.
wtype does not interpret \n. Use separate invocations with key presses between them:
wtype "Line 1" && wtype -k Return && wtype "Line 2"
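For longer text, the same pattern can be looped over stdin. The sketch below uses a hypothetical function name, `type_lines`, and does not handle lines that begin with `-`, which wtype would parse as options.

```bash
# Hypothetical helper: type each line of stdin, pressing Return between
# lines (but not after the last one). Empty lines produce only the Return.
type_lines() {
  local line first=1
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$first" -eq 0 ]; then
      wtype -k Return </dev/null
    fi
    first=0
    if [ -n "$line" ]; then
      wtype "$line" </dev/null  # keep wtype away from the loop's stdin
    fi
  done
}
```

Usage: `printf 'Line 1\nLine 2\n' | type_lines`.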
wtype -d 50 "slow typing" # 50ms delay between keystrokes
wtype -s 500 -k Return # sleep 500ms then press Enter
wtype "first" && sleep 0.5 && wtype "second" # shell-level delay between commands
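wtype -M ctrl -k c -m ctrl # press ctrl (-M), type c, release ctrl (-m)
Held keys do not persist across separate wtype invocations, since everything is released when wtype exits, so press, type, and release within a single command. -P <key> and -p <key> press and release a named key rather than a modifier.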
echo "text to type" | wtype -
wlrctl provides virtual pointer control on Wayland via the wlr-virtual-pointer protocol. It supports relative mouse movement, clicking, and scrolling.
wlrctl pointer click # left click (default)
wlrctl pointer click left # explicit left click
wlrctl pointer click right # right click
wlrctl pointer click middle # middle click
Supported buttons: left, right, middle, extra, side, forward, back.
wlrctl moves the cursor by a relative displacement in pixels. There is no absolute positioning.
wlrctl pointer move 100 0 # move 100px right
wlrctl pointer move -50 0 # move 50px left
wlrctl pointer move 0 200 # move 200px down
wlrctl pointer move 0 -100 # move 100px up
wlrctl pointer move 50 -30 # move diagonally
dx = positive right, negative left. dy = positive down, negative up.
wlrctl pointer scroll 5 0 # scroll down
wlrctl pointer scroll -5 0 # scroll up
wlrctl pointer scroll 0 3 # scroll right
wlrctl pointer scroll 0 -3 # scroll left
Arguments are <dy> <dx> (vertical first, then horizontal).
Since wlrctl only supports relative movement, establish a known position by overshooting to a corner, then move by exact offset:
# 1. Anchor to top-left corner of the output
wlrctl pointer move -9999 -9999
# 2. Move to the desired absolute position (in output-local logical pixels)
wlrctl pointer move <TARGET_X> <TARGET_Y>
This works because the compositor clamps the cursor at the output edge.
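The anchor-then-offset sequence is easy to wrap in one function. A minimal sketch, assuming a single output whose origin is (0, 0); the name `click_at` is hypothetical.

```bash
# Hypothetical helper: click at output-local position ($1, $2).
# Optional $3 selects the button (defaults to left).
click_at() {
  local x="$1" y="$2" button="${3:-left}"
  wlrctl pointer move -9999 -9999 || return 1  # clamp to top-left corner
  wlrctl pointer move "$x" "$y" || return 1    # relative move from (0, 0)
  wlrctl pointer click "$button"
}
```

For example, `click_at 640 360 right` right-clicks at (640, 360).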
Combine the corner-anchor technique with niri IPC layout data to click at a pixel offset inside a window.
# 1. Anchor to top-left corner
wlrctl pointer move -9999 -9999
# 2. Get the window's position on the output
LAYOUT=$(niri msg --json focused-window | jq '.layout')
TILE_X=$(echo "$LAYOUT" | jq '.tile_pos_in_workspace_view[0]')
TILE_Y=$(echo "$LAYOUT" | jq '.tile_pos_in_workspace_view[1]')
OFF_X=$(echo "$LAYOUT" | jq '.window_offset_in_tile[0]')
OFF_Y=$(echo "$LAYOUT" | jq '.window_offset_in_tile[1]')
# 3. Compute absolute position: window origin + offset within window
# TX, TY are the target pixel coordinates within the window (0,0 = top-left)
TX=35 # e.g. x position of "File" menu
TY=80 # e.g. y position of "File" menu
TARGET_X=$(echo "$TILE_X + $OFF_X + $TX" | bc)
TARGET_Y=$(echo "$TILE_Y + $OFF_Y + $TY" | bc)
# 4. Move and click
wlrctl pointer move $TARGET_X $TARGET_Y
wlrctl pointer click
Note: tile_pos_in_workspace_view is null when the window is scrolled out of the visible viewport. The window must be visible on screen for this to work.
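The null case can be caught before any pointer movement happens. The sketch below is one way to do the per-axis sum with that guard; `target_coord` is a hypothetical name, and rounding to a whole pixel is an assumption made here in case niri reports fractional coordinates.

```bash
# Hypothetical helper: add tile position, window offset, and in-window
# offset for one axis, rounding to the nearest whole pixel.
# Fails when the tile position is null (window scrolled out of view).
target_coord() {
  local tile="$1" offset="$2" inside="$3"
  if [ -z "$tile" ] || [ "$tile" = "null" ]; then
    echo "window is not visible in the workspace view" >&2
    return 1
  fi
  awk -v t="$tile" -v o="$offset" -v i="$inside" \
    'BEGIN { printf "%d\n", int(t + o + i + 0.5) }'
}
```

With the variables from the example above, `TARGET_X=$(target_coord "$TILE_X" "$OFF_X" "$TX") || exit 1` replaces the bc call.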
wlrctl also provides window management via the foreign toplevel protocol. Match windows by app_id, title, or state.
wlrctl window focus firefox # focus by app_id
wlrctl window focus title:"My Window" # focus by title
wlrctl window close app_id:firefox # close by app_id
wlrctl window maximize app_id:firefox # maximize
wlrctl window minimize app_id:signal # minimize
wlrctl window fullscreen title:"Video" # fullscreen
wlrctl window find firefox # exit 0 if window exists
wlrctl window waitfor app_id:firefox # block until window appears
wlrctl window focus may fail to activate a window when the cursor is on a different output. Use niri msg action focus-window --id <ID> instead -- it works across outputs:
ID=$(niri msg --json windows | jq '[.[] | select(.app_id == "firefox")] | first | .id')
niri msg action focus-window --id "$ID"
Note: focus-window does NOT warp the mouse cursor into the window. After focusing, the cursor may still be on a different output. To also move the cursor, combine with the corner-anchor technique on the correct output.
wl-kbptr is a keyboard-driven mouse positioning tool. It overlays a grid on the screen and lets the user select a position by typing label characters, then refines with binary subdivision. Press Enter to finalize the cursor position, Escape to cancel.
wl-kbptr does NOT work with virtual keyboards (wtype/wlrctl keyboard). It only responds to physical keyboard input. This makes it unsuitable for fully automated agent workflows. Use it when:
# Default mode (tile grid, then bisect refinement)
WAYLAND_DISPLAY=wayland-2 wl-kbptr
# Bisect-only mode (4 quadrants, keep subdividing)
WAYLAND_DISPLAY=wayland-2 wl-kbptr -o 'modes=bisect'
# Split mode (arrow keys to narrow down)
WAYLAND_DISPLAY=wayland-2 wl-kbptr -o 'modes=split'
# Print coordinates only (don't move cursor or click)
WAYLAND_DISPLAY=wayland-2 wl-kbptr --only-print
# Restrict to a specific area
WAYLAND_DISPLAY=wayland-2 wl-kbptr --restrict '800x600+100+50'
Bisect home row keys: a/s/d/f select quadrants (top-left/top-right/bottom-left/bottom-right), g = left click, h = right click, b = middle click. Backspace undoes the last selection. Enter/Space confirms.
echo "text" | wl-copy # text to clipboard
wl-copy < file.txt # file contents
wl-copy --type image/png < image.png # image
wl-copy --primary "text" # primary selection
wl-paste # get clipboard text
wl-paste --type text/plain # specific MIME type
wl-paste --type image/png > image.png # image from clipboard
wl-paste --list-types # list available types
wl-paste --watch echo "Clipboard changed"
obs-cmd controls OBS Studio via the obs-websocket v5 protocol. Default connection: obsws://localhost:4455/secret. Override with --websocket flag or OBS_WEBSOCKET_URL env var.
obs-cmd scene current # get current scene name
obs-cmd scene switch "Scene Name" # switch scene
obs-cmd scene-item list "Scene" # list scene items
obs-cmd scene-item enable "Scene" "Source" # show source
obs-cmd scene-item disable "Scene" "Source" # hide source
obs-cmd scene-item toggle "Scene" "Source" # toggle visibility
obs-cmd save-screenshot "Source" "png" "/path/to/file.png"
obs-cmd save-screenshot "Source" "jpg" "/path/to/file.jpg" --width 1920 --height 1080
obs-cmd save-screenshot "Source" "jpg" "/path/to/file.jpg" --compression-quality 90
obs-cmd recording start
obs-cmd recording stop
obs-cmd recording toggle
obs-cmd recording status
obs-cmd streaming start
obs-cmd streaming stop
obs-cmd streaming toggle
obs-cmd streaming status
obs-cmd audio toggle "Mic/Aux"
obs-cmd audio mute "Desktop Audio"
obs-cmd audio unmute "Mic/Aux"
obs-cmd audio status "Mic/Aux"
obs-cmd info
wtype "username"
wtype -k Tab
wtype "password"
wtype -k Tab
wtype -k Return
wtype -M ctrl -k a
sleep 0.1
wtype -M ctrl -k c
content=$(wl-paste)
echo "$content"
echo "new content" | wl-copy
wtype -M ctrl -k v
echo "日本語テキスト" | wl-copy
wtype -M ctrl -k v
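The copy-then-paste pattern used in the last two examples can be wrapped in a small function. This is a sketch with a hypothetical name, `paste_text`; the short sleep mirrors the delay used earlier and is an assumption about how long the clipboard needs to settle. It overwrites whatever was on the clipboard.

```bash
# Hypothetical helper: place $1 on the clipboard and paste it with Ctrl+V.
# printf avoids the trailing newline that echo would add.
paste_text() {
  printf '%s' "$1" | wl-copy || return 1
  sleep 0.1  # let the clipboard settle before pasting
  wtype -M ctrl -k v
}
```

Usage: `paste_text "日本語テキスト"`.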
| Problem | Fix |
|---|---|
| obs-cmd connection refused | Enable WebSocket server in OBS: Tools > WebSocket Server Settings |
| Screenshot is black/empty | Verify OBS Screen Capture source captures "niri Dynamic Cast Target" |
| save-screenshot source not found | Run obs-cmd scene-item list <SCENE> for exact source name (case-sensitive) |
| Window ID is null | App not open or app_id/title pattern wrong; run niri msg --json windows to inspect |
| niri msg not found | Ensure niri IPC socket is available; only works on niri/Wayland |
| wtype does nothing | Check $WAYLAND_DISPLAY is set and niri is running |
| wlrctl click/move has no effect | Check $WAYLAND_DISPLAY; ensure niri supports wlr-virtual-pointer-unstable-v1 |
| Mouse not at expected position after focus | focus-window does not warp cursor; use corner-anchor technique instead |
| Nested niri: commands affect parent | Set WAYLAND_DISPLAY for wlrctl/wtype/wl-clipboard, NIRI_SOCKET for niri msg |
| Nested niri: app opens on parent desktop | Launch app with WAYLAND_DISPLAY=wayland-2 to target nested session |
| Clipboard empty | Check wl-paste --list-types; ensure wl-clipboard can connect to Wayland |