skills/dev-test-linux/SKILL.md
This skill should be used when the user asks to "test Linux desktop apps", "automate GTK/Qt applications", "test with ydotool", "test with xdotool", "verify Linux UI interactions", "capture screenshots on Linux", "control D-Bus services", "test Wayland applications", "test X11 applications", or needs Linux desktop E2E testing. Provides comprehensive guidance for Linux automation with ydotool (Wayland), xdotool (X11), grim, and D-Bus.
Before taking screenshots or running E2E tests, you MUST complete all 6 gates from dev-tdd:

```
GATE 1: BUILD
GATE 2: LAUNCH (with file-based logging)
GATE 3: WAIT
GATE 4: CHECK PROCESS
GATE 5: READ LOGS ← MANDATORY, CANNOT SKIP
GATE 6: VERIFY LOGS
THEN: E2E tests/screenshots
```

You loaded dev-tdd earlier. Follow the gates now. </EXTREMELY-IMPORTANT>
Verify automation tools are installed before proceeding.

```bash
# Detect display server (check for Wayland vs X11)
echo "$XDG_SESSION_TYPE"  # "wayland" or "x11"

# Wayland tools check (verify ydotool, wtype, grim, slurp)
which ydotool || echo "MISSING: ydotool"
which wtype || echo "MISSING: wtype"
which grim || echo "MISSING: grim"
which slurp || echo "MISSING: slurp"

# X11 tools check (verify xdotool, xclip, scrot)
which xdotool || echo "MISSING: xdotool"
which xclip || echo "MISSING: xclip"
which scrot || echo "MISSING: scrot"

# D-Bus check (verify dbus-send availability)
which dbus-send || echo "MISSING: dbus-send"
```
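The same check can be wrapped in a reusable function so a test script stops before its first UI action rather than partway through. This is a sketch. `missing_tools` is a hypothetical helper name, and it uses the POSIX `command -v` builtin instead of `which`.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print each argument that is not an
# executable on PATH, one per line. Exit status is 1 if anything is missing.
missing_tools() (
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    exit "$status"
)

# Example gate for a Wayland run:
# if ! missing_tools ydotool wtype grim slurp; then
#     echo "STOP: install the tools listed above" >&2
#     exit 1
# fi
```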
If missing (Wayland):

```
STOP: Cannot proceed with Wayland automation.

Missing tools for Wayland E2E testing.

Install with:
  # Arch
  sudo pacman -S ydotool wtype grim slurp
  # Debian/Ubuntu
  sudo apt install ydotool wtype grim slurp
  # Nix
  nix-env -iA nixpkgs.ydotool nixpkgs.wtype nixpkgs.grim nixpkgs.slurp

Start ydotool daemon:
  sudo systemctl enable --now ydotool
  # Or for user service:
  systemctl --user enable --now ydotool

Reply when installed and I'll continue testing.
```

This gate is non-negotiable. Missing tools = full stop. </EXTREMELY-IMPORTANT>
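When the gate fails, the install hint can be picked from whichever package manager is present. This is a minimal sketch that assumes one of the three package managers listed above. The `install_hint` name is hypothetical. Package names follow that list and may differ on other distributions.

```shell
#!/bin/sh
# install_hint (hypothetical helper): print the install command for the
# Wayland tool set using the first package manager found on PATH.
install_hint() (
    if command -v pacman >/dev/null 2>&1; then
        echo "sudo pacman -S ydotool wtype grim slurp"
    elif command -v apt >/dev/null 2>&1; then
        echo "sudo apt install ydotool wtype grim slurp"
    elif command -v nix-env >/dev/null 2>&1; then
        echo "nix-env -iA nixpkgs.ydotool nixpkgs.wtype nixpkgs.grim nixpkgs.slurp"
    else
        echo "No known package manager found; install ydotool, wtype, grim and slurp manually"
        exit 1
    fi
)

# Example: echo "Install with: $(install_hint)"
```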
<EXTREMELY-IMPORTANT>
## When to Use Linux Automation

Use Linux automation (ydotool/xdotool) for:
Do NOT use Linux automation for:
Related skills:
- Read `${CLAUDE_SKILL_DIR}/../../skills/dev-test-chrome/SKILL.md` and follow its instructions.
- Read `${CLAUDE_SKILL_DIR}/../../skills/dev-test-playwright/SKILL.md` and follow its instructions.

| Thought | Reality |
|---------|---------|
| "I can test the app manually" | AUTOMATE IT with ydotool/xdotool |
| "Web testing tools work for desktop apps" | NO. Use native Linux tools |
| "ydotool daemon is hard to set up" | One-time setup. Do it. |
| "X11 is deprecated, skip xdotool" | Many systems still use X11. Support both. |
| "D-Bus is too complex" | D-Bus gives precise control. Learn it. |
```bash
# Detect display server and choose appropriate tools
# (":" is a no-op placeholder; a branch containing only comments is a syntax error)
if [ "$XDG_SESSION_TYPE" = "wayland" ]; then
    : # Use ydotool, wtype, grim
else
    : # Use xdotool, xclip, scrot
fi
```

Always detect display server before choosing tools. </EXTREMELY-IMPORTANT>
```bash
# Check display server type (Wayland or X11)
if [ "$XDG_SESSION_TYPE" = "wayland" ]; then
    echo "Using Wayland tools (ydotool, wtype, grim)"
else
    echo "Using X11 tools (xdotool, xclip, scrot)"
fi
```
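`XDG_SESSION_TYPE` is set by the login session. It can be empty or `tty` under SSH, in containers, or on CI runners. A more defensive sketch falls back to `WAYLAND_DISPLAY` and `DISPLAY`, the variables clients use to locate their display server. The `detect_display_server` name is hypothetical. `WAYLAND_DISPLAY` is checked first because sessions running Xwayland set both variables.

```shell
#!/bin/sh
# detect_display_server (hypothetical helper): print "wayland", "x11" or
# "unknown"; exit status 1 when no display server can be identified.
detect_display_server() (
    case "${XDG_SESSION_TYPE:-}" in
        wayland|x11)
            printf '%s\n' "$XDG_SESSION_TYPE"
            exit 0
            ;;
    esac
    if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo wayland
    elif [ -n "${DISPLAY:-}" ]; then
        echo x11
    else
        echo unknown
        exit 1
    fi
)
```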
ydotool requires the ydotoold daemon to be running. Its `key` command takes Linux input keycodes, with `:1` for press and `:0` for release (ydotool 1.x syntax).

```bash
# Type text (simple text input to focused window)
ydotool type "hello world"
# Type with delay (wait 50 ms between key events)
ydotool type --key-delay 50 "slow typing"
# Press Enter key (send Enter key using keycode format)
ydotool key 28:1 28:0
# Press Escape key (send Escape key)
ydotool key 1:1 1:0
# Press Ctrl+C (send Ctrl+C combination)
ydotool key 29:1 46:1 46:0 29:0
# Press Ctrl+V (send Ctrl+V combination)
ydotool key 29:1 47:1 47:0 29:0
# Press Alt+Tab (send Alt+Tab combination)
ydotool key 56:1 15:1 15:0 56:0
# Common keycodes reference (values from linux/input-event-codes.h)
# 1=Escape, 14=Backspace, 15=Tab, 28=Enter, 29=LeftCtrl, 42=LeftShift
# 46=C, 47=V, 56=LeftAlt, 57=Space, 100=RightAlt, 125=Super/Win
```
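Writing press and release events by hand is error-prone, because a missing `:0` leaves a modifier held down. The small sketch below presses the keycodes in order and releases them in reverse. The helper name `ydotool_combo` is hypothetical, and it assumes the ydotool 1.x `keycode:1`/`keycode:0` syntax.

```shell
#!/bin/sh
# ydotool_combo (hypothetical helper): expand keycodes into ydotool "key"
# arguments, pressing in the given order and releasing in reverse order.
ydotool_combo() (
    [ "$#" -gt 0 ] || exit 1
    presses=""
    releases=""
    for code in "$@"; do
        presses="$presses $code:1"
        releases="$code:0 $releases"
    done
    # Drop the leading space of presses and the trailing space of releases.
    printf '%s %s\n' "${presses# }" "${releases% }"
)

# Usage (the unquoted substitution is split into separate arguments on purpose):
# ydotool key $(ydotool_combo 29 46)      # Ctrl+C
# ydotool key $(ydotool_combo 29 42 31)   # Ctrl+Shift+S (31 = S)
```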
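```bash
# Type text (simple text input to focused window)
wtype "hello world"
# Press Ctrl+C (hold Ctrl, press c, release Ctrl)
wtype -M ctrl -k c -m ctrl
# Press Ctrl+Shift+S (hold both modifiers, press s, release in reverse order)
wtype -M ctrl -M shift -k s -m shift -m ctrl
# Press Enter (send Return key)
wtype -k Return
# Press Escape (send Escape key)
wtype -k Escape
```

`-M` holds a modifier down and `-m` releases it. Common modifiers: shift, ctrl, alt, logo (super). wtype relies on the Wayland virtual-keyboard protocol, so it may not work on compositors that do not implement it (GNOME's Mutter is a common example). ydotool injects events through the kernel's uinput device and does not depend on compositor support.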
```bash
# ydotool 1.x: coordinates are passed with -x/-y; click takes byte codes
# (low bits: 0x00 left, 0x01 right, 0x02 middle; high bits: 0x40 press, 0x80 release, 0xC0 press+release)
# Absolute moves may be skewed by pointer acceleration; confirm positions with a screenshot.
# Move mouse to absolute position (move cursor to screen coordinates)
ydotool mousemove --absolute -x 100 -y 200
# Move mouse relative (move cursor by relative offset)
ydotool mousemove -x 50 -y -30
# Click left button (press + release)
ydotool click 0xC0
# Click right button (press + release)
ydotool click 0xC1
# Double click (two left clicks)
ydotool click 0xC0 0xC0
# Click at position (move, then click)
ydotool mousemove --absolute -x 500 -y 300 && ydotool click 0xC0
# Drag operation (press left, move, release left)
ydotool mousemove --absolute -x 100 -y 100
ydotool click 0x40
ydotool mousemove --absolute -x 200 -y 200
ydotool click 0x80
```
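Click codes in ydotool 1.x are a bitwise OR of a button number and an action flag, which is easy to get wrong when writing hex by hand. The sketch below computes them. The helper name is hypothetical. The encoding (0x00 left, 0x01 right, 0x02 middle; 0x40 press, 0x80 release, 0xC0 both) is assumed to be ydotool 1.x's, not that of the older 0.1.x series, which some distributions may still package.

```shell
#!/bin/sh
# ydotool_button_code BUTTON ACTION (hypothetical helper)
# BUTTON: left | right | middle     ACTION: down | up | click
ydotool_button_code() (
    case "$1" in
        left) button=0 ;;
        right) button=1 ;;
        middle) button=2 ;;
        *) echo "unknown button: $1" >&2; exit 1 ;;
    esac
    case "$2" in
        down) action=64 ;;    # 0x40
        up) action=128 ;;     # 0x80
        click) action=192 ;;  # 0xC0 = press + release
        *) echo "unknown action: $2" >&2; exit 1 ;;
    esac
    printf '0x%02X\n' $((action | button))
)

# Usage:
# ydotool click "$(ydotool_button_code left click)"   # 0xC0
# ydotool click "$(ydotool_button_code left down)"    # start a drag, 0x40
```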
```bash
# Type text (simple text input to focused window)
xdotool type "hello world"
# Press Return (send Return key)
xdotool key Return
# Press Escape (send Escape key)
xdotool key Escape
# Press Ctrl+C (send Ctrl+C combination)
xdotool key ctrl+c
# Press Ctrl+Shift+S (send Ctrl+Shift+S combination)
xdotool key ctrl+shift+s
# Press Alt+Tab (send Alt+Tab combination)
xdotool key alt+Tab
# Press Super+D (send Super+D combination)
xdotool key super+d
# Type with delay (type text with millisecond delay between keys)
xdotool type --delay 50 "slow typing"
# Hold key down (press and hold Ctrl)
xdotool keydown ctrl
# Press C (send C key)
xdotool key c
# Release key (release Ctrl)
xdotool keyup ctrl
```
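xdotool chords such as `ctrl+shift+s` and wtype's modifier flags express the same thing, so a small translator lets one test script drive either display server. This is a sketch with a hypothetical helper name. It maps xdotool's `super` to wtype's `logo` and otherwise assumes modifier and key names are shared, since both tools use XKB keysym names such as `Return` and `Tab`.

```shell
#!/bin/sh
# xdo_to_wtype CHORD (hypothetical helper): translate an xdotool-style chord
# such as "ctrl+shift+s" into wtype arguments. Modifiers are held with -M,
# the final key is pressed with -k, and modifiers are released with -m in
# reverse order.
xdo_to_wtype() (
    chord=$1
    key=${chord##*+}                      # last component, e.g. "s"
    case "$chord" in
        *+*) mods=${chord%+*} ;;          # everything before it, e.g. "ctrl+shift"
        *) mods="" ;;
    esac
    held=""
    release=""
    IFS=+                                 # split the modifier list on "+"
    for mod in $mods; do
        [ "$mod" = "super" ] && mod=logo  # wtype names the Super key "logo"
        held="$held -M $mod"
        release="-m $mod${release:+ $release}"
    done
    printf '%s\n' "${held# }${held:+ }-k $key${release:+ $release}"
)

# Usage (unquoted on purpose so the result splits into separate arguments):
# wtype $(xdo_to_wtype ctrl+shift+s)
```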
```bash
# Move mouse absolute (move cursor to screen coordinates)
xdotool mousemove 100 200
# Move mouse relative (move cursor by relative offset)
xdotool mousemove --relative 50 30
# Click left button (send left mouse click)
xdotool click 1
# Click middle button (send middle mouse click)
xdotool click 2
# Click right button (send right mouse click)
xdotool click 3
# Double click (send double click)
xdotool click --repeat 2 1
# Click at position (move and click in one operation)
xdotool mousemove 500 300 click 1
# Drag operation (move mouse while holding button)
xdotool mousemove 100 100 mousedown 1 mousemove 200 200 mouseup 1
```
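Hard-coded screen coordinates break as soon as a window moves. The sketch below derives a click target from the window's own geometry instead. `window_point` is a hypothetical helper that parses the `KEY=value` lines printed by `xdotool getwindowgeometry --shell`. With a reparenting window manager the reported position may be offset by the window decorations, so confirm the target with a screenshot.

```shell
#!/bin/sh
# window_point [PCT_X] [PCT_Y] (hypothetical helper): read
# `xdotool getwindowgeometry --shell` output on stdin and print the absolute
# "X Y" of a point inside the window, given as percentages of its size
# (default 50 50, the centre).
window_point() (
    fx=${1:-50}
    fy=${2:-50}
    x=0 y=0 w=0 h=0
    while IFS='=' read -r name value; do
        case "$name" in
            X) x=$value ;;
            Y) y=$value ;;
            WIDTH) w=$value ;;
            HEIGHT) h=$value ;;
        esac
    done
    printf '%d %d\n' $((x + w * fx / 100)) $((y + h * fy / 100))
)

# Usage: click the centre of the active window
# xdotool mousemove $(xdotool getactivewindow getwindowgeometry --shell | window_point) click 1
```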
```bash
# Get active window ID (get numeric window identifier)
xdotool getactivewindow
# Focus window by name (find and focus window matching name)
xdotool search --name "Firefox" windowactivate
# Focus window by class (find and focus window matching class)
xdotool search --class "firefox" windowactivate
# Get window title (get title of active window)
xdotool getactivewindow getwindowname
# Move window (move active window to coordinates)
xdotool getactivewindow windowmove 100 100
# Resize window (resize active window to dimensions)
xdotool getactivewindow windowsize 800 600
# Minimize window (minimize active window)
xdotool getactivewindow windowminimize
# Focus window and wait (find, focus, and synchronize with window)
xdotool search --name "Firefox" windowactivate --sync
```
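Fixed `sleep` calls after launching an app are too short on a slow machine and wasteful on a fast one. xdotool's `search --sync` waits for a match, but it has no timeout. A bounded polling sketch follows. `retry_until` is a hypothetical helper, and the fractional `sleep 0.5` assumes GNU coreutils or BusyBox.

```shell
#!/bin/sh
# retry_until ATTEMPTS COMMAND [ARGS...] (hypothetical helper): run COMMAND up
# to ATTEMPTS times, 0.5 s apart, succeeding as soon as COMMAND succeeds.
retry_until() (
    attempts=$1
    shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            exit 0
        fi
        i=$((i + 1))
        [ "$i" -lt "$attempts" ] && sleep 0.5
    done
    exit 1
)

# Usage: wait up to ~10 s for a gedit window, then focus it
# retry_until 20 xdotool search --name "gedit" || { echo "FAIL: no window"; exit 1; }
# xdotool search --name "gedit" windowactivate --sync
```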
Every E2E test MUST include screenshot evidence.
Capture a screenshot after completing a workflow to prove success. </EXTREMELY-IMPORTANT>
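One way to make that evidence easy to find afterwards is to give every capture a timestamped name in a single directory. This is a sketch under stated assumptions. `EVIDENCE_DIR`, `screenshot_path` and `take_screenshot` are hypothetical names. The tool choice follows this skill's `XDG_SESSION_TYPE` check: grim on Wayland, scrot otherwise.

```shell
#!/bin/sh
# Hypothetical evidence helpers. EVIDENCE_DIR can be overridden by the caller.
EVIDENCE_DIR=${EVIDENCE_DIR:-/tmp/e2e-evidence}

# screenshot_path NAME: print $EVIDENCE_DIR/NAME-YYYYmmdd-HHMMSS.png
screenshot_path() {
    printf '%s/%s-%s.png\n' "$EVIDENCE_DIR" "$1" "$(date +%Y%m%d-%H%M%S)"
}

# take_screenshot NAME: capture the whole screen and print the file path
take_screenshot() (
    mkdir -p "$EVIDENCE_DIR" || exit 1
    shot=$(screenshot_path "$1")
    if [ "${XDG_SESSION_TYPE:-}" = "wayland" ]; then
        grim "$shot" || exit 1
    else
        scrot "$shot" || exit 1
    fi
    printf '%s\n' "$shot"
)

# Example: AFTER=$(take_screenshot after-submit) && echo "Evidence: $AFTER"
```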
```bash
# Capture full screen (capture all outputs)
grim /tmp/screenshot.png
# Capture specific output (capture single monitor/output)
grim -o DP-1 /tmp/screen.png
# Capture region interactively (select region with slurp then capture)
grim -g "$(slurp)" /tmp/region.png
# Capture specific region (geometry format is "X,Y WIDTHxHEIGHT")
grim -g "100,200 800x600" /tmp/region.png
# Capture Hyprland window (build the geometry from the first client whose class is firefox)
grim -g "$(hyprctl clients -j | jq -r '[.[] | select(.class=="firefox")][0] | "\(.at[0]),\(.at[1]) \(.size[0])x\(.size[1])"')" /tmp/window.png
# Capture Sway focused window (get focused window geometry and capture)
grim -g "$(swaymsg -t get_tree | jq -r '.. | select(.focused?) | .rect | "\(.x),\(.y) \(.width)x\(.height)"')" /tmp/window.png
```
```bash
# Capture full screen (screenshot of entire display)
scrot /tmp/screenshot.png
# Capture active window (screenshot of focused window)
scrot -u /tmp/window.png
# Capture interactive selection (select region with mouse then capture)
scrot -s /tmp/selection.png
# Capture with delay (wait before capturing)
scrot -d 3 /tmp/delayed.png
# Capture root window (screenshot using ImageMagick)
import -window root /tmp/screenshot.png
# Capture active window (screenshot of focused window using ImageMagick)
import -window "$(xdotool getactivewindow)" /tmp/window.png
```
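```bash
# Compare screenshots (prints the number of differing pixels to stderr;
# exit status is 0 if the images match, 1 if they differ, 2 on error)
compare -metric AE baseline.png current.png diff.png
# Fuzzy comparison (pixels within 5% color distance of each other count as equal)
compare -metric AE -fuzz 5% baseline.png current.png diff.png
```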
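A pixel count only becomes a pass/fail signal once it is compared against a threshold. `compare` also writes the metric to stderr and exits 1 on any difference, which trips up `set -e` scripts. The wrapper below is a sketch. `screens_differ` and `count_exceeds` are hypothetical names. It assumes ImageMagick's `compare` is on PATH and that the AE count is the first field of its output.

```shell
#!/bin/sh
# count_exceeds A B: succeed if number A is greater than number B.
# awk is used so values such as 1.2e+06 are handled.
count_exceeds() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

# screens_differ BEFORE AFTER [THRESHOLD] (hypothetical helper): succeed if
# more than THRESHOLD pixels (default 0) differ; exit 2 if compare failed.
screens_differ() (
    threshold=${3:-0}
    # compare exits 1 when the images differ; only its printed metric matters.
    metric=$(compare -metric AE "$1" "$2" null: 2>&1) || true
    count=${metric%% *}
    case "$count" in
        [0-9]*) ;;
        *) printf 'compare failed: %s\n' "$metric" >&2; exit 2 ;;
    esac
    count_exceeds "$count" "$threshold"
)

# Example: screens_differ /tmp/test_before.png /tmp/test_after.png 500 && echo "PASS"
```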
Preferred for apps that expose D-Bus interfaces.
```bash
# List available services (enumerate all D-Bus services)
dbus-send --session --print-reply --dest=org.freedesktop.DBus \
    /org/freedesktop/DBus org.freedesktop.DBus.ListNames

# Zathura registers one bus name per process: org.pwmt.zathura.PID-<pid>
ZATHURA_PID=$(pgrep -n zathura)
# Open document in Zathura (arguments: path, password, page; confirm with Introspect below)
dbus-send --session --print-reply --dest="org.pwmt.zathura.PID-$ZATHURA_PID" \
    /org/pwmt/zathura org.pwmt.zathura.OpenDocument \
    string:"/path/to/file.pdf" string:"" int32:0
# Go to page in Zathura (navigate to specific page)
dbus-send --session --print-reply --dest="org.pwmt.zathura.PID-$ZATHURA_PID" \
    /org/pwmt/zathura org.pwmt.zathura.GotoPage uint32:5

# Open a folder in GNOME Nautilus. org.freedesktop.Application.Open takes (as, a{sv});
# dbus-send cannot send an empty a{sv}, so use gdbus, which reads the signature by introspection
gdbus call --session --dest org.gnome.Nautilus --object-path /org/gnome/Nautilus \
    --method org.freedesktop.Application.Open "['file:///home/user']" "{}"

# Introspect D-Bus service (discover available methods and properties)
dbus-send --session --print-reply --dest=org.example.App \
    /org/example/App org.freedesktop.DBus.Introspectable.Introspect
```
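Zathura's bus name embeds a PID, so scripts usually have to discover it rather than hard-code it. The sketch below filters the `ListNames` reply by prefix. The helper name is hypothetical. The parsing assumes the usual `dbus-send --print-reply` layout of one `string "..."` line per name.

```shell
#!/bin/sh
# dbus_names_matching PREFIX (hypothetical helper): read the output of
# `dbus-send --print-reply ... ListNames` on stdin and print every bus name
# that starts with PREFIX.
dbus_names_matching() {
    sed -n 's/^[[:space:]]*string "\(.*\)"$/\1/p' |
        awk -v prefix="$1" 'index($0, prefix) == 1'
}

# Example:
# dbus-send --session --print-reply --dest=org.freedesktop.DBus \
#     /org/freedesktop/DBus org.freedesktop.DBus.ListNames |
#     dbus_names_matching org.pwmt.zathura.PID-
```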
Use AT-SPI for UI element discovery and verification.
```python
#!/usr/bin/env python3
import sys

import pyatspi

# Find application (search the desktop's top-level accessible apps by name)
desktop = pyatspi.Registry.getDesktop(0)
app = None
for candidate in desktop:
    if candidate is not None and "firefox" in (candidate.name or "").lower():
        app = candidate
        print(f"Found: {app.name}")
        break
if app is None:
    sys.exit("Application not found in the accessibility tree")

# Traverse accessibility tree (recursively print role and name of every node)
def dump_tree(node, indent=0):
    print("  " * indent + f"{node.getRoleName()}: {node.name}")
    for child in node:
        if child is not None:
            dump_tree(child, indent + 1)

dump_tree(app)

# Find specific element (depth-first search for a push button whose name contains `name`)
def find_button(node, name):
    for child in node:
        if child is None:
            continue
        if child.getRole() == pyatspi.ROLE_PUSH_BUTTON and name.lower() in (child.name or "").lower():
            return child
        found = find_button(child, name)
        if found is not None:
            return found
    return None

# Click button via AT-SPI (invoke its first action, typically "click" or "press")
button = find_button(app, "Submit")
if button is not None:
    button.queryAction().doAction(0)
```
Every Linux E2E test MUST:
```bash
#!/bin/bash
# test_workflow.sh - Wayland E2E test
set -e  # Exit on error

echo "Starting E2E test..."

# Launch Firefox
firefox &
sleep 3

# Focus address bar with Ctrl+L (release Ctrl afterwards)
wtype -M ctrl -k l -m ctrl
sleep 0.2
# Type URL (type example.com URL)
wtype "https://example.com"
# Press Enter (send Return key)
wtype -k Return
sleep 2

# Capture initial screenshot (screenshot before interaction)
grim /tmp/test_before.png

# Move mouse and click (ydotool 1.x syntax: -x/-y coordinates, 0xC0 = left click)
ydotool mousemove --absolute -x 500 -y 400
ydotool click 0xC0
sleep 0.5

# Capture final screenshot (screenshot after interaction)
grim /tmp/test_after.png

# Compare screenshots (count differing pixels; compare exits 1 when the
# images differ, so its status is ignored and the printed count is used)
METRIC=$(compare -metric AE /tmp/test_before.png /tmp/test_after.png null: 2>&1) || true
case "${METRIC%% *}" in
    0) echo "WARN: Screenshots identical" ;;
    [0-9]*) echo "PASS: Screenshots differ (interaction worked)" ;;
    *) echo "FAIL: compare error: $METRIC"; exit 1 ;;
esac

echo "Test complete"
```
```bash
#!/bin/bash
# test_workflow_x11.sh - X11 E2E test
set -e

echo "Starting X11 E2E test..."

# Launch gedit (start text editor application)
gedit &
sleep 2

# Focus gedit window (search --sync waits until a window title matches "gedit")
xdotool search --sync --name "gedit" windowactivate --sync

# Type test content (type test text into editor)
xdotool type "Hello, this is an automated test!"
sleep 0.5

# Select all text (select all with Ctrl+A)
xdotool key ctrl+a
# Copy to clipboard (copy selected text with Ctrl+C)
xdotool key ctrl+c

# Verify clipboard content (get clipboard and verify content)
CLIPBOARD=$(xclip -selection clipboard -o)
if [[ "$CLIPBOARD" == *"automated test"* ]]; then
    echo "PASS: Clipboard contains expected text"
else
    echo "FAIL: Clipboard mismatch"
    exit 1
fi

# Capture window screenshot (screenshot of active window)
scrot -u /tmp/test_result.png
echo "Screenshot saved"

# Close without saving (close window with Ctrl+W)
xdotool key ctrl+w
sleep 0.5
# Dismiss save dialog (Tab + Return; which button this reaches may differ between gedit versions)
xdotool key Tab key Return

echo "Test complete"
```
Document every test run in LEARNINGS.md using this template:
````markdown
## Linux E2E Test: [Description]

**Display Server:** Wayland / X11
**Tool:** ydotool / xdotool
**Script:**
```bash
./test_workflow.sh
```
**Output:**
```
Starting E2E test...
PASS: Screenshots differ (interaction worked)
Test complete
```
**Result:** PASS
**Screenshot:** /tmp/test_result.png
````
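Writing entries by hand invites format drift. The helper below is a sketch that appends one entry in the shape of the LEARNINGS.md template. `append_learning`, its argument order and the `LEARNINGS_FILE` variable are hypothetical. The test output is read from a file so multi-line logs survive intact.

```shell
#!/bin/sh
# append_learning DESC SERVER TOOL SCRIPT OUTPUT_FILE RESULT SCREENSHOT
# (hypothetical helper): append one entry to $LEARNINGS_FILE
# (default: LEARNINGS.md in the current directory).
append_learning() (
    file=${LEARNINGS_FILE:-LEARNINGS.md}
    fence=$(printf '\140\140\140')   # three backticks, written in octal
    {
        printf '## Linux E2E Test: %s\n\n' "$1"
        printf '**Display Server:** %s\n' "$2"
        printf '**Tool:** %s\n' "$3"
        printf '**Script:**\n%sbash\n%s\n%s\n' "$fence" "$4" "$fence"
        printf '**Output:**\n%s\n' "$fence"
        cat "$5"
        printf '%s\n' "$fence"
        printf '**Result:** %s\n' "$6"
        printf '**Screenshot:** %s\n\n' "$7"
    } >>"$file"
)

# Example:
# ./test_workflow.sh | tee /tmp/test_output.txt
# append_learning "Firefox navigation" Wayland ydotool ./test_workflow.sh \
#     /tmp/test_output.txt PASS /tmp/test_after.png
```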
## Integration
This skill integrates with `dev-test` for Linux desktop automation.
Read `${CLAUDE_SKILL_DIR}/../../skills/dev-tdd/SKILL.md` and follow its instructions.