openclaw/skills/peekaboo/SKILL.md
macOS GUI automation via Peekaboo. Use when asked to interact with the Mac Mini desktop, click buttons, type text, take screenshots, control windows/apps, navigate menus, or perform any visual UI automation.
Control the Mac Mini desktop via the peekaboo CLI (v3.0.0-beta3). Provides screenshot capture, UI element detection, clicking, typing, window/app management, menu interaction, and AI-powered multi-step automation.
Peekaboo requires Screen Recording and Accessibility TCC permissions, which only take effect in the macOS GUI (Aqua) session. Running peekaboo directly over SSH will fail silently or error out.
All peekaboo commands must use the .command file pattern, which runs the script in Terminal inside the GUI session:
# Write the command to a .command file and execute via `open`
ssh dylans-mac-mini 'cat > /tmp/peekaboo_op.command << '\''SCRIPT'\''
#!/bin/bash
export PATH=/opt/homebrew/bin:$PATH
peekaboo image --path /tmp/screenshot.png --json > /tmp/peekaboo_result.txt 2>&1
echo $? > /tmp/peekaboo_exit.txt
osascript -e '\''tell application "Terminal" to close (every window whose name contains "peekaboo_op")'\'' &>/dev/null &
SCRIPT
chmod +x /tmp/peekaboo_op.command && open /tmp/peekaboo_op.command'
sleep 3
ssh dylans-mac-mini "cat /tmp/peekaboo_exit.txt && cat /tmp/peekaboo_result.txt"
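The fixed `sleep 3` can be too short for slow commands and needlessly long for fast ones. As a minimal sketch, polling for the exit-status file is one alternative. The `wait_until` helper is hypothetical and not part of Peekaboo, and the approach assumes any previous `/tmp/peekaboo_exit.txt` was deleted before the run, since a stale file would satisfy the check immediately.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND once per second until it succeeds.
# Returns 0 on success, 1 if TIMEOUT_SECONDS elapse first.
wait_until() {
  remaining=$1
  shift
  while [ "$remaining" -gt 0 ]; do
    "$@" && return 0
    sleep 1
    remaining=$((remaining - 1))
  done
  return 1
}

# Usage (assumes the stale exit file was removed before `open`):
#   wait_until 15 ssh dylans-mac-mini 'test -s /tmp/peekaboo_exit.txt' \
#     && ssh dylans-mac-mini 'cat /tmp/peekaboo_exit.txt; cat /tmp/peekaboo_result.txt'
```

Polling on the exit file rather than the result file matters here: the script writes the exit status last, so its presence signals that the peekaboo command has finished.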
For simpler one-off commands, use this condensed pattern:
ssh dylans-mac-mini "echo '#!/bin/bash
export PATH=/opt/homebrew/bin:\$PATH
peekaboo YOUR_COMMAND_HERE > /tmp/peekaboo_result.txt 2>&1
echo \$? > /tmp/peekaboo_exit.txt
osascript -e \"tell application \\\"Terminal\\\" to close (every window whose name contains \\\"pk_op\\\")\" &>/dev/null &' > /tmp/pk_op.command && chmod +x /tmp/pk_op.command && open /tmp/pk_op.command"
sleep 3
ssh dylans-mac-mini "cat /tmp/peekaboo_exit.txt; cat /tmp/peekaboo_result.txt"
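The nested quoting in the condensed pattern is easy to get wrong. One alternative, sketched here under the assumption that the script body can be generated locally and piped to a remote `cat` over SSH, is a small generator function. `peekaboo_wrapper` is a hypothetical name; the file paths are the ones used above.

```shell
# peekaboo_wrapper "SUBCOMMAND AND FLAGS"
# Hypothetical helper: print a .command script that runs one peekaboo
# invocation and records its output and exit status in /tmp.
# The argument is pasted into the script verbatim, so quote it the way
# the remote bash should see it.
peekaboo_wrapper() {
  cat <<EOF
#!/bin/bash
export PATH=/opt/homebrew/bin:\$PATH
peekaboo $1 > /tmp/peekaboo_result.txt 2>&1
echo \$? > /tmp/peekaboo_exit.txt
osascript -e 'tell application "Terminal" to close (every window whose name contains "pk_op")' &>/dev/null &
EOF
}

# Usage: generate locally, then write and launch remotely.
#   peekaboo_wrapper "see --json" |
#     ssh dylans-mac-mini 'cat > /tmp/pk_op.command && chmod +x /tmp/pk_op.command && open /tmp/pk_op.command'
```

Because the script text travels over stdin instead of inside a quoted SSH argument, only the heredoc's `\$` escapes are needed.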
The standard automation pattern is:
1. `peekaboo see` - Capture and analyze UI elements, get element IDs
2. `peekaboo click --on <ID>` - Click on a discovered element
3. `peekaboo type "text"` - Type into the focused element

Element IDs from `see` (e.g., B1, T2, S1) are used by `click`, `drag`, `scroll`, and other interaction commands.
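The see, click, type sequence can be sketched as a script body meant to run inside the .command wrapper. The helper name `click_then_type` and the ID `T2` are placeholders: the real ID has to be read from the `see` output of the same session. The sketch also assumes peekaboo exits non-zero when a click fails.

```shell
# click_then_type ELEMENT_ID TEXT
# Hypothetical helper: click the element to focus it, then type into it.
# Stops without typing if the click fails (for example, on a stale ID).
click_then_type() {
  peekaboo click --on "$1" || return 1
  peekaboo type "$2"
}

# Usage, inside the .command script:
#   peekaboo see --json > /tmp/see.json   # capture; read the element ID from this file
#   click_then_type T2 "Hello World"      # click the field, then type
```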
# Capture the frontmost window
peekaboo image --path /tmp/screenshot.png
# Capture a specific app's window
peekaboo image --app Safari --path /tmp/safari.png
# Capture entire screen
peekaboo image --mode screen --path /tmp/screen.png
# Capture at Retina resolution
peekaboo image --retina --path /tmp/retina.png
# Capture and analyze with AI
peekaboo image --analyze "What is shown on screen?"
# Analyze frontmost window, get element IDs
peekaboo see --json
# Analyze with annotated screenshot saved
peekaboo see --annotate --path /tmp/see.png --json
# Analyze a specific app
peekaboo see --app Safari --json
# Analyze a specific window
peekaboo see --app Safari --window-title "Login" --json
# Capture and ask AI about what's visible
peekaboo see --analyze "What buttons are visible?"
# Click on an element ID from `see`
peekaboo click --on B1
# Click by text query
peekaboo click "Submit"
# Click at specific coordinates
peekaboo click --coords 500,300
# Double-click
peekaboo click --on B1 --double
# Right-click
peekaboo click --on B1 --right
# Click in a specific app
peekaboo click --on T2 --app Safari
# Type text (human-like cadence by default)
peekaboo type "Hello World"
# Type and press return
peekaboo type "search query" --return
# Clear field first, then type
peekaboo type "new value" --clear
# Type at maximum speed
peekaboo type "fast text" --delay 0
# Press tab 3 times
peekaboo type --tab 3
# Type into a specific app
peekaboo type "text" --app "TextEdit"
# Copy
peekaboo hotkey "cmd,c"
# Paste
peekaboo hotkey "cmd,v"
# Select all
peekaboo hotkey "cmd,a"
# Open Spotlight
peekaboo hotkey "cmd,space"
# Reopen closed tab
peekaboo hotkey "cmd,shift,t"
# Target a specific app
peekaboo hotkey "cmd,s" --app "TextEdit"
# Scroll down 5 ticks
peekaboo scroll --direction down --amount 5
# Smooth scroll up
peekaboo scroll --direction up --amount 10 --smooth
# Scroll on a specific element
peekaboo scroll --direction down --amount 3 --on element_42
# Drag element to element
peekaboo drag --from B1 --to T2
# Drag by coordinates
peekaboo drag --from-coords "100,200" --to-coords "400,300"
# Drag to an app (e.g., Trash)
peekaboo drag --from B1 --to-app Trash
# List running apps
peekaboo app list --json
# Launch an app
peekaboo app launch "Safari"
peekaboo app launch "Safari" --open https://example.com
# Quit an app
peekaboo app quit --app Safari
# Quit all except certain apps
peekaboo app quit --all --except "Finder,Terminal"
# Switch to an app
peekaboo app switch --to Terminal
# Hide / unhide
peekaboo app hide --app Slack
peekaboo app unhide --app Slack
# Relaunch
peekaboo app relaunch Safari
# List windows for an app
peekaboo window list --app Safari --json
# Focus a window
peekaboo window focus --app "Visual Studio Code"
peekaboo window focus --app Safari --window-title "GitHub"
# Move a window
peekaboo window move --app TextEdit --x 100 --y 100
# Resize a window
peekaboo window resize --app Safari --width 1200 --height 800
# Set position and size together
peekaboo window set-bounds --app Chrome --x 50 --y 50 --width 1024 --height 768
# Minimize / maximize
peekaboo window minimize --app Finder
peekaboo window maximize --app Terminal
# Close a window
peekaboo window close --app Safari --window-title "GitHub"
# List all menu items for an app
peekaboo menu list --app Finder --json
# Click a menu item
peekaboo menu click --app Safari --item "New Window"
# Navigate nested menus
peekaboo menu click --app TextEdit --path "Format > Font > Show Fonts"
# Click system menu extras (WiFi, Bluetooth, etc.)
peekaboo menu click-extra --title "WiFi"
# List running apps
peekaboo list apps --json
# List windows
peekaboo list windows --app Safari --json
# List screens/displays
peekaboo list screens --json
# List menu bar status items
peekaboo list menubar --json
# Check permissions
peekaboo list permissions
# Read clipboard
peekaboo clipboard read
# Write to clipboard
peekaboo clipboard write "some text"
For complex tasks, use the built-in agent which plans and executes multiple steps autonomously:
# Run a multi-step task
peekaboo agent "Open Safari, navigate to example.com, and take a screenshot"
# Dry run to see planned steps
peekaboo agent "Prepare the TestFlight build" --dry-run
# Limit steps
peekaboo agent "Fill out the form" --max-steps 10
# Choose AI model
peekaboo agent "Describe the screen" --model claude-opus-4-5
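Because the agent acts autonomously, one cautious habit is to preview the plan and then cap the number of steps. The sketch below only combines the `--dry-run` and `--max-steps` flags shown above. `agent_guarded` is a hypothetical wrapper, and whether a dry run itself calls an AI API is not confirmed here.

```shell
# agent_guarded "TASK" MAX_STEPS
# Hypothetical wrapper: print the planned steps first, then run the task
# with a step limit. Skips the real run if the dry run fails.
agent_guarded() {
  peekaboo agent "$1" --dry-run || return 1
  peekaboo agent "$1" --max-steps "$2"
}

# Usage, inside the .command script:
#   agent_guarded "Fill out the form" 10
```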
All commands support these flags:
| Flag | Description |
|------|-------------|
| --json / -j | Machine-readable JSON output |
| --verbose / -v | Enable verbose logging |
| --no-remote | Force local execution, skip remote bridge hosts |
- Use `--json` when parsing output programmatically.
- Element IDs from `see` are ephemeral: they change between captures. Always run `see` immediately before using IDs.
- The `see` + `click`/`type` pattern is the fundamental workflow. Do not guess coordinates; use element IDs.
- `peekaboo agent` calls external AI APIs and may incur costs. Prefer manual `see`/`click`/`type` sequences for predictable operations.
- The binary is at `/opt/homebrew/bin/peekaboo` on the Mac Mini.
- Check permissions with `peekaboo list permissions` (but must run from GUI session).
- If a command fails over SSH, use the .command file pattern. Peekaboo needs the GUI session.
- If an element ID is no longer valid, run `peekaboo see` again to get fresh IDs.
- If a click lands in the wrong place, use `--coords` with exact coordinates, or re-run `see --annotate` to verify element positions visually.
- If `see` returns no elements: The app window may be minimized or behind other windows. Use `peekaboo window focus --app <name>` first.
- If typed text goes to the wrong field, use `peekaboo click` to focus the target input field before `peekaboo type`.
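The advice to focus a window before running `see` can be combined into one step. This is a minimal sketch, assuming it runs inside the .command wrapper; `see_app` is a hypothetical name and uses only flags listed earlier.

```shell
# see_app APP_NAME
# Hypothetical helper: bring the app's window forward, then capture fresh
# element IDs for it as JSON. Skips the capture if focusing fails.
see_app() {
  peekaboo window focus --app "$1" || return 1
  peekaboo see --app "$1" --json
}

# Usage, inside the .command script:
#   see_app Safari > /tmp/see.json
```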