skills/desktop-control/SKILL.md
Advanced desktop automation with mouse, keyboard, and screen control
npx skillsauth add leoyeai/openclaw-master-skills skills/desktop-controlInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
The most advanced desktop automation skill for OpenClaw. Provides pixel-perfect mouse control, lightning-fast keyboard input, screen capture, window management, and clipboard operations.
First, install required dependencies:
pip install pyautogui pillow opencv-python pygetwindow
from skills.desktop_control import DesktopController
# Initialize controller
dc = DesktopController(failsafe=True)
# Mouse operations
dc.move_mouse(500, 300) # Move to coordinates
dc.click() # Left click at current position
dc.click(100, 200, button="right") # Right click at position
# Keyboard operations
dc.type_text("Hello from OpenClaw!")
dc.hotkey("ctrl", "c") # Copy
dc.press("enter")
# Screen operations
screenshot = dc.screenshot()
position = dc.get_mouse_position()
move_mouse(x, y, duration=0, smooth=True)Move mouse to absolute screen coordinates.
Parameters:
x (int): X coordinate (pixels from left)y (int): Y coordinate (pixels from top)duration (float): Movement time in seconds (0 = instant, 0.5 = smooth)smooth (bool): Use bezier curve for natural movementExample:
# Instant movement
dc.move_mouse(1000, 500)
# Smooth 1-second movement
dc.move_mouse(1000, 500, duration=1.0)
move_relative(x_offset, y_offset, duration=0)Move mouse relative to current position.
Parameters:
x_offset (int): Pixels to move horizontally (positive = right)y_offset (int): Pixels to move vertically (positive = down)duration (float): Movement time in secondsExample:
# Move 100px right, 50px down
dc.move_relative(100, 50, duration=0.3)
click(x=None, y=None, button='left', clicks=1, interval=0.1)Perform mouse click.
Parameters:
x, y (int, optional): Coordinates to click (None = current position)button (str): 'left', 'right', 'middle'clicks (int): Number of clicks (1 = single, 2 = double)interval (float): Delay between multiple clicksExample:
# Simple left click
dc.click()
# Double-click at specific position
dc.click(500, 300, clicks=2)
# Right-click
dc.click(button='right')
drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')Drag and drop operation.
Parameters:
start_x, start_y (int): Starting coordinatesend_x, end_y (int): Ending coordinatesduration (float): Drag durationbutton (str): Mouse button to useExample:
# Drag file from desktop to folder
dc.drag(100, 100, 500, 500, duration=1.0)
scroll(clicks, direction='vertical', x=None, y=None)Scroll mouse wheel.
Parameters:
clicks (int): Scroll amount (positive = up/left, negative = down/right)direction (str): 'vertical' or 'horizontal'x, y (int, optional): Position to scroll atExample:
# Scroll down 5 clicks
dc.scroll(-5)
# Scroll up 10 clicks
dc.scroll(10)
# Horizontal scroll
dc.scroll(5, direction='horizontal')
get_mouse_position()Get current mouse coordinates.
Returns: (x, y) tuple
Example:
x, y = dc.get_mouse_position()
print(f"Mouse is at: {x}, {y}")
type_text(text, interval=0, wpm=None)Type text with configurable speed.
Parameters:
text (str): Text to typeinterval (float): Delay between keystrokes (0 = instant)wpm (int, optional): Words per minute (overrides interval)Example:
# Instant typing
dc.type_text("Hello World")
# Human-like typing at 60 WPM
dc.type_text("Hello World", wpm=60)
# Slow typing with 0.1s between keys
dc.type_text("Hello World", interval=0.1)
press(key, presses=1, interval=0.1)Press and release a key.
Parameters:
key (str): Key name (see Key Names section)presses (int): Number of times to pressinterval (float): Delay between pressesExample:
# Press Enter
dc.press('enter')
# Press Space 3 times
dc.press('space', presses=3)
# Press Down arrow
dc.press('down')
hotkey(*keys, interval=0.05)Execute keyboard shortcut.
Parameters:
*keys (str): Keys to press togetherinterval (float): Delay between key pressesExample:
# Copy (Ctrl+C)
dc.hotkey('ctrl', 'c')
# Paste (Ctrl+V)
dc.hotkey('ctrl', 'v')
# Open Run dialog (Win+R)
dc.hotkey('win', 'r')
# Save (Ctrl+S)
dc.hotkey('ctrl', 's')
# Select All (Ctrl+A)
dc.hotkey('ctrl', 'a')
key_down(key) / key_up(key)Manually control key state.
Example:
# Hold Shift
dc.key_down('shift')
dc.type_text("hello") # Types "HELLO"
dc.key_up('shift')
# Hold Ctrl and click (for multi-select)
dc.key_down('ctrl')
dc.click(100, 100)
dc.click(200, 100)
dc.key_up('ctrl')
screenshot(region=None, filename=None)Capture screen or region.
Parameters:
region (tuple, optional): (left, top, width, height) for partial capturefilename (str, optional): Path to save imageReturns: PIL Image object
Example:
# Full screen
img = dc.screenshot()
# Save to file
dc.screenshot(filename="screenshot.png")
# Capture specific region
img = dc.screenshot(region=(100, 100, 500, 300))
get_pixel_color(x, y)Get color of pixel at coordinates.
Returns: RGB tuple (r, g, b)
Example:
r, g, b = dc.get_pixel_color(500, 300)
print(f"Color at (500, 300): RGB({r}, {g}, {b})")
find_on_screen(image_path, confidence=0.8)Find image on screen (requires OpenCV).
Parameters:
image_path (str): Path to template imageconfidence (float): Match threshold (0-1)Returns: (x, y, width, height) or None
Example:
# Find button on screen
location = dc.find_on_screen("button.png")
if location:
x, y, w, h = location
# Click center of found image
dc.click(x + w//2, y + h//2)
get_screen_size()Get screen resolution.
Returns: (width, height) tuple
Example:
width, height = dc.get_screen_size()
print(f"Screen: {width}x{height}")
get_all_windows()List all open windows.
Returns: List of window titles
Example:
windows = dc.get_all_windows()
for title in windows:
print(f"Window: {title}")
activate_window(title_substring)Bring window to front by title.
Parameters:
title_substring (str): Part of window title to matchExample:
# Activate Chrome
dc.activate_window("Chrome")
# Activate VS Code
dc.activate_window("Visual Studio Code")
get_active_window()Get currently focused window.
Returns: Window title (str)
Example:
active = dc.get_active_window()
print(f"Active window: {active}")
copy_to_clipboard(text)Copy text to clipboard.
Example:
dc.copy_to_clipboard("Hello from OpenClaw!")
get_from_clipboard()Get text from clipboard.
Returns: str
Example:
text = dc.get_from_clipboard()
print(f"Clipboard: {text}")
'a' through 'z'
'0' through '9'
'f1' through 'f24'
'enter' / 'return''esc' / 'escape''space' / 'spacebar''tab''backspace''delete' / 'del''insert''home''end''pageup' / 'pgup''pagedown' / 'pgdn''up' / 'down' / 'left' / 'right''ctrl' / 'control''shift''alt''win' / 'winleft' / 'winright''cmd' / 'command' (Mac)'capslock''numlock''scrolllock''.' / ',' / '?' / '!' / ';' / ':''[' / ']' / '{' / '}''(' / ')''+' / '-' / '*' / '/' / '='Move mouse to any corner of the screen to abort all automation.
# Enable failsafe (enabled by default)
dc = DesktopController(failsafe=True)
# Pause all automation for 2 seconds
dc.pause(2.0)
# Check if automation is safe to proceed
if dc.is_safe():
dc.click(500, 500)
Require user confirmation before actions:
dc = DesktopController(require_approval=True)
# This will ask for confirmation
dc.click(500, 500) # Prompt: "Allow click at (500, 500)? [y/n]"
dc = DesktopController()
# Click name field
dc.click(300, 200)
dc.type_text("John Doe", wpm=80)
# Tab to next field
dc.press('tab')
dc.type_text("[email protected]", wpm=80)
# Tab to password
dc.press('tab')
dc.type_text("SecurePassword123", wpm=60)
# Submit form
dc.press('enter')
# Capture specific area
region = (100, 100, 800, 600) # left, top, width, height
img = dc.screenshot(region=region)
# Save with timestamp
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
img.save(f"capture_{timestamp}.png")
# Hold Ctrl and click multiple files
dc.key_down('ctrl')
dc.click(100, 200) # First file
dc.click(100, 250) # Second file
dc.click(100, 300) # Third file
dc.key_up('ctrl')
# Copy selected files
dc.hotkey('ctrl', 'c')
# Activate Calculator
dc.activate_window("Calculator")
time.sleep(0.5)
# Type calculation
dc.type_text("5+3=", interval=0.2)
time.sleep(0.5)
# Take screenshot of result
dc.screenshot(filename="calculation_result.png")
# Drag file from source to destination
dc.drag(
start_x=200, start_y=300, # File location
end_x=800, end_y=500, # Folder location
duration=1.0 # Smooth 1-second drag
)
duration=0get_screen_size() to confirm dimensionsinterval for reliabilityDesktopController(failsafe=False)Install all:
pip install pyautogui pillow opencv-python pygetwindow
Built for OpenClaw - The ultimate desktop automation companion 🦞
testing
AI-powered diary generation for agents - creates rich, reflective journal entries (400-600 words) with Quote Hall of Fame, Curiosity Backlog, Decision Archaeology, Relationship Evolution, mood analytics, weekly digests, "On This Day" resurfacing, and cron auto-generation. Works best with Claude models (Haiku, Sonnet, Opus).
development
Multi-agent UX for OpenClaw Control UI — agent selector, per-agent sessions, session history viewer with search, agent-filtered Sessions tab with friendly names, Create Agent wizard, emoji picker, and backend agent CRUD.
tools
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
tools
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.