svenflow/computer-use

Name: computer-use
Author: svenflow

skills/computer-use/SKILL.md

npx skillsauth add svenflow/dispatch computer-use

Computer Use Skill

Analyze screens and interact with UI elements. Works with macOS native apps, Chrome browser, and iOS Simulator.

Prerequisites

Requires OmniParser for vision-based parsing:

# Clone OmniParser
git clone https://github.com/microsoft/OmniParser.git ~/code/OmniParser

# Download weights (~2GB)
cd ~/code/OmniParser
uv pip install huggingface_hub
for f in icon_detect/{train_args.yaml,model.pt,model.yaml} icon_caption/{config.json,generation_config.json,model.safetensors}; do
  huggingface-cli download microsoft/OmniParser-v2.0 "$f" --local-dir weights
done
mv weights/icon_caption weights/icon_caption_florence

Also install Peekaboo for native macOS accessibility:

brew install steipete/tap/peekaboo

see CLI

The unified tool for screen analysis and interaction.

Analyze Screen

# Live screen capture (runs both Peekaboo + OmniParser in parallel)
~/.claude/skills/computer-use/scripts/see

# Analyze specific app
~/.claude/skills/computer-use/scripts/see --app Chrome

# Analyze existing image file (OmniParser only)
~/.claude/skills/computer-use/scripts/see --image /path/to/screenshot.png

# JSON output
~/.claude/skills/computer-use/scripts/see --json

# Verbose mode (shows timing)
~/.claude/skills/computer-use/scripts/see -v

Click Elements

# Click Peekaboo element by ID
~/.claude/skills/computer-use/scripts/see click p_elem_42

# Click at specific coordinates
~/.claude/skills/computer-use/scripts/see click --coords 500,300

Two Engines

| Engine | Prefix | Speed | Best For |
|--------|--------|-------|----------|
| Peekaboo | p_* | ~1s | Native macOS apps (uses Accessibility API) |
| OmniParser | o_* | ~4s warm, ~14s cold | Web content, custom UI, images |

When analyzing live screens, both run in parallel. When using --image, only OmniParser runs.
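
Because each element ID carries its engine's prefix (p_* for Peekaboo, o_* for OmniParser), a script can tell which engine produced an ID before deciding how to act on it. The following is a minimal sketch in POSIX shell; the function name engine_for_id is hypothetical and is not part of the see CLI.

```shell
# Hypothetical helper (not part of the see CLI): map an element ID to the
# engine that produced it, using the documented ID prefixes.
engine_for_id() {
  case "$1" in
    p_*) echo "peekaboo" ;;
    o_*) echo "omniparser" ;;
    *)   echo "unknown"; return 1 ;;
  esac
}

engine_for_id p_elem_42   # prints: peekaboo
engine_for_id o_15        # prints: omniparser
```

The examples in this document only pass a Peekaboo ID to see click; for o_* elements the workflows click by coordinates instead.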

Output Format

{
  "peekaboo": {
    "elements": [
      {"id": "p_elem_42", "label": "Settings", "role": "button", "is_actionable": true}
    ],
    "element_count": 450,
    "elapsed_ms": 1200
  },
  "omniparser": {
    "elements": [
      {"id": "o_15", "content": "A gear icon", "type": "icon", "center_pixels": [558, 71], "clickable": true}
    ],
    "element_count": 196,
    "elapsed_ms": 4200
  }
}
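
Given that output shape, the pixel center of a matching OmniParser element can be pulled out with jq. This is a sketch, assuming jq is installed and that the JSON has exactly the fields shown above; the function name center_for and the inline sample are illustrative only.

```shell
# Hypothetical helper: read `see --json` output on stdin and print the
# "x y" pixel center of the first clickable OmniParser element whose
# content matches the given pattern (case-insensitive).
center_for() {
  jq -r --arg pat "$1" '
    first(.omniparser.elements[]
          | select(.clickable == true and (.content | test($pat; "i"))))
    | .center_pixels
    | "\(.[0]) \(.[1])"
  '
}

# Sample input copied from the documented output format.
sample='{"omniparser":{"elements":[{"id":"o_15","content":"A gear icon","type":"icon","center_pixels":[558,71],"clickable":true}]}}'

printf '%s' "$sample" | center_for gear   # prints: 558 71

# With a saved capture, the same call would look like:
#   center_for Settings < /tmp/screen.json
```

The values are raw OmniParser pixels. Whether they need scaling before a click depends on the platform, as described under Coordinate Systems.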

Platform-Specific Workflows

macOS Native Apps

# 1. Analyze screen
~/.claude/skills/computer-use/scripts/see --json > /tmp/screen.json

# 2. Click Peekaboo element (preferred for native apps)
~/.claude/skills/computer-use/scripts/see click p_elem_42

# Or click by coordinates (Retina: divide OmniParser pixels by 2)
~/.claude/skills/computer-use/scripts/see click --coords 279,35

Chrome Browser

# 1. Analyze
~/.claude/skills/computer-use/scripts/see --app Chrome --json > /tmp/chrome.json

# 2. Click using Chrome extension (NOT cliclick)
~/.claude/skills/chrome-control/scripts/chrome click 558 71

Important: Always use chrome click for Chrome, not see click or cliclick.

iOS Simulator

# 1. Capture and analyze
xcrun simctl io booted screenshot /tmp/sim.png
~/.claude/skills/computer-use/scripts/see --image /tmp/sim.png --json > /tmp/sim.json

# 2. Tap (divide pixels by scale factor: 3x for iPhone 15, 2x for iPad)
xcrun simctl io booted tap 186 400

Coordinate Systems

| Platform | Coordinates | Conversion |
|----------|-------------|------------|
| macOS (cliclick) | Logical pixels | OmniParser pixels / 2 (Retina) |
| Chrome | Viewport pixels | Use as-is with chrome click |
| iOS Simulator | Points | OmniParser pixels / scale (2x or 3x) |
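
The conversions in this table are a single integer division by the display scale factor. A minimal sketch in POSIX shell follows; to_logical is a hypothetical helper name, and the 3x input values are made up to illustrate the arithmetic.

```shell
# Hypothetical helper: convert OmniParser pixel coordinates to logical
# coordinates (macOS) or points (iOS Simulator) by dividing by the scale
# factor. Shell arithmetic truncates, so 71 / 2 gives 35.
to_logical() {  # usage: to_logical X Y SCALE
  echo "$(( $1 / $3 )),$(( $2 / $3 ))"
}

to_logical 558 71 2     # Retina macOS: prints 279,35
to_logical 558 1200 3   # 3x simulator: prints 186,400
```

The first call reproduces the 279,35 used in the macOS workflow for the element centered at [558, 71]. Chrome needs no conversion, so the raw values are passed to chrome click unchanged.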

Server Management

OmniParser runs as a background daemon (models stay in RAM for fast inference):

# Check status
~/.claude/skills/computer-use/scripts/see --status   # via see
# or
~/.claude/skills/computer-use/scripts/parse-image --status

# Stop server (reclaim ~2GB RAM)
~/.claude/skills/computer-use/scripts/parse-image --stop

# View logs
tail -f /tmp/omniparser-server.log

Server Details

Port: 8765 (localhost only)
Auto-shutdown: 12 hours idle
Memory: ~2GB (YOLO + Florence-2 models)
First call: ~10-30s (server boot + model load)
Subsequent calls: ~4s (inference only)

Troubleshooting

Server hangs on startup: If OmniParser hangs for 2+ minutes on startup, it may be stuck on a PaddleOCR connectivity check. The parse-image script sets PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK=True automatically to bypass this.
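
If OmniParser is started some other way than through parse-image, the same variable can be exported by hand first. This is a sketch that assumes the server process inherits the environment of the shell that launches it; the variable name and value are the ones given above.

```shell
# Set the bypass variable for this shell session and its child processes.
export PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK=True

# Confirm the value that child processes will inherit.
echo "PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK=$PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK"
```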

Tips

Finding Elements

# Get JSON output
~/.claude/skills/computer-use/scripts/see --json > /tmp/screen.json

# Search OmniParser elements by text
jq '.omniparser.elements[] | select(.content | test("Settings"; "i"))' /tmp/screen.json

# Get clickable elements only
jq '.omniparser.elements[] | select(.clickable == true)' /tmp/screen.json

# Search Peekaboo elements
jq '.peekaboo.elements[] | select(.label | test("Settings"; "i"))' /tmp/screen.json

Debugging

# Verbose mode shows timing for each engine
~/.claude/skills/computer-use/scripts/see -v

# Save annotated image
~/.claude/skills/computer-use/scripts/see --output /tmp/debug/
open /tmp/debug/omniparser_annotated.png

Description: Analyze and interact with screen UI via vision and accessibility APIs. Use for clicking buttons, finding elements, reading text on screen, and automating macOS, Chrome, or iOS Simulator. Trigger words - see screen, look at screen, what do you see, what's on screen, read screen, find button, click button, tap element, locate element, screen automation, computer use, vision, OCR, screenshot, control computer, interact with UI, mouse click, navigate app.

Category: tools

Updated: May 28, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add svenflow/dispatch computer-use

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 28, 2026, 5:12 AM; 322.9s; 1 file scanned

Install manually

# Clone the repo
git clone https://github.com/svenflow/dispatch.git

# Copy into Claude Code skills folder (global)
cp -r dispatch/skills/computer-use ~/.claude/skills/

Repository: svenflow/dispatch

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT