plugins/claude-code-expert/skills/computer-use/SKILL.md
Computer use and GUI automation patterns — when to use GUI automation vs shell/MCP/browser tools, visual validation techniques, native app testing, and guardrails for visual regression workflows
Computer use lets Claude interact with GUIs: click buttons, fill forms, take screenshots, and navigate native apps. This is powerful but expensive and slow — use it only when a more precise tool doesn't exist.
Before reaching for computer use, exhaust these options first:
| Task | Prefer This | Over Computer Use |
|------|-------------|-------------------|
| API endpoint testing | Bash + curl | Clicking through UI |
| Database inspection | MCP postgres/sqlite | Navigating admin UI |
| File operations | Read/Write/Edit | Drag-and-drop UI |
| Web scraping | Firecrawl MCP | Screenshot + parse |
| Browser automation | Playwright MCP | Computer use click |
| CI status | GitHub API / gh CLI | Browser navigation |
| Log inspection | Bash + grep | Terminal screenshot |
Rule: If you can express the task as a shell command or API call, do that. Computer use is the fallback for GUI-only workflows.
Testing a desktop app that exposes no API or CLI.
# Example: Validate Electron app UI after a build
Take a screenshot of the app after launch.
Click the "New Project" button.
Verify the dialog opens with the correct fields.
Fill in project name: "Test Project 2026"
Click Create and verify the project appears in the list.
Detecting layout regressions that unit tests can't catch.
# Workflow:
1. Take baseline screenshot of the current UI state
2. Apply the change
3. Take comparison screenshot
4. Highlight pixel differences and flag the change if more than 1% of pixels differ from the baseline
5. Human reviews diff
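The "more than 1% of pixels differ" check in step 4 can be sketched as follows. This is a minimal, dependency-free version that assumes both screenshots are already decoded into same-sized grids of RGB tuples. In practice an image library such as Pillow would do the decoding. The `channel_tolerance` knob is a hypothetical addition for ignoring anti-aliasing noise.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]      # rows of pixels

def diff_ratio(baseline: Image, candidate: Image, channel_tolerance: int = 0) -> float:
    """Return the fraction of pixels that changed between two same-size screenshots.

    A pixel counts as changed when any channel differs by more than
    channel_tolerance (hypothetical knob; 0 means exact match required).
    """
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("screenshots must have identical dimensions")
    total = changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if any(abs(ca - cb) > channel_tolerance for ca, cb in zip(px_a, px_b)):
                changed += 1
    return changed / total if total else 0.0

def needs_review(baseline: Image, candidate: Image, threshold: float = 0.01) -> bool:
    """Flag the pair for human review when more than `threshold` of pixels changed."""
    return diff_ratio(baseline, candidate) > threshold

if __name__ == "__main__":
    white = [[(255, 255, 255)] * 10 for _ in range(10)]  # 100-pixel baseline
    edited = [row[:] for row in white]
    edited[0][0] = (0, 0, 0)
    edited[5][5] = (0, 0, 0)                              # 2 of 100 pixels changed
    print(diff_ratio(white, edited), needs_review(white, edited))
```

The script only decides *whether* a human should look (step 5). It does not replace that review.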
Admin panels, legacy enterprise software, and embedded UIs with no API.
# Example: Generate a report from a legacy admin panel
Navigate to: http://admin.internal/reports
Click: "Export" → "CSV" → "Last 30 days"
Wait for download
Move file to: /tmp/report-{date}.csv
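Only the export clicks in this flow actually need the GUI. Once the download lands, the "move file" step is a plain file operation and is better done in code. The sketch below makes three assumptions:

- the browser saves into a known downloads directory (passed in as `download_dir`, a hypothetical parameter);
- the newest `.csv` there is the export;
- `{date}` means an ISO `YYYY-MM-DD` date.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional

def newest_csv(download_dir: Path) -> Path:
    """Return the most recently modified .csv in download_dir (assumed to be the export)."""
    candidates = list(download_dir.glob("*.csv"))
    if not candidates:
        raise FileNotFoundError(f"no CSV found in {download_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime)

def archive_report(download_dir: Path, dest_dir: Path = Path("/tmp"),
                   day: Optional[date] = None) -> Path:
    """Move the newest downloaded CSV to dest_dir/report-YYYY-MM-DD.csv and return the new path."""
    day = day or date.today()
    target = dest_dir / f"report-{day.isoformat()}.csv"
    shutil.move(str(newest_csv(download_dir)), str(target))
    return target
```

Raising when no CSV exists gives the GUI step a structured failure signal. Without it, a missed download would go unnoticed.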
Mobile simulator or desktop app testing that requires visual interaction.
# Example: iOS simulator validation
Launch: xcrun simctl launch booted com.example.MyApp
Take screenshot
Verify: "Welcome" text is visible in the header
Tap: "Get Started" button (coordinates or element description)
Verify: onboarding screen loads
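In the simulator flow, the launch and the screenshot both have `xcrun simctl` equivalents (`simctl launch` and `simctl io ... screenshot`). Only the visual checks and the tap need computer use. The sketch below builds those commands and prints them by default (dry run), so nothing executes unless `dry_run=False` is passed. The screenshot path is hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional

def simctl_launch(bundle_id: str, device: str = "booted") -> List[str]:
    """argv that launches an installed app on a simulator."""
    return ["xcrun", "simctl", "launch", device, bundle_id]

def simctl_screenshot(path: str, device: str = "booted") -> List[str]:
    """argv that saves the simulator's current display to `path`."""
    return ["xcrun", "simctl", "io", device, "screenshot", path]

def run(argv: List[str], dry_run: bool = True) -> Optional[subprocess.CompletedProcess]:
    """Print the command in dry-run mode; otherwise execute it and raise on failure."""
    if dry_run:
        print(shlex.join(argv))
        return None
    return subprocess.run(argv, check=True, capture_output=True, text=True)

if __name__ == "__main__":
    run(simctl_launch("com.example.MyApp"))
    run(simctl_screenshot("/tmp/welcome.png"))  # hypothetical output path
```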
Computer use output is inherently visual and unstructured. Always verify results with a structured check after GUI actions:
After each GUI action:
1. Take a screenshot
2. Verify the expected visual state (specific text, element position, color)
3. If verification fails: log "FAIL: {what was expected vs. what was seen}"
4. If unsure: take another screenshot from a wider viewport
At the end:
- List each action and its verification result
- Count: {N} actions taken, {M} verified OK, {K} failed
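Here is a minimal sketch of that bookkeeping. It assumes each check can be reduced to comparing an expected string against what was read off the screenshot. `StepResult` and `VerificationLog` are hypothetical names, and the FAIL line follows the format given above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StepResult:
    action: str
    expected: str   # the visual state we were looking for
    seen: str       # what was actually read from the screenshot

    @property
    def ok(self) -> bool:
        return self.expected == self.seen

@dataclass
class VerificationLog:
    steps: List[StepResult] = field(default_factory=list)

    def record(self, action: str, expected: str, seen: str) -> StepResult:
        """Store one action's result and log a FAIL line when it does not match."""
        step = StepResult(action, expected, seen)
        self.steps.append(step)
        if not step.ok:
            print(f"FAIL: {action}: expected {expected!r} vs. seen {seen!r}")
        return step

    def summary(self) -> str:
        """End-of-run count in the {N}/{M}/{K} shape described above."""
        n = len(self.steps)
        m = sum(s.ok for s in self.steps)
        return f"{n} actions taken, {m} verified OK, {n - m} failed"
```

Keeping the per-step records, not just the counts, covers the "list each action and its verification result" requirement.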
| Confidence | Verification | Action |
|------------|--------------|--------|
| HIGH | Text matches exactly / element found by ID | Proceed |
| MEDIUM | Visual match but element found by position | Log and proceed |
| LOW | Can't find element / ambiguous screenshot | Stop, report to human |
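The confidence table can be encoded as a small lookup, sketched below. It assumes the agent labels each check by how the element was located. The labels `"id"`, `"exact_text"`, and `"position"`, and the helper names, are hypothetical.

```python
from enum import Enum
from typing import Optional

class Confidence(Enum):
    # Each level's value is the action from the table.
    HIGH = "Proceed"
    MEDIUM = "Log and proceed"
    LOW = "Stop, report to human"

def classify(match: Optional[str]) -> Confidence:
    """Grade a check by how the element was located.

    "id" or "exact_text" -> HIGH, "position" -> MEDIUM,
    None (not found / ambiguous screenshot) -> LOW.
    """
    if match in ("id", "exact_text"):
        return Confidence.HIGH
    if match == "position":
        return Confidence.MEDIUM
    return Confidence.LOW

def may_continue(level: Confidence) -> bool:
    """Only LOW confidence halts the flow and hands off to a human."""
    return level is not Confidence.LOW
```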
Computer use can cause irreversible actions (delete files, send emails, submit forms). Apply these guardrails:
Keep screenshots of:
For complex GUI flows, describe the steps and ask for confirmation before executing:
Before I click "Submit", here's what will happen:
- Form data: {summary}
- This action cannot be undone
- Proceeding? (yes/no)
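That confirmation step can be made a hard gate, as sketched below. Any answer other than an explicit "yes" aborts the irreversible action. The `ask` parameter is a hypothetical hook (defaulting to `input`), so the gate can be wired to whatever channel reaches the human.

```python
from typing import Callable

def confirm_irreversible(action: str, summary: str,
                         ask: Callable[[str], str] = input) -> bool:
    """Show what the click will do and proceed only on an explicit 'yes'."""
    prompt = (
        f"Before I click \"{action}\", here's what will happen:\n"
        f"- Form data: {summary}\n"
        "- This action cannot be undone\n"
        "- Proceeding? (yes/no) "
    )
    return ask(prompt).strip().lower() == "yes"
```

Treating "y", blank input, or anything else as "no" is a deliberate fail-closed choice for actions that cannot be undone.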
For web UIs, Playwright MCP is almost always better than computer use:
| | Playwright MCP | Computer Use |
|--|----------------|--------------|
| Reliability | High (DOM-based) | Medium (pixel-based) |
| Speed | Fast | Slow (screenshot per action) |
| Testability | Scriptable, repeatable | Hard to reproduce exactly |
| Cost | Low | High (vision model per screenshot) |
| Works on | Web browsers | Any visual surface |
Use Playwright MCP for: Web app testing, scraping, form automation on websites.
Use Computer Use for: Native desktop apps, embedded UIs, legacy apps with no API.
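The routing described in this skill is: shell/API first, Playwright for web UIs, computer use only for everything else. It can be sketched as a tiny decision function. The `Surface` enum and the returned labels are hypothetical simplifications for illustration.

```python
from enum import Enum

class Surface(Enum):
    WEB = "web"
    NATIVE = "native"   # desktop apps, simulators, embedded or legacy UIs

def choose_tool(has_api_or_cli: bool, surface: Surface) -> str:
    """Prefer the most precise tool; computer use is the fallback."""
    if has_api_or_cli:
        return "Bash / API / MCP"
    if surface is Surface.WEB:
        return "Playwright MCP"
    return "Computer use"
```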
Computer use is expensive:
Estimate before using: If a GUI flow has N steps, expect N × (screenshot tokens + generation tokens). For flows > 20 steps, consider whether a shell/API approach exists.
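The rule of thumb above amounts to simple arithmetic, sketched below. The per-step token counts in the demo are made-up placeholders, not measurements. Substitute figures observed in your own flows.

```python
def estimate_flow_tokens(steps: int, screenshot_tokens: int, generation_tokens: int) -> int:
    """N x (screenshot tokens + generation tokens)."""
    if min(steps, screenshot_tokens, generation_tokens) < 0:
        raise ValueError("inputs must be non-negative")
    return steps * (screenshot_tokens + generation_tokens)

def worth_looking_for_api(steps: int, threshold: int = 20) -> bool:
    """Flows longer than the threshold should first be checked for a shell/API route."""
    return steps > threshold

if __name__ == "__main__":
    steps = 25  # hypothetical flow length
    total = estimate_flow_tokens(steps, screenshot_tokens=1500, generation_tokens=200)  # placeholder figures
    print(total, worth_looking_for_api(steps))
```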
Computer use requires the Claude Desktop app (not the CLI or the web app). The Desktop app has the screen capture and input simulation capabilities that the CLI lacks.
CLI: ❌ Computer use not available
Web: ❌ Computer use not available
Desktop: ✅ Computer use available