plugins/phantom/skills/computer-control/SKILL.md
Automates desktop GUI workflows via computer use API with screenshot capture. Use when scripting GUI interactions or recording browser sessions for tutorials.
Use Claude's Computer Use API to see and control desktop environments through screenshots and mouse/keyboard actions.
Why this stays opt-in. Per docs/inclusive-defaults.md (TRUE-exception category 4), Computer Use takes screenshots and synthesizes keyboard/mouse input — cross-process side effects that must always be explicitly invoked, never default-on.
The computer use system has three layers:
1. Display Toolkit (phantom.display) - executes OS-level actions via xdotool/scrot on the real or virtual display
2. Agent Loop (phantom.loop) - manages the conversation cycle between the Claude API and the display toolkit
3. CLI (phantom.cli) - command-line interface for running tasks or checking environment readiness

User Task
    |
    v
Agent Loop  <---->  Claude API (beta)
    |                    |
    v                    v
Display Toolkit     tool_use responses
    |               (click, type, screenshot)
    v
OS Commands (xdotool, scrot)
    |
    v
Display (X11 / Xvfb / WSLg)
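
The cycle in the diagram can be sketched as a dry run. This is a minimal illustration, not phantom's implementation: `to_command`, `run_sketch_loop`, the action dictionary shape, and the screenshot path are all hypothetical, and the "model" is a scripted stand-in for the Claude API. Commands are only formatted as strings, never executed.

```python
from typing import Callable, Optional

# Hypothetical action shape, e.g. {"action": "left_click", "coordinate": [x, y]}
Action = dict


def to_command(action: Action) -> list[str]:
    """Translate one action into an xdotool/scrot argv list (nothing is executed)."""
    kind = action["action"]
    if kind == "screenshot":
        return ["scrot", "/tmp/screen.png"]  # output path is an assumption
    if kind == "left_click":
        x, y = action["coordinate"]
        return ["xdotool", "mousemove", str(x), str(y), "click", "1"]
    if kind == "mouse_move":
        x, y = action["coordinate"]
        return ["xdotool", "mousemove", str(x), str(y)]
    if kind == "type":
        return ["xdotool", "type", action["text"]]
    if kind == "key":
        return ["xdotool", "key", action["text"]]
    raise ValueError(f"unsupported action: {kind}")


def run_sketch_loop(
    next_action: Callable[[list], Optional[Action]],
    execute: Callable[[list[str]], str],
    max_iterations: int = 10,
) -> tuple[int, list]:
    """Ask for an action, execute it, record the result, and repeat until done."""
    history: list = []
    for iteration in range(1, max_iterations + 1):
        action = next_action(history)  # stands in for one Claude API call
        if action is None:  # no tool request in the reply: the task is finished
            return iteration, history
        result = execute(to_command(action))
        history.append((action, result))  # would be fed back as the tool result
    return max_iterations, history  # iteration cap reached


# Dry run: a scripted "model" and an executor that only joins the argv list.
script = iter([
    {"action": "left_click", "coordinate": [100, 200]},
    {"action": "type", "text": "hello"},
    None,
])
iterations, history = run_sketch_loop(lambda _history: next(script), " ".join)
for _action, command in history:
    print(command)
```

Passing the executor in as a function keeps the loop testable without a display; a real executor would run the argv list and return a fresh screenshot instead of a string.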
cd plugins/phantom

# Check environment readiness
uv run python -m phantom.cli --check

# Run a task
export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"
from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop

result = run_loop(
    task="Take a screenshot of the desktop",
    api_key="sk-ant-...",
    loop_config=LoopConfig(
        model="claude-sonnet-4-6",
        max_iterations=10,
    ),
    display_config=DisplayConfig(width=1920, height=1080),
)
print(f"Done in {result.iterations} iterations")
print(result.final_text)
| Model | Tool Version | Beta Flag |
|-------|-------------|-----------|
| Opus 4.6, Sonnet 4.6, Opus 4.5 | computer_20251124 | computer-use-2025-11-24 |
| Sonnet 4.5, Haiku 4.5, older | computer_20250124 | computer-use-2025-01-24 |
The resolve_tool_version() function handles this mapping
automatically based on the model name.
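
As a rough illustration of the table, the mapping could look like the sketch below. It is not the actual `resolve_tool_version()` source: the function name `sketch_resolve_tool_version`, the substring matching rule, and the assumption that model names contain markers such as `opus-4-6` are all hypothetical. Only the tool versions and beta flags are taken from the table.

```python
# Tool versions and beta flags copied from the compatibility table.
NEWER = ("computer_20251124", "computer-use-2025-11-24")
OLDER = ("computer_20250124", "computer-use-2025-01-24")

# Assumed name fragments for Opus 4.6, Sonnet 4.6, and Opus 4.5.
NEWER_MODEL_MARKERS = ("opus-4-6", "sonnet-4-6", "opus-4-5")


def sketch_resolve_tool_version(model: str) -> tuple[str, str]:
    """Return (tool_version, beta_flag) for a model name, defaulting to the older pair."""
    name = model.lower()
    if any(marker in name for marker in NEWER_MODEL_MARKERS):
        return NEWER
    return OLDER


tool_version, beta_flag = sketch_resolve_tool_version("claude-sonnet-4-6")
print(tool_version, beta_flag)
```

Falling back to the older pair for unrecognized names mirrors the "older" entry in the table's second row.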
All versions:
- screenshot - capture display
- left_click - click at [x, y]
- type - type text string
- key - press key combo (e.g., ctrl+s)
- mouse_move - move cursor

Enhanced (20250124+):
- scroll - scroll with direction and amount
- left_click_drag - drag between coordinates
- right_click, middle_click, double_click, triple_click
- hold_key - hold key for duration
- wait - pause between actions

Latest (20251124):
- zoom - inspect screen region at full resolution

Computer use carries risks. Follow these guidelines:
- Set max_iterations to prevent runaway API costs
- on_action

Linux (native or WSL2 with WSLg):
sudo apt install xdotool scrot xclip
Headless (Docker/CI):
# Install Xvfb for virtual display
sudo apt install xvfb xdotool scrot xclip
Xvfb :1 -screen 0 1920x1080x24 &
export DISPLAY=:1
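
Before running a task, the setup above can be sanity-checked with a few lines of Python. This is a minimal sketch and not what `phantom.cli --check` does internally; the function name `check_environment` and its parameters are hypothetical. It only tests that `DISPLAY` is set and that the three packages installed above are on `PATH`.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Tools installed by the apt commands above.
REQUIRED_TOOLS = ("xdotool", "scrot", "xclip")


def check_environment(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


issues = check_environment()
print("ready" if not issues else "\n".join(issues))
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.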