mediar-ai/packages/skills/skills/desktop-computer-automation

Name: packages/skills/skills/desktop-computer-automation
Author: mediar-ai

packages/skills/skills/desktop-computer-automation/SKILL.md

npx skillsauth add mediar-ai/skillhubz packages/skills/skills/desktop-computer-automation

Desktop Computer Automation

CRITICAL RULES:

1. Never run midscene commands in the background. Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action.

2. Run only one midscene command at a time. Wait for the previous command to finish, read the screenshot, then decide the next action.

3. Allow enough time for each command to complete. Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands.

4. Always report task results before finishing.

Control your desktop (macOS, Windows, Linux) using npx @midscene/computer@1. Each CLI command maps directly to an MCP tool -- you (the AI agent) act as the brain, deciding which actions to take based on screenshots.
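The rules about synchronous, one-at-a-time execution amount to running each command in the foreground and checking its result before issuing the next. A minimal sketch, assuming a POSIX shell: `midscene_sync` and the `MIDSCENE_BIN` override are hypothetical names introduced here for illustration (the override exists only so the wrapper can be dry-run without touching the desktop), and only the `npx @midscene/computer@1` invocation comes from the skill itself.

```shell
# Hypothetical wrapper, not part of the Midscene CLI.
# MIDSCENE_BIN is an assumed override hook for dry runs; unset it for real use.
midscene_sync() {
  status=0
  if [ -n "${MIDSCENE_BIN:-}" ]; then
    "$MIDSCENE_BIN" "$@" || status=$?
  else
    # Foreground call with no trailing '&': the shell blocks until the command exits.
    npx @midscene/computer@1 "$@" || status=$?
  fi
  if [ "$status" -ne 0 ]; then
    echo "midscene ${1:-} exited with status $status; stop and inspect before continuing" >&2
  fi
  return "$status"
}
```

Calling `midscene_sync take_screenshot` blocks until the screenshot command exits, so its output can be read before the next step is chosen. The wrapper imposes no time limit of its own; because these commands can take longer than typical shell commands, give the calling tool a generous timeout instead.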

Prerequisites

Midscene requires models with strong visual grounding capabilities. Configure these environment variables:

MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
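Before connecting, it can help to confirm that all four variables are actually set in the current shell. The following is a sketch of such a pre-flight check, assuming a POSIX shell; the function name `check_midscene_env` is hypothetical, and the check only tests for non-empty values, not for valid ones.

```shell
# Hypothetical pre-flight check; the variable names are the four listed above.
check_midscene_env() {
  missing=""
  for name in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
              MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    # Indirect lookup; eval is acceptable here because the names are fixed literals.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing:$missing" >&2
    return 1
  fi
  echo "All Midscene model variables are set"
}
```

The function prints only variable names, never their values, so the API key does not end up in logs.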

Commands

Connect to Desktop

npx @midscene/computer@1 connect
npx @midscene/computer@1 connect --displayId <id>

List Displays

npx @midscene/computer@1 list_displays

Take Screenshot

npx @midscene/computer@1 take_screenshot

Perform Action

Use act to interact with the computer. Describe what you want to do in natural language:

npx @midscene/computer@1 act --prompt "type hello world in the search field and press Enter"
npx @midscene/computer@1 act --prompt "drag the file icon to the Trash"
npx @midscene/computer@1 act --prompt "search for the weather in Shanghai using the Chrome browser, tell me the result"

Disconnect

npx @midscene/computer@1 disconnect

Workflow Pattern

1. Connect to establish a session.
2. Health check -- take a screenshot and verify it succeeds, then move the mouse to a random position.
3. Launch the target app and take a screenshot to see the current state.
4. Use act to perform the desired action.
5. Disconnect when done.
6. Report results -- summarize what was accomplished.
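The workflow above can be sketched as a single shell function for the simple case of one task prompt. This is an illustration under stated assumptions, not the skill's own script: `midscene` and `run_desktop_task` are hypothetical names, the health-check prompt wording is made up, and launching the target app is left to the caller. In real use the agent reads each screenshot before deciding the next action, so a fixed sequence like this only fits tasks that need no intermediate decisions.

```shell
# Thin wrapper around the CLI invocation shown in the Commands section.
midscene() {
  npx @midscene/computer@1 "$@"
}

# Hypothetical helper: run the workflow for one task prompt,
# one command at a time, stopping at the first failure.
run_desktop_task() {
  task_prompt=$1
  midscene connect || return 1
  if midscene take_screenshot &&
     midscene act --prompt "move the mouse to a random position" &&
     midscene take_screenshot &&
     midscene act --prompt "$task_prompt"; then
    result="succeeded"
  else
    result="failed"
  fi
  # Always disconnect, even when an earlier step failed.
  midscene disconnect
  # Report the outcome before finishing.
  echo "Task ${result}: ${task_prompt}"
  [ "$result" = "succeeded" ]
}
```

A call such as `run_desktop_task "drag the file icon to the Trash"` then runs the steps strictly in sequence and returns a non-zero status if any step fails.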

Best Practices

Always run a health check first after connecting
Bring the target app to the foreground before using this skill (e.g., open -a <AppName> on macOS)
Be specific about UI elements: Say "the red close button in the top-left corner" instead of "the close button"
Describe locations when possible: "the icon in the top-right corner of the menu bar"
Never run in background: Every midscene command must run synchronously
Check for multiple displays: Use list_displays if an app is not visible
Batch related operations into a single act command when possible
Set up PATH before running (macOS): export PATH="/usr/sbin:/usr/bin:/bin:/sbin:$PATH"
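The PATH export above prepends the system directories unconditionally, so repeating it in one session keeps lengthening PATH. A variant that adds each directory only when it is missing might look like the sketch below; `ensure_system_path` is a hypothetical helper, and unlike the plain export it leaves the position of directories that are already present unchanged.

```shell
# Hypothetical helper: prepend each system directory only if PATH lacks it.
# Directories are visited in reverse so that, when all are missing,
# the result starts with /usr/sbin:/usr/bin:/bin:/sbin as in the export above.
ensure_system_path() {
  for dir in /sbin /bin /usr/bin /usr/sbin; do
    case ":$PATH:" in
      *":$dir:"*) ;;              # already present, leave PATH alone
      *) PATH="$dir:$PATH" ;;
    esac
  done
  export PATH
}
```

Calling it more than once is harmless, since a second call finds every directory already present.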

Troubleshooting

macOS: Accessibility Permission Denied

Open System Settings > Privacy & Security > Accessibility and add your terminal app.

macOS: Xcode Command Line Tools Not Found

xcode-select --install

API Key Not Set

Check that the .env file contains MIDSCENE_MODEL_API_KEY=<your-key>.
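One way to run this check without printing the key is a quiet grep. The sketch below assumes the .env file uses plain `NAME=value` lines with no `export` prefix; `has_api_key` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the file assigns a non-empty
# MIDSCENE_MODEL_API_KEY. The -q flag keeps the key out of the terminal output.
has_api_key() {
  env_file=${1:-.env}
  [ -f "$env_file" ] && grep -Eq '^MIDSCENE_MODEL_API_KEY=.+' "$env_file"
}
```

For example, `has_api_key .env || echo "API key missing"` reports a missing or empty entry. A value consisting only of quotes, such as `""`, still counts as non-empty in this sketch.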

AI Cannot Find the Element

Take a screenshot to verify the element is actually visible
Use more specific descriptions (include color, position, surrounding text)
Ensure the element is not hidden behind another window

Stars: 4
Category: tools
Updated: Apr 18, 2026

Install this skill globally with the single npx skillsauth add command shown above. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 18, 2026, 6:43 AM (27.9s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/mediar-ai/skillhubz.git

# Copy into Claude Code skills folder (global)
cp -r skillhubz/packages/skills/skills/desktop-computer-automation ~/.claude/skills/

Repository: mediar-ai/skillhubz (4 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT