Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

ranbot-ai/computer-use-agents

Name: computer-use-agents
Author: ranbot-ai

skills/computer-use-agents/SKILL.md

npx skillsauth add ranbot-ai/awesome-skills computer-use-agents

Computer Use Agents

Build AI agents that interact with computers like humans do: viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives, with a critical focus on sandboxing, security, and the unique challenges of vision-based control.

Patterns

Perception-Reasoning-Action Loop

The fundamental architecture of computer use agents: observe screen, reason about next action, execute action, repeat. This loop integrates vision models with action execution through an iterative pipeline.

Key components:

PERCEPTION: Screenshot captures current screen state
REASONING: Vision-language model analyzes and plans
ACTION: Execute mouse/keyboard operations
FEEDBACK: Observe result, continue or correct
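
The four components above can be sketched as a small driver loop. This is a minimal illustration, not the skill's own implementation: `run_loop` and its `perceive`, `reason`, and `act` parameters are hypothetical names, and the stand-in callables in the usage section replace the real screenshot capture and model call so the control flow can be followed without a screen or an API key.

```python
from typing import Any, Callable, Dict, List

Action = Dict[str, Any]


def run_loop(
    perceive: Callable[[], Any],
    reason: Callable[[Any, List[Action]], Action],
    act: Callable[[Action], Dict[str, Any]],
    max_steps: int = 50,
) -> Dict[str, Any]:
    """Drive the perceive -> reason -> act cycle until "done" or the step cap."""
    history: List[Action] = []
    for step in range(1, max_steps + 1):
        observation = perceive()               # PERCEPTION
        action = reason(observation, history)  # REASONING
        if action.get("type") == "done":
            return {"status": "done", "steps": step, "result": action.get("result")}
        outcome = act(action)                  # ACTION
        # FEEDBACK: keep the outcome so the next reasoning step can correct course
        history.append({**action, "outcome": outcome})
    return {"status": "max_steps_reached", "steps": max_steps, "result": None}


# Usage with stand-in callables (no screen or model involved)
def fake_reason(observation: Any, history: List[Action]) -> Action:
    if len(history) < 2:
        return {"type": "click", "x": 10, "y": 20}
    return {"type": "done", "result": "clicked twice"}


result = run_loop(lambda: "fake screenshot", fake_reason, lambda a: {"success": True})
print(result)  # {'status': 'done', 'steps': 3, 'result': 'clicked twice'}
```

Passing the three stages in as callables keeps the step cap and the "done" check in one place, and makes it possible to exercise the loop with fakes before pointing it at a real desktop.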

Critical insight: Vision agents are completely still during the "thinking" phase (1-5 seconds), creating a detectable pause pattern.
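
To make the pause pattern concrete, the sketch below measures how many gaps between consecutive input events fall inside the 1-5 second band mentioned above. The function names, the timestamp format (seconds as floats), and the idea of using a simple ratio are assumptions made for illustration; this is not a real bot detector.

```python
from typing import List, Sequence


def inter_action_gaps(timestamps: Sequence[float]) -> List[float]:
    """Seconds elapsed between each pair of consecutive events."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def thinking_pause_ratio(
    timestamps: Sequence[float], low: float = 1.0, high: float = 5.0
) -> float:
    """Fraction of gaps that fall inside the assumed "thinking" band [low, high]."""
    gaps = inter_action_gaps(timestamps)
    if not gaps:
        return 0.0
    in_band = sum(1 for gap in gaps if low <= gap <= high)
    return in_band / len(gaps)


# Hypothetical event log: three long pauses, then one quick follow-up action
events = [0.0, 2.5, 5.5, 9.0, 9.2]
print(thinking_pause_ratio(events))  # 0.75
```

A ratio close to 1.0 means nearly every action was preceded by a pause in the band, which is the regularity the insight above describes.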

When to use:

Building any computer use agent from scratch
Integrating vision models with desktop control
Understanding agent behavior patterns

from anthropic import Anthropic
from PIL import Image
import base64
import io
import pyautogui
import time


class ComputerUseAgent:
    """
    Perception-Reasoning-Action loop implementation.
    Based on Anthropic Computer Use patterns.
    """

    def __init__(self, client: Anthropic, model: str = "claude-sonnet-4-20250514"):
        self.client = client
        self.model = model
        self.max_steps = 50  # Prevent runaway loops
        self.action_delay = 0.5  # Seconds between actions

    def capture_screenshot(self) -> str:
        """Capture screen and return base64 encoded image."""
        screenshot = pyautogui.screenshot()
        # Resize for token efficiency (1280x800 is good balance)
        screenshot = screenshot.resize((1280, 800), Image.LANCZOS)

        buffer = io.BytesIO()
        screenshot.save(buffer, format="PNG")
        return base64.b64encode(buffer.getvalue()).decode()

    def execute_action(self, action: dict) -> dict:
        """Execute mouse/keyboard action on the computer."""
        action_type = action.get("type")

        if action_type == "click":
            x, y = action["x"], action["y"]
            button = action.get("button", "left")
            pyautogui.click(x, y, button=button)
            return {"success": True, "action": f"clicked at ({x}, {y})"}

        elif action_type == "type":
            text = action["text"]
            pyautogui.typewrite(text, interval=0.02)
            return {"success": True, "action": f"typed {len(text)} chars"}

        elif action_type == "key":
            key = action["key"]
            pyautogui.press(key)
            return {"success": True, "action": f"pressed {key}"}

        elif action_type == "scroll":
            direction = action.get("direction", "down")
            amount = action.get("amount", 3)
            scroll = -amount if direction == "down" else amount
            pyautogui.scroll(scroll)
            return {"success": True, "action": f"scrolled {direction}"}

        elif action_type == "move":
            x, y = action["x"], action["y"]
            pyautogui.moveTo(x, y)
            return {"success": True, "action": f"moved to ({x}, {y})"}

        else:
            return {"success": False, "error": f"Unknown action: {action_type}"}

    def run(self, task: str) -> dict:
        """
        Run perception-reasoning-action loop until task complete.

        The loop:
        1. Screenshot current state
        2. Send to vision model with task context
        3. Parse action from response
        4. Execute action
        5. Repeat until done or max steps
        """
        messages = []
        step_count = 0

        system_prompt = """You are a computer use agent. You can see the screen
        and control mouse/keyboard.

        Available actions (respond with JSON):
        - {"type": "click", "x": 100, "y": 200, "button": "left"}
        - {"type": "type", "text": "hello world"}
        - {"type": "key", "key": "enter"}
        - {"type": "scroll", "direction": "down", "amount": 3}
        - {"type": "done", "result": "task completed successfully"}

        Always respond with ONLY a JSON action object.
        Be precise with coordinates - click exactly where needed.
        If you see an error, try to recover.
        """

        while step_count < self.max_steps:
            step_count += 1

            # 1. PERCEPTION: Capture current screen
            screenshot_b64 = self.capture_screenshot()

            # 2. REASONING: Send to vision model
            user_content = [
                {"type": "text", "text": f"Task: {task}\n\nStep {step_count}. What action should I take?"},
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": screenshot_b64
                }}
            ]

            me
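
The listing breaks off before steps 3 to 5 of the `run` docstring. The sketch below covers only the parsing step and one detail the visible code does not handle: the screenshot is resized to 1280x800 before it is sent to the model, yet `execute_action` passes the model's coordinates straight to `pyautogui.click`, so on a screen of any other resolution they would need to be scaled back. `parse_action` and `to_screen_coords` are hypothetical helper names, and treating the reply as free text with a JSON object somewhere inside it is an assumption, not something the source confirms.

```python
import json
from typing import Any, Dict, Tuple

MODEL_SIZE = (1280, 800)  # size the listing resizes screenshots to


def parse_action(reply_text: str) -> Dict[str, Any]:
    """Parse the JSON object spanning the first '{' to the last '}' in a reply."""
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end <= start:
        return {"type": "error", "error": "no JSON object in reply"}
    try:
        action = json.loads(reply_text[start:end + 1])
    except json.JSONDecodeError as exc:
        return {"type": "error", "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(action, dict) or "type" not in action:
        return {"type": "error", "error": "missing 'type' field"}
    return action


def to_screen_coords(
    x: float,
    y: float,
    screen_size: Tuple[int, int],
    model_size: Tuple[int, int] = MODEL_SIZE,
) -> Tuple[int, int]:
    """Map a point in the resized screenshot back to real screen pixels."""
    scale_x = screen_size[0] / model_size[0]
    scale_y = screen_size[1] / model_size[1]
    return round(x * scale_x), round(y * scale_y)


# Usage: a reply with extra text around the JSON, on an assumed 2560x1600 screen
action = parse_action('Next step: {"type": "click", "x": 640, "y": 400}')
print(action)                                                      # {'type': 'click', 'x': 640, 'y': 400}
print(to_screen_coords(action["x"], action["y"], (2560, 1600)))    # (1280, 800)
```

In a live agent the `screen_size` argument could come from `pyautogui.size()`. Returning an error action instead of raising lets the loop feed the problem back to the model on the next step.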

4 stars

tools

Updated Apr 22, 2026

npx skillsauth add ranbot-ai/awesome-skills computer-use-agents

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report:

Trivy: Clean (95%). Container and dependency vulnerability scanner.
Semgrep: Clean (95%). Static code analysis for vulnerabilities.
mcp-scan (Snyk): Clean (95%). Model Context Protocol security validation.
Snyk (dep): Skipped (50%). Open source security scanning.
Socket.dev: Skipped (50%). Supply chain security analysis.
VirusTotal: Skipped (50%). Multi-engine malware detection.
CrowdStrike: Skipped (50%). Advanced threat intelligence.
OSV-Scanner: Skipped (50%). Open Source Vulnerability database check.
OWASP Dep-Check: Skipped (50%).

Last scanned: Apr 22, 2026, 2:13 PM (74.4s, 1 file scanned)

SKILL.md

name: computer-use-agents
description: Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-sourc
category: AI & Agents
source: antigravity
tags: [python, react, api, mcp, claude, ai, agent, llm, gpt, automation]
url: https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/computer-use-agents

Install manually

# Clone the repo
git clone https://github.com/ranbot-ai/awesome-skills.git

# Copy into Claude Code skills folder (global)
cp -r awesome-skills/skills/computer-use-agents ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository: ranbot-ai/awesome-skills (4 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT