.agents/skills/cua/SKILL.md
Computer Use Agent — autonomous browser control via Steel, Browser Use, and Browserbase cloud providers. Supports headful CDP sessions, CAPTCHA solving, proxy rotation, live view, session recording/replay, Playwright/Puppeteer integration, and LLM-driven visual browsing. Use when asked about computer use, browser automation, web scraping, Steel sessions, cloud browsers, headful browsing, CAPTCHA solving, or autonomous web navigation.
npx skillsauth add x402agent/solana-clawd cua
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Autonomous browser control powered by three cloud browser providers: Steel, Browser Use, and Browserbase. Each offers CDP-based headful browser sessions for Playwright/Puppeteer automation, LLM-driven visual browsing, and human-in-the-loop workflows.
# Environment setup
export STEEL_API_KEY="ste-..."
export BROWSERUSE_API_KEY="bu_..."
export BROWSERBASE_API_KEY="bb_live_..."
export BROWSERBASE_PROJECT_ID="446f4b4c-..."
# Daemon commands
/cua new # Create a Steel session (default provider)
/cua new --provider browserbase # Create a Browserbase session
/cua new --provider browseruse # Create a Browser Use session
/cua browse https://example.com # Navigate in active session
/cua screenshot # Capture current page
/cua status # Show session details + live view URL
/cua release # End session
/cua list # List all active sessions
┌──────────────────────────────────────────────────────────────────┐
│ SolanaOS CUA Skill │
│ │
│ ┌──────────────┐ ┌───────────────┐ ┌───────────────────────┐ │
│ │ Steel │ │ Browser Use │ │ Browserbase │ │
│ │ Provider │ │ Provider │ │ Provider │ │
│ │ │ │ │ │ │ │
│ │ • Sessions │ │ • CDP URL │ │ • Sessions │ │
│ │ • Proxy │ │ • Cloud │ │ • Extensions │ │
│ │ • CAPTCHA │ │ • Profiles │ │ • Stealth Mode │ │
│ │ • Live View │ │ • Proxy │ │ • Proxies │ │
│ │ • Recording │ │ │ │ • Contexts │ │
│ │ • Mobile │ │ │ │ • Functions │ │
│ │ • Regions │ │ │ │ • Live View │ │
│ │ • HLS Replay │ │ │ │ • Session Recording │ │
│ └──────┬───────┘ └──────┬────────┘ └──────────┬────────────┘ │
│ │ │ │ │
│ ┌──────┴──────────────────┴───────────────────────┴───────────┐ │
│ │ CDP (Chrome DevTools Protocol) │ │
│ │ WebSocket Connection │ │
│ └──────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────┴──────────────────────────────────┐ │
│ │ Playwright / Puppeteer / Selenium │ │
│ │ │ │
│ │ • Page navigation • Form automation │ │
│ │ • Element interaction • Screenshot capture │ │
│ │ • File download/upload • PDF generation │ │
│ │ • Network interception • Cookie management │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ LLM Vision Agent (Optional) │ │
│ │ │ │
│ │ Screenshot → Vision LLM → Action Plan → Playwright Exec │ │
│ │ │ │
│ │ Supported: Claude, GPT-4o, Gemini, Llama 3.2, Qwen 2.5 VL │ │
│ └──────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
| Feature | Steel | Browser Use | Browserbase |
|---------|-------|-------------|-------------|
| CDP WebSocket | ✅ | ✅ | ✅ |
| Playwright | ✅ | ✅ | ✅ |
| Puppeteer | ✅ | ✅ | ✅ |
| Selenium | ✅ | ❌ | ✅ |
| Live View | ✅ WebRTC | ❌ | ✅ iframe |
| Session Recording | ✅ HLS/MP4 | ❌ | ✅ rrweb |
| Proxy | ✅ Residential | ✅ | ✅ |
| CAPTCHA Solving | ✅ Auto | ❌ | ✅ Auto |
| Stealth Mode | ✅ Fingerprint | ❌ | ✅ Advanced |
| Mobile Mode | ✅ | ❌ | ✅ Viewport |
| Multi-Region | ✅ LAX/ORD/IAD | ❌ | ✅ Multiple |
| Extensions | ❌ | ❌ | ✅ |
| Contexts (persist state) | ❌ | ✅ Profiles | ✅ |
| Keep Alive | ❌ | ❌ | ✅ |
| Functions (serverless) | ❌ | ❌ | ✅ |
| Max Timeout | 24h | N/A | Plan-based |
| Human-in-the-loop | ✅ interactive | ❌ | ✅ interactive |
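The feature matrix above can also be encoded as data so that a provider is chosen from the capabilities a task needs. The following is a minimal sketch, not part of the skill itself: the names `ProviderName`, `FeatureName`, `SUPPORT`, and `providersSupporting` are hypothetical, and only a subset of the table rows is included.

```typescript
// Hypothetical identifiers; the support lists are copied from the feature table.
type ProviderName = "steel" | "browseruse" | "browserbase";
type FeatureName =
  | "selenium" | "liveView" | "recording" | "captcha"
  | "stealth" | "extensions" | "contexts" | "keepAlive";

const SUPPORT: Record<FeatureName, ProviderName[]> = {
  selenium: ["steel", "browserbase"],
  liveView: ["steel", "browserbase"],
  recording: ["steel", "browserbase"],
  captcha: ["steel", "browserbase"],
  stealth: ["steel", "browserbase"],
  extensions: ["browserbase"],
  contexts: ["browseruse", "browserbase"],
  keepAlive: ["browserbase"],
};

// Return every provider that supports all of the required features.
function providersSupporting(required: FeatureName[]): ProviderName[] {
  const all: ProviderName[] = ["steel", "browseruse", "browserbase"];
  return all.filter((p) =>
    required.every((f) => SUPPORT[f].indexOf(p) !== -1)
  );
}
```

For example, `providersSupporting(["captcha", "liveView"])` yields Steel and Browserbase, while adding `"extensions"` narrows the result to Browserbase alone.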
// One-line Playwright connection (simplest)
browser := chromium.ConnectOverCDP("wss://connect.steel.dev?apiKey=YOUR_KEY")
// Or with full options
session := steel.Create(ctx, &steel.CreateOptions{
UseProxy: true,
SolveCaptcha: true,
Timeout: 600000, // 10 minutes
Region: "lax",
DeviceConfig: &steel.DeviceConfig{Device: "mobile"},
})
CREATE → LIVE → [automation] → RELEASE
↓
FAILED (crash/timeout)
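The lifecycle above can be read as a small state machine. The sketch below is illustrative only: the type and function names are hypothetical, and it assumes the downward arrow to FAILED starts at LIVE, which the flattened diagram does not make certain.

```typescript
// Hypothetical model of the session lifecycle diagram.
type SessionState = "CREATE" | "LIVE" | "RELEASE" | "FAILED";

// Assumption: FAILED is reached from LIVE (crash or timeout); RELEASE and FAILED are terminal.
const TRANSITIONS: Record<SessionState, SessionState[]> = {
  CREATE: ["LIVE"],
  LIVE: ["RELEASE", "FAILED"],
  RELEASE: [],
  FAILED: [],
};

function canTransition(from: SessionState, to: SessionState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Move to the next state, rejecting transitions the diagram does not show.
function advance(from: SessionState, to: SessionState): SessionState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal session transition: ${from} -> ${to}`);
  }
  return to;
}
```

Tracking state this way makes it harder to run automation against a session that has already been released or has failed.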
page.evaluate(() => undefined)
Release() when done
// Embed live view (read-only)
iframe := session.DebugURL + "?interactive=false"
// Embed with control (human-in-the-loop)
iframe := session.DebugURL + "?interactive=true&showControls=true"
// Get HLS recording playlist
playlist := steel.HLSPlaylist(session.ID)
// Use with hls.js or native Safari HLS
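The live view snippets above append query parameters by string concatenation, which breaks if the debug URL already carries a query string. A small TypeScript helper, sketched here with the hypothetical name `liveViewUrl`, builds the same two variants using the standard `URL` class; the `interactive` and `showControls` parameters are the ones shown above.

```typescript
// Hypothetical helper: build a read-only or interactive live view URL.
function liveViewUrl(debugUrl: string, interactive: boolean): string {
  const url = new URL(debugUrl);
  url.searchParams.set("interactive", String(interactive));
  if (interactive) {
    // Controls are only shown for human-in-the-loop embeds.
    url.searchParams.set("showControls", "true");
  }
  return url.toString();
}
```

The returned string can be used as the `src` of an iframe. The debug URL itself comes from the session object and is not constructed here.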
| Region | Code | Location |
|--------|------|----------|
| Los Angeles | lax | Los Angeles, USA |
| Chicago | ord | Chicago, USA |
| Washington DC | iad | Washington DC, USA |
// Via Browserbase API
session := browserbase.CreateSession(ctx, cfg, "default")
// Returns CDP URL for Playwright connection
browser := chromium.ConnectOverCDP(session.CDPURL)
context := browser.Contexts()[0]
page := context.Pages()[0]
Every Browserbase session is recorded and can be inspected.
// Create cloud browser via API
session := browserUseProvider.CreateSession(ctx, cfg, "default")
// Returns CDP URL
// Or via CLI
browser-use cloud connect
// List sessions
browser-use sessions
// Close specific session
browser-use --session mySession close
// Close all sessions
browser-use close --all
| Command | Description |
|---------|-------------|
| /cua new [--provider steel\|browserbase\|browseruse] | Create session |
| /cua browse <url> | Navigate to URL (via shell Playwright script) |
| /cua screenshot | Capture current page |
| /cua status | Show active session details |
| /cua release | End current session |
| /cua release --all | End all sessions |
| /cua list | List active sessions |
| /cua live | Get live view URL |
| /cua replay <session-id> | Get HLS replay URL |
| /cua agent <objective> | Start autonomous CUA agent |
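To illustrate how the commands in the table might be parsed, here is a minimal sketch. It is not the daemon's actual parser: `CuaCommand`, `parseCuaCommand`, and the field names are hypothetical. The Steel default for `new` follows the daemon command listing earlier in this document.

```typescript
type CuaProvider = "steel" | "browserbase" | "browseruse";

// Hypothetical parsed form of a /cua command line.
interface CuaCommand {
  verb: string;
  provider?: CuaProvider;
  all: boolean;
  arg?: string; // URL, session ID, or objective, depending on the verb
}

const VERBS = ["new", "browse", "screenshot", "status", "release", "list", "live", "replay", "agent"];
const PROVIDERS: CuaProvider[] = ["steel", "browserbase", "browseruse"];

function parseCuaCommand(line: string): CuaCommand {
  const [prefix, verb, ...rest] = line.trim().split(/\s+/);
  if (prefix !== "/cua" || verb === undefined || VERBS.indexOf(verb) === -1) {
    throw new Error(`Unrecognized command: ${line}`);
  }
  const cmd: CuaCommand = { verb, all: false };
  const positional: string[] = [];
  let expectProvider = false;
  for (const token of rest) {
    if (expectProvider) {
      const provider = PROVIDERS.find((p) => p === token);
      if (provider === undefined) throw new Error(`Unknown provider: ${token}`);
      cmd.provider = provider;
      expectProvider = false;
    } else if (token === "--provider") {
      expectProvider = true;
    } else if (token === "--all") {
      cmd.all = true;
    } else {
      positional.push(token);
    }
  }
  if (expectProvider) throw new Error("--provider requires a value");
  // Steel is the default provider for new sessions.
  if (verb === "new" && cmd.provider === undefined) cmd.provider = "steel";
  if (positional.length > 0) cmd.arg = positional.join(" ");
  return cmd;
}
```

Remaining positional tokens are joined back together, so a multi-word objective such as `/cua agent find the pricing page` arrives as a single `arg`.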
The CUA agent combines cloud browser sessions with vision LLMs for autonomous web navigation.
1. CREATE SESSION → Steel/Browserbase/BrowserUse
2. SCREENSHOT → Capture page via CDP
3. VISION ANALYSIS → Send to Claude/GPT-4o/Gemini
4. ACTION PLANNING → LLM determines next step
5. PLAYWRIGHT EXEC → Execute action via CDP
6. REPEAT → Until objective complete or max steps
7. RELEASE → Clean up session
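The loop above can be sketched with its dependencies injected, which keeps the control flow testable without a real browser or model. This is an assumption-laden outline rather than the skill's implementation: `AgentDeps`, `LoopAction`, and `runAgent` are hypothetical names, and session creation (step 1) is assumed to have happened before the call.

```typescript
// Hypothetical shape of one planned step, mirroring the JSON the agent returns.
interface LoopAction {
  action: string;
  args: Record<string, unknown>;
  reasoning: string;
}

// Hypothetical dependencies; real versions would wrap CDP/Playwright and a vision LLM.
interface AgentDeps {
  screenshot: () => Promise<string>;                                 // step 2
  decide: (objective: string, image: string) => Promise<LoopAction>; // steps 3-4
  execute: (action: LoopAction) => Promise<void>;                    // step 5
  release: () => Promise<void>;                                      // step 7
}

async function runAgent(
  objective: string,
  deps: AgentDeps,
  maxSteps: number
): Promise<LoopAction[]> {
  const taken: LoopAction[] = [];
  try {
    // Step 6: repeat until the model reports DONE or the step budget runs out.
    for (let step = 0; step < maxSteps; step++) {
      const image = await deps.screenshot();
      const next = await deps.decide(objective, image);
      taken.push(next);
      if (next.action === "DONE") break;
      await deps.execute(next);
    }
  } finally {
    // The session is released even if a step throws.
    await deps.release();
  }
  return taken;
}
```

Putting the release call in `finally` means a failed screenshot or action does not leave a billable session running.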
You are a Computer Use Agent controlling a cloud browser via Playwright.
OBJECTIVE: {objective}
Current page screenshot is attached. Analyze the page and determine the next action.
Available actions:
- page.goto(url) — Navigate to URL
- page.click(selector) — Click element
- page.fill(selector, text) — Fill input
- page.press(selector, key) — Press key
- page.screenshot() — Capture page
- page.waitForSelector(selector) — Wait for element
- page.evaluate(js) — Run JavaScript
- DONE — Task is complete
Respond with JSON: {"action": "...", "args": {...}, "reasoning": "..."}
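Because the model's reply is free text, it is worth validating before anything is executed. The sketch below checks the JSON shape requested by the prompt against the listed actions. `AgentAction` and `parseAgentAction` are hypothetical names, and treating missing `args` or `reasoning` as empty is a design choice made here, not something the source specifies.

```typescript
// Actions listed in the system prompt above.
const ALLOWED_ACTIONS = [
  "page.goto", "page.click", "page.fill", "page.press",
  "page.screenshot", "page.waitForSelector", "page.evaluate", "DONE",
];

interface AgentAction {
  action: string;
  args: Record<string, unknown>;
  reasoning: string;
}

// Parse and validate one model reply; throws on malformed JSON or unknown actions.
function parseAgentAction(raw: string): AgentAction {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Response must be a JSON object");
  }
  const { action, args, reasoning } = data as Record<string, unknown>;
  if (typeof action !== "string" || ALLOWED_ACTIONS.indexOf(action) === -1) {
    throw new Error(`Unsupported action: ${String(action)}`);
  }
  const isPlainObject =
    typeof args === "object" && args !== null && !Array.isArray(args);
  return {
    action,
    args: isPlainObject ? (args as Record<string, unknown>) : {},
    reasoning: typeof reasoning === "string" ? reasoning : "",
  };
}
```

Rejecting anything outside the allow-list keeps a confused or manipulated model from triggering calls the prompt never offered. `page.evaluate` still runs arbitrary JavaScript and may deserve extra scrutiny.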
| Variable | Provider | Required | Description |
|----------|----------|----------|-------------|
| STEEL_API_KEY | Steel | For Steel | Steel API key |
| BROWSERUSE_API_KEY | Browser Use | For BU | Browser Use cloud API key |
| BROWSERBASE_API_KEY | Browserbase | For BB | Browserbase API key |
| BROWSERBASE_PROJECT_ID | Browserbase | For BB | Browserbase project ID |
| BROWSERBASE_PROXIES | Browserbase | No | Enable proxy (default: true) |
| BROWSERBASE_ADVANCED_STEALTH | Browserbase | No | Advanced stealth mode |
| BROWSERBASE_KEEP_ALIVE | Browserbase | No | Keep alive after disconnect |
| BROWSERBASE_SESSION_TIMEOUT | Browserbase | No | Session timeout (ms) |
| ANTHROPIC_API_KEY | Vision LLM | No | For Claude vision agent |
| OPENAI_API_KEY | Vision LLM | No | For GPT-4o vision agent |
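A startup check can report which of the required variables in the table are missing for the chosen provider. This is a sketch under the table's "Required" column. `EnvProvider`, `REQUIRED_ENV`, and `missingEnv` are hypothetical names, and the environment is passed in as a plain record so the function has no dependency on Node.

```typescript
type EnvProvider = "steel" | "browseruse" | "browserbase";

// Required variables per provider, taken from the table above.
const REQUIRED_ENV: Record<EnvProvider, string[]> = {
  steel: ["STEEL_API_KEY"],
  browseruse: ["BROWSERUSE_API_KEY"],
  browserbase: ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
};

// Return the names of required variables that are unset or empty.
function missingEnv(
  provider: EnvProvider,
  env: Record<string, string | undefined>
): string[] {
  return REQUIRED_ENV[provider].filter((key) => !env[key]);
}
```

In a Node process this would typically be called as `missingEnv("browserbase", process.env)` before creating a session, failing fast if the returned list is non-empty.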
# Steel
POST /v1/sessions Create session
GET /v1/sessions List sessions
GET /v1/sessions/:id Get session details
POST /v1/sessions/:id/release Release session
POST /v1/sessions/release Release all sessions
GET /v1/sessions/:id/screenshot Capture screenshot
GET /v1/sessions/:id/events Get replay events
GET /v1/sessions/:id/hls Get HLS playlist
WSS connect.steel.dev?apiKey=...&sessionId=... CDP WebSocket
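The CDP WebSocket URL in the last line can be assembled with `URLSearchParams`, so that keys and session IDs are percent-encoded correctly. This is a minimal sketch: `steelConnectUrl` is a hypothetical helper name, and the host and parameter names are the ones listed above.

```typescript
// Hypothetical helper: build the Steel CDP WebSocket URL.
// Omitting sessionId gives the one-line connection form.
function steelConnectUrl(apiKey: string, sessionId?: string): string {
  const params = new URLSearchParams({ apiKey });
  if (sessionId !== undefined) {
    params.set("sessionId", sessionId);
  }
  return `wss://connect.steel.dev?${params.toString()}`;
}
```

The result can be passed directly to Playwright's `chromium.connectOverCDP`. Since the URL embeds the API key, it should be kept out of logs.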
# Browserbase
POST /v1/sessions Create session
GET /v1/sessions/:id Get session details
POST /v1/sessions/:id Close session (REQUEST_RELEASE)
GET /v1/sessions/:id/debug Get debug URLs
# Browser Use
POST /api/v2/browsers Create cloud browser
PATCH /api/v2/browsers/:id Stop browser (action: "stop")
import { chromium } from "playwright-core";
const browser = await chromium.connectOverCDP(
`wss://connect.steel.dev?apiKey=${process.env.STEEL_API_KEY}`
);
import Steel from "steel-sdk";
import { chromium } from "playwright";
const client = new Steel({ steelAPIKey: process.env.STEEL_API_KEY });
const session = await client.sessions.create({
useProxy: true,
solveCaptcha: true,
});
const browser = await chromium.connectOverCDP(
`wss://connect.steel.dev?apiKey=${process.env.STEEL_API_KEY}&sessionId=${session.id}`
);
const page = browser.contexts()[0].pages()[0];
await page.goto("https://example.com");
await browser.close();
await client.sessions.release(session.id);
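In the example above, the release call is only reached if every earlier step succeeds. One way to guarantee cleanup is a small wrapper built on `try`/`finally`. The sketch below uses the hypothetical name `withSession` and takes the create and release functions as parameters, so it is not tied to any one provider's SDK.

```typescript
// Hypothetical helper: run `work` against a session and always release it afterwards.
async function withSession<S, T>(
  create: () => Promise<S>,
  release: (session: S) => Promise<void>,
  work: (session: S) => Promise<T>
): Promise<T> {
  const session = await create();
  try {
    return await work(session);
  } finally {
    // Runs on success and on error, so a failed step does not leak the session.
    await release(session);
  }
}
```

With the Steel client above, this could be invoked as `withSession(() => client.sessions.create({ useProxy: true }), (s) => client.sessions.release(s.id).then(() => undefined), async (s) => { /* connect and automate */ })`. The exact return types of the SDK calls are assumed here.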
import { chromium } from "playwright-core";
import { Browserbase } from "@browserbasehq/sdk";
const bb = new Browserbase({ apiKey: process.env.BROWSERBASE_API_KEY });
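// Pass the project ID listed as required in the environment variable table
const session = await bb.sessions.create({
  projectId: process.env.BROWSERBASE_PROJECT_ID!,
});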
const browser = await chromium.connectOverCDP(session.connectUrl);
const page = browser.contexts()[0].pages()[0];
await page.goto("https://example.com");
import { chromium } from "playwright-core";
// After creating a session via API, connect with CDP URL
const browser = await chromium.connectOverCDP(session.cdpUrl);