# Claw Use Android: Phone Control for AI Agents
Give your AI agent eyes, hands, and a voice on a real Android phone.
`claw-use-android` is an Android app + CLI (`cua`) that exposes HTTP endpoints for full phone control. No ADB, no root, no PC.
## Setup

```bash
# Install the APK on your Android phone and enable the Accessibility Service.
# Then register the device:
cua add redmi 192.168.0.105 <token>
cua ping
```
## New in v2.0.0: Unified API

Three new endpoints replace the scattered old endpoints for AI agent workflows:
`/screen` returns elements with stable integer ref IDs, semantic zone, and role annotations.

```bash
cua screen       # full semantic UI tree (JSON)
cua screen -c    # compact: only interactive/text elements
```
Response:

```json
{
  "package": "com.android.settings",
  "elements": [
    {"ref": 1, "text": "设置", "zone": "header"},
    {"ref": 2, "text": "搜索", "zone": "header", "role": "button", "click": true},
    {"ref": 3, "text": "WLAN", "zone": "content"}
  ]
}
```
The screenshot endpoint returns a base64-encoded JPEG screenshot; `cua snapshot` saves it to a file.

```bash
cua snapshot                   # save screenshot, print path
cua snapshot 50 720 out.jpg    # quality, maxWidth, output
```
`/act` handles all operations through a single entry point, using ref IDs from `/screen`.

```bash
cua act '{"click": 3}'                             # click ref 3
cua act '{"click": "OK"}'                          # click by text (fallback)
cua act '{"click": [1, 2, 3]}'                     # click refs in sequence
cua act '{"tap": {"x": 540, "y": 960}}'            # tap coordinates
cua act '{"type": "hello"}'                        # type into focused field
cua act '{"type": {"ref": 3, "text": "hello"}}'    # focus ref then type
cua act '{"swipe": "up"}'                          # directional swipe
cua act '{"scroll": "down"}'                       # scroll nearest scrollable
cua act '{"back": true}'
cua act '{"home": true}'
cua act '{"recents": true}'
cua act '{"longpress": 3}'                         # long press ref
cua act '{"launch": "com.duolingo"}'
# Multiple actions in one request:
cua act '{"home": true, "back": true}'
```
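Hand-written JSON payloads are easy to get wrong when refs come from a script. The sketch below builds the `{"click": [...]}` form shown above from a list of integer refs. `click_payload` is a hypothetical helper, not part of `cua`; it only prints the payload and rejects anything that is not a plain integer.

```bash
# Hypothetical helper (not part of cua): build a {"click": [...]} payload
# from integer refs, rejecting anything that is not a plain integer.
click_payload() {
  [ "$#" -gt 0 ] || { echo "usage: click_payload REF..." >&2; return 1; }
  list=""
  for ref in "$@"; do
    case "$ref" in
      ''|*[!0-9]*) echo "not an integer ref: $ref" >&2; return 1 ;;
    esac
    list="${list:+$list, }$ref"   # join refs with ", "
  done
  printf '{"click": [%s]}\n' "$list"
}

click_payload 1 2 3   # prints {"click": [1, 2, 3]}
# Usage with the CLI: cua act "$(click_payload 1 2 3)"
```

Refs are only meaningful for the screen they were read from, so it is safest to build the payload right after a fresh `cua screen -c`.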
```bash
# 1. Observe
cua screen -c             # get refs
# 2. Act
cua act '{"click": 5}'    # click ref 5
# 3. Observe again
cua screen -c             # see result
```
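Before performing any phone operation, read `flows.md` (in the same directory as this file).

- Run with `/flow` or a batch script, skipping step-by-step reasoning.
- `{"screen":true}` breakpoint: the agent reads the screen at that step, decides what to do, and then continues.
- Proactively record flows in `flows.md` (mandatory): after completing any multi-step operation, immediately review the step sequence you just ran. If you find a reusable pattern (even if it covers only some of the steps), append it to `flows.md` on the spot. Do not wait for the user to remind you. Recording flows is the agent's responsibility, not the user's.

Benefits of this approach: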
- `/flow` runs on the device with 100 ms polling and does not go through the LLM.

## CLI (`cua`)

All legacy endpoints remain supported alongside the new unified API.
```bash
# Device management
cua add <name> <ip> <token>       # register device with alias
cua devices                       # list all (with live status)
cua use <name>                    # switch default device
cua rm <name>                     # remove device
cua -d <name> <command>           # target specific device
cua discover                      # scan LAN for devices (192.168.x.x:7333)

# Perception
cua screen                        # full UI tree (JSON)
cua screen -c                     # compact: only interactive/text elements
cua screenshot                    # save screenshot, print path
cua screenshot 50 720 out.jpg     # quality, maxWidth, output
cua notifications                 # list all notifications
cua status                        # health dashboard
cua info                          # device model, screen size, permissions

# Input and navigation
cua tap <x> <y>                   # tap coordinates
cua click <text>                  # tap element by visible text
cua longpress <x> <y>             # long press
cua swipe up|down|left|right
cua scroll up|down|left|right
cua type "text"                   # type text (CJK supported)
cua back                          # system back
cua home                          # go home
cua launch <package>              # launch app
cua launch                        # list all apps
cua open <url>                    # open URL
cua call <number>                 # phone call
cua intent '<json>'               # fire Android Intent

# Audio, clipboard, camera
cua tts "hello"                   # speak through phone speaker
cua say "你好"                    # alias
cua clipboard                     # read clipboard
cua clipboard "text"              # write to clipboard
cua camera [front|back] [quality] [output.jpg]   # take photo
cua volume                        # read all volumes
cua volume media 10               # set media volume
cua volume media up               # adjust volume

# Device state
cua battery                       # battery status
cua wifi                          # WiFi info
cua location                      # GPS/network location
cua vibrate [ms]                  # vibrate (default 200ms)

# Contacts, SMS, files
cua contacts [search]             # list/search contacts
cua sms list [limit]              # read SMS
cua sms send <number> <message>   # send SMS
cua file list [path]              # list directory
cua file read <path>              # read file
cua file write <path> <content>   # write file
cua file delete <path>            # delete file

# Lock screen
cua wake                          # wake screen
cua lock                          # lock the screen
cua unlock                        # unlock (PIN required)
cua config pin 123456             # remember lock screen PIN for auto-unlock
cua config pattern 256398         # EXPERIMENTAL: pattern unlock (not yet verified)
```
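```bash
# The "wait" values are the on-screen MIUI button labels:
# 继续安装 = "Continue install", 继续更新 = "Continue update", 完成 = "Done"
cua flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 10000},
    {"wait": "继续更新", "then": "tap", "timeout": 10000},
    {"wait": "完成", "then": "tap", "timeout": 60000, "optional": true}
  ]
}'
```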
Flow runs entirely on the phone with zero LLM calls. The device polls its accessibility tree at 100ms intervals and reacts instantly when the target element appears.
Step fields:

- `wait`: text to find (case-insensitive partial match)
- `waitId`: resource ID to find
- `waitDesc`: content description to find
- `waitGone`: wait for text to DISAPPEAR
- `then`: action, one of `tap`, `click`, `longpress`, `back`, `home`, `none`
- `timeout`: per-step timeout in ms (default 10000)
- `optional`: if true, a timeout doesn't fail the flow
- `pauseMs`: pause after the action before the next step (default 500)

```bash
# Atomic find-and-tap: retries until element appears
curl -X POST "http://<ip>:7333/click" -d '{"text":"继续安装","retry":3,"retryMs":2000}'
```
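The examples in this document only use `wait`, `then`, `timeout`, and `optional`. The sketch below combines the remaining documented fields, `waitGone` and `pauseMs`, in one flow. The on-screen texts ("Download", "Loading", "Open") are placeholders, and pairing `waitGone` with `"then": "none"` is an assumption about how a pure wait step would be written.

```bash
# Hypothetical flow: tap "Download", wait for a "Loading" label to disappear,
# then tap "Open" if it shows up. The texts are placeholders; the field names
# come from the step fields list.
flow_json='{
  "steps": [
    {"wait": "Download", "then": "tap", "timeout": 10000, "pauseMs": 1000},
    {"waitGone": "Loading", "then": "none", "timeout": 30000},
    {"wait": "Open", "then": "tap", "optional": true}
  ]
}'
printf '%s\n' "$flow_json"

# To run it on the default device:
# cua flow "$flow_json"
```

Keeping the flow in a variable makes it easy to store the same JSON in `flows.md` and replay it later.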
Complete recipe for adding a new Android device from zero to fully operational.
```bash
# Scan LAN for devices
cua discover
# Register with a friendly name
cua add <name> <ip> <token>
# Verify connectivity
cua -d <name> ping
cua -d <name> info
```

```bash
# PIN unlock (recommended: proven reliable via a11y button tapping)
cua -d <name> config pin <PIN>
# Verify: lock then unlock
cua -d <name> lock
sleep 3
cua -d <name> unlock
# Should show {"unlocked":true}
```
Important: Only PIN unlock is verified to work. Pattern unlock is experimental and unreliable — the accessibility gesture dispatch doesn't consistently hit the correct grid coordinates across different devices and screen sizes. If the device uses pattern lock, change it to PIN.
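The lock-then-unlock check can be scripted so that onboarding fails loudly when unlock does not work. This is a minimal sketch: `verify_unlock` is a hypothetical helper, and it assumes `cua unlock` prints JSON containing `"unlocked":true` exactly as in the comment in the recipe (no extra whitespace).

```bash
# Hypothetical helper: lock the device, wait, unlock, and check the reported result.
# Usage: verify_unlock <device-name> [seconds-to-wait]
verify_unlock() {
  name="$1"
  cua -d "$name" lock || return 1
  sleep "${2:-3}"                          # give the lock screen time to appear
  out="$(cua -d "$name" unlock)" || return 1
  case "$out" in
    *'"unlocked":true'*) return 0 ;;       # assumed success marker
    *) printf 'unlock failed: %s\n' "$out" >&2; return 1 ;;
  esac
}

# Example: verify_unlock phone1 && echo "PIN unlock works"
```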
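```bash
cua -d <name> setup-perms
```

This automates granting all 9 app permissions on MIUI devices: Location (位置), Camera (相机), Microphone (麦克风), Photos and videos (照片和视频), Music and audio (音乐和音频), SMS (短信), Phone (电话), Contacts (联系人), and Calendar (日历).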
The command navigates through Settings → Apps → Claw Use → Permissions and clicks through each permission grant dialog.
If setup-perms fails (common on tablets with dual-pane layout), grant manually:
These settings prevent MIUI from killing the service:
```bash
# Navigate to app settings
cua -d <name> intent '{"action":"android.settings.APPLICATION_DETAILS_SETTINGS","uri":"package:com.clawuse.android"}'
```
Then via a11y or manually ensure:
```bash
cua -d <name> status                             # check a11y health, uptime, request count
cua -d <name> screen -c                          # verify a11y tree works
cua -d <name> screenshot 50 720 /tmp/verify.jpg  # verify screenshot

# Test auto-unlock end-to-end
cua -d <name> lock
sleep 3
cua -d <name> screen -c                          # should auto-unlock then return tree
```
MIUI Tablets (Xiaomi Pad 5, etc.):

- `APPLICATION_DETAILS_SETTINGS` intent opens the app LIST, not the specific app
- `setup-perms` may need manual fallback for tablet layout

MIUI Phones (Redmi K60 Ultra, etc.):

General Android:

- `takeScreenshot()` returns a black image on the lock screen (Android security)
- `flagRetrieveInteractiveWindows` (added in v1.6.2)

Update a device to a new APK version without ADB:
```bash
# Serve APK on LAN (from the machine with the APK)
cd /path/to/apk && python3 -m http.server 9090 &

# On the device, open browser to download
cua -d <name> intent '{"action":"android.intent.action.VIEW","uri":"http://<lan-ip>:9090/app.apk"}'

# Or via browser navigation for MIUI browser
# (浏览器 = "Browser", 搜索或输入网址 = "Search or enter URL"):
cua -d <name> click "浏览器"
cua -d <name> click "搜索或输入网址"
cua -d <name> type "http://<lan-ip>:9090/app.apk"
# ... then handle download + install prompts

# MIUI install flow (after APK opens in installer)
# 继续安装 = "Continue install", 继续更新 = "Continue update",
# 已了解此应用未经安全检测 = "I understand this app has not been security-checked"
cua -d <name> flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 15000},
    {"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
    {"wait": "继续更新", "then": "tap", "timeout": 15000}
  ]
}'

# Verify new version after service restart (~30s)
sleep 30
cua -d <name> ping
```
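A fixed `sleep 30` either waits too long or not long enough. As an alternative sketch, the helper below polls `ping` until the device answers. `wait_for_device` is hypothetical, and it assumes `cua ping` exits with a non-zero status while the service is down, which this document does not confirm.

```bash
# Hypothetical helper: poll `cua ping` until the device responds or we give up.
# Usage: wait_for_device <device-name> [max-tries] [delay-seconds]
# Assumes `cua ping` returns a non-zero exit status when the device is unreachable.
wait_for_device() {
  name="$1"
  tries="${2:-10}"
  delay="${3:-3}"
  i=1
  while [ "$i" -le "$tries" ]; do
    if cua -d "$name" ping >/dev/null 2>&1; then
      return 0                      # device answered
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1                          # still unreachable after all tries
}

# Example: wait_for_device phone1 20 3 && cua -d phone1 status
```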
UpdateReceiver: The app listens for MY_PACKAGE_REPLACED broadcast and auto-restarts the service after update. No manual intervention needed after install completes.
```bash
# Unified API
cua act '{"launch": "org.telegram.messenger"}'
cua screen -c
cua act '{"click": "Search Chats"}'
cua act '{"type": "John"}'
cua act '{"click": "John"}'
```

```bash
# Legacy commands
cua launch org.telegram.messenger
cua screen -c
cua click "Search Chats"
cua type "John"
cua click "John"
```
```bash
cua screen -c                        # what elements exist (structured, with refs)
cua snapshot 50 720 /tmp/look.jpg    # what it looks like (visual)
```

Prefer `screen -c` over `snapshot` for decision-making. Structured a11y data is faster to process, has exact coordinates, and provides ref IDs for `/act`. Use `snapshot` only when visual context matters (images, colors, layout).

Unlocking is automatic: any command auto-unlocks the device if a PIN is configured. No special handling is needed.
```bash
cua flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 15000},
    {"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
    {"wait": "继续更新", "then": "tap", "timeout": 10000}
  ]
}'
```
```bash
cua add phone1 192.168.0.101 <token>
cua add tablet 192.168.0.102 <token>
cua -d phone1 say "hello from phone 1"
cua -d tablet screenshot
```
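With several devices registered, the same command often needs to run on each of them. The sketch below wraps the `-d <name>` form in a loop. `for_each_device` is a hypothetical helper, and the device names are the ones registered in the example above.

```bash
# Hypothetical helper: run one cua command on several registered devices.
# Usage: for_each_device "<name> <name> ..." <command> [args...]
for_each_device() {
  devices="$1"
  shift
  failed=0
  for d in $devices; do             # unquoted on purpose: split names on spaces
    cua -d "$d" "$@" || { echo "failed on $d" >&2; failed=1; }
  done
  return "$failed"
}

# Example: for_each_device "phone1 tablet" ping
```

The loop keeps going after a failure, so one unreachable device does not hide the status of the others.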
- Prefer `click` by text over `tap` by coordinates whenever text is visible.
- Use `screen -c` as the primary perception tool; compact mode filters noise.
- Use `/flow` for multi-step mechanical sequences; it saves tokens and is 100x faster than LLM-per-step.
- Use `intent` deep links for app navigation (e.g., `https://t.me/c/{id}/{topic}/{msg}`).
- Do not mix coordinate spaces: `screenshot?maxWidth=720` is scaled, while `screen` bounds are actual pixels.
- Avoid `tap` when `click` can work; text-based targeting is resolution-independent.

```
┌─────────────────────────────────────────────────┐
│                 Android Device                  │
│                                                 │
│  :http process           main process           │
│  ┌──────────────┐      ┌────────────────────┐   │
│  │ BridgeService│ HTTP │ AccessibilityBridge│   │
│  │ NanoHTTPD    │─────→│ A11yInternalServer │   │
│  │ 0.0.0.0:7333 │proxy │ 127.0.0.1:7334     │   │
│  └──────────────┘      └────────────────────┘   │
│   ↑ auth+CORS           ↑ a11y service          │
│   ↑ auto-unlock         ↑ gesture dispatch      │
│   ↑ config/status       ↑ tree traversal        │
└─────────────────────────────────────────────────┘
          ↑ HTTP
    ┌────────────┐
    │ Agent/CLI  │  cua commands / curl
    └────────────┘
```
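The coordinate tip in the list before the diagram (scaled screenshots versus actual-pixel `screen` bounds) comes down to one proportion. As a worked sketch, the helper below maps a coordinate picked on a scaled screenshot back to device pixels. `scale_coord` is hypothetical, and the 1080 px device width is an assumed example value.

```bash
# Hypothetical helper: map a coordinate from a scaled screenshot to device pixels.
# Usage: scale_coord <coord-on-screenshot> <screenshot-size> <actual-size>
scale_coord() {
  # Integer math, rounded to the nearest pixel.
  echo $(( ($1 * $3 + $2 / 2) / $2 ))
}

# Screenshot taken with maxWidth=720 on a device assumed to be 1080 px wide:
scale_coord 360 720 1080   # prints 540
```

Apply the same ratio to the y coordinate, since the screenshot is scaled uniformly by its width.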
| Platform | Package | CLI | Status |
|----------|---------|-----|--------|
| Android | claw-use-android | cua | ✅ Available |
| iOS | claw-use-ios | cui | 🔮 Planned |
| Windows | claw-use-windows | cuw | 🔮 Planned |
| Linux | claw-use-linux | cul | 🔮 Planned |
| macOS | claw-use-mac | cum | 🔮 Planned |