mobile_use_skill/SKILL.md
This skill should be used when the user asks to control an Android phone, tap the screen, take a phone screenshot, automate Android via ADB, type into a mobile app, swipe on screen, navigate back/home, or interact with UI elements on a connected device.
npx skillsauth add am009/mobile-use-skill mobile-use
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Control Android devices through ADB with screenshot-based interaction.
Capture a screenshot, then let the grounding workflow interpret the image and execute the action directly.
Write the operation description as clearly and specifically as possible so the model can distinguish the target from nearby controls and avoid accidental taps on the wrong UI element.
from mobile_use import get_screenshot, interact_with_screen
get_screenshot("/tmp/screen.png")
result = interact_with_screen("/tmp/screen.png", "Tap WeChat")
This is the recommended way to click, long-press, or swipe based on a screenshot. Prefer natural-language targets over hard-coded coordinates.
Typical tuned call:
from mobile_use import get_screenshot, interact_with_screen
get_screenshot("/tmp/screen.png")
result = interact_with_screen(
    "/tmp/screen.png",
    "Tap the login button at the bottom center",
    reasoning_effort="low",
    max_rounds=3,
)
result includes the grounding decision and an execution field. When the action is accepted, execution.performed is True and execution.controller_result contains the ADB-layer result.
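The result check described above could be wrapped in a small helper. This is a minimal sketch that assumes `result` is a plain dict whose `execution` entry is itself a dict with `performed` and `controller_result` keys. The helper name `summarize_result` is hypothetical and not part of mobile_use.

```python
def summarize_result(result: dict) -> str:
    """Return a one-line summary of an interact_with_screen() result.

    Assumption: result["execution"] is a dict with the keys "performed"
    and "controller_result", matching the fields named in the text.
    """
    # Fall back to an empty dict so a missing "execution" entry is not an error.
    execution = result.get("execution") or {}
    if execution.get("performed"):
        return f"performed: {execution.get('controller_result')}"
    return "not performed"
```

If the real result exposes these fields as attributes rather than dict keys, the lookups would need to use `getattr` instead.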
get_screenshot()
Capture a device screenshot.
get_screenshot(save_path: str = None) -> str
This is usually the first step before calling interact_with_screen(...).
interact_with_screen()
Interpret a screenshot plus a natural-language instruction, then execute the grounded action on the device.
interact_with_screen(
    image: str,
    instruction: str,
    *,
    config: GroundingConfig | None = None,
    model: str | None = None,
    reasoning_effort: str | None = None,
    max_rounds: int | None = None,
    out: str | None = None,
    workdir: str | None = None,
    timeout_sec: int | None = None,
) -> dict
Notes:
Example instruction: "Tap the 'Continue' button at the bottom center."
back()
Press the back button.
back() -> str
home()
Press the Home key.
home() -> str
enter()
Press the Enter key.
enter() -> str
keyevent()
Send any Android keycode.
keyevent(code: str) -> str
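Because keyevent() takes the key code as a string, a thin wrapper can accept short names and expand them to the `KEYCODE_` form. The sketch below is only an illustration: `normalize_keycode` is a hypothetical helper, not part of mobile_use, and it does not check that the resulting name is a key code Android actually defines.

```python
def normalize_keycode(name: str) -> str:
    """Turn a short name such as "back" into "KEYCODE_BACK".

    Hypothetical helper, not part of mobile_use.
    """
    # Upper-case the name and join words with underscores.
    code = name.strip().upper().replace(" ", "_")
    # Add the prefix only when the caller did not already supply it.
    if not code.startswith("KEYCODE_"):
        code = "KEYCODE_" + code
    return code
```

A call such as `keyevent(normalize_keycode("volume up"))` would then send `KEYCODE_VOLUME_UP`.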
Common key codes:
KEYCODE_BACK, KEYCODE_HOME, KEYCODE_MENU, KEYCODE_ENTER, KEYCODE_VOLUME_UP, KEYCODE_VOLUME_DOWN, KEYCODE_POWER, KEYCODE_CAMERA, KEYCODE_SEARCH, KEYCODE_DPAD_UP, KEYCODE_DPAD_DOWN, KEYCODE_DPAD_LEFT, KEYCODE_DPAD_RIGHT, KEYCODE_DPAD_CENTER, KEYCODE_TAB, KEYCODE_SPACE, KEYCODE_DEL, KEYCODE_ESCAPE
text()
Type text into the currently focused input field.
text(input_str: str) -> str
get_device_size()
Get screen dimensions.
get_device_size() -> Tuple[int, int]
Returns (width, height) in pixels.
ANDROID_SERIAL: target device serial number. Defaults to the first connected device.
Example:
export ANDROID_SERIAL=emulator-5554
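For scripts that call adb directly alongside this skill, the same variable can be honored explicitly. The following is a minimal sketch, not the skill's own implementation: `adb_base_command` is a hypothetical helper that builds the argument prefix, using adb's standard `-s` option to select a device by serial.

```python
import os


def adb_base_command(env=None) -> list:
    """Build the adb argument prefix, honoring ANDROID_SERIAL when set.

    Hypothetical helper; `env` defaults to the process environment and
    is a parameter only so the function can be exercised without a device.
    """
    env = os.environ if env is None else env
    serial = env.get("ANDROID_SERIAL")
    if serial:
        # -s selects the device with the given serial number.
        return ["adb", "-s", serial]
    return ["adb"]
```

The returned list can be extended with a subcommand, for example `adb_base_command() + ["devices"]`, and passed to `subprocess.run`.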
Requirements:
adb installed and available in PATH
opencv-python>=4.5.0, pyshine>=0.0.6
Install dependencies:
pip install opencv-python pyshine
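Before the first run, the requirements above can be checked from Python. This is a rough sketch under the assumption that the two packages are imported as `cv2` and `pyshine`; `missing_requirements` is a hypothetical helper and it checks only presence, not the minimum versions.

```python
import importlib.util
import shutil


def missing_requirements() -> list:
    """Return the names of requirements that cannot be found.

    Hypothetical helper; checks presence only, not version numbers.
    """
    missing = []
    # adb must be discoverable on PATH.
    if shutil.which("adb") is None:
        missing.append("adb")
    # Map each import name to the pip package that provides it.
    for module, package in (("cv2", "opencv-python"), ("pyshine", "pyshine")):
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing
```

An empty list means all three requirements were found.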
Grounding or controller execution may raise RuntimeError when:
ANDROID_SERIAL is not found
Example:
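from mobile_use import interact_with_screen

try:
    # Grounding or ADB failures surface as RuntimeError.
    result = interact_with_screen("/tmp/screen.png", "Tap WeChat")
except RuntimeError as e:
    print(f"ADB error: {e}")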
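When a failure may be transient, such as a device that is briefly offline, the call can be retried a few times before giving up. The sketch below is one possible pattern, not something the skill provides: `call_with_retry` is a hypothetical helper that takes any zero-argument callable, so it can wrap interact_with_screen(...) through a lambda.

```python
import time


def call_with_retry(action, attempts=3, delay_sec=0.0):
    """Call action() until it succeeds or attempts are exhausted.

    Hypothetical helper; only RuntimeError is treated as retryable.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except RuntimeError as error:
            last_error = error
            # Pause before the next attempt.
            time.sleep(delay_sec)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Since the screen may have changed after a failed action, capturing a fresh screenshot inside the callable is likely safer than reusing the old image.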