.agents/skills/phoneagent/SKILL.md
Control a connected iPhone, iOS simulator, Android emulator, or Android device from macOS through PhoneAgent's JSON-RPC bridge. Use when users ask to automate mobile UI actions, inspect accessibility trees, toggle Settings switches, navigate apps, or capture screenshots by sending RPC methods like get_tree, get_screen_image, get_context, tap_element, enter_text, scroll, swipe, and open_app.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add rounak/phoneagent phoneagent
Use this workflow to drive iOS or Android UI through PhoneAgent's JSON-RPC bridge.
All shell commands below assume you are in the repo root:
cd "$(git rev-parse --show-toplevel)"
Start the RPC bridge (it listens on 127.0.0.1:45678 by default).
# iOS (XCTest-hosted bridge)
./.agents/skills/phoneagent/scripts/start_rpc_bridge_local.sh
# Android (adb bridge; emulator or physical device)
./.agents/skills/phoneagent/scripts/start_android_rpc_bridge_local.sh
Notes:
start_rpc_bridge_local.sh is interactive and will show a numbered list of iOS devices/simulators.
Enter the number for the destination you want.
start_rpc_bridge_local.sh starts a localhost-only forwarder.
If pymobiledevice3 is missing, install it into a local venv:
python3 -m venv .venv && ./.venv/bin/python -m pip install -U pip && ./.venv/bin/python -m pip install pymobiledevice3
start_android_rpc_bridge_local.sh uses adb; if multiple devices are connected it prompts for the serial.
Wait for PHONEAGENT_RPC_READY ... in logs before sending RPC calls. To check readiness from the shell:
./.agents/skills/phoneagent/scripts/rpc.py get-tree >/dev/null && echo rpc-ready
Use 127.0.0.1:45678 as the RPC endpoint (or rpc.py --port <port> if customized).
Notes:
To start the forwarder manually, find the device UDID with xcrun devicectl list devices, then run:
python3 ./.agents/skills/phoneagent/scripts/forward_rpc_localhost.py --udid <UDID>
This binds 127.0.0.1:45678.
Use the helper CLI:
# iOS bundle identifier
./.agents/skills/phoneagent/scripts/rpc.py open-app com.apple.Preferences
# Android package name
./.agents/skills/phoneagent/scripts/rpc.py open-app com.android.settings
./.agents/skills/phoneagent/scripts/rpc.py get-tree | head
# Use coordinates copied from the tree (XCUI frame string).
./.agents/skills/phoneagent/scripts/rpc.py enter-text \
--coordinate '{{33.0, 861.0}, {364.0, 38.0}}' \
--text 'Display'
./.agents/skills/phoneagent/scripts/rpc.py tap-element \
--coordinate '{{37.7, 969.7}, {199.7, 29.0}}'
Inspect the current UI with get_tree.
Act on an element from the tree (tap_element / enter_text).
Use the tree from the action response to verify the UI changed as expected.
For a screenshot, call get_context and write result.screenshot_base64 to a PNG (or use ./.agents/skills/phoneagent/scripts/rpc.py get-screen-image, which writes PNG files to /tmp/phoneagent-artifacts).
Use swipe to reveal off-screen content, then use the returned tree (or call get_tree if needed).
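The inspect-then-act step depends on pulling a frame string out of the tree text. The sketch below shows one way to do that in Python. It assumes each element occupies one line of the tree and that the line carries both the label and an XCUI-style frame; the exact tree layout is not specified here, so `find_frame` and the sample line are hypothetical.

```python
import re

# Matches an XCUI-style frame string such as {{37.7, 969.7}, {199.7, 29.0}}.
FRAME_RE = re.compile(r"\{\{-?[\d.]+, -?[\d.]+\}, \{-?[\d.]+, -?[\d.]+\}\}")


def find_frame(tree, label):
    """Return the frame string on the first tree line that mentions label.

    Assumes one element per line (hypothetical layout). Returns None when
    no line contains both the label and a frame.
    """
    for line in tree.splitlines():
        if label in line:
            match = FRAME_RE.search(line)
            if match:
                return match.group(0)
    return None


# Hypothetical tree excerpt, for illustration only.
SAMPLE_TREE = (
    "NavigationBar, {{0.0, 47.0}, {430.0, 96.0}}, identifier: 'Settings'\n"
    "Button, {{37.7, 969.7}, {199.7, 29.0}}, label: 'Display & Brightness'\n"
)
```

The returned string can be passed unchanged as the `coordinate` parameter of tap_element or enter_text.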
Use one request at a time per server. Do not fire concurrent batches.
Split long keyboard input into chunks; do not send giant enter_text payloads in one call.
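A minimal sketch of the chunking advice, in Python. The 64-character limit is an arbitrary illustration, not a documented bound, and `send` is a hypothetical callable that performs one RPC call and returns its `result` object. The sketch also assumes successive enter_text calls append to the same field and that the field's frame does not move between calls.

```python
def chunk_text(text, max_chars=64):
    """Split text into consecutive pieces of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def enter_text_in_chunks(send, coordinate, text, max_chars=64):
    """Send text as several sequential enter_text calls.

    Calls are issued one at a time, never concurrently. Returns the tree
    from the last response, or None when text is empty.
    """
    tree = None
    for piece in chunk_text(text, max_chars):
        result = send("enter_text", {"coordinate": coordinate, "text": piece})
        tree = result["tree"]
    return tree
```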
All RPC requests are newline-delimited JSON objects with this shape:
{"id":1,"method":"<method>","params":{...}}
All success responses look like:
{"id":1,"result":{...}}
get_tree
Params: none.
Returns: {"tree": "<string>"}
Example:
{"id":1,"method":"get_tree","params":{}}
get_screen_image
Params: none.
Returns: {"screenshot_base64":"<base64>","metadata":{"width":<number>,"height":<number>}}
Example:
{"id":2,"method":"get_screen_image","params":{}}
get_context
Params: none.
Returns: {"tree":"<string>","screenshot_base64":"<base64>","metadata":{"width":<number>,"height":<number>}}
Example:
{"id":3,"method":"get_context","params":{}}
open_app
Params: bundle_identifier (string, required).
iOS: bundle identifier (e.g. com.apple.Preferences).
Android: package name (e.g. com.android.settings).
Returns: {"bundle_identifier":"<string>", "tree":"<string>"} (Android also includes package_name).
Example:
{"id":4,"method":"open_app","params":{"bundle_identifier":"com.apple.Preferences"}}
tap
Params: x (number, required), y (number, required). Coordinates are in absolute screen points as reported by the tree.
Returns: {"tree":"<string>"}
Example:
{"id":5,"method":"tap","params":{"x":120,"y":300}}
tap_element
Params:
coordinate (string, required). Must look like {{x, y}, {w, h}} (copied from the tree).
count (integer, optional; default 1). Use 2 for double-tap.
longPress (boolean, optional; default false). When true, performs a long-press gesture.
Returns: {"coordinate":"<string>", "count":<number>, "longPress":<bool>, "tree":"<string>"}
Example:
{"id":6,"method":"tap_element","params":{"coordinate":"{{20.0, 165.0}, {390.0, 90.0}}","count":1,"longPress":false}}
enter_text
Params:
coordinate (string, required). Must look like {{x, y}, {w, h}} (copied from the tree).
text (string, required).
Returns: {"coordinate":"<string>", "tree":"<string>"}
Example:
{"id":7,"method":"enter_text","params":{"coordinate":"{{33.0, 861.0}, {364.0, 38.0}}","text":"hello"}}
scroll
Params: x (number, required), y (number, required), distanceX (number, required), distanceY (number, required).
Returns: {"tree":"<string>"}
Example:
{"id":8,"method":"scroll","params":{"x":215,"y":760,"distanceX":0,"distanceY":-460}}
swipe
Params: x (number, required), y (number, required), direction (string, required; one of up, down, left, right).
Returns: {"tree":"<string>"}
Example:
{"id":9,"method":"swipe","params":{"x":215,"y":760,"direction":"up"}}
stop
Stops the server (ends the xcodebuild test session).
Params: none.
Returns: {}
Example:
{"id":10,"method":"stop","params":{}}
Common app identifiers:
iOS: com.apple.Preferences, com.apple.camera, com.apple.mobileslideshow, com.apple.MobileSMS, com.apple.springboard
Android: com.android.settings, com.android.camera2, com.google.android.apps.photos, com.google.android.apps.messaging
Troubleshooting:
If a call fails after open_app, restart the test-hosted server and retry with a known-good bundle id.
If an action misses its target, call get_tree again and recalculate the target.
If the iOS bridge stops responding, rerun xcodebuild test and resume from the latest verified app state.
If the Android bridge stops responding, restart adb (adb kill-server && adb start-server), relaunch the bridge, and retry.
Send stop only when the task is complete.
If stop is not sent, terminate the xcodebuild session manually.