The Android implementation of Claw Use — a protocol for AI agents to control real devices.
One app. Three core endpoints. Full phone control over HTTP. No ADB. No root. No PC.

    # 1. See the screen: semantic UI tree with refs
    curl http://phone:7333/screen -H "X-Bridge-Token: $TOKEN"
    # → {"package":"com.whatsapp","elements":[{"ref":1,"text":"Search","role":"button","click":true}, ...]}

    # 2. Act on what you see
    curl -X POST http://phone:7333/act -H "X-Bridge-Token: $TOKEN" \
      -d '{"click": 1}'   # click ref 1
    # → {"ref":1,"ok":true,"x":610,"y":280,"text":"Search"}

    # 3. Observe the result
    curl http://phone:7333/screen -H "X-Bridge-Token: $TOKEN"
    # → new UI tree with new refs

That's the core loop: screen → act → screen. No coordinate guessing, no pixel parsing.
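A minimal Python sketch of the same loop, using only the standard library. The helper names (`call`, `find_ref`, `click_text`) and the `TOKEN` placeholder are illustrative assumptions and not part of Claw Use; the endpoints, header, and JSON fields are the ones shown in the curl examples.

```python
import json
import urllib.request

BASE_URL = "http://phone:7333"  # host and port as used in the curl examples
TOKEN = "your-token"            # placeholder: use the token shown by the app


def call(path, body=None):
    """GET when body is None, otherwise POST the body as JSON text."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data,
                                 headers={"X-Bridge-Token": TOKEN})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def find_ref(screen, text):
    """Return the ref of the first clickable element with this text, else None."""
    for el in screen.get("elements", []):
        if el.get("text") == text and el.get("click"):
            return el["ref"]
    return None


def click_text(text, call=call):
    """screen -> act -> screen: resolve a ref, click it, return the new tree."""
    ref = find_ref(call("/screen"), text)
    if ref is None:
        return None  # nothing clickable with that text on the current screen
    call("/act", {"click": ref})
    return call("/screen")
```

Because `/screen` returns new refs after each action, the sketch looks the ref up again on every call rather than caching it.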
Agents get faster over time. The pattern:
1. Check flows.md — a library of learned UI sequences
2. Match found? → Run via /flow (device-side, 100ms polling, zero LLM cost)
3. Flow fails or no match? → Fall back to screen → act loop
4. Task done? → Save the new sequence to flows.md for next time
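The four steps can be sketched as a small dispatcher. This is an assumption-laden illustration: the on-disk format of flows.md is not described here, so a plain dict stands in for it, and `run_flow` and `explore` are hypothetical callables supplied by the agent.

```python
def run_task(name, flows, run_flow, explore):
    """Try a saved flow first; fall back to exploring and remember what worked.

    flows:    dict mapping task name -> list of /flow steps (stand-in for flows.md)
    run_flow: callable(steps) -> True if the device-side flow succeeded
    explore:  callable(name) -> steps found via the screen -> act loop, or None
    """
    steps = flows.get(name)
    if steps is not None and run_flow(steps):
        return "flow"          # steps 1-2: match found and it ran on-device
    steps = explore(name)      # step 3: fall back to screen -> act
    if steps is None:
        return "failed"
    flows[name] = steps        # step 4: save the sequence for next time
    return "explored"
```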
Example: installing an APK on MIUI takes five or more dialogs. The first time, the agent navigates each one via screen → act (slow, ~40s). After that, it's a single /flow call:

    curl -X POST http://phone:7333/flow -H "X-Bridge-Token: $TOKEN" \
      -d '{"steps":[
        {"wait":"继续安装","then":"tap","timeout":15000},
        {"wait":"已了解此应用未经安全检测","then":"tap","timeout":10000,"optional":true},
        {"wait":"继续更新","then":"tap","timeout":15000}
      ]}'
    # Runs entirely on-device. No LLM calls. Completes in seconds.

The wait strings are the MIUI dialog labels exactly as they appear on screen: "继续安装" (Continue installing), "已了解此应用未经安全检测" (I understand this app has not passed a security check), and "继续更新" (Continue updating).

/flow executes on the phone itself, polling the accessibility tree at 100ms intervals and reacting as soon as target elements appear. The agent skill includes a flows.md file that accumulates these patterns over time.
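A short sketch of building that request body in Python. It uses only the step fields visible in the example (`wait`, `then`, `timeout`, `optional`); the helper names `step` and `flow_body` are hypothetical, and any other fields /flow may accept are not covered.

```python
import json


def step(wait, timeout_ms, optional=False):
    """One /flow step: wait for a label to appear, then tap it."""
    s = {"wait": wait, "then": "tap", "timeout": timeout_ms}
    if optional:
        s["optional"] = True  # only emitted when set, as in the example
    return s


def flow_body(steps):
    """Serialize steps for POST /flow, keeping non-ASCII labels readable."""
    return json.dumps({"steps": steps}, ensure_ascii=False)
```

`ensure_ascii=False` keeps labels such as "继续安装" as-is in the payload instead of `\u` escapes; both forms are equivalent JSON.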
Three unified endpoints replace the old scattered API for agent workflows:
| Endpoint | Method | Purpose |
|---|---|---|
| /screen | GET | Semantic UI tree with stable ref IDs, zone, role |
| /snapshot | GET | JPEG screenshot (base64) |
| /act | POST | Unified action: click ref/text, tap, type, swipe, scroll, back, home, launch |
All legacy endpoints (/click, /tap, /swipe, /type, /scroll, /global, /screenshot, etc.) remain supported.

Request bodies accepted by /act:

    # Click by ref (preferred: fast, precise)
    {"click": 3}

    # Click by text (fallback: searches UI tree)
    {"click": "Send"}

    # Click multiple refs in sequence
    {"click": [1, 2, 3]}

    # Tap coordinates
    {"tap": {"x": 540, "y": 960}}

    # Type text (into focused field, or focus ref first)
    {"type": "Hello world"}
    {"type": {"ref": 5, "text": "Hello world"}}

    # Swipe / scroll
    {"swipe": "up"}
    {"scroll": "down"}

    # Navigation
    {"back": true}
    {"home": true}
    {"recents": true}

    # Launch app
    {"launch": "com.whatsapp"}

    # Long press
    {"longpress": 3}

    # Multiple actions in one request
    {"home": true, "back": true}

Example /screen response:

    {
      "package": "com.android.settings",
      "elements": [
        {"ref": 1, "text": "Settings", "zone": "header"},
        {"ref": 2, "text": "Search", "zone": "header", "role": "button", "click": true},
        {"ref": 3, "text": "WLAN", "zone": "content"},
        {"ref": 4, "text": "Bluetooth", "zone": "content"}
      ]
    }

Query params:

- compact=true: only interactive/text elements
- timeout=5000: max ms to wait for accessibility tree
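A small sketch of using these parameters and the `zone` field from Python, assuming the query params apply to /screen. `screen_url` and `clickable_in_zone` are hypothetical helper names; the field names come from the response shown here.

```python
from urllib.parse import urlencode


def screen_url(base, compact=False, timeout_ms=None):
    """Build a /screen URL with the optional compact and timeout params."""
    params = {}
    if compact:
        params["compact"] = "true"
    if timeout_ms is not None:
        params["timeout"] = timeout_ms
    query = urlencode(params)
    return f"{base}/screen?{query}" if query else f"{base}/screen"


def clickable_in_zone(screen, zone):
    """Refs of elements in the given zone that are marked clickable."""
    return [el["ref"] for el in screen.get("elements", [])
            if el.get("zone") == zone and el.get("click")]
```

Filtering by zone on the agent side keeps prompts small: for the Settings response above, only the Search button is clickable in the header.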
- AI agent with a real phone: Your agent can send messages, check apps, take screenshots, and speak — on a real device with real accounts
- Revive broken phones: USB port dead? Screen cracked? If WiFi works, Claw Use gives the phone a second life
- Remote phone access: Add Tailscale and control your phone from anywhere in the world
- Spare phone automation: Turn that old phone in your drawer into a dedicated AI worker
- Testing & QA: Automate real-device testing without emulators
Most phone control solutions require a PC running ADB. This one doesn't.
Install the app → enable Accessibility Service → your phone is now an HTTP-controlled device. Connect from anywhere on the same network. Add Tailscale and control it from anywhere in the world.
Built for AI agents that need a real phone — not an emulator, not a cloud device, your actual phone with your actual apps, accounts, and data.
Endpoints for reading device state:

| Endpoint | Method | What it does |
|---|---|---|
| /screen | GET | Semantic UI tree: elements with ref IDs, zone, role (v2.0) |
| /snapshot | GET | JPEG screenshot as base64 (v2.0) |
| /screenshot | GET | Screenshot (legacy, same format) |
| /notifications | GET | All notifications with title, text, actions |
| /info | GET | Device model, OS, screen size, permissions |
| /status | GET | Full health dashboard (uptime, request count, a11y latency) |
Endpoints for performing actions:

| Endpoint | Method | What it does |
|---|---|---|
| /act | POST | Unified action: click/tap/type/swipe/scroll/nav/launch (v2.0) |
| /tap | POST | Tap at coordinates (legacy) |
| /click | POST | Tap by text/desc/id (legacy) |
| /longpress | POST | Long press (legacy) |
| /swipe | POST | Swipe direction (legacy) |
| /scroll | POST | Scroll direction (legacy) |
| /type | POST | Type text (legacy) |
| /global | POST | Back, home, recents, notifications, power dialog |
| /launch | GET/POST | List installed apps / launch by package name |
| /intent | POST | Fire any Android Intent |
Audio endpoints:

| Endpoint | Method | What it does |
|---|---|---|
| /tts | POST | Speak text through the phone speaker |
| /tts/voices | GET | List available TTS voices |
| /audio/record | POST | Record audio from microphone |
Device feature endpoints:

| Endpoint | Method | What it does |
|---|---|---|
| /clipboard | GET/POST | Read or write clipboard text |
| /camera | POST | Capture photo (front/back) |
| /volume | GET/POST | Read/set volume |
| /battery | GET | Battery level, charging, temperature |
| /wifi | GET | WiFi info (SSID, IP, signal) |
| /location | GET | GPS/network location |
| /vibrate | POST | Vibrate (one-shot or pattern) |
| /contacts | GET | Search and list contacts |
| /sms | GET/POST | Read/send SMS |
| /file | GET/POST/DELETE | File operations |
Automation endpoints:

| Endpoint | Method | What it does |
|---|---|---|
| /batch | POST | Multiple operations in one request |
| /flow | POST | Multi-step automation with conditions |
Lock screen, configuration, and health endpoints:

| Endpoint | Method | What it does |
|---|---|---|
| /screen/wake | POST | Wake the screen |
| /screen/lock | POST | Lock the device |
| /screen/unlock | POST | Unlock with PIN |
| /config | GET/POST/DELETE | Configure PIN for remote unlock |
| /ping | GET | Health check (no auth required) |
All endpoints (except /ping) require a token in a request header:

    X-Bridge-Token: <your-token>

The token is generated on first launch and shown in the setup screen and the notification bar.
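As an illustration of the rule, a client can attach the header everywhere except /ping. This is a sketch under that stated rule only: `build_request` is a hypothetical helper, and how the server responds to a missing or wrong token is not covered here.

```python
import urllib.request


def build_request(base_url, path, token, body=None):
    """Build a urllib Request; the token header is left out only for /ping.

    body, if given, must be bytes; urllib then sends a POST instead of a GET.
    """
    headers = {} if path == "/ping" else {"X-Bridge-Token": token}
    return urllib.request.Request(base_url + path, data=body, headers=headers)
```

Pass the result to `urllib.request.urlopen` to send it. Note that urllib normalizes the header name's capitalization when sending, which is harmless because HTTP header names are case-insensitive.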
The cua CLI wraps these endpoints:

    # Device management
    cua add redmi 192.168.0.105 <token>
    cua devices
    cua discover                  # scan local network

    # New unified commands (v2.0)
    cua screen                    # semantic UI tree
    cua screen -c                 # compact (text/interactive only)
    cua snapshot                  # save screenshot
    cua act '{"click": 3}'        # click ref 3
    cua act '{"type": "hello"}'   # type text
    cua act '{"swipe": "up"}'     # swipe

    # Legacy commands (still supported)
    cua tap 500 1000
    cua click "Send"
    cua swipe up
    cua type "Hello"
    cua screenshot

    # Full setup
    cua onboard                   # discover → register → PIN → perms → verify
    cua setup-perms               # grant MIUI permissions

To set up a phone, download the APK from Releases and install it, then grant the two permissions:

- Accessibility: Settings → Accessibility → Claw Use → Enable
- Notification access: Settings → Notifications → Notification Access → Claw Use

Verify that the bridge responds:

    curl http://<phone-ip>:7333/ping
    # → {"status":"ok","service":"claw-use-android","version":"2.0.0"}

Set the PIN used for remote unlock:

    curl -X POST http://<phone-ip>:7333/config \
      -H "X-Bridge-Token: <token>" \
      -d '{"pin":"your-existing-pin"}'

For remote access, install Tailscale on the phone. Your phone gets a stable 100.x.x.x address accessible from anywhere:

    curl http://100.x.x.x:7333/screen -H "X-Bridge-Token: <token>"

No port forwarding. No dynamic DNS. Just works.
The app includes an UpdateReceiver that listens for MY_PACKAGE_REPLACED. After installing a new version, the BridgeService automatically restarts — no manual app launch needed.
This enables fully autonomous OTA updates: an AI agent can build a new APK, send it to the phone, navigate to download it, tap through the installer, and regain control after the update completes. Zero human intervention.
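To regain control after such an update, an agent can poll /ping until the bridge reports the new version. The sketch below assumes the /ping response carries the `status` and `version` fields shown in the setup check; `wait_for_version` and the injected `ping` callable are hypothetical, and the attempt count and delay are arbitrary defaults.

```python
import time


def wait_for_version(ping, expected, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until ping() reports status "ok" and the expected version.

    ping: callable returning the parsed /ping JSON; it may raise OSError
          (urllib's URLError is a subclass) while the service is restarting.
    """
    for _ in range(attempts):
        try:
            info = ping()
        except OSError:
            info = None  # bridge unreachable while the package is replaced
        if info and info.get("status") == "ok" and info.get("version") == expected:
            return True
        sleep(delay)
    return False
```

Injecting `ping` and `sleep` keeps the function testable without a device; in real use `ping` would perform the unauthenticated GET to /ping.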
Xiaomi's aggressive battery optimization will kill background services. To keep Claw Use alive:
- Battery saver: Set to "No restrictions" for Claw Use
- Autostart: Enable in Security → Autostart
- Lock in recents: Open Claw Use → long press in recent apps → tap the lock icon
- Battery optimization: The app auto-requests exemption on launch
Architecture:

    ┌──────────────────────────────────────────┐
    │              BridgeService               │
    │               0.0.0.0:7333               │
    │                                          │
    │   Auth · CORS · Auto-unlock · Routing    │
    │                                          │
    │  ┌───────────────┐  ┌───────────────┐    │
    │  │ ScreenHandler │  │  ActHandler   │    │
    │  │ /screen       │  │ /act          │    │
    │  │ /snapshot     │  │ unified ops   │    │
    │  └───────┬───────┘  └───────┬───────┘    │
    │          │                  │            │
    │  ┌───────▼──────────────────▼───────┐    │
    │  │       AccessibilityBridge        │    │
    │  │     UI tree · Gestures · TTS     │    │
    │  └──────────────────────────────────┘    │
    │     WakeLock + WifiLock + Foreground     │
    └──────────────────────────────────────────┘

Requirements:

- Android 7.0+ (API 24) for core features
- Android 11+ (API 30) for /snapshot and /screenshot
- No root required
- No ADB required
- No PC required

To build from source:

    git clone https://github.com/4ier/claw-use-android.git
    cd claw-use-android
    ./gradlew assembleDebug
    # APK at app/build/outputs/apk/debug/app-debug.apk

Claw Use Android is the first implementation of the Claw Use protocol, a standard HTTP API for AI agents to control physical devices.
The protocol defines a common set of endpoints that any device can implement. The same cua CLI and agent skills work across all compliant devices:

    cua add redmi 192.168.0.105 <token>   # Android phone
    cua add ipad 100.80.1.10 <token>      # future: iOS
    cua add laptop 100.80.1.20 <token>    # future: desktop
    cua -d redmi screen                   # same command, any device

License: MIT

Built for agents that need a real phone.