Allowing claws to make better use of any application.
claw-use is a persistent daemon that lets agents interact with any running macOS application — browsers, native apps, even locked screens. One CLI (cua), every app, always on.
We benchmarked cua against the tools agents use today — browser automation (Playwright/CDP snapshots), AppleScript via exec, and HTTP fetch. Same page, same tasks, real measurements.
| Task | Today | cua | Improvement |
|---|---|---|---|
| List running apps | 190ms (AppleScript) | 27ms | 7x faster |
| Web page snapshot | ~1,200ms (browser tool) | 235ms | 5x faster |
| Click an element | snapshot + act (2 calls) | 968ms (1 call via pipe) |
Single round-trip |
| Screenshot | ~500ms (browser) | 109ms | 4.5x faster |
| Text extraction | 300ms (web fetch) | 251ms | Comparable |
Same page (IANA Example Domains), same information:
| Format | Bytes | Relative |
|---|---|---|
| Browser tool (ARIA tree) | 3,318 | 5.7x more |
| cua JSON | 2,526 | 4.4x more |
| cua compact | 577 | 1x |
The compact snapshot contains everything an agent needs — links with URLs, headings, page type, word count. The browser tool returns the entire DOM tree including footer cells, ARIA landmarks, and nested role annotations.
At 10 snapshots per task, that's ~27,000 bytes saved per task — real money at LLM token prices.
"Read a page and click something"
- Today:
snapshot→ parse →act click= 3 tool calls, ~4,000+ bytes - cua:
pipe safari click --match "Sign in"= 1 tool call, ~50 bytes back
"Fill a login form"
- Today:
snapshot→fill email→fill password→click submit= 4 calls - cua: 3×
pipecommands = 3 calls, each self-contained (no snapshot step needed)
Fewer tool calls = fewer LLM round-trips = faster completion = lower cost.
| Capability | Current Tools | cua |
|---|---|---|
| 🔒 Locked screen | Dead — browser tools need a display | Safari transport works via AppleScript |
| 📱 Native apps (Notes, Calendar, Numbers) | No tool covers this | Full AX UI tree with refs and actions |
| 🔄 Transport failure | Retry the same broken path | Auto-fallback: AX → Safari → CDP → AppleScript |
| ⚡ One-shot interactions | Snapshot, parse, then act (3 steps) | pipe = snapshot + match + act in 1 call |
# What's running?
cua list
# → Safari, Obsidian, Notes, Calendar, Numbers...
# What's on screen?
cua snapshot "Safari"
# → Enriched UI: 31 elements, buttons, tabs, text fields with refs (e1, e2...)
cua pipe safari click --match "Sign in"
# ✅ clicked "Sign in" (score: 100)
# All of this works with the screen locked 🔒Screenshot any app and feed it to a vision model. "What does the screen look like right now?"
cua screenshot "Xcode"
# → /tmp/cua-screenshot-xcode-1234567.png (feed to GPT-4V, Claude, etc.)Default output is compact — optimized for agent token budgets. Use --format json when you need structured data, --pretty for human-readable JSON.
cua runs a persistent daemon (cuad) that maintains connections to every app through four transport layers:
- Accessibility (AX) — richest data: roles, labels, values, actions for any native app
- Chrome DevTools Protocol — persistent WebSocket to Electron apps, 7ms eval
- AppleScript — app scripting + Safari JS injection (works on locked screens 🔒)
- Screenshots — CGWindowListCreateImage for visual capture
The self-healing router picks the best transport per app and auto-falls back on failure. Snapshots are cached with stable refs.
→ Deep dive: docs/ARCHITECTURE.md
brew install thegreysky/tap/cuacurl -fsSL https://raw.githubusercontent.com/thegreysky/agentview/main/install.sh | shgit clone https://github.com/thegreysky/agentview.git
cd agentview
swift build -c release
cp .build/release/cua .build/release/cuad ~/.local/bin/# Start the daemon
cua daemon start
# Grant Accessibility permission when prompted
# For Safari: enable Develop → Allow JavaScript from Apple Eventscua is designed to be called from AI agent tool loops. The JSON output is structured for LLM consumption:
- Refs (
e1,e2,w1) are stable handles to UI elements - Fuzzy matching (
--match "Sign In") means your agent doesn't need exact selectors - Semantic page types (
login,search,article,table) let your agent understand what it's looking at pipecommand combines snapshot + match + act in one round-trip (~200ms total)
# cua skill for OpenClaw agents
name: cua
description: See and interact with any macOS app via cua CLI## Available Commands
- `cua list` — see what's running
- `cua snapshot <app>` — get UI state with refs
- `cua act <app> click --ref e3` — click element e3
- `cua web navigate <url>` — open a URL in Safari
- `cua web fill "email" --value "..."` — fill a form field
- `cua screenshot <app>` — capture window screenshot
## Tips
- Use `pipe` for one-shot interactions (faster than snapshot + act)
- Check `status` before UI operations — if screen is locked, use web commands
- Refs are stable within a session — `e1` stays `e1` until the element disappears- Phase 1: Daemon + UDS + persistent CDP + screen state
- Phase 2: Self-healing router + transport fallback
- Phase 3: Snapshot cache + event bus
- Phase 4: Safari browser control + semantic page analysis
- Phase 5: Transport-aware enrichers + OCR fallback
- Chrome DevTools integration (remote debugging)
- Multi-display support
- Agent skill marketplace integration
MIT