Merged
40 commits
b6a15ae
feat: add data model foundation for progressive skeleton traversal
lahfir Mar 10, 2026
b13dc69
feat: implement progressive skeleton traversal with ref-rooted drill-…
lahfir Mar 10, 2026
36fe3e4
fix: mention --skeleton in STALE_REF error suggestion
lahfir Mar 10, 2026
872bd11
docs: document --skeleton and --root flags in skill reference
lahfir Mar 10, 2026
a23f25a
docs: update phases.md and CLAUDE.md for progressive skeleton traversal
lahfir Mar 10, 2026
854607b
docs: make progressive skeleton traversal the default agent workflow
lahfir Mar 10, 2026
cec176d
fix: preserve skeleton drill-down anchors
cursoragent Mar 10, 2026
1ebc865
fix: preserve drill-down refs across skeleton re-snapshots
lahfir Mar 11, 2026
dceb283
fix: preserve drill-down depth for root snapshots
cursoragent Mar 11, 2026
5acd3b4
fix: verify AX action effect on Electron elements before trusting suc…
lahfir Mar 11, 2026
725df15
chore: ignore .context/ for local tooling artifacts
lahfir Apr 14, 2026
4424e17
style: collapse web_action_had_effect signature to single line
lahfir Apr 14, 2026
1797abd
test: cover RefMap save oversize rejection
lahfir Apr 14, 2026
4aa90f2
test: assert stale-ref suggestion mentions --skeleton
lahfir Apr 14, 2026
188b62c
test: cover skeleton-to-drill-down counter continuity
lahfir Apr 14, 2026
8491481
test: cover --root + --surface rejection at execute() boundary
lahfir Apr 14, 2026
85b7d25
test: add filesystem-redirected integration tests for run_from_ref
lahfir Apr 14, 2026
62f8eab
test: add golden fixtures for skeleton output and drill-down refmap
lahfir Apr 14, 2026
1bc23cb
fix: bound build_subtree recursion on Electron wrapper chains
lahfir Apr 14, 2026
181a835
chore: add pre-commit hook running fmt + clippy + tests
lahfir Apr 14, 2026
8d963e8
fix: prevent orphaned drill-down refs leaking across skeleton refresh
lahfir Apr 14, 2026
2032537
fix: align resolve traversal with snapshot child-attribute set
lahfir Apr 14, 2026
0fcf4e8
fix: bypass AXConfirm on web elements and add skeleton anchors to dri…
lahfir Apr 14, 2026
ae78cbc
fix: compare AX elements via CFEqual in web_action_had_effect
lahfir Apr 14, 2026
d06a6c2
refactor: unify allocate_refs across snapshot and drill-down paths
lahfir Apr 14, 2026
7c7837a
chore: drop with_root from drill test names after allocator unification
lahfir Apr 14, 2026
3450e34
docs: compound DRY ref-allocator dedupe into knowledge base
lahfir Apr 14, 2026
b814300
refactor: drop unused root_ref field from TreeOptions
lahfir Apr 14, 2026
4aa43fa
fix: suppress skeleton flag when --root is set
lahfir Apr 14, 2026
f6f8825
fix: detect web action effect via value + selected + focus change
lahfir Apr 14, 2026
dfdcee5
fix: emit skeleton boundary when drill path is about to hit raw cap
lahfir Apr 14, 2026
24e0481
perf: compute is_in_webarea once per verified-press call
lahfir Apr 14, 2026
54cc15f
fix: skeleton anchors in drill-downs must not inherit root_ref; resto…
lahfir Apr 15, 2026
1b1088e
fix: suppress skeleton anchor creation in drill-down mode to prevent …
lahfir Apr 15, 2026
35ceacb
fix: bounds-based resolver pruning and CGClick fallback on chain timeout
lahfir Apr 15, 2026
e9503d3
refactor: resolve 9 code-review todos for progressive skeleton traversal
lahfir Apr 16, 2026
43d0e6a
docs: compound progressive snapshot review hardening
lahfir Apr 16, 2026
774b8a5
docs: tighten progressive snapshot compound write-up
lahfir Apr 16, 2026
3f011a7
docs: update README with progressive skeleton traversal and fix comma…
lahfir Apr 16, 2026
d75db76
docs: correct command count to 53, add Notifications section and plat…
lahfir Apr 16, 2026
52 changes: 52 additions & 0 deletions .githooks/pre-commit
@@ -0,0 +1,52 @@
#!/usr/bin/env bash
# Pre-commit hook for agent-desktop.
# Mirrors the CI quality gates so a failing commit never reaches origin.
#
# Setup (once per clone):
# git config core.hooksPath .githooks
#
# Bypass for a single commit (only when truly necessary):
# git commit --no-verify

set -euo pipefail

if [ -n "${SKIP_PRECOMMIT:-}" ]; then
  echo "pre-commit: SKIP_PRECOMMIT set, skipping"
  exit 0
fi

if ! command -v cargo >/dev/null 2>&1; then
  echo "pre-commit: cargo not found in PATH, skipping (install Rust to enable)"
  exit 0
fi

cd "$(git rev-parse --show-toplevel)"

if git diff --cached --name-only --diff-filter=ACMR | grep -qE '\.(rs|toml)$'; then
  HAS_RUST_CHANGES=1
else
  HAS_RUST_CHANGES=0
fi

if [ "$HAS_RUST_CHANGES" -eq 0 ]; then
  echo "pre-commit: no Rust changes staged, skipping cargo checks"
  exit 0
fi

run() {
  local label="$1"
  shift
  printf '\033[1;34m▶ %s\033[0m\n' "$label"
  if ! "$@"; then
    printf '\033[1;31m✗ %s failed\033[0m\n' "$label" >&2
    echo "" >&2
    echo "pre-commit: refusing the commit. Fix the issues above or rerun with SKIP_PRECOMMIT=1 to bypass." >&2
    exit 1
  fi
}

run "cargo fmt --all -- --check" cargo fmt --all -- --check
run "cargo clippy --all-targets -- -D warnings" cargo clippy --all-targets -- -D warnings
run "cargo test --lib --workspace" cargo test --lib --workspace

printf '\033[1;32m✓ pre-commit checks passed\033[0m\n'
5 changes: 4 additions & 1 deletion .gitignore
@@ -73,5 +73,8 @@ docs/*
!docs/architecture.excalidraw
!docs/architecture.png
!docs/phases.md
!docs/solutions/
!docs/solutions/**
todos/
.cursor/
.context/
31 changes: 27 additions & 4 deletions CLAUDE.md
@@ -19,6 +19,16 @@ cargo tree -p agent-desktop-core # Verify no platform crate leaks

Run the binary: `./target/release/agent-desktop snapshot --app Finder -i`

## Pre-commit Hook

The repo ships a pre-commit hook at `.githooks/pre-commit` that runs `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test --lib --workspace` whenever the staged changes include `.rs` or `.toml` files. The checks run against the whole workspace, not only the staged files. Wire it up once after cloning:

```bash
git config core.hooksPath .githooks
```

Bypass for an emergency commit with `git commit --no-verify` or `SKIP_PRECOMMIT=1 git commit ...`.

## Project Overview

Cross-platform Rust CLI + MCP server enabling AI agents to observe and control desktop applications via native OS accessibility trees.
@@ -57,6 +67,10 @@ agent-desktop/
├── clippy.toml # project-wide lint config
├── crates/
│ ├── core/ # agent-desktop-core (platform-agnostic)
│ │ └── src/
│ │ ├── ref_alloc.rs # Shared ref helpers (INTERACTIVE_ROLES, is_collapsible)
│ │ ├── snapshot_ref.rs # Ref-rooted drill-down (run_from_ref)
│ │ └── commands/ # one file per command
│ ├── macos/ # agent-desktop-macos (Phase 1)
│ ├── windows/ # agent-desktop-windows (stub → Phase 2)
│ └── linux/ # agent-desktop-linux (stub → Phase 2)
@@ -66,6 +80,8 @@ agent-desktop/
│ ├── cli_args.rs # all command argument structs
│ ├── dispatch.rs # command dispatcher + parse helpers
│ └── batch_dispatch.rs # batch command execution
├── docs/
│ └── solutions/ # documented solutions to past problems (bugs, best practices, workflow patterns), organized by category with YAML frontmatter (module, tags, problem_type); relevant when implementing or debugging in documented areas
└── tests/
├── fixtures/ # golden JSON snapshots
└── integration/ # macOS CI integration tests
@@ -284,16 +300,20 @@ Error responses:
- RefMap stored at `~/.agent-desktop/last_refmap.json` with `0o600` permissions, directory at `0o700`
- Each snapshot REPLACES the refmap file entirely (atomic write via temp + rename)
- Action commands use optimistic re-identification: `(pid, role, name, bounds_hash)`. Return `STALE_REF` on mismatch.
- Progressive traversal: `--skeleton` clamps depth to 3 and annotates truncated containers with `children_count`. Named or described containers at the depth boundary receive refs as drill-down targets.
- Drill-down: `--root @ref` starts from a previously-discovered ref with scoped invalidation (only that ref's subtree refs are replaced on re-drill)
- RefMap size check: write-side guard prevents >1MB refmap files
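The atomic-write, permission, and size-guard rules above can be sketched with the Rust standard library alone. This is a minimal, Unix-only sketch under assumptions: the function name `save_refmap`, the `MAX_REFMAP_BYTES` constant (taken here as 1 MiB), and the `.json.tmp` sibling file are hypothetical and not the crate's actual code.

```rust
use std::fs::{self, DirBuilder, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{DirBuilderExt, OpenOptionsExt};
use std::path::Path;

/// Hypothetical limit standing in for the ">1MB" write-side guard (assumed 1 MiB).
const MAX_REFMAP_BYTES: usize = 1024 * 1024;

/// Hypothetical helper: replaces the refmap at `path` atomically via temp file + rename.
fn save_refmap(path: &Path, json: &str) -> io::Result<()> {
    // Reject oversize payloads before touching the filesystem, so the old file survives.
    if json.len() > MAX_REFMAP_BYTES {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("refmap is {} bytes; limit is {}", json.len(), MAX_REFMAP_BYTES),
        ));
    }
    let dir = path
        .parent()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "refmap path has no parent"))?;
    // Directory mode 0o700; only applied if the directory is newly created.
    DirBuilder::new().recursive(true).mode(0o700).create(dir)?;

    // Temp file in the same directory, so the rename stays on one filesystem.
    let tmp = path.with_extension("json.tmp");
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // owner read/write only, applied when the file is created
        .open(&tmp)?;
    file.write_all(json.as_bytes())?;
    file.sync_all()?; // flush contents before the rename makes them visible
    drop(file);

    // On POSIX, rename() atomically replaces any existing refmap.
    fs::rename(&tmp, path)
}
```

Readers therefore see either the previous refmap or the complete new one, never a half-written file, and an oversize snapshot leaves the last good refmap untouched.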

## PlatformAdapter Trait

13 methods with default implementations returning `not_supported()`:

```rust
pub trait PlatformAdapter: Send + Sync {
    fn list_windows(&self, filter: &WindowFilter) -> Result<Vec<WindowInfo>, AdapterError>;
    fn list_apps(&self) -> Result<Vec<AppInfo>, AdapterError>;
    fn get_tree(&self, win: &WindowInfo, opts: &TreeOptions) -> Result<AccessibilityNode, AdapterError>;
    fn get_subtree(&self, handle: &NativeHandle, opts: &TreeOptions) -> Result<AccessibilityNode, AdapterError>;
    fn execute_action(&self, handle: &NativeHandle, action: Action) -> Result<ActionResult, AdapterError>;
    fn resolve_element(&self, entry: &RefEntry) -> Result<NativeHandle, AdapterError>;
    fn check_permissions(&self) -> PermissionStatus;
@@ -406,7 +426,9 @@ Target binary size: <15MB per platform.
- `cargo test --workspace`
- Binary size check: fail if release binary exceeds 15MB

## Implemented Commands (53)

> **Platform note:** All 53 commands are implemented on macOS (Phase 1). Windows and Linux adapters are planned (Phase 2/3) and will support the same command surface; notification commands depend on platform-specific notification APIs.

| Category | Commands |
|----------|----------|
@@ -415,9 +437,10 @@ Target binary size: <15MB per platform.
| Interaction (14) | `click`, `double-click`, `triple-click`, `right-click`, `type`, `set-value`, `clear`, `focus`, `select`, `toggle`, `check`, `uncheck`, `expand`, `collapse` |
| Scroll (2) | `scroll`, `scroll-to` |
| Keyboard (3) | `press`, `key-down`, `key-up` |
| Mouse (6) | `hover`, `drag`, `mouse-move`, `mouse-click`, `mouse-down`, `mouse-up` |
| Notifications (4) *(macOS)* | `list-notifications`, `dismiss-notification`, `dismiss-all-notifications`, `notification-action` |
| Clipboard (3) | `clipboard-get`, `clipboard-set`, `clipboard-clear` |
| Wait (1) | `wait` (with `--element`, `--window`, `--text`, `--menu`, `--notification` flags) |
| System (3) | `status`, `permissions`, `version` |
| Batch (1) | `batch` |

38 changes: 35 additions & 3 deletions README.md
@@ -11,7 +11,8 @@
## Key Features

- **Native Rust CLI**: Fast, single binary, no runtime dependencies
- **53 commands**: Observation, interaction, keyboard, mouse, notifications, clipboard, window management
- **Progressive skeleton traversal**: 78–96% token reduction on dense apps via shallow overview + targeted drill-down
- **Snapshot & refs**: AI-optimized workflow using deterministic element references (`@e1`, `@e2`)
- **AX-first interactions**: Every action exhausts pure accessibility API strategies before falling back to mouse events
- **Structured JSON output**: Machine-readable responses with error codes and recovery hints
@@ -52,6 +53,24 @@ agent-desktop permissions --request # trigger system dialog

## Core Workflow for AI

For dense apps (Slack, VS Code, Notion), use **progressive skeleton traversal** to minimize token usage:

```bash
# 1. Shallow overview — depth-3 map, truncated containers show children_count
agent-desktop snapshot --skeleton --app Slack -i --compact

# 2. Drill into a region of interest (named containers get refs as drill targets)
agent-desktop snapshot --root @e3 -i --compact

# 3. Act on an element found in the drill-down
agent-desktop click @e12

# 4. Re-drill the same region to verify the state change
agent-desktop snapshot --root @e3 -i --compact
```

For simple apps, a full snapshot is fine:

```bash
agent-desktop snapshot --app Finder -i # get interactive elements with refs
agent-desktop click @e3 # click a button by ref
@@ -60,8 +79,6 @@ agent-desktop press cmd+s # keyboard shortcut
agent-desktop snapshot -i # re-observe after UI changes
```

The snapshot + ref pattern is optimal for LLMs: refs provide deterministic element selection without re-querying the accessibility tree.

```
Agent loop: snapshot → decide → act → snapshot → decide → act → ...
```
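Each `act` step relies on the ref still naming the same element. The CLAUDE.md notes in this PR describe optimistic re-identification on `(pid, role, name, bounds_hash)`, returning `STALE_REF` on mismatch; a hedged sketch of that matching step follows. `RefEntry` here is a simplified stand-in rather than the crate's type, and `LiveElement`, `reidentify`, and hashing bounds with `DefaultHasher` are illustrative assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified, hypothetical stand-in for a stored refmap entry.
#[derive(Debug, Clone)]
struct RefEntry {
    pid: i32,
    role: String,
    name: Option<String>,
    bounds_hash: u64,
}

/// Hypothetical view of a live element when an action command runs.
struct LiveElement {
    pid: i32,
    role: String,
    name: Option<String>,
    bounds: (i32, i32, i32, i32), // x, y, width, height
}

#[derive(Debug, PartialEq)]
enum ResolveError {
    StaleRef,
}

/// Illustration only: a persisted refmap would want a hash that is stable across builds.
fn hash_bounds(bounds: (i32, i32, i32, i32)) -> u64 {
    let mut h = DefaultHasher::new();
    bounds.hash(&mut h);
    h.finish()
}

/// Returns the live element whose identity tuple matches the stored entry,
/// or `StaleRef` if the UI changed since the snapshot.
fn reidentify<'a>(entry: &RefEntry, live: &'a [LiveElement]) -> Result<&'a LiveElement, ResolveError> {
    live.iter()
        .find(|el| {
            el.pid == entry.pid
                && el.role == entry.role
                && el.name == entry.name
                && hash_bounds(el.bounds) == entry.bounds_hash
        })
        .ok_or(ResolveError::StaleRef)
}
```

On `StaleRef`, the agent loop falls back to the documented recovery: take a fresh snapshot and retry.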
@@ -141,6 +158,18 @@ agent-desktop maximize w-4521 # maximize
agent-desktop restore w-4521 # restore
```

### Notifications *(macOS only)*

```bash
agent-desktop list-notifications # list all notifications
agent-desktop list-notifications --app "Slack" # filter by app
agent-desktop list-notifications --text "deploy" --limit 5 # filter by text
agent-desktop dismiss-notification 1 # dismiss by index
agent-desktop dismiss-all-notifications # dismiss all
agent-desktop dismiss-all-notifications --app "Slack" # dismiss all from app
agent-desktop notification-action 1 --action "Reply" # click action button
```

### Clipboard

```bash
@@ -192,6 +221,8 @@ agent-desktop snapshot [OPTIONS]
| `--compact` | off | Omit empty structural nodes |
| `--include-bounds` | off | Include pixel bounds (x, y, width, height) |
| `--max-depth <N>` | 10 | Maximum tree depth |
| `--skeleton` | off | Shallow 3-level overview; truncated containers show `children_count`, and named ones get refs as drill targets |
| `--root <REF>` | - | Start traversal from this ref; merges into existing refmap with scoped invalidation |
| `--surface <TYPE>` | window | `window`, `focused`, `menu`, `menubar`, `sheet`, `popover`, `alert` |
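`--root` merges into the existing refmap with scoped invalidation. One way to model that, as a hedged sketch: each entry records the drill-down root that produced it, a re-drill removes that root's previous subtree (including nested drills), and new refs continue the global counter so skeleton refs stay valid. `RefMap`, `Entry`, its `root` field, and `merge_drill` are hypothetical names, not the agent-desktop API.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical refmap entry; `root` is the drill-down root that produced it
/// (`None` for refs from a full or skeleton snapshot).
#[derive(Debug, Clone)]
struct Entry {
    label: String,
    root: Option<String>,
}

#[derive(Debug, Default)]
struct RefMap {
    entries: BTreeMap<String, Entry>, // "@e1" -> entry
    last_id: u32,                     // highest numeric suffix issued so far
}

impl RefMap {
    /// Replaces only refs produced by earlier drills under `root` (transitively),
    /// then appends fresh refs for `labels`, continuing the counter.
    fn merge_drill(&mut self, root: &str, labels: &[&str]) -> Vec<String> {
        // Collect the old subtree: refs rooted at `root`, refs rooted at those, and so on.
        let mut stale: HashSet<String> = HashSet::new();
        let mut frontier = vec![root.to_string()];
        while let Some(parent) = frontier.pop() {
            for (id, entry) in &self.entries {
                if entry.root.as_deref() == Some(parent.as_str()) && stale.insert(id.clone()) {
                    frontier.push(id.clone());
                }
            }
        }
        // Everything outside that subtree, including `root` itself, is kept.
        self.entries.retain(|id, _| !stale.contains(id));

        // Never reuse numbers, so an old ref cannot silently point at a new element.
        let mut fresh = Vec::with_capacity(labels.len());
        for label in labels {
            self.last_id += 1;
            let id = format!("@e{}", self.last_id);
            self.entries.insert(
                id.clone(),
                Entry { label: label.to_string(), root: Some(root.to_string()) },
            );
            fresh.push(id);
        }
        fresh
    }
}
```

Continuing the counter instead of restarting at `@e1` is what lets the README workflow click `@e12` after drilling into `@e3` without invalidating the skeleton's refs.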

## JSON Output
@@ -262,6 +293,7 @@ snapshot → act → STALE_REF? → snapshot again → retry
| Screenshot | **Yes** | Planned | Planned |
| Clipboard | **Yes** | Planned | Planned |
| App & window management | **Yes** | Planned | Planned |
| Notifications | **Yes** | Planned | Planned |

## Development

10 changes: 10 additions & 0 deletions crates/core/src/adapter.rs
@@ -30,6 +30,7 @@ pub struct TreeOptions {
    pub interactive_only: bool,
    pub compact: bool,
    pub surface: SnapshotSurface,
    pub skeleton: bool,
}

impl Default for TreeOptions {
@@ -40,6 +41,7 @@ impl Default for TreeOptions {
            interactive_only: false,
            compact: false,
            surface: SnapshotSurface::Window,
            skeleton: false,
        }
    }
}
@@ -244,4 +246,12 @@ pub trait PlatformAdapter: Send + Sync {
    ) -> Result<ActionResult, AdapterError> {
        Err(AdapterError::not_supported("notification_action"))
    }

    fn get_subtree(
        &self,
        _handle: &NativeHandle,
        _opts: &TreeOptions,
    ) -> Result<AccessibilityNode, AdapterError> {
        Err(AdapterError::not_supported("get_subtree"))
    }
}
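The new `skeleton` flag on `TreeOptions` drives the shallow overview. Below is a minimal sketch of the boundary logic described in the CLAUDE.md notes (depth clamped to 3, truncated containers annotated with `children_count`, named containers given refs). `Node`, `SkeletonNode`, `build_skeleton`, `skeleton`, and the `@eN` numbering are hypothetical simplifications; the sketch ignores interactive-element refs, descriptions, and the real `AccessibilityNode` type.

```rust
/// Hypothetical, simplified accessibility node.
#[derive(Debug)]
struct Node {
    role: String,
    name: Option<String>,
    children: Vec<Node>,
}

/// One node of a hypothetical skeleton snapshot.
#[derive(Debug)]
struct SkeletonNode {
    role: String,
    name: Option<String>,
    /// Drill-down anchor, assigned only to named containers cut off at the boundary.
    ref_id: Option<String>,
    /// Number of hidden children, set only where the tree was truncated.
    children_count: Option<usize>,
    children: Vec<SkeletonNode>,
}

/// Skeleton mode never walks deeper than this, whatever `max_depth` says.
const SKELETON_DEPTH: usize = 3;

fn build_skeleton(node: &Node, depth: usize, limit: usize, last_ref: &mut u32) -> SkeletonNode {
    let mut out = SkeletonNode {
        role: node.role.clone(),
        name: node.name.clone(),
        ref_id: None,
        children_count: None,
        children: Vec::new(),
    };
    if depth >= limit && !node.children.is_empty() {
        // Boundary container: report how much is hidden instead of descending.
        out.children_count = Some(node.children.len());
        // Only named containers become drill targets; anonymous groups get no ref.
        if node.name.as_deref().is_some_and(|n| !n.is_empty()) {
            *last_ref += 1;
            out.ref_id = Some(format!("@e{}", last_ref));
        }
    } else {
        for child in &node.children {
            out.children.push(build_skeleton(child, depth + 1, limit, last_ref));
        }
    }
    out
}

/// Entry point: the root is depth 0 and the effective limit is min(max_depth, 3).
fn skeleton(root: &Node, max_depth: usize) -> SkeletonNode {
    let mut last_ref = 0;
    build_skeleton(root, 0, max_depth.min(SKELETON_DEPTH), &mut last_ref)
}
```

A follow-up `snapshot --root @e1` would then call `get_subtree` on that container's handle to expand only the hidden region.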