Skip to content

feat: cua-driver agent integration + headless CLIs (Node + Rust)#7

Merged
tensorboy merged 1 commit intomainfrom
feat/headless-cli-and-cua-agent
Apr 27, 2026
Merged

feat: cua-driver agent integration + headless CLIs (Node + Rust)#7
tensorboy merged 1 commit intomainfrom
feat/headless-cli-and-cua-agent

Conversation

@tensorboy
Copy link
Copy Markdown
Owner

Three deliverables landing together because they share the EventSink decoupling:

1. cua-driver agent integration (Tauri + frontend)

  • Hawkeye chat panel "Agent" toggle: Gemini gets a curated 8-tool catalog and drives macOS apps in the background via trycua/cua's cua-driver — no focus theft, no cursor movement.
  • New: src-tauri/src/agent/{protocol,cua_driver,tools,runner}.rs, commands/agent_cmd.rs, src/hooks/useAgent.ts
  • Modified: ai/{types,provider,gemini}.rschat_with_tools + Gemini function-calling
  • 5/5 unit tests pass for the cua-driver protocol layer

2. Phase 1 — Node CLI (@hawkeye/cli)

  • New packages/cli/ workspace package wrapping @hawkeye/core, zero changes to core
  • 7 subcommands: init / perceive / plan / execute / run / chat / daemon
  • Global --json flag, 4-layer config merge, 17 KB ESM bundle

3. Phase 2 — Rust CLI (hawkeye-cli) + EventSink decoupling

  • New src/event_sink.rs (EventSink trait + TauriSink / StdoutSink / NoopSink)
  • New src/bin/cli.rs — clap-based CLI: config / observe / chat / agent / agent-status
  • Refactored run_user_turn and ObserveLoop::start to take Arc<dyn EventSink> instead of AppHandle
  • Both bins build clean: cargo build --bin hawkeye-{desktop,cli}

Verification

  • cargo test --lib agent:: → 5 passed
  • Tauri desktop app unchanged
  • TypeScript clean for new files

Documentation

  • HEADLESS.md, HEADLESS_PLAN.md — execution plan + ops guide
  • packages/desktop-tauri/AGENT_INTEGRATION.md — architecture deep-dive
  • packages/cli/README.md — Node CLI usage

Three deliverables landing together because they share the EventSink decoupling:

## 1. cua-driver agent integration (Tauri + frontend)
Hawkeye's chat panel now has an "Agent" toggle: when on, Gemini receives a
curated 8-tool catalog (screenshot/list_windows/get_window_state/click/
type_text/press_key/scroll/launch_app) and can drive native macOS apps in
the background via trycua/cua's cua-driver — no focus theft, no cursor
movement, even on non-AX surfaces (Chromium, Blender, Figma).

- src-tauri/src/agent/{protocol,cua_driver,tools,runner,mod}.rs — Unix-socket
  client speaking cua-driver's line-delimited JSON protocol; curated tools;
  tool-use loop with MAX_TOOL_ROUNDS=8 and screenshot follow-up image attach
- src-tauri/src/commands/agent_cmd.rs — get_agent_status, start_agent,
  chat_with_agent, invoke_cua_tool Tauri commands
- src-tauri/src/ai/{types,provider,gemini}.rs — FunctionDeclaration /
  FunctionCall / ToolMessage / ToolTurn types; AiProvider::chat_with_tools
  default-impl + full Gemini implementation with function_call /
  function_response part translation
- src/hooks/useAgent.ts + components/ChatPanel.tsx — agent-mode toggle,
  live tool-call streaming via agent:tool-call-* events, audit trail UI
- 5/5 unit tests pass for the cua-driver protocol layer

## 2. Phase 1 — Node CLI (@hawkeye/cli)
New workspace package wrapping @hawkeye/core. Zero changes to core.

- packages/cli/ — TypeScript ESM, tsup-bundled, 17 KB shebang'd dist/main.js
- 7 subcommands: init / perceive / plan / execute / run / chat / daemon
- Global --json flag for NDJSON output
- 4-layer config merge: defaults → ~/.config/hawkeye/cli.json → env → CLI args
- daemon falls back to polling pending an `observation` event in core

## 3. Phase 2 — Rust CLI (hawkeye-cli) + EventSink decoupling
The Tauri Rust backend now exposes a UI-agnostic EventSink trait so the same
observe + agent code paths run from a standalone CLI binary.

- src/event_sink.rs — EventSink trait + TauriSink / StdoutSink / NoopSink
- src/bin/cli.rs — clap-based CLI: config / observe / chat / agent / agent-status
- Cargo.toml — explicit [[bin]] hawkeye-{desktop,cli} + clap dep
- Refactor: lib.rs all `mod` → `pub mod`; AppState + event_sink field;
  ObserveLoop::start and run_user_turn take Arc<dyn EventSink> instead of
  AppHandle; perception::init dropped its unused AppHandle param

Both bins build clean. 5/5 lib tests pass. Tauri desktop app unchanged.

## Documentation
- HEADLESS_PLAN.md — full execution plan (Phases 1-3)
- HEADLESS.md — operations guide for both CLIs + cua-driver agent mode
- packages/desktop-tauri/AGENT_INTEGRATION.md — architecture deep-dive
- packages/cli/README.md — Node CLI usage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tensorboy tensorboy merged commit 46fdfe0 into main Apr 27, 2026
@tensorboy tensorboy deleted the feat/headless-cli-and-cua-agent branch April 27, 2026 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant