Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 19 additions & 7 deletions .claude/agents/listen-drive-and-steer-system-prompt.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ You are running as job `{{JOB_ID}}`. Your job file is at `apps/listen/jobs/{{JOB
## Workflows

You have three workflows: `Work & Progress Updates`, `Summary`, and `Clean Up`.
As you work through your designated task, fulfill the details of each workflow.
**All three are mandatory.** The job is not complete until all three are done.

### 1. Work & Progress Updates

Expand Down Expand Up @@ -36,12 +36,24 @@ yq -i '.summary = "Opened Safari, captured accessibility tree with 42 elements,

### 3. Clean Up

After writing your summary, clean up everything you created during the job:
**This step is mandatory — do not skip it.** After writing your summary, run cleanup before you finish.

- IMPORTANT: **Kill any tmux sessions you created** with `drive session kill <name>` — only sessions YOU created, not the session you are running in
- IMPORTANT: **Close apps you opened** that were not already running before your task started
- **Remove any previous coding instances** that were not closed in the previous session. Look sitting Claude Code, PI, Gemini, Codex, OpenCode, or other agents just sitting doing nothing.
- **Remove temp files** you wrote to `/tmp/` that are no longer needed
- **Leave the desktop as you found it** — minimize or close windows you opened
Before starting your task, note which apps are already running so you know what to close afterward:
```bash
osascript -e 'tell application "System Events" to get name of every process whose background only is false'
```

Clean up everything you created:

- **Kill tmux sessions you created**`drive session kill <name>` — only sessions YOU created, not your own job session
- **Close apps you opened** that were not already running before your task — use `osascript -e 'quit app "AppName"'`
- **Remove temp files** you wrote to `/tmp/``rm /tmp/steer-* /tmp/your-files`
- **Close extra windows** — if an app was already running, close only the windows you opened
- **Remove idle coding instances** — close any Claude Code, PI, Gemini, Codex, or OpenCode windows just sitting doing nothing

After cleanup, append a final update confirming cleanup is done:
```bash
yq -i '.updates += ["Cleanup complete — closed opened apps and removed temp files"]' apps/listen/jobs/{{JOB_ID}}.yaml
```

Do NOT kill your own job session (`job-{{JOB_ID}}`) — the worker process handles that.
2 changes: 1 addition & 1 deletion .claude/commands/listen-drive-and-steer-user-prompt.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ skills:

You are an autonomous macOS agent with full control of this device via two CLI tools:

- **steer** — GUI automation (see screen, click, type, hotkey, OCR, window management)
- **steer** — GUI automation (see screen, click, type, hotkey, OCR, window management). Binary: `apps/steer/.build/release/steer`. **If `steer` is not available** (command not found), use the native macOS toolkit instead: `osascript` for GUI control, `screencapture` for screenshots, `open` to launch apps, `pbpaste`/`pbcopy` for clipboard, `curl` for web requests, and Safari's `source of front document` to read web page content. See the steer SKILL.md for full native toolkit reference.
- **drive** — Terminal automation (tmux sessions, run commands, read output, parallel execution)

## Your Primary Task
Expand Down
100 changes: 100 additions & 0 deletions .claude/skills/steer/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,106 @@ Binary: `apps/steer/.build/release/steer`

Run `steer --help` and `steer help <command>` to learn each command's flags before using it.

## ⚠️ If `steer` is not available

If `steer` is not installed or `command not found`, use the **Native macOS Toolkit** below instead. These tools are always available and cover most of the same capabilities.

### Native macOS Toolkit (AppleScript + shell)

**Launch / activate apps:**
```bash
osascript -e 'tell application "Safari" to activate'
open -a Safari
open -a Safari "https://news.ycombinator.com"
```

**Navigate to a URL in Safari:**
```bash
osascript -e 'tell application "Safari" to set URL of front document to "https://example.com"'
# Or open a new window:
osascript -e 'tell application "Safari" to open location "https://example.com"'
```

**Wait for page load:**
```bash
# Poll until Safari is not loading
for i in $(seq 1 30); do
loading=$(osascript -e 'tell application "Safari" to return loading of front document')
[ "$loading" = "false" ] && break
sleep 1
done
```

**Take a screenshot:**
```bash
screencapture -x /tmp/screenshot.png # full screen, silent
screencapture -x -R 0,0,1920,1080 /tmp/s.png # region
```

**Read clipboard / write clipboard:**
```bash
pbpaste # read
echo "text" | pbcopy # write
```

**Type text into focused app:**
```bash
osascript -e 'tell application "System Events" to keystroke "hello world"'
```

**Press keys / hotkeys:**
```bash
osascript -e 'tell application "System Events" to keystroke "a" using command down' # Cmd+A
osascript -e 'tell application "System Events" to key code 36' # Return (key code 36)
osascript -e 'tell application "System Events" to key code 53' # Escape
```

**Click at coordinates:**
```bash
osascript -e 'tell application "System Events" to click at {500, 300}'
```

**Read text from screen (OCR via Vision framework):**
```bash
# Use drive to run a python one-liner:
python3 -c "
import Vision, Quartz, objc
# ... or use screencapture + a Vision script
"
# Simpler: read Safari's page source directly when dealing with web content
osascript -e 'tell application "Safari" to return source of front document'
```

**Get web page content without GUI:**
```bash
curl -s "https://example.com"
# Or from the already-open Safari page:
osascript -e 'tell application "Safari" to return source of front document'
```

**Create a Note in Notes.app:**
```bash
osascript <<'EOF'
tell application "Notes"
activate
set newNote to make new note at folder "Notes" with properties {name:"My Title", body:"Content here"}
end tell
EOF
```

**Check what apps are running:**
```bash
osascript -e 'tell application "System Events" to get name of every process whose background only is false'
```

### Workflow with native toolkit

The same observe-act-verify loop applies:
1. `screencapture -x /tmp/before.png` — capture state
2. Perform one action (osascript / open / curl)
3. `screencapture -x /tmp/after.png` — verify it worked
4. Read the screenshot or page source to confirm before proceeding

## Commands

| Command | Purpose |
Expand Down
3 changes: 3 additions & 0 deletions .env.sample
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,9 @@ OPENAI_API_KEY=
GEMINI_API_KEY=
OPENROUTER_API_KEY=

# Listen server — set a strong random secret; all API callers must send X-API-Key: <value>
LISTEN_API_KEY=

# Listen server URL (default: http://localhost:7600)
AGENT_SANDBOX_URL=

21 changes: 14 additions & 7 deletions apps/direct/client.py
Original file line number Diff line number Diff line change
@@ -1,31 +1,38 @@
import os

import httpx


def _headers() -> dict:
key = os.environ.get("LISTEN_API_KEY", "")
return {"X-API-Key": key} if key else {}


def start_job(url: str, prompt: str) -> dict:
"""POST to url/job with prompt, returns response dict."""
response = httpx.post(f"{url}/job", json={"prompt": prompt})
response = httpx.post(f"{url}/job", json={"prompt": prompt}, headers=_headers())
response.raise_for_status()
return response.json()


def get_job(url: str, job_id: str) -> str:
"""GET url/job/{job_id}, returns YAML content."""
response = httpx.get(f"{url}/job/{job_id}")
response = httpx.get(f"{url}/job/{job_id}", headers=_headers())
response.raise_for_status()
return response.text


def list_jobs(url: str, archived: bool = False) -> str:
"""GET url/jobs, returns YAML content."""
params = {"archived": "true"} if archived else {}
response = httpx.get(f"{url}/jobs", params=params)
response = httpx.get(f"{url}/jobs", params=params, headers=_headers())
response.raise_for_status()
return response.text


def clear_jobs(url: str) -> dict:
"""POST url/jobs/clear, returns response dict."""
response = httpx.post(f"{url}/jobs/clear")
response = httpx.post(f"{url}/jobs/clear", headers=_headers())
response.raise_for_status()
return response.json()

Expand All @@ -34,7 +41,7 @@ def latest_jobs(url: str, n: int = 1) -> str:
"""GET the full details of the latest N jobs."""
import yaml

response = httpx.get(f"{url}/jobs")
response = httpx.get(f"{url}/jobs", headers=_headers())
response.raise_for_status()
data = yaml.safe_load(response.text)
jobs = data.get("jobs") or []
Expand All @@ -45,14 +52,14 @@ def latest_jobs(url: str, n: int = 1) -> str:
parts = []
for job in latest:
job_id = job["id"]
detail = httpx.get(f"{url}/job/{job_id}")
detail = httpx.get(f"{url}/job/{job_id}", headers=_headers())
detail.raise_for_status()
parts.append(detail.text)
return "---\n".join(parts)


def stop_job(url: str, job_id: str) -> dict:
"""DELETE url/job/{job_id}, returns response dict."""
response = httpx.delete(f"{url}/job/{job_id}")
response = httpx.delete(f"{url}/job/{job_id}", headers=_headers())
response.raise_for_status()
return response.json()
43 changes: 30 additions & 13 deletions apps/drive/modules/tmux.py
Original file line number Diff line number Diff line change
Expand Up @@ -79,23 +79,40 @@ def to_dict(self) -> dict:
def open_terminal_window(command: str) -> None:
"""Open a new Terminal.app window and run a command in it.
Uses AppleScript on macOS to tell Terminal.app to execute a script.
The new window inherits the current working directory.
Writes the command to a temporary shell script instead of embedding it
directly in the AppleScript string, preventing AppleScript injection via
special characters (single quotes, newlines, backticks) in cwd or command.
"""
if platform.system() != "Darwin":
return # silently skip on non-macOS
import shlex
import stat
import tempfile

cwd = os.getcwd()
shell_command = f"cd '{cwd}' && {command}"
escaped = shell_command.replace("\\", "\\\\").replace('"', '\\"')
subprocess.run(
[
"osascript",
"-e",
f'tell application "Terminal" to do script "{escaped}"',
],
capture_output=True,
text=True,
)
script_content = f"#!/bin/sh\ncd {shlex.quote(cwd)} && {command}\n"
fd, script_path = tempfile.mkstemp(suffix=".sh", prefix="drive-term-")
try:
os.write(fd, script_content.encode())
os.close(fd)
os.chmod(script_path, stat.S_IRWXU) # 0o700 — owner-execute only
# script_path is a system-generated path; safe to embed after quote-escaping
escaped_path = script_path.replace("\\", "\\\\").replace('"', '\\"')
subprocess.run(
[
"osascript",
"-e",
f'tell application "Terminal" to do script "{escaped_path}"',
],
capture_output=True,
text=True,
)
except Exception:
try:
os.close(fd)
except OSError:
pass
raise


def create_session(
Expand Down
Loading