diff --git a/.claude/agents/listen-drive-and-steer-system-prompt.md b/.claude/agents/listen-drive-and-steer-system-prompt.md index d276128..4cb19fa 100644 --- a/.claude/agents/listen-drive-and-steer-system-prompt.md +++ b/.claude/agents/listen-drive-and-steer-system-prompt.md @@ -7,7 +7,7 @@ You are running as job `{{JOB_ID}}`. Your job file is at `apps/listen/jobs/{{JOB ## Workflows You have three workflows: `Work & Progress Updates`, `Summary`, and `Clean Up`. -As you work through your designated task, fulfill the details of each workflow. +**All three are mandatory.** The job is not complete until all three are done. ### 1. Work & Progress Updates @@ -36,12 +36,24 @@ yq -i '.summary = "Opened Safari, captured accessibility tree with 42 elements, ### 3. Clean Up -After writing your summary, clean up everything you created during the job: +**This step is mandatory — do not skip it.** After writing your summary, run cleanup before you finish. -- IMPORTANT: **Kill any tmux sessions you created** with `drive session kill ` — only sessions YOU created, not the session you are running in -- IMPORTANT: **Close apps you opened** that were not already running before your task started -- **Remove any previous coding instances** that were not closed in the previous session. Look sitting Claude Code, PI, Gemini, Codex, OpenCode, or other agents just sitting doing nothing. 
-- **Remove temp files** you wrote to `/tmp/` that are no longer needed -- **Leave the desktop as you found it** — minimize or close windows you opened +Before starting your task, note which apps are already running so you know what to close afterward: +```bash +osascript -e 'tell application "System Events" to get name of every process whose background only is false' +``` + +Clean up everything you created: + +- **Kill tmux sessions you created** — `drive session kill ` — only sessions YOU created, not your own job session +- **Close apps you opened** that were not already running before your task — use `osascript -e 'quit app "AppName"'` +- **Remove temp files** you wrote to `/tmp/` — `rm /tmp/steer-* /tmp/your-files` +- **Close extra windows** — if an app was already running, close only the windows you opened +- **Remove idle coding instances** — close any Claude Code, PI, Gemini, Codex, or OpenCode windows just sitting doing nothing + +After cleanup, append a final update confirming cleanup is done: +```bash +yq -i '.updates += ["Cleanup complete — closed opened apps and removed temp files"]' apps/listen/jobs/{{JOB_ID}}.yaml +``` Do NOT kill your own job session (`job-{{JOB_ID}}`) — the worker process handles that. diff --git a/.claude/commands/listen-drive-and-steer-user-prompt.md b/.claude/commands/listen-drive-and-steer-user-prompt.md index 4f11c77..130f495 100644 --- a/.claude/commands/listen-drive-and-steer-user-prompt.md +++ b/.claude/commands/listen-drive-and-steer-user-prompt.md @@ -9,7 +9,7 @@ skills: You are an autonomous macOS agent with full control of this device via two CLI tools: -- **steer** — GUI automation (see screen, click, type, hotkey, OCR, window management) +- **steer** — GUI automation (see screen, click, type, hotkey, OCR, window management). Binary: `apps/steer/.build/release/steer`. 
**If `steer` is not available** (command not found), use the native macOS toolkit instead: `osascript` for GUI control, `screencapture` for screenshots, `open` to launch apps, `pbpaste`/`pbcopy` for clipboard, `curl` for web requests, and Safari's `source of front document` to read web page content. See the steer SKILL.md for full native toolkit reference. - **drive** — Terminal automation (tmux sessions, run commands, read output, parallel execution) ## Your Primary Task diff --git a/.claude/skills/steer/SKILL.md b/.claude/skills/steer/SKILL.md index 0d1e2b0..5d1084c 100644 --- a/.claude/skills/steer/SKILL.md +++ b/.claude/skills/steer/SKILL.md @@ -9,6 +9,106 @@ Binary: `apps/steer/.build/release/steer` Run `steer --help` and `steer help ` to learn each command's flags before using it. +## ⚠️ If `steer` is not available + +If `steer` is not installed or `command not found`, use the **Native macOS Toolkit** below instead. These tools are always available and cover most of the same capabilities. 
+ +### Native macOS Toolkit (AppleScript + shell) + +**Launch / activate apps:** +```bash +osascript -e 'tell application "Safari" to activate' +open -a Safari +open -a Safari "https://news.ycombinator.com" +``` + +**Navigate to a URL in Safari:** +```bash +osascript -e 'tell application "Safari" to set URL of front document to "https://example.com"' +# Or open a new window: +osascript -e 'tell application "Safari" to open location "https://example.com"' +``` + +**Wait for page load:** +```bash +# Poll until Safari is not loading +for i in $(seq 1 30); do + loading=$(osascript -e 'tell application "Safari" to return loading of front document') + [ "$loading" = "false" ] && break + sleep 1 +done +``` + +**Take a screenshot:** +```bash +screencapture -x /tmp/screenshot.png # full screen, silent +screencapture -x -R 0,0,1920,1080 /tmp/s.png # region +``` + +**Read clipboard / write clipboard:** +```bash +pbpaste # read +echo "text" | pbcopy # write +``` + +**Type text into focused app:** +```bash +osascript -e 'tell application "System Events" to keystroke "hello world"' +``` + +**Press keys / hotkeys:** +```bash +osascript -e 'tell application "System Events" to keystroke "a" using command down' # Cmd+A +osascript -e 'tell application "System Events" to key code 36' # Return (key code 36) +osascript -e 'tell application "System Events" to key code 53' # Escape +``` + +**Click at coordinates:** +```bash +osascript -e 'tell application "System Events" to click at {500, 300}' +``` + +**Read text from screen (OCR via Vision framework):** +```bash +# Use drive to run a python one-liner: +python3 -c " +import Vision, Quartz, objc +# ... 
or use screencapture + a Vision script +" +# Simpler: read Safari's page source directly when dealing with web content +osascript -e 'tell application "Safari" to return source of front document' +``` + +**Get web page content without GUI:** +```bash +curl -s "https://example.com" +# Or from the already-open Safari page: +osascript -e 'tell application "Safari" to return source of front document' +``` + +**Create a Note in Notes.app:** +```bash +osascript <<'EOF' +tell application "Notes" + activate + set newNote to make new note at folder "Notes" with properties {name:"My Title", body:"Content here"} +end tell +EOF +``` + +**Check what apps are running:** +```bash +osascript -e 'tell application "System Events" to get name of every process whose background only is false' +``` + +### Workflow with native toolkit + +The same observe-act-verify loop applies: +1. `screencapture -x /tmp/before.png` — capture state +2. Perform one action (osascript / open / curl) +3. `screencapture -x /tmp/after.png` — verify it worked +4. 
Read the screenshot or page source to confirm before proceeding + ## Commands | Command | Purpose | diff --git a/.env.sample b/.env.sample index 8918b97..a7ab6c9 100644 --- a/.env.sample +++ b/.env.sample @@ -3,6 +3,9 @@ OPENAI_API_KEY= GEMINI_API_KEY= OPENROUTER_API_KEY= +# Listen server — set a strong random secret; all API callers must send X-API-Key: +LISTEN_API_KEY= + # Listen server URL (default: http://localhost:7600) AGENT_SANDBOX_URL= diff --git a/apps/direct/client.py b/apps/direct/client.py index efe5a4d..0c26b5b 100644 --- a/apps/direct/client.py +++ b/apps/direct/client.py @@ -1,16 +1,23 @@ +import os + import httpx +def _headers() -> dict: + key = os.environ.get("LISTEN_API_KEY", "") + return {"X-API-Key": key} if key else {} + + def start_job(url: str, prompt: str) -> dict: """POST to url/job with prompt, returns response dict.""" - response = httpx.post(f"{url}/job", json={"prompt": prompt}) + response = httpx.post(f"{url}/job", json={"prompt": prompt}, headers=_headers()) response.raise_for_status() return response.json() def get_job(url: str, job_id: str) -> str: """GET url/job/{job_id}, returns YAML content.""" - response = httpx.get(f"{url}/job/{job_id}") + response = httpx.get(f"{url}/job/{job_id}", headers=_headers()) response.raise_for_status() return response.text @@ -18,14 +25,14 @@ def get_job(url: str, job_id: str) -> str: def list_jobs(url: str, archived: bool = False) -> str: """GET url/jobs, returns YAML content.""" params = {"archived": "true"} if archived else {} - response = httpx.get(f"{url}/jobs", params=params) + response = httpx.get(f"{url}/jobs", params=params, headers=_headers()) response.raise_for_status() return response.text def clear_jobs(url: str) -> dict: """POST url/jobs/clear, returns response dict.""" - response = httpx.post(f"{url}/jobs/clear") + response = httpx.post(f"{url}/jobs/clear", headers=_headers()) response.raise_for_status() return response.json() @@ -34,7 +41,7 @@ def latest_jobs(url: str, n: int = 1) -> 
str: """GET the full details of the latest N jobs.""" import yaml - response = httpx.get(f"{url}/jobs") + response = httpx.get(f"{url}/jobs", headers=_headers()) response.raise_for_status() data = yaml.safe_load(response.text) jobs = data.get("jobs") or [] @@ -45,7 +52,7 @@ def latest_jobs(url: str, n: int = 1) -> str: parts = [] for job in latest: job_id = job["id"] - detail = httpx.get(f"{url}/job/{job_id}") + detail = httpx.get(f"{url}/job/{job_id}", headers=_headers()) detail.raise_for_status() parts.append(detail.text) return "---\n".join(parts) @@ -53,6 +60,6 @@ def latest_jobs(url: str, n: int = 1) -> str: def stop_job(url: str, job_id: str) -> dict: """DELETE url/job/{job_id}, returns response dict.""" - response = httpx.delete(f"{url}/job/{job_id}") + response = httpx.delete(f"{url}/job/{job_id}", headers=_headers()) response.raise_for_status() return response.json() diff --git a/apps/drive/modules/tmux.py b/apps/drive/modules/tmux.py index 09ec220..83ae60b 100644 --- a/apps/drive/modules/tmux.py +++ b/apps/drive/modules/tmux.py @@ -79,23 +79,40 @@ def to_dict(self) -> dict: def open_terminal_window(command: str) -> None: """Open a new Terminal.app window and run a command in it. - Uses AppleScript on macOS to tell Terminal.app to execute a script. - The new window inherits the current working directory. + Writes the command to a temporary shell script instead of embedding it + directly in the AppleScript string, preventing AppleScript injection via + special characters (single quotes, newlines, backticks) in cwd or command. 
""" if platform.system() != "Darwin": return # silently skip on non-macOS + import shlex + import stat + import tempfile + cwd = os.getcwd() - shell_command = f"cd '{cwd}' && {command}" - escaped = shell_command.replace("\\", "\\\\").replace('"', '\\"') - subprocess.run( - [ - "osascript", - "-e", - f'tell application "Terminal" to do script "{escaped}"', - ], - capture_output=True, - text=True, - ) + script_content = f"#!/bin/sh\ncd {shlex.quote(cwd)} && {command}\n" + fd, script_path = tempfile.mkstemp(suffix=".sh", prefix="drive-term-") + try: + os.write(fd, script_content.encode()) + os.close(fd) + os.chmod(script_path, stat.S_IRWXU) # 0o700 — owner-execute only + # script_path is a system-generated path; safe to embed after quote-escaping + escaped_path = script_path.replace("\\", "\\\\").replace('"', '\\"') + subprocess.run( + [ + "osascript", + "-e", + f'tell application "Terminal" to do script "{escaped_path}"', + ], + capture_output=True, + text=True, + ) + except Exception: + try: + os.close(fd) + except OSError: + pass + raise def create_session( diff --git a/apps/listen/main.py b/apps/listen/main.py index a3b4394..99f633f 100644 --- a/apps/listen/main.py +++ b/apps/listen/main.py @@ -1,4 +1,6 @@ +import fcntl import os +import re import shutil import signal import subprocess @@ -8,22 +10,66 @@ from uuid import uuid4 import yaml -from fastapi import FastAPI, HTTPException +from dotenv import load_dotenv +from fastapi import Depends, FastAPI, HTTPException, Security from fastapi.responses import PlainTextResponse +from fastapi.security import APIKeyHeader from pydantic import BaseModel +load_dotenv(Path(__file__).parent.parent.parent / ".env") + app = FastAPI() JOBS_DIR = Path(__file__).parent / "jobs" JOBS_DIR.mkdir(exist_ok=True) ARCHIVED_DIR = JOBS_DIR / "archived" +_JOB_ID_RE = re.compile(r"^[0-9a-f]{8}$") +_api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False) + + +def _require_auth(api_key: str = Security(_api_key_header)) -> None: + 
"""Require a valid X-API-Key header matching LISTEN_API_KEY in the environment.""" + expected = os.environ.get("LISTEN_API_KEY") + if not expected: + raise HTTPException(status_code=500, detail="LISTEN_API_KEY not configured on server") + if api_key != expected: + raise HTTPException(status_code=401, detail="Unauthorized") + + +def _validate_job_id(job_id: str) -> Path: + """Validate job_id format and return the resolved job file path. + + Enforces a strict ^[0-9a-f]{8}$ pattern and additionally checks that + the resolved path stays inside JOBS_DIR to guard against path traversal. + """ + if not _JOB_ID_RE.match(job_id): + raise HTTPException(status_code=400, detail="Invalid job_id") + job_file = JOBS_DIR / f"{job_id}.yaml" + if not job_file.resolve().is_relative_to(JOBS_DIR.resolve()): + raise HTTPException(status_code=400, detail="Invalid job_id") + return job_file + + +def _read_job(path: Path) -> dict: + with open(path) as f: + fcntl.flock(f.fileno(), fcntl.LOCK_SH) + return yaml.safe_load(f) + + +def _write_job(path: Path, data: dict) -> None: + with open(path, "r+") as f: + fcntl.flock(f.fileno(), fcntl.LOCK_EX) + f.seek(0) + f.truncate() + yaml.dump(data, f, default_flow_style=False, sort_keys=False) + class JobRequest(BaseModel): prompt: str -@app.post("/job") +@app.post("/job", dependencies=[Depends(_require_auth)]) def create_job(req: JobRequest): job_id = uuid4().hex[:8] now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") @@ -38,54 +84,52 @@ def create_job(req: JobRequest): "summary": "", } - # Write YAML before spawning worker (worker reads it on startup) job_file = JOBS_DIR / f"{job_id}.yaml" + # Initial write — no lock needed; file is brand new with open(job_file, "w") as f: yaml.dump(job_data, f, default_flow_style=False, sort_keys=False) - # Spawn the worker process + # Spawn worker — prompt is read from YAML by the worker, not passed as argv + # (argv is visible to all users via ps aux for the process lifetime) worker_path = 
Path(__file__).parent / "worker.py" proc = subprocess.Popen( - [sys.executable, str(worker_path), job_id, req.prompt], + [sys.executable, str(worker_path), job_id], cwd=str(Path(__file__).parent), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, ) - # Update PID after spawn + # Update PID with exclusive lock to avoid racing the worker's session write job_data["pid"] = proc.pid - with open(job_file, "w") as f: - yaml.dump(job_data, f, default_flow_style=False, sort_keys=False) + _write_job(job_file, job_data) return {"job_id": job_id, "status": "running"} -@app.get("/job/{job_id}", response_class=PlainTextResponse) +@app.get("/job/{job_id}", response_class=PlainTextResponse, dependencies=[Depends(_require_auth)]) def get_job(job_id: str): - job_file = JOBS_DIR / f"{job_id}.yaml" + job_file = _validate_job_id(job_id) if not job_file.exists(): raise HTTPException(status_code=404, detail="Job not found") return job_file.read_text() -@app.get("/jobs", response_class=PlainTextResponse) +@app.get("/jobs", response_class=PlainTextResponse, dependencies=[Depends(_require_auth)]) def list_jobs(archived: bool = False): search_dir = ARCHIVED_DIR if archived else JOBS_DIR jobs = [] for f in sorted(search_dir.glob("*.yaml")): - with open(f) as fh: - data = yaml.safe_load(fh) + data = _read_job(f) jobs.append({ "id": data.get("id"), "status": data.get("status"), "prompt": data.get("prompt"), "created_at": data.get("created_at"), }) - result = yaml.dump({"jobs": jobs}, default_flow_style=False, sort_keys=False) - return result + return yaml.dump({"jobs": jobs}, default_flow_style=False, sort_keys=False) -@app.post("/jobs/clear") +@app.post("/jobs/clear", dependencies=[Depends(_require_auth)]) def clear_jobs(): ARCHIVED_DIR.mkdir(exist_ok=True) count = 0 @@ -95,29 +139,29 @@ def clear_jobs(): return {"archived": count} -@app.delete("/job/{job_id}") +@app.delete("/job/{job_id}", dependencies=[Depends(_require_auth)]) def stop_job(job_id: str): - job_file = JOBS_DIR / 
f"{job_id}.yaml" + job_file = _validate_job_id(job_id) if not job_file.exists(): raise HTTPException(status_code=404, detail="Job not found") - with open(job_file) as f: - data = yaml.safe_load(f) - + data = _read_job(job_file) pid = data.get("pid") - if pid: + if isinstance(pid, int) and pid > 0: try: os.kill(pid, signal.SIGTERM) - except ProcessLookupError: + except (ProcessLookupError, PermissionError): pass data["status"] = "stopped" - with open(job_file, "w") as f: - yaml.dump(data, f, default_flow_style=False, sort_keys=False) + _write_job(job_file, data) return {"job_id": job_id, "status": "stopped"} if __name__ == "__main__": import uvicorn - uvicorn.run(app, host="0.0.0.0", port=7600) + # Bind to localhost only — not exposed to the network by default. + # Set LISTEN_HOST=0.0.0.0 only if you have added network-level access controls. + host = os.environ.get("LISTEN_HOST", "127.0.0.1") + uvicorn.run(app, host=host, port=7600) diff --git a/apps/listen/pyproject.toml b/apps/listen/pyproject.toml index db61813..8dcef70 100644 --- a/apps/listen/pyproject.toml +++ b/apps/listen/pyproject.toml @@ -7,4 +7,5 @@ dependencies = [ "fastapi", "uvicorn", "pyyaml", + "python-dotenv>=1.2.2", ] diff --git a/apps/listen/uv.lock b/apps/listen/uv.lock index 57a42f7..bcb54ce 100644 --- a/apps/listen/uv.lock +++ b/apps/listen/uv.lock @@ -94,6 +94,7 @@ version = "0.1.0" source = { virtual = "." 
} dependencies = [ { name = "fastapi" }, + { name = "python-dotenv" }, { name = "pyyaml" }, { name = "uvicorn" }, ] @@ -101,6 +102,7 @@ dependencies = [ [package.metadata] requires-dist = [ { name = "fastapi" }, + { name = "python-dotenv", specifier = ">=1.2.2" }, { name = "pyyaml" }, { name = "uvicorn" }, ] @@ -199,14 +201,6 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/5c/96/5fb7d8c3c17bc8c62fdb031c47d77a1af698f1d7a406b0f79aaa1338f9ad/pydantic_core-2.41.5-cp314-cp314t-win32.whl", hash = "sha256:b4ececa40ac28afa90871c2cc2b9ffd2ff0bf749380fbdf57d165fd23da353aa", size = 1988906 }, { url = "https://files.pythonhosted.org/packages/22/ed/182129d83032702912c2e2d8bbe33c036f342cc735737064668585dac28f/pydantic_core-2.41.5-cp314-cp314t-win_amd64.whl", hash = "sha256:80aa89cad80b32a912a65332f64a4450ed00966111b6615ca6816153d3585a8c", size = 1981607 }, { url = "https://files.pythonhosted.org/packages/9f/ed/068e41660b832bb0b1aa5b58011dea2a3fe0ba7861ff38c4d4904c1c1a99/pydantic_core-2.41.5-cp314-cp314t-win_arm64.whl", hash = "sha256:35b44f37a3199f771c3eaa53051bc8a70cd7b54f333531c59e29fd4db5d15008", size = 1974769 }, - { url = "https://files.pythonhosted.org/packages/11/72/90fda5ee3b97e51c494938a4a44c3a35a9c96c19bba12372fb9c634d6f57/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-macosx_10_12_x86_64.whl", hash = "sha256:b96d5f26b05d03cc60f11a7761a5ded1741da411e7fe0909e27a5e6a0cb7b034", size = 2115441 }, - { url = "https://files.pythonhosted.org/packages/1f/53/8942f884fa33f50794f119012dc6a1a02ac43a56407adaac20463df8e98f/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-macosx_11_0_arm64.whl", hash = "sha256:634e8609e89ceecea15e2d61bc9ac3718caaaa71963717bf3c8f38bfde64242c", size = 1930291 }, - { url = "https://files.pythonhosted.org/packages/79/c8/ecb9ed9cd942bce09fc888ee960b52654fbdbede4ba6c2d6e0d3b1d8b49c/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = 
"sha256:93e8740d7503eb008aa2df04d3b9735f845d43ae845e6dcd2be0b55a2da43cd2", size = 1948632 }, - { url = "https://files.pythonhosted.org/packages/2e/1b/687711069de7efa6af934e74f601e2a4307365e8fdc404703afc453eab26/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f15489ba13d61f670dcc96772e733aad1a6f9c429cc27574c6cdaed82d0146ad", size = 2138905 }, - { url = "https://files.pythonhosted.org/packages/09/32/59b0c7e63e277fa7911c2fc70ccfb45ce4b98991e7ef37110663437005af/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_10_12_x86_64.whl", hash = "sha256:7da7087d756b19037bc2c06edc6c170eeef3c3bafcb8f532ff17d64dc427adfd", size = 2110495 }, - { url = "https://files.pythonhosted.org/packages/aa/81/05e400037eaf55ad400bcd318c05bb345b57e708887f07ddb2d20e3f0e98/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_11_0_arm64.whl", hash = "sha256:aabf5777b5c8ca26f7824cb4a120a740c9588ed58df9b2d196ce92fba42ff8dc", size = 1915388 }, - { url = "https://files.pythonhosted.org/packages/6e/0d/e3549b2399f71d56476b77dbf3cf8937cec5cd70536bdc0e374a421d0599/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c007fe8a43d43b3969e8469004e9845944f1a80e6acd47c150856bb87f230c56", size = 1942879 }, - { url = "https://files.pythonhosted.org/packages/f7/07/34573da085946b6a313d7c42f82f16e8920bfd730665de2d11c0c37a74b5/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:76d0819de158cd855d1cbb8fcafdf6f5cf1eb8e470abe056d5d161106e38062b", size = 2139017 }, { url = "https://files.pythonhosted.org/packages/5f/9b/1b3f0e9f9305839d7e84912f9e8bfbd191ed1b1ef48083609f0dabde978c/pydantic_core-2.41.5-pp311-pypy311_pp73-macosx_10_12_x86_64.whl", hash = "sha256:b2379fa7ed44ddecb5bfe4e48577d752db9fc10be00a6b7446e9663ba143de26", size = 2101980 }, { url = 
"https://files.pythonhosted.org/packages/a4/ed/d71fefcb4263df0da6a85b5d8a7508360f2f2e9b3bf5814be9c8bccdccc1/pydantic_core-2.41.5-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:266fb4cbf5e3cbd0b53669a6d1b039c45e3ce651fd5442eff4d07c2cc8d66808", size = 1923865 }, { url = "https://files.pythonhosted.org/packages/ce/3a/626b38db460d675f873e4444b4bb030453bbe7b4ba55df821d026a0493c4/pydantic_core-2.41.5-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:58133647260ea01e4d0500089a8c4f07bd7aa6ce109682b1426394988d8aaacc", size = 2134256 }, @@ -217,6 +211,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/36/c7/cfc8e811f061c841d7990b0201912c3556bfeb99cdcb7ed24adc8d6f8704/pydantic_core-2.41.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:56121965f7a4dc965bff783d70b907ddf3d57f6eba29b6d2e5dabfaf07799c51", size = 2145302 }, ] +[[package]] +name = "python-dotenv" +version = "1.2.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/82/ed/0301aeeac3e5353ef3d94b6ec08bbcabd04a72018415dcb29e588514bba8/python_dotenv-1.2.2.tar.gz", hash = "sha256:2c371a91fbd7ba082c2c1dc1f8bf89ca22564a087c2c287cd9b662adde799cf3", size = 50135 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0b/d7/1959b9648791274998a9c3526f6d0ec8fd2233e4d4acce81bbae76b44b2a/python_dotenv-1.2.2-py3-none-any.whl", hash = "sha256:1d8214789a24de455a8b8bd8ae6fe3c6b69a5e3d64aa8a8e5d68e694bbcb285a", size = 22101 }, +] + [[package]] name = "pyyaml" version = "6.0.3" diff --git a/apps/listen/worker.py b/apps/listen/worker.py index 16faadc..9301297 100644 --- a/apps/listen/worker.py +++ b/apps/listen/worker.py @@ -4,10 +4,15 @@ markers, polls for completion, then updates the job YAML. 
""" +import fcntl import os import re +import shlex +import shutil +import stat import subprocess import sys +import tempfile import time import uuid from datetime import datetime, timezone @@ -17,11 +22,15 @@ SENTINEL_PREFIX = "__JOBDONE_" POLL_INTERVAL = 2.0 +JOB_TIMEOUT_SECONDS = 3600 # 1 hour max per job def _tmux(*args: str, check: bool = True) -> subprocess.CompletedProcess[str]: """Run a tmux command.""" - return subprocess.run(["tmux", *args], capture_output=True, text=True, check=check) + tmux_bin = shutil.which("tmux") + if tmux_bin is None: + raise RuntimeError("tmux not found in PATH") + return subprocess.run([tmux_bin, *args], capture_output=True, text=True, check=check) def _session_exists(name: str) -> bool: @@ -30,20 +39,49 @@ def _session_exists(name: str) -> bool: def _open_terminal(session_name: str, cwd: str) -> None: - """Open a new Terminal.app window with a tmux session attached.""" - tmux_cmd = f"cd '{cwd}' && tmux new-session -A -s {session_name}" - escaped = tmux_cmd.replace("\\", "\\\\").replace('"', '\\"') - subprocess.run( - ["osascript", "-e", f'tell application "Terminal" to do script "{escaped}"'], - capture_output=True, - text=True, + """Open a new Terminal.app window with a tmux session attached. + + Writes a temporary shell script instead of embedding the command + directly in the AppleScript string, eliminating AppleScript injection + via special characters in cwd or session_name. 
+ """ + tmux_bin = shutil.which("tmux") or "tmux" + script_content = ( + "#!/bin/sh\n" + f"cd {shlex.quote(cwd)} && " + f"{shlex.quote(tmux_bin)} new-session -A -s {shlex.quote(session_name)}\n" ) + fd, script_path = tempfile.mkstemp(suffix=".sh", prefix="steer-term-") + try: + os.write(fd, script_content.encode()) + os.close(fd) + os.chmod(script_path, stat.S_IRWXU) # 0o700 — owner-execute only + # script_path is a system-generated path; safe to embed after quote-escaping + escaped_path = script_path.replace("\\", "\\\\").replace('"', '\\"') + subprocess.run( + [ + "osascript", + "-e", 'tell application "Terminal" to activate', + "-e", f'tell application "Terminal" to do script "{escaped_path}"', + ], + capture_output=True, + text=True, + ) + except Exception: + try: + os.close(fd) + except OSError: + pass + raise + # Wait for session to appear deadline = time.monotonic() + 5.0 while time.monotonic() < deadline: if _session_exists(session_name): + Path(script_path).unlink(missing_ok=True) return time.sleep(0.2) + Path(script_path).unlink(missing_ok=True) raise RuntimeError(f"tmux session '{session_name}' did not appear within 5s") @@ -58,83 +96,123 @@ def _capture_pane(session: str) -> str: return result.stdout -def _wait_for_sentinel(session: str, token: str) -> int: - """Poll until sentinel appears. 
No timeout — waits forever.""" +def _wait_for_sentinel(session: str, token: str, timeout: float = JOB_TIMEOUT_SECONDS) -> int: + """Poll until sentinel appears or timeout is reached.""" pattern = re.compile( rf"^{re.escape(SENTINEL_PREFIX)}{token}:(\d+)\s*$", re.MULTILINE ) - while True: + deadline = time.monotonic() + timeout + while time.monotonic() < deadline: time.sleep(POLL_INTERVAL) captured = _capture_pane(session) match = pattern.search(captured) if match: return int(match.group(1)) + raise TimeoutError(f"Job timed out after {timeout}s") + + +def _read_job_file(job_file: Path) -> dict: + """Read job YAML with shared file lock.""" + with open(job_file) as f: + fcntl.flock(f.fileno(), fcntl.LOCK_SH) + return yaml.safe_load(f) + + +def _write_job_file(job_file: Path, data: dict) -> None: + """Write job YAML with exclusive file lock.""" + with open(job_file, "r+") as f: + fcntl.flock(f.fileno(), fcntl.LOCK_EX) + f.seek(0) + f.truncate() + yaml.dump(data, f, default_flow_style=False, sort_keys=False) def main(): - if len(sys.argv) < 3: - print("Usage: worker.py ") + if len(sys.argv) < 2: + print("Usage: worker.py ") sys.exit(1) job_id = sys.argv[1] - prompt = sys.argv[2] + + # Validate job_id format before any filesystem use + if not re.match(r"^[0-9a-f]{8}$", job_id): + print(f"Invalid job_id format: {job_id}", file=sys.stderr) + sys.exit(1) + + repo_root = Path(__file__).parent.parent.parent + + # Load env vars BEFORE any reference to anthropic_key + env_clean = {k: v for k, v in os.environ.items() if k != "CLAUDECODE"} + os.environ.clear() + os.environ.update(env_clean) + + from dotenv import load_dotenv + load_dotenv(repo_root / ".env") + anthropic_key = os.environ.get("ANTHROPIC_API_KEY") jobs_dir = Path(__file__).parent / "jobs" job_file = jobs_dir / f"{job_id}.yaml" if not job_file.exists(): - print(f"Job file not found: {job_file}") + print(f"Job file not found: {job_file}", file=sys.stderr) sys.exit(1) - repo_root = Path(__file__).parent.parent.parent + 
# Read prompt from YAML — not from argv (argv is visible in ps aux) + job_data = _read_job_file(job_file) + prompt = job_data.get("prompt", "") + sys_prompt_file = ( repo_root / ".claude" / "agents" / "listen-drive-and-steer-system-prompt.md" ) sys_prompt = sys_prompt_file.read_text().replace("{{JOB_ID}}", job_id) - # Write system prompt to a temp file to avoid shell escaping issues - sys_prompt_tmp = Path(f"/tmp/steer-sysprompt-{job_id}.txt") - sys_prompt_tmp.write_text(sys_prompt) + # Create temp files with O_EXCL + 0o600 (no TOCTOU, owner-only readable) + sys_fd, sys_prompt_path = tempfile.mkstemp(prefix=f"steer-sys-{job_id}-", suffix=".txt") + try: + os.write(sys_fd, sys_prompt.encode()) + finally: + os.close(sys_fd) + os.chmod(sys_prompt_path, stat.S_IRUSR | stat.S_IWUSR) + + prompt_fd, prompt_path = tempfile.mkstemp(prefix=f"steer-prompt-{job_id}-", suffix=".txt") + try: + os.write(prompt_fd, f"/listen-drive-and-steer-user-prompt {prompt}".encode()) + finally: + os.close(prompt_fd) + os.chmod(prompt_path, stat.S_IRUSR | stat.S_IWUSR) - # Write user prompt to a temp file to avoid tmux send-keys truncation - prompt_tmp = Path(f"/tmp/steer-prompt-{job_id}.txt") - prompt_tmp.write_text(f"/listen-drive-and-steer-user-prompt {prompt}") + sys_prompt_tmp = Path(sys_prompt_path) + prompt_tmp = Path(prompt_path) session_name = f"job-{job_id}" token = uuid.uuid4().hex[:8] - # Build the claude command — read prompt from file to avoid truncation + # Build claude command — API key is NOT embedded in the command string claude_cmd = ( f"claude --dangerously-skip-permissions" - f' --append-system-prompt "$(cat {sys_prompt_tmp})"' - f' "$(cat {prompt_tmp})"' + f' --append-system-prompt "$(cat {shlex.quote(sys_prompt_path)})"' + f' "$(cat {shlex.quote(prompt_path)})"' ) - - # Wrap with sentinel: ; echo "__JOBDONE_:$?" 
wrapped = f'{claude_cmd} ; echo "{SENTINEL_PREFIX}{token}:$?"' start_time = time.time() - # Strip CLAUDECODE from env so nested claude doesn't conflict - env_clean = {k: v for k, v in os.environ.items() if k != "CLAUDECODE"} - os.environ.clear() - os.environ.update(env_clean) - try: - # Open headed Terminal window with tmux session _open_terminal(session_name, str(repo_root)) - # Send the wrapped command + # Inject API key into the tmux session environment — NOT in the command string. + # This prevents the key from appearing in scrollback or capture-pane output. + if anthropic_key: + _tmux("setenv", "-t", session_name, "ANTHROPIC_API_KEY", anthropic_key) + _tmux("setenv", "-t", session_name, "CLAUDE_API_KEY", anthropic_key) + _send_keys(session_name, wrapped) - # Update job with session info - with open(job_file) as f: - data = yaml.safe_load(f) + # Update job with session info (locked write) + data = _read_job_file(job_file) data["session"] = session_name - with open(job_file, "w") as f: - yaml.dump(data, f, default_flow_style=False, sort_keys=False) + _write_job_file(job_file, data) - # Wait for completion — no timeout exit_code = _wait_for_sentinel(session_name, token) except Exception as e: @@ -144,18 +222,13 @@ def main(): duration = round(time.time() - start_time) now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") - with open(job_file) as f: - data = yaml.safe_load(f) - + data = _read_job_file(job_file) data["status"] = "completed" if exit_code == 0 else "failed" data["exit_code"] = exit_code data["duration_seconds"] = duration data["completed_at"] = now + _write_job_file(job_file, data) - with open(job_file, "w") as f: - yaml.dump(data, f, default_flow_style=False, sort_keys=False) - - # Clean up sys_prompt_tmp.unlink(missing_ok=True) prompt_tmp.unlink(missing_ok=True) if _session_exists(session_name): diff --git a/apps/steer/Sources/steer/AccessibilityTree.swift b/apps/steer/Sources/steer/AccessibilityTree.swift index 9e517bb..6250faa 100644 
--- a/apps/steer/Sources/steer/AccessibilityTree.swift
+++ b/apps/steer/Sources/steer/AccessibilityTree.swift
@@ -138,6 +138,6 @@ enum AccessibilityTree {
     static func axVal(_ el: AXUIElement, _ a: String) -> AXValue? {
         var v: CFTypeRef?
         guard AXUIElementCopyAttributeValue(el, a as CFString, &v) == .success else { return nil }
-        return (v as! AXValue)
+        return v as? AXValue
     }
 }
diff --git a/apps/steer/Sources/steer/Clipboard.swift b/apps/steer/Sources/steer/Clipboard.swift
index 63dace2..2c5dcc1 100644
--- a/apps/steer/Sources/steer/Clipboard.swift
+++ b/apps/steer/Sources/steer/Clipboard.swift
@@ -1,6 +1,22 @@
 import ArgumentParser
 import Foundation
 
+/// Escape a string for embedding inside a JSON string literal.
+/// Uses JSONSerialization to handle all control characters per the JSON spec.
+private func jsonEscape(_ s: String) -> String {
+    guard let data = try? JSONSerialization.data(withJSONObject: s),
+          let encoded = String(data: data, encoding: .utf8) else {
+        // Fallback: manual escaping of the most common characters
+        return s.replacingOccurrences(of: "\\", with: "\\\\")
+                .replacingOccurrences(of: "\"", with: "\\\"")
+                .replacingOccurrences(of: "\n", with: "\\n")
+                .replacingOccurrences(of: "\r", with: "\\r")
+                .replacingOccurrences(of: "\t", with: "\\t")
+    }
+    // JSONSerialization wraps the value in quotes — strip them
+    return String(encoded.dropFirst().dropLast())
+}
+
 struct Clipboard: ParsableCommand {
     static let configuration = CommandConfiguration(
         abstract: "Read or write the system clipboard."
@@ -28,7 +44,7 @@ struct Clipboard: ParsableCommand {
         case "text":
             let content = ClipboardControl.readText()
             if json {
-                let escaped = (content ?? "").replacingOccurrences(of: "\"", with: "\\\"").replacingOccurrences(of: "\n", with: "\\n")
+                let escaped = jsonEscape(content ?? "")
                 print("{\"action\":\"read\",\"type\":\"text\",\"content\":\"\(escaped)\",\"ok\":true}")
             } else {
                 print(content ?? "(clipboard empty)")
@@ -45,7 +61,7 @@ struct Clipboard: ParsableCommand {
             guard let text = text else { throw ValidationError("Provide text to write") }
             ClipboardControl.writeText(text)
             if json {
-                let escaped = text.replacingOccurrences(of: "\"", with: "\\\"").replacingOccurrences(of: "\n", with: "\\n")
+                let escaped = jsonEscape(text)
                 print("{\"action\":\"write\",\"type\":\"text\",\"content\":\"\(escaped)\",\"ok\":true}")
             } else {
                 print("Copied to clipboard: \"\(text.prefix(80))\(text.count > 80 ? "..." : "")\"")
diff --git a/apps/steer/Sources/steer/Hotkey.swift b/apps/steer/Sources/steer/Hotkey.swift
index 47be458..c6ab414 100644
--- a/apps/steer/Sources/steer/Hotkey.swift
+++ b/apps/steer/Sources/steer/Hotkey.swift
@@ -1,6 +1,18 @@
 import ArgumentParser
 import Foundation
 
+private func jsonEscape(_ s: String) -> String {
+    guard let data = try? JSONSerialization.data(withJSONObject: s),
+          let encoded = String(data: data, encoding: .utf8) else {
+        return s.replacingOccurrences(of: "\\", with: "\\\\")
+                .replacingOccurrences(of: "\"", with: "\\\"")
+                .replacingOccurrences(of: "\n", with: "\\n")
+                .replacingOccurrences(of: "\r", with: "\\r")
+                .replacingOccurrences(of: "\t", with: "\\t")
+    }
+    return String(encoded.dropFirst().dropLast())
+}
+
 struct Hotkey: ParsableCommand {
     static let configuration = CommandConfiguration(
         abstract: "Press a key combination: cmd+s, ctrl+c, return, escape, etc."
@@ -16,7 +28,8 @@ struct Hotkey: ParsableCommand {
         Keyboard.hotkey(combo)
 
         if json {
-            print("{\"action\":\"hotkey\",\"combo\":\"\(combo)\",\"ok\":true}")
+            let escaped = jsonEscape(combo)
+            print("{\"action\":\"hotkey\",\"combo\":\"\(escaped)\",\"ok\":true}")
         } else {
             print("Pressed \(combo)")
         }
diff --git a/apps/steer/Sources/steer/Type.swift b/apps/steer/Sources/steer/Type.swift
index 8f6a442..fe0024d 100644
--- a/apps/steer/Sources/steer/Type.swift
+++ b/apps/steer/Sources/steer/Type.swift
@@ -1,6 +1,18 @@
 import ArgumentParser
 import Foundation
 
+private func jsonEscape(_ s: String) -> String {
+    guard let data = try? JSONSerialization.data(withJSONObject: s),
+          let encoded = String(data: data, encoding: .utf8) else {
+        return s.replacingOccurrences(of: "\\", with: "\\\\")
+                .replacingOccurrences(of: "\"", with: "\\\"")
+                .replacingOccurrences(of: "\n", with: "\\n")
+                .replacingOccurrences(of: "\r", with: "\\r")
+                .replacingOccurrences(of: "\t", with: "\\t")
+    }
+    return String(encoded.dropFirst().dropLast())
+}
+
 struct Type: ParsableCommand {
     static let configuration = CommandConfiguration(
         commandName: "type",
@@ -40,7 +52,7 @@ struct Type: ParsableCommand {
         Keyboard.typeText(text)
 
         if json {
-            let escaped = text.replacingOccurrences(of: "\"", with: "\\\"")
+            let escaped = jsonEscape(text)
             print("{\"action\":\"type\",\"text\":\"\(escaped)\",\"ok\":true}")
         } else {
             print("Typed \"\(text)\"\(into != nil ? " into \(into!)" : "")")
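Reviewer note: the Clipboard, Hotkey, and Type hunks above all swap hand-rolled `replacingOccurrences` chains for an encoder-backed `jsonEscape`. The sketch below is not part of this change. It is a minimal Python illustration of why escaping only quotes and newlines is not enough. The names `naive_escape`, `json_escape`, `envelope`, and `parses` are hypothetical, and Python's `json` module stands in for `JSONSerialization` as an assumption for demonstration only.

```python
import json


def naive_escape(s: str) -> str:
    # Mirrors the removed lines: only double quotes and newlines are escaped.
    return s.replace('"', '\\"').replace("\n", "\\n")


def json_escape(s: str) -> str:
    # Let the JSON encoder do the escaping, then strip the surrounding quotes.
    return json.dumps(s)[1:-1]


def envelope(escape, text: str) -> str:
    # Build the same kind of hand-assembled JSON line the CLI prints.
    return '{"action":"type","text":"' + escape(text) + '","ok":true}'


def parses(payload: str) -> bool:
    # True when the payload is valid JSON for a strict parser.
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False


# A raw tab and a lone backslash slip through the naive version untouched.
sample = "tab\there, backslash \\ end"

print(parses(envelope(naive_escape, sample)))
print(json.loads(envelope(json_escape, sample))["text"] == sample)
```

With this sample, the naive envelope is rejected by a strict parser because the tab and the lone backslash are emitted unescaped, while the encoder-backed version round-trips to the original text.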