agent-desktop-harness is designed for isolated GUI QA, not unrestricted desktop automation.
The current release candidate is local developer tooling. It is intended to run on a developer-controlled machine or CI worker where the caller already has permission to launch the target app and inspect generated evidence.
By default, the harness targets Xvfb sessions that it creates itself. It must not send input to the user's real desktop. This boundary reduces the risk of accidental clicks, data exposure, credential leakage, and destructive actions in unrelated applications.
Sessions should launch only commands that policy allows. MCP tools and HTTP routes must not accept arbitrary shell strings and execute them directly.
The core launch model uses structured commands:

    {
      "command": "pnpm",
      "args": ["dev"]
    }

Shell strings are intentionally not accepted:

    pnpm dev && arbitrary-command

HarnessPolicy.allowedCommands can restrict launch commands to an explicit allowlist. If no allowlist is provided, app launch is rejected unless allowUnlistedCommandsForLocalDevelopment is explicitly set to true. That flag is unsafe for general use and is intended only for local development experiments.
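The allowlist rule can be sketched as a small pure function. This is a minimal illustration, not the harness's actual code: the `LaunchCommand` and `LaunchDecision` types and the `checkLaunchPolicy` name are hypothetical, and only the two policy field names come from the text above. The sketch also assumes that a configured allowlist is always enforced, even when the local-development flag is set.

```typescript
// Hypothetical shapes for illustration; the real HarnessPolicy may differ.
interface LaunchCommand {
  command: string;
  args: string[];
}

interface HarnessPolicy {
  allowedCommands?: string[];
  allowUnlistedCommandsForLocalDevelopment?: boolean;
}

type LaunchDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

function checkLaunchPolicy(policy: HarnessPolicy, launch: LaunchCommand): LaunchDecision {
  // Assumption: when an allowlist exists, it is the only thing consulted.
  if (policy.allowedCommands !== undefined) {
    return policy.allowedCommands.includes(launch.command)
      ? { allowed: true }
      : { allowed: false, reason: `command "${launch.command}" is not in allowedCommands` };
  }
  // No allowlist: reject unless the unsafe local-development flag is set.
  if (policy.allowUnlistedCommandsForLocalDevelopment === true) {
    return { allowed: true };
  }
  return { allowed: false, reason: "no allowlist configured" };
}
```

Returning a reason alongside the decision lets the caller record why a launch was rejected without exposing anything beyond the command name.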
Each session should run inside an explicit workspace root. Evidence should be written to a predictable session directory. The harness should avoid reading or writing outside the configured workspace and evidence directories unless explicitly allowed.
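One way to enforce the workspace boundary is a path containment check before any read or write. The sketch below is an assumption about how such a check could look. `isInsideRoot` and `sessionEvidenceDir` are hypothetical names, and the session ID format is invented for illustration.

```typescript
import * as path from "node:path";

// Returns true when `candidate` resolves to `root` itself or a path beneath it.
// Note: this compares lexical paths only. Symlinks are not resolved here; a
// stricter check would resolve both sides with fs.realpath first.
function isInsideRoot(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  const relative = path.relative(resolvedRoot, resolved);
  const escapes =
    relative === ".." ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative);
  return !escapes;
}

// Builds a predictable per-session evidence directory (hypothetical layout).
function sessionEvidenceDir(evidenceRoot: string, sessionId: string): string {
  // Assumed ID format: letters, digits, underscore, hyphen.
  if (!/^[A-Za-z0-9_-]+$/.test(sessionId)) {
    throw new Error("invalid session id");
  }
  return path.join(path.resolve(evidenceRoot), sessionId);
}
```

Comparing the relative path rather than a string prefix avoids treating a sibling such as `/work/app-evil` as if it were inside `/work/app`.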
MCP clients should receive typed GUI QA tools, not a general shell. Shell execution belongs to the calling coding agent's own environment and approval model, not this harness.
The current core implementation uses child_process.spawn(command, args, { shell: false }) for app launch.
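The effect of `shell: false` can be demonstrated with a small experiment. This sketch swaps in `spawnSync` for `spawn` so the result can be read inline, and it launches the current Node binary as a stand-in target. `runWithoutShell` is a hypothetical helper, not part of the harness.

```typescript
import { spawnSync } from "node:child_process";

// Runs a structured command without a shell and returns its trimmed stdout.
function runWithoutShell(command: string, args: string[]): string {
  const result = spawnSync(command, args, { shell: false, encoding: "utf8" });
  if (result.error) throw result.error;
  return result.stdout.trim();
}

// The child prints its last argument. Because no shell parses the arguments,
// "&&" reaches the child as plain text instead of chaining a second command.
const echoed = runWithoutShell(process.execPath, [
  "-e",
  "console.log(process.argv[process.argv.length - 1])",
  "dev && echo injected",
]);
console.log(echoed); // prints "dev && echo injected"
```

Each array element becomes exactly one argument of the child process, so shell operators in user-supplied values have no special meaning.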
The MCP server exposes desktop_launch_app with structured { command, args, cwd, env } input only. The HTTP server exposes POST /sessions/:sessionId/launch with the same structured model. Both delegate policy checks and process launch to the core SessionManager.
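A validator for the structured input could look roughly like the following. The field names come from the text above. Everything else is an assumption made for illustration, including the `parseLaunchInput` name, treating `args` as optional, and the error messages.

```typescript
// Hypothetical input shape mirroring { command, args, cwd, env }.
interface LaunchInput {
  command: string;
  args: string[];
  cwd?: string;
  env?: Record<string, string>;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

function isStringRecord(value: unknown): value is Record<string, string> {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.values(value).every((item) => typeof item === "string")
  );
}

// Throws on anything that is not a structured launch request.
function parseLaunchInput(input: unknown): LaunchInput {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    throw new TypeError("launch input must be an object, not a shell string");
  }
  const { command, args, cwd, env } = input as Record<string, unknown>;
  if (typeof command !== "string" || command.length === 0) {
    throw new TypeError("command must be a non-empty string");
  }
  const result: LaunchInput = { command, args: [] };
  if (args !== undefined) {
    if (!isStringArray(args)) throw new TypeError("args must be an array of strings");
    result.args = args;
  }
  if (cwd !== undefined) {
    if (typeof cwd !== "string") throw new TypeError("cwd must be a string");
    result.cwd = cwd;
  }
  if (env !== undefined) {
    if (!isStringRecord(env)) throw new TypeError("env must map strings to strings");
    result.env = env;
  }
  return result;
}
```

Running the same parser in both the MCP tool handler and the HTTP route would keep the two entry points aligned before they hand off to the shared policy check.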
The HTTP server is intended for local agent orchestration. It binds to 127.0.0.1 by default and rejects non-loopback bind hosts. Allowed bind hosts are 127.0.0.1, localhost, and ::1. CORS is not enabled by default.
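The bind host rule amounts to a lookup against a fixed set. The following is a minimal sketch of that check. `resolveBindHost` and the constant names are hypothetical, and only the three allowed hosts and the default come from the text above.

```typescript
const LOOPBACK_HOSTS = new Set(["127.0.0.1", "localhost", "::1"]);
const DEFAULT_BIND_HOST = "127.0.0.1";

// Returns the host to bind to, or throws for anything that is not loopback.
function resolveBindHost(requested?: string): string {
  const host = requested ?? DEFAULT_BIND_HOST;
  if (!LOOPBACK_HOSTS.has(host)) {
    throw new Error(`refusing to bind to non-loopback host "${host}"`);
  }
  return host;
}
```

An exact-match set is deliberately strict: values such as `0.0.0.0` or a LAN address fail immediately instead of being interpreted.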
Do not run the HTTP server behind a public proxy without adding an authentication and authorization layer first.
The noVNC live observer is optional. It starts x11vnc against the isolated Xvfb display and serves noVNC through novnc_proxy or websockify.
Security expectations:
- Observer services bind to 127.0.0.1 by default.
- The MVP rejects non-local observer hosts.
- Do not bind live observer services to 0.0.0.0 on shared machines.
- Use SSH tunnels for remote viewing.
- viewOnly defaults to true.
- If a password is provided, raw password text must not appear in action logs, HTTP responses, MCP responses, or CLI smoke output.
- Live views may expose sensitive app state, private URLs, source code, tokens, credentials, or customer data.
- Stop observers after use. Stopping a session stops its observers automatically.
The live observer watches the harness-owned Xvfb session. It must not attach to or control the user's real desktop.
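The observer defaults and the password rule can be sketched together. All names below are hypothetical. The sketch assumes that "local" means the same loopback set the HTTP server accepts, and it reports only whether a password was set, never the value.

```typescript
// Hypothetical option shapes for illustration.
interface ObserverOptions {
  host?: string;
  viewOnly?: boolean;
  password?: string;
}

interface ResolvedObserver {
  host: string;
  viewOnly: boolean;
  password?: string;
}

// Assumption: "local" means the loopback hosts listed for the HTTP server.
const LOCAL_OBSERVER_HOSTS = new Set(["127.0.0.1", "localhost", "::1"]);

function resolveObserverOptions(options: ObserverOptions = {}): ResolvedObserver {
  const host = options.host ?? "127.0.0.1";
  if (!LOCAL_OBSERVER_HOSTS.has(host)) {
    throw new Error(`observer host "${host}" is not local`);
  }
  const resolved: ResolvedObserver = { host, viewOnly: options.viewOnly ?? true };
  if (options.password !== undefined) resolved.password = options.password;
  return resolved;
}

// Produces the view that is safe for logs and responses: no raw password.
function describeObserverForLog(
  observer: ResolvedObserver,
): { host: string; viewOnly: boolean; passwordSet: boolean } {
  return {
    host: observer.host,
    viewOnly: observer.viewOnly,
    passwordSet: observer.password !== undefined,
  };
}
```

Keeping the loggable description separate from the resolved options means that serializing it into an action log or response cannot leak the password by accident.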
MCP tools can launch allowlisted local commands, send input to the isolated Xvfb display, and read evidence paths. Use the MCP server only with trusted local clients and trusted agent workflows.
The MCP server should write only protocol data to stdout. Diagnostic logs should go to stderr so that MCP clients never receive malformed protocol messages.
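One way to keep the two streams apart is to route all output through a small wrapper with separate sinks. This is a sketch under assumptions: `createStdioChannels` is a hypothetical helper, and the one-JSON-message-per-line framing is assumed here rather than taken from the harness.

```typescript
type Sink = (chunk: string) => void;

interface StdioChannels {
  sendMessage(message: object): void;
  log(text: string): void;
}

// Protocol messages and diagnostics never share a sink.
function createStdioChannels(protocolOut: Sink, diagnosticOut: Sink): StdioChannels {
  return {
    // Assumed framing: one JSON document per line on the protocol stream.
    sendMessage(message: object): void {
      protocolOut(JSON.stringify(message) + "\n");
    },
    // Free-form text goes only to the diagnostic stream.
    log(text: string): void {
      diagnosticOut(`[harness] ${text}\n`);
    },
  };
}
```

In a Node process the sinks would typically be `(chunk) => process.stdout.write(chunk)` and `(chunk) => process.stderr.write(chunk)`. Injecting them also makes the separation easy to check in tests.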
Screenshots and action logs may contain source code, credentials, private URLs, tokens, customer data, or local file paths. Evidence storage must be treated as sensitive output.
Current safeguards:
- Typed secret redaction support for input text. typeText({ secret: true }) records only a redacted marker and text length.
- Clear evidence directory paths.
- Session metadata that records policy configuration.
- User-controlled retention and cleanup.
Secret text redaction only applies when the caller sets secret: true. If secret is omitted or false, typed text may appear in action logs and tool results.
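The opt-in behavior can be illustrated with a function that turns a typing action into its log entry. The entry shape and the `toLogEntry` name are assumptions. Only the `secret` flag, the redacted marker, and the recorded text length come from the text above.

```typescript
// Hypothetical action and log shapes for illustration.
interface TypeTextAction {
  text: string;
  secret?: boolean;
}

type TypeTextLogEntry =
  | { action: "typeText"; text: string }
  | { action: "typeText"; redacted: true; textLength: number };

function toLogEntry(action: TypeTextAction): TypeTextLogEntry {
  if (action.secret === true) {
    // Only the marker and the length are kept; the raw text is dropped.
    return { action: "typeText", redacted: true, textLength: action.text.length };
  }
  // Without secret: true, the raw text is recorded as-is.
  return { action: "typeText", text: action.text };
}
```

Because redaction depends entirely on the caller's flag, forgetting `secret: true` on a credential field results in the raw value being logged.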
Core input actions use xdotool with the session DISPLAY environment variable. They are scoped to isolated Xvfb sessions and must not target the user's host desktop.
Text input is passed as process arguments, not through shell interpolation. Secret text must be sent with secret: true so the action log does not include raw text.
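Passing text as a process argument might look like the sketch below, which only builds the invocation and does not run it. `buildTypeInvocation` and `ProcessInvocation` are hypothetical names. The `--` separator and the display format check are assumptions about how the harness might call `xdotool type`.

```typescript
// Hypothetical description of a process to launch without a shell.
interface ProcessInvocation {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildTypeInvocation(display: string, text: string): ProcessInvocation {
  // Assumed display format such as ":99" or ":99.0" for a harness-owned Xvfb.
  if (!/^:\d+(\.\d+)?$/.test(display)) {
    throw new Error(`unexpected DISPLAY value "${display}"`);
  }
  return {
    command: "xdotool",
    // "--" ends option parsing so text that starts with "-" stays literal.
    args: ["type", "--", text],
    env: { DISPLAY: display },
  };
}
```

The text occupies a single array element, so shell metacharacters in it are typed into the target window verbatim when the invocation is spawned with `shell: false`.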
Future versions should support network policy controls for launched sessions. Potential controls include offline mode, allowlisted hosts, blocked hosts, and per-session network metadata.
Dangerous actions should support human approval gates before execution. Examples include launching non-allowlisted commands, enabling real desktop control in a future version, attaching to external processes, or using broader network access.