Issue imported from tinyhumansai/openhuman#1401
Created at: unknown
Summary
Make agent workspace security real and OS-enforced by turning sandbox_mode = "sandboxed" into an actual jailed execution environment with a gated workspace boundary, starting with macOS and Windows and keeping Linux as the parity path.
Problem
Today OpenHuman has useful application-layer security controls, but the actual sandbox story is incomplete:
- there is a SecurityPolicy with allowlists, risk gating, rate limits, and workspace-relative filesystem checks
- there is a Sandbox abstraction with backends for Landlock / Firejail / Bubblewrap / Docker
- agent definitions already declare sandbox_mode, including code_executor as sandboxed
But the end-to-end isolation boundary is still weaker than it needs to be for an autonomous coding agent.
The main gaps in the current codebase are:
- sandbox_mode is mostly a logical/task-local signal for tool behavior, not a guaranteed OS-level jail (src/openhuman/agent/harness/sandbox_context.rs:1-54, src/openhuman/agent/harness/definition.rs:142-145).
- code_executor is explicitly marked sandboxed, but that mode still exposes shell, node_exec, npm_exec, git_operations, and file tools without a platform-enforced process jail (src/openhuman/agent/agents/code_executor/agent.toml:1-40).
- SandboxBackend only exposes Landlock, Firejail, Bubblewrap, Docker, and None; there is no native macOS or Windows backend in config (src/openhuman/config/schema/channels.rs:300-330).
- auto-detection is Linux-centric, tries Bubblewrap on macOS, and has no Windows-native branch at all; when nothing matches it falls back to NoopSandbox (src/openhuman/security/detect.rs:7-103).
- the Sandbox::wrap_command() abstraction appears to be implemented in backend files but not actually wired into runtime command execution; production references are absent outside backend tests (src/openhuman/security/traits.rs, backend implementations, and no runtime callsites to wrap_command).
- ShellTool validates command shape and clears the environment, but it builds and runs a native shell command directly; no OS sandbox is applied in the tool path shown here (src/openhuman/tools/impl/system/shell.rs:95-169).
- NativeRuntime currently just does sh -lc <command> in the workspace directory, which changes cwd but does not impose filesystem or process isolation (src/openhuman/agent/host_runtime.rs:68-76).
- workspace file tools like file_read are properly path-scoped and symlink-aware, but those protections do not cover every execution path that can touch the filesystem (src/openhuman/tools/impl/filesystem/file_read.rs:48-110).
That means the repo has a policy layer, but not yet a trustworthy cross-platform execution boundary for “sandboxed” agents.
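To make the cwd point concrete, here is a minimal Unix-only sketch showing that setting the working directory does not confine a shell command. The helper names (make_workspace, cwd_after_cd_up) are hypothetical and are not taken from the repo; it uses sh -c rather than sh -lc only to keep login profile scripts out of the picture.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Creates a scratch "workspace" under the system temp directory and returns
/// its canonical path. Hypothetical helper for illustration only.
fn make_workspace(name: &str) -> std::io::Result<PathBuf> {
    let dir = std::env::temp_dir().join(name);
    fs::create_dir_all(&dir)?;
    dir.canonicalize()
}

/// Starts a shell with `workspace` as its working directory, lets the shell
/// run `cd ..`, and returns the physical directory it ended up in.
/// Nothing here restricts the child: the cwd is only a starting point.
fn cwd_after_cd_up(workspace: &Path) -> std::io::Result<PathBuf> {
    let output = Command::new("sh")
        .arg("-c")
        .arg("cd .. && pwd -P")
        .current_dir(workspace)
        .output()?;
    let printed = String::from_utf8_lossy(&output.stdout).trim().to_string();
    Ok(PathBuf::from(printed))
}
```

Calling cwd_after_cd_up on a fresh workspace returns the parent directory, so the command has already left the workspace without any policy check being involved.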
Why this matters:
- a coding or tool-making agent should not be able to wander arbitrarily outside the workspace just because it has shell access
- workspace gating should be enforced by the OS, not only by Rust-side path validation and command heuristics
- the project’s docs already set an expectation of workspace-scoped tooling, so the implementation needs to match that promise consistently
Relevant platform references:
Solution
Treat this as “real sandboxing for agent execution,” not just “more path checks.”
Platform direction
macOS
- Add a native macOS sandbox backend based on App Sandbox / Seatbelt semantics for the helper process that executes agent commands.
- The goal is a kernel-enforced boundary around agent-launched processes, with access narrowed to the workspace, required temp/runtime paths, and explicitly granted capabilities only.
- Do not rely on Bubblewrap as the macOS strategy. The repo currently tries that path, but the intended macOS security model should be native.
- If sandbox-exec is used as an implementation technique for local/dev helper wrapping, keep that an implementation detail; the user-facing goal should be native macOS sandbox enforcement aligned with Apple’s sandbox model for embedded command-line tools.
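As a rough illustration of the sandbox-exec route for local/dev wrapping, the sketch below builds a deny-by-default Seatbelt profile string and the matching sandbox-exec -p argument list. The function names are hypothetical, the rule set is an assumption based on commonly used SBPL operations (file-read*, file-write*, process-exec, process-fork, subpath), and a real profile would almost certainly need more allowances before typical tools run under it.

```rust
use std::path::Path;

/// Quotes a path as an SBPL string literal, escaping backslashes and quotes.
fn sbpl_quote(path: &Path) -> String {
    let raw = path.to_string_lossy();
    format!("\"{}\"", raw.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Builds an illustrative deny-by-default profile: read-only system paths,
/// read/write access limited to the workspace and one temp directory.
/// The system path list is an assumption, not a vetted minimum.
fn seatbelt_profile(workspace: &Path, temp_dir: &Path) -> String {
    let lines = vec![
        "(version 1)".to_string(),
        "(deny default)".to_string(),
        "(allow process-exec)".to_string(),
        "(allow process-fork)".to_string(),
        "(allow file-read* (subpath \"/usr\") (subpath \"/bin\") (subpath \"/System\"))"
            .to_string(),
        format!(
            "(allow file-read* file-write* (subpath {}) (subpath {}))",
            sbpl_quote(workspace),
            sbpl_quote(temp_dir)
        ),
    ];
    lines.join("\n")
}

/// Returns the argv for running `command` through /bin/sh under the profile.
fn sandbox_exec_argv(profile: &str, command: &str) -> Vec<String> {
    vec!["sandbox-exec", "-p", profile, "/bin/sh", "-c", command]
        .into_iter()
        .map(String::from)
        .collect()
}
```

Keeping the profile as generated text makes it easy to log the exact policy applied to each run, which fits the diagnostics goal later in this issue.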
Windows
- Add a native Windows sandbox backend instead of falling back to no-op.
- Split the problem in two:
  - execution boundary: Win32 app isolation / AppContainer-style process isolation, plus Job Objects for process-tree governance and cleanup
  - workspace boundary: a projected or virtualized workspace layer so the agent sees only the intended tree by default; ProjFS is the most relevant starting point for a virtual overlay view of the workspace
- If a full native isolated runtime is too large for the first cut, define an incremental path where Windows Sandbox is the heavyweight fallback while the AppContainer/ProjFS path is built out.
Linux
- Use Landlock as the preferred native backend, because it is kernel-level, unprivileged, and already has a foothold in the repo.
- Keep Bubblewrap / Firejail as fallbacks where Landlock is unavailable or the kernel ABI is too old.
- Tighten the existing implementation so sandbox application is child-process-scoped and actually integrated into command execution rather than being a mostly dormant abstraction.
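The Landlock-first ordering with Bubblewrap / Firejail fallbacks could be expressed roughly as below. This is a sketch under assumptions: HostProbe, LinuxBackend, select_linux_backend, and the MIN_LANDLOCK_ABI threshold of 2 are all hypothetical and do not describe the existing detect.rs logic.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinuxBackend {
    Landlock,
    Bubblewrap,
    Firejail,
    None,
}

/// What the host offers. How these values are probed is out of scope here.
#[derive(Debug, Clone, Copy)]
struct HostProbe {
    landlock_abi: Option<u32>,
    has_bubblewrap: bool,
    has_firejail: bool,
}

/// Assumed minimum Landlock ABI; the real threshold is a project decision.
const MIN_LANDLOCK_ABI: u32 = 2;

/// Picks the strongest available backend and returns a human-readable reason
/// so that a fallback is never silent.
fn select_linux_backend(probe: HostProbe) -> (LinuxBackend, String) {
    let landlock_note = match probe.landlock_abi {
        Some(abi) if abi >= MIN_LANDLOCK_ABI => {
            return (LinuxBackend::Landlock, format!("Landlock ABI v{} available", abi));
        }
        Some(abi) => format!(
            "Landlock ABI v{} is older than required v{}",
            abi, MIN_LANDLOCK_ABI
        ),
        None => "Landlock unavailable".to_string(),
    };
    if probe.has_bubblewrap {
        return (
            LinuxBackend::Bubblewrap,
            format!("{}; falling back to Bubblewrap", landlock_note),
        );
    }
    if probe.has_firejail {
        return (
            LinuxBackend::Firejail,
            format!("{}; falling back to Firejail", landlock_note),
        );
    }
    (
        LinuxBackend::None,
        format!("{}; no fallback backend found", landlock_note),
    )
}
```

Returning the reason alongside the choice means the same string can feed both logs and a diagnostics view.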
Product / architecture scope
- Make sandbox_mode = "sandboxed" mean: the agent’s side-effecting tools execute inside an OS-enforced jail, not just that some tools choose to behave conservatively.
- Distinguish read_only from sandboxed more strongly:
  - read_only: policy-level denial of write/admin actions
  - sandboxed: policy-level restrictions plus OS-enforced filesystem/process isolation
- Define a minimal allowed surface for sandboxed execution:
  - workspace root (or a projected overlay of it)
  - bounded temp/runtime directories
  - explicit network mode (none, restricted, allowed) rather than ambient host access
  - resource limits on process tree, wall clock, memory, and child-process spawning where the platform supports it
- Ensure the shell / node / npm / git execution path goes through the sandbox backend, not only direct filesystem tools.
- Add auditable logs that show which backend was selected, which policy profile was applied, and why execution fell back to a weaker mode if it did.
- Expose backend selection and effective isolation mode in settings / diagnostics so users can tell whether they are truly sandboxed.
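One possible shape for the minimal allowed surface and the "effective isolation" diagnostic is sketched here. All type and function names (SandboxProfile, ResourceLimits, EffectiveIsolation, diagnostics_line) and the default limits are assumptions made for illustration; only the three network modes come from the list above.

```rust
use std::path::PathBuf;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NetworkMode {
    None,
    Restricted,
    Allowed,
}

impl NetworkMode {
    fn as_str(self) -> &'static str {
        match self {
            NetworkMode::None => "none",
            NetworkMode::Restricted => "restricted",
            NetworkMode::Allowed => "allowed",
        }
    }
}

/// Limits are optional because not every platform can enforce each one.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ResourceLimits {
    max_processes: Option<u32>,
    wall_clock: Option<Duration>,
    max_memory_bytes: Option<u64>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct SandboxProfile {
    workspace_root: PathBuf,
    temp_dirs: Vec<PathBuf>,
    network: NetworkMode,
    limits: ResourceLimits,
}

impl SandboxProfile {
    /// Conservative defaults (assumed values): no network, bounded runtime.
    fn default_for(workspace_root: PathBuf) -> Self {
        SandboxProfile {
            workspace_root,
            temp_dirs: Vec::new(),
            network: NetworkMode::None,
            limits: ResourceLimits {
                max_processes: Some(64),
                wall_clock: Some(Duration::from_secs(600)),
                max_memory_bytes: None,
            },
        }
    }
}

/// What the user actually got, as opposed to what was requested.
#[derive(Debug, Clone, PartialEq, Eq)]
enum EffectiveIsolation {
    OsEnforced { backend: String },
    PolicyOnly { reason: String },
}

fn effective_isolation(available_backend: Option<&str>) -> EffectiveIsolation {
    match available_backend {
        Some(name) => EffectiveIsolation::OsEnforced { backend: name.to_string() },
        None => EffectiveIsolation::PolicyOnly {
            reason: "no OS sandbox backend available on this host".to_string(),
        },
    }
}

/// Single line suitable for an audit log or a settings/diagnostics panel.
fn diagnostics_line(profile: &SandboxProfile, iso: &EffectiveIsolation) -> String {
    let head = match iso {
        EffectiveIsolation::OsEnforced { backend } => {
            format!("isolation=os-enforced backend={}", backend)
        }
        EffectiveIsolation::PolicyOnly { reason } => {
            format!("isolation=policy-only reason=\"{}\"", reason)
        }
    };
    format!(
        "{} network={} workspace={}",
        head,
        profile.network.as_str(),
        profile.workspace_root.display()
    )
}
```

Modeling policy-only execution as its own variant, rather than a boolean, keeps the fallback reason attached to the result wherever it is displayed.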
Rollout suggestion
Phase 1
- Wire the existing sandbox abstraction into actual command execution.
- Add backend observability and explicit “effective isolation” diagnostics.
- Close obvious no-op gaps in shell, node_exec, and related runtime launchers.
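For the wiring step, a minimal sketch of routing a shell launch through a sandbox backend might look like this. The trait signature is an assumption (the real Sandbox::wrap_command in src/openhuman/security/traits.rs may differ), and LauncherSandbox and build_shell_command are hypothetical names. The example only constructs the command; it does not spawn anything.

```rust
use std::path::Path;
use std::process::Command;

/// Assumed trait shape for illustration; not copied from the repo.
trait Sandbox {
    fn name(&self) -> &str;
    fn wrap_command(&self, cmd: Command) -> Command;
}

/// Pass-through backend: the command runs unconfined.
struct NoopSandbox;

impl Sandbox for NoopSandbox {
    fn name(&self) -> &str {
        "none"
    }
    fn wrap_command(&self, cmd: Command) -> Command {
        cmd
    }
}

/// Backend that confines by prefixing a launcher binary (for example a
/// Bubblewrap or Firejail invocation) in front of the original command.
struct LauncherSandbox {
    launcher: String,
    launcher_args: Vec<String>,
}

impl Sandbox for LauncherSandbox {
    fn name(&self) -> &str {
        &self.launcher
    }
    fn wrap_command(&self, cmd: Command) -> Command {
        let mut wrapped = Command::new(&self.launcher);
        wrapped.args(&self.launcher_args);
        // Re-emit the original program and its arguments after the launcher.
        wrapped.arg(cmd.get_program());
        wrapped.args(cmd.get_args());
        if let Some(dir) = cmd.get_current_dir() {
            wrapped.current_dir(dir);
        }
        wrapped
    }
}

/// The single place where a shell command is built: every launch goes
/// through the selected backend instead of calling Command directly.
fn build_shell_command(sandbox: &dyn Sandbox, workspace: &Path, script: &str) -> Command {
    let mut cmd = Command::new("sh");
    cmd.arg("-c").arg(script).current_dir(workspace);
    sandbox.wrap_command(cmd)
}
```

Funnelling shell, node, npm, and git launches through one builder like this would make a missing wrap_command call a compile-time or review-time question rather than a silent bypass.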
Phase 2
- macOS native backend
- Windows native backend
- upgrade sandbox_mode = "sandboxed" semantics for agent execution
Phase 3
- Linux hardening and parity cleanup (Landlock-first, Bubblewrap/Firejail fallback)
- network/resource policy tightening
- docs and user-facing trust model cleanup
Related
- Current sandbox backend config: src/openhuman/config/schema/channels.rs
- Current backend auto-detect: src/openhuman/security/detect.rs
- Current sandbox trait: src/openhuman/security/traits.rs
- Current Landlock backend: src/openhuman/security/landlock.rs
- Current Bubblewrap backend: src/openhuman/security/bubblewrap.rs
- Current Firejail backend: src/openhuman/security/firejail.rs
- Current shell execution path: src/openhuman/tools/impl/system/shell.rs
- Current native runtime launcher: src/openhuman/agent/host_runtime.rs
- Current agent sandbox mode plumbing: src/openhuman/agent/harness/sandbox_context.rs
- Code executor definition: src/openhuman/agent/agents/code_executor/agent.toml
- Workspace-safe file tool example: src/openhuman/tools/impl/filesystem/file_read.rs
Acceptance criteria
- Sandboxed agents are actually jailed: an agent definition with sandbox_mode = "sandboxed" executes commands inside an OS-enforced isolation boundary, not just a logical mode flag.
- macOS has a native backend: OpenHuman supports a macOS-specific sandbox backend aligned with App Sandbox / Seatbelt semantics for agent-launched helper processes.
- Windows has a native backend: OpenHuman supports a Windows-specific isolation path for agent execution, with a clear process boundary and a gated workspace view.
- Linux has a documented native strategy: Landlock is the preferred Linux path, with explicit fallback behavior when unavailable.
- Sandbox abstraction is wired into execution: shell / node / npm / git / similar process-launching tools route through the selected sandbox backend instead of bypassing it.
- Workspace access is consistently gated: sandboxed execution cannot read or mutate arbitrary host paths outside the intended workspace surface, even if the tool path is shell-based rather than a native file helper.
- Fallbacks are explicit and visible: if the host cannot provide the requested sandbox backend, the app reports that clearly in diagnostics and logs instead of silently behaving as fully sandboxed.
- Security tests exist per platform path: add/update tests that verify sandboxed execution is confined to the workspace surface and that escape attempts fail in the expected way.
- Docs match behavior: privacy/security and native-tool docs clearly explain what is enforced on each platform, including any degraded/fallback mode.
- Diff coverage ≥ 80%: the implementing PR meets the changed-lines coverage gate (Vitest + cargo-llvm-cov, enforced by .github/workflows/coverage.yml).
Example verify-before-close scenario: run a sandboxed code executor on macOS or Windows, allow it to edit and test a repo inside the workspace, and confirm that direct attempts to read or mutate host paths outside the workspace fail at the OS boundary even when attempted through shell.