Make sandboxed agents use real OS-enforced workspace isolation #106

@ElioNeto

Issue imported from tinyhumansai/openhuman#1401
Created at: unknown


Summary

Make agent workspace security real and OS-enforced: sandbox_mode = "sandboxed" should run agents in an actual jailed execution environment with a gated workspace boundary. Start with macOS and Windows, and keep Linux as the parity path.

Problem

Today OpenHuman has useful application-layer security controls, but the actual sandbox story is incomplete:

  • there is a SecurityPolicy with allowlists, risk gating, rate limits, and workspace-relative filesystem checks
  • there is a Sandbox abstraction with backends for Landlock / Firejail / Bubblewrap / Docker
  • agent definitions already declare sandbox_mode, including code_executor as sandboxed

But the end-to-end isolation boundary is still weaker than it needs to be for an autonomous coding agent.

The main gaps in the current codebase are:

  • sandbox_mode is mostly a logical/task-local signal for tool behavior, not a guaranteed OS-level jail (src/openhuman/agent/harness/sandbox_context.rs:1-54, src/openhuman/agent/harness/definition.rs:142-145).
  • code_executor is explicitly marked sandboxed, but that mode still exposes shell, node_exec, npm_exec, git_operations, and file tools without a platform-enforced process jail (src/openhuman/agent/agents/code_executor/agent.toml:1-40).
  • SandboxBackend only exposes Landlock, Firejail, Bubblewrap, Docker, and None — there is no native macOS or Windows backend in config (src/openhuman/config/schema/channels.rs:300-330).
  • auto-detection is Linux-centric, tries Bubblewrap on macOS, and has no Windows-native branch at all; when nothing matches it falls back to NoopSandbox (src/openhuman/security/detect.rs:7-103).
  • the Sandbox::wrap_command() abstraction is implemented in the backend files but does not appear to be wired into runtime command execution; the only references to wrap_command are in backend tests, with no runtime call sites (src/openhuman/security/traits.rs and the backend implementations).
  • ShellTool validates command shape and clears the environment, but it builds and runs a native shell command directly; no OS sandbox is applied in the tool path shown here (src/openhuman/tools/impl/system/shell.rs:95-169).
  • NativeRuntime currently just does sh -lc <command> in the workspace directory, which changes cwd but does not impose filesystem or process isolation (src/openhuman/agent/host_runtime.rs:68-76).
  • workspace file tools like file_read are properly path-scoped and symlink-aware, but those protections do not cover every execution path that can touch the filesystem (src/openhuman/tools/impl/filesystem/file_read.rs:48-110).

That means the repo has a policy layer, but not yet a trustworthy cross-platform execution boundary for “sandboxed” agents.

Why this matters:

  • a coding or tool-making agent should not be able to wander arbitrarily outside the workspace just because it has shell access
  • workspace gating should be enforced by the OS, not only by Rust-side path validation and command heuristics
  • the project’s docs already set an expectation of workspace-scoped tooling, so the implementation needs to match that promise consistently

Solution

Treat this as “real sandboxing for agent execution,” not just “more path checks.”

Platform direction

macOS

  • Add a native macOS sandbox backend based on App Sandbox / Seatbelt semantics for the helper process that executes agent commands.
  • The goal is a kernel-enforced boundary around agent-launched processes, with access narrowed to the workspace, required temp/runtime paths, and explicitly granted capabilities only.
  • Do not rely on Bubblewrap as the macOS strategy. The repo currently tries that path, but the intended macOS security model should be native.
  • If sandbox-exec is used as an implementation technique for local/dev helper wrapping, keep that an implementation detail; the user-facing goal should be native macOS sandbox enforcement aligned with Apple’s sandbox model for embedded command-line tools.
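
As a rough illustration of the sandbox-exec route for local/dev wrapping, the sketch below builds a deny-by-default Seatbelt profile string and the matching sandbox-exec invocation. The function names (sbpl_quote, seatbelt_profile, sandbox_exec_command) are hypothetical and not taken from the repo. The profile language is not formally documented by Apple, so the rules shown are an assumption based on commonly used profiles. A usable profile would almost certainly need further read-only allowances for system libraries and binaries, which are omitted here, and paths should be canonicalized first (for example /tmp resolves to /private/tmp).

```rust
use std::path::Path;
use std::process::Command;

/// Quote a path as a profile string literal, escaping `"` and `\`.
/// (Hypothetical helper; the escaping rules are an assumption.)
fn sbpl_quote(path: &Path) -> String {
    let raw = path.to_string_lossy();
    let mut out = String::with_capacity(raw.len() + 2);
    out.push('"');
    for ch in raw.chars() {
        if ch == '"' || ch == '\\' {
            out.push('\\');
        }
        out.push(ch);
    }
    out.push('"');
    out
}

/// Build a deny-by-default profile that only opens the workspace and one
/// temp directory for reads and writes. Deliberately incomplete: system
/// library and binary paths are not allowed here.
fn seatbelt_profile(workspace: &Path, tmp: &Path) -> String {
    let ws = sbpl_quote(workspace);
    let tmp = sbpl_quote(tmp);
    format!(
        "(version 1)\n\
         (deny default)\n\
         (allow process-fork)\n\
         (allow process-exec)\n\
         (allow file-read* file-write* (subpath {}) (subpath {}))\n",
        ws, tmp
    )
}

/// Wrap a shell script in `sandbox-exec -p <profile> sh -c <script>`.
/// The command is only constructed here, not spawned.
fn sandbox_exec_command(profile: &str, script: &str) -> Command {
    let mut cmd = Command::new("sandbox-exec");
    cmd.arg("-p").arg(profile).arg("sh").arg("-c").arg(script);
    cmd
}
```

Generating the profile from the workspace path at launch time keeps the allowed surface tied to the agent's actual workspace rather than to a static profile file.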

Windows

  • Add a native Windows sandbox backend instead of falling back to no-op.
  • Split the problem in two:
    • execution boundary: Win32 app isolation / AppContainer-style process isolation, plus Job Objects for process-tree governance and cleanup
    • workspace boundary: a projected or virtualized workspace layer so the agent sees only the intended tree by default; ProjFS is the most relevant starting point for a virtual overlay view of the workspace
  • If a full native isolated runtime is too large for the first cut, define an incremental path where Windows Sandbox is the heavyweight fallback while the AppContainer/ProjFS path is built out.

Linux

  • Use Landlock as the preferred native backend, because it is kernel-level, unprivileged, and already has a foothold in the repo.
  • Keep Bubblewrap / Firejail as fallbacks where Landlock is unavailable or the kernel ABI is too old.
  • Tighten the existing implementation so sandbox application is child-process-scoped and actually integrated into command execution rather than being a mostly dormant abstraction.
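
The Landlock-first ordering with explicit fallback could look roughly like the sketch below. All names here (LinuxBackend, LinuxProbe, Selection, select_linux_backend, MIN_LANDLOCK_ABI) are hypothetical, and the minimum ABI value is an assumption chosen for illustration, not a requirement stated in the repo. How the probe values are obtained is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinuxBackend {
    Landlock,
    Bubblewrap,
    Firejail,
    None,
}

/// What a host probe found (hypothetical inputs; probing itself is omitted).
#[derive(Debug, Clone, Copy, Default)]
struct LinuxProbe {
    /// Landlock ABI version reported by the kernel, or None if unsupported.
    landlock_abi: Option<u32>,
    has_bwrap: bool,
    has_firejail: bool,
}

#[derive(Debug, PartialEq, Eq)]
struct Selection {
    backend: LinuxBackend,
    /// Why Landlock was skipped; None when Landlock was selected.
    fallback_reason: Option<String>,
}

/// Assumed minimum ABI for illustration only.
const MIN_LANDLOCK_ABI: u32 = 2;

fn select_linux_backend(probe: LinuxProbe) -> Selection {
    // Prefer Landlock whenever the kernel ABI is new enough.
    let reason = match probe.landlock_abi {
        Some(abi) if abi >= MIN_LANDLOCK_ABI => {
            return Selection {
                backend: LinuxBackend::Landlock,
                fallback_reason: None,
            };
        }
        Some(abi) => format!(
            "Landlock ABI {} is older than the required {}",
            abi, MIN_LANDLOCK_ABI
        ),
        None => "Landlock is not available on this kernel".to_string(),
    };
    // Otherwise fall back in a fixed order and keep the reason for logs.
    let backend = if probe.has_bwrap {
        LinuxBackend::Bubblewrap
    } else if probe.has_firejail {
        LinuxBackend::Firejail
    } else {
        LinuxBackend::None
    };
    Selection {
        backend,
        fallback_reason: Some(reason),
    }
}
```

Returning the reason alongside the backend means the caller can log or surface it without re-running detection.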

Product / architecture scope

  • Make sandbox_mode = "sandboxed" mean: the agent’s side-effecting tools execute inside an OS-enforced jail, not just that some tools choose to behave conservatively.
  • Distinguish read_only from sandboxed more strongly:
    • read_only: policy-level denial of write/admin actions
    • sandboxed: policy-level restrictions plus OS-enforced filesystem/process isolation
  • Define a minimal allowed surface for sandboxed execution:
    • workspace root (or a projected overlay of it)
    • bounded temp/runtime directories
    • explicit network mode (none, restricted, allowed) rather than ambient host access
    • resource limits on process tree, wall clock, memory, and child-process spawning where the platform supports it
  • Ensure the shell / node / npm / git execution path goes through the sandbox backend, not only direct filesystem tools.
  • Add auditable logs that show which backend was selected, which policy profile was applied, and why execution fell back to a weaker mode if it did.
  • Expose backend selection and effective isolation mode in settings / diagnostics so users can tell whether they are truly sandboxed.
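
One possible shape for the minimal allowed surface and the "effective isolation" diagnostic is sketched below. Every type, field, and default value here is hypothetical and chosen for illustration; the real SecurityPolicy and sandbox_mode types in the repo may differ, and only the two modes discussed above are modeled.

```rust
use std::path::{Path, PathBuf};
use std::time::Duration;

/// Only the two modes contrasted above are modeled in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxMode {
    ReadOnly,
    Sandboxed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NetworkMode {
    None,
    Restricted,
    Allowed,
}

/// The surface a sandboxed process is allowed to touch (hypothetical).
#[derive(Debug, Clone)]
struct SandboxProfile {
    workspace_root: PathBuf,
    temp_dirs: Vec<PathBuf>,
    network: NetworkMode,
    max_wall_clock: Duration,
    max_memory_bytes: u64,
    max_processes: u32,
}

impl SandboxProfile {
    /// Conservative defaults; the concrete limits are placeholders.
    fn minimal(workspace_root: &Path) -> Self {
        Self {
            workspace_root: workspace_root.to_path_buf(),
            temp_dirs: vec![workspace_root.join(".tmp")],
            network: NetworkMode::None,
            max_wall_clock: Duration::from_secs(600),
            max_memory_bytes: 2 * 1024 * 1024 * 1024,
            max_processes: 64,
        }
    }
}

/// What the user is actually getting, for settings and diagnostics.
#[derive(Debug, Clone, PartialEq, Eq)]
enum EffectiveIsolation {
    OsEnforced { backend: String },
    PolicyOnly { reason: String },
}

impl EffectiveIsolation {
    fn is_os_enforced(&self) -> bool {
        matches!(self, EffectiveIsolation::OsEnforced { .. })
    }

    /// One line suitable for an audit log or a diagnostics panel.
    fn describe(&self) -> String {
        match self {
            EffectiveIsolation::OsEnforced { backend } => {
                format!("isolation=os-enforced backend={}", backend)
            }
            EffectiveIsolation::PolicyOnly { reason } => {
                format!("isolation=policy-only reason={}", reason)
            }
        }
    }
}

/// Combine the requested mode with the backend that was actually selected.
fn effective_isolation(mode: SandboxMode, backend: Option<&str>) -> EffectiveIsolation {
    match (mode, backend) {
        (SandboxMode::Sandboxed, Some(name)) => EffectiveIsolation::OsEnforced {
            backend: name.to_string(),
        },
        (SandboxMode::Sandboxed, None) => EffectiveIsolation::PolicyOnly {
            reason: "no OS sandbox backend available on this host".to_string(),
        },
        (SandboxMode::ReadOnly, _) => EffectiveIsolation::PolicyOnly {
            reason: "read_only is a policy-level mode".to_string(),
        },
    }
}
```

Keeping the requested mode and the effective isolation as separate values makes a degraded run visible instead of letting "sandboxed" silently mean "policy-only".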

Rollout suggestion

Phase 1

  • Wire the existing sandbox abstraction into actual command execution.
  • Add backend observability and explicit “effective isolation” diagnostics.
  • Close obvious no-op gaps in shell, node_exec, and related runtime launchers.
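
A minimal sketch of the Phase 1 wiring follows, assuming a trait shape that is not confirmed by the repo: the Sandbox trait below, its is_enforcing method, the WrapperSandbox type, and the sandboxed_shell helper are all hypothetical, and the real wrap_command signature in src/openhuman/security/traits.rs may differ. The point is only that the launcher builds its sh -lc command, hands it to the selected backend, and fails closed when a jail is required but the backend is a no-op.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Assumed trait shape for illustration; not the repo's actual definition.
trait Sandbox {
    fn name(&self) -> &str;
    /// True when the backend enforces an OS-level boundary.
    fn is_enforcing(&self) -> bool;
    /// Turn an unsandboxed command into one that runs inside the jail.
    fn wrap_command(&self, inner: Command) -> io::Result<Command>;
}

struct NoopSandbox;

impl Sandbox for NoopSandbox {
    fn name(&self) -> &str {
        "none"
    }
    fn is_enforcing(&self) -> bool {
        false
    }
    fn wrap_command(&self, inner: Command) -> io::Result<Command> {
        Ok(inner)
    }
}

/// A backend that prefixes the command with a wrapper program and its
/// arguments (for example bwrap or firejail). Arguments come from config.
struct WrapperSandbox {
    program: String,
    prefix_args: Vec<String>,
}

impl Sandbox for WrapperSandbox {
    fn name(&self) -> &str {
        &self.program
    }
    fn is_enforcing(&self) -> bool {
        true
    }
    fn wrap_command(&self, inner: Command) -> io::Result<Command> {
        let mut outer = Command::new(&self.program);
        outer.args(&self.prefix_args);
        outer.arg(inner.get_program());
        outer.args(inner.get_args());
        if let Some(dir) = inner.get_current_dir() {
            outer.current_dir(dir);
        }
        // Start from an empty environment, then copy explicit settings.
        outer.env_clear();
        for (key, value) in inner.get_envs() {
            if let Some(value) = value {
                outer.env(key, value);
            }
        }
        Ok(outer)
    }
}

/// Build the command a shell tool or runtime launcher would spawn.
/// Fails closed when a jail is required but the backend cannot enforce one.
fn sandboxed_shell(
    sandbox: &dyn Sandbox,
    script: &str,
    workspace: &Path,
    require_jail: bool,
) -> io::Result<Command> {
    if require_jail && !sandbox.is_enforcing() {
        return Err(io::Error::new(
            io::ErrorKind::Unsupported,
            format!("sandbox backend '{}' does not enforce isolation", sandbox.name()),
        ));
    }
    let mut inner = Command::new("sh");
    inner.arg("-lc").arg(script).current_dir(workspace);
    sandbox.wrap_command(inner)
}
```

If shell, node_exec, npm_exec, and git launchers all go through one helper like this, there is a single place to log the selected backend and to refuse execution when isolation is unavailable.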

Phase 2

  • macOS native backend
  • Windows native backend
  • upgrade sandbox_mode = "sandboxed" semantics for agent execution

Phase 3

  • Linux hardening and parity cleanup (Landlock-first, Bubblewrap/Firejail fallback)
  • network/resource policy tightening
  • docs and user-facing trust model cleanup

Acceptance criteria

  • Sandboxed agents are actually jailed — an agent definition with sandbox_mode = "sandboxed" executes commands inside an OS-enforced isolation boundary, not just a logical mode flag.

  • macOS has a native backend — OpenHuman supports a macOS-specific sandbox backend aligned with App Sandbox / Seatbelt semantics for agent-launched helper processes.

  • Windows has a native backend — OpenHuman supports a Windows-specific isolation path for agent execution, with a clear process boundary and a gated workspace view.

  • Linux has a documented native strategy — Landlock is the preferred Linux path, with explicit fallback behavior when unavailable.

  • Sandbox abstraction is wired into execution — shell / node / npm / git / similar process-launching tools route through the selected sandbox backend instead of bypassing it.

  • Workspace access is consistently gated — sandboxed execution cannot read or mutate arbitrary host paths outside the intended workspace surface, even if the tool path is shell-based rather than a native file helper.

  • Fallbacks are explicit and visible — if the host cannot provide the requested sandbox backend, the app reports that clearly in diagnostics and logs instead of silently behaving as fully sandboxed.

  • Security tests exist per platform path — add/update tests that verify sandboxed execution is confined to the workspace surface and that escape attempts fail in the expected way.

  • Docs match behavior — privacy/security and native-tool docs clearly explain what is enforced on each platform, including any degraded/fallback mode.

  • Diff coverage ≥ 80% — the implementing PR meets the changed-lines coverage gate (Vitest + cargo-llvm-cov, enforced by .github/workflows/coverage.yml).

  • Example verify-before-close scenario: run a sandboxed code executor on macOS or Windows, allow it to edit and test a repo inside the workspace, and confirm that direct attempts to read or mutate host paths outside the workspace fail at the OS boundary, even when attempted through shell.

Related

  • Current sandbox backend config: src/openhuman/config/schema/channels.rs
  • Current backend auto-detect: src/openhuman/security/detect.rs
  • Current sandbox trait: src/openhuman/security/traits.rs
  • Current Landlock backend: src/openhuman/security/landlock.rs
  • Current Bubblewrap backend: src/openhuman/security/bubblewrap.rs
  • Current Firejail backend: src/openhuman/security/firejail.rs
  • Current shell execution path: src/openhuman/tools/impl/system/shell.rs
  • Current native runtime launcher: src/openhuman/agent/host_runtime.rs
  • Current agent sandbox mode plumbing: src/openhuman/agent/harness/sandbox_context.rs
  • Code executor definition: src/openhuman/agent/agents/code_executor/agent.toml
  • Workspace-safe file tool example: src/openhuman/tools/impl/filesystem/file_read.rs

Labels

  • agent: Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/.
  • rust-core: Core Rust runtime in src/: CLI, core_server, shared infrastructure.
  • security: Security, encryption, approvals, credentials, and trust boundaries.
  • security-work: Security hardening, vulnerabilities, or policy work.
  • task: Work item that is not primarily a bug or a feature.