Make sandboxed agents use real OS-enforced workspace isolation

> *Issue imported from [tinyhumansai/openhuman#1401](https://github.com/tinyhumansai/openhuman/issues/1401)*
> *Created at: unknown*

---

## Summary

Make agent workspace security real and OS-enforced by turning `sandbox_mode = "sandboxed"` into an actual jailed execution environment with a gated workspace boundary, starting with macOS and Windows and keeping Linux as the parity path.

## Problem

Today OpenHuman has useful application-layer security controls, but the actual sandbox story is incomplete:
- there is a `SecurityPolicy` with allowlists, risk gating, rate limits, and workspace-relative filesystem checks
- there is a `Sandbox` abstraction with backends for Landlock / Firejail / Bubblewrap / Docker
- agent definitions already declare `sandbox_mode`, including `code_executor` as `sandboxed`

But the end-to-end isolation boundary is still weaker than it needs to be for an autonomous coding agent.

The main gaps in the current codebase are:
- `sandbox_mode` is mostly a logical/task-local signal for tool behavior, not a guaranteed OS-level jail (`src/openhuman/agent/harness/sandbox_context.rs:1-54`, `src/openhuman/agent/harness/definition.rs:142-145`).
- `code_executor` is explicitly marked `sandboxed`, but that mode still exposes `shell`, `node_exec`, `npm_exec`, `git_operations`, and file tools without a platform-enforced process jail (`src/openhuman/agent/agents/code_executor/agent.toml:1-40`).
- `SandboxBackend` only exposes `Landlock`, `Firejail`, `Bubblewrap`, `Docker`, and `None` — there is no native macOS or Windows backend in config (`src/openhuman/config/schema/channels.rs:300-330`).
- auto-detection is Linux-centric, tries Bubblewrap on macOS, and has no Windows-native branch at all; when nothing matches it falls back to `NoopSandbox` (`src/openhuman/security/detect.rs:7-103`).
- the `Sandbox::wrap_command()` abstraction is implemented in the backend files but does not appear to be wired into runtime command execution; no production callsites of `wrap_command` were found outside backend tests (`src/openhuman/security/traits.rs` and the backend implementations).
- `ShellTool` validates command shape and clears the environment, but it builds and runs a native shell command directly; no OS sandbox is applied in the tool path shown here (`src/openhuman/tools/impl/system/shell.rs:95-169`).
- `NativeRuntime` currently just does `sh -lc <command>` in the workspace directory, which changes cwd but does not impose filesystem or process isolation (`src/openhuman/agent/host_runtime.rs:68-76`).
- workspace file tools like `file_read` are properly path-scoped and symlink-aware, but those protections do not cover every execution path that can touch the filesystem (`src/openhuman/tools/impl/filesystem/file_read.rs:48-110`).

That means the repo has a policy layer, but not yet a trustworthy cross-platform execution boundary for “sandboxed” agents.
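To make the wiring gap concrete, here is a minimal sketch of what routing a shell launch through a backend could look like. The trait shape, the `wrap_command` signature, and the `build_shell_command` helper are assumptions for illustration; the actual trait in `src/openhuman/security/traits.rs` may differ. The `bwrap` flags used are real Bubblewrap options, but the mount set is illustrative rather than a vetted policy, and a usable jail would need more mounts (for example `/bin`, `/lib`, `/proc`, `/dev`).

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical trait shape; the real trait in `security/traits.rs` may differ.
pub trait Sandbox {
    fn name(&self) -> &'static str;
    /// Returns a command that runs `inner` inside this backend's jail.
    fn wrap_command(&self, inner: Command, workspace: &Path) -> Command;
}

/// Pass-through backend: applies no isolation at all.
pub struct NoopSandbox;

impl Sandbox for NoopSandbox {
    fn name(&self) -> &'static str {
        "none"
    }
    fn wrap_command(&self, inner: Command, _workspace: &Path) -> Command {
        inner
    }
}

pub struct BubblewrapSandbox;

impl Sandbox for BubblewrapSandbox {
    fn name(&self) -> &'static str {
        "bubblewrap"
    }
    fn wrap_command(&self, inner: Command, workspace: &Path) -> Command {
        let mut wrapped = Command::new("bwrap");
        wrapped
            .arg("--unshare-all") // fresh namespaces, including network
            .arg("--die-with-parent")
            .args(["--ro-bind", "/usr", "/usr"]) // illustrative read-only system mount
            .arg("--bind")
            .arg(workspace)
            .arg(workspace) // the only writable host path
            .arg("--chdir")
            .arg(workspace)
            .arg("--")
            .arg(inner.get_program())
            .args(inner.get_args());
        // Note: env vars and cwd set on `inner` are not copied in this sketch.
        wrapped
    }
}

/// Single choke point: every shell launch goes through the selected backend.
pub fn build_shell_command(sandbox: &dyn Sandbox, workspace: &Path, script: &str) -> Command {
    let mut inner = Command::new("sh");
    inner.arg("-c").arg(script);
    sandbox.wrap_command(inner, workspace)
}
```

The point of the single `build_shell_command` entry point is that `shell`, `node_exec`, `npm_exec`, and `git_operations` cannot bypass the backend by constructing a `Command` on their own.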

Why this matters:
- a coding or tool-making agent should not be able to wander arbitrarily outside the workspace just because it has `shell` access
- workspace gating should be enforced by the OS, not only by Rust-side path validation and command heuristics
- the project’s docs already set an expectation of workspace-scoped tooling, so the implementation needs to match that promise consistently

Relevant platform references:
- Apple App Sandbox / helper tools: https://developer.apple.com/documentation/xcode/configuring-the-macos-app-sandbox and https://developer.apple.com/documentation/security/protecting-user-data-with-app-sandbox
- Windows Win32 app isolation: https://learn.microsoft.com/en-us/windows/win32/secauthz/app-isolation-overview
- Windows ProjFS: https://learn.microsoft.com/en-us/windows/win32/projfs/projected-file-system
- Windows Sandbox config / mapped folder semantics: https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/windows-sandbox-configure-using-wsb-file
- Linux Landlock: https://www.kernel.org/doc/html/latest/userspace-api/landlock.html

## Solution

Treat this as “real sandboxing for agent execution,” not just “more path checks.”

### Platform direction

macOS
- Add a native macOS sandbox backend based on App Sandbox / Seatbelt semantics for the helper process that executes agent commands.
- The goal is a kernel-enforced boundary around agent-launched processes, with access narrowed to the workspace, required temp/runtime paths, and explicitly granted capabilities only.
- Do not rely on Bubblewrap as the macOS strategy. The repo currently tries that path, but the intended macOS security model should be native.
- If `sandbox-exec` is used as an implementation technique for local/dev helper wrapping, keep that as an implementation detail; the user-facing goal should be native macOS sandbox enforcement aligned with Apple’s sandbox model for embedded command-line tools.
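As a rough illustration of the "narrowed to the workspace" goal, the sketch below generates a deny-by-default Seatbelt profile string. The `seatbelt_profile` and `sbpl_quote` helpers are hypothetical, and the rule set is an assumption: the profile language is largely undocumented, `sandbox-exec` is marked deprecated in its man page, and a working profile would almost certainly need further allowances (dyld caches, `/dev`, Mach services) found by testing.

```rust
use std::path::Path;

/// Quotes a path as a profile string literal, escaping `\` and `"`.
fn sbpl_quote(path: &Path) -> String {
    let raw = path.to_string_lossy();
    let escaped = raw.replace('\\', "\\\\").replace('"', "\\\"");
    format!("\"{}\"", escaped)
}

/// Builds a deny-by-default profile (hypothetical helper, illustrative rules).
pub fn seatbelt_profile(workspace: &Path, scratch: &Path, allow_network: bool) -> String {
    let mut rules = vec![
        "(version 1)".to_string(),
        "(deny default)".to_string(),
        "(allow process-fork)".to_string(),
        "(allow process-exec)".to_string(),
    ];
    // Read-only access to system locations needed to start common tools.
    for system_dir in ["/usr", "/bin", "/System", "/Library"] {
        rules.push(format!(
            "(allow file-read* (subpath {}))",
            sbpl_quote(Path::new(system_dir))
        ));
    }
    // Read/write access only to the workspace and one bounded scratch directory.
    for dir in [workspace, scratch] {
        rules.push(format!("(allow file-read* (subpath {}))", sbpl_quote(dir)));
        rules.push(format!("(allow file-write* (subpath {}))", sbpl_quote(dir)));
    }
    // With `(deny default)`, omitting this rule leaves networking blocked.
    if allow_network {
        rules.push("(allow network*)".to_string());
    }
    rules.join("\n")
}
```

For local experiments, the resulting string could be passed as `sandbox-exec -p "<profile>" sh -c "<command>"`; a shipped backend would more likely apply the sandbox from a dedicated helper process.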

Windows
- Add a native Windows sandbox backend instead of falling back to no-op.
- Split the problem in two:
  - execution boundary: Win32 app isolation / AppContainer-style process isolation, plus Job Objects for process-tree governance and cleanup
  - workspace boundary: a projected or virtualized workspace layer so the agent sees only the intended tree by default; ProjFS is the most relevant starting point for a virtual overlay view of the workspace
- If a full native isolated runtime is too large for the first cut, define an incremental path where Windows Sandbox is the heavyweight fallback while the AppContainer/ProjFS path is built out.

Linux
- Use Landlock as the preferred native backend, because it is kernel-level, unprivileged, and already has a foothold in the repo.
- Keep Bubblewrap / Firejail as fallbacks where Landlock is unavailable or the kernel ABI is too old.
- Tighten the existing implementation so sandbox application is child-process-scoped and actually integrated into command execution rather than being a mostly dormant abstraction.
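The platform direction above amounts to a per-OS preference order. A minimal sketch of that selection logic follows, assuming capability probes are gathered elsewhere; all type and function names here (`HostOs`, `Probes`, `Selection`, `select_backend`) are hypothetical, and the existing `Docker` backend is left out for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostOs {
    MacOs,
    Windows,
    Linux,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Seatbelt,
    AppContainer,
    Landlock,
    Bubblewrap,
    Firejail,
    None,
}

/// Results of capability probes, gathered elsewhere (hypothetical shape).
#[derive(Debug, Clone, Copy, Default)]
pub struct Probes {
    pub seatbelt: bool,
    pub app_container: bool,
    pub landlock: bool,
    pub bubblewrap: bool,
    pub firejail: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Selection {
    pub backend: Backend,
    /// Human-readable explanation suitable for logs and diagnostics.
    pub reason: &'static str,
}

fn pick(backend: Backend, reason: &'static str) -> Selection {
    Selection { backend, reason }
}

/// Native backend first on each OS; Linux falls back Landlock -> Bubblewrap -> Firejail.
pub fn select_backend(os: HostOs, probes: Probes) -> Selection {
    match os {
        HostOs::MacOs if probes.seatbelt => pick(Backend::Seatbelt, "native macOS sandbox available"),
        HostOs::Windows if probes.app_container => {
            pick(Backend::AppContainer, "native Windows isolation available")
        }
        HostOs::Linux if probes.landlock => pick(Backend::Landlock, "Landlock supported by kernel"),
        HostOs::Linux if probes.bubblewrap => {
            pick(Backend::Bubblewrap, "Landlock unavailable; using Bubblewrap")
        }
        HostOs::Linux if probes.firejail => {
            pick(Backend::Firejail, "Landlock and Bubblewrap unavailable; using Firejail")
        }
        // Bubblewrap is deliberately never chosen on macOS or Windows.
        _ => pick(Backend::None, "no supported backend detected; no OS isolation"),
    }
}
```

Returning a `reason` alongside the backend keeps the fallback explanation next to the decision, which is what diagnostics and audit logs would need to report.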

### Product / architecture scope

- Make `sandbox_mode = "sandboxed"` mean: the agent’s side-effecting tools execute inside an OS-enforced jail, not just that some tools choose to behave conservatively.
- Distinguish `read_only` from `sandboxed` more strongly:
  - `read_only`: policy-level denial of write/admin actions
  - `sandboxed`: policy-level restrictions plus OS-enforced filesystem/process isolation
- Define a minimal allowed surface for sandboxed execution:
  - workspace root (or a projected overlay of it)
  - bounded temp/runtime directories
  - explicit network mode (`none`, `restricted`, `allowed`) rather than ambient host access
  - resource limits on process tree, wall clock, memory, and child-process spawning where the platform supports it
- Ensure the shell / node / npm / git execution path goes through the sandbox backend, not only direct filesystem tools.
- Add auditable logs that show which backend was selected, which policy profile was applied, and why execution fell back to a weaker mode if it did.
- Expose backend selection and effective isolation mode in settings / diagnostics so users can tell whether they are truly sandboxed.
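One way to express the scope above in types is sketched below: a profile describing the minimal allowed surface, plus an "effective isolation" value that diagnostics can display and that execution can check against the requested mode. Every name here is hypothetical and not taken from the codebase; only the `read_only` / `sandboxed` distinction and the `none` / `restricted` / `allowed` network modes come from this issue.

```rust
use std::fmt;
use std::path::PathBuf;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxMode {
    ReadOnly,
    Sandboxed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkMode {
    None,
    Restricted,
    Allowed,
}

/// Limits are optional because not every platform can enforce each one.
#[derive(Debug, Clone, Default)]
pub struct ResourceLimits {
    pub max_processes: Option<u32>,
    pub wall_clock: Option<Duration>,
    pub max_memory_bytes: Option<u64>,
}

/// Minimal allowed surface for one sandboxed execution.
#[derive(Debug, Clone)]
pub struct SandboxProfile {
    pub workspace_root: PathBuf,
    pub scratch_dirs: Vec<PathBuf>,
    pub network: NetworkMode,
    pub limits: ResourceLimits,
}

/// What the host actually provides, as opposed to what was requested.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EffectiveIsolation {
    OsEnforced { backend: String },
    PolicyOnly { reason: String },
}

impl EffectiveIsolation {
    /// `sandboxed` requires an OS-enforced jail; `read_only` is policy-level.
    pub fn satisfies(&self, requested: SandboxMode) -> bool {
        match (requested, self) {
            (SandboxMode::ReadOnly, _) => true,
            (SandboxMode::Sandboxed, EffectiveIsolation::OsEnforced { .. }) => true,
            (SandboxMode::Sandboxed, EffectiveIsolation::PolicyOnly { .. }) => false,
        }
    }
}

impl fmt::Display for EffectiveIsolation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EffectiveIsolation::OsEnforced { backend } => {
                write!(f, "os-enforced (backend: {})", backend)
            }
            EffectiveIsolation::PolicyOnly { reason } => {
                write!(f, "policy-only (reason: {})", reason)
            }
        }
    }
}
```

With a check like `satisfies`, a launcher could refuse to run, or at least warn loudly, when an agent requests `sandboxed` but the host only offers policy-level controls.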

### Rollout suggestion

Phase 1
- Wire the existing sandbox abstraction into actual command execution.
- Add backend observability and explicit “effective isolation” diagnostics.
- Close obvious no-op gaps in `shell`, `node_exec`, and related runtime launchers.

Phase 2
- macOS native backend
- Windows native backend
- upgrade `sandbox_mode = "sandboxed"` semantics for agent execution

Phase 3
- Linux hardening and parity cleanup (Landlock-first, Bubblewrap/Firejail fallback)
- network/resource policy tightening
- docs and user-facing trust model cleanup

## Acceptance criteria

- [ ] **Sandboxed agents are actually jailed** — an agent definition with `sandbox_mode = "sandboxed"` executes commands inside an OS-enforced isolation boundary, not just a logical mode flag.
- [ ] **macOS has a native backend** — OpenHuman supports a macOS-specific sandbox backend aligned with App Sandbox / Seatbelt semantics for agent-launched helper processes.
- [ ] **Windows has a native backend** — OpenHuman supports a Windows-specific isolation path for agent execution, with a clear process boundary and a gated workspace view.
- [ ] **Linux has a documented native strategy** — Landlock is the preferred Linux path, with explicit fallback behavior when unavailable.
- [ ] **Sandbox abstraction is wired into execution** — shell / node / npm / git / similar process-launching tools route through the selected sandbox backend instead of bypassing it.
- [ ] **Workspace access is consistently gated** — sandboxed execution cannot read or mutate arbitrary host paths outside the intended workspace surface, even if the tool path is shell-based rather than a native file helper.
- [ ] **Fallbacks are explicit and visible** — if the host cannot provide the requested sandbox backend, the app reports that clearly in diagnostics and logs instead of silently behaving as fully sandboxed.
- [ ] **Security tests exist per platform path** — add/update tests that verify sandboxed execution is confined to the workspace surface and that escape attempts fail in the expected way.
- [ ] **Docs match behavior** — privacy/security and native-tool docs clearly explain what is enforced on each platform, including any degraded/fallback mode.
- [ ] **Diff coverage ≥ 80%** — the implementing PR meets the changed-lines coverage gate (Vitest + cargo-llvm-cov, enforced by [`.github/workflows/coverage.yml`](../../.github/workflows/coverage.yml)).

- Example verify-before-close scenario: run a `sandboxed` code executor on macOS or Windows, let it edit and test a repo inside the workspace, and confirm that direct attempts to read or mutate host paths outside the workspace fail at the OS boundary, even when attempted through `shell`.
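A per-platform escape test could be organized roughly as follows. This is a sketch under assumptions: `EscapeProbe`, `escape_probes`, and `find_escapes` are hypothetical names, the probes use Unix shell syntax, and the canary path is assumed to contain no single quotes. The test harness would create the canary file outside the workspace, then pass a closure that runs each script through the real sandboxed shell path and reports whether it succeeded.

```rust
use std::path::Path;

/// A shell snippet that a confined agent must not be able to run successfully.
pub struct EscapeProbe {
    pub description: &'static str,
    pub script: String,
}

/// Builds probes against a canary file that lives outside the workspace.
pub fn escape_probes(canary: &Path) -> Vec<EscapeProbe> {
    let quoted = format!("'{}'", canary.display());
    vec![
        EscapeProbe {
            description: "read a canary file outside the workspace",
            script: format!("cat {}", quoted),
        },
        EscapeProbe {
            description: "append to a canary file outside the workspace",
            script: format!("echo leaked >> {}", quoted),
        },
    ]
}

/// Runs every probe and returns the descriptions of those that succeeded.
/// `run_sandboxed` returns `true` when the script exited successfully.
pub fn find_escapes<F>(probes: &[EscapeProbe], mut run_sandboxed: F) -> Vec<&'static str>
where
    F: FnMut(&str) -> bool,
{
    probes
        .iter()
        .filter(|probe| run_sandboxed(&probe.script))
        .map(|probe| probe.description)
        .collect()
}
```

A platform test would then assert that `find_escapes` returns an empty list, and separately check that the same scripts succeed when pointed at a file inside the workspace, so a sandbox that blocks everything does not pass by accident.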

## Related

- Current sandbox backend config: `src/openhuman/config/schema/channels.rs`
- Current backend auto-detect: `src/openhuman/security/detect.rs`
- Current sandbox trait: `src/openhuman/security/traits.rs`
- Current Landlock backend: `src/openhuman/security/landlock.rs`
- Current Bubblewrap backend: `src/openhuman/security/bubblewrap.rs`
- Current Firejail backend: `src/openhuman/security/firejail.rs`
- Current shell execution path: `src/openhuman/tools/impl/system/shell.rs`
- Current native runtime launcher: `src/openhuman/agent/host_runtime.rs`
- Current agent sandbox mode plumbing: `src/openhuman/agent/harness/sandbox_context.rs`
- Code executor definition: `src/openhuman/agent/agents/code_executor/agent.toml`
- Workspace-safe file tool example: `src/openhuman/tools/impl/filesystem/file_read.rs`

