Summary
Agent containers mount /tmp as a 100 MB noexec,nosuid RAM-backed tmpfs, hardcoded in src/backend/services/agent_service/capabilities.py (AGENT_TMPFS_MOUNT = {'/tmp': 'noexec,nosuid,size=100m'}). It fills easily — e.g. leftover gh CLI install artifacts (gh.tar.gz + extracted dir, ~38 MB) plus normal scratch usage. Once full, every /tmp write fails with No space left on device, including git's scratch during commit. Autonomous scheduled runs then "complete" but fail to persist (commit/push), leaving the run's output as stranded working-tree drift.
Because the size is a hardcoded literal, operators can't tune it without a code change + base-image rebuild. We should make the size an instance-level env var (deployment-tier config) with a safe default.
Context
Reported on a production agent (base image v0.6.0); the cause is in the base image, so it is fleet-wide.
#1098 already redirects heavy build scratch off /tmp via TMPDIR=/home/developer/.tmp (disk-backed, exec-capable), injected at create (crud.py) and recreate (lifecycle.py), dir created by docker/base-image/startup.sh. But TMPDIR only helps tools that honor $TMPDIR — install scripts that hardcode /tmp (the gh case here) bypass it entirely and still exhaust the 100 MB cap.
Diagnosis is hard because No space left on device suggests the root disk is full, yet df -h / shows it at ~42% and it looks fine. The filesystem that is actually full is the small tmpfs, which only df -h /tmp reveals.
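Since the error message does not name the filesystem, checking both mounts side by side can shorten diagnosis. A minimal sketch, meant to be run inside the agent container; the helper name usage_percent is hypothetical and not part of the codebase.

```python
import shutil


def usage_percent(path: str) -> float:
    """Return used space on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # "/" is the disk-backed root; "/tmp" is the separate RAM-backed tmpfs.
    for mount in ("/", "/tmp"):
        print(f"{mount}: {usage_percent(mount):.0f}% used")
```

A root filesystem at moderate usage next to a /tmp at or near 100% points at the tmpfs cap rather than the disk.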
The tmpfs is RAM-backed and noexec,nosuid by deliberate security design (a compromised agent can't stage/execute payloads there). That posture must be preserved: only the size should be tunable, and it must stay bounded, since data written to tmpfs is charged to the container's memory cgroup.
Acceptance Criteria
- /tmp tmpfs size is read from a backend env var (e.g. AGENT_TMP_SIZE), defaulting to 512m, in the single source of truth (capabilities.py AGENT_TMPFS_MOUNT) so create + recreate paths can't drift
- noexec,nosuid flags remain hardcoded; only size is configurable
- Value is validated (^\d+[mg]$); empty/invalid falls back to the default rather than producing a broken or unbounded mount spec
- AGENT_TMP_SIZE documented in .env.example and docker-compose (passes /validate-config)
- [...] df -h /tmp
- docs/memory/architecture.md updated to reflect the now-configurable size
Technical Notes
- Mount specs are creation-time: existing agents pick up a new size only on recreate, not restart. The env var lives on the backend service (which builds the mount spec), not inside agents.
- Builds on #1098 (TMPDIR redirect); this closes the remaining gap for tools that hardcode /tmp.
- Out of scope (separate issue, agent-side abilities repo): the git-sync stash → rebase → stash-pop hook leaves a permanent UU conflict and the autonomous run reports success while silently failing to persist. The full /tmp is only the trigger; the silent-failure-to-persist is a distinct reliability hole that lives in the agent's git-sync hook, not Trinity core.
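The size-related acceptance criteria could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual capabilities.py: the helper names resolve_tmp_size and build_tmpfs_mount and the MAX_TMP_SIZE_MB ceiling are hypothetical, and the variable name AGENT_TMP_SIZE is the one the issue proposes as an example.

```python
import os
import re
from typing import Dict, Mapping, Optional

# Default proposed in the acceptance criteria.
DEFAULT_TMP_SIZE = "512m"

# Pattern from the acceptance criteria. re.ASCII keeps \d to 0-9 so that
# non-ASCII digits cannot leak into the mount option string.
_TMP_SIZE_RE = re.compile(r"\d+[mg]", re.ASCII)

# Hypothetical ceiling in MiB (an assumption, not from the issue): tmpfs
# contents live in RAM, so the size should stay bounded.
MAX_TMP_SIZE_MB = 2048


def resolve_tmp_size(raw: Optional[str]) -> str:
    """Return a validated tmpfs size, or the default if `raw` is unusable."""
    candidate = (raw or "").strip()
    if not _TMP_SIZE_RE.fullmatch(candidate):
        return DEFAULT_TMP_SIZE
    value, unit = int(candidate[:-1]), candidate[-1]
    size_mb = value * 1024 if unit == "g" else value
    # size=0 means "no limit" for tmpfs, so zero is rejected along with
    # anything above the ceiling.
    if size_mb == 0 or size_mb > MAX_TMP_SIZE_MB:
        return DEFAULT_TMP_SIZE
    return candidate


def build_tmpfs_mount(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the /tmp mount spec; the security flags stay hardcoded."""
    size = resolve_tmp_size(env.get("AGENT_TMP_SIZE"))
    return {"/tmp": f"noexec,nosuid,size={size}"}


# Evaluated once in the backend process, so create and recreate share one value.
AGENT_TMPFS_MOUNT = build_tmpfs_mount()
```

Falling back to the default instead of raising keeps a typo in the deployment config from blocking agent creation. Whether a rejected value should also be logged is left open here.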