fix(agent): make agent /tmp tmpfs size configurable via AGENT_TMP_SIZE (#1231) #1233

Open
dolho wants to merge 1 commit into dev from fix/1231-agent-tmp-size-configurable

@dolho dolho commented Jun 16, 2026

Contributor

Problem

Agent containers mount /tmp as a hardcoded 100 MB noexec,nosuid RAM-backed tmpfs (capabilities.py). It fills easily, e.g. with leftover gh CLI install artifacts (~38 MB) plus normal scratch. Once full, every /tmp write fails with No space left on device, including git's commit scratch, so autonomous scheduled runs "complete" but silently fail to persist (commit/push). Because the size was a literal, operators couldn't tune it without a code change + base-image rebuild. The impact is fleet-wide (base image).

#1098 already redirects $TMPDIR-honoring tools off /tmp, but install scripts that hardcode /tmp (the gh case) bypass it and still exhaust the cap.

Fix

  • capabilities.py: AGENT_TMPFS_MOUNT size is now read from AGENT_TMP_SIZE (env on the backend service, which builds the agent mount spec), default 512m, validated against ^\d+[mg]$ with empty/invalid → default. noexec,nosuid stay hardcoded: only size is configurable, and it stays bounded (tmpfs counts against the agent memory cgroup). Single source of truth, so create (crud.py) and recreate (lifecycle.py) can't drift.
  • Wire AGENT_TMP_SIZE=${AGENT_TMP_SIZE:-512m} on the backend service in both docker-compose.yml and docker-compose.prod.yml; document in .env.example.
  • architecture.md Container Security — note the now-configurable size.

| AGENT_TMP_SIZE | result |
| --- | --- |
| unset / "" / 512 / 512Mi / 0.5g | default 512m |
| 256m, 2g, 1G | used (1G → 1g) |
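
The validation rule in the table above can be sketched roughly as follows. This is a minimal illustration, not the code in capabilities.py: the names `resolve_tmp_size`, `build_tmpfs_mount`, and `DEFAULT_TMP_SIZE` are hypothetical, and the dict shape of the mount spec (path mapped to an options string, as the Docker SDK's `tmpfs` argument accepts) is an assumption about how AGENT_TMPFS_MOUNT is represented.

```python
import os
import re

# Hypothetical names; the PR only states the env var, the default, and the regex.
DEFAULT_TMP_SIZE = "512m"
# re.ASCII keeps \d to 0-9 only, so non-ASCII digits fall back to the default.
_SIZE_RE = re.compile(r"^\d+[mg]$", re.ASCII)


def resolve_tmp_size(raw):
    """Return a validated tmpfs size, falling back to the default."""
    # Case-fold first so "1G" is accepted and normalised to "1g".
    value = (raw or "").lower()
    return value if _SIZE_RE.fullmatch(value) else DEFAULT_TMP_SIZE


def build_tmpfs_mount(env=None):
    """Build the /tmp mount spec; only the size comes from the environment."""
    env = os.environ if env is None else env
    size = resolve_tmp_size(env.get("AGENT_TMP_SIZE"))
    # noexec,nosuid are literals here, so no input can drop them.
    return {"/tmp": f"noexec,nosuid,size={size}"}
```

Because the security flags are written as literals and the size only ever comes out of `resolve_tmp_size`, a malformed value such as `512Mi` cannot produce a mount without `noexec,nosuid` or without a size cap.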

Acceptance criteria

  • size from AGENT_TMP_SIZE, default 512m, single source of truth (capabilities.py)
  • noexec,nosuid remain hardcoded — only size configurable
  • value validated (^\d+[mg]$); empty/invalid → default (never broken/unbounded)
  • documented in .env.example + both compose files
  • Container Security section of architecture.md updated
  • deploy-time: verify inside a freshly recreated agent with df -h /tmp (mount specs are creation-time — existing agents pick up the new size on recreate, not restart)
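
For the deploy-time check, a scripted equivalent of eyeballing `df -h /tmp` might look like the sketch below, run inside a freshly recreated agent. The helper names and the one-page tolerance are assumptions made for illustration; the sketch relies only on tmpfs treating the `m` and `g` suffixes as MiB and GiB.

```python
import shutil

# tmpfs size suffixes are binary multiples: m = MiB, g = GiB.
_UNIT_BYTES = {"m": 1024 ** 2, "g": 1024 ** 3}


def size_to_bytes(size):
    """Convert a validated size such as '512m' or '2g' to bytes."""
    return int(size[:-1]) * _UNIT_BYTES[size[-1].lower()]


def tmp_capacity_matches(expected, path="/tmp", tolerance=4096):
    """Check that the filesystem at `path` has the expected total capacity.

    The tolerance (one 4 KiB page, an assumption) allows for page rounding.
    """
    total = shutil.disk_usage(path).total
    return abs(total - size_to_bytes(expected)) <= tolerance
```

Inside the agent, `tmp_capacity_matches("512m")` would be expected to return True after a recreate with the default, and False on an agent that was only restarted and still carries the old 100 MB mount.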

Verification

tests/unit/test_1231_agent_tmp_size.py covers default / valid m,g / case-fold / invalid→default / security-flags-never-dropped. Full unit run: 22 passed (new + existing test_capability_set). Both compose files parse with AGENT_TMP_SIZE present; py_compile is clean.

Out of scope (separate abilities-repo issue): the agent-side git-sync hook that reports success while silently failing to persist. Full /tmp is only the trigger.

Related to #1231

🤖 Generated with Claude Code

fix(agent): make agent /tmp tmpfs size configurable via AGENT_TMP_SIZE (#1231)

Agent containers mounted /tmp as a hardcoded 100 MB noexec,nosuid RAM-backed
tmpfs. It fills easily — e.g. `gh` CLI install artifacts (~38 MB) that hardcode
/tmp and bypass the #1098 TMPDIR redirect — after which every /tmp write fails
with "No space left on device", including git's commit scratch, so autonomous
scheduled runs "complete" but silently fail to persist. The size being a
literal meant operators couldn't tune it without a code change + base-image
rebuild.

- capabilities.py: AGENT_TMPFS_MOUNT size now read from AGENT_TMP_SIZE (env on
  the backend service, which builds the agent mount spec), default 512m,
  validated `^\d+[mg]$` with empty/invalid → default. noexec,nosuid stay
  hardcoded — only size is configurable, and it stays bounded (counts against
  the agent memory cgroup). Single source of truth, so create (crud.py) and
  recreate (lifecycle.py) can't drift.
- Wire AGENT_TMP_SIZE=${AGENT_TMP_SIZE:-512m} on the backend service in both
  docker-compose.yml and docker-compose.prod.yml; document in .env.example.
- architecture.md Container Security: note the now-configurable size.
- tests/unit/test_1231_agent_tmp_size.py: default, valid m/g, case-fold,
  invalid→default, and the security flags are never dropped.

Mount specs are creation-time: existing agents pick up a new size on recreate,
not restart. Builds on #1098 (TMPDIR redirect) — closes the gap for tools that
hardcode /tmp. The agent-side git-sync silent-persist-failure is a separate
issue in the abilities repo, per the ticket.

Related to #1231
@github-actions

⚠️ Nightly unit-suite check skipped — merge conflict against dev.

Resolve by running git merge dev locally and pushing the result. The next nightly run will re-test once the conflict is gone.
