fix(agent): make agent /tmp tmpfs size configurable via AGENT_TMP_SIZE (#1231) #1233
Open
dolho wants to merge 1 commit into
Agent containers mounted /tmp as a hardcoded 100 MB noexec,nosuid RAM-backed tmpfs. It fills easily, e.g. with `gh` CLI install artifacts (~38 MB) that hardcode /tmp and bypass the #1098 TMPDIR redirect. Once it is full, every /tmp write fails with "No space left on device", including git's commit scratch, so autonomous scheduled runs "complete" but silently fail to persist. Because the size was a literal, operators couldn't tune it without a code change + base-image rebuild.

- capabilities.py: AGENT_TMPFS_MOUNT size is now read from AGENT_TMP_SIZE (env on the backend service, which builds the agent mount spec), default 512m, validated against `^\d+[mg]$`; empty or invalid values fall back to the default. noexec,nosuid stay hardcoded: only size is configurable, and it stays bounded (counts against the agent memory cgroup). Single source of truth, so create (crud.py) and recreate (lifecycle.py) can't drift.
- Wire AGENT_TMP_SIZE=${AGENT_TMP_SIZE:-512m} on the backend service in both docker-compose.yml and docker-compose.prod.yml; document in .env.example.
- architecture.md Container Security: note the now-configurable size.
- tests/unit/test_1231_agent_tmp_size.py: default, valid m/g, case-fold, invalid→default, and the security flags are never dropped.

Mount specs are creation-time: existing agents pick up a new size on recreate, not restart. Builds on #1098 (TMPDIR redirect) and closes the gap for tools that hardcode /tmp. The agent-side git-sync silent-persist-failure is a separate issue in the abilities repo, per the ticket.

Related to #1231
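As a rough illustration of the validation described above, the sketch below resolves `AGENT_TMP_SIZE` to a safe value and builds a tmpfs option string. It is not the code from `capabilities.py`: the names `resolve_tmp_size` and `build_tmpfs_mount`, the whitespace stripping, and the dict shape of the mount spec (path mapped to an option string) are assumptions made for the example.

```python
import os
import re

DEFAULT_TMP_SIZE = "512m"

# Same pattern as in the PR text; re.ASCII restricts \d to 0-9 (an added
# precaution in this sketch, not something the PR states).
_SIZE_RE = re.compile(r"^\d+[mg]$", re.ASCII)


def resolve_tmp_size(raw):
    """Return a validated tmpfs size, falling back to the default.

    Hypothetical helper: case is folded first, so "1G" becomes "1g";
    empty, missing, or malformed values yield DEFAULT_TMP_SIZE.
    """
    candidate = (raw or "").strip().lower()
    return candidate if _SIZE_RE.match(candidate) else DEFAULT_TMP_SIZE


def build_tmpfs_mount(env=None):
    """Build a {path: options} tmpfs spec (assumed shape, for illustration).

    Only the size is taken from the environment; noexec,nosuid are
    literals here so configuration can never drop them.
    """
    env = os.environ if env is None else env
    size = resolve_tmp_size(env.get("AGENT_TMP_SIZE"))
    return {"/tmp": f"noexec,nosuid,size={size}"}
```

Keeping the security flags as literals and deriving only the size mirrors the "single source of truth" point: any caller that needs the mount spec goes through one function.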
Problem
Agent containers mount `/tmp` as a hardcoded 100 MB `noexec,nosuid` RAM-backed tmpfs (`capabilities.py`). It fills easily, e.g. with leftover `gh` CLI install artifacts (~38 MB) plus normal scratch. Once full, every `/tmp` write fails with `No space left on device`, including git's commit scratch, so autonomous scheduled runs "complete" but silently fail to persist (commit/push). Because the size was a literal, operators couldn't tune it without a code change + base-image rebuild. This is fleet-wide (base image).

#1098 already redirects `$TMPDIR`-honoring tools off `/tmp`, but install scripts that hardcode `/tmp` (the `gh` case) bypass it and still exhaust the cap.

Fix
- `capabilities.py`: `AGENT_TMPFS_MOUNT` size is now read from `AGENT_TMP_SIZE` (env on the backend service, which builds the agent mount spec), default `512m`, validated against `^\d+[mg]$`; empty or invalid values fall back to the default. `noexec,nosuid` stay hardcoded: only size is configurable, and it stays bounded (tmpfs counts against the agent memory cgroup). There is a single source of truth, so create (`crud.py`) and recreate (`lifecycle.py`) can't drift.
- Wire `AGENT_TMP_SIZE=${AGENT_TMP_SIZE:-512m}` on the `backend` service in both `docker-compose.yml` and `docker-compose.prod.yml`; document in `.env.example`.
- `architecture.md` Container Security: note the now-configurable size.

Sample `AGENT_TMP_SIZE` values and the resulting size:

- `""`, `512`, `512Mi`, `0.5g`: fall back to `512m`
- `256m`, `2g`, `1G`: accepted (`1G` → `1g`)

Acceptance criteria
- `AGENT_TMP_SIZE`, default `512m`, single source of truth (`capabilities.py`)
- `noexec,nosuid` remain hardcoded; only size configurable
- Validated (`^\d+[mg]$`); empty/invalid → default (never broken/unbounded)
- `.env.example` + both compose files
- `architecture.md` updated
- `df -h /tmp` (mount specs are creation-time: existing agents pick up the new size on recreate, not restart)

Verification
`tests/unit/test_1231_agent_tmp_size.py`: default / valid `m`, `g` / case-fold / invalid→default / security-flags-never-dropped. Full unit run: 22 passed (new + existing `test_capability_set`). Both compose files parse with `AGENT_TMP_SIZE` present; `py_compile` clean.

Related to #1231
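The `df -h /tmp` check on a recreated agent could also be scripted. The sketch below is one possible way to do that from inside the container, assuming the configured size is already validated (`<digits>m` or `<digits>g`) and that the kernel reports the tmpfs capacity exactly; `size_to_bytes` and `tmp_matches` are hypothetical helpers, not part of this PR.

```python
import shutil

# tmpfs size suffixes are binary multiples: m = MiB, g = GiB.
_UNITS = {"m": 1024**2, "g": 1024**3}


def size_to_bytes(size):
    """Convert a validated size such as '512m' or '2g' to bytes."""
    return int(size[:-1]) * _UNITS[size[-1]]


def tmp_matches(expected, path="/tmp"):
    """Return True if the filesystem at `path` has the expected capacity.

    Assumes exact equality is reasonable for page-aligned sizes; loosen
    the comparison if the environment rounds differently.
    """
    return shutil.disk_usage(path).total == size_to_bytes(expected)
```

Since mount specs are creation-time, a check like `tmp_matches("512m")` is only expected to hold after the agent has been recreated, not merely restarted.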
🤖 Generated with Claude Code