fix(agents): validate template cpu/memory before container create (#1197) #1227
Open
dolho wants to merge 1 commit into
A GitHub source repo whose template.yaml declares a fractional/Kubernetes-style resources block (cpu: "0.5", memory: "512Mi") aborted agent creation deep in container-create with an opaque `ValueError: invalid literal for int() with base 10: '0.5'`, after the MCP key was already minted. Each attempt left an orphaned mcp_api_keys row (#1126/#1128 added the unguarded int(cpu) at three sites).

- Add normalize_cpu/normalize_memory + canonical VALID_CPU/VALID_MEMORY in services/agent_service/capabilities.py (stdlib-only, the anti-drift home for container spec). routers/settings.py now imports these instead of duplicating the lists, so the API and the create paths can't drift.
- crud.py create: normalize/validate config.resources BEFORE any side effect and raise a clear HTTP 400 on invalid input; write canonical values back so labels + limits use them. Roll back the agent-scoped MCP key in the failure path so a failed create leaves no orphan row.
- lifecycle.py recreate + system_agent_service.py: same guard at their nano_cpus/mem_limit sites.
- tests/unit/test_resource_normalization.py: pins the helpers (valid set, fractional/k8s rejection with actionable message, case-folding, default fallback, int()-castability, drift guard).

Related to #1197
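A minimal sketch of what helpers like `normalize_cpu`/`normalize_memory` could look like. The allowed values, the defaults and the exact error wording below are assumptions for illustration only; the PR does not list the real contents of `VALID_CPU`/`VALID_MEMORY`. The point it shows is the contract described above: whole-core cpu strings that `int()` can always parse, case-folded memory, an empty/missing value falling back to a default, and a "must be one of …" error for fractional or Kubernetes-style input.

```python
# Hypothetical sketch: the real helpers live in
# services/agent_service/capabilities.py; these value sets are placeholders.
VALID_CPU = ("1", "2", "4", "8")          # whole cores only (assumed values)
VALID_MEMORY = ("1g", "2g", "4g", "8g")   # Docker-style sizes (assumed values)

DEFAULT_CPU = "2"      # assumed default
DEFAULT_MEMORY = "4g"  # assumed default


def _is_blank(value) -> bool:
    """True for None, empty or whitespace-only values."""
    return value is None or str(value).strip() == ""


def normalize_cpu(value, default: str = DEFAULT_CPU) -> str:
    """Return a canonical cpu string that int() can parse, or raise ValueError."""
    if _is_blank(value):
        return default  # empty/missing -> default fallback
    cpu = str(value).strip()  # YAML may hand us an int like 2
    if cpu not in VALID_CPU:
        raise ValueError(
            f"Invalid resources.cpu {value!r}: must be one of {', '.join(VALID_CPU)} "
            "(fractional or Kubernetes-style values such as '0.5' or '500m' are not supported)"
        )
    return cpu


def normalize_memory(value, default: str = DEFAULT_MEMORY) -> str:
    """Return a canonical, lower-cased memory string, or raise ValueError."""
    if _is_blank(value):
        return default
    memory = str(value).strip().lower()  # case-fold: "4G" -> "4g"
    if memory not in VALID_MEMORY:
        raise ValueError(
            f"Invalid resources.memory {value!r}: must be one of {', '.join(VALID_MEMORY)} "
            "(Kubernetes-style values such as '512Mi' are not supported)"
        )
    return memory
```

With helpers of this shape, `normalize_cpu("0.5")` raises a readable `ValueError` before anything reaches `int()`, and `normalize_memory("4G")` returns `"4g"`, so the caller can map the error to an HTTP 400 instead of a 500.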
Problem
Creating an agent from a GitHub source repo whose `template.yaml` declares a fractional / Kubernetes-style resources block (`cpu: "0.5"`, `memory: "512Mi"`) aborts deep in container-create with an opaque error:

`ValueError: invalid literal for int() with base 10: '0.5'`

Repo validation, MCP-key creation, subscription assignment and env setup all succeed first; the raw `int(cpu)` for Docker's `NanoCpus` (#1126/#1128) then crashes. `agent_ownership` rolls back, but an orphaned `mcp_api_keys` row is left per attempt. The adjacent `mem_limit` has the same class of bug: a k8s memory string such as `512Mi` isn't a valid Docker memory string.

Fix
- `capabilities.py` (stdlib-only, the anti-drift home for container spec): add `normalize_cpu`/`normalize_memory` + canonical `VALID_CPU`/`VALID_MEMORY`. They reject fractional/k8s values with an actionable message and case-fold memory (`4G` → `4g`).
- `routers/settings.py` now imports those sets instead of duplicating the lists, so the admin defaults endpoint and the create paths share one source of truth.
- `crud.py` (create): normalize/validate `config.resources` before any side effect and raise a clear HTTP 400 instead of a 500 from deep in Docker; canonical values are written back for labels + limits. Roll back the agent-scoped MCP key in the failure path, so no orphan rows are left.
- `lifecycle.py` (recreate) and `system_agent_service.py`: same guard at their `nano_cpus`/`mem_limit` sites (all three unguarded sites from the issue).

Verification
- `tests/unit/test_resource_normalization.py`: 20+ cases covering the valid set being accepted, `0.5`/`512Mi`/etc. being rejected with "must be one of …", case-folding, empty → default fallback, `int()`-castability of every normalized cpu, and a drift guard.
- Full unit run: 34 passed (new + existing `test_capability_set`).
- `py_compile` clean on all five changed modules.

Related to #1197
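The create-path ordering described under Fix (validate first, then mint the MCP key, and revoke it if container start fails) can be sketched with in-memory stand-ins. Every name here (`create_agent`, `McpKeyStore`, `BadRequest`, `start_container`) is hypothetical and the value sets are placeholders; `BadRequest` stands in for an HTTP 400 such as FastAPI's `HTTPException(status_code=400)`, and the `nano_cpus`/`mem_limit` keyword names mirror the Docker SDK for Python's `containers.run`.

```python
# Hypothetical sketch of the crud.py create ordering; all names are stand-ins.
VALID_CPU = ("1", "2", "4")        # placeholder values
VALID_MEMORY = ("2g", "4g", "8g")  # placeholder values
NANO_PER_CPU = 1_000_000_000       # Docker NanoCpus: 1e9 == one full CPU


class BadRequest(Exception):
    """Stand-in for an HTTP 400 response (e.g. HTTPException(status_code=400))."""


class McpKeyStore:
    """In-memory stand-in for the mcp_api_keys table."""

    def __init__(self):
        self.rows = {}
        self._next_id = 0

    def mint(self, agent_name):
        self._next_id += 1
        self.rows[self._next_id] = agent_name
        return self._next_id

    def revoke(self, key_id):
        self.rows.pop(key_id, None)


def create_agent(name, resources, keys, start_container):
    # 1. Normalize/validate before any side effect; missing/empty values use defaults.
    cpu = str(resources.get("cpu") or "2").strip()
    memory = str(resources.get("memory") or "4g").strip().lower()
    if cpu not in VALID_CPU:
        raise BadRequest(f"resources.cpu {cpu!r} must be one of {', '.join(VALID_CPU)}")
    if memory not in VALID_MEMORY:
        raise BadRequest(f"resources.memory {memory!r} must be one of {', '.join(VALID_MEMORY)}")

    # 2. First side effect: mint the agent-scoped MCP key.
    key_id = keys.mint(name)
    try:
        # 3. cpu is guaranteed int()-castable here, so this cannot raise ValueError.
        start_container(name, nano_cpus=int(cpu) * NANO_PER_CPU, mem_limit=memory)
    except Exception:
        # 4. Roll back the key so a failed create leaves no orphaned row.
        keys.revoke(key_id)
        raise
    return key_id
```

Because validation runs before `keys.mint`, a bad template costs no database row at all, and the `try`/`except` around container start covers the remaining failures that happen after the key exists.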
🤖 Generated with Claude Code