feat(loops): loop-level wall-clock deadline (max_duration_seconds) (#1156)#1182
feat(loops): loop-level wall-clock deadline (max_duration_seconds) (#1156)#1182dolho wants to merge 2 commits into
Conversation
…1156) Adds an optional total-duration bound to sequential agent loops, the third hard stop alongside the max_runs iteration cap. A loop legally configured today (max_runs=100 × timeout_per_run up to 2h + delays) could run for days; max_duration_seconds caps total wall-clock time. - Schema: agent_loops.max_duration_seconds INTEGER (nullable) + idempotent migration (agent_loops_max_duration). - Runner (loop_service): deadline measured from started_at, checked only at iteration boundaries (before the next run + before/after the inter-run delay, which is capped to the remaining budget). An in-flight run is never killed mid-turn — overshoot is bounded by one timeout_per_run. Expiry stops with stop_reason="deadline_exceeded", terminal status "stopped". - Router: optional max_duration_seconds (1..604800); 400 when smaller than the effective per-run timeout (timeout_per_run, else agent execution_timeout). GET /api/loops/{id} returns max_duration_seconds + computed elapsed_seconds. - MCP run_agent_loop + UI Loops form/detail expose the parameter and deadline. - Tests: deadline stop at boundary, in-flight run completes (not killed), delay capped to remaining budget, no-deadline regression, and the four validation paths. Docs: architecture.md (feature + schema) and requirements.md §38.2. Closes #1156 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
be4c6f5 to
bc23675
Compare
|
Resolve by running |
|
@dependabot rebase |
vybe
left a comment
There was a problem hiding this comment.
Thanks — the feature itself looks good (clean SQLite migration, schema.py/tables.py updated, docs + tests present). One blocker before merge:
Missing the PostgreSQL Alembic revision (dual-track migration, CLAUDE.md §9 / Invariant #9).
This PR adds agent_loops.max_duration_seconds via the SQLite track (_migrate_agent_loops_max_duration in db/migrations.py) but there is no matching Alembic revision under src/backend/migrations/versions/. On a PostgreSQL deployment the column would never be created, so the loop deadline feature would crash there.
Please add a revision (down_revision = current head, e.g. 0001_baseline) that runs op.add_column('agent_loops', sa.Column('max_duration_seconds', sa.Integer(), nullable=True)). Once that's in, this is good to go.
Summary
Adds an optional loop-level wall-clock deadline (
max_duration_seconds) to sequential agent loops — the third hard stop alongside themax_runsiteration cap (and the separately-tracked cost budget). A loop legally configured today (max_runs=100×timeout_per_runup to 2h +delay_seconds) can run for days; this bounds total duration.Related to #1156.
What changed (end-to-end)
agent_loops.max_duration_seconds INTEGER(nullable) inschema.py+ idempotentagent_loops_max_durationmigration for existing DBs.loop_service._run): deadline measured fromstarted_at, checked only at iteration boundaries — before the next run and before/after the inter-run delay (thedelay_secondssleep is capped to the remaining budget, never sleeping past the deadline). An in-flight run is never killed mid-turn, so overshoot is bounded by onetimeout_per_run. Expiry → terminal statusstopped,stop_reason="deadline_exceeded".max_duration_seconds(1–604800 = 7d); 400 when smaller than the effective per-run timeout (timeout_per_run, else the agent'sexecution_timeout_seconds).GET /api/loops/{id}now returnsmax_duration_seconds+ a computedelapsed_seconds.run_agent_loop+ Loops UI (form field + deadline/elapsed in detail,deadline_exceededlabel).architecture.md(feature + schema) andrequirements.md§38.2.Acceptance criteria
POST /loopsaccepts optionalmax_duration_seconds; persisted onagent_loopsstarted_at, checked at iteration boundaries; in-flight run completesstop_reason="deadline_exceeded", statusstopped< timeout_per_run(or agent execution timeout) with 400GET /loops/{id}returns deadline + elapsed; MCPrun_agent_loopexposes the paramTesting
tests/unit/test_loop_service.py(deadline suite) +tests/unit/test_loops_router_validation.py— 24 passed locally. Also verified the migration is idempotent andcreate_loop/get_loopround-trips the column on a fresh schema and via the ALTER path.🤖 Generated with Claude Code