feat(loops): loop-level wall-clock deadline (max_duration_seconds) (#1156) #1182

Open
dolho wants to merge 2 commits into dev from feature/1156-loop-max-duration
dolho commented Jun 12, 2026

Contributor

Summary

Adds an optional loop-level wall-clock deadline (max_duration_seconds) to sequential agent loops: the third hard stop alongside the max_runs iteration cap (and the separately tracked cost budget). A loop legally configured today (max_runs=100 × timeout_per_run up to 2h + delay_seconds) can spend up to 200 hours, more than 8 days, in runs alone; max_duration_seconds bounds the total duration.

Related to #1156.

What changed (end-to-end)

  • Schema/migration: agent_loops.max_duration_seconds INTEGER (nullable) in schema.py + idempotent agent_loops_max_duration migration for existing DBs.
  • Runner (loop_service._run): deadline measured from started_at, checked only at iteration boundaries — before the next run and before/after the inter-run delay (the delay_seconds sleep is capped to the remaining budget, never sleeping past the deadline). An in-flight run is never killed mid-turn, so overshoot is bounded by one timeout_per_run. Expiry → terminal status stopped, stop_reason="deadline_exceeded".
  • Router: optional max_duration_seconds (1–604800 = 7d); 400 when smaller than the effective per-run timeout (timeout_per_run, else the agent's execution_timeout_seconds). GET /api/loops/{id} now returns max_duration_seconds + a computed elapsed_seconds.
  • MCP run_agent_loop + Loops UI (form field + deadline/elapsed in detail, deadline_exceeded label).
  • Docs: architecture.md (feature + schema) and requirements.md §38.2.

Acceptance criteria

  • POST /loops accepts optional max_duration_seconds; persisted on agent_loops
  • Deadline from started_at, checked at iteration boundaries; in-flight run completes
  • Expired deadline → stop_reason="deadline_exceeded", status stopped
  • Validation: reject < timeout_per_run (or agent execution timeout) with 400
  • GET /loops/{id} returns deadline + elapsed; MCP run_agent_loop exposes the param
  • Loops UI shows the deadline when set
  • Schema migration + tests (deadline stop, boundary semantics, validation error)
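
The validation rule in the criteria above could look roughly like the following sketch. The names validate_max_duration and LoopValidationError are hypothetical, and the sketch assumes the agent's execution timeout is always available as an integer; the actual router code may be structured differently.

```python
from typing import Optional

MAX_DURATION_LIMIT = 604_800  # 7 days in seconds, the upper bound stated in the PR


class LoopValidationError(ValueError):
    """Hypothetical error type; the PR says the timeout case maps to HTTP 400."""


def validate_max_duration(
    max_duration_seconds: Optional[int],
    timeout_per_run: Optional[int],
    agent_execution_timeout_seconds: int,
) -> None:
    """Raise LoopValidationError if the requested deadline is not acceptable."""
    if max_duration_seconds is None:
        return  # the deadline is optional

    if not 1 <= max_duration_seconds <= MAX_DURATION_LIMIT:
        raise LoopValidationError(
            f"max_duration_seconds must be between 1 and {MAX_DURATION_LIMIT}"
        )

    # Effective per-run timeout: timeout_per_run, else the agent's execution timeout.
    effective_timeout = (
        timeout_per_run
        if timeout_per_run is not None
        else agent_execution_timeout_seconds
    )
    if max_duration_seconds < effective_timeout:
        raise LoopValidationError(
            "max_duration_seconds must be at least the effective per-run timeout "
            f"({effective_timeout}s)"
        )
```

Rejecting a deadline shorter than one run's timeout avoids a loop whose first run alone is allowed to exceed the whole budget.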

Testing

tests/unit/test_loop_service.py (deadline suite) + tests/unit/test_loops_router_validation.py: 24 passed locally. Also verified that the migration is idempotent and that create_loop/get_loop round-trip the column on a fresh schema and via the ALTER path.
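
For readers unfamiliar with the pattern, an idempotent SQLite column migration can be sketched as follows. This is an illustration under assumptions, not the agent_loops_max_duration migration from the PR: the function name and the boolean return value are made up here, and only the table and column names come from the description.

```python
import sqlite3


def migrate_agent_loops_max_duration(conn: sqlite3.Connection) -> bool:
    """Add agent_loops.max_duration_seconds if missing; return True if it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(agent_loops)")}
    if "max_duration_seconds" in columns:
        return False  # already migrated, so running again is a no-op

    # Nullable INTEGER: existing rows get NULL, meaning "no deadline".
    conn.execute("ALTER TABLE agent_loops ADD COLUMN max_duration_seconds INTEGER")
    conn.commit()
    return True
```

Checking the column list first is what makes a second run safe; SQLite would otherwise raise a "duplicate column name" error on the repeated ALTER TABLE.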

🤖 Generated with Claude Code

…1156)

Adds an optional total-duration bound to sequential agent loops, the third
hard stop alongside the max_runs iteration cap. A loop legally configured
today (max_runs=100 × timeout_per_run up to 2h + delays) could run for days;
max_duration_seconds caps total wall-clock time.

- Schema: agent_loops.max_duration_seconds INTEGER (nullable) + idempotent
  migration (agent_loops_max_duration).
- Runner (loop_service): deadline measured from started_at, checked only at
  iteration boundaries (before the next run + before/after the inter-run
  delay, which is capped to the remaining budget). An in-flight run is never
  killed mid-turn — overshoot is bounded by one timeout_per_run. Expiry stops
  with stop_reason="deadline_exceeded", terminal status "stopped".
- Router: optional max_duration_seconds (1..604800); 400 when smaller than the
  effective per-run timeout (timeout_per_run, else agent execution_timeout).
  GET /api/loops/{id} returns max_duration_seconds + computed elapsed_seconds.
- MCP run_agent_loop + UI Loops form/detail expose the parameter and deadline.
- Tests: deadline stop at boundary, in-flight run completes (not killed),
  delay capped to remaining budget, no-deadline regression, and the four
  validation paths.

Docs: architecture.md (feature + schema) and requirements.md §38.2.

Closes #1156

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dolho force-pushed the feature/1156-loop-max-duration branch from be4c6f5 to bc23675 on June 12, 2026 11:50
@github-actions

⚠️ Nightly unit-suite check skipped — merge conflict against dev.

Resolve by running git merge dev locally and pushing the result. The next nightly run will re-test once the conflict is gone.

dolho requested a review from vybe on June 16, 2026 08:48
vybe commented Jun 17, 2026

Contributor

@dependabot rebase

vybe left a comment

Contributor

Thanks — the feature itself looks good (clean SQLite migration, schema.py/tables.py updated, docs + tests present). One blocker before merge:

Missing the PostgreSQL Alembic revision (dual-track migration, CLAUDE.md §9 / Invariant #9).

This PR adds agent_loops.max_duration_seconds via the SQLite track (_migrate_agent_loops_max_duration in db/migrations.py) but there is no matching Alembic revision under src/backend/migrations/versions/. On a PostgreSQL deployment the column would never be created, so the loop deadline feature would crash there.

Please add a revision (down_revision = current head, e.g. 0001_baseline) that runs op.add_column('agent_loops', sa.Column('max_duration_seconds', sa.Integer(), nullable=True)). Once that's in, this is good to go.
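
A revision along the lines requested might look like the sketch below. The revision identifier and the file it would live in are hypothetical, and down_revision repeats the reviewer's example value, which should be replaced with whatever the actual current head is. The downgrade step is an addition for symmetry and is not part of the request.

```python
"""Add agent_loops.max_duration_seconds (sketch of the requested revision)."""
import sqlalchemy as sa
from alembic import op

# Hypothetical identifier; pick one that matches the repository's naming scheme.
revision = "0002_agent_loops_max_duration"
# The reviewer's example; must point at the real current head.
down_revision = "0001_baseline"
branch_labels = None
depends_on = None


def upgrade() -> None:
    # Nullable so existing loops keep running with no deadline.
    op.add_column(
        "agent_loops",
        sa.Column("max_duration_seconds", sa.Integer(), nullable=True),
    )


def downgrade() -> None:
    op.drop_column("agent_loops", "max_duration_seconds")
```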

