feat(org): configurable agent-run timeouts with env-clamped ceiling by MesoX · Pull Request #168 · willdady/platypus

MesoX · 2026-05-31T20:59:09Z

Summary

Adds a per-organization override of agent-run wall-clock + per-step timeouts for both chat-driven and trigger-driven runs. Defaults are unchanged; overrides only take effect when an org admin sets them in the organization settings UI. Every override is clamped server-side to a deployer-supplied environment ceiling — an admin can lower a timeout but never raise it past what the host operator allows.

Motivation

Long-running agent runs (multi-step research, slow MCP/tool calls) sometimes need more headroom than the hardcoded defaults, but the limit should stay under the deployer's control. This exposes a safe, bounded knob per organization.

Changes

Schema

Migration 0041 adds a nullable jsonb agent_run_settings column on organization (idempotent ADD COLUMN IF NOT EXISTS; existing rows keep null and behave exactly as before).
AgentRunSettings exported from @platypus/schemas as a strict zod object with four optional positive-int fields.
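The four field names appear in the commit message. Below is a dependency-free sketch of what a strict zod object with four optional positive-int fields accepts: unknown keys are rejected, every field may be omitted, and any value that is present must be a positive integer. `parseAgentRunSettings` is a hypothetical helper written without zod so it runs standalone. It is not the schema exported by `@platypus/schemas`.

```typescript
// Field names as listed in the commit message; everything else is illustrative.
const AGENT_RUN_SETTINGS_KEYS = [
  "chatPerRunTimeoutMs",
  "chatPerStepTimeoutMs",
  "triggerPerRunTimeoutMs",
  "triggerPerStepTimeoutMs",
] as const;

type AgentRunSettingsKey = (typeof AGENT_RUN_SETTINGS_KEYS)[number];

// Every field is optional; a present field is a timeout in milliseconds.
type AgentRunSettings = Partial<Record<AgentRunSettingsKey, number>>;

// Hypothetical stand-in for a strict zod parse (not the real schema):
// unknown keys throw, and each present value must be a positive integer.
function parseAgentRunSettings(input: unknown): AgentRunSettings {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    throw new Error("AgentRunSettings must be an object");
  }
  const allowed = new Set<string>(AGENT_RUN_SETTINGS_KEYS);
  const result: AgentRunSettings = {};
  for (const [key, value] of Object.entries(input)) {
    if (!allowed.has(key)) {
      // Strict semantics: extra keys are an error, not silently dropped.
      throw new Error(`Unrecognized key: ${key}`);
    }
    if (value === undefined) continue; // optional field left out
    if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) {
      throw new Error(`${key} must be a positive integer`);
    }
    result[key as AgentRunSettingsKey] = value;
  }
  return result;
}
```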

Backend

New services/agent-run-settings.ts — resolveRunTimeouts(orgId, kind) reads the org override, falls back to env, then to hardcoded defaults, and clamps to the env ceiling. Chat defaults are sourced from run-registry constants so they cannot drift.
PUT /organizations/:orgId (admin-only) rejects overrides above the env ceiling with 400 { error } naming the offending env vars.
New GET /organizations/:orgId/agent-run-settings/ceilings exposes the current ceilings for the UI.
routes/chat.ts and services/trigger-execution.ts resolve timeouts via the new service instead of reading env directly.
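Here is a minimal sketch of the resolution and validation rules described above. It assumes a pure core that receives the org's stored settings and an env map as arguments; the real `resolveRunTimeouts(orgId, kind)` also loads the organization row, which is omitted here. The names `resolveFromSettings`, `validateAgainstCeilings`, `parseEnvMs`, and `CEILING_TABLE` are hypothetical, as is the error wording. The sketch also assumes that the `RUN_*` variables govern chat runs and the `TRIGGER_*` variables govern trigger runs. The caller supplies the hardcoded defaults because the PR only states the chat values (10 min run / 2 min step).

```typescript
type RunKind = "chat" | "trigger";
type SettingsField =
  | "chatPerRunTimeoutMs"
  | "chatPerStepTimeoutMs"
  | "triggerPerRunTimeoutMs"
  | "triggerPerStepTimeoutMs";
type AgentRunSettings = Partial<Record<SettingsField, number>>;
type Env = Record<string, string | undefined>;

interface RunTimeouts {
  perRunMs: number;
  perStepMs: number;
}

// One row per override field. Assumption: RUN_* ceilings apply to chat runs.
const CEILING_TABLE = [
  { kind: "chat", slot: "run", field: "chatPerRunTimeoutMs", envVar: "RUN_PER_RUN_TIMEOUT_MS" },
  { kind: "chat", slot: "step", field: "chatPerStepTimeoutMs", envVar: "RUN_PER_STEP_TIMEOUT_MS" },
  { kind: "trigger", slot: "run", field: "triggerPerRunTimeoutMs", envVar: "TRIGGER_PER_RUN_TIMEOUT_MS" },
  { kind: "trigger", slot: "step", field: "triggerPerStepTimeoutMs", envVar: "TRIGGER_PER_STEP_TIMEOUT_MS" },
] as const;

// Garbage env values (empty, non-numeric, fractional, zero, negative) are ignored.
function parseEnvMs(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

// Read path: org override, then env, then hardcoded default, clamped to the ceiling.
function resolveFromSettings(
  kind: RunKind,
  settings: AgentRunSettings | null,
  env: Env,
  defaults: RunTimeouts,
): RunTimeouts {
  const resolved: RunTimeouts = { ...defaults };
  for (const row of CEILING_TABLE) {
    if (row.kind !== kind) continue; // chat and trigger never read each other's fields
    const key = row.slot === "run" ? "perRunMs" : "perStepMs";
    // The deployer's env value is the ceiling; without one, the default is.
    const ceiling = parseEnvMs(env[row.envVar]) ?? defaults[key];
    const requested = settings?.[row.field] ?? ceiling;
    resolved[key] = Math.min(requested, ceiling); // an override can lower, never raise
  }
  return resolved;
}

// Write path (PUT): null when acceptable, else one message for a 400 { error } body.
function validateAgainstCeilings(
  settings: AgentRunSettings | null,
  env: Env,
  defaults: Record<RunKind, RunTimeouts>,
): string | null {
  if (settings === null) return null; // null clears the override; nothing to check
  const offending: string[] = [];
  for (const row of CEILING_TABLE) {
    const value = settings[row.field];
    if (value === undefined) continue;
    const key = row.slot === "run" ? "perRunMs" : "perStepMs";
    const ceiling = parseEnvMs(env[row.envVar]) ?? defaults[row.kind][key];
    if (value > ceiling) {
      // Report the ceiling in minutes, naming the env var that sets it.
      offending.push(`${row.field} exceeds ${row.envVar} (${ceiling / 60_000} min)`);
    }
  }
  return offending.length === 0 ? null : `Timeout override above ceiling: ${offending.join("; ")}`;
}
```

In this sketch, one table drives both the read-time clamp and the write-time check. That is one way to keep the two from disagreeing. The PR describes its validation as table-driven, but the exact table shape here is illustrative.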

Frontend

OrganizationForm gains an "Agent run timeouts" section (edit only) with four minute-valued inputs, ceiling placeholders, client-side ceiling validation for inline feedback, and surfacing of the server error message.
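The form's conversion step can be sketched as a pure function. The sketch assumes each input holds a number of minutes as a string and that the matching ceiling (in milliseconds) has already been fetched from the ceilings endpoint. `minutesInputToMs` and its messages are hypothetical. Treating an empty input as `null` is also an assumption, chosen to mirror the PUT behavior where null clears the override.

```typescript
// Outcome of converting one minute-valued form input.
type FieldResult =
  | { ok: true; ms: number | null } // ms === null means "clear the override"
  | { ok: false; message: string };

const MS_PER_MINUTE = 60_000;

// Hypothetical helper: minutes string -> integer milliseconds, checked against
// the ceiling (in ms) fetched for this field.
function minutesInputToMs(raw: string, ceilingMs: number): FieldResult {
  const trimmed = raw.trim();
  // Assumption: an empty input clears the override (sent as null on PUT).
  if (trimmed === "") return { ok: true, ms: null };
  // Round to whole milliseconds: the schema only accepts positive integers.
  const ms = Math.round(Number(trimmed) * MS_PER_MINUTE);
  if (!Number.isFinite(ms) || ms <= 0) {
    return { ok: false, message: "Enter a positive number of minutes" };
  }
  if (ms > ceilingMs) {
    return { ok: false, message: `Maximum is ${ceilingMs / MS_PER_MINUTE} minutes` };
  }
  return { ok: true, ms };
}
```

This check exists only to give inline feedback. The server-side ceiling check on PUT remains the authority, which is why the form still surfaces the server's error message.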

Config

.env.example documents the four ceiling env vars and their defaults (RUN_PER_RUN_TIMEOUT_MS, RUN_PER_STEP_TIMEOUT_MS, TRIGGER_PER_RUN_TIMEOUT_MS, TRIGGER_PER_STEP_TIMEOUT_MS).

Behavior compatibility

Chat defaults (10 min run / 2 min step) are imported from run-registry's existing defaults, so a run with no env override and no org override behaves identically to before this PR.

Testing

services/agent-run-settings.test.ts — env defaults, env overrides, garbage-env rejection, DB-backed resolution including ceiling clamping, no-row fallback, chat-vs-trigger isolation.
routes/organization.test.ts — PUT path: 403 for non-admins, 400 (with error key) when over the ceiling, successful within-ceiling persist, and null clears the override.
Full suite: backend 795 pass, frontend 16 pass.

🤖 Generated with Claude Code

Adds a per-organization override of the agent-run wall-clock + per-step timeouts for both chat-driven and trigger-driven runs. Defaults remain the same; overrides only kick in when an org admin sets them in the organization settings UI.

Schema
- Migration 0041 adds a nullable jsonb `agent_run_settings` column on `organization`. Idempotent: existing rows keep the default null value and continue using env / hardcoded defaults.
- `AgentRunSettings` shape is exported from `@platypus/schemas` as a strict zod object with four optional positive-int fields: chatPerRunTimeoutMs, chatPerStepTimeoutMs, triggerPerRunTimeoutMs, triggerPerStepTimeoutMs.

Backend
- New `services/agent-run-settings.ts` exposes `resolveRunTimeouts(orgId, kind)` (chat | trigger). It reads the org override, falls back to env, and finally to documented hardcoded defaults. The result is clamped to the env-supplied ceiling so a misconfigured override can never exceed the deployer-allowed maximum. Chat defaults are sourced from run-registry's exported constants so the two cannot drift.
- `PUT /organizations/:orgId` (admin-only) validates incoming overrides against the env ceilings and returns 400 with a single `error` message (per API conventions) listing the offending env vars; admins can lower but never raise.
- New `GET /organizations/:orgId/agent-run-settings/ceilings` returns the current chat + trigger ceilings so the UI can display them next to each input.
- `routes/chat.ts` and `services/trigger-execution.ts` now invoke `resolveRunTimeouts` instead of reading env directly, so the org override is honored by every active run.

Frontend
- `OrganizationForm` gains an "Agent run timeouts" section (only shown on edit, not create). Four minute-valued inputs map to the four override fields; placeholders show the current ceilings fetched from the new endpoint. Values are converted to milliseconds before saving, validated client-side against the fetched ceilings for inline feedback, and the server error message is surfaced on rejection.

Config
- `.env.example` documents the four ceiling env vars (RUN_PER_RUN_TIMEOUT_MS, RUN_PER_STEP_TIMEOUT_MS, TRIGGER_PER_RUN_TIMEOUT_MS, TRIGGER_PER_STEP_TIMEOUT_MS) and their defaults.

Tests
- `services/agent-run-settings.test.ts` covers env-default fallback, env-override parsing, garbage-env rejection, and the DB-backed `resolveRunTimeouts` lookup including ceiling clamping, the no-row fallback, and chat-vs-trigger isolation.
- `routes/organization.test.ts` covers the new PUT path: 403 for non-admins, 400 (with `error` key) when an override exceeds the ceiling, and a successful within-ceiling persist.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Ports the reviewed version of the configurable agent-run timeout feature (upstream PR willdady#168) onto the deploy branch. Behavior unchanged; quality and convention fixes only.
- Error responses use the singular `error` key per API conventions (was a custom `{ errors }` map).
- Remove dead `clampRunTimeouts` / `__TEST_HOOKS__`; source chat defaults from run-registry's exported constants so they cannot drift.
- Table-driven ceiling validation; server message reports minutes.
- Frontend validates against the fetched ceilings for inline feedback and surfaces the server error message on rejection.
- Document the four ceiling env vars in .env.example; migration trailing newline.
- Add PUT /organizations/:orgId tests (403 / 400-over-ceiling / within-ceiling persist / null clears the override).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

MesoX · 2026-06-15T20:40:41Z

Hey @willdady, this would definitely help with #261. Chat compaction can take a while, and even longer with locally hosted models on slower hardware: with a long context and a slow dense model, 5 minutes of compaction is not unusual. We have run into these timeouts even with faster models. I'm happy to update this branch to the latest main; just let me know whether this is something you want in the tool itself, or whether there is another way you would prefer to make these timeouts more configurable.

MesoX wants to merge 1 commit into willdady:main from MesoX:feature/configurable-agent-run-timeouts
