feat(agent-server): add deferred-init / dormant mode #3287
Draft
tofarr wants to merge 4 commits into
Implements the warm-pool agent-server proposal in #2523. When `Config.deferred_init=True` (env `OH_DEFERRED_INIT`) the server starts in *dormant* mode:

* Stateless services (VSCode, desktop, tool preload) start as usual so the warm pod is immediately useful to whoever attaches next.
* The conversation, event, and bash routers (everything under `/api/*`) return 503 via a new `require_initialized` dependency.
* `/alive`, `/health`, `/ready`, `/server_info` and a new top-level `/init` router are reachable. `/ready` reports ready once the stateless services are up so an orchestrator can match the pod with a user and send its `/init` payload.
* `POST /init` accepts an `InitRequest` (session API keys, workspace paths, webhooks, env vars, etc.), merges it with the dormant config, enters the `ConversationService` context, and flips the gate to `ready`. A second `/init` call gets 400; a failed init rolls back to dormant so the orchestrator can retry.
* Bootstrap auth for `POST /init` is a separate `OH_INIT_API_KEY` (`X-Init-API-Key` header), distinct from `session_api_keys` because the session key is part of the per-user payload that arrives *inside* the init body. `GET /init` (status polling) is unauthenticated.

The non-deferred path is unchanged: no `InitService` is attached to `app.state` and the dormant gate is a no-op.

Tests cover: config defaults + env wiring, `InitRequest` → `Config` merging, state machine (dormant → initializing → ready, second-call 400), env var application, end-to-end over the FastAPI lifespan + `TestClient` (503 gating before init, 200 after, init key auth), and the regression that `deferred_init=False` still works exactly as today.

Refs: #2523

Co-authored-by: openhands <openhands@all-hands.dev>
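The dormant → initializing → ready transition described in this commit message can be illustrated with a small standalone sketch. This is not the PR's `InitService`: the names `InitGate`, `InitState`, `AlreadyInitializedError` and the `start` callback are hypothetical, and the sketch only assumes what the text states (a single `asyncio.Lock`, rollback to dormant on failure, rejection of a second init).

```python
import asyncio
from enum import Enum


class InitState(str, Enum):
    DORMANT = "dormant"
    INITIALIZING = "initializing"
    READY = "ready"


class AlreadyInitializedError(Exception):
    """Hypothetical error; an HTTP layer could map it to a 400 response."""


class InitGate:
    """Hypothetical stand-in for the state machine, not the PR's InitService."""

    def __init__(self) -> None:
        self.state = InitState.DORMANT
        self._lock = asyncio.Lock()

    async def initialize(self, start) -> None:
        # One lock serialises concurrent calls, so only one init runs at a time.
        async with self._lock:
            if self.state is not InitState.DORMANT:
                raise AlreadyInitializedError(f"state is {self.state.value}")
            self.state = InitState.INITIALIZING
            try:
                await start()  # e.g. enter the conversation service context
            except Exception:
                # Roll back so the caller can retry with a corrected payload.
                self.state = InitState.DORMANT
                raise
            self.state = InitState.READY


async def demo() -> list:
    gate = InitGate()
    seen = [gate.state.value]

    async def failing() -> None:
        raise RuntimeError("bad payload")

    async def ok() -> None:
        seen.append(gate.state.value)  # state observed while init is running

    try:
        await gate.initialize(failing)
    except RuntimeError:
        seen.append(gate.state.value)  # rolled back after the failure
    await gate.initialize(ok)
    seen.append(gate.state.value)
    try:
        await gate.initialize(ok)  # second successful init is rejected
    except AlreadyInitializedError:
        seen.append("rejected")
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Holding the lock for the whole transition means a concurrent caller never starts a second init; it waits, then finds the gate already `ready` and is rejected.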
Python API breakage checks: ✅ PASSED
REST API breakage checks (OpenAPI): ✅ PASSED
Resolved conflict in openhands-agent-server/openhands/agent_server/api.py:
- Kept retention_task cancellation logic added in main
- Kept stop_stateless_services() helper from PR branch

Co-authored-by: openhands <openhands@all-hands.dev>
Implements the warm-pool agent-server proposal in #2523. Foundation for letting K8s warm pods be matched with users after boot, without pre-attached PVCs.
What's in this PR
A new dormant-mode lifecycle for the agent server:
- `Config.deferred_init: bool` (env `OH_DEFERRED_INIT`). When set, the server starts in dormant mode.
- `Config.init_api_key: SecretStr | None` (env `OH_INIT_API_KEY`). Bootstrap credential for `POST /init`, sent via the `X-Init-API-Key` header. Distinct from `session_api_keys` because session keys are part of the per-user payload that arrives inside the init body.
- `InitService` (new module `openhands/agent_server/init_router.py`) owns the `dormant → initializing → ready` state machine. Serialised by a single `asyncio.Lock`; failed inits roll back to `dormant` so the orchestrator can retry.
- `require_initialized` dependency added to the `/api/*` router. Returns 503 while not `ready`. Zero overhead when `deferred_init=False`.
- `/init` top-level router with `GET` (unauthenticated status poll) and `POST` (auth-gated init).
- `ConversationService` is only entered as part of the lifespan in the legacy path; in dormant mode it's entered when `/init` succeeds and torn down in the lifespan's `finally` clause.
- `/ready` flips to 200 as soon as the stateless services are up, so a warm-pool orchestrator can tell when the pod is available to receive `/init`.

Behaviour matrix
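| Mode | `/alive`, `/health`, `/ready`, `/server_info` | `/init` GET | `/init` POST | `/api/*` |
| --- | --- | --- | --- | --- |
| `deferred_init=False` (today) | reachable | … | … | live |
| `dormant` | reachable | status `dormant` | runs init (requires `X-Init-API-Key`) | 503 |
| `initializing` | reachable | status `initializing` | … | 503 |
| `ready` | reachable | status `ready` | 400 | 200 |

Scope notes (what this PR deliberately doesn't do)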
Driven by the open questions I raised in #2523 (comment). I picked the smallest set that captures the dormant→ready transition cleanly; the rest can land incrementally:
- No `/deinit` yet. Once `ready`, the server stays `ready` for the rest of the process lifetime. This is sufficient for the single-conversation-per-pod sandbox model. Recyclable `/init` is a follow-up that should arrive together with a clear story for flushing the workspace back to object storage between conversations.
- No workspace pull-on-init. `InitRequest` accepts a `conversations_path` and `bash_events_dir`, so an orchestrator can mount or pre-populate a workspace before calling `/init`; that side of the contract is intentionally separate.
- No `Workspace`-class integration on the SDK side yet. Per @enyst's and @xingyaoww's question in the issue, the cleanest API on the workspace side (probably a two-phase `start()` then `attach(config)` on the context manager) deserves its own PR once the server-side primitive is in place. This PR provides that primitive.
- Session keys delivered via `/init` populate `app.state.config` but are not enforced by the auth dependency, because of when `_add_api_routes` runs. Production deployments should set `OH_SESSION_API_KEYS_0` at pod start and use `/init` only to deliver workspace + per-user runtime config. The dormant gate guarantees no traffic reaches gated routes before `/init` regardless.

Tests
`tests/agent_server/test_init_router.py` covers:

- Config defaults and env wiring (`OH_DEFERRED_INIT`, `OH_INIT_API_KEY`).
- `InitRequest → Config` merging (override only provided fields, secret-key fallback to first session key, `deferred_init` cleared on transition).
- End-to-end over `api_lifespan` + `TestClient`: 503 gating before `/init`, 200 after, init-key auth (401 on wrong/missing key, 200 on right key, GET unauthenticated).
- `deferred_init=False` does not attach an `InitService` and `/api/*` is live from the start.
- Lifespan teardown closes `ConversationService` when `/init` ran, and is a no-op when it didn't.

All 16 new tests pass; the rest of `tests/agent_server/` is unaffected (one pre-existing failure in `test_terminal_service.py::test_terminal_does_not_expose_session_api_key_via_env_command` reproduces on `main`, unrelated to this change).

Why draft

This is the server-side foundation, not the full proposal. Posting as a draft so reviewers can sign off on the shape (`InitRequest` surface, auth model, gate placement, state-machine boundaries) before the follow-ups land (`/deinit`, workspace pull-on-init, `Workspace`-class integration, end-to-end docker/k8s example).

This PR was opened by an AI agent (OpenHands) on behalf of @tofarr.
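The `InitRequest → Config` merging rules listed under Tests (override only provided fields, secret-key fallback to the first session key, `deferred_init` cleared on transition) can be sketched as follows. This is an illustration under assumptions, not the PR's code: the dataclasses are hypothetical subsets, the `secret_key` field name and the default paths are guesses, and `merge_init_request` is an invented helper name.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Config:
    # Hypothetical subset of the server config; defaults are placeholders.
    deferred_init: bool = False
    session_api_keys: Tuple[str, ...] = ()
    secret_key: Optional[str] = None
    conversations_path: str = "workspace/conversations"
    bash_events_dir: str = "workspace/bash_events"


@dataclass(frozen=True)
class InitRequest:
    # None means "not provided": the dormant value is kept for that field.
    session_api_keys: Optional[Tuple[str, ...]] = None
    secret_key: Optional[str] = None
    conversations_path: Optional[str] = None
    bash_events_dir: Optional[str] = None


def merge_init_request(dormant: Config, request: InitRequest) -> Config:
    """Return the post-init config without mutating the dormant one."""
    # Override only the fields the request actually provided.
    overrides = {
        name: value for name, value in vars(request).items() if value is not None
    }
    merged = replace(dormant, **overrides)
    # Fall back to the first session key when no secret key is configured.
    if merged.secret_key is None and merged.session_api_keys:
        merged = replace(merged, secret_key=merged.session_api_keys[0])
    # The server is no longer dormant once init has succeeded.
    return replace(merged, deferred_init=False)


if __name__ == "__main__":
    dormant = Config(deferred_init=True, conversations_path="/dormant/conv")
    request = InitRequest(session_api_keys=("k1", "k2"), bash_events_dir="/user/bash")
    print(merge_init_request(dormant, request))
```

Returning a new frozen object keeps the dormant config intact, which fits the rollback-on-failure behaviour: a failed init can simply discard the merged copy.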
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm

Pull (multi-arch manifest)

```bash
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:1491d0b-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g. `1491d0b-python`) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. `1491d0b-python-amd64`) are also available if needed