feat(agent-server): add deferred-init / dormant mode #3287

Draft: tofarr wants to merge 4 commits into main from feat/agent-server-deferred-init

@tofarr tofarr commented May 17, 2026

Implements the warm-pool agent-server proposal in #2523. This is the foundation for matching K8s warm pods with users after boot, without pre-attached PVCs.

What's in this PR

A new dormant-mode lifecycle for the agent server:

  • Config.deferred_init: bool (env OH_DEFERRED_INIT). When set, the server starts in dormant mode.
  • Config.init_api_key: SecretStr | None (env OH_INIT_API_KEY). Bootstrap credential for POST /init, sent via the X-Init-API-Key header. Distinct from session_api_keys because session keys are part of the per-user payload that arrives inside the init body.
  • InitService (new module openhands/agent_server/init_router.py) owns the dormant → initializing → ready state machine. Serialised by a single asyncio.Lock; failed inits roll back to dormant so the orchestrator can retry.
  • require_initialized dependency added to the /api/* router. Returns 503 while the server is not ready. When deferred_init=False, no InitService is attached and the dependency is a no-op.
  • /init top-level router with GET (unauthenticated status poll) and POST (auth-gated init).
  • Lifespan refactor: stateless services (VSCode, desktop, tool preload) always start. The ConversationService is only entered as part of the lifespan in the legacy path; in dormant mode it's entered when /init succeeds and torn down in the lifespan's finally clause.
  • /ready flips to 200 as soon as the stateless services are up, so a warm-pool orchestrator can tell when the pod is available to receive /init.
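The dormant → initializing → ready lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `InitService`: the class name `InitServiceSketch`, the `start_services` callback, and the choice to hold the lock only around the state check are all assumptions made for the example.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable


class InitState(str, Enum):
    DORMANT = "dormant"
    INITIALIZING = "initializing"
    READY = "ready"


class AlreadyInitializedError(RuntimeError):
    """Raised when init is requested outside the dormant state (HTTP 400)."""


class InitServiceSketch:
    """Hypothetical stand-in for InitService; names are illustrative only."""

    def __init__(self) -> None:
        self.state = InitState.DORMANT
        self._lock = asyncio.Lock()

    async def initialize(self, start_services: Callable[[], Awaitable[None]]) -> None:
        # Assumption: the lock guards only the check-and-set, so a second
        # caller is rejected immediately while the first is still running.
        async with self._lock:
            if self.state is not InitState.DORMANT:
                raise AlreadyInitializedError(self.state.value)
            self.state = InitState.INITIALIZING
        try:
            await start_services()
        except Exception:
            # A failed init rolls back so the orchestrator can retry.
            self.state = InitState.DORMANT
            raise
        self.state = InitState.READY
```

In this sketch a failed `start_services()` leaves the object in `dormant`, and any call made while the state is `initializing` or `ready` raises the error that a router would translate into a 400.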

Behaviour matrix

| State | /alive, /health, /ready, /server_info | /init GET | /init POST | /api/* |
|---|---|---|---|---|
| deferred_init=False (today) | 200 | 404 | 404 | live |
| Deferred, dormant | 200 | 200 (state: dormant) | 200 (auth-gated) → ready | 503 |
| Deferred, initializing | 200 | 200 (state: initializing) | 400 | 503 |
| Deferred, ready | 200 | 200 (state: ready) | 400 | live |
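The matrix can also be written as a small lookup, which may be handy as an oracle for parametrised tests. This is only a sketch: the route-group constants and the function name `expected_response` are invented for the example, and "live" stands for "the request reaches the real handler".

```python
from typing import Optional, Union

# Hypothetical labels for the four columns of the matrix.
PROBES = "probes"        # /alive, /health, /ready, /server_info
INIT_GET = "init_get"    # GET /init
INIT_POST = "init_post"  # POST /init (with a valid init key)
API = "api"              # anything under /api/*

LIVE = "live"  # the request is handled by the normal route


def expected_response(
    deferred_init: bool, state: Optional[str], route: str
) -> Union[int, str]:
    """Return the status code (or LIVE) that the matrix lists for one cell."""
    if route == PROBES:
        return 200
    if not deferred_init:
        # Legacy path: the /init router is not mounted at all.
        return 404 if route in (INIT_GET, INIT_POST) else LIVE
    if route == INIT_GET:
        return 200
    if route == INIT_POST:
        return 200 if state == "dormant" else 400
    # Remaining case: /api/* behind the dormant gate.
    return LIVE if state == "ready" else 503
```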

Scope notes (what this PR deliberately doesn't do)

Driven by the open questions I raised in #2523 (comment). I picked the smallest set that captures the dormant→ready transition cleanly; the rest can land incrementally:

  • No /deinit yet. Once ready, the server stays ready for the rest of the process lifetime. This is sufficient for the single-conversation-per-pod sandbox model. Recyclable /init is a follow-up that should arrive together with a clear story for flushing the workspace back to object storage between conversations.
  • No workspace-storage integration (rclone / tar / S3 pull-on-init / continuous sync). InitRequest accepts a conversations_path and bash_events_dir, so an orchestrator can mount or pre-populate a workspace before calling /init; that side of the contract is intentionally separate.
  • No Workspace-class integration on the SDK side yet. Per @enyst's and @xingyaoww's question in the issue, the cleanest API on the workspace side (probably a two-phase start() then attach(config) on the context manager) deserves its own PR once the server-side primitive is in place. This PR provides that primitive.
  • Session-key timing trade-off (documented in the test): session-key auth is bound to the dormant Config at the moment _add_api_routes runs. Session keys delivered via /init therefore populate app.state.config but are not enforced by the auth dependency. Production deployments should set OH_SESSION_API_KEYS_0 at pod start and use /init only to deliver workspace and per-user runtime config. Either way, the dormant gate guarantees that no traffic reaches gated routes before /init.
  • No new image / docker / k8s deployment changes. Same images, same entrypoint — toggled by an env var.
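The session-key timing trade-off in the list above comes down to when the auth check captures its keys. The sketch below illustrates the general effect with plain closures; `SessionConfigSketch`, `AppStateSketch`, and `make_session_auth` are hypothetical names, and treating an empty key list as "auth disabled" is an assumption of the example rather than a statement about the server.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class SessionConfigSketch:
    session_api_keys: List[str] = field(default_factory=list)


class AppStateSketch:
    def __init__(self, config: SessionConfigSketch) -> None:
        self.config = config


def make_session_auth(config: SessionConfigSketch) -> Callable[[str], bool]:
    """Bind the check to the keys that exist when the routes are built."""
    allowed = set(config.session_api_keys)

    def check(key: str) -> bool:
        # Assumption for this sketch: no configured keys means no enforcement.
        return not allowed or key in allowed

    return check


# Dormant pod with no session keys: routes (and the check) are built at startup.
state = AppStateSketch(SessionConfigSketch())
auth = make_session_auth(state.config)

# /init later stores a new config, but the check still holds the old key set.
state.config = SessionConfigSketch(["user-key"])
late_keys_enforced = not auth("wrong-key")  # False: the new key is not enforced

# Keys present at pod start are captured, so they are enforced.
auth_at_start = make_session_auth(SessionConfigSketch(["user-key"]))
early_keys_enforced = not auth_at_start("wrong-key")  # True
```

This is why the recommendation is to supply the session key through the environment at pod start and reserve /init for workspace and per-user runtime settings.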

Tests

tests/agent_server/test_init_router.py covers:

  • Config defaults + env wiring (OH_DEFERRED_INIT, OH_INIT_API_KEY).
  • InitRequest → Config merging (override only provided fields, secret-key fallback to first session key, deferred_init cleared on transition).
  • State machine: dormant → ready, second-call 400, env-var application during init.
  • End-to-end over api_lifespan + TestClient: 503 gating before /init, 200 after, init-key auth (401 on wrong/missing key, 200 on right key, GET unauthenticated).
  • Regression: deferred_init=False does not attach an InitService and /api/* is live from the start.
  • Lifespan teardown releases the ConversationService when /init ran, and is a no-op when it didn't.
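The merge rules listed above (override only provided fields, fall back to the first session key for the secret key, clear deferred_init) could look roughly like this. The dataclasses are simplified stand-ins: only `conversations_path`, `bash_events_dir`, `session_api_keys`, and `deferred_init` are named in this PR, while `secret_key`, the default paths, and the function name `merge_init_request` are assumptions for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class ConfigSketch:
    deferred_init: bool = False
    session_api_keys: List[str] = field(default_factory=list)
    secret_key: Optional[str] = None  # assumed field name
    conversations_path: str = "workspace/conversations"  # assumed default
    bash_events_dir: str = "workspace/bash_events"  # assumed default


@dataclass(frozen=True)
class InitRequestSketch:
    # None means "not provided": keep whatever the dormant config has.
    session_api_keys: Optional[List[str]] = None
    secret_key: Optional[str] = None
    conversations_path: Optional[str] = None
    bash_events_dir: Optional[str] = None


def merge_init_request(dormant: ConfigSketch, request: InitRequestSketch) -> ConfigSketch:
    # Override only the fields the request actually supplied.
    overrides = {k: v for k, v in vars(request).items() if v is not None}
    # The merged config is no longer deferred once init has run.
    merged = replace(dormant, **overrides, deferred_init=False)
    # Fall back to the first session key when no secret key is set.
    if merged.secret_key is None and merged.session_api_keys:
        merged = replace(merged, secret_key=merged.session_api_keys[0])
    return merged
```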

All 16 new tests pass; the rest of tests/agent_server/ is unaffected (one pre-existing failure in test_terminal_service.py::test_terminal_does_not_expose_session_api_key_via_env_command reproduces on main, unrelated to this change).

Why draft

This is the server-side foundation, not the full proposal. Posting as a draft so reviewers can sign off on the shape (InitRequest surface, auth model, gate placement, state-machine boundaries) before the follow-ups land (/deinit, workspace pull-on-init, Workspace-class integration, end-to-end docker/k8s example).


This PR was opened by an AI agent (OpenHands) on behalf of @tofarr.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:1491d0b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-1491d0b-python \
  ghcr.io/openhands/agent-server:1491d0b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:1491d0b-golang-amd64
ghcr.io/openhands/agent-server:1491d0b14c0e8438f223ca9d3cdad033af6feb33-golang-amd64
ghcr.io/openhands/agent-server:feat-agent-server-deferred-init-golang-amd64
ghcr.io/openhands/agent-server:1491d0b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:1491d0b-golang-arm64
ghcr.io/openhands/agent-server:1491d0b14c0e8438f223ca9d3cdad033af6feb33-golang-arm64
ghcr.io/openhands/agent-server:feat-agent-server-deferred-init-golang-arm64
ghcr.io/openhands/agent-server:1491d0b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:1491d0b-java-amd64
ghcr.io/openhands/agent-server:1491d0b14c0e8438f223ca9d3cdad033af6feb33-java-amd64
ghcr.io/openhands/agent-server:feat-agent-server-deferred-init-java-amd64
ghcr.io/openhands/agent-server:1491d0b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:1491d0b-java-arm64
ghcr.io/openhands/agent-server:1491d0b14c0e8438f223ca9d3cdad033af6feb33-java-arm64
ghcr.io/openhands/agent-server:feat-agent-server-deferred-init-java-arm64
ghcr.io/openhands/agent-server:1491d0b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:1491d0b-python-amd64
ghcr.io/openhands/agent-server:1491d0b14c0e8438f223ca9d3cdad033af6feb33-python-amd64
ghcr.io/openhands/agent-server:feat-agent-server-deferred-init-python-amd64
ghcr.io/openhands/agent-server:1491d0b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:1491d0b-python-arm64
ghcr.io/openhands/agent-server:1491d0b14c0e8438f223ca9d3cdad033af6feb33-python-arm64
ghcr.io/openhands/agent-server:feat-agent-server-deferred-init-python-arm64
ghcr.io/openhands/agent-server:1491d0b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:1491d0b-golang
ghcr.io/openhands/agent-server:1491d0b14c0e8438f223ca9d3cdad033af6feb33-golang
ghcr.io/openhands/agent-server:feat-agent-server-deferred-init-golang
ghcr.io/openhands/agent-server:1491d0b-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:1491d0b-java
ghcr.io/openhands/agent-server:1491d0b14c0e8438f223ca9d3cdad033af6feb33-java
ghcr.io/openhands/agent-server:feat-agent-server-deferred-init-java
ghcr.io/openhands/agent-server:1491d0b-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:1491d0b-python
ghcr.io/openhands/agent-server:1491d0b14c0e8438f223ca9d3cdad033af6feb33-python
ghcr.io/openhands/agent-server:feat-agent-server-deferred-init-python
ghcr.io/openhands/agent-server:1491d0b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 1491d0b-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 1491d0b-python-amd64) are also available if needed

Implements the warm-pool agent-server proposal in #2523.

When `Config.deferred_init=True` (env `OH_DEFERRED_INIT`) the server
starts in *dormant* mode:

* Stateless services (VSCode, desktop, tool preload) start as usual so
  the warm pod is immediately useful to whoever attaches next.
* The conversation, event, and bash routers (everything under `/api/*`)
  return 503 via a new `require_initialized` dependency.
* `/alive`, `/health`, `/ready`, `/server_info` and a new top-level
  `/init` router are reachable. `/ready` reports ready once the
  stateless services are up so an orchestrator can match the pod with
  a user and send its `/init` payload.
* `POST /init` accepts an `InitRequest` (session API keys, workspace
  paths, webhooks, env vars, etc.), merges it with the dormant config,
  enters the `ConversationService` context, and flips the gate to
  `ready`. A second `/init` call gets 400; a failed init rolls back
  to dormant so the orchestrator can retry.
* Bootstrap auth for `POST /init` is a separate `OH_INIT_API_KEY`
  (`X-Init-API-Key` header), distinct from `session_api_keys` because
  the session key is part of the per-user payload that arrives *inside*
  the init body. `GET /init` (status polling) is unauthenticated.
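
From the orchestrator's side, the handshake above amounts to an unauthenticated `GET /init` poll followed by one `POST /init` carrying the `X-Init-API-Key` header. The helpers below are a hedged sketch using only the standard library: the function names are invented, and the JSON body shape is an assumption beyond the `conversations_path` and `bash_events_dir` fields named in this PR.

```python
import json
import urllib.request


def build_status_request(base_url: str) -> urllib.request.Request:
    """GET /init: unauthenticated status poll."""
    return urllib.request.Request(url=f"{base_url.rstrip('/')}/init", method="GET")


def build_init_request(
    base_url: str, init_api_key: str, payload: dict
) -> urllib.request.Request:
    """POST /init: bootstrap call authenticated with the init key header."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/init",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Init-API-Key": init_api_key,
        },
        method="POST",
    )


# Example payload; the values are placeholders.
example = build_init_request(
    "http://warm-pod:8000/",
    "bootstrap-secret",
    {"conversations_path": "/workspace/conversations",
     "bash_events_dir": "/workspace/bash_events"},
)
```

Passing either request to `urllib.request.urlopen` sends it; note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a 401 (bad key) or 400 (already initialized) surfaces as an exception in this sketch.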

The non-deferred path is unchanged — no `InitService` is attached to
`app.state` and the dormant gate is a no-op.
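
A gate with this "no-op unless an init service is attached" behaviour might be sketched as below. It is framework-free on purpose: `GateError`, the `init_service` attribute name, and the string states are assumptions for the example, whereas the real dependency presumably raises the web framework's own HTTP error.

```python
from types import SimpleNamespace


class GateError(Exception):
    """Hypothetical error carrying the HTTP status the gate would return."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def require_initialized_sketch(app_state: object) -> None:
    # Non-deferred path: nothing was attached, so the gate does nothing.
    init_service = getattr(app_state, "init_service", None)
    if init_service is None:
        return
    # Deferred path: block gated routes until init has completed.
    if init_service.state != "ready":
        raise GateError(503, f"server is {init_service.state}")


# Usage: a legacy app state passes, a dormant one is rejected with 503.
legacy_state = SimpleNamespace()
dormant_state = SimpleNamespace(init_service=SimpleNamespace(state="dormant"))
require_initialized_sketch(legacy_state)
try:
    require_initialized_sketch(dormant_state)
    gate_status = None
except GateError as exc:
    gate_status = exc.status_code
```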

Tests cover: config defaults + env wiring, `InitRequest` → `Config`
merging, state machine (dormant → initializing → ready, second-call
400), env var application, end-to-end over the FastAPI lifespan +
`TestClient` (503 gating before init, 200 after, init key auth), and
the regression that `deferred_init=False` still works exactly as today.

Refs: #2523

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions Bot commented May 17, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log


github-actions Bot commented May 17, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log


github-actions Bot commented May 17, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-agent-server/openhands/agent_server | | | | |
| api.py | 267 | 26 | 90% | 105, 107–112, 114, 116, 118, 153, 165, 180, 186, 239, 244, 253–255, 483, 486, 490–492, 494, 500 |
| config.py | 83 | 2 | 97% | 29, 42 |
| init_router.py | 113 | 4 | 96% | 152, 154, 156, 158 |
| TOTAL | 28379 | 12207 | 56% | |

tofarr and others added 3 commits May 23, 2026 18:50
Resolved conflict in openhands-agent-server/openhands/agent_server/api.py:
- Kept retention_task cancellation logic added in main
- Kept stop_stateless_services() helper from PR branch

Co-authored-by: openhands <openhands@all-hands.dev>

2 participants