
Support open-source self-hosted deployments: pluggable execution backend + local database (converted to project) #62

@xingyaoww


This issue has been converted to a project: [Automations v5] Support open-source self-hosted deployments.


Goal

Make the automation engine usable by open-source self-hosted users, not just OpenHands Cloud. Today two things tie the codebase exclusively to Cloud infrastructure:

  1. Execution backend — every run provisions a fresh Cloud sandbox. Self-hosted users need to point at their own persistent agent-server instead.
  2. Database — the codebase requires PostgreSQL (Cloud SQL). Self-hosted users need a zero-setup local option (SQLite).

Cloud users should see zero change in behavior; the new paths are opt-in via config.


Gap 1: Pluggable Execution Backend

Currently the engine creates a Cloud sandbox per run, waits for it to start, discovers the agent-server URL inside it, uploads and runs the tarball, then deletes the sandbox. Both modes talk to the same agent-server HTTP APIs; the only difference is how the URL is obtained.

| Step | Cloud mode (existing) | Agent-server mode (new) |
|---|---|---|
| Get agent-server URL | Create sandbox → poll → extract from exposed_urls | Read from config (AUTOMATION_AGENT_SERVER_URL) |
| Upload tarball | Same | Same |
| Start entrypoint | Same | Same |
| Cleanup | Delete sandbox | Nothing (persistent server) |
| Auth | Mint per-user API key via service key | Use config-level key |
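A minimal Python sketch of the pluggable backend implied by the table, assuming both modes can share one acquire/run/release shape. ExecutionBackend, AgentServerHandle, CloudSandboxBackend, PersistentAgentServerBackend and run_on_backend are hypothetical names. The Cloud backend receives the existing sandbox create/delete logic as injected callables so the sketch does not guess at their real signatures.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol, Tuple


@dataclass(frozen=True)
class AgentServerHandle:
    url: str                          # agent-server base URL
    api_key: str                      # key used for agent-server calls
    sandbox_id: Optional[str] = None  # set only in Cloud mode


class ExecutionBackend(Protocol):
    def acquire(self) -> AgentServerHandle: ...
    def release(self, handle: AgentServerHandle) -> None: ...


class CloudSandboxBackend:
    """Cloud mode: create a sandbox per run, delete it afterwards."""

    def __init__(
        self,
        create_and_wait: Callable[[], Tuple[str, str, str]],  # -> (sandbox_id, url, session_key)
        delete_sandbox: Callable[[str], None],
    ) -> None:
        self._create_and_wait = create_and_wait
        self._delete_sandbox = delete_sandbox

    def acquire(self) -> AgentServerHandle:
        sandbox_id, url, session_key = self._create_and_wait()
        return AgentServerHandle(url=url, api_key=session_key, sandbox_id=sandbox_id)

    def release(self, handle: AgentServerHandle) -> None:
        if handle.sandbox_id is not None:
            self._delete_sandbox(handle.sandbox_id)


class PersistentAgentServerBackend:
    """Agent-server mode: URL and key come from config; nothing to clean up."""

    def __init__(self, url: str, api_key: str) -> None:
        self._handle = AgentServerHandle(url=url, api_key=api_key)

    def acquire(self) -> AgentServerHandle:
        return self._handle

    def release(self, handle: AgentServerHandle) -> None:
        pass  # the persistent server outlives the run


def run_on_backend(backend: ExecutionBackend, run: Callable[[AgentServerHandle], None]) -> None:
    handle = backend.acquire()
    try:
        run(handle)  # upload tarball + start entrypoint: identical in both modes
    finally:
        backend.release(handle)
```

Upload and entrypoint code only ever see a handle, so they stay mode-agnostic; the try/finally also guarantees Cloud sandboxes are deleted when a run fails.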

Config surface: agent_server_url and agent_server_api_key on ServiceSettings. Presence of agent_server_url is the mode flag — no separate boolean needed.
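One way the mode flag could look, as a sketch: a frozen dataclass standing in for ServiceSettings (the real class may be built differently). AUTOMATION_AGENT_SERVER_URL comes from this issue; AUTOMATION_AGENT_SERVER_API_KEY is an assumed env var name chosen by analogy.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceSettings:
    # Hypothetical subset of ServiceSettings: only the two new fields.
    agent_server_url: Optional[str] = None
    agent_server_api_key: Optional[str] = None

    @property
    def uses_agent_server(self) -> bool:
        # Presence of agent_server_url is the mode flag; no separate boolean.
        return bool(self.agent_server_url)

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ServiceSettings":
        # Empty strings are treated as "not set" so Cloud mode stays the default.
        return cls(
            agent_server_url=env.get("AUTOMATION_AGENT_SERVER_URL") or None,
            agent_server_api_key=env.get("AUTOMATION_AGENT_SERVER_API_KEY") or None,
        )
```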

Preset scripts (sdk_main.py) need to check for the AGENT_SERVER_URL env var and, when it is set, use RemoteWorkspace instead of OpenHandsCloudWorkspace.
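A sketch of the preset-script check, assuming the engine injects AGENT_SERVER_URL and SESSION_API_KEY in agent-server mode. The constructor arguments of RemoteWorkspace and OpenHandsCloudWorkspace are not described in this issue, so the hypothetical choose_workspace() only decides which class to use and which host and key to hand it; the real script would then build that workspace.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkspaceChoice:
    kind: str              # "RemoteWorkspace" or "OpenHandsCloudWorkspace"
    host: Optional[str]    # agent-server URL (agent-server mode) or Cloud API URL
    api_key: Optional[str]


def choose_workspace(env: Mapping[str, str] = os.environ) -> WorkspaceChoice:
    agent_server_url = env.get("AGENT_SERVER_URL")
    if agent_server_url:
        # Agent-server mode: talk to the persistent server directly.
        return WorkspaceChoice("RemoteWorkspace", agent_server_url, env.get("SESSION_API_KEY"))
    # Cloud mode: unchanged behavior.
    return WorkspaceChoice(
        "OpenHandsCloudWorkspace",
        env.get("OPENHANDS_CLOUD_API_URL"),
        env.get("OPENHANDS_API_KEY"),
    )
```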

Detailed codebase audit (per-file changes)

automation/execution.py — main refactor target

Sandbox provisioning and agent-server interaction are currently interleaved. The refactor separates them.

Cloud-mode-only functions (sandbox lifecycle):

| Function | What it does |
|---|---|
| _create_sandbox() | POST /api/v1/sandboxes |
| _poll_sandbox() | GET /api/v1/sandboxes?id= |
| _create_and_wait() | Create + poll until RUNNING + extract agent-server URL |
| _find_agent_server_url() | Parse exposed_urls for AGENT_SERVER |
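A sketch of the Cloud-only create-and-poll step. It assumes the poll response is a dict with a status field and an exposed_urls list whose entries look like {"name": "AGENT_SERVER", "url": ...}; that shape is a guess and is not taken from the Cloud API. The poll callable stands in for _poll_sandbox(), and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Any, Callable, Mapping, Optional, Sequence


def find_agent_server_url(exposed_urls: Sequence[Mapping[str, Any]]) -> Optional[str]:
    # ASSUMPTION: each entry looks like {"name": "AGENT_SERVER", "url": "https://..."}
    for entry in exposed_urls:
        if entry.get("name") == "AGENT_SERVER":
            return entry.get("url")
    return None


def wait_for_agent_server(
    poll: Callable[[], Mapping[str, Any]],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the sandbox is RUNNING and exposes an agent-server URL."""
    deadline = clock() + timeout_s
    while True:
        sandbox = poll()
        if sandbox.get("status") == "RUNNING":
            url = find_agent_server_url(sandbox.get("exposed_urls") or [])
            if url:
                return url
        if clock() >= deadline:
            raise TimeoutError("sandbox did not expose an agent-server URL in time")
        sleep(interval_s)
```

Injecting sleep and clock keeps the loop testable without real waiting; none of this is needed in agent-server mode, where the URL is read from config.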

Shared functions (agent-server interaction, unchanged):

| Function | What it does |
|---|---|
| _upload() | POST /api/file/upload/{path} |
| _bash() | POST /api/bash/execute_bash_command |
| _start_bash() | POST /api/bash/start_bash_command |
| _download_in_sandbox() | curl download inside the runtime |
| build_tarball() | Build tarball in memory |
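An in-memory tarball like the one build_tarball() produces can be built with the standard library alone. The signature here (a mapping of archive path to file bytes) and the gzip compression are assumptions; the real function may take different inputs.

```python
import io
import tarfile
from typing import Mapping


def build_tarball(files: Mapping[str, bytes]) -> bytes:
    """Pack {archive_path: content} into a gzipped tar held entirely in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Because nothing touches disk, the resulting bytes can be handed straight to _upload() regardless of which mode supplied the agent-server URL.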

Branching is needed in dispatch_automation() and run_automation(): in Cloud mode, call _create_and_wait(); in agent-server mode, use the configured URL and skip delete_sandbox().

Env var injection differences:

| Env var | Cloud mode | Agent-server mode |
|---|---|---|
| SANDBOX_ID | From sandbox creation | Not applicable |
| SESSION_API_KEY | From sandbox response | From config |
| OPENHANDS_API_KEY | Per-user key via service key | May use config-level key |
| OPENHANDS_CLOUD_API_URL | Cloud API URL | May not be needed |
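A sketch of mode-aware env injection that follows the table. build_run_env() is a hypothetical helper. It also sets AGENT_SERVER_URL in agent-server mode, since the preset scripts are expected to detect that variable. It omits OPENHANDS_CLOUD_API_URL in that mode, which is an assumption; a hybrid mode (open question 6) would add it back.

```python
from typing import Dict, Optional


def build_run_env(
    *,
    agent_server_url: Optional[str],        # from config; None means Cloud mode
    session_api_key: str,                   # sandbox response (Cloud) or config (agent-server)
    openhands_api_key: Optional[str] = None,
    sandbox_id: Optional[str] = None,
    cloud_api_url: Optional[str] = None,
) -> Dict[str, str]:
    env = {"SESSION_API_KEY": session_api_key}
    if agent_server_url:
        # Agent-server mode: no sandbox; tell preset scripts where the server is.
        env["AGENT_SERVER_URL"] = agent_server_url
    else:
        if sandbox_id is None or cloud_api_url is None:
            raise ValueError("Cloud mode needs sandbox_id and cloud_api_url")
        env["SANDBOX_ID"] = sandbox_id
        env["OPENHANDS_CLOUD_API_URL"] = cloud_api_url
    if openhands_api_key:
        # Per-user key in Cloud mode; optional config-level key otherwise.
        env["OPENHANDS_API_KEY"] = openhands_api_key
    return env
```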

automation/utils/sandbox.py — split

  • utils/sandbox.py — keep as-is for Cloud mode
  • utils/agent_server.py — new, shared agent-server queries (get_last_bash_command_result, verify_run_status)

automation/dispatcher.py

  • Pass execution mode to dispatch_automation()
  • In agent-server mode, skip get_api_key_for_automation_run() and use config-level auth
  • Store command_id from start_bash_command instead of sandbox_id
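The dispatcher's auth branch could reduce to a small helper. resolve_run_api_key() is hypothetical, and mint_per_user_key stands in for get_api_key_for_automation_run(), whose real signature is not shown in this issue. The mode flag is the presence of agent_server_url, matching the config surface described for ServiceSettings.

```python
from typing import Callable, Optional


def resolve_run_api_key(
    *,
    agent_server_url: Optional[str],
    agent_server_api_key: Optional[str],
    user_id: str,
    mint_per_user_key: Callable[[str], str],  # stand-in for get_api_key_for_automation_run()
) -> Optional[str]:
    if agent_server_url:
        # Agent-server mode: skip service-key minting, use the config-level key (may be None).
        return agent_server_api_key
    # Cloud mode: existing behavior, mint a per-user key via the service key.
    return mint_per_user_key(user_id)
```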

automation/watchdog.py

  • In agent-server mode, query configured URL directly instead of discovering sandbox
  • Skip cleanup_sandbox() in agent-server mode

automation/models.py / schemas.py

  • Add nullable command_id column (backward-compatible)
  • Add command_id to AutomationRunResponse
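Why a nullable column is backward-compatible, shown with the standard-library sqlite3 module: existing rows simply read NULL for command_id, and new agent-server runs fill it in. In production this would be an Alembic migration against PostgreSQL; the table name automation_runs is an assumption.

```python
import sqlite3


def add_command_id_column(conn: sqlite3.Connection, table: str = "automation_runs") -> None:
    """Add a nullable command_id column if it is missing (safe to call twice)."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # The table name is a fixed internal identifier, not user input.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if "command_id" not in existing:
        # No NOT NULL and no default: existing rows get NULL, old code keeps working.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN command_id TEXT")
```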

automation/router.py

Likely no changes needed: the existing guard (if not run.keep_alive and run.sandbox_id) already skips cleanup when sandbox_id is None, which is always the case in agent-server mode.
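The guard's effect in both modes, as a tiny sketch. Run is a hypothetical stand-in for the ORM row, and the condition mirrors the router guard quoted in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    # Hypothetical stand-in for the automation run row.
    keep_alive: bool
    sandbox_id: Optional[str]
    command_id: Optional[str] = None


def should_delete_sandbox(run: Run) -> bool:
    # Agent-server runs never get a sandbox_id, so they fall through without cleanup.
    return bool(not run.keep_alive and run.sandbox_id)
```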

Preset scripts (sdk_main.py)

Detect AGENT_SERVER_URL env var → use RemoteWorkspace; otherwise use OpenHandsCloudWorkspace.

Unchanged files

scheduler.py, event_router.py, webhook_router.py, trigger_matcher.py, filter_eval.py, event_schemas/, preset_router.py, uploads.py, storage/, db.py, logger.py, auth.py, utils/cron.py, utils/tarball_validation.py, utils/time.py, utils/run.py, exceptions.py


Gap 2: Local Database (SQLite)

The codebase requires PostgreSQL exclusively — it uses JSONB columns, FOR UPDATE SKIP LOCKED row locking, advisory locks in migrations, and hardcoded asyncpg/pg8000 drivers.

For self-hosted deployments, support SQLite as a lightweight local alternative:

  • New AUTOMATION_DB_URL config setting accepting sqlite+aiosqlite:/// URLs
  • Use SQLAlchemy's generic JSON type instead of PostgreSQL-specific JSONB
  • Skip FOR UPDATE SKIP LOCKED on SQLite (not needed for single-process)
  • Auto-create tables on startup for SQLite (bypassing PG-specific Alembic migrations)
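A sketch of switching behavior on the AUTOMATION_DB_URL scheme. DbCapabilities, db_capabilities() and claim_pending_runs_sql() are hypothetical, as are the automation_runs table and the PENDING status in the example query. Plain string parsing keeps the sketch free of any database driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbCapabilities:
    dialect: str
    skip_locked: bool      # append FOR UPDATE SKIP LOCKED when claiming rows
    run_migrations: bool   # PG: Alembic migrations; SQLite: create tables on startup


def db_capabilities(db_url: str) -> DbCapabilities:
    scheme = db_url.split("://", 1)[0]   # e.g. "sqlite+aiosqlite"
    dialect = scheme.split("+", 1)[0]    # e.g. "sqlite"
    if dialect == "sqlite":
        return DbCapabilities("sqlite", skip_locked=False, run_migrations=False)
    if dialect == "postgresql":
        return DbCapabilities("postgresql", skip_locked=True, run_migrations=True)
    raise ValueError(f"unsupported database URL scheme: {scheme!r}")


def claim_pending_runs_sql(caps: DbCapabilities, limit: int = 10) -> str:
    # Hypothetical claim query; row locking is only added where the backend supports it.
    sql = f"SELECT id FROM automation_runs WHERE status = 'PENDING' ORDER BY id LIMIT {int(limit)}"
    return sql + (" FOR UPDATE SKIP LOCKED" if caps.skip_locked else "")
```

With SQLAlchemy Core the same switch would typically be expressed by calling .with_for_update(skip_locked=True) on the select only when caps.skip_locked is true, declaring columns with the generic sqlalchemy.JSON type instead of postgresql.JSONB, and running metadata.create_all (via conn.run_sync on an async engine) instead of Alembic for SQLite.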

SQLite is appropriate for local dev and small-scale deployments. PostgreSQL remains the default for production.


Implementation Plan

  1. Extract shared agent-server module: move shared functions out of utils/sandbox.py
  2. Add config settings: agent_server_url, agent_server_api_key, db_url
  3. Branch execution.py: Cloud vs. agent-server mode in dispatch/run functions
  4. Branch watchdog & dispatcher: mode-aware verification and cleanup
  5. Preset script dual-mode: RemoteWorkspace vs. OpenHandsCloudWorkspace
  6. SQLite backend: generic JSON types, conditional row locking, auto-create tables
  7. DB migration + tests: add command_id column, test fixtures for both modes

Open Questions

  1. Working directory isolation — in agent-server mode, multiple runs share the filesystem. Use per-run dirs?
  2. Concurrency — does the agent-server support multiple concurrent start_bash_command calls?
  3. LLM/Secrets/MCP config — without Cloud API, how do preset scripts get LLM config? Env vars, config file, or hybrid mode?
  4. Cleanup between runs — who removes temp files? Per-run working directories?
  5. Auth model — does the persistent agent-server use X-Session-API-Key? Needs to be configurable.
  6. Hybrid mode — agent-server execution + Cloud API for get_llm() / get_secrets()?
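One possible answer to questions 1 and 4, sketched under assumptions: each run gets its own directory on the agent-server, created before and removed after the run through _bash(). The root path, the run-id format, and the helper names are hypothetical, and this does not settle whether concurrent start_bash_command calls are safe (question 2).

```python
import re
import shlex
from typing import NamedTuple

# Hypothetical run-id format: letters, digits, underscore, hyphen (no "/" or "..").
_SAFE_RUN_ID = re.compile(r"[A-Za-z0-9_-]+")


class RunDirCommands(NamedTuple):
    workdir: str
    setup: str     # shell command to send before uploading the tarball
    cleanup: str   # shell command to send after the run finishes


def per_run_dir_commands(run_id: str, root: str = "/workspace/automation-runs") -> RunDirCommands:
    if not _SAFE_RUN_ID.fullmatch(run_id):
        raise ValueError(f"unsafe run id: {run_id!r}")
    workdir = f"{root.rstrip('/')}/{run_id}"
    quoted = shlex.quote(workdir)
    return RunDirCommands(workdir, f"mkdir -p {quoted}", f"rm -rf -- {quoted}")
```

Validating the run id before building an rm -rf command keeps a malformed id from escaping the per-run root.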

This issue was created by an AI assistant (OpenHands) on behalf of @xingyaoww, based on a discussion about making the automation engine usable by open source self-hosted deployments alongside the existing Cloud sandbox mode.

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
