DRAFT: feat: add agent-server execution mode for open source deployments#63
DRAFT: feat: add agent-server execution mode for open source deployments#63xingyaoww wants to merge 1 commit into
Conversation
Add support for a dual execution backend: 1. Cloud sandbox mode (existing, default): Per-run sandbox provisioning via Cloud API — unchanged behavior. 2. Agent-server mode (new): Connects directly to a persistent agent-server via AUTOMATION_AGENT_SERVER_URL config. No sandbox creation, polling, or cleanup. Aimed at open source / self-hosted deployments. Both modes share the same agent-server HTTP APIs for file upload and bash execution — only the connection setup differs. Changes: - config.py: Add agent_server_url, agent_server_api_key settings and is_agent_server_mode property - execution.py: Introduce AgentConnection abstraction and _connect_cloud_sandbox / _connect_agent_server helpers. Refactor dispatch_automation() and run_automation() to branch on mode. - dispatcher.py: Pass agent-server config through to dispatch call - watchdog.py: Pass agent-server config to verification, skip sandbox cleanup in agent-server mode - utils/sandbox.py: Update verify_run_status() to accept agent-server URL directly (bypasses sandbox discovery in agent-server mode) - app.py: Log which execution backend is active on startup Refs: #62 Co-authored-by: openhands <openhands@all-hands.dev>
|
🚀 Deploy Preview PR Created/Updated A deploy preview has been created/updated for this PR. Deploy PR: https://github.com/OpenHands/deploy/pull/3895 Once the deploy PR's CI passes, the automation service will be deployed to the feature environment. |
QA Instructions: Testing Agent-Server ModeThis PR adds a second execution backend where the automation engine connects to a persistent agent-server instead of creating Cloud sandboxes. Below are instructions to test it end-to-end. Prerequisites
OverviewThe test has 3 phases:
Step 1: Provision an Agent-Server via Runtime APIUse the Runtime API to spin up an agent-server. This gives us a running agent-server URL that the automation engine can connect to directly. # Provision an agent-server runtime
RUNTIME_RESPONSE=$(curl -s -X POST "https://runtime.eval.all-hands.dev/sessions" \
-H "Authorization: Bearer $RUNTIME_API" \
-H "Content-Type: application/json" \
-d '{"image": "ghcr.io/openhands/agent-server:latest-python"}')
echo "$RUNTIME_RESPONSE" | python3 -m json.tool
# Extract the session ID
SESSION_ID=$(echo "$RUNTIME_RESPONSE" | python3 -c "import json,sys; print(json.load(sys.stdin)['session_id'])")
echo "Session ID: $SESSION_ID"Poll until RUNNING and get the agent-server URL: # Poll until ready (may take 30-60s)
for i in $(seq 1 30); do
STATUS_RESPONSE=$(curl -s "https://runtime.eval.all-hands.dev/sessions/$SESSION_ID" \
-H "Authorization: Bearer $RUNTIME_API")
STATUS=$(echo "$STATUS_RESPONSE" | python3 -c "import json,sys; print(json.load(sys.stdin).get('status','UNKNOWN'))")
echo "[$i] Status: $STATUS"
if [ "$STATUS" = "running" ]; then
AGENT_SERVER_URL=$(echo "$STATUS_RESPONSE" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('url','') or d.get('session_url',''))")
SESSION_API_KEY=$(echo "$STATUS_RESPONSE" | python3 -c "import json,sys; print(json.load(sys.stdin).get('session_api_key',''))")
echo "Agent-server URL: $AGENT_SERVER_URL"
break
fi
sleep 5
doneVerify the agent-server is healthy: curl -s "$AGENT_SERVER_URL/api/bash/bash_events/search?limit=1" \
-H "X-Session-API-Key: $SESSION_API_KEY" | python3 -m json.toolStep 2: Run a Smoke TestCreate a minimal test script that calls import asyncio
import io
import os
import sys
import tarfile
from automation.execution import run_automation
def build_simple_tarball() -> bytes:
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
content = (
b'import os\n'
b'print("=== ENV VARS ===")\n'
b'for k,v in sorted(os.environ.items()):\n'
b' if "AUTOMATION" in k or "AGENT" in k or "SANDBOX" in k or "SESSION" in k:\n'
b' print(f" {k}: {v[:20]}...")\n'
b'print("\\n=== RESULT ===")\n'
b'print("ALL_OK")\n'
)
info = tarfile.TarInfo(name="main.py")
info.size = len(content)
tar.addfile(info, io.BytesIO(content))
return buf.getvalue()
async def main():
agent_server_url = os.environ.get("AGENT_SERVER_URL")
agent_server_api_key = os.environ.get("SESSION_API_KEY", "")
if not agent_server_url:
print("ERROR: Set AGENT_SERVER_URL", file=sys.stderr)
sys.exit(1)
tarball = build_simple_tarball()
print(f"Testing agent-server mode against: {agent_server_url}")
result = await run_automation(
api_url="https://unused-in-agent-server-mode.example.com",
api_key="unused",
entrypoint="python main.py",
tarball_source=tarball,
env_vars={},
run_id="test-agent-server-001",
agent_server_url=agent_server_url,
agent_server_api_key=agent_server_api_key,
)
print(f"\n=== RESULT ===")
print(f" success: {result.success}")
print(f" sandbox_id: {result.sandbox_id}")
print(f" exit_code: {result.exit_code}")
if result.stdout:
print(f"--- stdout ---")
for line in result.stdout.splitlines():
print(f" {line}")
if result.error:
print(f"--- error ---\n {result.error}")
# Assertions
assert result.success, f"Expected success, got: {result.error}"
assert result.sandbox_id is None, f"sandbox_id should be None, got: {result.sandbox_id}"
assert "ALL_OK" in result.stdout, "Expected ALL_OK in stdout"
assert result.exit_code == 0
print("\nPASS: agent-server mode works correctly")
asyncio.run(main())Run it: AGENT_SERVER_URL="<url-from-step-1>" \
SESSION_API_KEY="<key-from-step-1>" \
uv run python scripts/test_agent_server_mode.pyWhat to verify
Step 3: Regression Check (Cloud Sandbox Mode)Run the existing E2E test to verify Cloud mode is unaffected: OPENHANDS_API_KEY=sk-oh-... uv run python scripts/test_automation.py \
--api-url https://staging.all-hands.devThis should pass identically to before. Step 4: Cleanupcurl -s -X DELETE "https://runtime.eval.all-hands.dev/sessions/$SESSION_ID" \
-H "Authorization: Bearer $RUNTIME_API"This QA instruction was created by an AI assistant (OpenHands). |
QA Instructions: Testing Agent-Server Mode (Updated)
This PR adds a second execution backend where the automation engine connects to a persistent agent-server instead of creating Cloud sandboxes. Below are instructions to test it end-to-end. Prerequisites
Step 1: Start an Agent-Server via Docker# Start Docker if not running
sudo dockerd > /tmp/docker.log 2>&1 &
sleep 3
# Run the agent-server container
docker run -d --name agent-server \
-p 3000:3000 \
ghcr.io/openhands/agent-server:latest-python
# Wait for it to start and check logs for the port
sleep 5
docker logs agent-server 2>&1 | tail -20Verify the agent-server is responding: # Try the bash events endpoint (should return an empty list or similar)
curl -s http://localhost:3000/api/bash/bash_events/search?limit=1
Step 2: Run the Smoke TestSave this as import asyncio
import io
import os
import sys
import tarfile
from automation.execution import run_automation
def build_simple_tarball() -> bytes:
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
content = (
b'import os\n'
b'print("=== ENV VARS ===")\n'
b'for k,v in sorted(os.environ.items()):\n'
b' if "AUTOMATION" in k or "AGENT" in k or "SANDBOX" in k or "SESSION" in k:\n'
b' print(f" {k}: {v[:20]}...")\n'
b'print("\\n=== RESULT ===")\n'
b'print("ALL_OK")\n'
)
info = tarfile.TarInfo(name="main.py")
info.size = len(content)
tar.addfile(info, io.BytesIO(content))
return buf.getvalue()
async def main():
agent_server_url = os.environ.get("AGENT_SERVER_URL", "http://localhost:3000")
tarball = build_simple_tarball()
print(f"Testing agent-server mode against: {agent_server_url}")
result = await run_automation(
api_url="https://unused-in-agent-server-mode.example.com",
api_key="unused",
entrypoint="python main.py",
tarball_source=tarball,
env_vars={},
run_id="test-agent-server-001",
agent_server_url=agent_server_url,
agent_server_api_key="",
)
print(f"\n=== RESULT ===")
print(f" success: {result.success}")
print(f" sandbox_id: {result.sandbox_id}")
print(f" exit_code: {result.exit_code}")
if result.stdout:
print(f"--- stdout ---")
for line in result.stdout.splitlines():
print(f" {line}")
if result.error:
print(f"--- error ---\n {result.error}")
# Assertions
assert result.success, f"Expected success, got: {result.error}"
assert result.sandbox_id is None, f"sandbox_id should be None, got: {result.sandbox_id}"
assert "ALL_OK" in result.stdout, "Expected ALL_OK in stdout"
assert result.exit_code == 0
print("\nPASS: agent-server mode works correctly")
asyncio.run(main())uv run python scripts/test_agent_server_mode.pyWhat to verify
Step 3: Regression Check (Cloud Sandbox Mode)Run the existing E2E test to verify Cloud mode is unaffected: OPENHANDS_API_KEY=sk-oh-... uv run python scripts/test_automation.py \
--api-url https://staging.all-hands.devThis should pass identically to before. Step 4: Cleanupdocker stop agent-server && docker rm agent-serverThis QA instruction was created by an AI assistant (OpenHands) on behalf of @xingyaoww. |
QA Results: Agent-Server Mode (PR #63)Tested the agent-server execution mode end-to-end following the QA instructions. Environment
Step 1: Agent-Server Container ✅Started successfully. Server log confirmed: Health check passed: Step 2: Smoke Test ✅ All Checks PassRan the smoke test script from the QA instructions via
Full stdout from the sandbox: Step 3: Regression Check ✅Full test suite: 496 passed, 0 failed (5 warnings — all pre-existing deprecation notices). Key test files for changed code:
Additional Validation
SummaryAgent-server mode works correctly end-to-end. The abstraction cleanly separates the connection phase (sandbox vs direct) while sharing all agent-server HTTP calls. Cloud sandbox mode is unaffected (full regression suite green). Minor issues found:
This QA report was created by an AI assistant (OpenHands) on behalf of @xingyaoww. |
Summary
Adds a dual execution backend so the automation engine can run against either:
This lets open source users deploy the automation engine against their own agent-server (e.g., running in a k8s cluster) while Cloud users continue using sandboxes unchanged.
How it works
Both modes use the same agent-server HTTP APIs (
/api/file/upload,/api/bash/start_bash_command, etc.). The only difference is how the agent-server URL is obtained:AGENT_SERVERfromexposed_urlsAUTOMATION_AGENT_SERVER_URLconfigThe branching happens at connection time via an
AgentConnectionabstraction. Everything after (tarball upload, bash execution, callback) is shared.Config
Enable agent-server mode by setting two env vars:
When
AUTOMATION_AGENT_SERVER_URLis set, the engine:AGENT_SERVER_URLinstead ofSANDBOX_IDinto env varsFiles changed
config.pyagent_server_url,agent_server_api_key,is_agent_server_modeexecution.pyAgentConnectionabstraction + dual-mode connect helpers.dispatch_automation()andrun_automation()branch on mode.dispatcher.pydispatch_automation()watchdog.pyutils/sandbox.pyverify_run_status()accepts agent-server URL directlyapp.pyWhat's NOT in this draft (future work)
RemoteWorkspacevsOpenHandsCloudWorkspace)command_idcolumnRefs #62
This PR was created by an AI assistant (OpenHands) on behalf of @xingyaoww.