Summary
Add "example automation" regression tests — a suite of end-to-end tests that exercise specific automation features (MCP servers, secrets, skills, plugins, repo cloning, event triggers) by running real automations against a live deployment. This mirrors the example tests pattern in the SDK repo and extends the existing integration tests in the deploy repo.
Motivation
Issue #93 (MCP servers not working in automation conversations) revealed a gap in test coverage: the existing integration tests verify the automation API surface (CRUD, dispatch lifecycle, timeouts) but don't verify that specific SDK features actually work inside automation sandboxes. The MCP bug went undetected because no test exercised workspace.get_mcp_config() in an actual dispatched automation.
Currently tested (in deploy/automation/integration/):
- ✅ CRUD API lifecycle (test_automation_api.py)
- ✅ Basic dispatch lifecycle: sandbox starts, runs, completes (test_e2e_dispatch.py)
- ✅ Timeout behavior (test_e2e_timeout.py)
- ✅ Preset prompt creation + dispatch (test_preset_prompt_api.py)
- ✅ Upload API (test_upload_api.py)
Not tested (features that could silently break):
- ❌ MCP server configuration is available and functional in automation conversations
- ❌ Secrets are actually injectable and resolvable (the current test calls get_secrets() but doesn't verify the agent can use them)
- ❌ Skills loading works correctly (public, user, project, org skills)
- ❌ Plugin preset E2E lifecycle
- ❌ Event-triggered automation E2E (webhook → dispatch → completion)
- ❌ Repository cloning works inside automations
- ❌ Repo-level skills (AGENTS.md) are loaded from cloned repos
Proposed Approach
Inspiration: SDK Example Tests
The SDK repo has a well-established pattern (tests/examples/test_examples.py):
- Example scripts in examples/ each exercise one SDK feature
- Test runner discovers and runs each script as a subprocess
- Success criteria: exit code 0 + EXAMPLE_COST: marker in stdout
- Results: per-example JSON files → markdown report
- CI: nightly schedule + test-examples label on PRs + manual dispatch
- Reporting: results posted as PR/issue comments
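The pass criteria above fit in a few lines. The following is a minimal sketch of a runner that applies them (discover scripts, run each in a fresh interpreter, pass only on exit code 0 plus the EXAMPLE_COST: marker in stdout). It is not the SDK's actual test_examples.py. The names discover_examples, run_example and ExampleResult are hypothetical, and treating every *.py file under the directory as an example is an assumption.

```python
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path

# The SDK-style success marker: a script passes only if it exits 0 AND prints this.
COST_MARKER = "EXAMPLE_COST:"


@dataclass
class ExampleResult:
    name: str
    returncode: int
    passed: bool


def discover_examples(root: Path) -> list[Path]:
    """Find every Python script under root, in a stable (sorted) order."""
    return sorted(p for p in root.rglob("*.py") if p.is_file())


def run_example(script: Path, timeout_s: float = 600.0) -> ExampleResult:
    """Run one example as a subprocess and apply both pass criteria.

    A hung script raises subprocess.TimeoutExpired; the caller should
    record that as a failure.
    """
    proc = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    passed = proc.returncode == 0 and COST_MARKER in proc.stdout
    return ExampleResult(script.stem, proc.returncode, passed)
```

Checking both conditions matters. A script can exit 0 without doing its work, or print the marker and then crash, and neither case should count as a pass.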
Proposed: Example Automations
Apply the same philosophy to automations, building on the existing test infrastructure in deploy/automation/integration/:
1. Define Example Automation Scenarios
Each scenario is a prompt (or tarball) that exercises a specific feature and produces a verifiable outcome:
| Scenario | What It Tests | Verification |
|---|---|---|
| mcp_config_available | MCP servers from user settings are accessible | Automation prints list of configured MCP server names |
| secrets_injectable | Secrets are injected and resolvable | Automation reads a known secret and prints a hash/partial value |
| skills_loaded | Skills are loaded from the agent server | Automation prints loaded skill count > 0 |
| repo_clone_and_skills | Repo cloning + project skills from cloned repo | Clone a test repo, verify AGENTS.md skills loaded |
| plugin_preset_e2e | Plugin preset lifecycle | Create via /v1/preset/plugin, dispatch, verify COMPLETED |
| prompt_with_repos | Prompt preset with repo cloning | Create prompt automation with repos field, dispatch, verify repo is cloned |
| event_trigger_e2e | Custom webhook → dispatch | Register webhook, send test event, verify run created and completes |
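For the marker-based rows in this table, the harness has to pull a value out of the automation's output. Below is a minimal sketch, assuming the harness can already get the run's output as a single string. How that output is fetched is not covered here. The sketch takes the last match because the prompt itself ("If no servers are configured, print 'MCP_SERVERS: 0'") may be echoed into the log before the agent's real answer. SKILLS_LOADED: is a hypothetical marker name for the skills_loaded row.

```python
import re
from typing import Optional


def extract_count(output: str, marker: str) -> Optional[int]:
    """Return the integer printed after the LAST occurrence of marker, or None.

    Placeholders such as 'MCP_SERVERS: <count>' are skipped because they are
    not followed by digits. The last match wins so that an echoed prompt
    ("print 'MCP_SERVERS: 0'") does not shadow the agent's actual answer.
    """
    matches = re.findall(rf"{re.escape(marker)}\s*(\d+)", output)
    return int(matches[-1]) if matches else None


def verify_count(output: str, marker: str, min_count: int = 0) -> bool:
    """True if the marker is present with a count of at least min_count."""
    count = extract_count(output, marker)
    return count is not None and count >= min_count
```

The "skill count > 0" check for skills_loaded would then be verify_count(output, "SKILLS_LOADED:", min_count=1).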
2. Test Harness
Extend the existing parametrized test pattern in deploy/automation/integration/:
```python
# test_e2e_example_automations.py
import pytest

EXAMPLE_AUTOMATIONS = [
    {
        "name": "mcp-config-available",
        "prompt": (
            "List all configured MCP servers by calling workspace.get_mcp_config(). "
            "Print 'MCP_SERVERS: <count>' where count is the number of servers found. "
            "If no servers are configured, print 'MCP_SERVERS: 0'."
        ),
        "success_marker": "MCP_SERVERS:",
    },
    {
        "name": "secrets-injectable",
        "prompt": (
            "List the names of all available secrets. "
            "Print 'SECRETS_AVAILABLE: <count>' with the count."
        ),
        "success_marker": "SECRETS_AVAILABLE:",
    },
    # ... more scenarios
]


@pytest.mark.parametrize("scenario", EXAMPLE_AUTOMATIONS, ids=lambda s: s["name"])
class TestExampleAutomations:
    """Parametrized E2E tests that create, dispatch, and verify example automations."""

    def test_example_automation_completes(self, scenario, client, base_url, auth_headers):
        # 1. Create automation from prompt
        # 2. Dispatch
        # 3. Poll until COMPLETED (or FAILED)
        # 4. Verify success (run status == COMPLETED)
        # 5. Cleanup
        ...
```
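Step 3 of the harness (poll until COMPLETED or FAILED) can hang CI if the loop has no bound, so a deadline is worth building in. The following is a minimal sketch, assuming the caller supplies get_status as a zero-argument function that fetches the run's current status string from the automation API. The actual endpoint is not specified here. The names poll_until_terminal and PollTimeout are hypothetical. The sleep and clock parameters exist only so the loop can be tested without waiting.

```python
import time
from typing import Callable

# Statuses named in this issue; add any other terminal states the service defines.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


class PollTimeout(Exception):
    """Raised when a run has not reached a terminal status before the deadline."""


def poll_until_terminal(
    get_status: Callable[[], str],
    timeout_s: float = 900.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status until it returns a terminal status; return that status."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise PollTimeout(f"run still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

In the test body this could be used as assert poll_until_terminal(lambda: fetch_status(run_id)) == "COMPLETED", where fetch_status is a hypothetical wrapper around whatever run-status call the client fixture offers.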
3. CI Integration
Add to the existing automation-integration-tests.yaml workflow in the deploy repo, or create a separate workflow:
- Trigger: after deploy to staging succeeds, nightly schedule, manual dispatch, test-examples label on automation repo PRs
- Gate releases: block new automation service versions from deploying to production until all example automations pass on staging
- Reporting: post results to a tracking issue (like the SDK does with issue #976) or as PR comments
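The reporting step can reuse the SDK's flow of writing per-example JSON files and then producing a markdown report. The following is a sketch, assuming each scenario writes one JSON file with hypothetical name, status and duration_s fields. The function names are also illustrative.

```python
import json
from pathlib import Path


def load_results(results_dir: Path) -> list[dict]:
    """Read one JSON result file per scenario, in a stable order."""
    return [json.loads(p.read_text()) for p in sorted(results_dir.glob("*.json"))]


def render_markdown(results: list[dict]) -> str:
    """Render a summary line plus a per-scenario table for an issue/PR comment."""
    passed = sum(1 for r in results if r["status"] == "COMPLETED")
    lines = [
        f"Example automations: {passed}/{len(results)} passed",
        "",
        "| Scenario | Status | Duration (s) |",
        "|---|---|---|",
    ]
    for r in results:
        icon = "✅" if r["status"] == "COMPLETED" else "❌"
        lines.append(f"| {r['name']} | {icon} {r['status']} | {r['duration_s']:.0f} |")
    return "\n".join(lines)
```

You could write the rendered string to report.md and post it with the GitHub CLI, for example gh issue comment <issue-number> --body-file report.md.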
4. Where to Put It
The tests should live in deploy/automation/integration/ alongside the existing integration tests, since:
- They require a live deployment (same as existing tests)
- The CI infrastructure is already in the deploy repo
- They share the same conftest.py fixtures (base_url, auth_headers, client, create_automation)
The example automation definitions (prompts/tarballs) could be:
- Inline in the test file (for prompt-based scenarios)
- In a test_example_tarballs/ subdirectory (for tarball-based scenarios that need custom scripts)
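For tarball-based scenarios, a helper could build the archive from test_example_tarballs/<scenario>/ at test time instead of committing binary archives. This is a sketch under two assumptions: the automation service accepts a gzip-compressed tar, and it expects main.py at the archive root. The expected layout is not stated in this issue. build_scenario_tarball is a hypothetical name. Uploading the bytes would go through the same upload API that test_upload_api.py already covers.

```python
import io
import tarfile
from pathlib import Path


def build_scenario_tarball(scenario_dir: Path) -> bytes:
    """Pack every file under scenario_dir into an in-memory .tar.gz.

    Archive paths are relative to scenario_dir, so main.py lands at the
    archive root (an assumption about the layout the service expects).
    """
    if not (scenario_dir / "main.py").is_file():
        raise FileNotFoundError(f"{scenario_dir} has no main.py")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in sorted(scenario_dir.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=path.relative_to(scenario_dir).as_posix())
    return buf.getvalue()
```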
Implementation Plan
Phase 1: Prompt-based scenarios (low effort, high value)
- Add test_e2e_example_automations.py with 3–4 prompt-based scenarios
- Each scenario: create from prompt → dispatch → poll → verify COMPLETED
- Start with: basic prompt, secrets availability, MCP config check
Phase 2: Feature-specific tarballs (medium effort)
- Add custom test tarballs that exercise specific features with more precise verification
- Each tarball's main.py prints specific markers (like the SDK's EXAMPLE_COST:)
- Scenarios: MCP tool invocation, secret resolution, skill loading verification
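As one Phase 2 example, a secret-resolution tarball could print a fingerprint of a known test secret instead of the secret itself. Nothing sensitive then reaches the logs, and the harness compares the printed value against the fingerprint of the value it configured. The sketch assumes the secret is visible to main.py as an environment variable named EXAMPLE_TEST_SECRET. That name is hypothetical, and this issue does not say how secrets are actually exposed inside the sandbox. SECRET_SHA256_PREFIX: is an illustrative marker name.

```python
import hashlib
import os

SECRET_ENV_VAR = "EXAMPLE_TEST_SECRET"  # hypothetical: how the secret reaches the sandbox
MARKER = "SECRET_SHA256_PREFIX:"        # illustrative marker name


def secret_fingerprint(value: str, length: int = 12) -> str:
    """Short SHA-256 prefix: enough to compare, too little to recover the secret."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def main() -> bool:
    """Print the marker line; return whether the secret was found.

    The exit code stays 0 either way. Pass/fail is decided by the harness,
    which compares the printed fingerprint with secret_fingerprint(known_value).
    """
    value = os.environ.get(SECRET_ENV_VAR)
    if not value:
        print(f"{MARKER} MISSING")
        return False
    print(f"{MARKER} {secret_fingerprint(value)}")
    return True


if __name__ == "__main__":
    main()
```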
Phase 3: Advanced scenarios (higher effort)
- Plugin preset E2E
- Event-triggered automation E2E (requires webhook registration + event delivery)
- Repo cloning + project skills
- Cross-version compatibility (test against specific SDK versions)
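For event_trigger_e2e, the test needs to link a run back to its own event, because other events or concurrent test runs may also trigger the automation. One way is to stamp each test event with a fresh correlation id and look for it in the resulting run. This sketch assumes the webhook payload is passed through to the run and can be read from the run's metadata. The payload shape, the trigger_payload field and both function names are hypothetical.

```python
import uuid
from typing import Iterable, Optional


def make_test_event(event_type: str = "example_automation.test") -> dict:
    """Build a test event carrying a fresh correlation id (hypothetical payload shape)."""
    return {"type": event_type, "data": {"correlation_id": str(uuid.uuid4())}}


def find_run_for_event(runs: Iterable[dict], correlation_id: str) -> Optional[dict]:
    """Return the first run whose triggering payload carries correlation_id, else None.

    Assumes each run exposes its triggering payload under a hypothetical
    'trigger_payload' key; missing or null payloads count as non-matching.
    """
    for run in runs:
        payload = run.get("trigger_payload") or {}
        data = payload.get("data") or {}
        if data.get("correlation_id") == correlation_id:
            return run
    return None
```

The flow would be: send make_test_event() to the registered webhook, poll the run list until find_run_for_event returns a run, then poll that run to completion.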
Phase 4: Release gating
- Integrate into the release pipeline so new versions can't ship unless all example automations pass
- Nightly runs to catch regressions from SDK or platform changes
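A release gate should fail on scenarios that reported FAILED and also on scenarios that produced no result at all, such as after a harness crash. Otherwise a silently skipped scenario would let a release through. The sketch below assumes each scenario writes one JSON file with hypothetical name and status fields, and that the gate is given the set of scenario names that must be present. gate is an illustrative name.

```python
import json
from pathlib import Path


def gate(results_dir: Path, required: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the release may proceed."""
    seen: dict[str, str] = {}
    for path in sorted(results_dir.glob("*.json")):
        result = json.loads(path.read_text())
        seen[result["name"]] = result["status"]
    # Missing results are failures too, not just explicit FAILED statuses.
    problems = [f"{name}: no result" for name in sorted(required - set(seen))]
    problems += [
        f"{name}: {status}"
        for name, status in sorted(seen.items())
        if status != "COMPLETED"
    ]
    return problems
```

A CI step could print the returned problems and exit nonzero when the list is non-empty. That failure would block the job that promotes a new automation service version to production.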
Related
- tests/examples/test_examples.py
- .github/workflows/run-examples.yml
- deploy/automation/integration/

This issue was created by an AI agent (OpenHands) on behalf of the user.