Add example automation regression tests (E2E feature coverage) #94

## Summary

Add "example automation" regression tests — a suite of end-to-end tests that exercise specific automation features (MCP servers, secrets, skills, plugins, repo cloning, event triggers) by running real automations against a live deployment. This mirrors the [example tests pattern in the SDK repo](https://github.com/OpenHands/software-agent-sdk/blob/main/tests/examples/test_examples.py) and extends the [existing integration tests in the deploy repo](https://github.com/OpenHands/deploy/tree/main/automation/integration).

## Motivation

Issue #93 (MCP servers not working in automation conversations) revealed a gap in test coverage: the existing integration tests verify the automation API surface (CRUD, dispatch lifecycle, timeouts) but don't verify that **specific SDK features actually work inside automation sandboxes**. The MCP bug went undetected because no test exercised `workspace.get_mcp_config()` in an actual dispatched automation.

Currently tested (in `deploy/automation/integration/`):
- ✅ CRUD API lifecycle (`test_automation_api.py`)
- ✅ Basic dispatch lifecycle — sandbox starts, runs, completes (`test_e2e_dispatch.py`)
- ✅ Timeout behavior (`test_e2e_timeout.py`)
- ✅ Preset prompt creation + dispatch (`test_preset_prompt_api.py`)
- ✅ Upload API (`test_upload_api.py`)

**Not tested** (features that could silently break):
- ❌ MCP server configuration is available and functional in automation conversations
- ❌ Secrets are actually injectable and resolvable (current test calls `get_secrets()` but doesn't verify the agent can use them)
- ❌ Skills loading works correctly (public, user, project, org skills)
- ❌ Plugin preset E2E lifecycle
- ❌ Event-triggered automation E2E (webhook → dispatch → completion)
- ❌ Repository cloning works inside automations
- ❌ Repo-level skills (AGENTS.md) are loaded from cloned repos

## Proposed Approach

### Inspiration: SDK Example Tests

The SDK repo has a well-established pattern ([`tests/examples/test_examples.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/tests/examples/test_examples.py)):

1. **Example scripts** in `examples/` each exercise one SDK feature
2. **Test runner** discovers and runs each script as a subprocess
3. **Success criteria**: exit code 0 + `EXAMPLE_COST:` marker in stdout
4. **Results**: per-example JSON files → markdown report
5. **CI**: nightly schedule + `test-examples` label on PRs + manual dispatch
6. **Reporting**: results posted as PR/issue comments
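The core success check from steps 2 and 3 fits in a few lines. This is a paraphrase of the pattern described above, not the actual code in `test_examples.py`. The function name `run_example`, the 600 s timeout, and the plain substring check for the marker are assumptions.

```python
import subprocess
import sys
from pathlib import Path


def run_example(script: Path, marker: str = "EXAMPLE_COST:", timeout: float = 600.0) -> dict:
    """Run one example script as a subprocess and apply both success checks.

    Sketch only: mirrors "exit code 0 + marker in stdout" as described in the issue.
    """
    proc = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    marker_found = marker in proc.stdout
    return {
        "example": script.name,
        "exit_code": proc.returncode,
        "marker_found": marker_found,
        "passed": proc.returncode == 0 and marker_found,
    }
```

A runner would call this for each file in something like `sorted(Path("examples").glob("*.py"))` and write each result dict to its own JSON file.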

### Proposed: Example Automations

Apply the same philosophy to automations, building on the existing test infrastructure in `deploy/automation/integration/`:

#### 1. Define Example Automation Scenarios

Each scenario is a prompt (or tarball) that exercises a specific feature and produces a verifiable outcome:

| Scenario | What It Tests | Verification |
|----------|--------------|--------------|
| `mcp_config_available` | MCP servers from user settings are accessible | Automation prints list of configured MCP server names |
| `secrets_injectable` | Secrets are injected and resolvable | Automation reads a known secret and prints a hash/partial value |
| `skills_loaded` | Skills are loaded from agent server | Automation prints loaded skill count > 0 |
| `repo_clone_and_skills` | Repo cloning + project skills from cloned repo | Clone a test repo, verify AGENTS.md skills loaded |
| `plugin_preset_e2e` | Plugin preset lifecycle | Create via `/v1/preset/plugin`, dispatch, verify COMPLETED |
| `prompt_with_repos` | Prompt preset with repo cloning | Create prompt automation with `repos` field, dispatch, verify repo is cloned |
| `event_trigger_e2e` | Custom webhook → dispatch | Register webhook, send test event, verify run created and completes |
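Marker-based verification has one trap. A plain prefix check such as `"MCP_SERVERS:" in output` also passes on `MCP_SERVERS: 0`, which is likely what a regression like #93 would produce. Below is a minimal sketch of a stricter check. It assumes the run's output can be retrieved as text (this issue doesn't specify how) and that the test account has at least one MCP server, secret, and skill configured. `parse_count_markers` and `check_marker` are hypothetical helper names.

```python
import re

# Matches whole lines such as "MCP_SERVERS: 3" or "SKILLS_LOADED: 12".
MARKER_RE = re.compile(r"^[ \t]*([A-Z][A-Z0-9_]*):[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def parse_count_markers(output: str) -> dict[str, int]:
    """Return {marker_name: value} for every integer marker line; the last occurrence wins."""
    return {name: int(value) for name, value in MARKER_RE.findall(output)}


def check_marker(output: str, marker: str, minimum: int = 1) -> tuple[bool, str]:
    """Require the marker to be present AND its value to be at least `minimum`."""
    markers = parse_count_markers(output)
    if marker not in markers:
        return False, f"{marker} marker not found in output"
    if markers[marker] < minimum:
        return False, f"{marker} = {markers[marker]}, expected >= {minimum}"
    return True, f"{marker} = {markers[marker]}"
```

Each scenario could carry its own `minimum`. For example, `minimum=0` would suit a "no servers configured" smoke test, and the default of 1 would suit the regression checks.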

#### 2. Test Harness

Extend the existing parametrized test pattern in `deploy/automation/integration/`:

```python
# test_e2e_example_automations.py
import pytest

EXAMPLE_AUTOMATIONS = [
    {
        "name": "mcp-config-available",
        "prompt": "List all configured MCP servers by calling workspace.get_mcp_config(). "
                  "Print 'MCP_SERVERS: <count>' where count is the number of servers found. "
                  "If no servers are configured, print 'MCP_SERVERS: 0'.",
        "success_marker": "MCP_SERVERS:",
    },
    {
        "name": "secrets-injectable",
        "prompt": "List the names of all available secrets. "
                  "Print 'SECRETS_AVAILABLE: <count>' with the count.",
        "success_marker": "SECRETS_AVAILABLE:",
    },
    # ... more scenarios
]


@pytest.mark.parametrize("scenario", EXAMPLE_AUTOMATIONS, ids=lambda s: s["name"])
class TestExampleAutomations:
    """Parametrized E2E tests that create, dispatch, and verify example automations."""

    def test_example_automation_completes(self, scenario, client, base_url, auth_headers):
        # 1. Create an automation from scenario["prompt"]
        # 2. Dispatch it
        # 3. Poll until the run reaches COMPLETED or FAILED (with a timeout)
        # 4. Verify run status == COMPLETED and the output contains scenario["success_marker"]
        # 5. Clean up the automation (ideally in a finally block or fixture teardown)
        ...
```
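Polling (step 3) needs the most care, because a run that never reaches a terminal state should fail the test rather than hang CI. The sketch below assumes only that some callable returns the current run status as a string. `poll_until_terminal`, the default timeout, and the polling interval are illustrative; the real status endpoint isn't named in this issue.

```python
import time
from typing import Callable, Iterable

# COMPLETED and FAILED are the terminal states named in this issue; extend if the API has more.
TERMINAL_STATES = frozenset({"COMPLETED", "FAILED"})


def poll_until_terminal(
    fetch_status: Callable[[], str],
    timeout_s: float = 900.0,
    interval_s: float = 10.0,
    terminal: Iterable[str] = TERMINAL_STATES,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() until it returns a terminal state or the deadline passes."""
    terminal = frozenset(terminal)
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

In the test, `fetch_status` would wrap whatever run-status request the existing `conftest.py` helpers already make. Injecting `sleep` and `clock` keeps the helper unit-testable without real waits.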

#### 3. CI Integration

Add to the existing `automation-integration-tests.yaml` workflow in the deploy repo, or create a separate workflow:

- **Trigger**: after deploy to staging succeeds, nightly schedule, manual dispatch, `test-examples` label on automation repo PRs
- **Gate releases**: block new automation service versions from deploying to production until all example automations pass on staging
- **Reporting**: post results to a tracking issue (as the SDK repo does with issue #976) or as PR comments
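The reporting and gating steps could follow the SDK's per-example JSON → markdown flow. The sketch below is one possible shape. The result schema (`name`, `status`, `detail`) and the function names are assumptions. An empty result set counts as a failure, so a broken test run can't silently unblock a release.

```python
import json
from pathlib import Path


def load_results(results_dir: Path) -> list[dict]:
    """Read one JSON file per scenario, e.g. {"name": ..., "status": ..., "detail": ...}."""
    return [json.loads(p.read_text()) for p in sorted(results_dir.glob("*.json"))]


def render_report(results: list[dict]) -> str:
    """Render a markdown table suitable for a PR or tracking-issue comment."""
    lines = ["| Scenario | Status | Detail |", "|----------|--------|--------|"]
    for r in results:
        icon = "✅" if r["status"] == "COMPLETED" else "❌"
        lines.append(f"| `{r['name']}` | {icon} {r['status']} | {r.get('detail', '')} |")
    passed = sum(r["status"] == "COMPLETED" for r in results)
    lines.append("")
    lines.append(f"**{passed}/{len(results)} example automations passed.**")
    return "\n".join(lines)


def release_allowed(results: list[dict]) -> bool:
    """Gate: allow promotion to production only if every scenario completed."""
    return bool(results) and all(r["status"] == "COMPLETED" for r in results)
```

A CI step could post `render_report(...)` as a comment and exit non-zero when `release_allowed(...)` is false.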

#### 4. Where to Put It

The tests should live in `deploy/automation/integration/` alongside the existing integration tests, since:
- They require a live deployment (same as existing tests)
- The CI infrastructure is already in the deploy repo
- They share the same `conftest.py` fixtures (`base_url`, `auth_headers`, `client`, `create_automation`)

The example automation definitions (prompts/tarballs) could be:
- Inline in the test file (for prompt-based scenarios)
- In a `test_example_tarballs/` subdirectory (for tarball-based scenarios that need custom scripts)
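Tarball-based scenarios need a way to discover each scenario directory and pack it for upload. The sketch below assumes a layout of `test_example_tarballs/<scenario>/main.py`, with `main.py` at the archive root and a gzip'd tar as the upload format. Neither is confirmed by the existing Upload API tests, and both function names are hypothetical.

```python
import io
import tarfile
from pathlib import Path


def discover_tarball_scenarios(root: Path) -> list[Path]:
    """Treat each subdirectory of test_example_tarballs/ that has a main.py as one scenario."""
    return sorted(p for p in root.iterdir() if p.is_dir() and (p / "main.py").is_file())


def build_scenario_tarball(scenario_dir: Path) -> bytes:
    """Pack every file under scenario_dir into an in-memory .tar.gz, paths relative to the dir."""
    if not (scenario_dir / "main.py").is_file():
        raise FileNotFoundError(f"{scenario_dir} has no main.py")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in sorted(scenario_dir.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=path.relative_to(scenario_dir).as_posix())
    # The archive is only complete once the tarfile is closed (gzip trailer written).
    return buf.getvalue()
```

The discovered paths can feed `pytest.mark.parametrize` directly, with `ids=lambda p: p.name`.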

## Implementation Plan

1. **Phase 1 — Prompt-based scenarios** (low effort, high value)
   - Add `test_e2e_example_automations.py` with 3–4 prompt-based scenarios
   - Each scenario: create from prompt → dispatch → poll → verify COMPLETED
   - Start with: basic prompt, secrets availability, MCP config check

2. **Phase 2 — Feature-specific tarballs** (medium effort)
   - Add custom test tarballs that exercise specific features with more precise verification
   - Each tarball's `main.py` prints specific markers (like the SDK's `EXAMPLE_COST:`)
   - Scenarios: MCP tool invocation, secret resolution, skill loading verification

3. **Phase 3 — Advanced scenarios** (higher effort)
   - Plugin preset E2E
   - Event-triggered automation E2E (requires webhook registration + event delivery)
   - Repo cloning + project skills
   - Cross-version compatibility (test against specific SDK versions)

4. **Phase 4 — Release gating**
   - Integrate into the release pipeline so new versions can't ship unless all example automations pass
   - Nightly runs to catch regressions from SDK or platform changes
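For a concrete Phase 2 marker, the `secrets_injectable` row above suggests printing a hash rather than the secret itself. The sketch below assumes, hypothetically, that the sandbox exposes the test secret as an environment variable named `AUTOMATION_TEST_SECRET`. The real injection mechanism may differ, and the 12-character SHA-256 prefix is an arbitrary choice.

```python
import hashlib
import os
from typing import Mapping

# Hypothetical: assumes the automation sandbox exposes the test secret under this
# environment variable name. The actual secret injection mechanism may differ.
SECRET_ENV_VAR = "AUTOMATION_TEST_SECRET"


def secret_fingerprint(value: str, length: int = 12) -> str:
    """Short SHA-256 prefix, so the test can compare values without the secret hitting logs."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def report_secret(env: Mapping[str, str], var: str = SECRET_ENV_VAR) -> str:
    """Build the marker line the harness looks for in the run output."""
    value = env.get(var)
    if not value:
        return f"SECRET_MISSING: {var}"
    return f"SECRET_SHA256: {secret_fingerprint(value)}"


if __name__ == "__main__":
    # As the tarball's main.py entry point: print the marker line for the harness.
    print(report_secret(os.environ))
```

The harness would compute `secret_fingerprint(known_value)` locally and assert that `SECRET_SHA256: <fingerprint>` appears in the run output.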

## Related

- #93 — MCP servers configured in user settings do not work in automation-triggered conversations (the bug this would catch)
- SDK example tests: [`tests/examples/test_examples.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/tests/examples/test_examples.py)
- SDK CI workflow: [`.github/workflows/run-examples.yml`](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/workflows/run-examples.yml)
- Existing automation integration tests: [`deploy/automation/integration/`](https://github.com/OpenHands/deploy/tree/main/automation/integration)

---

_This issue was created by an AI agent (OpenHands) on behalf of the user._

