Skip to content

Add Temporal workflow engine to automation chart#516

Draft
malhotra5 wants to merge 8 commits into
mainfrom
feat/automation-temporal
Draft

Add Temporal workflow engine to automation chart#516
malhotra5 wants to merge 8 commits into
mainfrom
feat/automation-temporal

Conversation

@malhotra5
Copy link
Copy Markdown
Member

Summary

This PR adds Temporal as a dependency for the automation service to provide durable workflow execution for automation runs.

Changes

charts/automation/Chart.yaml

charts/automation/values.yaml

  • Added temporal configuration section with:
    • external: Configuration for connecting to external Temporal cluster
    • server: Temporal subchart configuration (PostgreSQL persistence)
    • admintools: Admin tools pod configuration
    • web: Temporal Web UI configuration
  • Added worker deployment configuration

charts/automation/templates/_env.yaml

  • Added Temporal env vars (AUTOMATION_TEMPORAL_HOST, PORT, NAMESPACE, TASK_QUEUE)
  • Dynamically sets host based on whether Temporal is deployed as subchart

charts/automation/templates/temporal-init.yaml (new)

  • Pre-install hook Job that creates Temporal databases in PostgreSQL
  • Creates temporal and temporal_visibility databases
  • Creates temporal_user with appropriate permissions
  • Enables reusing parent chart's PostgreSQL instance

charts/automation/templates/worker-deployment.yaml (new)

  • Separate deployment for Temporal workers
  • Waits for Temporal to be ready before starting
  • Runs automation.temporal.worker module

charts/openhands/values.yaml

  • Added temporal configuration in automation subchart values
  • Pre-configured to use parent's PostgreSQL (oh-main-postgresql:5432)
  • Added worker configuration

DB Reuse Pattern

When automation is deployed as a subchart of openhands:

  1. temporal.enabled=true deploys Temporal pods
  2. temporal-init job creates databases in parent's PostgreSQL
  3. Temporal server connects to parent's PostgreSQL via connectAddr
  4. Automation worker connects to Temporal via internal service name

This follows the same pattern as other subcharts (e.g., keycloak) that reuse the parent's PostgreSQL instance.

Deployment Example

To enable Temporal when deploying automation as a subchart:

automation:
  enabled: true
  temporal:
    enabled: true
    server:
      config:
        persistence:
          datastores:
            default:
              sql:
                connectAddr: "oh-main-postgresql:5432"
            visibility:
              sql:
                connectAddr: "oh-main-postgresql:5432"

This PR was created by an AI assistant (OpenHands) on behalf of the user.

openhands-agent and others added 7 commits April 3, 2026 21:28
This PR adds Temporal as a dependency for the automation service to provide
durable workflow execution for automation runs.

## Changes

### charts/automation/Chart.yaml
- Added Temporal chart dependency (version >=1.0.0-0)
- Bumped chart version to 0.2.0

### charts/automation/values.yaml
- Added temporal configuration section with:
  - external: Configuration for connecting to external Temporal cluster
  - server: Temporal subchart configuration (PostgreSQL persistence)
  - admintools: Admin tools pod configuration
  - web: Temporal Web UI configuration
- Added worker deployment configuration

### charts/automation/templates/_env.yaml
- Added Temporal env vars (AUTOMATION_TEMPORAL_HOST, PORT, NAMESPACE, TASK_QUEUE)
- Dynamically sets host based on whether Temporal is deployed as subchart

### charts/automation/templates/temporal-init.yaml (new)
- Pre-install hook Job that creates Temporal databases in PostgreSQL
- Creates temporal and temporal_visibility databases
- Creates temporal_user with appropriate permissions
- Enables reusing parent chart's PostgreSQL instance

### charts/automation/templates/worker-deployment.yaml (new)
- Separate deployment for Temporal workers
- Waits for Temporal to be ready before starting
- Runs automation.temporal.worker module

### charts/openhands/values.yaml
- Added temporal configuration in automation subchart values
- Pre-configured to use parent's PostgreSQL (oh-main-postgresql:5432)
- Added worker configuration

## DB Reuse Pattern

When automation is deployed as a subchart of openhands:
1. temporal.enabled=true deploys Temporal pods
2. temporal-init job creates databases in parent's PostgreSQL
3. Temporal server connects to parent's PostgreSQL via connectAddr
4. automation worker connects to Temporal via internal service name

This follows the same pattern as other subcharts (e.g., keycloak) that
reuse the parent's PostgreSQL instance.

---
*This PR was created by an AI assistant (OpenHands) on behalf of the user.*

Co-authored-by: openhands <openhands@all-hands.dev>
- Bump openhands chart version from 0.3.11 to 0.3.12
- Update automation dependency version from 0.1.0 to 0.2.0

This is required since values.yaml was modified to add Temporal configuration.

Co-authored-by: openhands <openhands@all-hands.dev>
The temporal-db-init job was configured as a pre-install/pre-upgrade Helm hook,
which caused a deadlock during deployment:

1. Helm hooks run before main release resources are created
2. The job waits for PostgreSQL to be available
3. PostgreSQL is deployed as a subchart in the main release
4. Deadlock: hook waits for PostgreSQL, but PostgreSQL can't deploy until hooks complete

Fix: Remove the hook annotations so the job deploys alongside the main release.
The job already has a built-in wait loop (60 iterations × 5 seconds = 5 minutes)
that will wait for PostgreSQL to become available.

Co-authored-by: openhands <openhands@all-hands.dev>
…config

- Add connectProtocol: 'tcp' to both default and visibility datastores
  (required by Temporal chart 1.0.0-rc.3 for SQL connections)
- Set createDatabase: false since databases are created by the
  temporal-db-init job, avoiding permission issues with temporal_user

Co-authored-by: openhands <openhands@all-hands.dev>
The Temporal subchart creates services named {release-name}-frontend,
not {release-name}-temporal-frontend. Updated the templates to use
the correct service name pattern.

Co-authored-by: openhands <openhands@all-hands.dev>
Two changes to fix conflicts between ddtrace and Temporal's workflow sandbox:

1. Worker deployment: Remove ddtrace-run wrapper since it conflicts with
   Temporal's workflow sandbox import system (beartype circular import)

2. API deployment: Set AUTOMATION_SKIP_WORKER=true when temporal.enabled,
   so the API doesn't create an in-process worker (which would fail with
   ddtrace). The separate worker deployment handles workflow execution.

Co-authored-by: openhands <openhands@all-hands.dev>
Add a Job that runs after Temporal is deployed to create the 'default'
namespace (or whatever is configured in temporal.namespace). This ensures
the automation worker and API can connect to Temporal immediately.

The job:
- Waits for Temporal frontend to be healthy
- Checks if the namespace already exists
- Creates the namespace with 72h retention if it doesn't exist

Also added temporal.namespace to values.yaml for configurability.

Co-authored-by: openhands <openhands@all-hands.dev>
# When deployed as subchart of openhands:
# - Temporal reuses the parent's PostgreSQL instance
# - Set temporal.enabled=true with temporal.server.config.persistence pointing to parent DB
temporal:
Copy link
Copy Markdown
Contributor

@jlav jlav Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should reconsider provisioning of the temporal cluster inside of this chart.

If we ever want to use Temporal in other parts of our stack (runtime-api, anyone?) we would want to share an instance across them.

Where we need isolation between different application's workflows, namespaces are a perfect fit for that.

As a side note, I'm advocating that we stop this pattern of provisioning dependencies in all of our charts. We generally don't use runtime-api and automation in standalone configurations where they are deployed without first deploying openhands, so I'd like to stop maintaining these additional paths that we never actually exercise.

Comment on lines +192 to +195
external:
host: "temporal-frontend"
port: 7233
namespace: "default"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move these outside the temporal values.

This pattern of adding values that don't actually exist happens elsewhere in this repo, but it's something we should get away from. It's confusing because I have to look at the chart source to understand whether or not a value is actually used in the dependency or just by us, and when we started adding linters to this repo it may fail should the dependency not allow additional properties.

Comment on lines +273 to +275
limits:
cpu: 200m
memory: 256Mi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What limits does Temporal set by default? Do we know why we need to override them, are the default limits too much, too little?

Comment on lines +11 to +15
Note: This is intentionally NOT a Helm hook because PostgreSQL is deployed
as part of the same release. Using pre-install/pre-upgrade hooks would create
a deadlock where this job waits for PostgreSQL, but PostgreSQL can't be
deployed until the hooks complete.
*/}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating jobs without hooks, and leaning on ttlSecondsAfterFinished for cleanup, means that if these jobs fail we have to wait 300 seconds before we can run another helm deploy.

Most fields on a Job are also immutable, so we can't update the job after it's been deployed. We have to wait for it to be cleaned up. Aaaand we have a running timer once the job fails to capture the logs.

Recommend either using an init container or you can create a 1 replica deployment that ends with a container that just runs pause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants