Include WCI connectivity health in worker deployment API #10778
…(#810)

**What changed?** Added `ComputeStatus` to the Worker Deployment API with a `ProviderValidationStatus` that tracks the result of the most recent connectivity check between Temporal and a customer's compute resource. An empty error message means validation passed; a non-empty message describes what failed.

**Why?** Surfaces the connectivity health between Temporal and a customer's compute resource (e.g. Lambda) through the Worker Deployment API, so the UI can show the ongoing validation status of compute configs without additional queries.

**Breaking changes** None. All additions are new optional fields; existing clients are unaffected.

**Server PR** temporalio/temporal#10778
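The rule in the commit message (empty error message means passed, non-empty means failed) could be read by a client roughly as follows. This is a minimal sketch, not the generated API code: `providerValidationStatus`, its `ErrorMessage` field, and `describeHealth` are hypothetical stand-ins, and treating a missing status as "unknown" is an assumption.

```go
package main

import "fmt"

// providerValidationStatus is a hypothetical stand-in for the API message.
// The field name is an assumption; the generated Go type may differ.
type providerValidationStatus struct {
	ErrorMessage string
}

// describeHealth applies the rule from the commit message: an empty error
// message means validation passed, a non-empty one describes what failed.
// A nil status is treated as "not validated yet" (an assumption).
func describeHealth(s *providerValidationStatus) string {
	switch {
	case s == nil:
		return "unknown"
	case s.ErrorMessage == "":
		return "healthy"
	default:
		return "unhealthy: " + s.ErrorMessage
	}
}

func main() {
	fmt.Println(describeHealth(nil))
	fmt.Println(describeHealth(&providerValidationStatus{}))
	fmt.Println(describeHealth(&providerValidationStatus{ErrorMessage: "lambda unreachable"}))
}
```

Because the fields are optional, a client written this way keeps working against servers that never populate the status.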
Pull request overview
Expose Worker Controller Instance (WCI) connectivity/validation health as ComputeStatus on worker deployment version summaries, so list/describe APIs can surface compute reachability/validation results per version.
Changes:
- Add `compute_status` to internal deployment/version workflow state protos and regenerate Go protos.
- Update version workflow to ingest WCI validation status signals and propagate `ComputeStatus` into deployment summaries/memo.
- Plumb `ComputeStatus` through worker deployment API response mapping; bump related module versions.
Reviewed changes
Copilot reviewed 6 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| service/worker/workerdeployment/workflow.go | Include ComputeStatus when mapping internal version summaries to API version summaries. |
| service/worker/workerdeployment/version_workflow.go | Listen for WCI validation status signals and sync updated compute status to the deployment workflow. |
| service/worker/workerdeployment/compute_util.go | Add helper converting WCI validation status to public deploymentpb.ComputeStatus. |
| service/worker/workerdeployment/client.go | Include ComputeStatus in DescribeWorkerDeployment response mapping for each version summary. |
| proto/internal/temporal/server/api/deployment/v1/message.proto | Add compute_status fields to internal workflow state protos. |
| api/deployment/v1/message.pb.go | Regenerated output for internal proto changes. |
| cmd/tools/getproto/files.go | Generated proto import map update (formatting issues introduced). |
| go.mod | Bump go.temporal.io/api and go.temporal.io/auto-scaled-workers versions. |
| go.sum | Update sums for bumped module versions. |
Files not reviewed (2)
- api/deployment/v1/message.pb.go: Generated file
- cmd/tools/getproto/files.go: Generated file
    }

    // Version gate for sync-validation-status signal to prevent NDEs during rollback
    if workflow.GetVersion(ctx, "sync-validation-status-signal", workflow.DefaultVersion, 0) >= 0 {
It seems we've been doing this kind of patching for other signal handlers, but I don't think patching (`GetVersion`) really helps with NDEs related to the new signal: the handler registration itself does not create history events, so it is safe to hit during replay of a workflow that ran on the previous version without the signal handler.
Good point. Will clean this up in a follow-up PR.
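The reviewer's argument can be illustrated with a toy replay model. This is a sketch under stated assumptions, not the Temporal SDK: `env`, `command`, `replay`, the signal name `validation-status`, and the command names are all hypothetical, and the model simply encodes the claim that registering a handler records nothing, so only commands are compared against history.

```go
package main

import "fmt"

// command stands for anything workflow code asks the server to record.
// Toy model only; these are not Temporal SDK types.
type command string

// env collects emitted commands and registered signal handlers.
type env struct {
	commands []command
	handlers map[string]func()
}

func newEnv() *env { return &env{handlers: make(map[string]func())} }

func (e *env) emit(c command) { e.commands = append(e.commands, c) }

// onSignal stores a callback. It emits no command, which is the property
// the review comment relies on.
func (e *env) onSignal(name string, h func()) { e.handlers[name] = h }

// deliver runs the handler for a signal, if one is registered.
func (e *env) deliver(name string) {
	if h, ok := e.handlers[name]; ok {
		h()
	}
}

// oldWorkflow models the previous code version: no signal handler.
func oldWorkflow(e *env) { e.emit("StartTimer") }

// newWorkflow registers the handler without any version gate.
func newWorkflow(e *env) {
	e.onSignal("validation-status", func() { e.emit("UpsertMemo") })
	e.emit("StartTimer")
}

// replay runs wf, delivers the recorded signals, and compares the emitted
// commands with the recorded ones. A mismatch models an NDE.
func replay(wf func(*env), signals []string, recorded []command) error {
	e := newEnv()
	wf(e)
	for _, s := range signals {
		e.deliver(s)
	}
	if len(e.commands) != len(recorded) {
		return fmt.Errorf("nondeterminism: replay produced %d commands, history has %d",
			len(e.commands), len(recorded))
	}
	for i, c := range recorded {
		if e.commands[i] != c {
			return fmt.Errorf("nondeterminism at %d: replay produced %q, history has %q",
				i, e.commands[i], c)
		}
	}
	return nil
}

func main() {
	// History written by the old code: one command, no signals received.
	recorded := []command{"StartTimer"}
	fmt.Println("old code:", replay(oldWorkflow, nil, recorded))
	fmt.Println("new code:", replay(newWorkflow, nil, recorded))
}
```

In this model the new code replays an old history cleanly because the old history contains no such signal, so the handler never runs and the command sequence is unchanged.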
rkannan82 left a comment
Do we need to update the existing e2e tests to verify this?
Good question. The unit tests in |
What changed?
`DescribeWorkerDeployment` now returns compute status per version, checking whether Temporal can successfully interact with the version's compute resource. `ListWorkerDeployments` also returns compute status on the current, ramping, and latest version summaries, fetched in parallel.

When connectivity changes (e.g. Lambda becomes unreachable or is restored), WCI signals the version workflow, which propagates the update to the deployment workflow memo. The list view reads from the memo: versions that have been validated since deployment show their status immediately; others appear once the first validation runs.
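The parallel lookup for the current, ramping, and latest versions might look roughly like the sketch below. It is illustrative only: `computeStatus`, `fetchFunc`, and `fetchStatuses` are hypothetical names, the in-memory map stands in for the deployment workflow memo, and omitting versions with no recorded status is an assumption about how "not validated yet" is represented.

```go
package main

import (
	"fmt"
	"sync"
)

// computeStatus is a hypothetical stand-in for the API's ComputeStatus.
type computeStatus struct {
	ErrorMessage string
}

// fetchFunc looks up the compute status for one version (hypothetical).
type fetchFunc func(version string) (*computeStatus, error)

// fetchStatuses queries all versions concurrently and returns the results
// keyed by version. Versions whose lookup fails or has no status yet are
// omitted, so they appear only after their first validation is recorded.
func fetchStatuses(versions []string, fetch fetchFunc) map[string]*computeStatus {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	out := make(map[string]*computeStatus, len(versions))
	for _, v := range versions {
		if v == "" {
			continue // e.g. no ramping version is set
		}
		wg.Add(1)
		go func(v string) {
			defer wg.Done()
			st, err := fetch(v)
			if err != nil || st == nil {
				return
			}
			mu.Lock()
			out[v] = st
			mu.Unlock()
		}(v)
	}
	wg.Wait()
	return out
}

func main() {
	// Stand-in for the memo: v3 has not been validated yet.
	memo := map[string]*computeStatus{
		"v1": {},
		"v2": {ErrorMessage: "lambda unreachable"},
	}
	fetch := func(v string) (*computeStatus, error) { return memo[v], nil }

	got := fetchStatuses([]string{"v1", "v2", "v3"}, fetch)
	for _, v := range []string{"v1", "v2", "v3"} {
		st, ok := got[v]
		fmt.Println(v, ok, st)
	}
}
```

The mutex guards only the result map; the lookups themselves run concurrently, so total latency is bounded by the slowest of the three.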
Why?
Allows customers to see whether Temporal can successfully interact with their compute resource directly from the Worker Deployments list and detail views, without navigating into each individual version.
How did you test it?