Is your feature request related to a problem? Please describe.
When a deprecated worker version drains (all pinned workflows finish), the controller immediately deletes its Deployment. There is no way to retain the last N versions as a safety net after they drain.
This makes fast rollbacks harder than they should be: if a bad deploy is caught late, the previous version's pods are already gone and restoring them requires a full re-deploy cycle. Teams coming from traditional blue/green deployments expect a warm standby to exist. There is also no buffer for slow-draining edge cases where a version drains just before an issue is detected.
Describe the solution you'd like
Add an optional retention block to the TemporalWorkerDeployment spec:
apiVersion: temporal.io/v1alpha1
kind: TemporalWorkerDeployment
metadata:
  name: my-worker
spec:
  replicas: 3
  retention:
    minVersionsToKeep: 3   # keep at least the last 3 versions
    minAgeDuration: 24h    # optional: never delete a version younger than this
  rollout:
    strategy: Progressive
    steps:
      - rampPercentage: 10
        pauseDuration: 5m
Proposed behaviour:
- When a version drains, the controller scales its Deployment to 0 replicas (no compute cost) but does not delete it if fewer than minVersionsToKeep versions would remain after deletion. Oldest versions beyond the retention count are deleted first.
- minAgeDuration is an optional companion: a drained version younger than this is preserved regardless of count, giving a time-based safety window after each deploy.
- Retained-but-drained versions appear in .status with reason RetainedByPolicy so operators understand why they haven't been cleaned up.
- If neither field is set, behaviour is unchanged from today - delete on drain.
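To make the rules above concrete, here is a minimal Go sketch of the deletion decision. It is not the controller's actual code: the type and function names (RetentionPolicy, Version, versionsToDelete) are hypothetical. It also makes two assumptions that the proposal leaves open: that live versions count toward minVersionsToKeep, and that a version's age is measured from its creation time.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// RetentionPolicy mirrors the proposed spec.retention block (hypothetical type).
// A zero value in either field means "not set".
type RetentionPolicy struct {
	MinVersionsToKeep int
	MinAgeDuration    time.Duration
}

// Version is a hypothetical, simplified view of one worker version.
type Version struct {
	Name      string
	CreatedAt time.Time
	Drained   bool
}

// versionsToDelete returns the names of drained versions whose Deployment may
// be deleted, oldest first. With an empty policy every drained version is
// returned, which matches today's delete-on-drain behaviour.
func versionsToDelete(all []Version, p RetentionPolicy, now time.Time) []string {
	// Sort a copy newest-first so the index doubles as a recency rank.
	sorted := append([]Version(nil), all...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].CreatedAt.After(sorted[j].CreatedAt)
	})

	var deletable []string
	// Walk from the oldest version so the result is ordered oldest-first.
	for i := len(sorted) - 1; i >= 0; i-- {
		v := sorted[i]
		if !v.Drained {
			continue // live versions are never cleaned up
		}
		if i < p.MinVersionsToKeep {
			continue // within the newest N versions: retained by count
		}
		if p.MinAgeDuration > 0 && now.Sub(v.CreatedAt) < p.MinAgeDuration {
			continue // too young: retained by age
		}
		deletable = append(deletable, v.Name)
	}
	return deletable
}

// Fixed example data so the output is deterministic.
var exampleNow = time.Date(2024, 1, 10, 12, 0, 0, 0, time.UTC)

var examplePolicy = RetentionPolicy{MinVersionsToKeep: 3, MinAgeDuration: 24 * time.Hour}

func exampleVersions() []Version {
	hoursAgo := func(h int) time.Time {
		return exampleNow.Add(-time.Duration(h) * time.Hour)
	}
	return []Version{
		{Name: "v1", CreatedAt: hoursAgo(100), Drained: true},
		{Name: "v2", CreatedAt: hoursAgo(50), Drained: true},
		{Name: "v3", CreatedAt: hoursAgo(30), Drained: true},
		{Name: "v4", CreatedAt: hoursAgo(5), Drained: true},
		{Name: "v5", CreatedAt: hoursAgo(1), Drained: false},
	}
}

func main() {
	// v3 and v4 are kept by count, v5 is live; v1 and v2 fall outside the policy.
	fmt.Println(versionsToDelete(exampleVersions(), examplePolicy, exampleNow)) // prints [v1 v2]
}
```

With minVersionsToKeep lowered to 1, v3 would also become deletable, while v4 would still be retained because it is younger than 24h. Retained versions in this sketch are simply skipped; in the proposal they would be the ones reported in .status with reason RetainedByPolicy.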
Additional context
Workarounds exist today but all have drawbacks:
- Keep a long-lived pinned workflow per version: couples infrastructure retention to business workflow state, which is fragile
- Manual rollout strategy + CI/CD bookkeeping: pushes version lifecycle policy outside the controller, fragmenting the operational model
- Pure TTL (keepFor: 72h): simpler to implement but less useful than count-based retention since deploy cadence varies
This change would be purely additive to the spec with no impact on existing behaviour when the field is absent. It only touches the cleanup reconciliation loop.