[Feature Request] Version retention policy: keep last N versions after drain (minVersionsToKeep) #270

@dheeraj-gantala

Is your feature request related to a problem? Please describe.

When a deprecated worker version drains (all pinned workflows finish), the controller immediately deletes its Deployment. There is no way to retain the last N versions as a safety net after they drain.

This makes fast rollbacks harder than they should be: if a bad deploy is caught late, the previous version's pods are already gone, and restoring them requires a full re-deploy cycle. Teams coming from traditional blue/green deployments expect a warm standby to exist. There is also no buffer for slow-draining edge cases where a version finishes draining just before an issue is detected.

Describe the solution you'd like

Add an optional retention block to the TemporalWorkerDeployment spec:

apiVersion: temporal.io/v1alpha1
kind: TemporalWorkerDeployment
metadata:
  name: my-worker
spec:
  replicas: 3
  retention:
    minVersionsToKeep: 3   # keep at least the last 3 versions
    minAgeDuration: 24h    # optional: never delete a version younger than this
  rollout:
    strategy: Progressive
    steps:
      - rampPercentage: 10
        pauseDuration: 5m

Proposed behaviour:

  • When a version drains, the controller scales its Deployment to 0 replicas (no compute cost) but does not delete it if doing so would leave fewer than minVersionsToKeep versions. Once the retention count is exceeded, the oldest drained versions are deleted first.
  • minAgeDuration is an optional companion: a drained version younger than this is preserved regardless of count, giving a time-based safety window after each deploy.
  • Retained-but-drained versions appear in .status with reason RetainedByPolicy so operators understand why they haven't been cleaned up.
  • If neither field is set, behaviour is unchanged from today: the Deployment is deleted on drain.

Additional context

Workarounds exist today but all have drawbacks:

  • Keep a long-lived pinned workflow per version — couples infrastructure retention to business workflow state, which is fragile
  • Manual rollout strategy + CI/CD bookkeeping — pushes version lifecycle policy outside the controller, fragmenting the operational model
  • Pure TTL (keepFor: 72h) — simpler to implement but less useful than count-based retention since deploy cadence varies

This change would be purely additive to the spec with no impact on existing behaviour when the field is absent. It only touches the cleanup reconciliation loop.

Metadata

Labels: enhancement (New feature or request)
