Skip to content

[FR-126] Rollback#274

Open
eniko-dif wants to merge 2 commits intotemporalio:mainfrom
eniko-dif:eniko/feature-rollback
Open

[FR-126] Rollback#274
eniko-dif wants to merge 2 commits intotemporalio:mainfrom
eniko-dif:eniko/feature-rollback

Conversation

@eniko-dif
Copy link
Copy Markdown

@eniko-dif eniko-dif commented Apr 14, 2026

What was changed

Add a separate rollback config to the deployment specification that can be used to have different rollback behavior from the rollout one. It supports AllAtOnce or Progressive and does not have a gate. Uses the LastCurrentTime of a deployment to decide if the target version supposed to be rolled out or rolled back.

Why?

Open feature issue: #126

Checklist

  1. Closes [Feature Request] Rollback mode #126

  2. How was this tested:

Unit and integration tests (ongoing actual test with local setup).

  1. Any docs updates needed?

Yes, but needs to be decided where.

@eniko-dif eniko-dif requested review from a team and jlegrone as code owners April 14, 2026 15:35
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


// How to rollback to a previous version. If not specified, defaults to AllAtOnce strategy.
// +optional
RollbackStrategy *RollbackStrategy `json:"rollback,omitempty"`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to add a comments clarifying a couple things:

  1. How/when a rollback is triggered. I think right now it's only when the worker's pod spec is updated, AND the build ID has been previously set as the default worker version?
  2. How rollback strategy applies to new vs. currently running workflows. Ie. running workflows that are pinned will remain on the version we are rolling back from, while new workflows will start on the rollback version.

Comment on lines +370 to +371
// RollbackProgressive gradually ramps traffic back to the previous version.
RollbackProgressive DefaultVersionRollbackStrategy = "Progressive"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we have a progressive rollout mode, we should also require that there be at least 1 step defined. It could be reasonable to assume that choosing the progressive rollout mode without also adding steps would reuse the existing steps from the rollout config. Without steps, progressive and all at once modes would behave the same way, right?


// DefaultVersionRollbackStrategy describes how to cut over during rollback to a previous version.
// +kubebuilder:validation:Enum=AllAtOnce;Progressive
type DefaultVersionRollbackStrategy string
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we have a mix of existing styles between SunsetStrategy and DefaultVersionUpdateStrategy, but I think I prefer dropping the Default prefix.

Suggested change
type DefaultVersionRollbackStrategy string
type VersionRollbackStrategy string

return false
}

l.Info("Detected rollback scenario using LastCurrentTime. "+
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add an additional condition to only consider it a rollback if LastCurrentTime is recent (maybe defaulting to <24h before now). Otherwise accidentally deploying a version from weeks ago could skip the normal rollout steps, which might be unexpected.

// - No gate workflow (already trusted version)
// - No manual mode (rollbacks should be automatic)
// - Default to AllAtOnce for fast recovery
type RollbackStrategy struct {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a MaxRollbackAge field (struggling to come up with a better name)? I think a reasonable default would be 24h. If the LastCurrentTime is less than time.Now() - MaxRollbackAge, then the controller should treat the deployment as a standard rollout rather than a rollback.

Comment on lines +60 to +61
if s.RollbackStrategy == nil {
s.RollbackStrategy = &RollbackStrategy{Strategy: RollbackAllAtOnce}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change in behavior that should be called out in release notes.

Is there a way to disable rollbacks and keep the existing behavior of treating every version change as a new rollout? This should be documented as well.

Comment on lines +425 to +427
// Steps to execute progressive rollbacks. Only required when strategy is "Progressive".
// +optional
Steps []RolloutStep `json:"steps,omitempty"`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite convinced that progressive rollbacks are necessary. What are the reasons to prefer a progressive rollback vs. disabling rollback and using the progressive rollout strategy?

It's probably not a big deal to maintain this functionality, but it also looks straightforward to add in a future release without introducing breaking changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Rollback mode

3 participants