Conversation
|
|
|
|
||
| // How to rollback to a previous version. If not specified, defaults to AllAtOnce strategy. | ||
| // +optional | ||
| RollbackStrategy *RollbackStrategy `json:"rollback,omitempty"` |
There was a problem hiding this comment.
I think it would be helpful to add a comments clarifying a couple things:
- How/when a rollback is triggered. I think right now it's only when the worker's pod spec is updated, AND the build ID has been previously set as the default worker version?
- How rollback strategy applies to new vs. currently running workflows. Ie. running workflows that are pinned will remain on the version we are rolling back from, while new workflows will start on the rollback version.
| // RollbackProgressive gradually ramps traffic back to the previous version. | ||
| RollbackProgressive DefaultVersionRollbackStrategy = "Progressive" |
There was a problem hiding this comment.
I think if we have a progressive rollout mode, we should also require that there be at least 1 step defined. It could be reasonable to assume that choosing the progressive rollout mode without also adding steps would reuse the existing steps from the rollout config. Without steps, progressive and all at once modes would behave the same way, right?
|
|
||
| // DefaultVersionRollbackStrategy describes how to cut over during rollback to a previous version. | ||
| // +kubebuilder:validation:Enum=AllAtOnce;Progressive | ||
| type DefaultVersionRollbackStrategy string |
There was a problem hiding this comment.
Nit: we have a mix of existing styles between SunsetStrategy and DefaultVersionUpdateStrategy, but I think I prefer dropping the Default prefix.
| type DefaultVersionRollbackStrategy string | |
| type VersionRollbackStrategy string |
| return false | ||
| } | ||
|
|
||
| l.Info("Detected rollback scenario using LastCurrentTime. "+ |
There was a problem hiding this comment.
I think we should add an additional condition to only consider it a rollback if LastCurrentTime is recent (maybe defaulting to <24h before now). Otherwise accidentally deploying a version from weeks ago could skip the normal rollout steps, which might be unexpected.
| // - No gate workflow (already trusted version) | ||
| // - No manual mode (rollbacks should be automatic) | ||
| // - Default to AllAtOnce for fast recovery | ||
| type RollbackStrategy struct { |
There was a problem hiding this comment.
Can we add a MaxRollbackAge field (struggling to come up with a better name)? I think a reasonable default would be 24h. If the LastCurrentTime is less than time.Now() - MaxRollbackAge, then the controller should treat the deployment as a standard rollout rather than a rollback.
| if s.RollbackStrategy == nil { | ||
| s.RollbackStrategy = &RollbackStrategy{Strategy: RollbackAllAtOnce} |
There was a problem hiding this comment.
This is a change in behavior that should be called out in release notes.
Is there a way to disable rollbacks and keep the existing behavior of treating every version change as a new rollout? This should be documented as well.
| // Steps to execute progressive rollbacks. Only required when strategy is "Progressive". | ||
| // +optional | ||
| Steps []RolloutStep `json:"steps,omitempty"` |
There was a problem hiding this comment.
I'm not quite convinced that progressive rollbacks are necessary. What are the reasons to prefer a progressive rollback vs. disabling rollback and using the progressive rollout strategy?
It's probably not a big deal to maintain this functionality, but it also looks straightforward to add in a future release without introducing breaking changes.
What was changed
Add a separate rollback config to the deployment specification that can be used to have different rollback behavior from the rollout one. It supports AllAtOnce or Progressive and does not have a gate. Uses the LastCurrentTime of a deployment to decide if the target version supposed to be rolled out or rolled back.
Why?
Open feature issue: #126
Checklist
Closes [Feature Request] Rollback mode #126
How was this tested:
Unit and integration tests (ongoing actual test with local setup).
Yes, but needs to be decided where.