OCPBUGS-62517: Fix DeploymentController to comply with OpenShift Available API contract #2058
jianzhangbjz wants to merge 1 commit into
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-62517, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker.
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-62517, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request.
Hi @joelanford, could you help approve it when you get a chance? Thanks!
    // Per API contract, remain Available=True during normal operations
    availableCondition = availableCondition.
        WithStatus(opv1.ConditionTrue).
        WithMessage("Deployment is rolling out").
Maybe we want to set the message here to "Waiting for Deployment", to stay consistent with the previous semantics and with any tooling that relies on it?
Force-pushed: 655055c to af0c5a8
@jianzhangbjz: all tests passed!
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: grokspawn, jianzhangbjz
Hi @bertinatto @p0lyn0mial, could you help approve it? Thanks!
jsafrane left a comment
I am afraid that this will need far more thought and work. And a lot of unit tests.
    // Check if deployment is actively being updated (spec change being rolled out)
    // This is the primary indicator of a "normal upgrade" in progress
    if deployment.Generation != deployment.Status.ObservedGeneration {
        // Spec has changed, deployment controller is working on rolling it out
        // Per API contract, remain Available during this normal upgrade
        return true
    }
This comment + return true is wrong. Consider initial Deployment creation during cluster installation: the status is not populated yet (ObservedGeneration is 0) and there are no replicas yet. The overall status of such an operator is definitely not Available = true.
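To make the objection concrete, here is a minimal sketch of why the generation comparison alone cannot tell a fresh install from an upgrade. The `deploymentView` type and `generationMismatch` helper are hypothetical stand-ins introduced for illustration; they are not part of the PR or of any library, and only mirror the fields the quoted check reads.

```go
package main

import "fmt"

// deploymentView is a hypothetical, trimmed-down stand-in for the fields of a
// Deployment that the quoted check reads. It is not a real Kubernetes type.
type deploymentView struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	AvailableReplicas  int32 // status.availableReplicas
}

// generationMismatch mirrors the condition from the quoted snippet.
func generationMismatch(d deploymentView) bool {
	return d.Generation != d.ObservedGeneration
}

func main() {
	// Fresh install: status not populated yet, no replica is running.
	freshInstall := deploymentView{Generation: 1, ObservedGeneration: 0, AvailableReplicas: 0}
	// Upgrade: spec was bumped while old replicas are still serving.
	upgrade := deploymentView{Generation: 5, ObservedGeneration: 4, AvailableReplicas: 2}

	// Both cases satisfy the check, although only the second has pods serving.
	fmt.Println(generationMismatch(freshInstall), freshInstall.AvailableReplicas)
	fmt.Println(generationMismatch(upgrade), upgrade.AvailableReplicas)
}
```

In both cases the check returns true, so returning Available = true on that basis would also report a not-yet-installed operator as available.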
    // If we're actively rolling out, remain Available per API contract
    if isRollingOut {
        return true
    }
Again, during installation the status should be Available = false.
    // Special case: brand new deployment with no status conditions yet
    // This happens during initial deployment before Kubernetes has had a chance to update status
    if len(deployment.Status.Conditions) == 0 && deployment.Status.ObservedGeneration == 0 {
        // Deployment just created, is progressing normally
        return true
    }
This is just wrong.
- The code is not reachable.
- The status should be Available = False during installation and Available = true during upgrade. This code has no clue about that.
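One way to picture the missing information is a check that takes the operator's previously reported state as an extra input. The sketch below is only an illustration under that assumption: the `rolloutState` type, its `PreviouslyAvailable` field, and the `available` function are hypothetical, and nothing in this thread says the library should work this way.

```go
package main

import "fmt"

// rolloutState is a hypothetical summary of what a controller might know
// when it computes the Available condition. It is not a real library type.
type rolloutState struct {
	Generation         int64
	ObservedGeneration int64
	AvailableReplicas  int32
	// Assumption: the last Available condition reported by the operator was
	// True, i.e. the operand had already been installed successfully.
	PreviouslyAvailable bool
}

// available returns the value to report for the Available condition.
func available(s rolloutState) bool {
	if s.AvailableReplicas > 0 {
		// Something is serving, regardless of any rollout in progress.
		return true
	}
	// No replica is serving: stay Available only while a rollout is in
	// progress on something that was already available (an upgrade).
	rollingOut := s.Generation != s.ObservedGeneration
	return rollingOut && s.PreviouslyAvailable
}

func main() {
	install := rolloutState{Generation: 1, ObservedGeneration: 0}
	upgrade := rolloutState{Generation: 5, ObservedGeneration: 4, PreviouslyAvailable: true}
	fmt.Println("install:", available(install))
	fmt.Println("upgrade:", available(upgrade))
}
```

Even this sketch is incomplete: it would keep reporting Available = true for a rollout that never observes the new generation, so a real fix would also need a failure or timeout signal, plus the unit tests requested above.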
BTW, many CSI driver operators use DeploymentController and we don't experience (With an exception of single node clusters - everyone gets |
Thanks for the info! I added a PDB in operator-framework/operator-controller#2362 instead, so I am closing this PR for now.
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-62517. The bug has been updated to no longer refer to the pull request using the external bug tracker.
The DeploymentController was violating the OpenShift API contract, which states: 'A component must not report Available=False during the course of a normal upgrade'.
This fix ensures Available remains True during normal rolling updates and only goes False for actual failures.
Assisted-by: Claude code