NO-JIRA: Fix races in machineset sync units#551
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@mdbooth: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThe PR enhances MachineSet-sync tests to wait for the controller-runtime manager informer cache (Status.ObservedGeneration and template labels), adds an ChangesMachineSet sync tests
Sequence DiagramsequenceDiagram
participant Test as Test Harness
participant APIServer as Kubernetes API Server
participant Manager as ctrl-runtime Manager (informer cache)
participant Controller as MachineSetSync Controller
Test->>APIServer: create/update MAPI MachineSet (Status.AuthoritativeAPI, ObservedGeneration)
Test->>Manager: Eventually(managerInformerCache) wait for observed state
Manager->>Controller: informer delivers updated object
Controller->>APIServer: read CAPI MachineSet / CAPA template (reconcile)
Controller->>APIServer: update/delete resources / finalizers
Test->>APIServer: simulate CAPI finalizer removal (slices.DeleteFunc)
Test->>Manager: wait for manager cache to observe finalizer removal
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 12✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
ee8c828 to
eb108fa
Compare
|
/test e2e-aws-capi-techpreview |
| }) | ||
|
|
||
| Context("when the CAPI machine set has a non-zero deletion timestamp", func() { | ||
| Context("when the CAPI machine set is deleted", func() { |
There was a problem hiding this comment.
nit: why the rename? lol
is non-zero deletion timestamp not more specific to what's going on here?
| mapiNamespace = corev1resourcebuilder.Namespace(). | ||
| WithGenerateName("openshift-machine-api-").Build() | ||
| Eventually(k8sClient.Create(ctx, mapiNamespace)).Should(Succeed(), "mapi namespace should be able to be created") | ||
| Eventually(kCreate(ctx, mapiNamespace)).Should(Succeed(), "mapi namespace should be able to be created") |
There was a problem hiding this comment.
🎉 nice on catching these
There was a problem hiding this comment.
I probably wouldn't have spotted them if they hadn't flaked in a test run, tbh. Not the first time I've made this mistake, either. It's a tricky api.
| capiMachineSet.Spec.Template.Spec.Version = testVersion | ||
|
|
||
| Eventually(k8sClient.Create(ctx, capiMachineSet), timeout).Should(MatchError(ContainSubstring(".version is a forbidden field"))) | ||
| Eventually(kCreate(ctx, capiMachineSet, client.DryRunAll), timeout).Should(MatchError(ContainSubstring(".version is a forbidden field"))) |
There was a problem hiding this comment.
Noice - this explains a flake I sometimes saw! 🎉
We use this same pattern in the machine sync VAP tests, could we change/ add it there too?
| Context("when the MAPI machine set has MachineAuthority set to Cluster API", func() { | ||
| // afterMAPIUpdate can be set in a sub-context's BeforeEach to register | ||
| // initialisation to happen after the MAPI machine set is updated. | ||
| var afterMAPIUpdate func() |
There was a problem hiding this comment.
This pattern ... I get the intent but makes understanding what code is executing where even worse (layer before each, justbefore each, contexts) then trace this through?
BeforeEach -> Subcontext BeforeEach where this actually gets set -> JustBeforeEach where this gets called 😢
Feels like we're hand rolling a fix to our (probably my) mis-structuring of tests :/ Can we break up the higher contexts so we avoid this?
There was a problem hiding this comment.
I'm very sympathetic to what you're saying. I was thinking the same. However, as you say the issue is structural: it's the nature of these deeply-nested tests. Fixing this structurally would amount to refactoring the entire test suite. I'm sympathetic to that, too, but it's not likely to happen in the near future.
This was honestly the simplest solution I could come up with without rewriting everything. It at least has a clear name, so it's explicit. If you can think of anything simpler I'd be in favour.
One change I'd be tempted to make to this, though, is to add a comment to explain this was added under duress, and should not be copy-pasta'd or extended. New tests should be structured so this is not necessary. Thoughts?
There was a problem hiding this comment.
Yeah... fair. It would be ugly and you're right.. wont be worked any time soon.
One change I'd be tempted to make to this, though, is to add a comment to explain this was added under duress, and should not be copy-pasta'd or extended. New tests should be structured so this is not necessary. Thoughts?
I think this would be a pragmatic middle ground.
I wonder if we can lint / catch a generalised version of this pattern somehow 🤔
| if afterMAPIUpdate != nil { | ||
| afterMAPIUpdate() | ||
|
|
||
| // afterMAPIUpdate is not reinitialised between tests in this |
There was a problem hiding this comment.
If we start this pattern I think someone / an agent will forget to unset it and we'll spend a day or two chasing our tails
There was a problem hiding this comment.
You worried that an agent might delete this line? Not following...
There was a problem hiding this comment.
I think you addressed this in this comment: #551 (comment)
My above comment wasn't all that clear, I meant if this becomes a pattern that gets copied - but a big do not copy this comment should work 🤞
| // behaviour of the controller in the presence of existing state. We need to | ||
| // ensure that the controller has had a chance to observe the state the test | ||
| // just created first. | ||
| eventuallyManagerInformerCache := func(obj client.Object) gomegatypes.AsyncAssertion { |
There was a problem hiding this comment.
This is good though :D
eb108fa to
16ba30f
Compare
|
@mdbooth: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary by CodeRabbit