NO-JIRA: simplify waiting in GetLastKeyMeta#2219
Conversation
|
@tjungblu: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tjungblu The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough
ChangesPolling mechanism and list handling
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/library/encryption/helpers.go`:
- Around line 204-213: In the wait.PollUntilContextTimeout call's condition
function (the anonymous func passed to wait.PollUntilContextTimeout), don't
return (false, err) on transient List errors because that immediately stops
polling; instead log the error and return (false, nil) so polling continues
until the overall timeout, only allowing the PollUntilContextTimeout to surface
an error when the timeout is exceeded. Specifically update the error handling in
the function that calls secretsClient.List (inside wait.PollUntilContextTimeout)
to log the List error and return false, nil rather than false, err.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9c9fef02-14d9-4aed-8ad7-9bcefa092530
📒 Files selected for processing (1)
test/library/encryption/helpers.go
| if err != nil { | ||
| t.Logf("failed to list secrets, err = %v", err.Error()) | ||
| t.Logf("GetLastKeyMeta: Failed to list secrets: %v", err) | ||
| return false, err |
There was a problem hiding this comment.
Should we return err only when isTransientAPIError true? and the rest we should return false, nil?
There was a problem hiding this comment.
that's actually the thing that drove me to change it in the first place, isTransientAPIError doesn't cover everything. For example, auth can be offline for those said 5 minutes, but 401s are not covered. I also don't want to change the function because this is used in too many places.
There was a problem hiding this comment.
You are right. The error types not listed in isTransientAPIError does not make sense
Just one note: test/library/encryption/errors.go file won' be used anymore.
There was a problem hiding this comment.
KMS changes are disruptive for the cluster. After this comment, I think we should retry for such errors. Because for instance, on SNO, there can be a period of time that kube-apiserver is not functioning?, we should retry. Otherwise, test will flake.
There was a problem hiding this comment.
it is used in several places, ForceKeyRotation for example
| backOff := retry.DefaultBackoff | ||
| backOff.Steps = 9999 | ||
|
|
||
| // in theory the max time we tolerate disruption on an SNO cluster is 60 seconds |
There was a problem hiding this comment.
would it make sense to preserve this comment ? (it explains why 5m)
There was a problem hiding this comment.
the comment is still there
| selectedSecrets, err = secretsClient.List(context.TODO(), metav1.ListOptions{LabelSelector: labelSelector}) | ||
| err := wait.PollUntilContextTimeout(t.Context(), 30*time.Second, time.Minute*5, true, func(ctx context.Context) (bool, error) { | ||
| labelSecrets, err := secretsClient.List(ctx, metav1.ListOptions{LabelSelector: labelSelector}) | ||
| if err != nil { |
There was a problem hiding this comment.
let's use transientAPIError - makes the test less brittle
There was a problem hiding this comment.
as mentioned earlier on slack and on the other thread with the ai rabbit: it doesn't cover all transient errors
There was a problem hiding this comment.
Okay, but right now we have a mix. Some tests use transientAPIError and some don’t.
Are there any known transient errors we’re currently missing? Would it make sense to add them to transientAPIError?
In general, do we currently have issues with transient errors?
I think it would be better not to hide any errors. At the moment, we treat all errors as transient. Using transientAPIError instead would raise the bar for the entire platform, and I think we’re already doing a good job since the tests are generally stable :)
WDYT?
| } | ||
| return | ||
|
|
||
| selectedSecrets = labelSecrets |
| }, func() (err error) { | ||
| selectedSecrets, err = secretsClient.List(context.TODO(), metav1.ListOptions{LabelSelector: labelSelector}) | ||
| err := wait.PollUntilContextTimeout(t.Context(), 30*time.Second, time.Minute*5, true, func(ctx context.Context) (bool, error) { | ||
| labelSecrets, err := secretsClient.List(ctx, metav1.ListOptions{LabelSelector: labelSelector}) |
There was a problem hiding this comment.
it' s compilation error due to the scoping of the function.
Unused variable 'selectedSecrets'
I gave the return type a variable name and removed the assignment.
This replaces the existing logic with a simpler wait.PollUntilContextTimeout. Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
|
@tjungblu: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Simplifies the changes made in #1188 to just do a dumb 5m context timeout wait with 30s polling in between.
Summary by CodeRabbit