NO-ISSUE: remove etcd dependency from failing cases #1241
Conversation
/retitle NO-ISSUE: remove etcd dependency from failing cases
@bandrade: This pull request explicitly references no jira issue.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Hi @bandrade, could you share the test log from the FIPS cluster? Thanks!

And, other test cases(
/retest-required

81b8410 to c11bf70 (Compare)
Important: Review skipped. Auto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured.
Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Organization UI. Review profile: CHILL. Plan: Pro.

Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough

Tests were modified to dynamically resolve PackageManifest default channels and switch to a new "learn" operator/catalog context; CatalogSource and Subscription values were updated; multi-CSV readiness polling and template-driven Prometheus resource application were added; and node-gating and cleanup/defer flows were introduced across two OLM test specs.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
actor Tester as Test Runner
participant K8s as Kubernetes API
participant PM as PackageManifest
participant CS as CatalogSource
participant OLM as OLM (Subscription/CSV)
participant Node as NodeChecker
Tester->>PM: query PackageManifest for package & defaultChannel
PM-->>Tester: package info or not found (skip)
Tester->>Node: check schedulable Linux worker nodes (if required)
Node-->>Tester: present / absent (skip if absent)
Tester->>K8s: (optional) reference/create CatalogSource (`cs.Name`/`cs.Namespace`)
Tester->>OLM: create Subscription (CatalogSourceName/Namespace, Channel, Package, StartingCSV)
OLM-->>Tester: InstalledCSV name
Tester->>K8s: poll CSV(s) status until Succeeded or timeout (multiple operators)
K8s-->>Tester: CSV statuses (Succeeded / Failure / timeout)
Tester->>K8s: apply template-driven Prometheus resources (ServiceMonitor/Rules)
K8s-->>Tester: resource creation confirmation
Tester->>OLM: cleanup: delete Subscription(s) and CSV(s)
Tester->>CS: cleanup CatalogSource references (if dynamically created)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ❌ Failed checks (1 warning) · ✅ Passed checks (4)
@jianzhangbjz here is the command output:
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests-extension/test/qe/specs/olmv0_defaultoption.go`:
- Around line 3307-3315: The current poll in wait.PollUntilContextTimeout only
checks that CSV names appear in oc get csv output; change the predicate to
verify each CSV's .status.phase == "Succeeded" (e.g., by invoking
oc.AsAdmin().WithoutNamespace().Run("get").Args("csv", "<csv-name>", "-n",
sub.Namespace, "-o", "jsonpath={.status.phase}") or by parsing structured JSON
for both "ditto-operator" and "planetscale-operator") so the function
wait.PollUntilContextTimeout only returns true when both CSVs are Succeeded;
keep the existing olmv0util.LogDebugInfo(oc, sub.Namespace, ...) and
exutil.AssertWaitPollNoErr(waitErr, ...) behavior for failure reporting.
- Line 3266: The test currently calls exutil.SkipIfDisableDefaultCatalogsource
which incorrectly skips the entire self-contained CatalogSource scenario; remove
that call from olmv0_defaultoption.go so the test always runs (or replace it
with a short comment asserting the test creates and uses its own CatalogSource)
and make sure the code paths using create/consume of the test-local
CatalogSource (the surrounding test in olmv0_defaultoption.go) do not rely on
operatorhub/cluster or default sources; reference
exutil.SkipIfDisableDefaultCatalogsource to locate and remove/replace the guard.
- Around line 809-824: The test is creating a duplicate subscription to the same
package ("learn") so it triggers the same-package conflict instead of the
intended "dependency CSV already installed" scenario; update the second
SubscriptionDescription (SubName "sub-32613") so its OperatorPackage and
StartingCSV target a package that depends on the already-installed CSV (or swap
to the provided fixture that preserves the dependency relationship) and ensure
CatalogSourceName/CatalogSourceNamespace remain correct for that package; make
the same change for the other occurrence around the second block (the similar
SubscriptionDescription at the other location) so the test covers the
dependency-installed path rather than duplicate-subscription.
In `@tests-extension/test/qe/specs/olmv0_nonallns.go`:
- Around line 406-435: The code currently swallows errors from
olmv0util.ClusterPackageExists, the packagemanifest get, and
exutil.GetSchedulableLinuxWorkerNodes and fabricates defaults (e.g., setting
sub.Channel = "stable") or skips, which hides transient failures; change the
handling so that errors from ClusterPackageExists, the
oc.AsAdmin().WithoutNamespace().Run("get") call that fetches packagemanifest,
and exutil.GetSchedulableLinuxWorkerNodes are checked and cause the test to fail
with the actual error (use g.Fatalf or the test framework's fail function)
instead of ignoring them or defaulting the channel, and only proceed when the
packagemanifest lookup returns a non-empty defaultChannel (otherwise fail with
the error/output), and similarly fail with the worker node lookup error or a
clear failure message if no schedulable READY worker is found.
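The fail-fast shape this comment asks for can be sketched with a small helper. The names below are illustrative, not part of exutil/olmv0util: instead of defaulting sub.Channel to "stable", the lookup error or the empty result is surfaced to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveDefaultChannel surfaces lookup failures instead of hiding them:
// a non-nil lookup error or an empty defaultChannel both fail the caller,
// rather than silently falling back to a hard-coded channel.
func resolveDefaultChannel(lookupOutput string, lookupErr error) (string, error) {
	if lookupErr != nil {
		return "", fmt.Errorf("packagemanifest lookup failed: %w", lookupErr)
	}
	channel := strings.TrimSpace(lookupOutput)
	if channel == "" {
		return "", errors.New("packagemanifest returned an empty defaultChannel")
	}
	return channel, nil
}

func main() {
	if ch, err := resolveDefaultChannel("alpha\n", nil); err == nil {
		fmt.Println("resolved channel:", ch)
	}
	if _, err := resolveDefaultChannel("", nil); err != nil {
		fmt.Println("lookup rejected:", err)
	}
}
```

In the real spec, the output and error of the `oc … get packagemanifest` call would feed this helper, and a returned error would go straight to the test framework's fail function.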
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: aeee2c3e-d796-4487-bf74-8f5dad3fbb0d
📒 Files selected for processing (2)
tests-extension/test/qe/specs/olmv0_defaultoption.go
tests-extension/test/qe/specs/olmv0_nonallns.go
c11bf70 to ef86b52 (Compare)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests-extension/test/qe/specs/olmv0_defaultoption.go`:
- Around line 826-838: The test uses SubscriptionDescription (sub) with
OperatorPackage "sample-operator" and expects "learn-operator.v0.0.3" in the
external catalog but those packages/versions don't exist; either point the
subscription to available operator bundles or add the missing bundles. Update
the SubscriptionDescription fields (OperatorPackage, Channel,
CatalogSourceName/Namespace or Template) used by sub.CreateWithoutCheck and the
subsequent olmv0util.NewCheck invocation to reference existing packages (e.g.,
learn-operator v0.0.1 or v0.0.2) or create and publish the missing
sample-operator and learn-operator v0.0.3 bundles to
quay.io/olmqe/learn-operator-index:v25 so the subscription check
(ConstraintsNotSatisfiable) targets valid catalog content.
In `@tests-extension/test/qe/specs/olmv0_nonallns.go`:
- Around line 419-447: The code selects firstNode using worker.Name and
re-checks Ready conditions unnecessarily; change it to use the node's
kubernetes.io/hostname label value for NODE_NAME so the Prometheus affinity
matches the template: after calling GetSchedulableLinuxWorkerNodes(oc) pick the
first worker (no extra Ready condition loop) and extract its
labels["kubernetes.io/hostname"] (falling back to worker.Name only if the label
is missing) and pass that value into olmv0util.ApplyResourceFromTemplate for
NODE_NAME when rendering prometheusTemplate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 33abb12e-0049-4175-a7b7-955f1aab5571
📒 Files selected for processing (2)
tests-extension/test/qe/specs/olmv0_defaultoption.go
tests-extension/test/qe/specs/olmv0_nonallns.go
workerNodes, workerErr := exutil.GetSchedulableLinuxWorkerNodes(oc)
o.Expect(workerErr).NotTo(o.HaveOccurred(), "failed to list schedulable linux worker nodes")
firstNode := ""
for _, worker := range workerNodes {
	for _, con := range worker.Status.Conditions {
		if con.Type == "Ready" && con.Status == "True" {
			firstNode = worker.Name
			break
		}
	}
	if firstNode != "" {
		break
	}
}
if firstNode == "" {
	e2e.Failf("no schedulable worker node in READY state found")
}

g.By("Install the Prometheus operator with Automatic approval")
defer sub.Delete(itName, dr)
defer sub.DeleteCSV(itName, dr)
sub.Create(oc, itName, dr)
olmv0util.NewCheck("expect", exutil.AsAdmin, exutil.WithoutNamespace, exutil.Compare, "Succeeded", exutil.Ok, []string{"csv", csvName, "-n", ns, "-o=jsonpath={.status.phase}"}).Check(oc)
olmv0util.NewCheck("expect", exutil.AsAdmin, exutil.WithoutNamespace, exutil.Compare, "Succeeded", exutil.Ok, []string{"csv", sub.InstalledCSV, "-n", ns, "-o=jsonpath={.status.phase}"}).Check(oc)

g.By("Assert that prometheus dependency is resolved")
msg, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("csv", "-n", ns).Output()
g.By("Create a Prometheus resource that relies on Prometheus bundle types")
defer func() {
	_ = oc.AsAdmin().WithoutNamespace().Run("delete").Args("prometheus", "example", "-n", ns, "--ignore-not-found").Execute()
}()
err = olmv0util.ApplyResourceFromTemplate(oc, "--ignore-unknown-parameters=true", "-f", prometheusTemplate, "-p", fmt.Sprintf("NAMESPACE=%s", ns), fmt.Sprintf("NODE_NAME=%s", firstNode))
Use the hostname label value for NODE_NAME.
Line 425 passes worker.Name, but the template matches on kubernetes.io/hostname. Those values are not guaranteed to be identical, so this can generate a Prometheus CR whose affinity matches no worker on some platforms. Also, GetSchedulableLinuxWorkerNodes already filters to Ready workers, so the extra condition scan is redundant.
♻️ Suggested fix
workerNodes, workerErr := exutil.GetSchedulableLinuxWorkerNodes(oc)
o.Expect(workerErr).NotTo(o.HaveOccurred(), "failed to list schedulable linux worker nodes")
firstNode := ""
for _, worker := range workerNodes {
- for _, con := range worker.Status.Conditions {
- if con.Type == "Ready" && con.Status == "True" {
- firstNode = worker.Name
- break
- }
- }
+ firstNode = strings.TrimSpace(worker.Labels["kubernetes.io/hostname"])
if firstNode != "" {
break
}
}
if firstNode == "" {
- e2e.Failf("no schedulable worker node in READY state found")
+ e2e.Failf("no schedulable worker node with kubernetes.io/hostname label found")
}

🤖 Prompt for AI Agents
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests-extension/test/qe/specs/olmv0_nonallns.go` around lines 419 - 447, The
code selects firstNode using worker.Name and re-checks Ready conditions
unnecessarily; change it to use the node's kubernetes.io/hostname label value
for NODE_NAME so the Prometheus affinity matches the template: after calling
GetSchedulableLinuxWorkerNodes(oc) pick the first worker (no extra Ready
condition loop) and extract its labels["kubernetes.io/hostname"] (falling back
to worker.Name only if the label is missing) and pass that value into
olmv0util.ApplyResourceFromTemplate for NODE_NAME when rendering
prometheusTemplate.
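The node-selection change this review asks for boils down to a one-line lookup with a fallback. The helper below is a hypothetical sketch of that logic outside the real test utilities: take the kubernetes.io/hostname label when present (what the template's affinity actually matches), otherwise fall back to the node name.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeNameForAffinity returns the value to substitute for NODE_NAME:
// the kubernetes.io/hostname label when present, otherwise the node's
// metadata name as a last resort.
func nodeNameForAffinity(labels map[string]string, nodeName string) string {
	if host := strings.TrimSpace(labels["kubernetes.io/hostname"]); host != "" {
		return host
	}
	return nodeName
}

func main() {
	labeled := map[string]string{"kubernetes.io/hostname": "ip-10-0-1-2"}
	fmt.Println(nodeNameForAffinity(labeled, "node-a")) // label wins when set
	fmt.Println(nodeNameForAffinity(nil, "node-a"))     // fallback when the label is missing
}
```

On most platforms the label and the node name coincide, but the label is the value the scheduler compares against, which is why the fallback is only a safety net.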
/payload-job periodic-ci-openshift-operator-framework-olm-release-4.22-periodics-e2e-aws-ovn-fips-extended-f2
@jianzhangbjz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1a09b100-1ce4-11f1-8f8f-7ba332b0ae18-0
Hi @bandrade, the test failed. Please address those failures: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-operator-framework-olm-1241-periodics-e2e-aws-ovn-fips-extended-f2/2031532831309565952
1bb1ce2 to a629db7 (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (1)
tests-extension/test/qe/specs/olmv0_nonallns.go (1)
419-447: ⚠️ Potential issue | 🟠 Major

Use hostname label for NODE_NAME (and drop redundant Ready scan). At Line 425, worker.Name is used for NODE_NAME, but the template affinity matches kubernetes.io/hostname. These values can differ on some platforms, causing false targeting. Also, GetSchedulableLinuxWorkerNodes already guarantees Ready workers, so the extra condition loop is redundant.

♻️ Proposed fix

workerNodes, workerErr := exutil.GetSchedulableLinuxWorkerNodes(oc)
o.Expect(workerErr).NotTo(o.HaveOccurred(), "failed to list schedulable linux worker nodes")
firstNode := ""
for _, worker := range workerNodes {
-	for _, con := range worker.Status.Conditions {
-		if con.Type == "Ready" && con.Status == "True" {
-			firstNode = worker.Name
-			break
-		}
-	}
+	firstNode = strings.TrimSpace(worker.Labels["kubernetes.io/hostname"])
+	if firstNode == "" {
+		firstNode = strings.TrimSpace(worker.Name)
+	}
	if firstNode != "" {
		break
	}
}
if firstNode == "" {
-	e2e.Failf("no schedulable worker node in READY state found")
+	e2e.Failf("no schedulable worker node with hostname value found")
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/qe/specs/olmv0_nonallns.go` around lines 419 - 447, The code currently iterates over workerNodes from exutil.GetSchedulableLinuxWorkerNodes and sets firstNode = worker.Name after an extra Ready-condition scan (redundant because GetSchedulableLinuxWorkerNodes already returns ready schedulable nodes) and uses that Name for NODE_NAME even though the pod affinity in the template matches the kubernetes.io/hostname label; update the logic in the block that computes firstNode to stop the manual Ready scan and instead read the hostname label (worker.Labels["kubernetes.io/hostname"]) from the first worker in workerNodes, assign that value to firstNode, and only fallback to worker.Name if the hostname label is missing so the ApplyResourceFromTemplate call uses the correct NODE_NAME for affinity matching (refer to variables/funcs: workerNodes, firstNode, exutil.GetSchedulableLinuxWorkerNodes, ApplyResourceFromTemplate).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests-extension/test/qe/specs/olmv0_defaultoption.go`:
- Around line 3306-3315: The poll callback used by wait.PollUntilContextTimeout
currently calls o.Expect(err).NotTo(o.HaveOccurred()) which will abort the test
on any transient oc.AsAdmin().WithoutNamespace().Run(...).Output() error;
instead, change the callback to not call o.Expect on the Output() error—if err
!= nil then return false, nil so the poll will retry (optionally log the error)
and only use o.Expect assertions after the poll succeeds (e.g., when evaluating
hasDittoReady/hasPlanetscaleReady or after the wait returns); keep the same
logic inside the callback but replace the o.Expect line with a transient-error
handling return to ensure retries.
- Around line 3274-3283: The CatalogSourceDescription instance `cs` is missing
the Secret field which causes the template to be rendered with an empty SECRET
value making `secrets: - ""` invalid; update the
`olmv0util.CatalogSourceDescription` literal (the `cs` variable) to set `Secret`
to a valid Kubernetes secret name (or switch to a template that does not require
SECRET) so that the `Create` call receives a non-empty secret name when applying
the `Template` (`csImageTemplate`).
---
Duplicate comments:
In `@tests-extension/test/qe/specs/olmv0_nonallns.go`:
- Around line 419-447: The code currently iterates over workerNodes from
exutil.GetSchedulableLinuxWorkerNodes and sets firstNode = worker.Name after an
extra Ready-condition scan (redundant because GetSchedulableLinuxWorkerNodes
already returns ready schedulable nodes) and uses that Name for NODE_NAME even
though the pod affinity in the template matches the kubernetes.io/hostname
label; update the logic in the block that computes firstNode to stop the manual
Ready scan and instead read the hostname label
(worker.Labels["kubernetes.io/hostname"]) from the first worker in workerNodes,
assign that value to firstNode, and only fallback to worker.Name if the hostname
label is missing so the ApplyResourceFromTemplate call uses the correct
NODE_NAME for affinity matching (refer to variables/funcs: workerNodes,
firstNode, exutil.GetSchedulableLinuxWorkerNodes, ApplyResourceFromTemplate).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 56b121c5-3e11-4011-8ddc-ca666ed48079
📒 Files selected for processing (2)
tests-extension/test/qe/specs/olmv0_defaultoption.go
tests-extension/test/qe/specs/olmv0_nonallns.go
csImageTemplate := filepath.Join(buildPruningBaseDir, "catalogsource-image-extract.yaml")
cs := olmv0util.CatalogSourceDescription{
	Name:        "ocp-47181",
	Namespace:   oc.Namespace(),
	DisplayName: "ocp-47181",
	Publisher:   "OLM QE",
	SourceType:  "grpc",
-	Address:     "quay.io/olmqe/etcd-47181:1.0",
+	Address:     "quay.io/olmqe/ditto-index:41565-cache",
	Template:    csImageTemplate,
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check CatalogSourceDescription struct definition
rg -nA 30 'type CatalogSourceDescription struct' tests-extension/test/qe/util/olmv0util/
# Check how the template is applied and if SECRET can be empty
rg -n 'SECRET' tests-extension/test/qe/testdata/olm/catalogsource-image-extract.yaml

Repository: openshift/operator-framework-olm
Length of output: 3613
🏁 Script executed:
#!/bin/bash
# Get the Create method implementation to see how Secret is handled
rg -nA 50 'func (catsrc \*CatalogSourceDescription) Create' tests-extension/test/qe/util/olmv0util/catalog_source.go
# Get the full catalogsource-image-extract.yaml template
cat -n tests-extension/test/qe/testdata/olm/catalogsource-image-extract.yaml

Repository: openshift/operator-framework-olm
Length of output: 1118
🏁 Script executed:
#!/bin/bash
# Get more of the Create method implementation
sed -n '46,150p' tests-extension/test/qe/util/olmv0util/catalog_source.go

Repository: openshift/operator-framework-olm
Length of output: 5395
Missing Secret field in CatalogSourceDescription.
The CatalogSourceDescription struct has a Secret field, but the test code does not set it. When the Create method applies the template, it passes SECRET= (empty string), which results in:
secrets:
- ""

An empty string is not a valid Kubernetes secret name and will cause the CatalogSource creation to fail.
Set the Secret field in the CatalogSourceDescription struct, or use a template that does not require a secret parameter.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests-extension/test/qe/specs/olmv0_defaultoption.go` around lines 3274 -
3283, The CatalogSourceDescription instance `cs` is missing the Secret field
which causes the template to be rendered with an empty SECRET value making
`secrets: - ""` invalid; update the `olmv0util.CatalogSourceDescription` literal
(the `cs` variable) to set `Secret` to a valid Kubernetes secret name (or switch
to a template that does not require SECRET) so that the `Create` call receives a
non-empty secret name when applying the `Template` (`csImageTemplate`).
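As a sketch of the guard this finding implies, the rendered secrets list can be checked before the template is applied. `validateSecrets` is a hypothetical helper, not an olmv0util function:

```go
package main

import "fmt"

// validateSecrets rejects the failure mode the review describes: a rendered
// CatalogSource whose spec.secrets contains an empty string (`secrets: - ""`)
// is invalid, so catch it before applying the template.
func validateSecrets(secrets []string) error {
	for i, s := range secrets {
		if s == "" {
			return fmt.Errorf("spec.secrets[%d] is empty; set the Secret field or use a template without SECRET", i)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateSecrets([]string{""}))            // rejected: empty name
	fmt.Println(validateSecrets([]string{"pull-secret"})) // ok ("pull-secret" is a made-up example name)
}
```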
waitErr := wait.PollUntilContextTimeout(context.TODO(), 10*time.Second, 360*time.Second, false, func(ctx context.Context) (bool, error) {
	csvList, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("csv", "-n", sub.Namespace, "-o=jsonpath={range .items[*]}{@.metadata.name}{\",\"}{@.status.phase}{\"\\n\"}{end}").Output()
	o.Expect(err).NotTo(o.HaveOccurred())
	hasDittoReady := false
	hasPlanetscaleReady := false
	for _, line := range strings.Split(strings.TrimSpace(csvList), "\n") {
		parts := strings.Split(line, ",")
		if len(parts) != 2 {
			continue
		}
o.Expect inside poll function will abort on transient errors.
Line 3309 uses o.Expect(err).NotTo(o.HaveOccurred()) inside the polling function. If oc.Run fails transiently (e.g., temporary API unavailability), the test will fail immediately rather than retrying. The poll pattern should handle transient errors gracefully.
🛠️ Suggested fix
waitErr := wait.PollUntilContextTimeout(context.TODO(), 10*time.Second, 360*time.Second, false, func(ctx context.Context) (bool, error) {
csvList, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("csv", "-n", sub.Namespace, "-o=jsonpath={range .items[*]}{@.metadata.name}{\",\"}{@.status.phase}{\"\\n\"}{end}").Output()
- o.Expect(err).NotTo(o.HaveOccurred())
+ if err != nil {
+ e2e.Logf("Failed to get CSV list, retrying: %v", err)
+ return false, nil
+ }
hasDittoReady := false🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests-extension/test/qe/specs/olmv0_defaultoption.go` around lines 3306 -
3315, The poll callback used by wait.PollUntilContextTimeout currently calls
o.Expect(err).NotTo(o.HaveOccurred()) which will abort the test on any transient
oc.AsAdmin().WithoutNamespace().Run(...).Output() error; instead, change the
callback to not call o.Expect on the Output() error—if err != nil then return
false, nil so the poll will retry (optionally log the error) and only use
o.Expect assertions after the poll succeeds (e.g., when evaluating
hasDittoReady/hasPlanetscaleReady or after the wait returns); keep the same
logic inside the callback but replace the o.Expect line with a transient-error
handling return to ensure retries.
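The retry-friendly callback shape can be seen in isolation with a simplified poller. This is a stand-in for wait.PollUntilContextTimeout, not the real API: the condition returns (false, nil) on transient failures so the poll keeps retrying, and only a non-nil error aborts immediately.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pollUntil retries cond until it reports done, returns an error, or the
// attempt budget runs out. Returning (false, nil) from cond means "not yet,
// try again" — which is how transient failures should be reported.
func pollUntil(attempts int, interval time.Duration, cond func() (bool, error)) error {
	for i := 0; i < attempts; i++ {
		done, err := cond()
		if err != nil {
			return err // terminal error: stop polling immediately
		}
		if done {
			return nil
		}
		time.Sleep(interval)
	}
	return errors.New("timed out waiting for condition")
}

func main() {
	calls := 0
	err := pollUntil(5, time.Millisecond, func() (bool, error) {
		calls++
		if calls < 3 {
			// simulate a transient `oc get csv` failure: swallow it and retry
			return false, nil
		}
		return true, nil
	})
	fmt.Println(err, calls)
}
```

Calling o.Expect inside such a callback collapses the "retry" path into the "terminal" path, which is exactly the problem the review flags.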
/payload-job periodic-ci-openshift-operator-framework-olm-release-4.22-periodics-e2e-aws-ovn-fips-extended-f2
@bandrade: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d0467c50-1e60-11f1-9352-ada36cddc92f-0
Hi @bandrade, some test cases failed. Please address them: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-operator-framework-olm-1241-periodics-e2e-aws-ovn-fips-extended-f2/2032218660960997376
a629db7 to d3d68f7 (Compare)
♻️ Duplicate comments (1)
tests-extension/test/qe/specs/olmv0_nonallns.go (1)
419-435: ⚠️ Potential issue | 🟠 Major

Use hostname label for NODE_NAME to avoid affinity mismatches. At Line 425, firstNode = worker.Name can diverge from kubernetes.io/hostname, which can make the rendered Prometheus node affinity unschedulable on some platforms.

♻️ Suggested fix

workerNodes, workerErr := exutil.GetSchedulableLinuxWorkerNodes(oc)
o.Expect(workerErr).NotTo(o.HaveOccurred(), "failed to list schedulable linux worker nodes")
firstNode := ""
for _, worker := range workerNodes {
-	for _, con := range worker.Status.Conditions {
-		if con.Type == "Ready" && con.Status == "True" {
-			firstNode = worker.Name
-			break
-		}
-	}
+	firstNode = strings.TrimSpace(worker.Labels["kubernetes.io/hostname"])
+	if firstNode == "" {
+		firstNode = worker.Name
+	}
	if firstNode != "" {
		break
	}
}
if firstNode == "" {
-	e2e.Failf("no schedulable worker node in READY state found")
+	e2e.Failf("no schedulable worker node with hostname label found")
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/qe/specs/olmv0_nonallns.go` around lines 419 - 435, The code selects a node by assigning worker.Name to firstNode, but Pod node affinity uses the kubernetes.io/hostname label and this can differ from the Node.Name; change the assignment in the loop to read the hostname label from the node object (use worker.Labels["kubernetes.io/hostname"]) and set firstNode to that value, falling back to worker.Name only if the label is missing; update references that construct NODE_NAME to use this hostname-derived firstNode so affinity matches actual node labels.
🧹 Nitpick comments (1)
tests-extension/test/qe/specs/olmv0_defaultoption.go (1)
3389-3410: Consider adding debug logging on failure for consistency. Unlike the PolarionID:47181 test at lines 3340-3342, this test doesn't log debug info when the poll fails. Adding olmv0util.LogDebugInfo here would improve troubleshooting consistency across similar tests.

♻️ Add debug logging on failure

 return hasDittoReady && hasPlanetscaleReady, nil
 })
+if waitErr != nil {
+	olmv0util.LogDebugInfo(oc, sub.Namespace, "pod", "ip", "csv", "events")
+}
 exutil.AssertWaitPollNoErr(waitErr, "csv ditto-operator or planetscale-operator was not Succeeded nor Installing")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/qe/specs/olmv0_defaultoption.go` around lines 3389 - 3410, The poll loop can fail without emitting debug info; before calling exutil.AssertWaitPollNoErr add a call to olmv0util.LogDebugInfo so failures get the same troubleshooting output as the other test. Specifically, after waitErr is set (the variable from wait.PollUntilContextTimeout) and before exutil.AssertWaitPollNoErr, invoke olmv0util.LogDebugInfo passing the subscription namespace and the oc client (use sub.Namespace and oc) so logs are captured when the CSV readiness check fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests-extension/test/qe/specs/olmv0_nonallns.go`:
- Around line 419-435: The code selects a node by assigning worker.Name to
firstNode, but Pod node affinity uses the kubernetes.io/hostname label and this
can differ from the Node.Name; change the assignment in the loop to read the
hostname label from the node object (use
worker.Labels["kubernetes.io/hostname"]) and set firstNode to that value,
falling back to worker.Name only if the label is missing; update references that
construct NODE_NAME to use this hostname-derived firstNode so affinity matches
actual node labels.
---
Nitpick comments:
In `@tests-extension/test/qe/specs/olmv0_defaultoption.go`:
- Around line 3389-3410: The poll loop can fail without emitting debug info;
before calling exutil.AssertWaitPollNoErr add a call to olmv0util.LogDebugInfo
so failures get the same troubleshooting output as the other test. Specifically,
after waitErr is set (the variable from wait.PollUntilContextTimeout) and before
exutil.AssertWaitPollNoErr, invoke olmv0util.LogDebugInfo passing the
subscription namespace and the oc client (use sub.Namespace and oc) so logs are
captured when the CSV readiness check fails.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3cb5a9a9-78d2-4a70-a02c-dfacaa33527f
📒 Files selected for processing (2)
tests-extension/test/qe/specs/olmv0_defaultoption.go
tests-extension/test/qe/specs/olmv0_nonallns.go
/payload-job periodic-ci-openshift-operator-framework-olm-release-4.22-periodics-e2e-aws-ovn-fips-extended-f2
@bandrade: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9a983770-2224-11f1-8063-3f702d4b503e-0
@jianzhangbjz, it looks good now. Can you please take a look? Thank you!
d3d68f7 to 8bb4202 (Compare)
PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Actionable comments posted: 1
♻️ Duplicate comments (3)
tests-extension/test/qe/specs/olmv0_nonallns.go (1)
419-447: ⚠️ Potential issue | 🟠 Major

Use hostname label for NODE_NAME to avoid unschedulable Prometheus. Line 425 uses worker.Name, but the applied template matches on kubernetes.io/hostname. Those values are not guaranteed to match, so this can fail scheduling on some platforms.

Suggested fix

workerNodes, workerErr := exutil.GetSchedulableLinuxWorkerNodes(oc)
o.Expect(workerErr).NotTo(o.HaveOccurred(), "failed to list schedulable linux worker nodes")
firstNode := ""
for _, worker := range workerNodes {
-	for _, con := range worker.Status.Conditions {
-		if con.Type == "Ready" && con.Status == "True" {
-			firstNode = worker.Name
-			break
-		}
-	}
+	firstNode = strings.TrimSpace(worker.Labels["kubernetes.io/hostname"])
+	if firstNode == "" {
+		firstNode = worker.Name
+	}
	if firstNode != "" {
		break
	}
}
if firstNode == "" {
-	e2e.Failf("no schedulable worker node in READY state found")
+	e2e.Failf("no schedulable worker node with usable hostname found")
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/qe/specs/olmv0_nonallns.go` around lines 419 - 447, The code sets NODE_NAME using worker.Name (firstNode) but the Prometheus podSelector matches the kubernetes.io/hostname label; change the node selection to use the node's kubernetes.io/hostname label: when iterating workerNodes in the block that sets firstNode (and before calling olmv0util.ApplyResourceFromTemplate with NODE_NAME and prometheusTemplate), read worker.Labels["kubernetes.io/hostname"] and assign that to firstNode (with a safe fallback to worker.Name only if the hostname label is missing) so the applied template uses the label value the scheduler matches.

tests-extension/test/qe/specs/olmv0_defaultoption.go (2)
818-847: ⚠️ Potential issue | 🟠 Major

This has drifted back to the duplicate-subscription path. Both subscriptions target learn from the same catalog and namespace, so PolarionID:32613 now overlaps with the same-package conflict case already covered later in this file at Lines 2198-2277 instead of exercising "dependency CSV already installed." Point the second subscription at a package with a real dependency on the first install, or reuse a fixture that preserves that relationship.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/qe/specs/olmv0_defaultoption.go` around lines 818 - 847, The second SubscriptionDescription (currently SubName "sub-32613-conflict") is pointing to the same OperatorPackage ("learn") and thus hits the duplicate-subscription path; update the test so the second subscription targets a package that has a declared dependency on the first installed CSV (or reuse an existing fixture that preserves that dependency relationship) by changing the OperatorPackage and any related fields in the SubscriptionDescription used in the second block (the instance created with sub.CreateWithoutCheck and referenced in the Contain check); ensure the jsonpath check still asserts for "ConstraintsNotSatisfiable" against the correct subscription name.
3252-3273:⚠️ Potential issue | 🟠 MajorMake these CSV polls retry-friendly and wait for
Succeededonly.
o.Expectinside the callback turns a transientoc get csvfailure into an immediate spec failure, and treatingInstallingas success lets the test pass before the dependency chain actually finishes.Suggested change
```diff
 waitErr := wait.PollUntilContextTimeout(context.TODO(), 10*time.Second, 360*time.Second, false, func(ctx context.Context) (bool, error) {
     csvList, err := oc.AsAdmin().WithoutNamespace().Run("get").Args(
         "csv", "-n", sub.Namespace,
         "-o=jsonpath={range .items[*]}{@.metadata.name}{\",\"}{@.status.phase}{\"\\n\"}{end}",
     ).Output()
-    o.Expect(err).NotTo(o.HaveOccurred())
+    if err != nil {
+        e2e.Logf("failed to list CSVs in %s, retrying: %v", sub.Namespace, err)
+        return false, nil
+    }
     hasDittoReady := false
     hasPlanetscaleReady := false
     ...
-    if strings.Contains(name, "ditto-operator") && (phase == "Succeeded" || phase == "Installing") {
+    if strings.Contains(name, "ditto-operator") && phase == "Succeeded" {
         hasDittoReady = true
     }
-    if strings.Contains(name, "planetscale-operator") && (phase == "Succeeded" || phase == "Installing") {
+    if strings.Contains(name, "planetscale-operator") && phase == "Succeeded" {
         hasPlanetscaleReady = true
     }
     return hasDittoReady && hasPlanetscaleReady, nil
 })
-exutil.AssertWaitPollNoErr(waitErr, "csv ditto-operator or planetscale-operator was not Succeeded nor Installing")
+exutil.AssertWaitPollNoErr(waitErr, "csv ditto-operator or planetscale-operator was not Succeeded")
```

Also applies to: 3320-3344, 3390-3411
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-extension/test/qe/specs/olmv0_defaultoption.go` around lines 3252-3273: the callback passed to wait.PollUntilContextTimeout should be retry-friendly and should only treat CSV phase "Succeeded" as success. Remove the o.Expect(err).NotTo(...) assertion inside the callback (do not fail the spec on transient oc get errors); instead, if err != nil, return false, nil so the poll retries, and only set hasDittoReady/hasPlanetscaleReady when phase == "Succeeded" (drop "Installing" as a success state). Update the code around wait.PollUntilContextTimeout, the oc.AsAdmin().WithoutNamespace().Run("get").Args(...).Output() call, and the variables hasDittoReady/hasPlanetscaleReady accordingly; apply the same changes to the similar blocks at lines 3320-3344 and 3390-3411.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e2a0a0f7-0afb-48e6-8cb8-d952ea91664d
📒 Files selected for processing (2)
tests-extension/test/qe/specs/olmv0_defaultoption.go
tests-extension/test/qe/specs/olmv0_nonallns.go
```go
// The package used for this test guarantees Prometheus types are served, but may not report
// status.conditions[0].type=Available for this standalone CR in all environments.
olmv0util.NewCheck("expect", exutil.AsAdmin, exutil.WithoutNamespace, exutil.Compare, "example2", exutil.Ok, []string{"Prometheus", "example", "-n", ns, "-o=jsonpath={.metadata.name}{.spec.replicas}"}).Check(oc)
```
Prometheus assertion expects the wrong replicas value
Line 451 expects "example2", but the referenced template sets spec.replicas: 1; {.metadata.name}{.spec.replicas} should be "example1" unless the template is intentionally changed.
Suggested fix

```diff
- olmv0util.NewCheck("expect", exutil.AsAdmin, exutil.WithoutNamespace, exutil.Compare, "example2", exutil.Ok, []string{"Prometheus", "example", "-n", ns, "-o=jsonpath={.metadata.name}{.spec.replicas}"}).Check(oc)
+ olmv0util.NewCheck("expect", exutil.AsAdmin, exutil.WithoutNamespace, exutil.Compare, "example1", exutil.Ok, []string{"Prometheus", "example", "-n", ns, "-o=jsonpath={.metadata.name}{.spec.replicas}"}).Check(oc)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// The package used for this test guarantees Prometheus types are served, but may not report
// status.conditions[0].type=Available for this standalone CR in all environments.
olmv0util.NewCheck("expect", exutil.AsAdmin, exutil.WithoutNamespace, exutil.Compare, "example1", exutil.Ok, []string{"Prometheus", "example", "-n", ns, "-o=jsonpath={.metadata.name}{.spec.replicas}"}).Check(oc)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests-extension/test/qe/specs/olmv0_nonallns.go` around lines 449 - 451, The
assertion created by olmv0util.NewCheck is expecting the wrong combined
metadata/name+replica string: it passes "example2" but the template sets
spec.replicas: 1 (so the jsonpath {.metadata.name}{.spec.replicas} will produce
"example1"); update the NewCheck call to use the correct expected value (change
"example2" to "example1") or, if the template was meant to be different, update
the template to set spec.replicas to 2 so the current expectation remains valid;
locate the call to olmv0util.NewCheck("expect", exutil.AsAdmin,
exutil.WithoutNamespace, exutil.Compare, "example2", ...) and make the
corresponding change.
8bb4202 to
0a7924a
Compare
|
/payload-job periodic-ci-openshift-operator-framework-olm-release-4.22-periodics-e2e-aws-ovn-fips-extended-f2 |
|
@bandrade: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5414f450-2319-11f1-9011-3a8f50c57933-0 |
0a7924a to
ed5bd37
Compare
ed5bd37 to
b51eec3
Compare
|
/payload-job periodic-ci-openshift-operator-framework-olm-release-4.22-periodics-e2e-aws-ovn-fips-extended-f2 |
|
@bandrade: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0d868480-2388-11f1-9b9a-41e3499472d7-0 |
|
/payload-job periodic-ci-openshift-operator-framework-olm-release-4.22-periodics-e2e-aws-ovn-fips-extended-f2 |
|
@bandrade: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/295447f0-2393-11f1-9a6d-aef1914bbbc6-0 |
|
/payload-job periodic-ci-openshift-operator-framework-olm-release-4.22-periodics-e2e-aws-ovn-fips-techpreview-extended-f1 |
|
@bandrade: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e03f0f30-239e-11f1-92fc-13ba1dfdca73-0 |
|
/retest |
|
@jianzhangbjz, would you mind taking another look at this one? I replaced all cases that used etcd-operator with another operator. I changed 24387, 32613, 47149, 47179, 47181, 27680 |
|
/approve |
|
@jianzhangbjz: This PR has been marked as verified.
Details
In response to this: |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bandrade, jianzhangbjz

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. |
|
@bandrade: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Assisted-By: Claude