Skip to content

CNTRLPLANE-3515: test(e2e): add day-2 label/taint no-rollout verification#8595

Open
mgencur wants to merge 1 commit into
openshift:mainfrom
mgencur:node_labels_and_taints
Open

CNTRLPLANE-3515: test(e2e): add day-2 label/taint no-rollout verification#8595
mgencur wants to merge 1 commit into
openshift:mainfrom
mgencur:node_labels_and_taints

Conversation

@mgencur
Copy link
Copy Markdown
Contributor

@mgencur mgencur commented May 27, 2026

What this PR does / why we need it:

Extend NodePoolDay2TagsTest to patch nodeLabels and taints day-2 and verify no rolling upgrade is triggered via NodePool conditions and MachineDeployment generation.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/CNTRLPLANE-3515

Special notes for your reviewer:

The original test also scaled up the NodePool and verified that the new nodes actually have the labels/taints. But I think it's not necessary to test it. The existing NodePool upgrade tests already set labels/taints at the beginning and verify these are properly applied to nodes after initial start. These tests together with the new one from this PR cover the whole original test

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Tests
    • Enhanced end-to-end testing for NodePool day‑2 scenarios.
    • Validate applying EC2 resource tags, node labels, and taints without triggering a rolling update.
    • Monitor update-related NodePool conditions and assert rolling-update is not triggered.
    • Re-check MachineDeployment to ensure its generation remains unchanged after the updates.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 27, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 27, 2026

@mgencur: This pull request references CNTRLPLANE-3515 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Extend NodePoolDay2TagsTest to patch nodeLabels and taints day-2 and verify no rolling upgrade is triggered via NodePool conditions and MachineDeployment generation.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/CNTRLPLANE-3515

Special notes for your reviewer:

The original test also scaled up the NodePool and verified that the new nodes actually have the labels/taints. But I think it's not necessary to test it. The existing NodePool upgrade tests already set labels/taints at the beginning and verify these are properly applied to nodes after initial start. These tests together with the new one from this PR cover the whole original test

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added do-not-merge/needs-area area/testing Indicates the PR includes changes for e2e testing labels May 27, 2026
@openshift-ci openshift-ci Bot requested review from devguyio and enxebre May 27, 2026 10:06
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8e7e552d-a4b2-4ebe-8341-38bc0cd39fc7

📥 Commits

Reviewing files that changed from the base of the PR and between 4f8b5f2 and b5d3554.

📒 Files selected for processing (1)
  • test/e2e/nodepool_day2_tags_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/e2e/nodepool_day2_tags_test.go

📝 Walkthrough

Walkthrough

This PR updates the nodepool_day2_tags e2e test: it adds the k8s sets import, captures the MachineDeployment.Generation before applying changes, applies EC2 resource tag updates plus node label and taint changes, polls NodePool status while tracking updating-related condition types (using a set and a rollingUpdateTriggered flag), asserts rollingUpdateTriggered is false, and re-fetches the MachineDeployment to verify its Generation is unchanged.


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Container-Privileges ❌ Error PR adds container manifests with privileged:true, hostPID, hostNetwork, and allowPrivilegeEscalation settings in new files without documented justification. Document justification for privilege requirements in kubelet-config-daemonset.yaml and KubeVirt CSI daemonset manifests.
Test Structure And Quality ⚠️ Warning Two assertions lack meaningful failure messages: lines 128 (HaveKeyWithValue) and 138-140 (ContainElement) have no messages to diagnose failures. Add failure messages to lines 128 and 138-140 assertions following the pattern of other assertions in the test (e.g., "expected day-2 tag to be applied to AWSMachine" and "expected EC2 instance tags to contain day-2 tag").
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: adding day-2 label/taint no-rollout verification to an E2E test, which aligns with the core purpose of the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Test uses static name "TestNodePoolDay2Tags" with no dynamic values. All dynamic resource names correctly placed in test body, not test titles.
Microshift Test Compatibility ✅ Passed HyperShift (hosted control planes) and MicroShift (edge distribution) are separate projects. This check is not applicable to HyperShift repository tests.
Single Node Openshift (Sno) Test Compatibility ✅ Passed Test creates single-replica NodePool and doesn't assume multiple nodes, multi-node communication, node scaling, or scheduling across different nodes—all operations work on SNO.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only a test file (nodepool_day2_tags_test.go), not deployment manifests, operators, or controllers. No scheduling constraints are introduced in the code changes.
Ote Binary Stdout Contract ✅ Passed File contains no process-level stdout writes. All code runs within test methods; new sets import has no stdout side effects; no init(), TestMain(), or top-level initializers present.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Test contains no hardcoded IPv4 addresses, IPv4-only IP parsing, IPv4 CIDRs, or external public internet connectivity; AWS API calls are account-internal, and test properly skips on non-AWS platforms.
No-Weak-Crypto ✅ Passed The PR modifies a Kubernetes e2e test file with no cryptographic operations, weak crypto algorithms, custom crypto implementations, or non-constant-time secret comparisons.
No-Sensitive-Data-In-Logs ✅ Passed Test file contains one logging call with a static message and no sensitive data. No credentials, tokens, API keys, PII, or other sensitive data are logged anywhere in the code.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.68%. Comparing base (7ac2953) to head (b5d3554).
⚠️ Report is 298 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8595      +/-   ##
==========================================
+ Coverage   37.44%   40.68%   +3.24%     
==========================================
  Files         751      755       +4     
  Lines       91969    93363    +1394     
==========================================
+ Hits        34435    37985    +3550     
+ Misses      54894    52645    -2249     
- Partials     2640     2733      +93     

see 79 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.70% <ø> (+2.06%) ⬆️
cpo-hostedcontrolplane 41.80% <ø> (+5.32%) ⬆️
cpo-other 41.39% <ø> (+3.66%) ⬆️
hypershift-operator 50.82% <ø> (+2.89%) ⬆️
other 31.61% <ø> (+3.84%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/nodepool_day2_tags_test.go`:
- Around line 79-88: The test currently replaces NodePool.Spec.NodeLabels and
NodePool.Spec.Taints outright; instead modify the test to merge the day-2 values
into existing fields: ensure NodePool.Spec.NodeLabels is non-nil (create if nil)
and set the "e2e.day2.validation" key to "true" without dropping other keys, and
for NodePool.Spec.Taints append the new hyperv1.Taint (Key:"e2e-day2",
Value:"test", Effect:corev1.TaintEffectPreferNoSchedule) only if an equivalent
taint (same Key, Value, Effect) does not already exist, so existing taints are
preserved rather than overwritten.
- Around line 153-154: The MachineDeployment generation assertion is hardcoded
to 1; instead capture the pre-update generation (e.g., store md.Generation in a
variable like initialGeneration or preUpdateGen before performing the day-2
tag/label/taint updates) and then assert that md.Generation is still equal to
that stored value after the update; update the test in
test/e2e/nodepool_day2_tags_test.go to use the stored pre-update generation for
the final assertion against md.Generation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9ed09c00-4205-40e5-b9d2-1c5734b44f66

📥 Commits

Reviewing files that changed from the base of the PR and between 89e19f8 and 9ce5563.

📒 Files selected for processing (1)
  • test/e2e/nodepool_day2_tags_test.go

Comment thread test/e2e/nodepool_day2_tags_test.go Outdated
Comment thread test/e2e/nodepool_day2_tags_test.go Outdated
@mgencur mgencur force-pushed the node_labels_and_taints branch from 9ce5563 to 4f8b5f2 Compare May 27, 2026 10:44
Comment thread test/e2e/nodepool_day2_tags_test.go Outdated
Extend NodePoolDay2TagsTest to patch nodeLabels and taints day-2
and verify no rolling upgrade is triggered via NodePool conditions
and MachineDeployment generation.

Ref: CNTRLPLANE-3515

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mgencur mgencur force-pushed the node_labels_and_taints branch from 4f8b5f2 to b5d3554 Compare May 29, 2026 04:37
@muraee
Copy link
Copy Markdown
Contributor

muraee commented May 29, 2026

/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mgencur, muraee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2026
@jiezhao16
Copy link
Copy Markdown

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 29, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@cwbotbot
Copy link
Copy Markdown

cwbotbot commented May 29, 2026

Test Results

e2e-aws

e2e-aks

@mgencur
Copy link
Copy Markdown
Contributor Author

mgencur commented Jun 2, 2026

/retest

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 2, 2026

@mgencur: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mgencur
Copy link
Copy Markdown
Contributor Author

mgencur commented Jun 2, 2026

/retest

3 similar comments
@mgencur
Copy link
Copy Markdown
Contributor Author

mgencur commented Jun 2, 2026

/retest

@mgencur
Copy link
Copy Markdown
Contributor Author

mgencur commented Jun 3, 2026

/retest

@mgencur
Copy link
Copy Markdown
Contributor Author

mgencur commented Jun 4, 2026

/retest

@hypershift-jira-solve-ci
Copy link
Copy Markdown

hypershift-jira-solve-ci Bot commented Jun 4, 2026

Now I have all the evidence I need. Let me compile the final report.

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

task sast-shell-check has the status "TaskRunImagePullFailed":
the step "use-trusted-artifact" in TaskRun "hypershift-operator-main-on-pull-request-pfbvq-sast-shell-check"
failed to pull the image "". The pod errored with the message:
"Back-off pulling image "quay.io/konflux-ci/build-trusted-artifacts@sha256:ab064e9763b62d99da5ee9653370da86ffd9d3e770e1aad7a935e88b64a0b6ac"."

Summary

The Konflux pipeline failed in the sast-shell-check task due to a transient container image pull failure — not due to any issue in the PR code. The use-trusted-artifact init step could not pull quay.io/konflux-ci/build-trusted-artifacts@sha256:ab064e.... The image exists and is accessible on quay.io (HTTP 200), but no active tags currently point to this digest — the latest tag has been moved to sha256:5307eb... (updated June 2). This was likely a transient pull failure on the Konflux build cluster node, possibly residual instability from a resolved Quay.io Push/Pull Degraded incident (June 1–2, 2026). All 15 other pipeline tasks succeeded, including sast-snyk-check and build-images. The PR itself only modifies a single Go test file (test/e2e/nodepool_day2_tags_test.go) and has zero relation to shell code or container infrastructure.

Root Cause

The root cause is a transient image pull failure in the Konflux CI infrastructure, completely unrelated to the PR changes:

  1. Immediate cause: The sast-shell-check Tekton task's use-trusted-artifact step failed to pull quay.io/konflux-ci/build-trusted-artifacts@sha256:ab064e9763b62d99da5ee9653370da86ffd9d3e770e1aad7a935e88b64a0b6ac. Kubernetes entered "Back-off pulling image" state after retries were exhausted within the 11-second task duration.

  2. Image status: The digest sha256:ab064e... still resolves (HTTP 200 from quay.io v2 manifest API), but no active tags point to it — the latest tag was moved to sha256:5307eb... on June 2. The Konflux task definition pins this image by digest, so tag movement should not affect pullability. The image manifest exists and is a valid multi-arch OCI index (amd64, arm64, s390x, ppc64le).

  3. Environmental context: Quay.io experienced a major Push/Pull Degraded incident from June 1 08:34 UTC through June 2 02:17 UTC. While this was marked resolved ~29 hours before the pipeline ran (June 4 07:08 UTC), transient pull failures on specific Konflux cluster nodes (e.g., stale image layer caches, DNS resolution hiccups, or node-level container runtime issues) could persist after a registry incident.

  4. Scope: Only 1 of 16 tasks failed. The failure is purely infrastructure — the PR modifies a single Go e2e test file with no shell scripts, Dockerfiles, or CI configuration changes.

Recommendations
  1. Re-trigger the pipeline — This is a transient infrastructure failure. Simply re-run the Red Hat Konflux / hypershift-operator-main-on-pull-request check. The image is currently accessible and the failure is unlikely to recur.

  2. No code changes needed — The PR (test/e2e/nodepool_day2_tags_test.go) is completely unrelated to the sast-shell-check task or its container image dependency.

  3. If failure recurs, investigate whether the Konflux build cluster nodes have stale container image caches or DNS issues by checking:

    • Pod events on the sast-shell-check TaskRun pod for specific pull error details (e.g., timeout, TLS, DNS, 429 rate-limit)
    • Whether other repos' Konflux pipelines are also experiencing sast-shell-check failures with the same image digest
Evidence
Evidence Detail
Failed task sast-shell-check — 1 of 16 tasks, all others succeeded
Failure status TaskRunImagePullFailed — Kubernetes could not pull the init container image
Failing image quay.io/konflux-ci/build-trusted-artifacts@sha256:ab064e9763b62d99da5ee9653370da86ffd9d3e770e1aad7a935e88b64a0b6ac
Image accessible now ✅ Yes — HTTP 200 from quay.io v2 manifest API; valid multi-arch OCI index
Active tags on digest ❌ None — latest now points to sha256:5307eb0dd71e... (updated June 2)
Recent Quay.io incident Major "Push/Pull Degraded" incident June 1–2 2026, resolved ~29h before this run
PR changes Single file: test/e2e/nodepool_day2_tags_test.go (Go e2e test only)
Other PRs with same check PR #8594 (June 3): ✅ succeeded; PR #8591: ✅ succeeded; PR #8590: ✅ succeeded
Task duration 11 seconds (too short for meaningful back-off retry)
Quay.io current status All systems operational (Downloads, Registry, Container Registries)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants