NO-JIRA: Fix Azure private/topology CEL validation rules #8490

Merged
openshift-merge-bot[bot] merged 4 commits into openshift:main from enxebre:enxebre/fix-azure-private-topology-cel
Jun 3, 2026

Conversation

@enxebre
Member

@enxebre enxebre commented May 12, 2026

Description

Fix two CEL validation gaps on Azure AzurePlatformSpec and AzurePrivateSpec:

  1. private was settable without topology: The existing rule !has(self.topology) || (...) short-circuited to true when topology was omitted, allowing private to be set without topology. Fixed to has(self.topology) && (...) ? has(self.private) : !has(self.private) — now correctly forbids private when topology is absent or Public.

  2. privateLink struct not required when type is PrivateLink: Only the negative constraint existed (forbid privateLink when type is not PrivateLink). Added self.type != 'PrivateLink' || has(self.privateLink) to require the struct.
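
To make the difference concrete, the two rule shapes above can be modelled as plain boolean functions. This is a minimal Go sketch, not the CEL itself: `oldRule` and `newRule` are hypothetical names, the elided `(...)` in the old rule is kept as an opaque `inner` argument, and the new rule assumes the elided condition means "topology is Private or PublicAndPrivate", as the CodeRabbit summary describes.

```go
package main

import "fmt"

// oldRule mirrors the shape of the previous rule: !has(self.topology) || inner.
// inner stands in for the elided "(...)" part, which the PR description does not spell out.
func oldRule(hasTopology, inner bool) bool {
	return !hasTopology || inner
}

// newRule mirrors: has(self.topology) && (...) ? has(self.private) : !has(self.private).
// Assumption: the "(...)" condition is "topology is Private or PublicAndPrivate".
// An empty string models an omitted topology field.
func newRule(topology string, hasPrivate bool) bool {
	needsPrivate := topology != "" && (topology == "Private" || topology == "PublicAndPrivate")
	if needsPrivate {
		return hasPrivate
	}
	return !hasPrivate
}

func main() {
	// With topology omitted, the old rule accepts regardless of the inner check.
	fmt.Println("old rule, topology omitted:", oldRule(false, false))

	// Enumerate the new rule over every topology value and private presence.
	for _, topology := range []string{"", "Public", "Private", "PublicAndPrivate"} {
		for _, hasPrivate := range []bool{false, true} {
			fmt.Printf("topology=%q private=%v valid=%v\n", topology, hasPrivate, newRule(topology, hasPrivate))
		}
	}
}
```

With topology omitted, `oldRule` returns true whatever `inner` evaluates to, which is the gap described in item 1; `newRule` rejects any object that sets private without a Private or PublicAndPrivate topology.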

Changes

  • api/hypershift/v1beta1/azure.go: Fix topology/private CEL rule, add privateLink requirement rule
  • Envtest cases: added "private without topology should fail", "private with Public topology should fail", "PrivateLink without privateLink config should fail"; fixed existing tests to include privateLink struct

Test plan

  • make test-envtest-ocp — all 630 specs pass

Summary by CodeRabbit

  • Bug Fixes
    • Azure topology validation tightened: private configuration is now consistently required for Private or PublicAndPrivate topologies and forbidden otherwise, preventing ambiguous or missing private settings.
    • PrivateLink validation strengthened: PrivateLink-specific settings are now required when PrivateLink is selected and disallowed for other types, reducing misconfiguration risk.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 12, 2026
@openshift-ci-robot

@enxebre: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci Bot commented May 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels May 12, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 344c7833-57f9-4c6b-9701-06fec1624290

📥 Commits

Reviewing files that changed from the base of the PR and between ca317d6 and 8716def.

⛔ Files ignored due to path filters (33)
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ImageStreamImportMode.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/TLSAdherence.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AAA_ungated.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDC.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/GCPPlatform.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HCPEtcdBackup.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ImageStreamImportMode.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/KMSEncryptionProvider.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/OpenStack.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/TLSAdherence.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • cmd/install/assets/crds/hypershift-operator/tests/hostedclusters.hypershift.openshift.io/stable.hostedclusters.azure.testsuite.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
📒 Files selected for processing (1)
  • api/hypershift/v1beta1/azure.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/hypershift/v1beta1/azure.go

📝 Walkthrough

Walkthrough

Kubebuilder XValidation rules were tightened for two Azure API types in api/hypershift/v1beta1/azure.go. AzurePlatformSpec now requires topology to be present before evaluating whether private must be set (Private or PublicAndPrivate requires private; other values forbid it). AzurePrivateSpec now requires privateLink when type == 'PrivateLink' and forbids it otherwise.

🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: fixing Azure private/topology CEL validation rules in the azure.go file. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All test names are stable and deterministic. YAML validation tests use descriptive static names, and Ginkgo tests contain no dynamic values or generated content. |
| Test Structure And Quality | ✅ Passed | Test suite meets all requirements: single responsibility, proper setup/cleanup, timeouts, meaningful messages, codebase consistency. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies CEL validation rules for Azure API types, not scheduling constraints. No pod affinity, node selectors, tolerations, topology spread, or replica count logic is introduced. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds only YAML-based CRD validation tests to envtest suite, not Ginkgo e2e tests. Custom check applies only to new Ginkgo e2e tests (It(), Describe(), etc. in Go code). |
| No-Weak-Crypto | ✅ Passed | PR only modifies kubebuilder validation rules in azure.go; no cryptographic code or weak crypto algorithms detected. |
| Container-Privileges | ✅ Passed | PR modifies only azure.go with CEL validation rule updates; no container privilege, hostPID, hostNetwork, hostIPC, SYS_ADMIN, or allowPrivilegeEscalation settings found. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No logging statements exposing sensitive data are added. Changes involve validation rules, tests, and label handling only. |

@openshift-ci openshift-ci Bot added area/api Indicates the PR includes changes for the API area/cli Indicates the PR includes changes for CLI area/platform/azure PR/issue for Azure (AzurePlatform) platform approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-area labels May 12, 2026
@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.69%. Comparing base (6a6d8c1) to head (8716def).
⚠️ Report is 109 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8490   +/-   ##
=======================================
  Coverage   40.68%   40.69%           
=======================================
  Files         755      755           
  Lines       93368    93373    +5     
=======================================
+ Hits        37985    37994    +9     
+ Misses      52649    52646    -3     
+ Partials     2734     2733    -1     

see 15 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.70% <ø> (ø)
cpo-hostedcontrolplane 41.80% <ø> (ø)
cpo-other 41.39% <ø> (ø)
hypershift-operator 50.84% <ø> (+0.02%) ⬆️
other 31.61% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Member

@bryan-cox bryan-cox left a comment

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@enxebre enxebre force-pushed the enxebre/fix-azure-private-topology-cel branch from 59e049c to 9cafe87 Compare May 12, 2026 11:58
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 12, 2026
@enxebre enxebre force-pushed the enxebre/fix-azure-private-topology-cel branch from 9cafe87 to 723185b Compare May 12, 2026 12:53
Member

@bryan-cox bryan-cox left a comment

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@openshift-ci
Contributor

openshift-ci Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox, enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

//
// +kubebuilder:validation:XValidation:rule="self.type != 'PrivateLink' ? !has(self.privateLink) : true",message="privateLink is forbidden when type is not PrivateLink"
// +kubebuilder:validation:XValidation:rule="self.type != 'PrivateLink' || has(self.privateLink)",message="privateLink is required when type is PrivateLink"
// +union
Contributor

can't those 2 rules be combined?

// +kubebuilder:validation:XValidation:rule="(self.type == 'PrivateLink') == has(self.privateLink)",message="privateLink must be set if and only if type is PrivateLink"

Member Author

sgtm, @JoelSpeed is there any preference / convention for this? what's the impact on budget?

Contributor

The one we typically use for this is

// +kubebuilder:validation:XValidation:rule="self.type == 'PrivateLink' ? has(self.privateLink) : !has(self.privateLink)",message="privateLink is required when type is PrivateLink, and forbidden otherwise"
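
The three formulations discussed in this thread can be compared with a small truth-table check. A minimal Go sketch follows; the function names are hypothetical, and it only models the boolean logic (whether type is PrivateLink, and whether privateLink is set). It says nothing about CEL cost budget, which the thread leaves open.

```go
package main

import "fmt"

// twoRules is the conjunction of the two separate rules from the diff.
func twoRules(isPrivateLink, hasPrivateLink bool) bool {
	// self.type != 'PrivateLink' ? !has(self.privateLink) : true
	forbid := true
	if !isPrivateLink {
		forbid = !hasPrivateLink
	}
	// self.type != 'PrivateLink' || has(self.privateLink)
	require := !isPrivateLink || hasPrivateLink
	return forbid && require
}

// combinedEq models: (self.type == 'PrivateLink') == has(self.privateLink)
func combinedEq(isPrivateLink, hasPrivateLink bool) bool {
	return isPrivateLink == hasPrivateLink
}

// combinedTernary models:
// self.type == 'PrivateLink' ? has(self.privateLink) : !has(self.privateLink)
func combinedTernary(isPrivateLink, hasPrivateLink bool) bool {
	if isPrivateLink {
		return hasPrivateLink
	}
	return !hasPrivateLink
}

// allAgree reports whether the three forms give the same verdict on every input.
func allAgree() bool {
	for _, isPL := range []bool{false, true} {
		for _, hasPL := range []bool{false, true} {
			want := twoRules(isPL, hasPL)
			if combinedEq(isPL, hasPL) != want || combinedTernary(isPL, hasPL) != want {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println("all three forms agree:", allAgree())
}
```

Under this model the choice between the forms is about convention and error message wording rather than accepted inputs.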

@cwbotbot

cwbotbot commented May 12, 2026

Test Results

e2e-aks

e2e-aws

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2054184559926317056 | Cost: $2.5378969999999996 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 13, 2026
@enxebre
Member Author

enxebre commented May 13, 2026

/label tide/merge-method-squash

enxebre and others added 3 commits June 2, 2026 10:02
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… rule

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@enxebre enxebre force-pushed the enxebre/fix-azure-private-topology-cel branch from ca317d6 to 8716def Compare June 2, 2026 08:03
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Jun 2, 2026
@openshift-ci openshift-ci Bot removed lgtm Indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 2, 2026
@openshift-ci-robot

@enxebre: This pull request explicitly references no jira issue.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2026
@enxebre enxebre marked this pull request as ready for review June 2, 2026 08:04
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2026
@enxebre
Member Author

enxebre commented Jun 2, 2026

/test e2e-aks

@openshift-ci openshift-ci Bot requested review from muraee and sdminonne June 2, 2026 08:05
@devguyio
Contributor

devguyio commented Jun 2, 2026

/lgtm yolo

@muraee
Contributor

muraee commented Jun 2, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2026
@openshift-merge-bot
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@enxebre
Member Author

enxebre commented Jun 2, 2026

/pipeline required

@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2061831107124400128 | Cost: $2.8484239500000004 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@enxebre
Member Author

enxebre commented Jun 3, 2026

/retest
/verified by tests @enxebre

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 3, 2026
@openshift-ci-robot

@enxebre: This PR has been marked as verified by tests @enxebre.

@hypershift-jira-solve-ci

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

TestCreateCluster/Main/EnsureGlobalPullSecret/When_management-cluster_hostedCluster.Spec.PullSecret_is_updated_in-place_it_should_propagate_to_guest_without_rollout (1205.11s)

failed to wait for DaemonSet global-pull-secret-syncer to be ready: context deadline exceeded

DaemonSet global-pull-secret-syncer not ready: 2/3 pods ready

Summary

This is a pre-existing flaky test unrelated to PR #8490. The PR exclusively changes Azure CEL validation rules in api/hypershift/v1beta1/azure.go and CRD manifests, which have zero overlap with the TestCreateCluster/Main/EnsureGlobalPullSecret test that failed. The failure occurred because the global-pull-secret-syncer DaemonSet was stuck at 2/3 pods ready on the hosted cluster's 3 worker nodes for the entire 20-minute timeout. One pod on one of the three nodes (across AZs us-east-1a, 1b, 1c) never became ready. The same DaemonSet recovered and showed 3/3 ready in subsequent checks within the same test run, confirming the issue was a transient infrastructure condition (likely a node temporarily under pressure or a pod scheduling delay on one node). The secondary failure (Check_if_the_config.json_is_correct_in_all_of_the_nodes) is a cascade — the kubelet-config-verifier DaemonSet created in the first subtest was not cleaned up, so the second subtest fails with AlreadyExists.

Root Cause

The root cause is a transient infrastructure issue causing one global-pull-secret-syncer DaemonSet pod to remain unready on one of the three hosted cluster worker nodes. Key evidence:

  1. The DaemonSet was stuck at 2/3 pods ready from the moment it was first checked (line 2148 in build-log.txt) through the entire 20-minute deadline (line 7222), reporting 2/3 pods ready over 360 times.

  2. The DaemonSet recovered in subsequent checks — in the "first check" subtest (line 3130), the same DaemonSet initially showed a generation lag (status has not observed generation 4 yet (current 2)) but then recovered to 3/3 pods ready. Similarly, konnectivity-agent showed the same 2/3 → 3/3 pattern, suggesting one node had temporary pod readiness issues.

  3. Other DaemonSets showed similar transient 2/3 patterns: konnectivity-agent was 2/3 then recovered to 3/3 at lines 3168-3171 and again at lines 6430-6433, reinforcing that one node was transiently impacted.

  4. The PR changes are completely unrelated: PR #8490 modifies Azure CEL validation rules (azure.go) and CRD manifests. The failing test (EnsureGlobalPullSecret) operates on AWS hosted clusters and tests pull secret propagation. There is zero code path overlap.

  5. The secondary failure is a test cleanup issue: Check_if_the_config.json_is_correct_in_all_of_the_nodes fails because the kubelet-config-verifier DaemonSet created in the first (failing) subtest was not deleted before the second subtest tried to create it, resulting in HTTP 409 AlreadyExists.

Recommendations
  1. Retry / re-trigger the job — This failure is a transient infrastructure flake unrelated to the PR's code changes. A re-run should pass.

  2. Consider filing a test improvement issue — The EnsureGlobalPullSecret test has two robustness gaps:

    • The kubelet-config-verifier DaemonSet is not cleaned up between subtests, causing cascading AlreadyExists failures. The test should use CreateOrUpdate or delete-before-create semantics.
    • The 20-minute timeout for DaemonSet readiness is long but the test does not log which specific pod/node is unready, making diagnosis harder when it does time out.
  3. No code changes needed in PR #8490: The Azure CEL validation changes have no bearing on this AWS E2E test failure.
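
The create-or-update suggestion in recommendation 2 can be sketched as follows. This uses a hypothetical in-memory `fakeStore` rather than a real Kubernetes client, so the types, method names, and the `errAlreadyExists` sentinel are assumptions for illustration only; the actual test would use whatever client helpers the e2e framework provides.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical sentinel standing in for an HTTP 409 AlreadyExists.
var errAlreadyExists = errors.New("already exists")

// fakeStore is a hypothetical in-memory stand-in for the API server.
type fakeStore struct {
	objects map[string]string // object name -> spec
}

// Create fails if an object with the same name is already present.
func (s *fakeStore) Create(name, spec string) error {
	if _, ok := s.objects[name]; ok {
		return fmt.Errorf("daemonset %q: %w", name, errAlreadyExists)
	}
	s.objects[name] = spec
	return nil
}

// Update overwrites the stored spec for an existing object.
func (s *fakeStore) Update(name, spec string) {
	s.objects[name] = spec
}

// createOrUpdate tolerates a leftover object from an earlier subtest
// by falling back to an update when Create reports a conflict.
func createOrUpdate(s *fakeStore, name, spec string) error {
	err := s.Create(name, spec)
	if errors.Is(err, errAlreadyExists) {
		s.Update(name, spec)
		return nil
	}
	return err
}

func main() {
	s := &fakeStore{objects: map[string]string{}}
	_ = s.Create("kubelet-config-verifier", "v1") // leftover from the first subtest

	fmt.Println("plain create:", s.Create("kubelet-config-verifier", "v2"))
	fmt.Println("create-or-update:", createOrUpdate(s, "kubelet-config-verifier", "v2"))
}
```

Delete-before-create would work equally well in this model; the point is only that the second subtest should not assume a clean slate.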

Evidence
| Evidence | Detail |
| --- | --- |
| Failed test | TestCreateCluster/Main/EnsureGlobalPullSecret/When_management-cluster_hostedCluster.Spec.PullSecret_is_updated_in-place_it_should_propagate_to_guest_without_rollout |
| Failure mode | DaemonSet global-pull-secret-syncer stuck at 2/3 pods ready for 1205s (context deadline exceeded) |
| Cascading failure | Check_if_the_config.json_is_correct_in_all_of_the_nodes: kubelet-config-verifier DaemonSet AlreadyExists (HTTP 409) |
| Node topology | 3 NodePools (us-east-1a, 1b, 1c) × 1 replica each = 3 worker nodes |
| Recovery evidence | Same DaemonSet recovered to 3/3 in subsequent checks (build-log.txt lines 3143, 3164, 6406) |
| Similar transient issues | konnectivity-agent also showed 2/3 → 3/3 pattern (lines 3168-3171, 6430-6433) |
| PR #8490 changes | Azure CEL validation only: api/hypershift/v1beta1/azure.go, CRD manifests, envtest cases |
| PR-to-failure overlap | None: PR modifies Azure platform spec validation; failure is in AWS pull secret propagation test |
| Test statistics | 601 tests total, 596 passed, 25 skipped, 5 failures (all in same EnsureGlobalPullSecret subtree) |
| Failure duration | Primary subtest ran for 1205.11s (~20 min) before timeout |

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 860b695 and 2 for PR HEAD 8716def in total

@openshift-ci
Contributor

openshift-ci Bot commented Jun 3, 2026

@enxebre: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 7461f85 into openshift:main Jun 3, 2026
48 checks passed

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/api: Indicates the PR includes changes for the API
  • area/cli: Indicates the PR includes changes for CLI
  • area/platform/azure: PR/issue for Azure (AzurePlatform) platform
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

7 participants