Skip to content

CNTRLPLANE-3250,CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1)#8537

Open
muraee wants to merge 7 commits into
openshift:mainfrom
muraee:api-driven-azure-topology
Open

CNTRLPLANE-3250,CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1)#8537
muraee wants to merge 7 commits into
openshift:mainfrom
muraee:api-driven-azure-topology

Conversation

@muraee
Copy link
Copy Markdown
Contributor

@muraee muraee commented May 18, 2026

Summary

  • Extend AzurePrivateType enum with Swift variant and add AzureSwiftSpec struct with immutable podNetworkInstance field
  • Add per-cluster API-driven helpers (UseSwiftNetworkingHCP/HC, UseSharedIngressHCP/HC, IsAroHCPByHCP/HC) that derive behavior from Topology and Private API fields instead of isAroHCP() env-var checks
  • Replace all per-cluster isAroHCP() / UseSharedIngress() calls in visibility, router, shared ingress, infra, nodepool, and CPO v2 component controllers with API-driven helpers
  • Add PublicEndpointExposed condition type for tracking public endpoint exposure state
  • Phase 1 backward compatibility: legacy annotation fallback in UseSwiftNetworkingHCP() for unmigrated clusters

This is Phase 1 of the API-driven Azure topology and private connectivity enhancement. Phase 2 (ARO-RP data migration) and Phase 3 (legacy cleanup) are out of scope.

Key Changes

API (api/hypershift/v1beta1/azure.go)

  • AzurePrivateType: Enum=PrivateLink;Swift
  • AzureSwiftSpec: podNetworkInstance (required, immutable, max 255 chars)
  • AzurePrivateSpec: struct-level type immutability (!has(oldSelf.type) || self.type == oldSelf.type) enabling Phase 2 migration, positive/negative union member rules

Helper Functions (support/netutil/visibility.go, support/azureutil/azureutil.go)

  • UseSwiftNetworkingHCP/HC() — API field first, env var + annotation fallback
  • UseSharedIngressHCP/HC()UseSwiftNetworking && IsPublic
  • IsAroHCPByHCP/HC() — per-cluster ARO wrapper for component predicates
  • IsPrivateHCP/HC() — Phase 1 fallback for Topology="" + Swift annotation
  • LabelHCPRoutes()UseSharedIngress || UseSwiftNetworking

Controller Changes

  • Config-generator: cluster-level UseSharedIngressHC() filter (security-critical — prevents Private cluster endpoints from leaking into public HAProxy config)
  • CPO v2 components (CNO, CCM, storage, registry, ingress, KAS): IsAroHCPByHCP(cpContext.HCP) replaces env-var predicates
  • Infra: UseSwiftNetworkingHCP() for ClusterIP vs LB router service decision
  • NodePool HAProxy, network policies, KAS healthcheck: per-cluster UseSharedIngressHCP/HC()

Not Changed (management-cluster-level, kept as env-var)

  • hypershift-operator/main.go — controller startup
  • control-plane-operator/main.go — controller startup
  • cmd/cluster/core/dump.go — dump command

Test plan

  • Unit tests: visibility helpers, serialization compatibility
  • Envtest CEL validation: 10 test cases (5 onCreate, 5 onUpdate) for Swift union rules
  • E2E: existing ARO HCP tests pass (CI)
  • Manual: verify UseSharedIngressHC() correctly filters Private+Swift clusters from HAProxy config

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Azure Swift networking added as an alternative to PrivateLink.
    • New PublicEndpointExposed condition to track public API reachability via shared ingress.
  • Improvements

    • Per-cluster Swift/shared-ingress and ARO detection for more accurate behavior in mixed environments.
    • Routing, ingress, router and storage flows now respect Swift/shared-ingress decisions.
  • Behavior/Validation

    • Azure private spec supports a Swift member with exclusive-selection and immutability rules.
  • Tests

    • Added serialization compatibility and public-endpoint condition tests.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 18, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 18, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 18, 2026

@muraee: This pull request references CNTRLPLANE-3250 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Extend AzurePrivateType enum with Swift variant and add AzureSwiftSpec struct with immutable podNetworkInstance field
  • Add per-cluster API-driven helpers (UseSwiftNetworkingHCP/HC, UseSharedIngressHCP/HC, IsAroHCPByHCP/HC) that derive behavior from Topology and Private API fields instead of isAroHCP() env-var checks
  • Replace all per-cluster isAroHCP() / UseSharedIngress() calls in visibility, router, shared ingress, infra, nodepool, and CPO v2 component controllers with API-driven helpers
  • Add PublicEndpointExposed condition type for tracking public endpoint exposure state
  • Phase 1 backward compatibility: legacy annotation fallback in UseSwiftNetworkingHCP() for unmigrated clusters

This is Phase 1 of the API-driven Azure topology and private connectivity enhancement. Phase 2 (ARO-RP data migration) and Phase 3 (legacy cleanup) are out of scope.

Key Changes

API (api/hypershift/v1beta1/azure.go)

  • AzurePrivateType: Enum=PrivateLink;Swift
  • AzureSwiftSpec: podNetworkInstance (required, immutable, max 255 chars)
  • AzurePrivateSpec: struct-level type immutability (!has(oldSelf.type) || self.type == oldSelf.type) enabling Phase 2 migration, positive/negative union member rules

Helper Functions (support/netutil/visibility.go, support/azureutil/azureutil.go)

  • UseSwiftNetworkingHCP/HC() — API field first, env var + annotation fallback
  • UseSharedIngressHCP/HC()UseSwiftNetworking && IsPublic
  • IsAroHCPByHCP/HC() — per-cluster ARO wrapper for component predicates
  • IsPrivateHCP/HC() — Phase 1 fallback for Topology="" + Swift annotation
  • LabelHCPRoutes()UseSharedIngress || UseSwiftNetworking

Controller Changes

  • Config-generator: cluster-level UseSharedIngressHC() filter (security-critical — prevents Private cluster endpoints from leaking into public HAProxy config)
  • CPO v2 components (CNO, CCM, storage, registry, ingress, KAS): IsAroHCPByHCP(cpContext.HCP) replaces env-var predicates
  • Infra: UseSwiftNetworkingHCP() for ClusterIP vs LB router service decision
  • NodePool HAProxy, network policies, KAS healthcheck: per-cluster UseSharedIngressHCP/HC()

Not Changed (management-cluster-level, kept as env-var)

  • hypershift-operator/main.go — controller startup
  • control-plane-operator/main.go — controller startup
  • cmd/cluster/core/dump.go — dump command

Test plan

  • Unit tests: visibility helpers, serialization compatibility
  • Envtest CEL validation: 10 test cases (5 onCreate, 5 onUpdate) for Swift union rules
  • E2E: existing ARO HCP tests pass (CI)
  • Manual: verify UseSharedIngressHC() correctly filters Private+Swift clusters from HAProxy config

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds Azure Swift support (AzureSwiftSpec) and XValidation for AzurePrivateSpec, JSON N-1 compatibility tests, netutil helpers for per-HCP/HC Swift and shared-ingress detection, per-cluster azureutil ARO helpers, migrates many ARO/shared-ingress checks to HCP/HC-scoped predicates across control-plane and hypershift controllers, and introduces a PublicEndpointExposed condition with probing logic and unit tests.

Sequence Diagram(s)

sequenceDiagram
  participant Reconciler as HostedClusterReconciler
  participant Router as SharedIngressService
  participant Probe as ProbeSharedIngressEndpoint
  participant HC as HostedCluster
  Reconciler->>Router: fetch router Service (ClusterIP / LB status)
  Router-->>Reconciler: service IP / LB hostname
  Reconciler->>Probe: probe(serviceIP, port, kasHostname)
  Probe-->>Reconciler: probe result (success/fail)
  Reconciler->>HC: update PublicEndpointExposed condition (Status/Reason)
Loading
🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning public_endpoint_condition_test.go has 3 assertions without meaningful failure messages (lines 181, 185, 202). These bare assertions hamper debugging when failures occur. Add assertion messages to public_endpoint_condition_test.go lines 181, 185, 202. Other tests adequately follow all quality guidelines.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names across 3 test files use stable, descriptive static strings with no dynamic content like pod names, UUIDs, IPs, timestamps, or generated identifiers.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR. All test files are standard Go unit tests using the testing.T framework.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All new test files use standard Go unit testing patterns (*testing.T), not Ginkgo patterns (Describe/It/When). SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR does not introduce any scheduling constraints. Changes are API definitions, helper functions, and controller logic refactoring that maintain topology compatibility.
Ote Binary Stdout Contract ✅ Passed No OTE Binary Stdout Contract violations detected. PR adds no process-level code that writes to stdout. Test files properly structured. All logging isolated within function code.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests were added. Three new test files are standard Go unit tests using testing.T, not Ginkgo Describe/It patterns. Check not applicable.
Title check ✅ Passed The title clearly summarizes the main change: adding API-driven Azure topology and private connectivity support with Phase 1 implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: muraee
Once this PR has been reviewed and has the lgtm label, please assign csrwng for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/api Indicates the PR includes changes for the API area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/azure PR/issue for Azure (AzurePlatform) platform and removed do-not-merge/needs-area labels May 18, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
hypershift-operator/controllers/hostedcluster/network_policies.go (1)

80-87: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Delete SharedIngressNetworkPolicy when shared ingress is no longer used.

This branch only creates the policy. With per-HC gating, a true→false transition leaves a stale policy that still permits ingress from the shared-ingress namespace.

Suggested fix
 	if netutil.UseSharedIngressHC(hcluster) {
 		policy := networkpolicy.SharedIngressNetworkPolicy(controlPlaneNamespaceName)
 		if _, err := createOrUpdate(ctx, r.Client, policy, func() error {
 			return reconcileSharedIngressNetworkPolicy(policy, hcluster)
 		}); err != nil {
 			return fmt.Errorf("failed to reconcile sharedingress network policy: %w", err)
 		}
+	} else {
+		policy := networkpolicy.SharedIngressNetworkPolicy(controlPlaneNamespaceName)
+		if _, err := k8sutil.DeleteIfNeeded(ctx, r.Client, policy); err != nil {
+			return fmt.Errorf("failed to delete sharedingress network policy: %w", err)
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go` around
lines 80 - 87, The code only creates/updates SharedIngressNetworkPolicy when
netutil.UseSharedIngressHC(hcluster) is true, so when that function returns
false existing policies remain; update the logic in the enclosing function to
handle the false branch by locating the SharedIngressNetworkPolicy (use
networkpolicy.SharedIngressNetworkPolicy(controlPlaneNamespaceName) or the same
name/labels used by reconcileSharedIngressNetworkPolicy) and delete it via the
controller client (e.g., r.Client.Delete) if it exists, ignoring NotFound errors
and returning other errors; keep the existing createOrUpdate/reconcile path
unchanged for the true branch and ensure error messages reference the same
resources (SharedIngressNetworkPolicy) so failures to delete are surfaced.
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)

970-988: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clear stale Azure PLS conditions when Swift networking is active.

At Line 970, Swift-enabled clusters skip this condition sync path, so previously set AzurePrivateLinkService* conditions can linger and misreport topology state after migration or fallback changes.

Suggested fix
-	if hcluster.Spec.Platform.Type == hyperv1.AzurePlatform && !netutil.UseSwiftNetworkingHC(hcluster) {
+	if hcluster.Spec.Platform.Type == hyperv1.AzurePlatform && !netutil.UseSwiftNetworkingHC(hcluster) {
 		hcpNamespace := manifests.HostedControlPlaneNamespace(hcluster.Namespace, hcluster.Name)
 		var azPLSList hyperv1.AzurePrivateLinkServiceList
 		if err := r.List(ctx, &azPLSList, &client.ListOptions{Namespace: hcpNamespace}); err != nil {
 			condition := metav1.Condition{
 				Type:    string(hyperv1.AzurePrivateLinkServiceAvailable),
 				Status:  metav1.ConditionUnknown,
 				Reason:  hyperv1.NotFoundReason,
 				Message: fmt.Sprintf("error listing AzurePrivateLinkService in namespace %s: %v", hcpNamespace, err),
 			}
 			meta.SetStatusCondition(&hcluster.Status.Conditions, condition)
 		} else if len(azPLSList.Items) > 0 {
 			meta.SetStatusCondition(&hcluster.Status.Conditions, computeAzurePLSCondition(azPLSList, hyperv1.AzurePrivateLinkServiceAvailable))
 			meta.SetStatusCondition(&hcluster.Status.Conditions, computeAzurePLSCondition(azPLSList, hyperv1.AzurePLSCreated))
 			meta.SetStatusCondition(&hcluster.Status.Conditions, computeAzurePLSCondition(azPLSList, hyperv1.AzureInternalLoadBalancerAvailable))
 			meta.SetStatusCondition(&hcluster.Status.Conditions, computeAzurePLSCondition(azPLSList, hyperv1.AzurePrivateEndpointAvailable))
 			meta.SetStatusCondition(&hcluster.Status.Conditions, computeAzurePLSCondition(azPLSList, hyperv1.AzurePrivateDNSAvailable))
 		}
+	} else if hcluster.Spec.Platform.Type == hyperv1.AzurePlatform {
+		for _, t := range []hyperv1.ConditionType{
+			hyperv1.AzurePrivateLinkServiceAvailable,
+			hyperv1.AzurePLSCreated,
+			hyperv1.AzureInternalLoadBalancerAvailable,
+			hyperv1.AzurePrivateEndpointAvailable,
+			hyperv1.AzurePrivateDNSAvailable,
+		} {
+			meta.RemoveStatusCondition(&hcluster.Status.Conditions, string(t))
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 970 - 988, When Swift networking is active the controller currently
skips the Azure PLS sync and can leave stale AzurePrivateLinkService*
conditions; update the logic around netutil.UseSwiftNetworkingHC(hcluster) so
that when it returns true you explicitly clear/reset the Azure PLS conditions
(AzurePrivateLinkServiceAvailable, AzurePLSCreated,
AzureInternalLoadBalancerAvailable, AzurePrivateEndpointAvailable,
AzurePrivateDNSAvailable) from hcluster.Status.Conditions instead of leaving
them unchanged—use meta.RemoveStatusCondition(...) for each condition type (or
set them to a neutral/Unknown condition via meta.SetStatusCondition if you
prefer a reset message) in the hostedcluster_controller.go code path that checks
UseSwiftNetworkingHC to ensure stale state is removed.
🧹 Nitpick comments (1)
support/netutil/visibility_test.go (1)

106-118: ⚡ Quick win

Remove duplicated Azure topology table cases.

These two entries duplicate scenarios already covered earlier in baseVisibilityCases() and add noise without new behavior coverage.

Proposed cleanup
-		{
-			name:          "When Azure topology is PublicAndPrivate it should be public and private",
-			platformType:  hyperv1.AzurePlatform,
-			azureTopology: hyperv1.AzureTopologyPublicAndPrivate,
-			wantPrivate:   true,
-			wantPublic:    true,
-		},
-		{
-			name:          "When Azure topology is Private it should be private and not public",
-			platformType:  hyperv1.AzurePlatform,
-			azureTopology: hyperv1.AzureTopologyPrivate,
-			wantPrivate:   true,
-			wantPublic:    false,
-		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@support/netutil/visibility_test.go` around lines 106 - 118, Remove the
duplicated Azure topology table cases from the test table: delete the two
entries with names "When Azure topology is PublicAndPrivate it should be public
and private" and "When Azure topology is Private it should be private and not
public" in visibility_test.go because they duplicate scenarios already covered
by baseVisibilityCases(); rely on baseVisibilityCases() for those cases to avoid
redundant test rows.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 970-988: When Swift networking is active the controller currently
skips the Azure PLS sync and can leave stale AzurePrivateLinkService*
conditions; update the logic around netutil.UseSwiftNetworkingHC(hcluster) so
that when it returns true you explicitly clear/reset the Azure PLS conditions
(AzurePrivateLinkServiceAvailable, AzurePLSCreated,
AzureInternalLoadBalancerAvailable, AzurePrivateEndpointAvailable,
AzurePrivateDNSAvailable) from hcluster.Status.Conditions instead of leaving
them unchanged—use meta.RemoveStatusCondition(...) for each condition type (or
set them to a neutral/Unknown condition via meta.SetStatusCondition if you
prefer a reset message) in the hostedcluster_controller.go code path that checks
UseSwiftNetworkingHC to ensure stale state is removed.

In `@hypershift-operator/controllers/hostedcluster/network_policies.go`:
- Around line 80-87: The code only creates/updates SharedIngressNetworkPolicy
when netutil.UseSharedIngressHC(hcluster) is true, so when that function returns
false existing policies remain; update the logic in the enclosing function to
handle the false branch by locating the SharedIngressNetworkPolicy (use
networkpolicy.SharedIngressNetworkPolicy(controlPlaneNamespaceName) or the same
name/labels used by reconcileSharedIngressNetworkPolicy) and delete it via the
controller client (e.g., r.Client.Delete) if it exists, ignoring NotFound errors
and returning other errors; keep the existing createOrUpdate/reconcile path
unchanged for the true branch and ensure error messages reference the same
resources (SharedIngressNetworkPolicy) so failures to delete are surfaced.

---

Nitpick comments:
In `@support/netutil/visibility_test.go`:
- Around line 106-118: Remove the duplicated Azure topology table cases from the
test table: delete the two entries with names "When Azure topology is
PublicAndPrivate it should be public and private" and "When Azure topology is
Private it should be private and not public" in visibility_test.go because they
duplicate scenarios already covered by baseVisibilityCases(); rely on
baseVisibilityCases() for those cases to avoid redundant test rows.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d1a66cca-7f97-4534-947f-4982768bfce8

📥 Commits

Reviewing files that changed from the base of the PR and between cf2b91f and 340ccd7.

⛔ Files ignored due to path filters (42)
  • api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ImageStreamImportMode.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/TLSAdherence.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AAA_ungated.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDC.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/GCPPlatform.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HCPEtcdBackup.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ImageStreamImportMode.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/KMSEncryptionProvider.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/OpenStack.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/TLSAdherence.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • client/applyconfiguration/hypershift/v1beta1/azureprivatespec.go is excluded by !client/**
  • client/applyconfiguration/hypershift/v1beta1/azureswiftspec.go is excluded by !client/**
  • client/applyconfiguration/utils.go is excluded by !client/**
  • cmd/install/assets/crds/hypershift-operator/tests/hostedclusters.hypershift.openshift.io/stable.hostedclusters.azure.testsuite.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/azure.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*
📒 Files selected for processing (35)
  • api/hypershift/v1beta1/azure.go
  • api/hypershift/v1beta1/azure_private_spec_test.go
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
  • control-plane-operator/controllers/hostedcontrolplane/ingress/router.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/healthcheck.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/azure/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/azure/config.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/azure/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cno/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cno/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/controlplaneoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/controlplaneoperator/role.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/registryoperator/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/registryoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/router/config.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/router/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/router/util/util.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/storage/azure.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/storage/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/storage/deployment.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/network_policies.go
  • hypershift-operator/controllers/nodepool/apiserver-haproxy/haproxy.go
  • sharedingress-config-generator/config.go
  • support/azureutil/azureutil.go
  • support/netutil/visibility.go
  • support/netutil/visibility_test.go

@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 May 18, 2026 15:37 Inactive
@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 67.40088% with 74 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.47%. Comparing base (f13c62d) to head (189a75a).

Files with missing lines Patch % Lines
...trollers/hostedcluster/hostedcluster_controller.go 66.66% 27 Missing and 6 partials ⚠️
...controllers/hostedcontrolplane/v2/storage/azure.go 0.00% 6 Missing ⚠️
...lplane/v2/cloud_controller_manager/azure/config.go 0.00% 3 Missing and 2 partials ⚠️
support/netutil/visibility.go 94.02% 3 Missing and 1 partial ⚠️
...ne/v2/cloud_controller_manager/azure/deployment.go 0.00% 3 Missing ⚠️
...rconfigoperator/controllers/resources/resources.go 25.00% 2 Missing and 1 partial ⚠️
...ane/v2/cloud_controller_manager/azure/component.go 0.00% 2 Missing ⚠️
...controlplane/v2/controlplaneoperator/deployment.go 0.00% 2 Missing ⚠️
...rollers/hostedcontrolplane/v2/router/deployment.go 0.00% 2 Missing ⚠️
...ostedcontrolplane/hostedcontrolplane_controller.go 0.00% 1 Missing ⚠️
... and 13 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8537      +/-   ##
==========================================
+ Coverage   41.43%   41.47%   +0.04%     
==========================================
  Files         756      756              
  Lines       93647    93796     +149     
==========================================
+ Hits        38802    38906     +104     
- Misses      52124    52161      +37     
- Partials     2721     2729       +8     
Files with missing lines Coverage Δ
...ator/controllers/hostedcontrolplane/infra/infra.go 50.54% <100.00%> (ø)
.../controllers/hostedcontrolplane/kas/healthcheck.go 100.00% <100.00%> (ø)
...ator/controllers/hostedcontrolplane/kas/service.go 40.36% <100.00%> (ø)
...trollers/hostedcontrolplane/v2/router/util/util.go 100.00% <100.00%> (ø)
...ator/controllers/hostedcluster/network_policies.go 77.19% <100.00%> (ø)
.../controllers/nodepool/apiserver-haproxy/haproxy.go 62.72% <100.00%> (ø)
...erator/controllers/nodepool/nodepool_controller.go 43.06% <100.00%> (-0.07%) ⬇️
sharedingress-config-generator/config.go 75.38% <100.00%> (-6.65%) ⬇️
support/azureutil/azureutil.go 44.93% <100.00%> (+0.70%) ⬆️
...ostedcontrolplane/hostedcontrolplane_controller.go 45.70% <0.00%> (ø)
... and 22 more
Flag Coverage Δ
cmd-support 34.98% <94.36%> (+0.10%) ⬆️
cpo-hostedcontrolplane 43.50% <24.44%> (ø)
cpo-other 42.74% <25.00%> (ø)
hypershift-operator 51.62% <68.57%> (+0.05%) ⬆️
other 31.56% <100.00%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 5360-5362: The lbReady calculation only checks
svc.Status.LoadBalancer.Ingress[0] and can miss readiness on other ingress
entries; change the lbReady logic (where lbReady is computed) to iterate over
svc.Status.LoadBalancer.Ingress and set lbReady true if any ingress element has
a non-empty Hostname or IP, preserving the existing short-circuit with
netutil.UseSharedIngressHC(hc) usage and any surrounding logic that depends on
lbReady.
- Around line 2092-2096: The current code swallows errors from
reconcilePublicEndpointExposedCondition by only logging them; instead propagate
the error so the controller-runtime retries properly. In the reconcile loop
where reconcilePublicEndpointExposedCondition(ctx, hcluster) is called, change
the error branch to return the error (e.g., return ctrl.Result{}, err) rather
than just logging, and keep the existing logic that updates requeueAfter when
pubEndpointRequeue is non-nil; reference
reconcilePublicEndpointExposedCondition, pubEndpointRequeue, and requeueAfter to
locate and update the error-handling path.
- Around line 5342-5358: The early returns when kasStrategy is nil or missing
route info and when the shared ingress Service is missing or has no ClusterIP
leave the PublicEndpointExposed condition stale; change these code paths (the
checks around kasStrategy/kasHostname and the svc/svcKey lookup and clusterIP
check) to explicitly set the HostedCluster status condition
PublicEndpointExposed to False (with a clear Reason like "NoSharedIngress" or
"RouteNotReady"), persist the status update before returning, and return a
non-error requeue (so the controller will reconcile again) instead of silently
returning nil; ensure you use the existing status update helpers in this
controller to update the condition so retries and removal transitions observe
the explicit state.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5724ac97-9b6c-412c-88a0-22faceb6d3b5

📥 Commits

Reviewing files that changed from the base of the PR and between 340ccd7 and 6448520.

⛔ Files ignored due to path filters (1)
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (2)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/hypershift/v1beta1/hostedcluster_conditions.go

@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 May 18, 2026 16:41 Inactive
@muraee muraee force-pushed the api-driven-azure-topology branch from 6448520 to a4756a0 Compare May 18, 2026 16:47
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 May 18, 2026 16:50 Inactive
@muraee muraee force-pushed the api-driven-azure-topology branch from a4756a0 to 2b04196 Compare May 18, 2026 16:52
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 May 18, 2026 16:55 Inactive
@muraee muraee force-pushed the api-driven-azure-topology branch from 2b04196 to 589ce0d Compare May 18, 2026 17:06
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 May 18, 2026 17:09 Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
control-plane-operator/controllers/hostedcontrolplane/infra/infra.go (1)

458-464: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the standard router service reconciliation in the Swift path.

This branch bypasses ingress.ReconcileRouterService and only mutates a few Service fields, so the private router service no longer gets the usual ownership/metadata setup and can be orphaned on HostedControlPlane deletion.

One way to keep the existing contract
 	if netutil.UseSwiftNetworkingHCP(hcp) {
 		if _, err := k8sutil.DeleteIfNeeded(ctx, r.Client, pubSvc); err != nil {
 			return fmt.Errorf("failed to delete public router service: %w", err)
 		}
 		if _, err := createOrUpdate(ctx, r.Client, privSvc, func() error {
-			return reconcileAROPrivateRouterService(privSvc, hcp)
+			if err := ingress.ReconcileRouterService(privSvc, true, true, hcp); err != nil {
+				return err
+			}
+			return reconcileAROPrivateRouterService(privSvc, hcp)
 		}); err != nil {
 			return fmt.Errorf("failed to reconcile private router service: %w", err)
 		}

As per coding guidelines, "Follow owner reference patterns for proper garbage collection of resources".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control-plane-operator/controllers/hostedcontrolplane/infra/infra.go` around
lines 458 - 464, The Swift networking branch skips the standard
ingress.ReconcileRouterService flow and instead manually mutates the Service
(privSvc), losing ownership/metadata so the Service can be orphaned on
HostedControlPlane deletion; restore the normal contract by invoking
ingress.ReconcileRouterService (or reusing its logic) for the router Services in
the netutil.UseSwiftNetworkingHCP(hcp) path (rather than only calling
reconcileAROPrivateRouterService), and ensure you use
createOrUpdate/DeleteIfNeeded around pubSvc/privSvc while preserving owner
references and standard metadata so the HostedControlPlane becomes the
controller for garbage collection.
♻️ Duplicate comments (2)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (2)

5365-5366: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Evaluate all LoadBalancer ingress entries for readiness.

At Line 5365-5366, readiness checks only Ingress[0]. If another ingress entry has IP/hostname, this produces false negatives.

💡 Suggested fix
- lbReady := len(svc.Status.LoadBalancer.Ingress) > 0 &&
-   (svc.Status.LoadBalancer.Ingress[0].Hostname != "" || svc.Status.LoadBalancer.Ingress[0].IP != "")
+ lbReady := false
+ for _, ing := range svc.Status.LoadBalancer.Ingress {
+   if ing.Hostname != "" || ing.IP != "" {
+     lbReady = true
+     break
+   }
+ }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 5365 - 5366, The lbReady check currently only inspects
svc.Status.LoadBalancer.Ingress[0], causing false negatives; change the logic
that sets lbReady to scan all entries in svc.Status.LoadBalancer.Ingress and set
lbReady to true if any entry has a non-empty Hostname or IP. Update the code
that assigns lbReady (reference variable lbReady and the slice
svc.Status.LoadBalancer.Ingress) to iterate over the slice (handle nil/empty
slice as false) and break early when an ingress with Hostname != "" or IP != ""
is found. Ensure the new logic preserves existing behavior when the ingress list
is empty.

5347-5363: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set PublicEndpointExposed explicitly on missing prerequisites.

At Line 5348 and Line 5356-5362, the function returns without updating condition state. That can leave PublicEndpointExposed stale during convergence/removal transitions. Set it to False with a transition reason and requeue when shared ingress is expected.

💡 Suggested fix direction
- if kasStrategy == nil || kasStrategy.Route == nil || kasStrategy.Route.Hostname == "" {
-   return nil, nil
- }
+ if kasStrategy == nil || kasStrategy.Route == nil || kasStrategy.Route.Hostname == "" {
+   cond := metav1.Condition{
+     Type:               string(hyperv1.PublicEndpointExposed),
+     Status:             metav1.ConditionFalse,
+     Reason:             hyperv1.PublicEndpointConvergenceInProgressReason,
+     Message:            "API server route hostname is not yet available",
+     ObservedGeneration: hc.Generation,
+   }
+   if meta.SetStatusCondition(&hc.Status.Conditions, cond) {
+     if err := r.Client.Status().Update(ctx, hc); err != nil {
+       return nil, fmt.Errorf("failed to update PublicEndpointExposed condition: %w", err)
+     }
+   }
+   d := 10 * time.Second
+   return &d, nil
+ }

As per coding guidelines, **/{hypershift-operator,control-plane-operator}/controllers/**/*.go: Follow standard controller-runtime patterns for operator controllers with proper error handling and requeuing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 5347 - 5363, The controller early-returns when shared ingress is
missing (checks using ServicePublishingStrategyByTypeByHC and the Service
fetched via r.Client.Get) without updating the HostedCluster condition
PublicEndpointExposed; modify the control flow so that when kasStrategy is nil
or missing Route/Hostname, or when the shared ingress Service is not found or
has no ClusterIP, you explicitly set the HostedCluster's PublicEndpointExposed
condition to False with an appropriate transition reason/message (e.g.,
"SharedIngressUnavailable"/"MissingServiceClusterIP"), persist the status
update, and requeue the reconciliation (rather than returning nil,nil) so state
converges during creation/removal; ensure updates occur where kasStrategy, svc
retrieval (r.Client.Get) and svc.Spec.ClusterIP checks currently return early.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/hypershift/v1beta1/azure.go`:
- Around line 665-680: AzurePrivateSpec currently allows AzurePrivateTypeSwift
even when the cluster is self-managed; add an XValidation rule to gate Swift to
managed Azure auth by ensuring when AzurePrivateType == "Swift" the
azureAuthenticationConfigType is the managed value (e.g. not
"WorkloadIdentities" / must equal "Managed"). Concretely, add a
kubebuilder:validation:XValidation rule to the AzurePrivateSpec similar to the
existing rules that enforces self.type != 'Swift' || has(self.swift) AND
self.type != 'Swift' || (self.azureAuthenticationConfigType == 'Managed') (or
equivalent != 'WorkloadIdentities') so Swift is only valid for managed-auth
clusters; reference AzurePrivateTypeSwift, AzurePrivateSpec, and
azureAuthenticationConfigType when making the change.

---

Outside diff comments:
In `@control-plane-operator/controllers/hostedcontrolplane/infra/infra.go`:
- Around line 458-464: The Swift networking branch skips the standard
ingress.ReconcileRouterService flow and instead manually mutates the Service
(privSvc), losing ownership/metadata so the Service can be orphaned on
HostedControlPlane deletion; restore the normal contract by invoking
ingress.ReconcileRouterService (or reusing its logic) for the router Services in
the netutil.UseSwiftNetworkingHCP(hcp) path (rather than only calling
reconcileAROPrivateRouterService), and ensure you use
createOrUpdate/DeleteIfNeeded around pubSvc/privSvc while preserving owner
references and standard metadata so the HostedControlPlane becomes the
controller for garbage collection.

---

Duplicate comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 5365-5366: The lbReady check currently only inspects
svc.Status.LoadBalancer.Ingress[0], causing false negatives; change the logic
that sets lbReady to scan all entries in svc.Status.LoadBalancer.Ingress and set
lbReady to true if any entry has a non-empty Hostname or IP. Update the code
that assigns lbReady (reference variable lbReady and the slice
svc.Status.LoadBalancer.Ingress) to iterate over the slice (handle nil/empty
slice as false) and break early when an ingress with Hostname != "" or IP != ""
is found. Ensure the new logic preserves existing behavior when the ingress list
is empty.
- Around line 5347-5363: The controller early-returns when shared ingress is
missing (checks using ServicePublishingStrategyByTypeByHC and the Service
fetched via r.Client.Get) without updating the HostedCluster condition
PublicEndpointExposed; modify the control flow so that when kasStrategy is nil
or missing Route/Hostname, or when the shared ingress Service is not found or
has no ClusterIP, you explicitly set the HostedCluster's PublicEndpointExposed
condition to False with an appropriate transition reason/message (e.g.,
"SharedIngressUnavailable"/"MissingServiceClusterIP"), persist the status
update, and requeue the reconciliation (rather than returning nil,nil) so state
converges during creation/removal; ensure updates occur where kasStrategy, svc
retrieval (r.Client.Get) and svc.Spec.ClusterIP checks currently return early.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 16b9a6b0-a8fd-4187-9d8f-bea523d0ff8a

📥 Commits

Reviewing files that changed from the base of the PR and between 2b04196 and 589ce0d.

⛔ Files ignored due to path filters (42)
  • api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ImageStreamImportMode.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/TLSAdherence.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AAA_ungated.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDC.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/GCPPlatform.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HCPEtcdBackup.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ImageStreamImportMode.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/KMSEncryptionProvider.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/OpenStack.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/TLSAdherence.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • client/applyconfiguration/hypershift/v1beta1/azureprivatespec.go is excluded by !client/**
  • client/applyconfiguration/hypershift/v1beta1/azureswiftspec.go is excluded by !client/**
  • client/applyconfiguration/utils.go is excluded by !client/**
  • cmd/install/assets/crds/hypershift-operator/tests/hostedclusters.hypershift.openshift.io/stable.hostedclusters.azure.testsuite.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/azure.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*
📒 Files selected for processing (36)
  • api/hypershift/v1beta1/azure.go
  • api/hypershift/v1beta1/azure_private_spec_test.go
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
  • control-plane-operator/controllers/hostedcontrolplane/ingress/router.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/healthcheck.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/azure/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/azure/config.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/azure/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cno/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cno/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/controlplaneoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/controlplaneoperator/role.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/registryoperator/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/registryoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/router/config.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/router/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/router/util/util.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/storage/azure.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/storage/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/storage/deployment.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/network_policies.go
  • hypershift-operator/controllers/hostedcluster/public_endpoint_condition_test.go
  • hypershift-operator/controllers/nodepool/apiserver-haproxy/haproxy.go
  • sharedingress-config-generator/config.go
  • support/azureutil/azureutil.go
  • support/netutil/visibility.go
  • support/netutil/visibility_test.go
🚧 Files skipped from review as they are similar to previous changes (26)
  • control-plane-operator/controllers/hostedcontrolplane/v2/registryoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/controlplaneoperator/role.go
  • hypershift-operator/controllers/hostedcluster/network_policies.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/router/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cno/component.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/healthcheck.go
  • sharedingress-config-generator/config.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/router/config.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • hypershift-operator/controllers/nodepool/apiserver-haproxy/haproxy.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/controlplaneoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/ingress/router.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/azure/config.go
  • support/azureutil/azureutil.go
  • hypershift-operator/controllers/hostedcluster/public_endpoint_condition_test.go
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/registryoperator/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/azure/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/storage/azure.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
  • api/hypershift/v1beta1/azure_private_spec_test.go
  • support/netutil/visibility.go

Comment thread api/hypershift/v1beta1/azure.go Outdated
@muraee muraee force-pushed the api-driven-azure-topology branch from 589ce0d to 9a3435a Compare May 19, 2026 10:10
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 May 19, 2026 10:18 Inactive
@muraee muraee force-pushed the api-driven-azure-topology branch from 8a0850e to 5e9c12c Compare May 19, 2026 15:17
@muraee
Copy link
Copy Markdown
Contributor Author

muraee commented Jun 3, 2026

/test e2e-aks

@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 June 3, 2026 14:29 Inactive
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 3, 2026
@muraee muraee force-pushed the api-driven-azure-topology branch 2 times, most recently from cb1a5ee to 85d7b5b Compare June 3, 2026 18:09
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 3, 2026
@muraee muraee force-pushed the api-driven-azure-topology branch 2 times, most recently from 029b997 to c307083 Compare June 3, 2026 18:16
@muraee
Copy link
Copy Markdown
Contributor Author

muraee commented Jun 3, 2026

/test e2e-aks
/test e2e-azure-self-managed

@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 June 3, 2026 18:23 Inactive
Copy link
Copy Markdown
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Withdrawing my previous ratcheting concern: PR #8490 (now merged) introduced the PrivateLink ternary change. This PR only adds the Swift ternary (new field, nothing to ratchet) and loosens the type immutability rule. No validation tightening here.

Three remaining items on the API side, all on new code in this PR.

Comment thread api/hypershift/v1beta1/azure.go Outdated
Comment thread api/hypershift/v1beta1/azure.go Outdated
Comment thread api/hypershift/v1beta1/azure.go
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 June 5, 2026 10:09 Inactive
@muraee muraee force-pushed the api-driven-azure-topology branch from eee04ec to 080c56a Compare June 5, 2026 11:32
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 June 5, 2026 11:38 Inactive
@muraee muraee force-pushed the api-driven-azure-topology branch from 080c56a to 8323431 Compare June 5, 2026 12:33
muraee and others added 6 commits June 5, 2026 14:35
…ruct

Add Swift as a new AzurePrivateType enum value and introduce
AzureSwiftSpec struct with topology and pod network instance fields.
Add PublicEndpointExposed condition type for shared ingress
convergence tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add per-cluster helper functions that derive topology behavior from
the Topology and Private API fields instead of isAroHCP() env-var checks:

- UseSwiftNetworkingHCP/HC: checks Private.Type == Swift with legacy
  annotation fallback for unmigrated clusters
- UseSharedIngressHCP/HC: UseSwiftNetworking && IsPublic, with env-var
  fallback for ARO HCP without Swift API fields (e.g. CI)
- SwiftPodNetworkInstanceHCP: API field first, annotation fallback
- IsAroHCPByHCP/HC: checks ManagedIdentities auth type as definitive
  per-cluster ARO HCP indicator, decoupled from Swift detection

Update IsPrivateHCP/HC with Phase 1 fallback for Topology="" + Swift.
Update LabelHCPRoutes to use UseSharedIngress || UseSwiftNetworking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace azureutil.IsAroHCP() env-var checks with API-driven per-cluster
helpers across all CPO controllers:

- v2 components: CNO, CCM, storage, registry operator, ingress operator
  predicates and adapt functions now use IsAroHCPByHCP(cpContext.HCP)
- KAS, CPO deployment, router config adapt functions
- Router deployment: SwiftPodNetworkInstanceHCP instead of annotation read
- UseHCPRouter: UseSwiftNetworkingHCP for Swift routing, with
  UseSharedIngressHCP guard for shared ingress clusters
- reconcileHCPRouterServices: UseSwiftNetworkingHCP for ClusterIP vs LB
- reconcileExternalRouterServiceStatus: UseSharedIngressHCP
- reconcileAPIServerServiceStatus: UseSharedIngressHCP
- KAS healthcheck: UseSharedIngressHCP
- HCCO credential and platform resource reconciliation
- verifyResourceGroupLocationsMatch: reverted to IsAroHCP() env var
  (management-cluster-level check, not per-cluster)

Management-cluster-level calls in main.go are intentionally kept.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace env-var and parameterless checks with per-cluster API-driven
helpers in HO controllers:

- Config-generator: add UseSharedIngressHC cluster-level filter to skip
  Private clusters entirely from HAProxy config (security-critical)
- NodePool HAProxy: UseSharedIngressHC for worker node backend selection
- NodePool controller: UseSharedIngressHC for HAProxy image selection,
  preventing unnecessary rollouts on Private+Swift clusters
- Network policies: UseSharedIngressHC for shared ingress policy
- HostedCluster controller: UseSwiftNetworkingHC and IsAroHCPByHC
  for PrivateLink validation, SecretProviderClass, and hosted resources

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add reconciliation logic for the PublicEndpointExposed condition on
HostedCluster. The HC controller probes the shared ingress Service's
ClusterIP with the cluster's KAS hostname as SNI to determine whether
public API server endpoints are actually reachable.

The probe function is injectable via ProbeSharedIngressEndpoint field
on the reconciler for unit testing.

Condition truth table (observed state):
- LB not ready + UseSharedIngress: False/ConvergenceInProgress
- Probe success + UseSharedIngress: True/SharedIngressConfigured
- Probe failure + UseSharedIngress: False/ConvergenceInProgress
- Probe success + !UseSharedIngress: True/ConvergenceInProgress
- Probe failure + !UseSharedIngress: False/TopologyPrivate

Convergence cases requeue after 10s to re-check. Only set on Swift
clusters; non-Swift clusters (PrivateLink) don't get this condition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@muraee muraee force-pushed the api-driven-azure-topology branch from 8323431 to 81c25e6 Compare June 5, 2026 12:35
@muraee
Copy link
Copy Markdown
Contributor Author

muraee commented Jun 5, 2026

/test e2e-aks
/test e2e-azure-self-managed

@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8537 June 5, 2026 12:45 Inactive
Update test objects across CPO and HO controllers to use Swift API
fields (Private.Type=Swift, Topology) for API-driven Swift detection.
Add unit tests for UseSwiftNetworkingHCP/HC, UseSharedIngressHCP/HC,
IsAroHCPByHCP/HC, and SwiftPodNetworkInstanceHCP covering API field,
annotation fallback, env-var fallback, and negative cases.

Split infra test cases into annotation-only, API-only, and both
variants for comprehensive Swift detection coverage.

Update test fixtures to set AzureAuthenticationConfigType=ManagedIdentities
when simulating ARO HCP clusters.

Add envtest test suite for Azure Swift topology validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@muraee muraee force-pushed the api-driven-azure-topology branch from 81c25e6 to 189a75a Compare June 5, 2026 12:49
@muraee
Copy link
Copy Markdown
Contributor Author

muraee commented Jun 5, 2026

/test e2e-aks
/test e2e-azure-self-managed

@hypershift-jira-solve-ci
Copy link
Copy Markdown

hypershift-jira-solve-ci Bot commented Jun 5, 2026

Now I have all the information needed. Let me compile the final report:

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-images, pull-ci-openshift-hypershift-main-e2e-azure-self-managed, pull-ci-openshift-hypershift-main-e2e-aks, pull-ci-openshift-hypershift-main-security
  • Build IDs: 2062876016212185088 (images), 2062876057182146560 (e2e-azure-self-managed), 2062876057131814912 (e2e-aks), 2062876016535146496 (security)
  • PR: CNTRLPLANE-3250,CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1) #8537 — CNTRLPLANE-3250, CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1)
  • Failed Commit: 81c25e6ff315682002ee5743df7283675bd1a2a9 (superseded by force-push to 189a75a9b52f)
  • Status: ✅ Already fixed — new CI runs triggered on corrected HEAD are passing

Test Failure Analysis

Error

# github.com/openshift/hypershift/hypershift-operator/controllers/hostedcluster
hypershift-operator/controllers/hostedcluster/network_policies.go:82:5: undefined: netutil

Summary

All 9 failing CI jobs were caused by a single compilation error: a missing import statement for the netutil package in network_policies.go. Commit eb9d9f95 in the PR changed sharedingress.UseSharedIngress() to netutil.UseSharedIngressHC(hcluster) at line 82, but did not add the corresponding import "github.com/openshift/hypershift/support/netutil" statement. This Go compilation error blocked every job that builds the hypershift-operator or hypershift-tests binary — images, e2e tests, lint, verify, unit tests, and Konflux builds all failed identically. The author has already force-pushed a corrected version (189a75a9b52f) that includes the import, and the re-triggered CI jobs are now passing.

Root Cause

The PR refactors ~30 call sites across the codebase to replace environment-variable-based ARO HCP detection (sharedingress.UseSharedIngress(), isAroHCP()) with new API-driven per-cluster helpers in the support/netutil package (netutil.UseSharedIngressHC(), netutil.IsAroHCPByHCP(), etc.).

In commit eb9d9f95 ("refactor(ho): replace isAroHCP/UseSharedIngress with API-driven helpers"), the file hypershift-operator/controllers/hostedcluster/network_policies.go was modified at line 82 to call netutil.UseSharedIngressHC(hcluster), but the required import line was omitted:

// MISSING from the import block:
"github.com/openshift/hypershift/support/netutil"

This caused the Go compiler to emit undefined: netutil, which is a fatal build error. Since the hostedcluster package is imported transitively by the hypershift-operator binary, the hypershift-tests binary, and all test packages that depend on them, this single missing import cascaded into failures across every CI job:

Job Failure Mode
ci/prow/images hypershift-operator-amd64 and hypershift-tests-amd64 image builds failed
ci/prow/e2e-azure-self-managed Build phase failed before tests could run
ci/prow/e2e-aks Build phase failed before tests could run
ci/prow/security Build phase failed / job terminated
lint / Lint typecheck linter reported undefined: netutil
verify / Verify staticcheck failed with compilation error
Unit Tests (hypershift-operator) 3 test shards reported [build failed]
Konflux / hypershift-operator-main Container image build failed
Konflux / hypershift-release-mce-50 Container image build failed

The fix was a one-line addition of the import statement, included in the force-pushed commit set now at PR HEAD 189a75a9b52f.

Recommendations
  1. No action required — the author has already fixed the issue by force-pushing a corrected commit set. The re-triggered CI jobs (security, verify-deps, okd-scos-images) are already passing, and the remaining jobs (images, e2e-azure, e2e-aks) are pending completion.

  2. Prevention: Consider adding a local pre-push check (e.g., go build ./...) to catch compilation errors before pushing. The goimports tool would have also caught the missing import automatically.

  3. Review note: When reviewing large refactoring PRs that change import dependencies across many files (this PR touches 92 files), verify that every file using a new package has the corresponding import statement — especially in files where only the function call changed but not the import block.

Evidence
Evidence Detail
Error location hypershift-operator/controllers/hostedcluster/network_policies.go:82:5
Error message undefined: netutil — Go compiler cannot resolve the netutil identifier
Offending commit eb9d9f95 — changed sharedingress.UseSharedIngress()netutil.UseSharedIngressHC(hcluster) without adding import
Failed SHA 81c25e6ff315682002ee5743df7283675bd1a2a9 (PR HEAD at time of failed runs)
Fixed SHA 189a75a9b52fa6b6cd0417d7665e58588c3a3b4f (current PR HEAD with import added)
Fix diff + "github.com/openshift/hypershift/support/netutil" added to import block at line 18
images build-log make: *** [Makefile:190: hypershift-operator] Error 1 and make: *** [Makefile:477: reqserving-e2e] Error 1
lint log hypershift-operator/controllers/hostedcluster/network_policies.go:82:5: undefined: netutil (typecheck)
verify log hypershift-operator/controllers/hostedcluster/network_policies.go:82:5: undefined: netutil (compile)
unit test log FAIL github.com/openshift/hypershift/hypershift-operator/controllers/hostedcluster [build failed]
New security job Build ID 2062879479046344704 — status: success
Jobs affected All 9 jobs failed due to same single compilation error

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

@muraee: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@zgalor
Copy link
Copy Markdown

zgalor commented Jun 7, 2026

Testing Report — Swift Private Type

Tested by: zgalor
Date: 2026-06-07
Environment: ARO-HCP personal dev env (pers-usw3zgal-mgmt-1, westus3)

Setup

  • Built HyperShift operator image from PR commit 189a75a9 and deployed to management cluster
  • Updated HyperShift CRDs (HostedCluster, HostedControlPlane) from PR branch to include Swift enum and AzureSwiftSpec struct
  • Installed secret-sync-controller (CRD + controller) — was missing from personal dev env
  • Updated CS code to set Private.Type=Swift with swift.podNetworkInstance field

Test Clusters

Cluster api.listening Topology Private.Type Swift.PodNetworkInstance
zg-public-2 external PublicAndPrivate Swift pni-2qpf83u3mimal92n21fo0qqodmiijn8p
zg-private-2 internal Private Swift pni-2qpf83tlu4i60c8pn91b5mbp5dql6jqe

Verified

  1. CRD validation accepts Swift type — HostedCluster CRs with private.type: Swift and swift.podNetworkInstance pass CRD validation and are applied successfully.

  2. Topology mapping is correctapi.listening=external maps to PublicAndPrivate, api.listening=internal maps to Private.

  3. Union discriminator enforced — CRD correctly requires swift sub-field when type=Swift and rejects it without. Error: swift is required when type is Swift, and forbidden otherwise.

  4. Control planes fully bootstrapped — Both clusters progressed through etcd, KAS (3 replicas), kube-controller-manager, openshift-apiserver, etc. Public cluster reached Available: True.

  5. KAS visibility enforced correctly:

    • Public cluster (PublicAndPrivate): KAS reachable via public endpoint — curl https://api-int.zg-public-2.../healthz returns HTTP 200 ok
    • Private cluster (Private): KAS NOT reachable via public endpoint — curl https://api-int.zg-private-2.../healthz returns connection failure (HTTP 000)
    • Both DNS names resolve to the same shared ingress IP, confirming the routing differentiation happens at the ingress layer, not DNS.

Suspected Bug: Konnectivity Route Hostname Not Auto-Populated

Symptom: Both clusters were stuck on InfrastructureReady: False with message "Cluster infrastructure is still provisioning" indefinitely.

Root cause: The konnectivity-server route in the HCP namespace was created without a spec.host. The CPO's infrastructure readiness check (reconcileKonnectivityServiceStatus) reads this route and returns empty host/port, causing InfrastructureStatus.IsReady() to return false. This blocked the entire control plane bootstrapping (etcd, KAS, etc.).

Context: For Swift-enabled clusters, the Cluster Service intentionally omits the Konnectivity hostname from the HostedCluster's spec.services[].route.hostname, expecting HyperShift to auto-populate it. The CPO's reconcileKonnectivityServerService calls ReconcileKonnectivityInternalRouteReconcileInternalRoute, which should set route.Spec.Host = "konnectivity-server.apps.<cluster>.hypershift.local" when the host is empty. The route was updated by the CPO (annotations were added) but the host was NOT set, suggesting the createOrUpdate mutation is not applying the host change.

Workaround: Manually patched the konnectivity route:

kubectl patch route konnectivity-server -n <hcp-namespace> \
  --type=merge -p '{"spec":{"host":"konnectivity-server.apps.<cluster-name>.hypershift.local"}}'

After patching, InfrastructureReady immediately became True and control planes bootstrapped within ~2 minutes.

Same pattern for Ignition route: The ignition-server route also has no hostname for Swift clusters. This didn't block infrastructure readiness but may cause issues later.

Recommendation: Investigate why the CPO's ReconcileInternalRoute is not setting route.Spec.Host despite the code path appearing correct. This may be a server-side apply or createOrUpdate diffing issue where the empty-to-populated host change is not detected.

Additional Setup Notes

  • The HyperShift CRDs need to be applied from the PR branch — deploying only the operator image is not sufficient since the old CRDs don't include the Swift enum.
  • The SecretSync CRD (secret-sync.x-k8s.io/v1alpha1) must be installed for ManifestWork resources to apply successfully.
  • After installing new CRDs, the klusterlet work agent may need a pod restart to re-discover the new API resources.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api Indicates the PR includes changes for the API area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/azure PR/issue for Azure (AzurePlatform) platform jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants