Skip to content

ROSAENG-8224: feat(operator): use MC machine network CIDR to block HCP egress to management KAS#8689

Draft
Ajpantuso wants to merge 1 commit into
openshift:mainfrom
Ajpantuso:apantuso/rosaeng-8224
Draft

ROSAENG-8224: feat(operator): use MC machine network CIDR to block HCP egress to management KAS#8689
Ajpantuso wants to merge 1 commit into
openshift:mainfrom
Ajpantuso:apantuso/rosaeng-8224

Conversation

@Ajpantuso
Copy link
Copy Markdown

@Ajpantuso Ajpantuso commented Jun 5, 2026

What this PR does / why we need it:

Adds a --use-mc-machine-network-for-kas-block flag to the HyperShift Operator.

When enabled, the operator reads the management cluster's machine network
CIDR(s) from kube-system/cluster-config-v1 (the install-config key) and
uses them as the Except entries in the egress IPBlock rules of the
private-router and management-kas NetworkPolicies, instead of the
per-pod /32 CIDRs derived from the default/kubernetes Endpoints object.

If the ConfigMap is absent or unparseable, the operator falls back to the
existing kasEndpointsToCIDRs() behavior and logs a warning.

The flag is wired end-to-end: hypershift install --use-mc-machine-network-for-kas-block
sets it on the operator Deployment.

This PR will remain in draft until live testing is completed to ensure no regressions.

Which issue(s) this PR fixes:

Fixes ROSAENG-8224

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • New Features

    • Added a new --use-mc-machine-network-for-kas-block flag (disabled by default) to allow using management-cluster machine-network CIDRs for KAS network policy egress exception lists; when enabled, KAS and private-router egress exceptions prefer machine-network CIDRs.
  • Tests

    • Added tests validating CIDR selection behavior for KAS egress exceptions with the new flag.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 5, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 5, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Jun 5, 2026

@Ajpantuso: This pull request references ROSAENG-8224 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Adds a --use-mc-machine-network-for-kas-block flag to the HyperShift Operator.

When enabled, the operator reads the management cluster's machine network
CIDR(s) from kube-system/cluster-config-v1 (the install-config key) and
uses them as the Except entries in the egress IPBlock rules of the
private-router and management-kas NetworkPolicies, instead of the
per-pod /32 CIDRs derived from the default/kubernetes Endpoints object.

If the ConfigMap is absent or unparseable, the operator falls back to the
existing kasEndpointsToCIDRs() behavior and logs a warning.

The flag is wired end-to-end: hypershift install --use-mc-machine-network-for-kas-block
sets it on the operator Deployment.

This PR will remain in draft until live testing is completed to ensure no regressions.

Which issue(s) this PR fixes:

Fixes ROSAENG-8224

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 95a5dcf3-6fd4-4e58-82fc-ff15a854db62

📥 Commits

Reviewing files that changed from the base of the PR and between 3b45a90 and 2827d9d.

📒 Files selected for processing (7)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/assets/hypershift_operator_test.go
  • cmd/install/install.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/network_policies.go
  • hypershift-operator/controllers/hostedcluster/network_policies_test.go
  • hypershift-operator/main.go
✅ Files skipped from review due to trivial changes (2)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • cmd/install/assets/hypershift_operator_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/install.go
  • hypershift-operator/main.go
  • hypershift-operator/controllers/hostedcluster/network_policies_test.go
  • hypershift-operator/controllers/hostedcluster/network_policies.go

📝 Walkthrough

Walkthrough

This pull request adds a new UseMCMachineNetworkForKASBlock option wired from installer CLI and operator startup into HostedClusterReconciler. The reconciler now precomputes a kasBlock []string per reconcile: when enabled it reads kube-system/cluster-config-v1 install-config to return management cluster machineNetwork CIDRs, otherwise it falls back to /32 CIDRs derived from KAS endpoints. NetworkPolicy builders and tests were updated to accept and consume the kasBlock exceptions list instead of Endpoints objects.

Sequence Diagram(s)

sequenceDiagram
  participant InstallerCLI
  participant HyperShiftOperatorDeployment
  participant HostedClusterReconciler
  participant ManagementConfigMap
  participant KubernetesEndpoints
  participant NetworkPolicy
  InstallerCLI->>HyperShiftOperatorDeployment: set --use-mc-machine-network-for-kas-block=%t
  HyperShiftOperatorDeployment->>HostedClusterReconciler: operator receives flag
  HostedClusterReconciler->>ManagementConfigMap: GET kube-system/cluster-config-v1 (if flag)
  ManagementConfigMap->>HostedClusterReconciler: return install-config YAML (machineNetwork CIDRs)
  HostedClusterReconciler->>KubernetesEndpoints: GET Endpoints (fallback)
  KubernetesEndpoints->>HostedClusterReconciler: return endpoint IPs
  HostedClusterReconciler->>NetworkPolicy: reconcile with IPBlock.Except using machineNetwork CIDRs or /32 endpoint CIDRs + pod CIDRs
Loading

Suggested reviewers

  • jparrill
  • sjenning
🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main change: introducing a new flag to use management cluster machine network CIDRs for KAS egress blocking instead of individual pod IPs.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All new tests use hardcoded static test names via struct fields with t.Run(tc.name). No dynamic values, timestamps, UUIDs, or generated identifiers present in test names.
Test Structure And Quality ✅ Passed Tests use table-driven pattern with meaningful assertion messages matching codebase conventions. Setup and teardown follow existing patterns with no resource cleanup issues.
Topology-Aware Scheduling Compatibility ✅ Passed PR adds configuration flags and network policy logic without introducing topology-unaware scheduling constraints (no required anti-affinity, topology spread constraints, or control-plane selectors).
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR. All changes consist of Go unit tests using testing.T framework and production code changes. The custom check is not applicable.
No-Weak-Crypto ✅ Passed PR adds no weak crypto algorithms, custom crypto implementations, or timing-vulnerable comparisons; uses only standard Kubernetes/Go libraries for YAML parsing.
Container-Privileges ✅ Passed PR adds only UseMCMachineNetworkForKASBlock flag for network policy; no privileged, hostPID/Network/IPC, SYS_ADMIN, or allowPrivilegeEscalation changes found.
No-Sensitive-Data-In-Logs ✅ Passed PR introduces no sensitive data logging. Error handling wraps YAML parsing errors safely, only extracts machine network CIDRs, and never logs raw install-config content or other sensitive data.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the area/cli Indicates the PR includes changes for CLI label Jun 5, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Ajpantuso
Once this PR has been reviewed and has the lgtm label, please assign csrwng for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Jun 5, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/network_policies_test.go (1)

1697-1707: ⚡ Quick win

Assert management pod CIDR is still included in Except.

Line 1701 validates the feature-toggle CIDRs, but it doesn’t confirm the existing management cluster pod CIDR remains present. Add one assertion so this test also catches regressions where new KAS block CIDRs replace (instead of extend) existing exceptions.

Suggested test hardening
 			exceptCIDRs := collectExceptCIDRs(privateRouterPolicy)
+			g.Expect(exceptCIDRs).To(ContainElement("10.128.0.0/14"), "private-router Except list should retain management cluster pod CIDR")
 			for _, cidr := range tc.expectedExceptCIDRs {
 				g.Expect(exceptCIDRs).To(ContainElement(cidr), "private-router Except list should contain %s", cidr)
 			}

Based on learnings: "Unit test any code changes and additions and include e2e tests when changes impact consumer behaviour."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/hostedcluster/network_policies_test.go`
around lines 1697 - 1707, The test currently checks feature-toggle CIDRs but
doesn't assert the management cluster pod CIDR is still present; after
collecting exceptCIDRs (from collectExceptCIDRs(privateRouterPolicy)) add an
assertion that the known management pod CIDR (e.g., managementPodCIDR or the
variable used elsewhere in this test file) is contained in exceptCIDRs using
g.Expect(exceptCIDRs).To(ContainElement(managementPodCIDR)) to catch regressions
where exceptions get replaced instead of extended; place this assertion near the
existing checks for tc.expectedExceptCIDRs and tc.expectCIDRNotPresent so it
runs for each test case referencing createdNetworkPolicies["private-router"] and
privateRouterPolicy.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/network_policies_test.go`:
- Around line 1697-1707: The test currently checks feature-toggle CIDRs but
doesn't assert the management cluster pod CIDR is still present; after
collecting exceptCIDRs (from collectExceptCIDRs(privateRouterPolicy)) add an
assertion that the known management pod CIDR (e.g., managementPodCIDR or the
variable used elsewhere in this test file) is contained in exceptCIDRs using
g.Expect(exceptCIDRs).To(ContainElement(managementPodCIDR)) to catch regressions
where exceptions get replaced instead of extended; place this assertion near the
existing checks for tc.expectedExceptCIDRs and tc.expectCIDRNotPresent so it
runs for each test case referencing createdNetworkPolicies["private-router"] and
privateRouterPolicy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: eadd8666-ca8d-475c-8303-c5739a6cf3c8

📥 Commits

Reviewing files that changed from the base of the PR and between f13c62d and 4e64951.

📒 Files selected for processing (6)
  • cmd/install/assets/hypershift_operator.go
  • cmd/install/install.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/network_policies.go
  • hypershift-operator/controllers/hostedcluster/network_policies_test.go
  • hypershift-operator/main.go

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 5, 2026

Codecov Report

❌ Patch coverage is 86.00000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.45%. Comparing base (f13c62d) to head (2827d9d).

Files with missing lines Patch % Lines
...ator/controllers/hostedcluster/network_policies.go 88.88% 2 Missing and 3 partials ⚠️
hypershift-operator/main.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8689      +/-   ##
==========================================
+ Coverage   41.43%   41.45%   +0.01%     
==========================================
  Files         756      756              
  Lines       93647    93687      +40     
==========================================
+ Hits        38802    38837      +35     
- Misses      52124    52128       +4     
- Partials     2721     2722       +1     
Files with missing lines Coverage Δ
cmd/install/assets/hypershift_operator.go 48.13% <100.00%> (+0.02%) ⬆️
cmd/install/install.go 63.13% <100.00%> (+0.06%) ⬆️
...trollers/hostedcluster/hostedcluster_controller.go 45.42% <ø> (ø)
hypershift-operator/main.go 0.00% <0.00%> (ø)
...ator/controllers/hostedcluster/network_policies.go 77.68% <88.88%> (+0.49%) ⬆️
Flag Coverage Δ
cmd-support 34.88% <100.00%> (+<0.01%) ⬆️
cpo-hostedcontrolplane 43.50% <ø> (ø)
cpo-other 42.74% <ø> (ø)
hypershift-operator 51.62% <85.10%> (+0.05%) ⬆️
other 31.64% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Ajpantuso Ajpantuso force-pushed the apantuso/rosaeng-8224 branch 2 times, most recently from 3b45a90 to 8df0806 Compare June 5, 2026 21:10
@openshift-ci openshift-ci Bot added area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing labels Jun 5, 2026
@Ajpantuso Ajpantuso force-pushed the apantuso/rosaeng-8224 branch from 8df0806 to 2827d9d Compare June 5, 2026 21:11
@hypershift-jira-solve-ci
Copy link
Copy Markdown

I now have the complete picture. Here is the analysis:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

3: B6 Body message is missing
make: *** [Makefile:614: run-gitlint] Error 1
Process completed with exit code 2.

Summary

The gitlint CI check failed because the commit 2827d9d has no commit body — it only contains a title line (feat: optionally use OCP install configuration to block hostnetwork for HCP egress). The repository's gitlint configuration enforces built-in rule B6 (body-is-missing), which requires every commit to include a descriptive body below the title. The 3: prefix in the error indicates line 3 was expected to contain body text but was empty.

Root Cause

The commit message for SHA 2827d9d20517872a4d70684ac6f3e3eb16349b99 consists of only a single title line:

feat: optionally use OCP install configuration to block hostnetwork for HCP egress

There is no body text following the title. Gitlint rule B6 (body-is-missing) requires that every commit (except merge commits, which are ignored by default) includes a body message — i.e., descriptive text separated from the title by a blank line. The repository's .gitlint configuration does not disable B6, so it is enforced by default.

The error 3: B6 Body message is missing means gitlint expected body content starting at line 3 (line 1 = title, line 2 = blank separator, line 3 = body start) but found nothing.

Notably, the PR description on GitHub does contain a well-written explanation of the change, but this text was not included in the git commit message body itself.

Recommendations

Amend the commit to add a body describing what the change does and why. For example:

git commit --amend

Then edit the message to include a body after the title, separated by a blank line:

feat: optionally use OCP install configuration to block hostnetwork for HCP egress

Adds a --use-mc-machine-network-for-kas-block flag to the HyperShift Operator.
When enabled, the operator reads the management cluster's machine network
CIDR(s) from kube-system/cluster-config-v1 and uses them as the Except entries
in the egress IPBlock rules of the private-router and management-kas
NetworkPolicies, instead of the per-pod /32 CIDRs derived from the
default/kubernetes Endpoints object.

After amending, force-push the branch to re-trigger CI:

git push --force-with-lease
Evidence
Evidence Detail
Gitlint error 3: B6 Body message is missing
Exit code 2 (gitlint lint violations found)
Commit SHA 2827d9d20517872a4d70684ac6f3e3eb16349b99
Commit message (full) feat: optionally use OCP install configuration to block hostnetwork for HCP egress (title only, no body)
Gitlint rule B6 body-is-missing — requires a commit body below the title line
.gitlint config Uses contrib-title-conventional-commits, title max 120 chars, body max line 140 chars; B6 is not disabled
CI command make run-gitlint — lints commits from f13c62d7 to 2827d9d2
Makefile target Line 614: run-gitlint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cli Indicates the PR includes changes for CLI area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants