fix: OCPBUGS-86693: add machineNetwork CIDRs to KAS NO_PROXY to prevent x509 errors#8628
fix: OCPBUGS-86693: add machineNetwork CIDRs to KAS NO_PROXY to prevent x509 errors#8628dpateriya wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
📝 WalkthroughWalkthroughThe kube-apiserver deployment configuration now extends the no-proxy exemption list to include machine network CIDRs in addition to cluster and service CIDRs. This change prevents proxy-based routing of management traffic through konnectivity, which would cause TLS certificate validation failures. The Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dpateriya The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go`:
- Line 192: Add a unit test that exercises updateMainContainer so NO_PROXY
includes the machineNetwork CIDRs: instantiate an HCP with
Spec.Networking.MachineNetwork set to known CIDRs, call the code path that runs
updateMainContainer (or extract and call the helper that computes
additionalNoProxyCIDRS), and assert that the env var produced via
proxy.SetEnvVars contains the expected
netutil.MachineCIDRs(hcp.Spec.Networking.MachineNetwork) values; reference
updateMainContainer, netutil.MachineCIDRs and proxy.SetEnvVars when locating
code and add the test to the existing deployment_test.go (or a new focused test
file) similar to support/proxy/proxy_test.go so the NO_PROXY propagation from
MachineNetwork is covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: fe1c654d-7425-4506-8b76-6a59b45737b7
📒 Files selected for processing (1)
control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8628 +/- ##
==========================================
- Coverage 45.84% 41.79% -4.05%
==========================================
Files 440 252 -188
Lines 52824 16414 -36410
==========================================
- Hits 24218 6861 -17357
+ Misses 26816 9061 -17755
+ Partials 1790 492 -1298
... and 188 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
407d826 to
ef9280f
Compare
|
@dpateriya: This pull request references Jira Issue OCPBUGS-86693, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Actionable comments posted: 0 |
|
@dpateriya: This pull request references Jira Issue OCPBUGS-86693, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@dpateriya: This pull request references Jira Issue OCPBUGS-86693, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
ef9280f to
5aa05e6
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment_test.go (1)
15-15: ⚡ Quick winUse Gherkin-style naming for the new test cases.
Please rename these tests to follow the required “When ... it should ...” pattern (or wrap assertions in
t.Runcases with that phrasing), so they match repo test conventions.💡 Example update
-func TestUpdateMainContainerNoProxy(t *testing.T) { +func TestUpdateMainContainer_WhenProxyConfiguredWithMachineNetwork_ItShouldIncludeClusterServiceMachineAndKubeAPIServerInNoProxy(t *testing.T) { ... -func TestUpdateMainContainerNoProxyWithoutMachineNetwork(t *testing.T) { +func TestUpdateMainContainer_WhenProxyConfiguredWithoutMachineNetwork_ItShouldIncludeClusterAndServiceInNoProxy(t *testing.T) {As per coding guidelines, "
**/*_test.go: Always use 'When ... it should ...' format for describing test cases when creating unit tests" and "Prefer Gherkin Syntax to define unit test cases (e.g., 'When... it should...')".Also applies to: 58-58
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment_test.go` at line 15, Rename the top-level test functions to use Gherkin-style descriptions (e.g., "When ... it should ...") or wrap existing assertions in t.Run subtests using that phrasing; specifically change TestUpdateMainContainerNoProxy and the other test in the file (the one around the Test* at the other noted location) to names like TestWhenUpdatingMainContainerWithoutProxyItShould... or use t.Run("When updating main container without proxy it should ...", func(t *testing.T) { ... }) inside the existing Test* wrappers so the function/ subtest labels follow the "When ... it should ..." convention and match repo test naming guidelines while leaving the test logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment_test.go`:
- Line 15: Rename the top-level test functions to use Gherkin-style descriptions
(e.g., "When ... it should ...") or wrap existing assertions in t.Run subtests
using that phrasing; specifically change TestUpdateMainContainerNoProxy and the
other test in the file (the one around the Test* at the other noted location) to
names like TestWhenUpdatingMainContainerWithoutProxyItShould... or use
t.Run("When updating main container without proxy it should ...", func(t
*testing.T) { ... }) inside the existing Test* wrappers so the function/ subtest
labels follow the "When ... it should ..." convention and match repo test naming
guidelines while leaving the test logic unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 20d74fd2-993f-4c15-8d84-fb4f84cccd34
📒 Files selected for processing (2)
control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go
There was a problem hiding this comment.
Suggestion: Add a dual-stack test case. MachineNetwork supports up to 2 entries. A test with both IPv4 and IPv6 CIDRs (e.g., 192.168.1.0/24 + fd00::/48) would validate the iteration works correctly for dual-stack clusters.
…nt x509 errors When the management cluster uses an HTTP proxy, the kube-apiserver (KAS) container in the hosted control plane inherits the management cluster proxy environment. The NO_PROXY list for KAS already includes the guest cluster's pod network (clusterNetwork) and service network (serviceNetwork) CIDRs, but omits the machine network (machineNetwork) CIDRs. Because KAS connects to kubelets on node IPs (in the machineNetwork range) via the konnectivity tunnel for operations like oc logs, oc exec, and oc attach, these HTTPS connections are routed through the management cluster's HTTP proxy. The proxy performs TLS interception, presenting its own certificate which is not trusted by KAS, resulting in: x509: certificate signed by unknown authority Adding the machineNetwork CIDRs to the NO_PROXY list ensures these connections bypass the proxy and go directly through konnectivity. Co-authored-by: Cursor <cursoragent@cursor.com>
5aa05e6 to
de5fab1
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment_test.go (1)
15-91: ⚡ Quick winConsider adding a test case for when proxy environment variables are not set.
The current test cases all set
HTTP_PROXYandHTTPS_PROXY. Adding a test case where these are not set would validate thatupdateMainContainerdoesn't addNO_PROXYwhen proxy is not configured, ensuring no regression in non-proxy environments.✨ Suggested test case
+ { + name: "When proxy is not configured it should not set NO_PROXY", + clusterNetwork: []hyperv1.ClusterNetworkEntry{{CIDR: *ipnet.MustParseCIDR("10.128.0.0/14")}}, + serviceNetwork: []hyperv1.ServiceNetworkEntry{{CIDR: *ipnet.MustParseCIDR("172.30.0.0/16")}}, + machineNetwork: []hyperv1.MachineNetworkEntry{{CIDR: *ipnet.MustParseCIDR("192.168.1.0/24")}}, + expectedCIDRs: []string{}, + },And update the test logic to handle this case:
for _, tc := range testCases { t.Run(tc.name, func(t *testing.T) { - t.Setenv("HTTP_PROXY", "http://proxy.example.com:3128") - t.Setenv("HTTPS_PROXY", "http://proxy.example.com:3128") - t.Setenv("NO_PROXY", "") + if len(tc.expectedCIDRs) > 0 { + t.Setenv("HTTP_PROXY", "http://proxy.example.com:3128") + t.Setenv("HTTPS_PROXY", "http://proxy.example.com:3128") + t.Setenv("NO_PROXY", "") + } // ... rest of test g := NewWithT(t) var noProxyValue string for _, env := range podSpec.Containers[0].Env { if env.Name == "NO_PROXY" { noProxyValue = env.Value break } } - g.Expect(noProxyValue).ToNot(BeEmpty(), "NO_PROXY env var should be set when proxy is configured") + if len(tc.expectedCIDRs) > 0 { + g.Expect(noProxyValue).ToNot(BeEmpty(), "NO_PROXY env var should be set when proxy is configured") + } else { + g.Expect(noProxyValue).To(BeEmpty(), "NO_PROXY env var should not be set when proxy is not configured") + } for _, cidr := range tc.expectedCIDRs { g.Expect(noProxyValue).To(ContainSubstring(cidr), "NO_PROXY should contain %s", cidr) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment_test.go` around lines 15 - 91, Add a new sub-test inside TestUpdateMainContainerNoProxy that leaves HTTP_PROXY and HTTPS_PROXY unset (do not call t.Setenv for them) and sets NO_PROXY to "" or leaves it unset, construct the same hcp/podSpec used by the other cases, call updateMainContainer(podSpec, hcp) and assert that the NO_PROXY env var is not added to podSpec.Containers[0].Env (i.e., iterate the Env for name "NO_PROXY" and Expect it to be empty/not found). This ensures updateMainContainer (and the ComponentName container) does not inject NO_PROXY when no proxy is configured.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment_test.go`:
- Around line 15-91: Add a new sub-test inside TestUpdateMainContainerNoProxy
that leaves HTTP_PROXY and HTTPS_PROXY unset (do not call t.Setenv for them) and
sets NO_PROXY to "" or leaves it unset, construct the same hcp/podSpec used by
the other cases, call updateMainContainer(podSpec, hcp) and assert that the
NO_PROXY env var is not added to podSpec.Containers[0].Env (i.e., iterate the
Env for name "NO_PROXY" and Expect it to be empty/not found). This ensures
updateMainContainer (and the ComponentName container) does not inject NO_PROXY
when no proxy is configured.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 15019770-7dfb-4b60-aa64-be4fa1080794
📒 Files selected for processing (2)
control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go
|
@dpateriya: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Hi @bryan-cox, I guess the The only remaining option for customer is to have management cluster no_proxy updated with the secondary machine network. Anything we can do, like some sort of an RFE to get this |
|
As I deployed HCP AWS and I see the hosted cluster got machine network populated here: So this PR can be valid for cloud deployment of HCP @bryan-cox |
Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryBoth failures are caused by a CI infrastructure issue — the imagestreams Root CauseThe root cause is a CI infrastructure issue on the
Both imagestreams were available as recently as May 30 (PR #8643's The Recommendations
Evidence
|
@dpateriya - yeah you're welcome to open an RFE for this. Since this PR doesn't fix the kubevirt thing and it needs an RFE, I think we should close the bug ticket out and this PR. |
|
Thanks @bryan-cox. Closing this in favor of RFE. |
|
@dpateriya: This pull request references Jira Issue OCPBUGS-86693. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Summary
When the management cluster uses an HTTP proxy, the KAS container in the hosted control plane inherits the management cluster's proxy environment via
proxy.SetEnvVars()in the control-plane-operator. TheNO_PROXYlist for KAS already includes the guest cluster'sclusterNetwork(pod) andserviceNetworkCIDRs, but omits themachineNetworkCIDRs.KAS connects to kubelets on node IPs (which fall in the
machineNetworkrange) via the konnectivity tunnel for subresource operations (oc logs,oc exec,oc attach). Without themachineNetworkCIDRs inNO_PROXY, these HTTPS connections are routed through the management cluster's HTTP proxy. The proxy performs TLS interception, presenting its own certificate which KAS does not trust, resulting in:Root Cause
In
control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go, theadditionalNoProxyCIDRSslice only includesclusterNetworkandserviceNetwork, but notmachineNetwork:Fix
Add
machineNetworkCIDRs to the KASNO_PROXYlist:How to Reproduce
machineNetworkis NOT included in the management cluster'sNO_PROXYoc logs <any-pod>from the hosted clusterx509: certificate signed by unknown authorityerrorCustomer Validation
A customer confirmed this fix resolves the issue — after manually adding the guest cluster's
machineNetwork(along withclusterNetworkandserviceNetwork) to the management cluster'sNO_PROXY, theoc logsx509 error was immediately resolved. This PR automates that by ensuring the control-plane-operator always includesmachineNetworkin KAS'sNO_PROXY.Test Plan
go test ./control-plane-operator/controllers/hostedcontrolplane/v2/kas/...)oc logs,oc exec,oc attachwork without manually addingmachineNetworkto management clusterNO_PROXYMade with Cursor
Summary by CodeRabbit