OCPBUGS-84269: fix(nodepool): skip CNI-internal IPs in ClusterNetworkCIDRConflict check#8230
OCPBUGS-84269: fix(nodepool): skip CNI-internal IPs in ClusterNetworkCIDRConflict check#8230amasolov wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@amasolov: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe NodePool controller's Sequence Diagram(s)sequenceDiagram
participant Controller as NodePoolController
participant NP as NodePool (status)
participant M as Machine (addresses)
participant CN as ClusterNetwork (CIDRs)
Controller->>CN: Parse all ClusterNetwork CIDR prefixes
Controller->>M: Read machine addresses
M-->>Controller: Return deduplicated, filtered addresses (exclude link-local/multicast)
Controller->>Controller: For each address, check membership against parsed CIDRs
alt All non-link-local addresses are within ClusterNetwork CIDRs
Controller->>NP: Set NodePoolClusterNetworkCIDRConflictType = True (with per-address messages)
else Some addresses exist outside ClusterNetwork CIDRs
Controller->>NP: Do not emit in-CIDR collision messages
end
alt No collision messages collected
Controller->>NP: Remove NodePoolClusterNetworkCIDRConflictType condition
end
🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @amasolov. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller_test.go (1)
1288-1393: Please add a dual-stack regression case here.These additions still run with a single
ClusterNetworkentry, so they would pass even ifsetCIDRConflictConditionmisclassified an address fromClusterNetwork[1]as "outside" and hid a real collision. A two-entry cluster-network fixture would lock down the CAPK dual-stack scenario this PR is targeting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/nodepool/nodepool_controller_test.go` around lines 1288 - 1393, Add a regression test case that uses a dual-stack ClusterNetwork (two entries: one IPv4 CIDR and one IPv6 CIDR) so the table-driven tests exercise setCIDRConflictCondition against ClusterNetwork[0] and ClusterNetwork[1]; locate the table entries near the existing cases ("no cidr collision when ...") in nodepool_controller_test.go, add a new case whose machinesGenerator produces addresses that fall into both the IPv4 and IPv6 cluster CIDRs (and some outside) and assert expectedCIDRCollision and expectedAllMachine/expectedAllNodes accordingly; ensure the test setup that builds the ClusterNetwork fixture used by the reconciler (the same variable used by the surrounding tests) is updated to include two ClusterNetwork entries so the new case validates dual-stack behavior and prevents misclassification between ClusterNetwork[0] and ClusterNetwork[1].
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hypershift-operator/controllers/nodepool/conditions.go`:
- Around line 748-752: The code currently sets hasAddressOutsideClusterNetwork
by testing addr.Address only against hc.Spec.Networking.ClusterNetwork[0], which
fails on dual-stack clusters; update the logic in the function that builds
conflictingIPs and sets hasAddressOutsideClusterNetwork so that for each
non-link-local addr.Address you iterate all entries in
hc.Spec.Networking.ClusterNetwork, parse each CIDR and test Contains(addr.IP),
and only set hasAddressOutsideClusterNetwork = true when none of the
cluster-network CIDRs contain the address; apply the same multi-CIDR containment
check to the other identical block referenced around the
hasAddressOutsideClusterNetwork/conflictingIPs handling so both places use all
ClusterNetwork CIDRs rather than index 0.
---
Nitpick comments:
In `@hypershift-operator/controllers/nodepool/nodepool_controller_test.go`:
- Around line 1288-1393: Add a regression test case that uses a dual-stack
ClusterNetwork (two entries: one IPv4 CIDR and one IPv6 CIDR) so the
table-driven tests exercise setCIDRConflictCondition against ClusterNetwork[0]
and ClusterNetwork[1]; locate the table entries near the existing cases ("no
cidr collision when ...") in nodepool_controller_test.go, add a new case whose
machinesGenerator produces addresses that fall into both the IPv4 and IPv6
cluster CIDRs (and some outside) and assert expectedCIDRCollision and
expectedAllMachine/expectedAllNodes accordingly; ensure the test setup that
builds the ClusterNetwork fixture used by the reconciler (the same variable used
by the surrounding tests) is updated to include two ClusterNetwork entries so
the new case validates dual-stack behavior and prevents misclassification
between ClusterNetwork[0] and ClusterNetwork[1].
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 7d09322d-a8cc-4d0b-a128-5fd42c1590b8
📒 Files selected for processing (2)
hypershift-operator/controllers/nodepool/conditions.gohypershift-operator/controllers/nodepool/nodepool_controller_test.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller_test.go (1)
1341-1393: Test case name could be more precise.The test name says "only link-local and CNI-internal addresses are present" but the machine actually includes
192.168.1.10which is neither link-local nor CNI-internal—it's a valid infrastructure IP outside the cluster network. The test logic is correct, but the name is misleading.Consider renaming to something like:
"no cidr collision when machine has out-of-cluster address alongside link-local and in-cluster addresses"♻️ Suggested name change
- name: "no cidr collision when only link-local and CNI-internal addresses are present", + name: "no cidr collision when machine has out-of-cluster address alongside link-local and in-cluster addresses",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/nodepool/nodepool_controller_test.go` around lines 1341 - 1393, The test case name string is misleading (it claims "only link-local and CNI-internal addresses" while the Machine includes 192.168.1.10); update the test case's name literal in the table-driven test to a precise description such as "no cidr collision when machine has out-of-cluster address alongside link-local and in-cluster addresses" so it matches the Machine created by the machinesGenerator and helps future readers locate the case (look for the test entry with the machinesGenerator creating a capiv1.Machine and expectedCIDRCollision nil).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@hypershift-operator/controllers/nodepool/nodepool_controller_test.go`:
- Around line 1341-1393: The test case name string is misleading (it claims
"only link-local and CNI-internal addresses" while the Machine includes
192.168.1.10); update the test case's name literal in the table-driven test to a
precise description such as "no cidr collision when machine has out-of-cluster
address alongside link-local and in-cluster addresses" so it matches the Machine
created by the machinesGenerator and helps future readers locate the case (look
for the test entry with the machinesGenerator creating a capiv1.Machine and
expectedCIDRCollision nil).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 3df4795b-9b85-47c0-bd04-fe164a8c1006
📒 Files selected for processing (2)
hypershift-operator/controllers/nodepool/conditions.gohypershift-operator/controllers/nodepool/nodepool_controller_test.go
d48e033 to
6066c7d
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/nodepool_controller_test.go (1)
2398-2490: Consider asserting the conflict message content for dual-stack cases.The dual-stack test only checks
cond.Status == ConditionTruewhen a conflict is expected. A regression where only one family's CIDR is reported (e.g., the IPv6 prefix is silently dropped from the message, or the v4 CIDR is matched against a v6 address) would still pass. Since the whole point of this test is dual-stack correctness, it’s worth asserting the message mentions both the offending IP and the corresponding CIDR per case (e.g.,10.128.0.5with10.128.0.0/14, andfd01::5withfd01::/48for case 1;fd01::awithfd01::/48for case 3).🧪 Suggested tightening
if tc.expectConflict { gg.Expect(cond).ToNot(BeNil(), "expected CIDR conflict condition to be set") gg.Expect(cond.Status).To(Equal(corev1.ConditionTrue)) + for _, want := range tc.expectMessageContains { + gg.Expect(cond.Message).To(ContainSubstring(want)) + } } else {with an added
expectMessageContains []stringfield populated per case.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/nodepool/nodepool_controller_test.go` around lines 2398 - 2490, The test TestSetCIDRConflictDualStack currently only asserts that a CIDR conflict condition exists and is True; update the table-driven cases to include an expectMessageContains []string for each case and, after calling r.setCIDRConflictCondition and retrieving cond via FindStatusCondition(np.Status.Conditions, hyperv1.NodePoolClusterNetworkCIDRConflictType), when tc.expectConflict is true also assert that cond.Message contains each substring in tc.expectMessageContains (e.g., "10.128.0.5" and "10.128.0.0/14" and "fd01::5" and "fd01::/48" for the first case, and "fd01::a" and "fd01::/48" for the third); leave the no-conflict branch asserting cond is nil. This change keeps the focus on setCIDRConflictCondition and tightens dual-stack message verification without changing production code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@hypershift-operator/controllers/nodepool/nodepool_controller_test.go`:
- Around line 2398-2490: The test TestSetCIDRConflictDualStack currently only
asserts that a CIDR conflict condition exists and is True; update the
table-driven cases to include an expectMessageContains []string for each case
and, after calling r.setCIDRConflictCondition and retrieving cond via
FindStatusCondition(np.Status.Conditions,
hyperv1.NodePoolClusterNetworkCIDRConflictType), when tc.expectConflict is true
also assert that cond.Message contains each substring in
tc.expectMessageContains (e.g., "10.128.0.5" and "10.128.0.0/14" and "fd01::5"
and "fd01::/48" for the first case, and "fd01::a" and "fd01::/48" for the
third); leave the no-conflict branch asserting cond is nil. This change keeps
the focus on setCIDRConflictCondition and tightens dual-stack message
verification without changing production code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 375d5a36-3e24-4365-a6f2-ad6265b9cb5c
📒 Files selected for processing (2)
hypershift-operator/controllers/nodepool/conditions.gohypershift-operator/controllers/nodepool/nodepool_controller_test.go
✅ Files skipped from review due to trivial changes (1)
- hypershift-operator/controllers/nodepool/conditions.go
6066c7d to
3d65d38
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
hypershift-operator/controllers/nodepool/nodepool_controller_test.go (1)
2398-2496: Dual-stack test is well-structured; consider one small addition.The three scenarios (all-in, mixed-with-outside, IPv6-only-in-second-CIDR) cover the multi-prefix matching and the suppression rule cleanly, and the assertions on message substrings (Lines 2426, 2465) correctly validate both the IP and the matched cluster-network CIDR are reported.
Optional: add a fourth case where the machine has an address in the IPv4 cluster CIDR and another IPv4 address outside both CIDRs (dual-stack configured but machine only reports IPv4). This would directly guard against regressions of the original false-positive scenario on single-family hosts against a dual-stack
ClusterNetwork.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/nodepool/nodepool_controller_test.go` around lines 2398 - 2496, Add a fourth table-driven case inside TestSetCIDRConflictDualStack to cover the single-family IPv4-only host against a dual-stack ClusterNetwork: in the test vector for NodePoolReconciler.setCIDRConflictCondition provide clusterNetwork entries for an IPv4 CIDR (e.g., 10.128.0.0/14) and an IPv6 CIDR, and a machines entry whose Status.Addresses contains two IPv4 addresses (one inside the IPv4 CIDR and one outside both CIDRs); set expectConflict to false and assert no CIDR conflict condition is set — this guards against false positives for IPv4-only machines when the cluster is dual-stack.hypershift-operator/controllers/nodepool/conditions.go (1)
750-847: Heuristic implementation looks correct; flag one robustness nit on dedup keys.The rewrite correctly:
- parses all
ClusterNetwork[*].CIDR(addresses the prior multi-CIDR review),- deduplicates addresses per machine,
- skips link-local unicast/multicast (covers
169.254.0.0/16andfe80::/10),- suppresses in-network "conflicts" when the machine also has a non-link-local address outside every cluster CIDR (the PR's intended heuristic for OVN-K management-port IPs on KubeVirt), and
- explicitly removes the condition when no conflicts remain.
Minor: the
seenmap is keyed by the rawaddr.Addressstring (Line 781), so two semantically identical IPv6 strings (e.g.,fd01::0001andfd01::1) would bypass dedup. Sinceipaddris already parsed a couple of lines below, keying onipaddr(or its canonicalString()) after parsing would be strictly more robust. In practice CAPK typically emits a single canonical form, so this is low impact.♻️ Optional: dedup on the parsed address
seen := make(map[string]struct{}) ... - if _, ok := seen[addr.Address]; ok { - continue - } - seen[addr.Address] = struct{}{} - ipaddr, err := netip.ParseAddr(addr.Address) if err != nil { return err } + key := ipaddr.String() + if _, ok := seen[key]; ok { + continue + } + seen[key] = struct{}{} if ipaddr.IsLinkLocalUnicast() || ipaddr.IsLinkLocalMulticast() { continue }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/nodepool/conditions.go` around lines 750 - 847, Issue: the per-machine dedupe map in setCIDRConflictCondition currently keys by the raw addr.Address string so different textual IPv6 forms can bypass dedup. Fix: after parsing addr.Address into ipaddr (via netip.ParseAddr) use the canonical form ipaddr.String() as the key in the seen map (replace uses of seen[addr.Address] with seen[ipaddr.String()]) so deduplication is done on the parsed/canonical IP; update the seen map creation and both the presence check and insertion points around the parsing logic in setCIDRConflictCondition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@hypershift-operator/controllers/nodepool/conditions.go`:
- Around line 750-847: Issue: the per-machine dedupe map in
setCIDRConflictCondition currently keys by the raw addr.Address string so
different textual IPv6 forms can bypass dedup. Fix: after parsing addr.Address
into ipaddr (via netip.ParseAddr) use the canonical form ipaddr.String() as the
key in the seen map (replace uses of seen[addr.Address] with
seen[ipaddr.String()]) so deduplication is done on the parsed/canonical IP;
update the seen map creation and both the presence check and insertion points
around the parsing logic in setCIDRConflictCondition.
In `@hypershift-operator/controllers/nodepool/nodepool_controller_test.go`:
- Around line 2398-2496: Add a fourth table-driven case inside
TestSetCIDRConflictDualStack to cover the single-family IPv4-only host against a
dual-stack ClusterNetwork: in the test vector for
NodePoolReconciler.setCIDRConflictCondition provide clusterNetwork entries for
an IPv4 CIDR (e.g., 10.128.0.0/14) and an IPv6 CIDR, and a machines entry whose
Status.Addresses contains two IPv4 addresses (one inside the IPv4 CIDR and one
outside both CIDRs); set expectConflict to false and assert no CIDR conflict
condition is set — this guards against false positives for IPv4-only machines
when the cluster is dual-stack.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 71fabf99-e8f6-40af-b502-75a634ade8f0
📒 Files selected for processing (2)
hypershift-operator/controllers/nodepool/conditions.gohypershift-operator/controllers/nodepool/nodepool_controller_test.go
3d65d38 to
14079a4
Compare
|
Stale PRs are closed after 21d of inactivity. If this PR is still relevant, comment to refresh it or remove the stale label. If this PR is safe to close now please do so with /lifecycle stale |
|
/remove-lifecycle stale |
|
@amasolov: This pull request references Jira Issue OCPBUGS-84269, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
14079a4 to
cdd2396
Compare
|
/jira refresh |
|
@amasolov: This pull request references Jira Issue OCPBUGS-84269, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/ok-to-test |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8230 +/- ##
==========================================
+ Coverage 40.69% 40.72% +0.02%
==========================================
Files 755 755
Lines 93417 93450 +33
==========================================
+ Hits 38019 38055 +36
+ Misses 52659 52657 -2
+ Partials 2739 2738 -1
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
bryan-cox
left a comment
There was a problem hiding this comment.
Review: CIDR Conflict False-Positive Fix
Overall this is a well-reasoned fix with a sound heuristic. The production code changes are correct and the dual-stack support is a good improvement over the original single-CIDR assumption. I have a few findings below — two blocking items around missing behavioral test coverage, plus several suggestions and nits.
Blocking (2):
- Missing test for condition clearing (the new
else removeStatusConditionpath) - Missing test for link-local + in-cluster only addresses (no outside address)
Suggestions (3):
- Add debug logging for suppressed CNI-internal addresses
- Add
t.Parallel()toTestSetCIDRConflictConditionDualStackand a multi-machine test case - Add a comment explaining why two test functions cover overlapping behavior
Nits (1):
- Move
conflictEntrytype declaration outside the loop
Questions (1):
- What is the expected behavior when a machine has only link-local addresses?
The test quality is good — proper Gherkin names, table-driven structure, gomega matchers, and good dual-stack coverage. Closing the two behavioral gaps above will make this solid.
| type conflictEntry struct { | ||
| ip string | ||
| cidr string | ||
| } |
There was a problem hiding this comment.
[nit] The conflictEntry type is declared inside the per-machine loop body, meaning Go re-declares the type on every iteration. While the compiler handles this fine, it's unusual and makes the type harder to find. Consider moving it above the loop (or to package level) for readability:
type conflictEntry struct {
ip string
cidr string
}
// ...
for _, machine := range machines {
var conflicting []conflictEntry
// ...There was a problem hiding this comment.
Done. Moved to cidrConflictEntry at function scope above the loop.
| if ipaddr.IsLinkLocalUnicast() || ipaddr.IsLinkLocalMulticast() { | ||
| continue | ||
| } |
There was a problem hiding this comment.
[question] What happens when a machine has zero qualifying addresses — i.e., all addresses are link-local? In that case hasAddressOutsideClusterNetwork stays false and conflicting is empty, so the !hasAddressOutsideClusterNetwork branch enters but appends nothing. The behavior is correct (no conflict message), but is a machine with only link-local addresses a valid state? If not, should we flag it differently (e.g., a warning)? If it is valid, please add a test case to document that expectation (see my test comment below).
There was a problem hiding this comment.
When a machine has only link-local addresses, both hasAddressOutsideClusterNetwork and conflicting remain empty, so no conflict is reported and no condition is set. This is correct because link-local addresses are not routable infrastructure IPs and should not influence the heuristic in either direction.
Added a test documenting this expectation: "When machine has only link-local addresses it should not report cidr collision"
| if !hasAddressOutsideClusterNetwork { | ||
| for _, entry := range conflicting { | ||
| messages = append(messages, fmt.Sprintf("machine [%s] with ip [%s] collides with cluster-network cidr [%s]", machine.Name, entry.ip, entry.cidr)) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
[suggestion] Consider emitting a structured log line at debug level when addresses are suppressed by the heuristic. Operators debugging networking issues on KubeVirt clusters might want to see that addresses were evaluated but deemed CNI-internal:
if hasAddressOutsideClusterNetwork && len(conflicting) > 0 {
log.Log.V(4).Info("Skipping CNI-internal addresses for CIDR conflict check",
"machine", machine.Name,
"suppressedCount", len(conflicting))
}This is optional but would help day-2 troubleshooting.
There was a problem hiding this comment.
Done. Added a log.V(4).Info call when addresses are suppressed:
} else if len(conflicting) > 0 {
log.V(4).Info("Skipping CNI-internal addresses for CIDR conflict check",
"machine", machine.Name,
"suppressedCount", len(conflicting))
}Also added ctx to the function signature to get a structured logger via ctrl.LoggerFrom(ctx).
| } else { | ||
| removeStatusCondition(&nodePool.Status.Conditions, hyperv1.NodePoolClusterNetworkCIDRConflictType) |
There was a problem hiding this comment.
[blocking] This else branch with removeStatusCondition is new behavior — previously the condition was never cleared once set. This is a great operational improvement, but no test verifies it.
Please add a test case that:
- Pre-populates a
NodePoolwith an existingClusterNetworkCIDRConflictTypecondition (Status=True) - Calls
setCIDRConflictConditionwith machines that have addresses outside the cluster network (so no conflict should be reported) - Asserts the condition is removed from
nodePool.Status.Conditions
Example name: "When previously-conflicting machine gains out-of-cluster address it should clear the cidr conflict condition"
Without this test, a regression that breaks condition clearing would go undetected.
There was a problem hiding this comment.
Done. Added test case: "When previously-conflicting machine gains out-of-cluster address it should clear the cidr conflict condition"
It pre-populates the NodePool with an existing ClusterNetworkCIDRConflictType=True condition, then calls setCIDRConflictCondition with a machine that has an outside address, and asserts the condition is removed.
| name: "When machine has out-of-cluster address alongside link-local and in-cluster addresses it should not report cidr collision", | ||
| machinesGenerator: func() []client.Object { | ||
| return []client.Object{ | ||
| &capiv1.Machine{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: "node1", | ||
| Namespace: "myns-cluster-name", | ||
| Annotations: map[string]string{ | ||
| nodePoolAnnotation: "myns/np-name", | ||
| }, | ||
| }, | ||
| Status: capiv1.MachineStatus{ | ||
| Conditions: []capiv1.Condition{ | ||
| { | ||
| Type: capiv1.ReadyCondition, | ||
| Status: corev1.ConditionTrue, | ||
| }, | ||
| { | ||
| Type: capiv1.MachineNodeHealthyCondition, | ||
| Status: corev1.ConditionTrue, | ||
| }, | ||
| }, | ||
| Addresses: capiv1.MachineAddresses{ | ||
| { | ||
| Type: capiv1.MachineInternalIP, | ||
| Address: "192.168.1.10", | ||
| }, | ||
| { | ||
| Type: capiv1.MachineInternalIP, | ||
| Address: "169.254.0.2", | ||
| }, | ||
| { | ||
| Type: capiv1.MachineInternalIP, | ||
| Address: "fe80::1", | ||
| }, | ||
| { | ||
| Type: capiv1.MachineInternalIP, | ||
| Address: "10.10.10.2", | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| } | ||
| }, | ||
| expectedAllMachine: &testCondition{ | ||
| Status: corev1.ConditionTrue, | ||
| Reason: hyperv1.AsExpectedReason, | ||
| Messages: []string{hyperv1.AllIsWellMessage}, | ||
| }, | ||
| expectedAllNodes: &testCondition{ | ||
| Status: corev1.ConditionTrue, | ||
| Reason: hyperv1.AsExpectedReason, | ||
| Messages: []string{hyperv1.AllIsWellMessage}, | ||
| }, | ||
| expectedCIDRCollision: nil, | ||
| }, | ||
| } { | ||
| t.Run(tc.name, func(tt *testing.T) { | ||
| gg := NewWithT(tt) |
There was a problem hiding this comment.
[blocking] This test case verifies two behaviors simultaneously — link-local address skipping AND out-of-cluster address suppression — because the machine has 192.168.1.10 (outside), 169.254.0.2 (link-local v4), fe80::1 (link-local v6), and 10.10.10.2 (in-cluster). Since 192.168.1.10 is outside, the heuristic suppresses the conflict regardless of whether link-local filtering works.
There is no test that verifies link-local addresses are excluded from the "outside" determination. Please add a case where a machine has only link-local + in-cluster addresses (no outside address):
{
name: "When machine has only link-local and in-cluster addresses it should report cidr collision",
// Addresses: 169.254.0.2 (link-local), fe80::1 (link-local), 10.10.10.2 (in-cluster)
// Expected: conflict IS reported because link-local doesn't count as "outside"
}This tests the behavioral contract that link-local addresses are truly excluded from the heuristic, not just skipped during iteration. Without it, a bug that counts link-local as "outside" would silently suppress real conflicts.
There was a problem hiding this comment.
Done. Added a dedicated test case: "When machine has only link-local and in-cluster addresses it should report cidr collision"
This machine has only 169.254.0.2 (link-local v4), fe80::1 (link-local v6), and 10.128.0.5 (in-cluster), with no outside address. Asserts that the conflict IS reported, verifying that link-local addresses are truly excluded from the "outside" determination and do not silently suppress real conflicts.
| func TestSetCIDRConflictConditionDualStack(t *testing.T) { | ||
| r := NodePoolReconciler{} |
There was a problem hiding this comment.
[suggestion] A few improvements for this test function:
-
Add
t.Parallel(): This function creates freshNodePool,HostedCluster, andNodePoolReconciler{}per subtest with no shared mutable state. It should be parallelized, consistent withTestSetMachineAndNodeConditionswhich callst.Parallel(). -
Add a multi-machine test case: All test cases use a single machine. Please add a case with 2+ machines where one has a real conflict (all addresses in-cluster) and another is suppressed (has an outside address). This verifies the per-machine logic doesn't leak across machines.
-
[nit] This function tests
setCIDRConflictConditiondirectly while the two cases added toTestSetMachineAndNodeConditionsabove test it indirectly via the parentsetMachineAndNodeConditions. The split is reasonable (isolated unit test vs integration), but consider adding a brief comment at the top explaining why both exist — future maintainers may wonder whether they're redundant.
There was a problem hiding this comment.
All three done:
- Added
t.Parallel()at the function level andtt.Parallel()in each subtest. - Added multi-machine test case:
"When multiple machines have mixed conflict states it should only report the conflicting one"(one machine with all addresses in-cluster, another with an outside address, verifies per-machine isolation). - Added a comment at the top of
TestSetCIDRConflictConditionDualStackexplaining why both this direct unit test and the integration-level cases inTestSetMachineAndNodeConditionsexist.
cdd2396 to
a28c276
Compare
|
@bryan-cox does it look better now? |
|
/ok-to-test |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: amasolov, bryan-cox The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseThe root cause is a missing The hypershift CI runs unit tests in 5 parallel shards ( The Codecov report itself confirms the anomaly: it is "36 commits behind head on main" and shows 315 new files (+40,577 lines) between the base and PR — this is impossible for a 2-file change. The base commit PR #8642 ("Add codecov carryforward flags to stabilize project coverage checks") was merged to Recommendations
Evidence
|
Since CAPK began exposing all VMI interface IPs for dual-stack support (cluster-api-provider-kubevirt#366), KubeVirt machines report OVN-Kubernetes management port addresses (e.g. ovn-k8s-mp0) alongside the real infrastructure address. These OVN IPs fall within the hosted cluster's own clusterNetwork CIDR by design, causing setCIDRConflictCondition to raise a false-positive ClusterNetworkCIDRConflict condition on every KubeVirt NodePool. Fix the check so that a collision is only reported when ALL of a machine's non-link-local addresses are inside the cluster network, which indicates the machine's infrastructure IP genuinely overlaps with the pod CIDR. When a machine also has addresses outside the cluster network, the in-network addresses are treated as expected CNI-internal IPs and skipped. Additionally: - Parse and check all ClusterNetwork CIDRs for dual-stack support - Deduplicate addresses per machine using canonical IP form - Skip link-local addresses (169.254.0.0/16, fe80::/10) - Clear the condition when no conflicts remain - Log suppressed CNI-internal addresses at debug level Signed-off-by: Alexey Masolov <amasolov@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
a28c276 to
0c35447
Compare
|
/ok-to-test |
|
@amasolov: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@bryan-cox @csrwng @enxebre @muraee @sdminonne review please to avoid going in circles again |
@amasolov I put the approve on so I'll leave it up to someone else to put the final lgtm on. You'll also need to have this PR pre-merge tested for the verified label. Do you have someone on your team who could QE this PR? |
|
@bryan-cox thank you for all your help here! I'm a TAM, so I don't have a QE team behind me for this sort of thing. I raised and authored this PR purely to help my customer work through a real issue they're hitting in production with KubeVirt NodePools. The unit tests pass and all CI is green, but I'm not set up to run the pre-merge verification workflow myself. I understand there are ongoing QE capacity challenges across many of our products, so I don't want to make assumptions about who can pick this up. What would be the best path forward here? Is there a QE team or individual who typically handles verification for HyperShift PRs that I could reach out to, or is there another process I should follow to get the verified label? Happy to provide any additional context or help coordinate, just don't want this to go stale again. |
What this PR does / why we need it:
Since CAPK began exposing all VMI interface IPs for dual-stack support (kubernetes-sigs/cluster-api-provider-kubevirt#366), KubeVirt machines report OVN-Kubernetes management port addresses alongside the real infrastructure address. These OVN IPs fall within the hosted cluster's own
clusterNetworkCIDR by design, causingsetCIDRConflictConditionto raise a false-positiveClusterNetworkCIDRConflictcondition on every KubeVirt NodePool.This PR fixes the check to only report a collision when ALL of a machine's non-link-local addresses are inside the cluster network. When a machine also has addresses outside the cluster network, the in-network addresses are treated as expected CNI-internal IPs. Additionally:
Which issue(s) this PR fixes:
Fixes #8229
Special notes for your reviewer:
The root cause is an interaction between two independent changes:
ClusterNetworkCIDRConflictcondition, checking allMachineInternalIP/MachineExternalIPaddresses against theclusterNetworkCIDR.KubevirtMachineaddress reporting to include all VMI interface IPs for dual-stack CSR approval support.On KubeVirt guests running OVN-Kubernetes, the VMI now exposes the
ovn-k8s-mp0management port IP (which is by design within the pod CIDR) as aMachineInternalIP. The original check did not anticipate this because CAPK previously only reported the single primary DHCP-assigned address.The fix uses a heuristic: if a machine has at least one non-link-local address outside the cluster network, then any in-network addresses are assumed to be CNI-internal and are not flagged. A real conflict is only reported when ALL addresses are within the cluster network.
The pre-existing build failure in
secret_janitor_test.go(missingreleaseinfo.NewMockProviderWithRegistryOverrides) is unrelated to this change.Checklist:
Summary by CodeRabbit
Bug Fixes
Tests