OCPBUGS-86311: fix: validate agent-config interface names match networkConfig #10567
openshift-install agent does not cross-validate that interface names in hosts[].interfaces[] match the names used in hosts[].networkConfig. When names mismatch, the pre-network-manager-config.sh script silently fails to rename .nmconnection files at boot time, causing complete network failure for bond/VLAN/bridge topologies with no diagnostic.

Add validateInterfaceNamesMatchNetworkConfig() to validateAgentHosts() that ensures every interfaces[].name exists in the networkConfig interfaces list. The error message lists valid networkConfig names to guide users toward the correct configuration.

Only the agent-config.yaml path is affected; install-config.yaml derives interface names from networkConfig automatically, so names always match.

Co-authored-by: Cursor <cursoragent@cursor.com>
| Layer / File(s) | Summary |
|---|---|
| Origin flag and wiring: `pkg/asset/agent/agentconfig/agenthosts.go` | Adds internal `hostsFromAgentConfig` and sets it when hosts are appended from `agentConfig.Config.Hosts` (install) or `addNodesConfig.Config.Hosts` (add-nodes). |
| Validation warning implementation: `pkg/asset/agent/agentconfig/agenthosts.go` | `validateAgentHosts` now calls `warnInterfaceNamesNotInNetworkConfig` for applicable hosts. The helper unmarshals `host.NetworkConfig.Raw` (silently returns on unmarshal failure), gathers NMState `interfaces[].name`, and logs `Warnf` for each host interface name not present in that set. The behavior is a warning, not a validation error, and only runs when hosts originated from agent-config. |
| Tests, fixtures and helpers: `pkg/asset/agent/agentconfig/agenthosts_test.go` | Adds `agentNetworkConfigBond` fixture; updates expected host interface names from `enp3s1` to `eth0`; extends `TestAgentHosts_Generate` with match/mismatch and bonded cases; adds helpers `getAgentConfigBondMatching`, `getAgentConfigBondMismatched`, `getAgentConfigMismatchedInterfaceName`, `getAgentConfigMatchingInterfaceName`, `HostBuilder.rawNetworkConfig`, and edge-case generators (empty and malformed network config). |
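The helper described in the table can be sketched roughly as below. This is a minimal illustration, not the PR's code: the type `nmStateInterfaces` and the functions `networkConfigNames` and `namesNotInNetworkConfig` are hypothetical names, and the sketch parses JSON with the standard library only so that it stays self-contained, whereas the PR unmarshals YAML from `host.NetworkConfig.Raw`.

```go
package main

import "encoding/json"

// nmStateInterfaces mirrors only the top-level interfaces[].name field.
// Hypothetical type for this sketch; nothing deeper in the config is inspected.
type nmStateInterfaces struct {
	Interfaces []struct {
		Name string `json:"name"`
	} `json:"interfaces"`
}

// networkConfigNames returns the set of interface names declared in raw.
// ok is false when raw cannot be parsed, so callers can skip the check quietly.
// Assumption: raw is JSON here; the real networkConfig is YAML.
func networkConfigNames(raw []byte) (names map[string]bool, ok bool) {
	var cfg nmStateInterfaces
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, false
	}
	names = make(map[string]bool, len(cfg.Interfaces))
	for _, iface := range cfg.Interfaces {
		if iface.Name != "" {
			names[iface.Name] = true
		}
	}
	return names, true
}

// namesNotInNetworkConfig lists host interface names that are absent from the
// network config. It returns nil when the config is unparsable, and it skips
// empty host interface names.
func namesNotInNetworkConfig(hostIfaces []string, raw []byte) []string {
	ncNames, ok := networkConfigNames(raw)
	if !ok {
		return nil
	}
	var missing []string
	for _, name := range hostIfaces {
		if name != "" && !ncNames[name] {
			missing = append(missing, name)
		}
	}
	return missing
}
```

A caller would log one warning per returned name and continue, since a mismatch here is advisory rather than fatal.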
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 11
❌ Failed checks (1 warning, 10 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Stable And Deterministic Test Names | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Test Structure And Quality | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Microshift Test Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Single Node Openshift (Sno) Test Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Topology-Aware Scheduling Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Ote Binary Stdout Contract | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Ipv6 And Disconnected Network Test Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| No-Weak-Crypto | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Container-Privileges | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| No-Sensitive-Data-In-Logs | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main change: adding validation that agent-config interface names match networkConfig, aligning with the PR's core objective. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.12.2)
Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86311, which is invalid:
[APPROVALNOTIFIER] This PR is NOT APPROVED
/jira refresh
@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86311, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
    if !ncNames[iface.Name] {
        errMsg := "interface name \"" + iface.Name + "\" not found in networkConfig interfaces [" + strings.Join(ncNameList, ", ") + "]; " +
            "the interfaces[].name values are logical names that must match the interface names used in networkConfig " +
            "so that the MAC-to-interface mapping works correctly at boot time"
I don't think this is true. Or rather, it is true that the names need to match for the MAC-to-interface mapping to work. But if the interface name correctly matches the one defined by the kernel (or if the nmstate uses identifier: mac-address), then you don't need the MAC-to-interface mapping in order for it to choose the right interface.
And in fact there is at least one important case where we rely on this: when using an unmodified baremetal IPI install-config to do an agent install without an agent-config.yaml. Baremetal IPI does not support a MAC-to-interface mapping, so the input must always match up to the true interface names. It does, however, require providing one MAC address to identify the host, and so when we internally generate the host list we just use a bogus name for that interface that doesn't match the ones in the nmstate config. This change will break that feature.
Thanks for the thorough review @zaneb — both points are well taken.
You're right that the baremetal IPI → agent path (where getInstallConfigDefaults generates the fallback "boot" interface name at L276-280) would be broken by this validation. I missed that topology entirely — the validation assumed all interfaces[] entries are user-provided, which isn't true for that flow.
And I appreciate the clarification on the nmstate-as-opaque-blob principle. I can see how relying on parsing its internal structure creates a fragile coupling.
Given these constraints, would you be open to a narrower alternative?
Option A: Move the diagnostic to the boot script itself
Enhance pre-network-manager-config.sh to emit a clear error when sed finds zero matches during the rename step — something like "WARNING: interface 'foo' from agent-config not found in generated .nmconnection files". This keeps the installer from parsing nmstate at all and catches the failure at the point where it actually matters.
Option B: Warn-only at build time, scoped to agent-config.yaml path
Only run the check when interfaces[] comes from a user-provided agent-config.yaml (not from getInstallConfigDefaults), and emit a warning instead of a hard error. This still gives users early feedback for the common bond/VLAN misconfiguration case without blocking the baremetal IPI path.
The underlying problem we're solving is that bond/VLAN/bridge topologies silently get zero connectivity when names mismatch, and users get no useful diagnostic. Either option would address that without violating the design principles. Happy to rework the PR if either direction seems reasonable to you.
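Option B could be sketched roughly as follows. This is an illustration under stated assumptions, not the installer's code: `hostOrigin`, `host`, and `interfaceWarnings` are hypothetical names, and the network config names are assumed to have been extracted already.

```go
package main

import "fmt"

// hostOrigin records where a host list came from (hypothetical type).
type hostOrigin int

const (
	fromInstallConfig hostOrigin = iota // may carry synthetic names such as "boot"
	fromAgentConfig                     // user-authored agent-config.yaml
)

// host is a simplified stand-in for an agent host entry.
type host struct {
	Hostname   string
	Interfaces []string // interfaces[].name values
	NCNames    []string // names already extracted from networkConfig
}

// interfaceWarnings returns advisory messages and never an error. Hosts that
// did not come from agent-config.yaml are skipped entirely, so the baremetal
// IPI fallback path is unaffected.
func interfaceWarnings(origin hostOrigin, hosts []host) []string {
	if origin != fromAgentConfig {
		return nil
	}
	var warnings []string
	for _, h := range hosts {
		known := make(map[string]bool, len(h.NCNames))
		for _, n := range h.NCNames {
			known[n] = true
		}
		for _, name := range h.Interfaces {
			if name == "" || known[name] {
				continue
			}
			warnings = append(warnings, fmt.Sprintf(
				"host %s: interface name %q not found in networkConfig interfaces %v",
				h.Hostname, name, h.NCNames))
		}
	}
	return warnings
}
```

Returning messages instead of an error keeps the decision to log or fail with the caller, which is the property that makes the check safe to run at build time.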
Yes, I like the Option B proposal.
Warning instead of error, and keeping it on the agent-config path rather than after data from install-config and agent-config are combined, would address my main concerns.
Better if we continue to treat the NMState as opaque and get the info we want from the keyfiles, but given that we are already not following this principle to some extent and that there will only be a warning instead of an error, I would not block on that.
Thanks @zaneb, I've updated the PR to implement your suggested approach:
- Warning instead of error: uses `logrus.Warnf` so it never blocks legitimate configs (e.g. `identifier: mac-address` cases).
- Scoped to user-authored hosts only: the `hostsFromAgentConfig` guard ensures it only fires for:
  - `agent-config.yaml` hosts (Install workflow)
  - `nodes-config.yaml` hosts (`oc adm node-image create` / AddNodes workflow; ref OCPBUGS-86420)

  It never fires for the `getInstallConfigDefaults` path (where the synthetic `"boot"` interface name is generated from baremetal IPI install-config).
- NMState treated as opaque: the check only extracts interface names (top-level `interfaces[].name`), consistent with the existing parsing already in the file. No deeper structural assumptions.

Test coverage includes the inert install-config path, AddNodes mismatch/match, empty interface names, and malformed networkConfig graceful handling.
    }

    var netInterfaces nmStateInterface
    if err := yaml.Unmarshal(host.NetworkConfig.Raw, &netInterfaces); err != nil {
It was part of our design principles that we treat nmstate as an opaque blob and not rely on knowing the internal structure of it, which may change over time.
Addresses @zaneb's review: the interface name cross-check against networkConfig is now a warning (logrus.Warnf) instead of a hard error, and runs for both agent-config.yaml and nodes-config.yaml (oc adm node-image create) paths, but never for install-config baremetal hosts where getInstallConfigDefaults generates synthetic interface names.

This ensures the baremetal IPI fallback path (which generates a bogus "boot" interface name) is never affected, while giving users early visibility into potential name mismatches that could cause connectivity failures at boot time.

Test coverage added for:
- install-config inert path (no warning fires)
- AddNodes workflow mismatch (warns) and match (no warning)
- empty interface name (skipped gracefully)
- malformed networkConfig YAML (no panic)
- bond interfaces matching and mismatching

Ref: OCPBUGS-86420

Co-authored-by: Cursor <cursoragent@cursor.com>
🧹 Nitpick comments (1)
pkg/asset/agent/agentconfig/agenthosts.go (1)
182-217: 💤 Low value
Consider deduplicating warnings per host.
The warning logic emits one `Warnf` per mismatched interface. If a host has multiple mismatched interfaces, this produces multiple log lines with the same networkConfig list. Consider collecting all mismatched names and emitting a single warning per host.
📋 Example refactor to deduplicate warnings

    +	var mismatched []string
     	for _, iface := range host.Interfaces {
     		if iface.Name == "" {
     			continue
     		}
     		if !ncNames[iface.Name] {
    -			logrus.Warnf("agent-config: interface name %q not found in networkConfig interfaces %v; "+
    -				"connectivity may fail if interface names do not match at boot time",
    -				iface.Name, ncNameList)
    +			mismatched = append(mismatched, iface.Name)
     		}
     	}
    +	if len(mismatched) > 0 {
    +		logrus.Warnf("agent-config: interface names %v not found in networkConfig interfaces %v; "+
    +			"connectivity may fail if interface names do not match at boot time",
    +			mismatched, ncNameList)
    +	}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/asset/agent/agentconfig/agenthosts.go` around lines 182 - 217, In warnInterfaceNamesNotInNetworkConfig, instead of calling logrus.Warnf for each mismatched iface, collect mismatched interface names (e.g. into a slice like mismatchedNames) while iterating host.Interfaces (skip empty names and use ncNames to check membership), and after the loop emit a single logrus.Warnf that includes the host identifier, the deduplicated mismatchedNames and the ncNameList; ensure you only log when mismatchedNames is non-empty to preserve the existing early-return behavior.
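A standalone sketch of the single-warning-per-host idea is shown below. The function name `mismatchWarning` and its parameters are hypothetical; the real helper logs through `logrus.Warnf`, while this version returns the message so it can be exercised without any logging dependency.

```go
package main

import (
	"fmt"
	"sort"
)

// mismatchWarning builds one message per host that lists every interface name
// missing from the network config. It returns "" when nothing is mismatched,
// so the caller logs only when there is something to report.
func mismatchWarning(hostname string, hostIfaces []string, ncNames map[string]bool) string {
	seen := make(map[string]bool)
	var mismatched []string
	for _, name := range hostIfaces {
		// Skip empty names, names that match, and names already recorded.
		if name == "" || ncNames[name] || seen[name] {
			continue
		}
		seen[name] = true
		mismatched = append(mismatched, name)
	}
	if len(mismatched) == 0 {
		return ""
	}
	// Sort the valid names so the message is stable across runs
	// (Go map iteration order is randomized).
	valid := make([]string, 0, len(ncNames))
	for n := range ncNames {
		valid = append(valid, n)
	}
	sort.Strings(valid)
	return fmt.Sprintf("agent-config: host %q: interface names %v not found in networkConfig interfaces %v; "+
		"connectivity may fail if interface names do not match at boot time",
		hostname, mismatched, valid)
}
```

Including the host identifier in the message matters once warnings are collapsed, because the list of mismatched names alone no longer says which host they belong to.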
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b97e3420-d29f-4dd5-85c6-5cc36f9f770d
📒 Files selected for processing (2)
- pkg/asset/agent/agentconfig/agenthosts.go
- pkg/asset/agent/agentconfig/agenthosts_test.go
/ok-to-test
@chdeshpa-hue: The following tests failed, say
Summary
- Add `validateInterfaceNamesMatchNetworkConfig()` to `validateAgentHosts()` in `pkg/asset/agent/agentconfig/agenthosts.go` that cross-validates that `hosts[].interfaces[].name` values exist in `hosts[].networkConfig` interfaces
- Update test fixtures where `interfaces[]` names (`enp3s1`) did not match `networkConfig` names (`eth0`); these were silently inconsistent before the new validation

Problem
`openshift-install agent create image` accepts agent-config.yaml where `hosts[].interfaces[]` names don't match `hosts[].networkConfig` interface names. At boot, the `pre-network-manager-config.sh` script uses `interfaces[]` names to find and rename `.nmconnection` files generated from `networkConfig`. When names mismatch:
- `sed` replacements find zero matches (the script says "updated" but replaces nothing)

Only the `agent-config.yaml` path is affected. The `install-config.yaml` path derives interface names from `networkConfig` in `getInstallConfigDefaults()`, so names always match by construction.

Test plan
- `interface-name-mismatch-with-networkconfig`: single ethernet, name mismatch rejected
- `interface-name-matches-networkconfig`: single ethernet, matching names pass
- `bond-networkconfig-with-matching-interfaces`: bond with 2 slaves, matching names pass
- `bond-networkconfig-with-mismatched-interfaces`: bond with 2 slaves, mismatch rejected

Made with Cursor