Skip to content

feat: honor a declared primary host NIC when selecting the boot device#2690

Merged
chet merged 1 commit into
NVIDIA:mainfrom
chet:honor-declared-primary-interface
Jun 19, 2026
Merged

feat: honor a declared primary host NIC when selecting the boot device#2690
chet merged 1 commit into
NVIDIA:mainfrom
chet:honor-declared-primary-interface

Conversation

@chet

@chet chet commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

What

Makes a host's declared ExpectedHostNic.primary authoritative when selecting its boot device, for the paths that own a NIC today (zero-DPU / NIC-mode hosts).

primary is meant to declare the boot interface, but ingestion ignored it -- predicted interfaces always promoted as non-primary, so a declared NIC never stuck and the host booted via the lowest-MAC fallback until an operator ran set-primary-interface. This records the declared primary on the prediction so it survives promotion, resolves it through one ExpectedMachineData::declared_primary_mac() the writers share, and honors it in the DHCP find-or-create path.

It also settles the multi-NIC pre-ingestion window: several NICs leasing before ingestion each defaulted to primary, and adopting the second one tripped the one_primary_interface_per_machine index and failed the host's ingestion (OnePrimaryInterface). Ingestion now reconciles the adopted rows to exactly one primary.

Nothing declared -> today's automation stands (DPU takeover during ingestion, else the pick_boot_interface lowest-MAC fallback).

Scope

Part of #2657 (epic #2660). The genuine-DPU-mode case -- booting a declared integrated NIC while the DPU stays managed -- is split into #2668: there, demoting the DPU host interface must come together with owning the integrated NIC as the replacement primary admin interface (coupled by the admin-primary invariant), so it can't ride along here.

Test plan

  • cargo clippy --all-features --tests clean over the touched crates; cargo +nightly fmt.
  • New tests: a declared_primary_mac resolver unit test; zero-DPU integration tests for declared-primary-via-prediction and multi-NIC no-collision adoption; an api-core DHCP arrival-order test. Existing seed + DPU ingestion tests still pass.

@copy-pr-bot

copy-pr-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2a8944c9-6329-4b24-aefc-8a1d76759207

📥 Commits

Reviewing files that changed from the base of the PR and between 0d1d8e3 and 1737378.

📒 Files selected for processing (10)
  • crates/api-core/src/dhcp/discover.rs
  • crates/api-core/src/handlers/bmc_endpoint_explorer.rs
  • crates/api-core/src/tests/expected_machine.rs
  • crates/api-db/migrations/20260616215917_predicted_machine_interface_primary.sql
  • crates/api-db/src/machine_interface.rs
  • crates/api-db/src/predicted_machine_interface.rs
  • crates/api-model/src/expected_machine.rs
  • crates/api-model/src/predicted_machine_interface.rs
  • crates/site-explorer/src/machine_creator.rs
  • crates/site-explorer/tests/zero_dpu.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/api-db/migrations/20260616215917_predicted_machine_interface_primary.sql
🚧 Files skipped from review as they are similar to previous changes (9)
  • crates/api-db/src/predicted_machine_interface.rs
  • crates/api-core/src/tests/expected_machine.rs
  • crates/api-core/src/dhcp/discover.rs
  • crates/api-model/src/expected_machine.rs
  • crates/api-model/src/predicted_machine_interface.rs
  • crates/site-explorer/src/machine_creator.rs
  • crates/site-explorer/tests/zero_dpu.rs
  • crates/api-core/src/handlers/bmc_endpoint_explorer.rs
  • crates/api-db/src/machine_interface.rs

Summary by CodeRabbit

  • Bug Fixes
    • Ensured declared primary host NICs and predicted primary boot interfaces persist correctly across DHCP arrival and promotion, including reverse discovery/ingestion ordering on multi-NIC machines.
    • Reconciled primary flags when creating or moving machine and predicted interfaces so the declared intent is authoritative.
  • Performance
    • Added an index on predicted_machine_interfaces(machine_id) to speed up promotion lookups.
  • Tests
    • Added regression and zero-DPU ingestion tests covering primary preservation and reconciliation behavior.

Walkthrough

A new primary_interface boolean field is added to PredictedMachineInterface (backed by a DB migration) and a declared_primary_mac() method is introduced on ExpectedMachineData. These primitives are wired through DHCP discovery, predicted-interface promotion (move_predicted_machine_interface_to_machine), and zero-DPU machine ingestion so that ExpectedHostNic.primary is authoritative regardless of DHCP arrival order.

Changes

Declared Primary NIC propagation through discovery and ingestion

Layer / File(s) Summary
Schema migration and model contracts
crates/api-db/migrations/20260616215917_predicted_machine_interface_primary.sql, crates/api-model/src/predicted_machine_interface.rs, crates/api-model/src/expected_machine.rs
Adds primary_interface column (default false) and machine_id index to predicted_machine_interfaces; adds primary_interface: bool to PredictedMachineInterface and NewPredictedMachineInterface; introduces declared_primary_mac() on ExpectedMachineData to extract the declared primary MAC from host NICs with unit tests covering both absent and declared primary cases.
Database persistence and interface reconciliation
crates/api-db/src/predicted_machine_interface.rs, crates/api-db/src/machine_interface.rs
Extends create to persist primary_interface from the prediction; upgrades find_or_create_machine_interface with bidirectional primary-flag reconciliation comparing the row's current primary_interface against caller-declared is_primary and applying set_primary_interface when they differ; reworks move_predicted_machine_interface_to_machine to carry the prediction's declared primary value through creation and reconcile existing interfaces before machine association.
DHCP discovery integration with declared primary
crates/api-core/src/dhcp/discover.rs
Refactors discover_dhcp expected-host path to replace a dual-purpose host_nics loop with separate calls: declared_primary_mac() for determining primary status and iter().find() for NIC selection by MAC, eliminating manual primary inference iteration.
Zero-DPU ingestion: declared primary propagation
crates/site-explorer/src/machine_creator.rs
Derives declared_primary MAC from ExpectedMachineData, computes is_declared_primary per observed NIC, reconciles existing unowned machine_interface rows to the declared primary before adoption, and stamps primary_interface on newly created predicted_machine_interface entries to ensure declaration authority regardless of ingestion order.
Integration tests, regression tests, and resolver documentation
crates/api-core/src/tests/expected_machine.rs, crates/api-core/src/handlers/bmc_endpoint_explorer.rs, crates/site-explorer/tests/zero_dpu.rs
Adds DHCP ordering regression test verifying declared primary survives reversed discovery arrival; introduces two zero-DPU multi-NIC integration tests (no-declaration adoption preventing primary collision, and declared-primary promotion preserving intent across DHCP order); updates BMC endpoint explorer documentation to clarify that primary_interface is carried in predictions but intentionally not consulted by the resolver for boot-interface disambiguation.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(173, 216, 230, 0.5)
    Note over DHCPClient,machine_interfaces: DHCP Discovery Path
    DHCPClient->>discover_dhcp: DHCP packet (mac_address)
    discover_dhcp->>ExpectedMachineData: declared_primary_mac()
    ExpectedMachineData-->>discover_dhcp: Option~MacAddress~
    discover_dhcp->>find_or_create_machine_interface: is_primary = Some(mac == declared)
    find_or_create_machine_interface->>set_primary_interface: reconcile if differs
    find_or_create_machine_interface-->>discover_dhcp: machine_interface
  end
  rect rgba(144, 238, 144, 0.5)
    Note over SiteExplorer,machine_interfaces: Zero-DPU Ingestion Path
    SiteExplorer->>create_zero_dpu_machine: ExpectedMachineData + observed MACs
    create_zero_dpu_machine->>ExpectedMachineData: declared_primary_mac()
    ExpectedMachineData-->>create_zero_dpu_machine: Option~MacAddress~
    create_zero_dpu_machine->>set_primary_interface: reconcile existing unowned interface
    create_zero_dpu_machine->>predicted_machine_interface_create: primary_interface = is_declared_primary
    predicted_machine_interface_create-->>create_zero_dpu_machine: PredictedMachineInterface
    create_zero_dpu_machine->>move_predicted_machine_interface_to_machine: promote prediction
    move_predicted_machine_interface_to_machine->>set_primary_interface: reconcile if current_primary differs
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the primary change: enabling the declared primary host NIC to be honored during boot device selection.
Description check ✅ Passed The description comprehensively explains the problem, solution, scope, and test coverage, with clear traceability to related issues and PRs.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@chet

chet commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai PTAL, thank you!

@chet

chet commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/api-core/src/handlers/bmc_endpoint_explorer.rs (1)

1320-1355: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update outdated comment about predictions and primary flag.

The comment at lines 1322-1323 states "Predictions hold no primary flag," which is no longer accurate after this PR introduces the primary_interface field on PredictedMachineInterface. Predictions now carry the declared primary intent. The comment should be revised to reflect that these specific test predictions are non-primary (as created by the predicted() helper with primary_interface: false), not that predictions categorically lack a primary flag.

📝 Suggested comment revision
-    // Predictions hold no primary flag, so with several (a report listing
-    // SuperNICs alongside the boot NIC) the declared intent is unknowable:
+    // When no prediction is marked primary, with several predictions (a report listing
+    // SuperNICs alongside the boot NIC) the declared intent is unknowable:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/handlers/bmc_endpoint_explorer.rs` around lines 1320 -
1355, Update the outdated comment in the test function
no_mac_multiple_predictions_refuse_to_guess_a_boot_device. The current comment
incorrectly states that predictions hold no primary flag, but the
PredictedMachineInterface struct now includes a primary_interface field. Revise
the comment to clarify that predictions do support a primary flag, but the
specific test predictions created by the predicted() helper function are marked
as non-primary, and the test verifies that resolution refuses to guess the boot
device when facing multiple non-primary predictions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/api-core/src/handlers/bmc_endpoint_explorer.rs`:
- Around line 1320-1355: Update the outdated comment in the test function
no_mac_multiple_predictions_refuse_to_guess_a_boot_device. The current comment
incorrectly states that predictions hold no primary flag, but the
PredictedMachineInterface struct now includes a primary_interface field. Revise
the comment to clarify that predictions do support a primary flag, but the
specific test predictions created by the predicted() helper function are marked
as non-primary, and the test verifies that resolution refuses to guess the boot
device when facing multiple non-primary predictions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 68671db3-5b89-481d-a88b-dd9d806611be

📥 Commits

Reviewing files that changed from the base of the PR and between 6d36a43 and 5c25a50.

📒 Files selected for processing (10)
  • crates/api-core/src/dhcp/discover.rs
  • crates/api-core/src/handlers/bmc_endpoint_explorer.rs
  • crates/api-core/src/tests/expected_machine.rs
  • crates/api-db/migrations/20260616215917_predicted_machine_interface_primary.sql
  • crates/api-db/src/machine_interface.rs
  • crates/api-db/src/predicted_machine_interface.rs
  • crates/api-model/src/expected_machine.rs
  • crates/api-model/src/predicted_machine_interface.rs
  • crates/site-explorer/src/machine_creator.rs
  • crates/site-explorer/tests/zero_dpu.rs

@chet chet force-pushed the honor-declared-primary-interface branch from 5c25a50 to 0d1d8e3 Compare June 18, 2026 07:02
@chet

chet commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai Thanks for the eyes. Updated -- revised the stale "predictions hold no primary flag" note in three spots (the resolve_admin_boot_interface_target doc + inline, and the no_mac_multiple_predictions_refuse_to_guess_a_boot_device test) to reflect that predictions now carry primary_interface but this resolver doesn't consult it yet. PTAL, thanks!

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

@chet Acknowledged — the documentation corrections are the right call; keeping inline comments and test commentary accurate with the actual data model avoids misleading future contributors. Triggering a review of the updated changes now.

(ᵔᴥᵔ)🐇

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chet chet marked this pull request as ready for review June 19, 2026 01:22
@chet chet requested a review from a team as a code owner June 19, 2026 01:22
@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
boot-artifacts-aarch64 3 0 0 3 0 0
boot-artifacts-x86_64 3 0 0 3 0 0
forge-admin-cli-x86_64 261 5 22 103 6 125
machine-validation-runner 691 31 182 257 35 186
machine_validation 691 31 182 257 35 186
nvmetal-carbide 691 31 182 257 35 186
TOTAL 2340 98 568 880 111 683

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

ExpectedHostNic.primary is meant to say "this NIC is the host's boot
interface," but ingestion ignored it: a zero-DPU or NIC-mode host's
predicted interfaces always promoted as non-primary, so the declared NIC
never stuck and the host booted via the lowest-MAC fallback until an
operator ran set-primary-interface. The declared primary is now
authoritative across the writers that own a NIC -- recorded on the
prediction so it survives promotion, and honored by the DHCP path.

This also settles the pre-ingestion window for multi-NIC hosts: several
NICs that lease before ingestion each default to primary, and adopting the
second one tripped the one_primary_interface_per_machine index and failed
the host's ingestion. Ingestion now reconciles the adopted rows so exactly
one interface is primary.

- Record the declared primary on predicted_machine_interfaces and resolve
  it through one ExpectedMachineData::declared_primary_mac() the writers
  share; promotion and the DHCP find-or-create path both honor it.
- Reconcile the adopted interfaces at zero-DPU ingestion to exactly one
  primary, ending the OnePrimaryInterface adoption failure.
- Nothing declared -> today's automation stands: DPU takeover during
  ingestion, else the pick_boot_interface lowest-MAC fallback.

Tests cover the declared primary surviving DHCP arrival order, promotion
landing it on the real row, multi-NIC adoption no longer colliding, and a
resolver unit test.

Part of NVIDIA#2657 (epic NVIDIA#2660). The genuine-DPU-mode case -- a declared
integrated NIC while the DPU stays managed -- is tracked separately in

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
NVIDIA#2668.
@chet chet force-pushed the honor-declared-primary-interface branch from 0d1d8e3 to 1737378 Compare June 19, 2026 05:45
@chet chet merged commit 3b0f5ed into NVIDIA:main Jun 19, 2026
55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants