Skip to content

feat(bmc-mock): model the BlueField DPU/NIC mode flip#2689

Open
chet wants to merge 1 commit into
NVIDIA:mainfrom
chet:gh-issue-2661
Open

feat(bmc-mock): model the BlueField DPU/NIC mode flip#2689
chet wants to merge 1 commit into
NVIDIA:mainfrom
chet:gh-issue-2661

Conversation

@chet

@chet chet commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

The Bluefield OEM endpoint now accepts the BF-3 Mode.Set action and applies it the way real hardware does: the requested DPU/NIC mode is staged and only takes effect on the next power cycle, so a read-back before then still reports the old mode. The staged mode is applied on the PowerOn event, alongside the existing Dell BIOS-job completion.

Until now the mock could report a Bluefield's mode but not change it, so a test could only stand up a host whose DPU was already in NIC mode -- never watch one get flipped through the same set_nic_mode + power-cycle path site-explorer drives. This is the harness piece the end-to-end DPU-to-NIC re-registration test (#2632) builds on.

Adds a smoke test: a Bluefield starting in DPU mode reports NIC mode only after a staged Mode.Set followed by a power-on, and an unstaged power-on leaves the mode untouched.

Scope note: this is the bmc-mock half of #2661. The machine-a-tron side -- propagating a host power-cycle to its DPUs so a simulated host flips within a run -- is deferred to #2632, where the end-to-end test actually exercises it.

Supports #2661

@copy-pr-bot

copy-pr-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 848743bc-2510-4bda-859c-cdfb2905abee

📥 Commits

Reviewing files that changed from the base of the PR and between 94fce5b and f8e78ad.

📒 Files selected for processing (3)
  • crates/bmc-mock/src/bmc_state.rs
  • crates/bmc-mock/src/http.rs
  • crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/bmc-mock/src/bmc_state.rs
  • crates/bmc-mock/src/http.rs
  • crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs

Summary by CodeRabbit

Release Notes

  • New Features

    • Added BlueField Mode staging: mode changes are queued and only take effect after a power cycle.
    • Introduced a BlueField /Actions/Mode.Set endpoint to configure pending NIC/DPU mode.
  • Improvements

    • Improved power-on behavior to apply any pending BlueField mode.
    • Enhanced request validation for Mode.Set with strict, case-insensitive NIC/DPU parsing and clearer HTTP 400 error responses.
  • Tests

    • Added coverage for staged-vs-applied behavior and invalid/missing Mode handling.

Walkthrough

The bmc-mock crate introduces a production-grade staged mode model for BlueField hardware. BluefieldState transitions from direct field access to a thread-safe Mutex-protected ModeState structure that queues mode changes until the next power cycle. A new Mode.Set Redfish action handler performs strict payload validation and stages the transition; the handler integrates a new bad_request() HTTP helper for consistent error responses. The staged mode is then applied during the PowerOn event in BmcState, ensuring mode transitions occur at the appropriate lifecycle boundary.

Changes

BlueField Staged Mode Change

Layer / File(s) Summary
BluefieldState staged mode data model
crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs
Replaces the direct nic_mode field with ModeState wrapped in Arc<Mutex<...>>, providing thread-safe access. Adds nic_mode() to read the current mode, stage_mode() to queue pending transitions, and pub apply_pending_mode() to promote staged changes to active. Includes Arc and Mutex imports.
Mode.Set handler, routing, and bad_request helper
crates/bmc-mock/src/http.rs, crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs
Introduces bad_request(message) to return standardized 400 JSON error responses. Registers the /Actions/Mode.Set endpoint for BF-3. Updates the OEM get_oem_nvidia handler to call nic_mode() for read-back consistency. Implements the mode_set handler with case-insensitive Mode field validation, staged state transition via stage_mode(), and comprehensive unit tests verifying staged visibility and strict payload parsing.
PowerOn event wiring
crates/bmc-mock/src/bmc_state.rs
Introduces apply_pending_bluefield_mode(&self) as a private method to BmcState that applies any queued BlueField mode when oem_state contains an NvidiaBluefield instance. Integrates this method into the PowerOn branch of on_event() immediately after complete_all_bios_jobs() to ensure staged transitions take effect during power-on.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the primary change: implementing BlueField DPU/NIC mode flip functionality, which aligns with the core objective of enabling mode staging and application on power cycles.
Description check ✅ Passed The description provides comprehensive context about the implementation, its purpose, and integration with the broader testing framework, clearly explaining how the staged mode takes effect on power cycles.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@chet

chet commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai PTAL, thank you!

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Fetch PR details
gh pr view 2689 --repo NVIDIA/infra-controller --json title,body,files,additions,deletions,changedFiles

echo "---FILES CHANGED---"
gh pr diff 2689 --repo NVIDIA/infra-controller

Length of output: 8471



@chet — clean, well-scoped implementation. The staging semantics are modelled correctly and the Arc<Mutex<ModeState>> approach integrates naturally with Axum's shared-state pattern. A few observations below, all minor.


crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs

1. Missing test for the invalid-Mode rejection path

The smoke test covers the happy path thoroughly, but the mode_set handler has an explicit rejection branch (bad_request) that is not exercised by any unit test. Extending the unit tests to cover it would make the strict-validation intent self-documenting:

#[test]
fn stage_mode_rejects_unknown_mode_value() {
    // This validates the parsing logic that mode_set relies on.
    // "NicMode" / "DpuMode" (case-insensitive) are the only valid values.
    assert!("NicMode".eq_ignore_ascii_case("nicmode"));
    assert!(!"Bogus".eq_ignore_ascii_case("NicMode"));
    assert!(!"Bogus".eq_ignore_ascii_case("DpuMode"));
}

(Or, preferably, extract the string → bool parsing into a small helper that can be tested directly, keeping the handler thin.)


2. HTTP 200 vs 204 for the Mode.Set action

mode_set returns json!({}).into_ok_response() (HTTP 200). The Redfish spec calls for 204 No Content on action endpoints that carry no response body. The existing mock already uses ok_no_content() (204) in places — using it here would be more consistent with the spec and with the mock's own precedent:

// instead of:
json!({}).into_ok_response()

// prefer:
http::ok_no_content()

3. Doc-comment method name reference in ModeState

The pending_nic_mode doc comment says "see BmcState::on_event". The bmc_state.rs diff shows the call site inside a match event { BmcEvent::PowerOn => … } block — worth confirming the enclosing method is actually named on_event so the cross-reference stays navigable.


Summary

The core logic — staging, apply-on-power-on, no-op when nothing is staged — is correct and well-tested for the happy path. The strict Mode validation over silent defaulting is a sound design choice for a test harness. The above are polish items; the PR is otherwise production-grade for mock code.

@chet

chet commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs (1)

164-188: 💤 Low value

Consider adding test coverage for invalid Mode rejection.

The mode_set handler validates strictly and rejects invalid Mode values (lines 151-158), but this branch is not exercised by the unit test. For a test harness where strict validation is a deliberate design choice, documenting this behavior through tests would make the contract self-evident.

Given the table-driven test preference in this codebase, consider extracting the string-to-bool parsing logic and using scenarios! to cover valid and invalid inputs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs` around lines 164 - 188,
The existing `mode_set_is_staged_and_applied_on_power_on` test does not cover
the invalid Mode rejection logic in the `mode_set` handler, leaving that
validation branch untested. Extract the string-to-bool parsing logic from the
`mode_set` handler into a separate testable function, and then add table-driven
test cases using the `scenarios!` macro to cover both valid Mode inputs that
should succeed and invalid Mode inputs that should be rejected, ensuring the
strict validation behavior is documented and exercised through tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs`:
- Line 161: The Mode.Set action endpoint currently uses
json!({}).into_ok_response() which returns HTTP 200 with an empty JSON body, but
Redfish specification requires HTTP 204 No Content for action endpoints with no
response payload. Replace the json!({}).into_ok_response() call with the
ok_no_content() function that is already available in the crate and used
elsewhere for this purpose.

---

Nitpick comments:
In `@crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs`:
- Around line 164-188: The existing `mode_set_is_staged_and_applied_on_power_on`
test does not cover the invalid Mode rejection logic in the `mode_set` handler,
leaving that validation branch untested. Extract the string-to-bool parsing
logic from the `mode_set` handler into a separate testable function, and then
add table-driven test cases using the `scenarios!` macro to cover both valid
Mode inputs that should succeed and invalid Mode inputs that should be rejected,
ensuring the strict validation behavior is documented and exercised through
tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 59545cbd-edfc-4087-9b37-beedabf199a7

📥 Commits

Reviewing files that changed from the base of the PR and between 6d36a43 and 1c8b177.

📒 Files selected for processing (3)
  • crates/bmc-mock/src/bmc_state.rs
  • crates/bmc-mock/src/http.rs
  • crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs

Comment thread crates/bmc-mock/src/redfish/oem/nvidia/bluefield.rs Outdated
@chet chet marked this pull request as ready for review June 19, 2026 01:22
@chet chet requested a review from a team as a code owner June 19, 2026 01:22
The Bluefield OEM endpoint now accepts the BF-3 `Mode.Set` action and applies
it the way real hardware does: the requested DPU/NIC mode is staged and only
takes effect on the next power cycle, so a read-back before then still reports
the old mode. The staged mode is applied on the `PowerOn` event, alongside the
existing Dell BIOS-job completion.

Until now the mock could report a Bluefield's mode but not change it, so a test
could only stand up a host whose DPU was already in NIC mode -- never watch one
get flipped through the same `set_nic_mode` + power-cycle path site-explorer
drives. This is the harness piece the end-to-end DPU-to-NIC re-registration
test (NVIDIA#2632) builds on.

Adds a smoke test: a Bluefield starting in DPU mode reports NIC mode only after
a staged `Mode.Set` followed by a power-on, and an unstaged power-on leaves the
mode untouched.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
@github-actions

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants