feat: improve copy-bfb-to-dpu-rshim flow #920

Merged: krish-nvidia merged 5 commits into NVIDIA:main from krish-nvidia:improve-bfb-copy on Apr 13, 2026

krish-nvidia commented Apr 13, 2026

Description

This PR improves the carbide-admin-cli site-explorer copy-bfb-to-dpu-rshim flow in preingestion by:

  • Switching BFB copy from russh SFTP to the system scp binary with SSH_ASKPASS-based password auth. No combination of buffer size or timeout worked reliably with BF2 DPUs over russh's SFTP channel.
  • Making host_bmc_ip required, so that a platform power cycle can be issued after the BFB is successfully installed.
  • Adding --pre-copy-powercycle to optionally power-cycle the host before the copy to release rshim control to the DPU BMC.
  • Detecting BFB installation completion by SSHing into the DPU BMC and checking the serial console output for markers (login:, Running bfb_post_install from bf.cfg, total 100% complete). Once completion is detected, the post-install host power cycle is triggered automatically.
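
The marker check in the last bullet could look roughly like the sketch below. This is not the code from this PR: the names `BFB_DONE_MARKERS` and `find_install_marker` are hypothetical, and treating any single marker as sufficient is an assumption. Only the three marker strings come from the description above.

```rust
/// Console markers listed in the PR description.
const BFB_DONE_MARKERS: [&str; 3] = [
    "login:",
    "Running bfb_post_install from bf.cfg",
    "total 100% complete",
];

/// Returns the first marker (in list order) that appears in the captured
/// serial console output, or `None` if the install does not look finished.
/// Assumption: one matching marker is enough to declare completion.
fn find_install_marker(console_output: &str) -> Option<&'static str> {
    BFB_DONE_MARKERS
        .iter()
        .copied()
        .find(|marker| console_output.contains(*marker))
}
```

A caller would poll the console over SSH, pass the captured text to this function, and trigger the host power cycle once it returns `Some(..)`.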

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

Signed-off-by: Krish Dandiwala <kdandiwala@nvidia.com>
krish-nvidia requested a review from wminckler April 13, 2026 18:19
krish-nvidia self-assigned this Apr 13, 2026
krish-nvidia requested a review from a team as a code owner April 13, 2026 18:19
@github-actions

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🕐 Last updated: 2026-04-13 18:21:33 UTC | Commit: e5ff17c

}

if let Some(host_ip) = host_bmc_ip {
{

Contributor

I don't think I've seen this in rust before.

Contributor Author

It's just a way of creating a new scope so that all the variables declared inside it are dropped when the block ends. It works pretty much like an inline helper function for the host endpoint validation logic.

Contributor

yea, I know what it does, I've just never seen it used in rust ;)
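
For readers who have not seen the pattern discussed in this thread, here is a small self-contained sketch of a bare block used as a scope. It is an illustration only, not the code under review: the `Tracked` type and `run_scoped` function are hypothetical and exist just to record when a value is dropped.

```rust
use std::cell::RefCell;

/// Hypothetical helper type that logs its own drop.
struct Tracked<'a> {
    name: &'static str,
    events: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.events.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn run_scoped(events: &RefCell<Vec<String>>) -> bool {
    // The block is an expression: its last value becomes `valid`.
    let valid = {
        let _endpoint = Tracked { name: "endpoint", events };
        events.borrow_mut().push("validate".to_string());
        true
    }; // `_endpoint` goes out of scope and is dropped here.
    events.borrow_mut().push("after block".to_string());
    valid
}
```

Calling `run_scoped` records the events in the order validate, drop endpoint, after block, which shows that everything declared inside the braces is released before the code after the block runs.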

Signed-off-by: Krish Dandiwala <kdandiwala@nvidia.com>
Comment thread crates/ssh/src/ssh.rs
.map_err(io_ssh_error)?;
std::fs::set_permissions(&askpass_path, std::fs::Permissions::from_mode(0o700))
.map_err(io_ssh_error)?;
}

Contributor

I don't know if writing a password to a file is such a good idea. How does the SSH console do it?

Contributor Author

The SSH console uses the russh library directly to create a client, so the password stays in memory. I need to supply the password somehow to the system scp call so I went with the SSH_ASKPASS tempfile as the simplest approach without changing the container build. Another way would be to install sshpass in the carbide-api container, but I didn't want to add that dependency.

preingestion.bfb already lives in the container with bf.cfg appended, which contains the BMC user/password and never gets cleaned up, unlike the tempfile :(
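
As a rough illustration of the SSH_ASKPASS tempfile approach described in this reply, the sketch below writes an owner-only helper script and prepares an `scp` command that uses it. It is a sketch under assumptions, not the implementation in crates/ssh/src/ssh.rs: the function names `shell_single_quote`, `write_askpass_script`, and `build_scp_command` are hypothetical, and setting `SSH_ASKPASS_REQUIRE=force` assumes OpenSSH 8.4 or newer in the container.

```rust
use std::fs;
use std::io::Write;
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;
use std::process::{Command, Stdio};

/// Quote a value so a POSIX shell treats it as one literal word.
fn shell_single_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', r"'\''"))
}

/// Write a helper script that prints the password when ssh/scp invokes it.
/// `create_new` refuses to overwrite an existing file at `path`.
fn write_askpass_script(path: &Path, password: &str) -> std::io::Result<()> {
    let mut file = fs::OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o700)
        .open(path)?;
    writeln!(file, "#!/bin/sh")?;
    writeln!(file, "printf '%s\\n' {}", shell_single_quote(password))?;
    // The mode passed to open() is filtered by the umask, so set it explicitly.
    fs::set_permissions(path, fs::Permissions::from_mode(0o700))?;
    Ok(())
}

/// Prepare (but do not run) an scp invocation that reads the password
/// through the helper script instead of prompting on a terminal.
fn build_scp_command(askpass: &Path, local: &Path, remote: &str) -> Command {
    let mut cmd = Command::new("scp");
    cmd.env("SSH_ASKPASS", askpass)
        .env("SSH_ASKPASS_REQUIRE", "force") // assumption: OpenSSH >= 8.4
        .stdin(Stdio::null())
        .arg(local)
        .arg(remote);
    cmd
}
```

In a design like this, the caller would remove the script as soon as `scp` exits, on both success and failure, so the password stays on disk only for the duration of the copy.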

krish-nvidia requested a review from wminckler April 13, 2026 23:00
krish-nvidia merged commit 74eabda into NVIDIA:main Apr 13, 2026
41 checks passed