feat: Add Fabric Manager partition support for NVLink-enabled multi-GPU VMs #158
Open
dnugmanov wants to merge 1 commit into NVIDIA:master from
Conversation
Force-pushed from 9557994 to 13496fa
…PU VMs

Implements support for NVIDIA Fabric Manager partition-aware GPU allocation, enabling NVLink connectivity for multi-GPU VMs in KubeVirt on DGX/HGX H100 systems using the Shared NVSwitch Virtualization Model.

Closes NVIDIA#133

Changes:
- Add pkg/fabric_manager/ with FM SDK CGO bindings
- Implement GetPreferredAllocation for partition-aware allocation
- Add automatic partition activation/deactivation in Allocate
- Update Dockerfile with FM SDK installation
- Add --fm-enabled and --fm-address CLI flags
Force-pushed from 13496fa to 620a537
rthallisey reviewed Jan 30, 2026
```yaml
  name: nvidia-kubevirt-gpu-dp-ds
spec:
  priorityClassName: system-node-critical
  hostNetwork: true  # Required for FM API access on localhost:6666
```
Collaborator
There should be a way to do this without going over the Node's network. A shared socket would be better.
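One way to avoid `hostNetwork: true` would be to reach the FM API over a Unix domain socket shared into the pod via a hostPath volume. A minimal sketch, assuming FM's HTTP API could be exposed on such a socket (the socket path here is hypothetical; FM's documented interface is TCP on localhost:6666):

```go
package main

import (
	"context"
	"net"
	"net/http"
)

// newUnixSocketClient returns an HTTP client whose transport dials a
// Unix domain socket instead of TCP, so the pod would not need access
// to the node's network namespace.
func newUnixSocketClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

func main() {
	// Hypothetical socket path, mounted into the pod via hostPath.
	client := newUnixSocketClient("/var/run/nvidia-fabricmanager/fm.sock")
	_ = client // requests would use a dummy host, e.g. http://fm/...
}
```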
```yaml
- name: vfio
  hostPath:
    path: /dev/vfio
- name: sys
```
Collaborator
Can you explain what we need access to /sys for?
```go
	return dpi
}

// SetPartitionManager sets the Fabric Manager partition manager for NVLink support.
```
Collaborator
Why does the device plugin need to configure the Fabric Manager? Can't this happen another way, like through cloud-init or a side-car?
Related Issue
Closes #133 - How to adapt the Shared NVSwitch Virtualization Model of FM to activate nvlink in multi-gpu VMs
Description
This PR implements support for NVIDIA Fabric Manager (FM) partition-aware GPU allocation, enabling NVLink connectivity for multi-GPU VMs in KubeVirt on DGX/HGX H100 systems using the Shared NVSwitch Virtualization Model.
Background
In virtualized environments with DGX/HGX H100 systems, NVIDIA provides the Shared NVSwitch Virtualization Model to enable NVLink connections for multi-GPU VMs. This requires that all GPUs assigned to a VM belong to the same FM partition in order to establish NVLink fabric connectivity.
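The same-partition constraint above can be sketched as a preferred-allocation pass: given the FM partitions and the set of available GPUs, pick the smallest partition that can satisfy the request. The types and names below are illustrative, not the PR's actual implementation:

```go
package main

import "fmt"

// partition models an FM GPU partition: a group of GPUs wired into the
// same NVLink fabric domain. Illustrative shape only.
type partition struct {
	ID   uint32
	GPUs []string // GPU device IDs in this partition
}

// preferredGPUs picks GPUs for a request of size n, preferring the
// smallest partition whose free GPUs cover the request, so every
// allocated GPU shares NVLink connectivity with the others.
func preferredGPUs(parts []partition, available map[string]bool, n int) []string {
	var best []string
	for _, p := range parts {
		var free []string
		for _, g := range p.GPUs {
			if available[g] {
				free = append(free, g)
			}
		}
		if len(free) >= n && (best == nil || len(free) < len(best)) {
			best = free
		}
	}
	if best == nil {
		return nil // no single partition can satisfy the request
	}
	return best[:n]
}

func main() {
	parts := []partition{
		{ID: 1, GPUs: []string{"gpu-0", "gpu-1"}},
		{ID: 2, GPUs: []string{"gpu-2", "gpu-3", "gpu-4", "gpu-5"}},
	}
	avail := map[string]bool{"gpu-0": true, "gpu-1": true, "gpu-2": true, "gpu-3": true}
	fmt.Println(preferredGPUs(parts, avail, 2)) // prints [gpu-0 gpu-1]
}
```

Returning nil when no partition fits lets the caller fall back to the kubelet's default allocation (which would then lack NVLink connectivity) or fail the request.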
What This PR Does
- Adds FM SDK CGO bindings (libnvfm) for partition discovery and activation
- Implements GetPreferredAllocation to recommend GPUs from the same partition
- Activates partitions in Allocate() and deactivates them on pod deletion

Environment Assumptions
- Fabric Manager running in shared_nvswitch mode on the host
- GPUs bound to the vfio-pci driver for passthrough
- Device plugin running with hostNetwork: true to access FM on localhost

We would appreciate feedback from maintainers on this implementation approach and any suggestions for improvement.
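The activate-in-Allocate / deactivate-on-pod-deletion lifecycle from the list above can be sketched as a reference-counted tracker. The callbacks stand in for the FM SDK calls; this is an illustration of the lifecycle, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// partitionTracker activates an FM partition when the first pod that
// needs it is allocated, and deactivates it when the last such pod is
// deleted. The activate/deactivate funcs stand in for FM SDK calls.
type partitionTracker struct {
	mu         sync.Mutex
	active     map[uint32]int // partition ID -> pods currently using it
	activate   func(id uint32) error
	deactivate func(id uint32) error
}

func (t *partitionTracker) OnAllocate(id uint32) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.active[id] == 0 {
		if err := t.activate(id); err != nil {
			return err
		}
	}
	t.active[id]++
	return nil
}

func (t *partitionTracker) OnPodDeleted(id uint32) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.active[id]--
	if t.active[id] == 0 {
		return t.deactivate(id)
	}
	return nil
}

func main() {
	t := &partitionTracker{
		active:     map[uint32]int{},
		activate:   func(id uint32) error { fmt.Println("activate", id); return nil },
		deactivate: func(id uint32) error { fmt.Println("deactivate", id); return nil },
	}
	t.OnAllocate(7)   // prints "activate 7"
	t.OnPodDeleted(7) // prints "deactivate 7"
}
```

Reference counting matters if a partition could ever back more than one pod; if partitions map one-to-one to VMs, the counter degenerates to a simple active/inactive flag.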