
feat: Add Fabric Manager partition support for NVLink-enabled multi-GPU VMs #158

Open
dnugmanov wants to merge 1 commit into NVIDIA:master from dnugmanov:feat/fabric-manager-partition-support

Conversation

@dnugmanov

Related Issue

Closes #133 - How to adapt the Shared NVSwitch Virtualization Model of FM to activate nvlink in multi-gpu VMs


Description

This PR implements support for NVIDIA Fabric Manager (FM) partition-aware GPU allocation, enabling NVLink connectivity for multi-GPU VMs in KubeVirt on DGX/HGX H100 systems using the Shared NVSwitch Virtualization Model.

Background

In virtualized environments with DGX/HGX H100 systems, NVIDIA provides the Shared NVSwitch Virtualization Model to enable NVLink connections for multi-GPU VMs. In this model, NVLink fabric connectivity is only established when all GPUs assigned to a VM belong to the same FM partition, i.e. one of the predefined GPU groupings (for example, 1-, 2-, 4-, or 8-GPU sets on an 8-GPU baseboard) that Fabric Manager exposes and activates on request.

What This PR Does

  1. FM SDK Integration: Adds CGO bindings to the Fabric Manager SDK (libnvfm) for partition discovery and activation
  2. Partition-Aware Allocation: Implements GetPreferredAllocation to recommend GPUs from the same partition (see the sketch after this list)
  3. Automatic Partition Activation: Activates FM partitions during Allocate() and deactivates on pod deletion
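
To make the partition-aware selection concrete, here is a minimal, self-contained sketch of the allocation idea. This is not the PR's actual code: the `Partition` type, the PCI addresses, and the smallest-fit heuristic are illustrative placeholders; in the real implementation the partition list comes from the FM SDK bindings, and the chosen partition is activated during `Allocate()`.

```go
package main

import (
	"fmt"
	"sort"
)

// Partition models one Fabric Manager partition: a group of GPUs that gets
// NVLink connectivity once the partition is activated. Field names are
// placeholders, not the FM SDK's actual types.
type Partition struct {
	ID   uint32
	GPUs []string // GPU identifiers (e.g. PCI addresses) in this partition
}

// preferSamePartition picks `count` GPUs from `available`, taking them all
// from a single partition. It prefers the smallest partition that fits so
// larger partitions stay free for bigger VMs. Returns nil if no single
// partition can satisfy the request.
func preferSamePartition(partitions []Partition, available map[string]bool, count int) []string {
	sorted := make([]Partition, len(partitions))
	copy(sorted, partitions)
	sort.Slice(sorted, func(i, j int) bool { return len(sorted[i].GPUs) < len(sorted[j].GPUs) })

	for _, p := range sorted {
		var usable []string
		for _, gpu := range p.GPUs {
			if available[gpu] {
				usable = append(usable, gpu)
			}
		}
		if len(usable) >= count {
			return usable[:count]
		}
	}
	return nil
}

func main() {
	// Hypothetical 2-GPU and 4-GPU partitions with made-up PCI addresses.
	partitions := []Partition{
		{ID: 1, GPUs: []string{"0000:17:00.0", "0000:2a:00.0"}},
		{ID: 2, GPUs: []string{"0000:3d:00.0", "0000:5e:00.0", "0000:9a:00.0", "0000:ab:00.0"}},
	}
	available := map[string]bool{
		"0000:17:00.0": true, "0000:2a:00.0": true,
		"0000:3d:00.0": true, "0000:5e:00.0": true,
		"0000:9a:00.0": true, "0000:ab:00.0": true,
	}
	// A 2-GPU request is served from the 2-GPU partition, leaving the
	// 4-GPU partition intact for a larger VM.
	fmt.Println(preferSamePartition(partitions, available, 2))
}
```

Whether smallest-fit is the right heuristic is exactly the kind of thing we would like maintainer feedback on; the sketch only illustrates the constraint that a VM's GPUs must come from one partition.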

Environment Assumptions

| Assumption | Details |
| --- | --- |
| FM Daemon Running | Fabric Manager daemon must be running in `shared_nvswitch` mode on the host |
| Driver Version | Tested with NVIDIA driver 580.x series |
| GPU Architecture | Tested on H100 SXM5 80GB (HGX H100 system) |
| VFIO Binding | GPUs must be bound to the `vfio-pci` driver for passthrough |
| Host Network | Plugin requires `hostNetwork: true` to access FM on localhost |
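
For reference, a sketch of the Fabric Manager configuration these assumptions imply. The option names and the default config path follow the FM user guide for recent driver branches and should be verified against the installed FM version:

```ini
# /usr/share/nvidia/nvswitch/fabricmanager.cfg
FABRIC_MODE=1                    # 1 = Shared NVSwitch multitenancy (shared_nvswitch) mode
FM_CMD_BIND_INTERFACE=127.0.0.1  # FM API bound to localhost
FM_CMD_PORT_NUMBER=6666          # matches the localhost:6666 address the plugin uses
```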

We would appreciate feedback from maintainers on this implementation approach and any suggestions for improvement.

@copy-pr-bot

copy-pr-bot Bot commented Dec 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

dnugmanov force-pushed the feat/fabric-manager-partition-support branch from 9557994 to 13496fa on December 29, 2025 at 09:50
feat: Add Fabric Manager partition support for NVLink-enabled multi-GPU VMs

Implements support for NVIDIA Fabric Manager partition-aware GPU allocation,
enabling NVLink connectivity for multi-GPU VMs in KubeVirt on DGX/HGX H100
systems using the Shared NVSwitch Virtualization Model.

Closes NVIDIA#133

Changes:
- Add pkg/fabric_manager/ with FM SDK CGO bindings
- Implement GetPreferredAllocation for partition-aware allocation
- Add automatic partition activation/deactivation in Allocate
- Update Dockerfile with FM SDK installation
- Add --fm-enabled and --fm-address CLI flags
dnugmanov force-pushed the feat/fabric-manager-partition-support branch from 13496fa to 620a537 on December 29, 2025 at 09:51
name: nvidia-kubevirt-gpu-dp-ds
spec:
priorityClassName: system-node-critical
hostNetwork: true # Required for FM API access on localhost:6666
Collaborator

There should be a way to do this without going over the Node's network. A shared socket would be better.

- name: vfio
  hostPath:
    path: /dev/vfio
- name: sys
Collaborator

Can you explain what we need access to /sys for?

	return dpi
}

// SetPartitionManager sets the Fabric Manager partition manager for NVLink support.
Collaborator

Why does the device plugin need to configure the Fabric Manager? Can't this happen another way, like through cloud-init or a side-car?
