I currently have OpenShift 4.13 with OpenShift Virtualization (CNV) installed.
I installed the NVIDIA drivers through https://github.com/vladikr/ocp-nvidia-vgpu-installer, and they work as expected.
I added the following to the HyperConverged YAML:
spec:
  mediatedDevicesConfiguration:
    mediatedDevicesTypes:
    - nvidia-258
  permittedHostDevices:
    mediatedDevices:
    - mdevNameSelector: "GRID RTX6000-3Q"
      resourceName: "nvidia.com/GRID_RTX6000-3Q"
      externalResourceProvider: true
I also checked that nvidia-258 exists:
$ cd /sys/bus/pci/devices/0000:05:00.0/mdev_supported_types
$ cat nvidia-258/available_instances
8
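To see which human-readable name each mdev type maps to (the string that mdevNameSelector has to match), the type directories can be listed together with their name and available_instances files. This is only a minimal sketch, assuming the standard mdev sysfs layout. The helper name list_mdev_types is hypothetical, and the directory is passed in as an argument:

```shell
# Hypothetical helper (not part of any tool): print
# "<type-id><TAB><name><TAB><available_instances>" for every mdev type
# found under a mdev_supported_types directory.
list_mdev_types() {
    types_dir=$1
    for type_path in "$types_dir"/*/; do
        # Skip when the glob matched nothing or the entry has no name file.
        [ -f "${type_path}name" ] || continue
        type_id=$(basename "$type_path")
        type_name=$(cat "${type_path}name")
        available=$(cat "${type_path}available_instances" 2>/dev/null || echo "?")
        printf '%s\t%s\t%s\n' "$type_id" "$type_name" "$available"
    done
}

# Usage with the GPU from above:
# list_mdev_types /sys/bus/pci/devices/0000:05:00.0/mdev_supported_types
```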
Then I created 2 mdev devices:
$ UUID=$(uuidgen)
$ echo "${UUID}" > nvidia-258/create
$ mdevctl define --auto --uuid "${UUID}"
$ mdevctl list
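As a cross-check that does not rely on mdevctl, the created devices can also be read back from sysfs, where each device directory carries an mdev_type symlink to its type. Again a rough sketch under the assumption of the usual /sys/bus/mdev/devices layout and a GNU readlink. The helper name list_mdev_devices is made up for illustration:

```shell
# Hypothetical helper: print "<uuid><TAB><type-id><TAB><type-name>" for every
# mediated device found under a directory such as /sys/bus/mdev/devices.
list_mdev_devices() {
    devices_dir=$1
    for dev_path in "$devices_dir"/*/; do
        # Skip when the glob matched nothing or the entry is not an mdev.
        [ -e "${dev_path}mdev_type" ] || continue
        uuid=$(basename "$dev_path")
        # mdev_type is a symlink to the type directory, e.g. .../nvidia-258
        type_id=$(basename "$(readlink -f "${dev_path}mdev_type")")
        type_name=$(cat "${dev_path}mdev_type/name" 2>/dev/null || echo "?")
        printf '%s\t%s\t%s\n' "$uuid" "$type_id" "$type_name"
    done
}

# Usage on the node:
# list_mdev_devices /sys/bus/mdev/devices
```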
Then I installed the kubevirt-gpu-device-plugin, but when I inspect its log on the node I see:
2023/08/29 09:36:53 Not a device, continuing
2023/08/29 09:36:53 Nvidia device 0000:05:00.0
2023/08/29 09:36:53 Not a device, continuing
2023/08/29 09:36:53 Gpu id is 0000:05:00.0
2023/08/29 09:36:53 Vgpu id is GRID_RTX6000-3Q
2023/08/29 09:36:53 Gpu id is 0000:05:00.0
2023/08/29 09:36:53 Vgpu id is GRID_RTX6000-3Q
2023/08/29 09:36:53 Iommu Map map[]
2023/08/29 09:36:53 Device Map map[]
2023/08/29 09:36:53 vGPU Map map[GRID_RTX6000-3Q:[{21ad712a-f454-498c-84d5-4116f3723c01} {43922f20-6573-4d6b-9223-a2ca02f83b29}]]
2023/08/29 09:36:53 GPU vGPU Map map[0000:05:00.0:[21ad712a-f454-498c-84d5-4116f3723c01 43922f20-6573-4d6b-9223-a2ca02f83b29]]
2023/08/29 09:36:53 Could not find NVIDIA device with id: GRID_RTX6000-3Q
2023/08/29 09:36:53 DP Name GRID_RTX6000-3Q
2023/08/29 09:36:53 Devicename GRID_RTX6000-3Q
2023/08/29 09:36:58 [GRID_RTX6000-3Q] Error registering with device plugin manager: context deadline exceeded
2023/08/29 09:36:58 Error starting GRID_RTX6000-3Q device plugin: context deadline exceeded
As a result I can't run any VMI/VM: once I create one, it is never scheduled, because no vGPU is found as available when I put the following in its YAML:
spec:
  gpus:
  - deviceName: nvidia.com/GRID_RTX6000-3Q
    name: vgpu1
What did I do wrong?