Cannot run KubeVirt virtual machine using NVIDIA GPU plugin #89

Hello,

I have a Kubernetes cluster running virtual machines with KubeVirt.
A worker node in the cluster has a GPU, and I want to use that GPU for a VM in passthrough mode.
I enabled the feature gate and deployed the kubevirt-gpu-device-plugin on the cluster.

lspci -nn | grep -i nvidia
04:00.0 3D controller [0302]: NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)

> lspci -nnk -d 10de:
> 04:00.0 3D controller [0302]: NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)
> 	Subsystem: NVIDIA Corporation 12GB Computational Accelerator [10de:097e]
> 	Kernel driver in use: vfio-pci
> 	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

These are the logs from the gpu-device-plugin pod:

kubectl logs pod/nvidia-kubevirt-gpu-dp-daemonset-4xmm8 -n kube-system

> 2024/01/05 12:59:06 Not a device, continuing
> 2024/01/05 12:59:06 Nvidia device  0000:03:00.0
> 2024/01/05 12:59:06 Iommu Group 22
> 2024/01/05 12:59:06 Device Id 1023
> 2024/01/05 12:59:06 Error accessing file path "/sys/bus/mdev/devices": lstat /sys/bus/mdev/devices: no such file or directory
> 2024/01/05 12:59:06 Iommu Map map[22:[{0000:03:00.0}]]
> 2024/01/05 12:59:06 Device Map map[1023:[22]]
> 2024/01/05 12:59:06 vGPU Map  map[]
> 2024/01/05 12:59:06 GPU vGPU Map  map[]
> 2024/01/05 12:59:06 DP Name GK110BGL_TESLA_K40M
> 2024/01/05 12:59:06 Devicename GK110BGL_TESLA_K40M
> 2024/01/05 12:59:06 GK110BGL_TESLA_K40M Device plugin server ready
> 2024/01/05 12:59:06 healthCheck(GK110BGL_TESLA_K40M): invoked


> ls -l /var/lib/kubelet/device-plugins/
> total 40
> -rw------- 1 root root 39215 Jan  5 13:59 kubelet_internal_checkpoint
> srwxr-xr-x 1 root root     0 Jan  4 11:40 kubelet.sock
> srwxr-xr-x 1 root root     0 Jan  5 13:59 kubevirt-GK110BGL_TESLA_K40M.sock
> 

The pod still cannot be scheduled. The scheduler reports "No preemption victims found for incoming pod".
What am I missing? Could someone help?

> Events:
>   Type     Reason            Age                From               Message
>   ----     ------            ----               ----               -------
>   Warning  FailedScheduling  47m                default-scheduler  0/5 nodes are available: 1 Insufficient nvidia.com/GK110BGL_Tesla_K40m, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/5 nodes are available: 1 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..
>   Warning  FailedScheduling  16m (x6 over 41m)  default-scheduler  0/5 nodes are available: 1 Insufficient nvidia.com/GK110BGL_Tesla_K40m, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/5 nodes are available: 1 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..
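
One detail in the output above may be worth checking: the plugin logs the device name in upper case (`GK110BGL_TESLA_K40M`), while the scheduler event names the requested resource in mixed case (`nvidia.com/GK110BGL_Tesla_K40m`). Kubernetes compares resource names as exact strings, so a difference in letter case alone could produce an "Insufficient" result. The following is a minimal sketch of that comparison, not a confirmed diagnosis. It assumes the plugin registers the resource as `nvidia.com/` followed by the logged "DP Name", which the logs shown do not confirm. The helper names are hypothetical.

```python
import re
from typing import Optional, Set


def advertised_resource(plugin_log: str, prefix: str = "nvidia.com/") -> Optional[str]:
    """Build a resource name from the first 'DP Name' log line.

    ASSUMPTION: the plugin advertises <prefix><DP Name>; this is not
    confirmed by the logs in this post.
    """
    match = re.search(r"DP Name\s+(\S+)", plugin_log)
    return prefix + match.group(1) if match else None


def requested_resources(scheduler_message: str) -> Set[str]:
    """Collect the resource names the scheduler reported as 'Insufficient'."""
    return set(re.findall(r"Insufficient\s+([^\s,]+)", scheduler_message))


def compare(requested: str, advertised: Optional[str]) -> str:
    """Classify how a requested name relates to the advertised one."""
    if advertised is None:
        return "no advertised name found"
    if requested == advertised:
        return "exact match"
    if requested.lower() == advertised.lower():
        return "differs only in letter case"
    return "no match"


# Lines copied from the plugin log and the scheduler event in this post.
plugin_log = "2024/01/05 12:59:06 DP Name GK110BGL_TESLA_K40M"
event = (
    "0/5 nodes are available: 1 Insufficient nvidia.com/GK110BGL_Tesla_K40m, "
    "4 node(s) didn't match Pod's node affinity/selector."
)

advertised = advertised_resource(plugin_log)
for name in sorted(requested_resources(event)):
    print(f"{name} vs {advertised}: {compare(name, advertised)}")
```

The authoritative list of resource names is the node's Allocatable section, which `kubectl describe node <node-name>` prints. If the name there differs from the one in the VM spec, the scheduler will keep reporting the resource as insufficient.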



