Hello,
I have a Kubernetes cluster running virtual machines with KubeVirt.
One worker node in the cluster has a GPU, and I want to pass that GPU through to a VM.
I enabled the feature gate and deployed the kubevirt-gpu-device-plugin on the cluster.
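In case it helps, this is roughly how the feature gate was enabled in the KubeVirt CR (a sketch of my setup; names/namespaces may differ in yours):

```yaml
# KubeVirt CR with the GPU feature gate enabled (sketch)
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - GPU
```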
$ lspci -nn | grep -i nvidia
04:00.0 3D controller [0302]: NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)
$ lspci -nnk -d 10de:
04:00.0 3D controller [0302]: NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)
Subsystem: NVIDIA Corporation 12GB Computational Accelerator [10de:097e]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
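In case it's relevant, the card was bound to vfio-pci at boot; one common way to do that (an assumption, the exact method varies by distro) is a modprobe option matching the vendor:device ID shown above:

```
# /etc/modprobe.d/vfio.conf (one common approach; path may vary by distro)
options vfio-pci ids=10de:1023
```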
This is what the logs from the gpu-device-plugin pod look like:
$ kubectl logs pod/nvidia-kubevirt-gpu-dp-daemonset-4xmm8 -n kube-system
2024/01/05 12:59:06 Not a device, continuing
2024/01/05 12:59:06 Nvidia device 0000:03:00.0
2024/01/05 12:59:06 Iommu Group 22
2024/01/05 12:59:06 Device Id 1023
2024/01/05 12:59:06 Error accessing file path "/sys/bus/mdev/devices": lstat /sys/bus/mdev/devices: no such file or directory
2024/01/05 12:59:06 Iommu Map map[22:[{0000:03:00.0}]]
2024/01/05 12:59:06 Device Map map[1023:[22]]
2024/01/05 12:59:06 vGPU Map map[]
2024/01/05 12:59:06 GPU vGPU Map map[]
2024/01/05 12:59:06 DP Name GK110BGL_TESLA_K40M
2024/01/05 12:59:06 Devicename GK110BGL_TESLA_K40M
2024/01/05 12:59:06 GK110BGL_TESLA_K40M Device plugin server ready
2024/01/05 12:59:06 healthCheck(GK110BGL_TESLA_K40M): invoked
$ ls -l /var/lib/kubelet/device-plugins/
total 40
-rw------- 1 root root 39215 Jan 5 13:59 kubelet_internal_checkpoint
srwxr-xr-x 1 root root 0 Jan 4 11:40 kubelet.sock
srwxr-xr-x 1 root root 0 Jan 5 13:59 kubevirt-GK110BGL_TESLA_K40M.sock
The VM pod still won't schedule; the scheduler says 'no preemption victims found for incoming pod'.
What am I missing? Could someone help?
Events:
Type Reason Age From Message
Warning FailedScheduling 47m default-scheduler 0/5 nodes are available: 1 Insufficient nvidia.com/GK110BGL_Tesla_K40m, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/5 nodes are available: 1 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..
Warning FailedScheduling 16m (x6 over 41m) default-scheduler 0/5 nodes are available: 1 Insufficient nvidia.com/GK110BGL_Tesla_K40m, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/5 nodes are available: 1 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..
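For reference, the VM requests the GPU roughly like this (a sketch; the VMI name is made up, and deviceName is the resource that appears in the scheduling events above):

```yaml
# VirtualMachineInstance requesting the GPU in passthrough mode (sketch)
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vm-gpu  # hypothetical name
spec:
  domain:
    devices:
      gpus:
        - deviceName: nvidia.com/GK110BGL_Tesla_K40m  # resource name from the events above
          name: gpu1
```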