Hello,
I've been using MPS on bare metal to optimize GPU usage and performance, as we've seen that even for a single process, running an MPS server yields better performance than not running one.
I wanted to replicate the same on my kubernetes cluster using this configuration:
driver:
  enabled: false
toolkit:
  enabled: true
cdi:
  enabled: false
nfd:
  enabled: true
gfd:
  enabled: true
migManager:
  enabled: false
devicePlugin:
  enabled: true
  config:
    name: device-plugin-config
    create: true
    default: default
    data:
      default: |-
        version: v1
        flags:
          migStrategy: none
          failOnInitError: true
      rtx-2080-ti: |-
        version: v1
        sharing:
          mps:
            resources:
              - name: nvidia.com/gpu
                replicas: 1
But after I label the node with the rtx-2080-ti configuration, the gpu-feature-discovery and nvidia-device-plugin-daemonset pods fail with this error:
I1203 12:02:32.316208 193 main.go:163] Starting OS watcher.
I1203 12:02:32.316516 193 main.go:168] Loading configuration.
I1203 12:02:32.317279 193 main.go:160] Exiting
E1203 12:02:32.317308 193 main.go:127] unable to load config: unable to finalize config: unable to parse config file: error parsing config file: unmarshal error: error unmarshaling JSON: while decoding JSON: number of replicas must be >= 2
Why is it not allowed to set the MPS replica count to 1? Is there a way to deploy MPS without splitting the GPU into multiple replicas?
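For reference, the validation error suggests the plugin only accepts an MPS sharing stanza with at least two replicas. A minimal variant of my rtx-2080-ti entry that should get past the parser (same resource name, only replicas changed) would look like this, although it splits the GPU into two shared replicas, which is not the single-replica behavior I'm after:

```yaml
# Minimal MPS sharing stanza satisfying the "replicas must be >= 2" check
version: v1
sharing:
  mps:
    resources:
      - name: nvidia.com/gpu
        replicas: 2
```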