Describe the question.
When running some video decoding benchmarks by building a DALIGenericIterator with a single pipeline on my GH200 system and checking the NVDEC engine utilization via
nvidia-smi dmon -s u
I find my DEC utilization to be capped at ~14%, which would align with only one (out of seven) NVDEC engines being used. According to the documentation I found, the NVIDIA driver should take care of load balancing between the different decoding units.
However, when creating multiple DALI pipelines (e.g., seven), I find my DEC utilization to be close to 100%, indicating that all NVDEC engines are used. In raw decoding performance, running multiple pipelines on the same GPU also gives me a performance boost.
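For reference, the arithmetic behind the ~14% and ~100% readings can be sketched as follows. This assumes the DEC figure is an average over all NVDEC engines, which I have not confirmed; the function name `expected_dec_utilization` is only illustrative.

```python
def expected_dec_utilization(active_engines: int, total_engines: int = 7) -> float:
    """Percent utilization if the reported value is the mean over all engines.

    Assumption: each active engine is fully busy and idle engines count as 0%.
    """
    if not 0 <= active_engines <= total_engines:
        raise ValueError("active_engines must be between 0 and total_engines")
    return 100.0 * active_engines / total_engines


# One busy engine out of seven vs. all seven busy.
for k in (1, 7):
    print(f"{k} engine(s) busy: {expected_dec_utilization(k):.1f}%")
```

With one of seven engines busy this gives about 14.3%, which matches the cap I see with a single pipeline.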
My questions are:
- Can a single DALI pipeline only use a single decoding unit?
- Is running multiple pipelines on the same GPU the default way to utilize all decoding units?
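To make the measurement easier to reproduce, here is a minimal sketch of how the DEC readings could be collected from the `nvidia-smi dmon -s u` output. It assumes the output has a `#`-prefixed header row that names a `dec` column, with whitespace-separated data rows below it; the exact set of columns may differ between driver versions, and `parse_dec_column` is a hypothetical helper, not part of any NVIDIA tool.

```python
from typing import Iterable, List


def parse_dec_column(lines: Iterable[str]) -> List[int]:
    """Extract the DEC utilization samples from dmon-style text lines.

    Assumption: a header line starting with '#' names the columns
    (including 'dec'), and data rows are aligned with that header.
    """
    dec_index = None
    samples: List[int] = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("#"):
            # Header rows: look for the one that names the columns.
            names = [t.lower() for t in line.replace("#", " ").split()]
            if "dec" in names:
                dec_index = names.index("dec")
            continue
        if dec_index is not None and len(tokens) > dec_index:
            value = tokens[dec_index]
            if value.isdigit():  # skip placeholders such as '-'
                samples.append(int(value))
    return samples
```

The lines can come from a saved log file or from the captured output of `nvidia-smi dmon -s u` while the benchmark is running; taking `max(samples)` then gives the peak DEC utilization for the run.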