Fix NVIDIA container toolkit bug in all backends by jvstme · Pull Request #2877 · dstackai/dstack

jvstme · 2025-07-07T07:01:09Z

Use cgroupfs as the Docker cgroup driver on all
backends by default to work around an NVIDIA
container toolkit bug where the container loses
access to the GPU.

The patch to `/etc/docker/daemon.json` is
automatically applied in all VM-based backends if
`/etc/docker/daemon.json` exists, has the NVIDIA
runtime, does not explicitly set another cgroup
driver, and `jq` is installed. These conditions are
not met on Nebius, Lambda, and CUDO, so those
backends still need custom code to apply the
workaround: either installing `jq` or just writing a
hardcoded `/etc/docker/daemon.json` that is known to
work on that backend.

#2860
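
For illustration, the conditions above can be sketched in Python. This is a minimal sketch, not the code from this PR (which, per the description, relies on `jq`). The function names `patch_daemon_config` and `patch_daemon_file` are hypothetical, and the sketch assumes the driver is selected through Docker's standard `exec-opts` entry `native.cgroupdriver=cgroupfs`.

```python
import json
from pathlib import Path
from typing import Optional

# Assumption: the cgroup driver is configured via Docker's "exec-opts" list.
CGROUP_OPT_PREFIX = "native.cgroupdriver="


def patch_daemon_config(config: dict) -> Optional[dict]:
    """Return a patched copy of a parsed daemon.json, or None if no patch applies."""
    # Condition: the NVIDIA runtime must be registered.
    if "nvidia" not in config.get("runtimes", {}):
        return None
    exec_opts = config.get("exec-opts", [])
    # Condition: no cgroup driver is set explicitly.
    if any(opt.startswith(CGROUP_OPT_PREFIX) for opt in exec_opts):
        return None
    patched = dict(config)
    patched["exec-opts"] = [*exec_opts, CGROUP_OPT_PREFIX + "cgroupfs"]
    return patched


def patch_daemon_file(path: Path) -> bool:
    """Patch the file in place; return True only if it was changed."""
    # Condition: the config file must already exist.
    if not path.is_file():
        return False
    patched = patch_daemon_config(json.loads(path.read_text()))
    if patched is None:
        return False
    path.write_text(json.dumps(patched, indent=4) + "\n")
    return True
```

Calling `patch_daemon_file(Path("/etc/docker/daemon.json"))` would apply the change only when all conditions hold. Docker would then need a restart to pick up the new driver.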


jvstme requested a review from un-def July 7, 2025 07:20

un-def approved these changes Jul 7, 2025


jvstme merged commit 66ffdd1 into master Jul 7, 2025
25 checks passed

jvstme deleted the issue_2860_nvidia_bug_workaround branch July 7, 2025 19:18
