Update NVIDIA driver in dstack OS images #3099
Merged
peterschmidt85 merged 5 commits into master on Sep 23, 2025
Update the driver to support NVIDIA B200.
- Update from the 535 to the 570 driver family.
- Update to Ubuntu 24.04, since Ubuntu 22.04 does not have the gcc version required for building the 570 driver.
- Switch from proprietary to open kernel modules.
- Since pre-Turing GPUs aren't supported by NVIDIA open kernel modules, conditionally choose between old and new dstack OS images based on the GPU name.
- Adjust the handling of `apt` race conditions, since the existing hack did not work on OCI's Ubuntu 24.04.
- Install `ufw` when building the image, since it is missing in OCI's Ubuntu 24.04.
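The conditional image choice described in the list above could look roughly like the sketch below. It is not the code from this PR: the function names, the `LEGACY_BASE_IMAGE` value, and the `PRE_TURING_GPUS` set are all assumptions made for illustration, and only the `dstack-` / `dstack-cuda-` naming pattern and the `0.11rc2` version come from the conversation.

```python
from typing import Optional

# New image version mentioned in the PR conversation.
BASE_IMAGE = "0.11rc2"
# Hypothetical placeholder for the older image that still ships the 535 driver.
LEGACY_BASE_IMAGE = "0.10"

# Assumed, incomplete set of pre-Turing GPU names (Kepler, Maxwell, Pascal, Volta).
# NVIDIA open kernel modules support Turing and newer only.
PRE_TURING_GPUS = frozenset({"K80", "M60", "P4", "P100", "V100"})


def requires_legacy_image(gpu_name: Optional[str]) -> bool:
    """Return True if the GPU is known to be older than Turing."""
    return gpu_name is not None and gpu_name.upper() in PRE_TURING_GPUS


def get_image_name(cuda: bool, gpu_name: Optional[str] = None) -> str:
    """Pick the old or the new OS image based on the GPU name.

    Assumption: when no GPU name is given, the new image is used.
    """
    base_image = LEGACY_BASE_IMAGE if requires_legacy_image(gpu_name) else BASE_IMAGE
    prefix = "dstack-cuda" if cuda else "dstack"
    return f"{prefix}-{base_image}"
```

With these assumed values, `get_image_name(True, "V100")` selects the legacy image, while `get_image_name(True, "B200")` selects the new one. Matching on an explicit set of names keeps the fallback conservative: any GPU not listed gets the new image.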
r4victor
approved these changes
Sep 15, 2025
Contributor
Building 0.11rc2 via https://github.com/dstackai/dstack/actions/runs/17939443790
    image_name = (
        f"dstack-{version.base_image}" if not cuda else f"dstack-cuda-{version.base_image}"
    )
    if gpu_name is None:
Contributor
Regarding AWS, just to confirm: the new image is only required for the very few GPU types not covered by AWS DLAMI (e.g. T4), right?
Contributor
peterschmidt85
left a comment
Regarding Azure:
Not that it's a problem but for the Grid image (e.g., A10:4GB), we are still hard-coding the old CUDA version (550):
peterschmidt85
approved these changes
Sep 23, 2025
Bumped `base_image` to `0.11rc2`
Updated tests
Currently, `aws`, `gcp`, `azure`, and `oci` backends use our custom OS images. These images use version 535 of the CUDA driver. This version doesn't support newer generations of GPUs such as NVIDIA B200.
This PR updates the scripts that build these custom OS images to update the CUDA version from 535 to 570.
This PR is required for #3100 (issue #3088).
Scope:
- ... `gcc` version required for building the 570 driver.
- ... `apt` race conditions, since the existing hack did not work on OCI's Ubuntu 24.04.
- ... `ufw` when building the image, since it is missing in OCI's Ubuntu 24.04.
Notes:
- ... `azure`'s Grid drivers (used for A10).
- ... `gcp`'s A3 OS image script
Before/upon merging:
- ... `base_image` in `version.py`