Commit 65f8d48

[Docker] Update the CUDA version in the default Docker image to 12.8 (from 12.1) (#3166)
* [Docker] Update the CUDA version in the default Docker image to 12.8 (from 12.1) #3163
* Pin base image version for Azure GRID image
* [Azure] Downgrade Linux Kernel to 6.8 as a workaround to install Grid driver
* Bugfix
* [Docker] Update the CUDA version in the default Docker image to 12.8 (from 12.1) #3163 Updated base_image to 0.11

---------

Co-authored-by: Jvst Me <git@jvst.me>
1 parent 8f9689c commit 65f8d48

7 files changed

Lines changed: 32 additions & 5 deletions

docker/base/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 ARG UBUNTU_VERSION

 # Build stage
-FROM nvidia/cuda:12.1.1-base-ubuntu${UBUNTU_VERSION}.04 AS builder
+FROM nvidia/cuda:12.8.1-base-ubuntu${UBUNTU_VERSION}.04 AS builder

 ENV NCCL_HOME=/opt/nccl
 ENV CUDA_HOME=/usr/local/cuda

docker/base/Dockerfile.common

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ARG UBUNTU_VERSION

-FROM nvidia/cuda:12.1.1-base-ubuntu${UBUNTU_VERSION}.04
+FROM nvidia/cuda:12.8.1-base-ubuntu${UBUNTU_VERSION}.04

 ARG _UV_HOME="/opt/uv"

scripts/packer/azure-image-grid.json

Lines changed: 9 additions & 0 deletions
@@ -63,6 +63,15 @@
         "./install-docker.sh --version {{user `docker_version`}}"
       ]
     },
+    {
+      "type": "shell",
+      "script": "provisioners/downgrade-azure-kernel.sh"
+    },
+    {
+      "type": "shell",
+      "inline": ["sudo reboot"],
+      "expect_disconnect": true
+    },
     {
       "type": "shell",
       "script": "provisioners/install-nvidia-grid-driver-for-azure.sh"
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+#!/bin/bash
+
+# based on https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-linux#known-issues
+# this is a temporary solution only required until the issue is fixed
+
+set -e
+
+# Install 6.8 kernel
+sudo apt-get update
+sudo DEBIAN_FRONTEND=noninteractive apt install linux-image-6.8.0-1015-azure linux-headers-6.8.0-1015-azure -y
+
+# Update the Grub entry name
+grub_entry_name="$(sudo grep -Po "menuentry '\KUbuntu, with Linux 6\.8[^(']+" /boot/grub/grub.cfg | sort -V | head -1)"
+sudo sed -i "s/^\s*GRUB_DEFAULT=.*$/GRUB_DEFAULT='Advanced options for Ubuntu>$grub_entry_name'/" /etc/default/grub
+sudo update-grub
+
+# Disable the kernel package upgrade
+sudo apt-mark hold $(dpkg --get-selections | grep -Po "^linux[^\t]+${grub_entry_name##* }")
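
Note on the GRUB entry selection in the script above: the `grep -Po ... \K` pattern and the `${grub_entry_name##* }` expansion are terse, so here is a minimal Python sketch of what they appear to compute. The `grub.cfg` excerpt is a hypothetical sample, not taken from a real Azure image, and plain string ordering stands in for `sort -V`, which is a simplification that holds when only one 6.8 kernel is installed.

```python
import re

# Hypothetical grub.cfg excerpt (assumption); real files contain many more lines.
SAMPLE_GRUB_CFG = """\
menuentry 'Ubuntu, with Linux 6.11.0-1008-azure' --class ubuntu {
menuentry 'Ubuntu, with Linux 6.11.0-1008-azure (recovery mode)' --class ubuntu {
menuentry 'Ubuntu, with Linux 6.8.0-1015-azure' --class ubuntu {
menuentry 'Ubuntu, with Linux 6.8.0-1015-azure (recovery mode)' --class ubuntu {
"""

# Python's re module has no \K, so a capture group plays the same role:
# match "menuentry '" but keep only the title that follows it.
ENTRY_RE = re.compile(r"menuentry '(Ubuntu, with Linux 6\.8[^(']+)")


def pick_grub_entry(grub_cfg: str) -> str:
    """Return the first 6.8 menu entry title, roughly like `sort -V | head -1`."""
    entries = ENTRY_RE.findall(grub_cfg)
    if not entries:
        raise ValueError("no 6.8 kernel entry found")
    # Simplification: plain string ordering instead of version sort.
    # The recovery-mode match carries a trailing space, so it sorts second.
    return min(entries)


def kernel_suffix(entry_name: str) -> str:
    """Equivalent of ${grub_entry_name##* }: the text after the last space."""
    return entry_name.rsplit(" ", 1)[-1]


entry = pick_grub_entry(SAMPLE_GRUB_CFG)
grub_default = f"Advanced options for Ubuntu>{entry}"
print(grub_default)
print(kernel_suffix(entry))
```

The suffix (`6.8.0-1015-azure` for this sample) is what the final `apt-mark hold` line uses to select the matching `linux-*` packages.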

src/dstack/_internal/core/backends/vastai/compute.py

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ def __init__(self, config: VastAIConfig):
         "reliability2": {"gte": 0.9},
         "inet_down": {"gt": 128},
         "verified": {"eq": True},
-        "cuda_max_good": {"gte": 12.1},
+        "cuda_max_good": {"gte": 12.8},
         "compute_cap": {"gte": 600},
     }
 )
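
To illustrate the effect of raising `cuda_max_good` from 12.1 to 12.8, here is a small sketch of how a filter of this shape could be evaluated against an offer. The `matches` helper and the sample offers are hypothetical; in practice the filtering is presumably done by the Vast.ai API, not client-side, so this only models the operator semantics suggested by the `gte`/`gt`/`eq` keys.

```python
import operator

# Operator names as they appear in the filter; the mapping itself is an assumption.
OPS = {"gte": operator.ge, "gt": operator.gt, "eq": operator.eq}

# Filter values copied from the diff above (post-change).
FILTER = {
    "reliability2": {"gte": 0.9},
    "inet_down": {"gt": 128},
    "verified": {"eq": True},
    "cuda_max_good": {"gte": 12.8},
    "compute_cap": {"gte": 600},
}


def matches(offer: dict, flt: dict = FILTER) -> bool:
    """Hypothetical helper: True if the offer satisfies every condition."""
    return all(
        field in offer and OPS[op](offer[field], bound)
        for field, conditions in flt.items()
        for op, bound in conditions.items()
    )


# Made-up offers for illustration only.
old_driver_offer = {
    "reliability2": 0.99, "inet_down": 500, "verified": True,
    "cuda_max_good": 12.1, "compute_cap": 800,
}
new_driver_offer = dict(old_driver_offer, cuda_max_good=12.8)

print(matches(old_driver_offer))
print(matches(new_driver_offer))
```

Under these assumptions, hosts whose drivers top out at CUDA 12.1 no longer qualify after this change.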

src/dstack/version.py

Lines changed: 1 addition & 1 deletion
@@ -5,5 +5,5 @@

 __version__ = "0.0.0"
 __is_release__ = False
-base_image = "0.11rc2"
+base_image = "0.11"
 base_image_ubuntu_version = "22.04"
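
For context on why `base_image` also affects the test fixture in this commit: the expected image name `dstackai/base:0.11-base-ubuntu22.04` looks like it is composed from these two values. The sketch below shows one way that composition could work. `base_image_name` is a hypothetical helper and the format string is inferred from the fixture, not taken from the dstack source.

```python
# Values as they appear in src/dstack/version.py after this commit.
base_image = "0.11"
base_image_ubuntu_version = "22.04"


def base_image_name(version: str, ubuntu_version: str) -> str:
    """Hypothetical helper; the tag layout is inferred from the test fixture."""
    return f"dstackai/base:{version}-base-ubuntu{ubuntu_version}"


print(base_image_name(base_image, base_image_ubuntu_version))
```

With the previous value `0.11rc2`, the same helper yields the old fixture string `dstackai/base:0.11rc2-base-ubuntu22.04`.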

src/tests/_internal/server/routers/test_runs.py

Lines changed: 1 addition & 1 deletion
@@ -333,7 +333,7 @@ def get_dev_env_run_dict(
             " && tail -f /dev/null"
         ),
     ]
-    image_name = "dstackai/base:0.11rc2-base-ubuntu22.04"
+    image_name = "dstackai/base:0.11-base-ubuntu22.04"

     return {
         "id": run_id,
