
Commit 84f2b2b

Update examples

* remove `dstack init` prerequisite, as repos are optional since 0.19.25
* replace repo paths with `files`

1 parent 38e66bc commit 84f2b2b
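The second bullet is the pattern that recurs throughout the diffs below: instead of invoking a script through its repo path, the `files` property copies it into the run's working directory. A minimal before/after sketch, assuming the TRL example's `train.py`:

```yaml
# Before: the script is referenced through its repo path
# commands:
#   - python examples/single-node-training/trl/amd/train.py

# After: `files` copies the script (located next to the
# dstack configuration) into the run container
files:
  - train.py
commands:
  - python train.py
```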

File tree

26 files changed: +327 −562 lines changed


docs/blog/posts/amd-on-tensorwave.md

Lines changed: 15 additions & 27 deletions
@@ -1,20 +1,20 @@
 ---
 title: Using SSH fleets with TensorWave's private AMD cloud
 date: 2025-03-11
-description: "This tutorial walks you through how dstack can be used with TensorWave's private AMD cloud using SSH fleets."
+description: "This tutorial walks you through how dstack can be used with TensorWave's private AMD cloud using SSH fleets."
 slug: amd-on-tensorwave
 image: https://dstack.ai/static-assets/static-assets/images/dstack-tensorwave-v2.png
 categories:
   - Case studies
 ---

-# Using SSH fleets with TensorWave's private AMD cloud
+# Using SSH fleets with TensorWave's private AMD cloud

 Since last month, when we introduced support for private clouds and data centers, it has become easier to use `dstack`
 to orchestrate AI containers with any AI cloud vendor, whether they provide on-demand compute or reserved clusters.

 In this tutorial, we’ll walk you through how `dstack` can be used with
-[TensorWave :material-arrow-top-right-thin:{ .external }](https://tensorwave.com/){:target="_blank"} using
+[TensorWave :material-arrow-top-right-thin:{ .external }](https://tensorwave.com/){:target="_blank"} using
 [SSH fleets](../../docs/concepts/fleets.md#ssh).

 <img src="https://dstack.ai/static-assets/static-assets/images/dstack-tensorwave-v2.png" width="630"/>
@@ -31,19 +31,7 @@ TensorWave dashboard.

 ## Creating a fleet

-??? info "Prerequisites"
-    Once `dstack` is [installed](https://dstack.ai/docs/installation), create a project repo folder and run `dstack init`.
-
-    <div class="termy">
-
-    ```shell
-    $ mkdir tensorwave-demo && cd tensorwave-demo
-    $ dstack init
-    ```
-
-    </div>
-
-Now, define an SSH fleet configuration by listing the IP addresses of each node in the cluster,
+Define an SSH fleet configuration by listing the IP addresses of each node in the cluster,
 along with the SSH user and SSH key configured for each host.

 <div editor-title="fleet.dstack.yml">
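The hunk above cuts off before showing the contents of `fleet.dstack.yml`. A hedged sketch of what an SSH fleet file of this shape typically contains; the user, key path, and IP addresses are illustrative placeholders, not values from this repo:

```yaml
type: fleet
name: my-tensorwave-fleet

# SSH user, key, and host IPs below are placeholders
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.100.1
    - 192.168.100.2
```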
@@ -79,9 +67,9 @@ $ dstack apply -f fleet.dstack.yml
 Provisioning...
 ---> 100%

-FLEET                INSTANCE  RESOURCES         STATUS    CREATED
-my-tensorwave-fleet  0         8xMI300X (192GB)  0/8 busy  3 mins ago
-                     1         8xMI300X (192GB)  0/8 busy  3 mins ago
+FLEET                INSTANCE  RESOURCES         STATUS    CREATED
+my-tensorwave-fleet  0         8xMI300X (192GB)  0/8 busy  3 mins ago
+                     1         8xMI300X (192GB)  0/8 busy  3 mins ago

 ```
@@ -98,7 +86,7 @@ Once the fleet is created, you can use `dstack` to run workloads.

 A dev environment lets you access an instance through your desktop IDE.

-<div editor-title=".dstack.yml">
+<div editor-title=".dstack.yml">

 ```yaml
 type: dev-environment
@@ -137,9 +125,9 @@ Open the link to access the dev environment using your desktop IDE.

 A task allows you to schedule a job or run a web app. Tasks can be distributed and support port forwarding.

-Below is a distributed training task configuration:
+Below is a distributed training task configuration:

-<div editor-title="train.dstack.yml">
+<div editor-title="train.dstack.yml">

 ```yaml
 type: task
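The hunk truncates `train.dstack.yml` after `type: task`. As a hedged sketch, a distributed task of the shape described here might combine a node count with the `DSTACK_*` system environment variables referenced below; the node count, image-free setup, script name, and exact variable usage are illustrative:

```yaml
type: task
name: train-distrib

# Illustrative: one container runs per node
nodes: 2

commands:
  # DSTACK_* system environment variables are injected by dstack
  - torchrun
    --nnodes=$DSTACK_NODES_NUM
    --node-rank=$DSTACK_NODE_RANK
    --master-addr=$DSTACK_MASTER_NODE_IP
    --nproc-per-node=$DSTACK_GPUS_PER_NODE
    train.py

resources:
  gpu: MI300X:8
```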
@@ -175,7 +163,7 @@ Provisioning `train-distrib`...

 </div>

-`dstack` automatically runs the container on each node while passing
+`dstack` automatically runs the container on each node while passing
 [system environment variables](../../docs/concepts/tasks.md#system-environment-variables)
 which you can use with `torchrun`, `accelerate`, or other distributed frameworks.
@@ -185,7 +173,7 @@ A service allows you to deploy a model or any web app as a scalable and secure e

 Create the following configuration file inside the repo:

-<div editor-title="deepseek.dstack.yml">
+<div editor-title="deepseek.dstack.yml">

 ```yaml
 type: service
@@ -196,7 +184,7 @@ env:
   - MODEL_ID=deepseek-ai/DeepSeek-R1
   - HSA_NO_SCRATCH_RECLAIM=1
 commands:
-  - python3 -m sglang.launch_server --model-path $MODEL_ID --port 8000 --tp 8 --trust-remote-code
+  - python3 -m sglang.launch_server --model-path $MODEL_ID --port 8000 --tp 8 --trust-remote-code
 port: 8000
 model: deepseek-ai/DeepSeek-R1

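Pieced together from the hunk above, `deepseek.dstack.yml` plausibly reads as follows. The `image` and `resources` fields are not shown in this diff and are assumptions (`--tp 8` suggests eight GPUs per replica):

```yaml
type: service
name: deepseek-r1-sglang

# Assumption: the diff does not show the image used
image: lmsysorg/sglang:latest

env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1
  - HSA_NO_SCRATCH_RECLAIM=1
commands:
  - python3 -m sglang.launch_server --model-path $MODEL_ID --port 8000 --tp 8 --trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1

# Assumption, inferred from --tp 8
resources:
  gpu: MI300X:8
```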
@@ -221,7 +209,7 @@ Submit the run `deepseek-r1-sglang`? [y/n]: y
 Provisioning `deepseek-r1-sglang`...
 ---> 100%

-Service is published at:
+Service is published at:
   http://localhost:3000/proxy/services/main/deepseek-r1-sglang/
 Model deepseek-ai/DeepSeek-R1 is published at:
   http://localhost:3000/proxy/models/main/
@@ -236,6 +224,6 @@ Want to see how it works? Check out the video below:
 <iframe width="750" height="520" src="https://www.youtube.com/embed/b1vAgm5fCfE?si=qw2gYHkMjERohdad&rel=0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

 !!! info "What's next?"
-    1. See [SSH fleets](../../docs/concepts/fleets.md#ssh)
+    1. See [SSH fleets](../../docs/concepts/fleets.md#ssh)
     2. Read about [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), and [services](../../docs/concepts/services.md)
     3. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)

examples/accelerators/amd/README.md

Lines changed: 37 additions & 33 deletions
@@ -1,22 +1,22 @@
 # AMD

 `dstack` supports running dev environments, tasks, and services on AMD GPUs.
-You can do that by setting up an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh)
+You can do that by setting up an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh)
 with on-prem AMD GPUs or configuring a backend that offers AMD GPUs such as the `runpod` backend.

 ## Deployment

-Most serving frameworks including vLLM and TGI have AMD support. Here's an example of a [service](https://dstack.ai/docs/services) that deploys
+Most serving frameworks including vLLM and TGI have AMD support. Here's an example of a [service](https://dstack.ai/docs/services) that deploys
 Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](https://huggingface.co/docs/text-generation-inference/en/installation_amd){:target="_blank"} and [vLLM :material-arrow-top-right-thin:{ .external }](https://docs.vllm.ai/en/latest/getting_started/amd-installation.html){:target="_blank"}.

 === "TGI"
-
-    <div editor-title="examples/inference/tgi/amd/.dstack.yml">
-
+
+    <div editor-title="examples/inference/tgi/amd/.dstack.yml">
+
     ```yaml
     type: service
     name: amd-service-tgi
-
+
     # Using the official TGI's ROCm Docker image
     image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm

@@ -30,26 +30,26 @@ Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](h
     port: 8000
     # Register the model
     model: meta-llama/Meta-Llama-3.1-70B-Instruct
-
+
     # Uncomment to leverage spot instances
     #spot_policy: auto
-
+
     resources:
       gpu: MI300X
       disk: 150GB
     ```
-
+
     </div>


 === "vLLM"

-    <div editor-title="examples/inference/vllm/amd/.dstack.yml">
-
+    <div editor-title="examples/inference/vllm/amd/.dstack.yml">
+
     ```yaml
     type: service
     name: llama31-service-vllm-amd
-
+
     # Using RunPod's ROCm Docker image
     image: runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
     # Required environment variables
@@ -84,20 +84,20 @@ Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](h
     port: 8000
     # Register the model
     model: meta-llama/Meta-Llama-3.1-70B-Instruct
-
+
     # Uncomment to leverage spot instances
     #spot_policy: auto
-
+
     resources:
       gpu: MI300X
       disk: 200GB
     ```
     </div>

 Note, the maximum size of vLLM's KV cache is 126192; consequently, we must set `MAX_MODEL_LEN` to 126192. Adding `/opt/conda/envs/py_3.10/bin` to `PATH` ensures we use the Python 3.10 environment necessary for the pre-built binaries compiled specifically for this version.
-
-> To speed up the `vLLM-ROCm` installation, we use a pre-built binary from S3.
-> You can find the task to build and upload the binary in
+
+> To speed up the `vLLM-ROCm` installation, we use a pre-built binary from S3.
+> You can find the task to build and upload the binary in
 > [`examples/inference/vllm/amd/` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/vllm/amd/){:target="_blank"}.

 !!! info "Docker image"
@@ -110,22 +110,26 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by

 === "TRL"

-    Below is an example of LoRA fine-tuning Llama 3.1 8B using [TRL :material-arrow-top-right-thin:{ .external }](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.html){:target="_blank"}
+    Below is an example of LoRA fine-tuning Llama 3.1 8B using [TRL :material-arrow-top-right-thin:{ .external }](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.html){:target="_blank"}
     and the [`mlabonne/guanaco-llama2-1k` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k){:target="_blank"}
     dataset.
-
+
     <div editor-title="examples/single-node-training/trl/amd/.dstack.yml">
-
+
     ```yaml
     type: task
     name: trl-amd-llama31-train
-
+
     # Using RunPod's ROCm Docker image
     image: runpod/pytorch:2.1.2-py3.10-rocm6.1-ubuntu22.04

     # Required environment variables
     env:
       - HF_TOKEN
+    # Copy train.py script located next to the dstack configuration
+    # to the working directory inside the run container
+    files:
+      - train.py
     # Commands of the task
     commands:
       - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
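Several hunk headers above are anchored on the sentence about requesting multiple GPUs by specifying the quantity after the GPU name, separated by a colon. As a minimal illustration of that syntax:

```yaml
resources:
  # Four MI300X GPUs: the quantity follows the GPU name after a colon
  gpu: MI300X:4
```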
@@ -140,25 +144,25 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
       - pip install peft
       - pip install transformers datasets huggingface-hub scipy
       - cd ..
-      - python examples/single-node-training/trl/amd/train.py
-
+      - python train.py
+
     # Uncomment to leverage spot instances
     #spot_policy: auto
-
+
     resources:
       gpu: MI300X
       disk: 150GB
     ```
-
+
     </div>

 === "Axolotl"
-    Below is an example of fine-tuning Llama 3.1 8B using [Axolotl :material-arrow-top-right-thin:{ .external }](https://rocm.blogs.amd.com/artificial-intelligence/axolotl/README.html){:target="_blank"}
+    Below is an example of fine-tuning Llama 3.1 8B using [Axolotl :material-arrow-top-right-thin:{ .external }](https://rocm.blogs.amd.com/artificial-intelligence/axolotl/README.html){:target="_blank"}
     and the [tatsu-lab/alpaca :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/tatsu-lab/alpaca){:target="_blank"}
     dataset.
-
+
     <div editor-title="examples/single-node-training/axolotl/amd/.dstack.yml">
-
+
     ```yaml
     type: task
     # The name is optional, if not specified, generated randomly
@@ -198,9 +202,9 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
       - make
       - pip install .
       - cd ..
-      - accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml
-        --wandb-project "$WANDB_PROJECT"
-        --wandb-name "$WANDB_NAME"
+      - accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml
+        --wandb-project "$WANDB_PROJECT"
+        --wandb-name "$WANDB_NAME"
         --hub-model-id "$HUB_MODEL_ID"

     resources:
@@ -211,7 +215,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by

 Note, to support ROCm, we need to check out commit `d4f6c65`. This commit eliminates the need to manually modify the Axolotl source code to make xformers compatible with ROCm, as described in the [xformers workaround :material-arrow-top-right-thin:{ .external }](https://docs.axolotl.ai/docs/amd_hpc.html#apply-xformers-workaround). This installation approach is also followed for building the Axolotl ROCm Docker image [(see Dockerfile) :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rocm-blogs/blob/release/blogs/artificial-intelligence/axolotl/src/Dockerfile.rocm){:target="_blank"}.

-> To speed up installation of `flash-attention` and `xformers`, we use pre-built binaries uploaded to S3.
+> To speed up installation of `flash-attention` and `xformers`, we use pre-built binaries uploaded to S3.
 > You can find the tasks that build and upload the binaries
 > in [`examples/single-node-training/axolotl/amd/` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/axolotl/amd/){:target="_blank"}.
@@ -235,7 +239,7 @@ $ dstack apply -f examples/inference/vllm/amd/.dstack.yml

 ## Source code

-The source code of this example can be found in
+The source code of this example can be found in
 [`examples/inference/tgi/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/tgi/amd){:target="_blank"},
 [`examples/inference/vllm/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/vllm/amd){:target="_blank"},
 [`examples/single-node-training/axolotl/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/axolotl/amd){:target="_blank"} and