1. See [SSH fleets](../../docs/concepts/fleets.md#ssh)
2. Read about [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), and [services](../../docs/concepts/services.md)
examples/accelerators/amd/README.md
# AMD
`dstack` supports running dev environments, tasks, and services on AMD GPUs. You can do that by setting up an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh)
with on-prem AMD GPUs or by configuring a backend that offers AMD GPUs, such as the `runpod` backend.
## Deployment
Most serving frameworks, including vLLM and TGI, have AMD support. Here's an example of a [service](https://dstack.ai/docs/services) that deploys
Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](https://huggingface.co/docs/text-generation-inference/en/installation_amd){:target="_blank"} and [vLLM :material-arrow-top-right-thin:{ .external }](https://docs.vllm.ai/en/latest/getting_started/amd-installation.html){:target="_blank"}.
port: 8000
# Register the model
model: meta-llama/Meta-Llama-3.1-70B-Instruct

# Uncomment to leverage spot instances
#spot_policy: auto

resources:
  gpu: MI300X
  disk: 200GB
```

</div>
Note that the maximum size of vLLM's KV cache is 126192; consequently, we must set `MAX_MODEL_LEN` to 126192. Adding `/opt/conda/envs/py_3.10/bin` to `PATH` ensures we use the Python 3.10 environment required by the pre-built binaries, which are compiled specifically for this version.
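To see why a six-figure `MAX_MODEL_LEN` fits on an MI300X, here is a back-of-the-envelope KV-cache sizing sketch. The architecture constants below are Llama 3.1 70B's published values (80 layers, 8 KV heads via grouped-query attention, head dim 128); the calculation itself is an illustration, not vLLM's exact accounting.

```python
# Rough KV-cache sizing for Llama 3.1 70B in FP16.
NUM_LAYERS = 80
NUM_KV_HEADS = 8      # grouped-query attention
HEAD_DIM = 128
BYTES_FP16 = 2

def kv_bytes_per_token(layers: int = NUM_LAYERS,
                       kv_heads: int = NUM_KV_HEADS,
                       head_dim: int = HEAD_DIM,
                       dtype_bytes: int = BYTES_FP16) -> int:
    # 2x for the separate K and V tensors cached per layer
    return 2 * layers * kv_heads * head_dim * dtype_bytes

per_token = kv_bytes_per_token()           # 327,680 bytes (~320 KiB) per token
max_model_len = 126_192                    # the KV-cache capacity vLLM reports
total_gib = per_token * max_model_len / 2**30
print(f"{per_token} B/token, ~{total_gib:.1f} GiB at max_model_len")
```

At roughly 38.5 GiB, the full-length KV cache leaves ample room for the FP16 weights (~140 GB) within the MI300X's 192 GB of HBM.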
> To speed up the `vLLM-ROCm` installation, we use a pre-built binary from S3.
> You can find the task to build and upload the binary in
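Because the service sets `model`, dstack registers it on an OpenAI-compatible endpoint. Below is a minimal sketch of building a chat-completions request against it; the gateway URL and token are placeholders, not values from this repo.

```python
import json
import urllib.request

def build_chat_request(base_url: str, token: str,
                       model: str, prompt: str) -> urllib.request.Request:
    """Build (but don't send) an OpenAI-compatible chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # dstack user token
        },
        method="POST",
    )

req = build_chat_request(
    "https://gateway.example.com",  # placeholder gateway URL
    "<dstack-token>",               # placeholder auth token
    "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "Say hello",
)
# urllib.request.urlopen(req) would send it once the service is running
```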
=== "TRL"

    Below is an example of LoRA fine-tuning Llama 3.1 8B using [TRL :material-arrow-top-right-thin:{ .external }](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.html){:target="_blank"}
    and the [`mlabonne/guanaco-llama2-1k` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k){:target="_blank"} dataset.
=== "Axolotl"

    Below is an example of fine-tuning Llama 3.1 8B using [Axolotl :material-arrow-top-right-thin:{ .external }](https://rocm.blogs.amd.com/artificial-intelligence/axolotl/README.html){:target="_blank"}
    and the [tatsu-lab/alpaca :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/tatsu-lab/alpaca){:target="_blank"} dataset.
Note that to support ROCm, we need to check out commit `d4f6c65`. This commit eliminates the need to manually modify the Axolotl source code to make xformers compatible with ROCm, as described in the [xformers workaround :material-arrow-top-right-thin:{ .external }](https://docs.axolotl.ai/docs/amd_hpc.html#apply-xformers-workaround){:target="_blank"}. The same installation approach is used to build the Axolotl ROCm Docker image [(see Dockerfile) :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rocm-blogs/blob/release/blogs/artificial-intelligence/axolotl/src/Dockerfile.rocm){:target="_blank"}.
213
217
214
-
> To speed up installation of `flash-attention` and `xformers `, we use pre-built binaries uploaded to S3.
218
+
> To speed up installation of `flash-attention` and `xformers `, we use pre-built binaries uploaded to S3.
215
219
> You can find the tasks that build and upload the binaries
216
220
> in [`examples/single-node-training/axolotl/amd/` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/axolotl/amd/){:target="_blank"}.
[`examples/single-node-training/axolotl/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/axolotl/amd){:target="_blank"} and
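The commit pinning described above can be expressed directly in a `dstack` task's `commands`. The fragment below is a hypothetical sketch, not the repo's exact recipe: the image name, repository URL, and install flags are assumptions.

```yaml
type: task
name: axolotl-amd-train
# Hypothetical ROCm base image; the actual example may use a different one
image: rocm/pytorch:latest
commands:
  - git clone https://github.com/axolotl-ai-cloud/axolotl.git
  - cd axolotl
  # Pin to the commit that makes xformers compatible with ROCm
  - git checkout d4f6c65
  - pip install -e .
resources:
  gpu: MI300X
```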