Commit 05a37a9

Merge pull request #2291 from Bihan/add_deepseek_and_intel_examples
Add Deepseek and Intel Examples
2 parents ae0835e + 4702770 commit 05a37a9

File tree

27 files changed: +1437 -8 lines changed

docs/assets/stylesheets/extra.css

Lines changed: 2 additions & 1 deletion

```diff
@@ -24,7 +24,8 @@
 }
 }

-[dir=ltr] .md-typeset :is(.admonition,details) pre, [dir=ltr] .md-typeset :is(.admonition,details) :is(.admonition,details) {
+[dir=ltr] .md-typeset :is(.admonition,details) pre,
+[dir=ltr] .md-typeset :is(.admonition,details) :is(.admonition,details, .termy) {
     margin-left: 32px;
 }
```

docs/examples.md

Lines changed: 22 additions & 0 deletions

```diff
@@ -83,6 +83,18 @@ hide:
         </p>
     </a>

+    <a href="/examples/accelerators/intel"
+       class="feature-cell sky">
+        <h3>
+            Intel Gaudi
+        </h3>
+
+        <p>
+            Deploy and fine-tune LLMs on Intel Gaudi
+        </p>
+    </a>
+
     <a href="/examples/accelerators/tpu"
        class="feature-cell sky">
         <h3>
@@ -98,6 +110,16 @@ hide:
 ## LLMs

 <div class="tx-landing__highlights_grid">
+    <a href="/examples/llms/deepseek"
+       class="feature-cell sky">
+        <h3>
+            DeepSeek
+        </h3>
+
+        <p>
+            Deploy and train DeepSeek models
+        </p>
+    </a>
     <a href="/examples/llms/llama31"
        class="feature-cell sky">
         <h3>
```

docs/examples/accelerators/intel/index.md

Whitespace-only changes.

docs/examples/llms/deepseek/index.md

Whitespace-only changes.
Lines changed: 188 additions & 0 deletions
# Intel Gaudi

`dstack` supports running dev environments, tasks, and services on Intel Gaudi GPUs via
[SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh).
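Since Gaudi machines are attached through SSH fleets, a fleet configuration is typically applied first. The sketch below uses placeholder host addresses, user, and key path; adjust them to your own Gaudi hosts:

```yaml
type: fleet
name: gaudi-fleet

# Placeholder hosts, user, and key path; replace with your own machines
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.0.2.10
    - 192.0.2.11
```

Applying this file with `dstack apply` registers the hosts so the configurations below can be scheduled on them.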

## Deployment

Serving frameworks like vLLM and TGI support Intel Gaudi. Here's an example of
a service that deploys
[`DeepSeek-R1-Distill-Llama-70B` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B){:target="_blank"}
using [TGI on Gaudi :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/tgi-gaudi){:target="_blank"}
and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/HabanaAI/vllm-fork){:target="_blank"}.

=== "TGI"

    <div editor-title="examples/deployment/tgi/intel/.dstack.yml">

    ```yaml
    type: service
    name: tgi

    image: ghcr.io/huggingface/tgi-gaudi:2.3.1
    env:
      - HF_TOKEN
      - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
      - PORT=8000
      - OMPI_MCA_btl_vader_single_copy_mechanism=none
      - TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN=true
      - PT_HPU_ENABLE_LAZY_COLLECTIVES=true
      - MAX_TOTAL_TOKENS=2048
      - BATCH_BUCKET_SIZE=256
      - PREFILL_BATCH_BUCKET_SIZE=4
      - PAD_SEQUENCE_TO_MULTIPLE_OF=64
      - ENABLE_HPU_GRAPH=true
      - LIMIT_HPU_GRAPH=true
      - USE_FLASH_ATTENTION=true
      - FLASH_ATTENTION_RECOMPUTE=true
    commands:
      - text-generation-launcher
        --sharded true
        --num-shard $DSTACK_GPUS_NUM
        --max-input-length 1024
        --max-total-tokens 2048
        --max-batch-prefill-tokens 4096
        --max-batch-total-tokens 524288
        --max-waiting-tokens 7
        --waiting-served-ratio 1.2
        --max-concurrent-requests 512
    port: 8000
    model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B

    resources:
      gpu: gaudi2:8

    # Uncomment to cache downloaded models
    #volumes:
    #  - /root/.cache/huggingface/hub:/root/.cache/huggingface/hub
    ```

    </div>

=== "vLLM"

    <div editor-title="examples/deployment/vllm/intel/.dstack.yml">

    ```yaml
    type: service
    name: deepseek-r1-gaudi

    image: vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
    env:
      - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
      - HABANA_VISIBLE_DEVICES=all
      - OMPI_MCA_btl_vader_single_copy_mechanism=none
    commands:
      - git clone https://github.com/HabanaAI/vllm-fork.git
      - cd vllm-fork
      - git checkout habana_main
      - pip install -r requirements-hpu.txt
      - python setup.py develop
      - vllm serve $MODEL_ID
        --tensor-parallel-size 8
        --trust-remote-code
        --download-dir /data
    port: 8000
    model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B

    resources:
      gpu: gaudi2:8

    # Uncomment to cache downloaded models
    #volumes:
    #  - /root/.cache/huggingface/hub:/root/.cache/huggingface/hub
    ```

    </div>

## Fine-tuning

Below is an example of LoRA fine-tuning of [`DeepSeek-R1-Distill-Qwen-7B` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B){:target="_blank"}
using [Optimum for Intel Gaudi :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/optimum-habana){:target="_blank"}
and [DeepSpeed :material-arrow-top-right-thin:{ .external }](https://docs.habana.ai/en/latest/PyTorch/DeepSpeed/DeepSpeed_User_Guide/DeepSpeed_User_Guide.html#deepspeed-user-guide){:target="_blank"} with
the [`lvwerra/stack-exchange-paired` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/lvwerra/stack-exchange-paired){:target="_blank"} dataset.

<div editor-title="examples/fine-tuning/trl/intel/.dstack.yml">

```yaml
type: task
name: trl-train

image: vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  - WANDB_API_KEY
  - WANDB_PROJECT
commands:
  - pip install --upgrade-strategy eager optimum[habana]
  - pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.19.0
  - git clone https://github.com/huggingface/optimum-habana.git
  - cd optimum-habana/examples/trl
  - pip install -r requirements.txt
  - pip install wandb
  - DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 python ../gaudi_spawn.py --world_size $DSTACK_GPUS_NUM --use_deepspeed sft.py
    --model_name_or_path $MODEL_ID
    --dataset_name "lvwerra/stack-exchange-paired"
    --deepspeed ../language-modeling/llama2_ds_zero3_config.json
    --output_dir="./sft"
    --do_train
    --max_steps=500
    --logging_steps=10
    --save_steps=100
    --per_device_train_batch_size=1
    --per_device_eval_batch_size=1
    --gradient_accumulation_steps=2
    --learning_rate=1e-4
    --lr_scheduler_type="cosine"
    --warmup_steps=100
    --weight_decay=0.05
    --optim="paged_adamw_32bit"
    --lora_target_modules "q_proj" "v_proj"
    --bf16
    --remove_unused_columns=False
    --run_name="sft_deepseek_70"
    --report_to="wandb"
    --use_habana
    --use_lazy_mode

resources:
  gpu: gaudi2:8
```

</div>

To fine-tune `DeepSeek-R1-Distill-Llama-70B` on eight Gaudi 2 accelerators,
you can partially offload parameters to CPU memory using the DeepSpeed configuration file.
For more details, refer to [parameter offloading](https://deepspeed.readthedocs.io/en/latest/zero3.html#deepspeedzerooffloadparamconfig).
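For illustration, parameter offloading is enabled in the `zero_optimization` section of a DeepSpeed config. The sketch below is an assumption for illustration only, not the contents of `llama2_ds_zero3_config.json`:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    }
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

Offloading trades HBM footprint for host-device traffic, so expect lower throughput in exchange for fitting the larger model.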

## Applying a configuration

Once the configuration is ready, run `dstack apply -f <configuration file>`.

<div class="termy">

```shell
$ dstack apply -f examples/deployment/vllm/intel/.dstack.yml

 #  BACKEND  REGION  RESOURCES                       SPOT  PRICE
 1  ssh      remote  152xCPU, 1007GB, 8xGaudi2:96GB  yes   $0     idle

Submit a new run? [y/n]: y

Provisioning...
---> 100%
```

</div>

## Source code

The source code of these examples can be found in
[`examples/llms/deepseek/tgi/intel` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/tgi/intel){:target="_blank"},
[`examples/llms/deepseek/vllm/intel` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/vllm/intel){:target="_blank"}, and
[`examples/llms/deepseek/trl/intel` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/trl/intel){:target="_blank"}.

!!! info "What's next?"
    1. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks), and [services](https://dstack.ai/docs/services).
    2. See also the [Intel Gaudi documentation :material-arrow-top-right-thin:{ .external }](https://docs.habana.ai/en/latest/index.html){:target="_blank"},
       [vLLM inference with Gaudi :material-arrow-top-right-thin:{ .external }](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/vLLM_Inference.html){:target="_blank"},
       and [Optimum for Gaudi examples :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/optimum-habana/blob/main/examples/trl/README.md){:target="_blank"}.
Lines changed: 7 additions & 5 deletions

```diff
@@ -1,7 +1,7 @@
 type: service
-name: llama31
+name: qwen-nim

-image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
+image: nvcr.io/nim/qwen/qwen-2.5-7b-instruct:latest
 env:
   - NGC_API_KEY
   - NIM_MAX_MODEL_LEN=4096
@@ -10,16 +10,18 @@ registry_auth:
   password: ${{ env.NGC_API_KEY }}
 port: 8000
 # Register the model
-model: meta/llama-3.1-8b-instruct
+model: qwen/qwen-2.5-7b-instruct

 # Uncomment to leverage spot instances
 #spot_policy: auto

 # Cache downloaded models
 volumes:
-  - /root/.cache/nim:/opt/nim/.cache
+  - instance_path: /root/.cache/nim
+    path: /opt/nim/.cache
+    optional: true

 resources:
   gpu: 24GB
   # Uncomment if using multiple GPUs
-  #shm_size: 24GB
+  shm_size: 16GB
```
