# Intel Gaudi

`dstack` supports running dev environments, tasks, and services on Intel Gaudi GPUs via
[SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh).

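To run on Gaudi machines, first create an SSH fleet that points `dstack` at them. Below is a minimal sketch of such a fleet configuration; the user, identity file, and host addresses are placeholders for your own setup, and the hosts are assumed to already have the Gaudi drivers and container runtime installed.

```yaml
type: fleet
name: gaudi-fleet

ssh_config:
  # SSH credentials dstack uses to connect to the hosts (placeholders)
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  # Addresses of the Gaudi machines (placeholders)
  hosts:
    - 192.168.100.1
    - 192.168.100.2
```

Apply it with `dstack apply -f <fleet configuration>`; once the instances are `idle`, the configurations below can run on them.
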
## Deployment

Serving frameworks such as TGI and vLLM support Intel Gaudi. Below are examples of services that deploy
[`DeepSeek-R1-Distill-Llama-70B` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B){:target="_blank"}
using [TGI on Gaudi :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/tgi-gaudi){:target="_blank"}
and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/HabanaAI/vllm-fork){:target="_blank"}.

=== "TGI"

    <div editor-title="examples/deployment/tgi/intel/.dstack.yml">

    ```yaml
    type: service
    name: tgi

    image: ghcr.io/huggingface/tgi-gaudi:2.3.1
    env:
      - HF_TOKEN
      - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
      - PORT=8000
      - OMPI_MCA_btl_vader_single_copy_mechanism=none
      - TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN=true
      - PT_HPU_ENABLE_LAZY_COLLECTIVES=true
      - MAX_TOTAL_TOKENS=2048
      - BATCH_BUCKET_SIZE=256
      - PREFILL_BATCH_BUCKET_SIZE=4
      - PAD_SEQUENCE_TO_MULTIPLE_OF=64
      - ENABLE_HPU_GRAPH=true
      - LIMIT_HPU_GRAPH=true
      - USE_FLASH_ATTENTION=true
      - FLASH_ATTENTION_RECOMPUTE=true
    commands:
      - text-generation-launcher
        --sharded true
        --num-shard $DSTACK_GPUS_NUM
        --max-input-length 1024
        --max-total-tokens 2048
        --max-batch-prefill-tokens 4096
        --max-batch-total-tokens 524288
        --max-waiting-tokens 7
        --waiting-served-ratio 1.2
        --max-concurrent-requests 512
    port: 8000
    model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B

    resources:
      gpu: gaudi2:8

    # Uncomment to cache downloaded models
    #volumes:
    # - /root/.cache/huggingface/hub:/root/.cache/huggingface/hub
    ```

    </div>

=== "vLLM"

    <div editor-title="examples/deployment/vllm/intel/.dstack.yml">

    ```yaml
    type: service
    name: deepseek-r1-gaudi

    image: vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
    env:
      - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
      - HABANA_VISIBLE_DEVICES=all
      - OMPI_MCA_btl_vader_single_copy_mechanism=none
    commands:
      - git clone https://github.com/HabanaAI/vllm-fork.git
      - cd vllm-fork
      - git checkout habana_main
      - pip install -r requirements-hpu.txt
      - python setup.py develop
      - vllm serve $MODEL_ID
        --tensor-parallel-size $DSTACK_GPUS_NUM
        --trust-remote-code
        --download-dir /data
    port: 8000
    model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B

    resources:
      gpu: gaudi2:8

    # Uncomment to cache downloaded models (matches --download-dir above)
    #volumes:
    # - /data:/data
    ```

    </div>

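Once the service is up, it can be queried through `dstack`'s OpenAI-compatible endpoint; the `model` property in the configurations above is what registers the model with it. A minimal sketch, assuming the `dstack` server runs locally on the default port `3000`, the project is `main`, and `DSTACK_TOKEN` holds your user token (all three are placeholders for your setup):

```shell
curl http://localhost:3000/proxy/models/main/chat/completions \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $DSTACK_TOKEN" \
    -d '{
      "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
      "messages": [{"role": "user", "content": "What is Intel Gaudi?"}],
      "max_tokens": 128
    }'
```
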
## Fine-tuning

Below is an example of LoRA fine-tuning of [`DeepSeek-R1-Distill-Qwen-7B` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B){:target="_blank"}
using [Optimum for Intel Gaudi :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/optimum-habana){:target="_blank"}
and [DeepSpeed :material-arrow-top-right-thin:{ .external }](https://docs.habana.ai/en/latest/PyTorch/DeepSpeed/DeepSpeed_User_Guide/DeepSpeed_User_Guide.html#deepspeed-user-guide){:target="_blank"} with
the [`lvwerra/stack-exchange-paired` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/lvwerra/stack-exchange-paired){:target="_blank"} dataset.

<div editor-title="examples/fine-tuning/trl/intel/.dstack.yml">

```yaml
type: task
name: trl-train

image: vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  - WANDB_API_KEY
  - WANDB_PROJECT
commands:
  - pip install --upgrade-strategy eager optimum[habana]
  - pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.19.0
  - git clone https://github.com/huggingface/optimum-habana.git
  - cd optimum-habana/examples/trl
  - pip install -r requirements.txt
  - pip install wandb
  - DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 python ../gaudi_spawn.py --world_size $DSTACK_GPUS_NUM --use_deepspeed sft.py
    --model_name_or_path $MODEL_ID
    --dataset_name "lvwerra/stack-exchange-paired"
    --deepspeed ../language-modeling/llama2_ds_zero3_config.json
    --output_dir="./sft"
    --do_train
    --max_steps=500
    --logging_steps=10
    --save_steps=100
    --per_device_train_batch_size=1
    --per_device_eval_batch_size=1
    --gradient_accumulation_steps=2
    --learning_rate=1e-4
    --lr_scheduler_type="cosine"
    --warmup_steps=100
    --weight_decay=0.05
    --optim="paged_adamw_32bit"
    --lora_target_modules "q_proj" "v_proj"
    --bf16
    --remove_unused_columns=False
    --run_name="sft_deepseek_70"
    --report_to="wandb"
    --use_habana
    --use_lazy_mode

resources:
  gpu: gaudi2:8
```

</div>

To fine-tune `DeepSeek-R1-Distill-Llama-70B` on eight Gaudi 2 accelerators,
you can partially offload parameters to CPU memory via the DeepSpeed configuration file,
as sketched below. For more details, refer to [parameter offloading](https://deepspeed.readthedocs.io/en/latest/zero3.html#deepspeedzerooffloadparamconfig).

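For example, adding `offload_param` to the ZeRO stage 3 section of the DeepSpeed config moves parameters to CPU memory when they are not in use. A minimal sketch of the relevant JSON fragment follows; the surrounding keys reflect common ZeRO-3 settings, not necessarily the exact file shipped with the example:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```
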
## Applying a configuration

Once the configuration is ready, run `dstack apply -f <configuration file>`.

<div class="termy">

```shell
$ dstack apply -f examples/deployment/vllm/intel/.dstack.yml

 #  BACKEND  REGION  RESOURCES                      SPOT  PRICE
 1  ssh      remote  152xCPU,1007GB,8xGaudi2:96GB   yes   $0     idle

Submit a new run? [y/n]: y

Provisioning...
---> 100%
```

</div>

## Source code

The source code of these examples can be found in
[`examples/llms/deepseek/tgi/intel` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/tgi/intel){:target="_blank"},
[`examples/llms/deepseek/vllm/intel` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/vllm/intel){:target="_blank"}, and
[`examples/llms/deepseek/trl/intel` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/trl/intel){:target="_blank"}.

!!! info "What's next?"
    1. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks), and [services](https://dstack.ai/docs/services).
    2. See also the [Intel Gaudi documentation :material-arrow-top-right-thin:{ .external }](https://docs.habana.ai/en/latest/index.html){:target="_blank"}, [vLLM Inference with Gaudi :material-arrow-top-right-thin:{ .external }](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/vLLM_Inference.html){:target="_blank"},
       and [Optimum for Gaudi examples :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/optimum-habana/blob/main/examples/trl/README.md){:target="_blank"}.