+
```yaml
type: service
name: llama31-service-vllm-tpu
@@ -79,17 +79,17 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
- pip install -r requirements-tpu.txt
- apt-get install -y libopenblas-base libopenmpi-dev libomp-dev
- python setup.py develop
- - vllm serve $MODEL_ID
- --tensor-parallel-size 4
+ - vllm serve $MODEL_ID
+ --tensor-parallel-size 4
--max-model-len $MAX_MODEL_LEN
--port 8000
port: 8000
# Register the model
model: meta-llama/Meta-Llama-3.1-8B-Instruct
-
+
# Uncomment to leverage spot instances
#spot_policy: auto
-
+
resources:
gpu: v5litepod-4
```
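Once the service above is deployed, it can be queried through dstack's OpenAI-compatible proxy. Below is a minimal sketch using only the standard library; the server URL (`http://127.0.0.1:3000`) and project name (`main`) mirror the curl examples elsewhere in these docs and are assumptions to replace with your own.

```python
import json
from urllib import request

def build_chat_request(server_url: str, project: str, model: str, prompt: str) -> request.Request:
    """Build a POST against dstack's OpenAI-compatible proxy endpoint."""
    url = f"{server_url}/proxy/models/{project}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }).encode()
    return request.Request(url, data=body, headers={"Content-Type": "application/json"})

if __name__ == "__main__":
    # Hypothetical values -- replace with your dstack server URL and project.
    req = build_chat_request("http://127.0.0.1:3000", "main",
                             "meta-llama/Meta-Llama-3.1-8B-Instruct",
                             "What is a TPU?")
    # Requires the service to be up:
    # with request.urlopen(req) as resp:
    #     print(json.load(resp))
    print(req.full_url)
```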
@@ -123,11 +123,11 @@ cloud resources and run the configuration.
## Fine-tuning with Optimum TPU
-Below is an example of fine-tuning Llama 3.1 8B using [Optimum TPU :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/optimum-tpu){:target="_blank"}
+Below is an example of fine-tuning Llama 3.1 8B using [Optimum TPU :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/optimum-tpu){:target="_blank"}
and the [`Abirate/english_quotes` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/Abirate/english_quotes){:target="_blank"}
dataset.
-
+
```yaml
type: task
@@ -136,11 +136,14 @@ name: optimum-tpu-llama-train
python: "3.11"
env:
- HF_TOKEN
+files:
+ - train.py
+ - config.yaml
commands:
- git clone -b add_llama_31_support https://github.com/dstackai/optimum-tpu.git
- mkdir -p optimum-tpu/examples/custom/
- - cp examples/single-node-training/optimum-tpu/llama31/train.py optimum-tpu/examples/custom/train.py
- - cp examples/single-node-training/optimum-tpu/llama31/config.yaml optimum-tpu/examples/custom/config.yaml
+ - cp train.py optimum-tpu/examples/custom/train.py
+ - cp config.yaml optimum-tpu/examples/custom/config.yaml
- cd optimum-tpu
- pip install -e . -f https://storage.googleapis.com/libtpu-releases/index.html
- pip install datasets evaluate
@@ -178,7 +181,7 @@ Note, `v5litepod` is optimized for fine-tuning transformer-based models. Each co
## Source code
-The source-code of this example can be found in
+The source code of this example can be found in
[`examples/inference/tgi/tpu` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/tgi/tpu){:target="_blank"},
[`examples/inference/vllm/tpu` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/vllm/tpu){:target="_blank"},
and [`examples/single-node-training/optimum-tpu` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/optimum-tpu){:target="_blank"}.
@@ -188,5 +191,5 @@ and [`examples/single-node-training/optimum-tpu` :material-arrow-top-right-thin:
1. Browse [Optimum TPU :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/optimum-tpu),
[Optimum TPU TGI :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/optimum-tpu/tree/main/text-generation-inference) and
[vLLM :material-arrow-top-right-thin:{ .external }](https://docs.vllm.ai/en/latest/getting_started/tpu-installation.html).
-2. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks),
+2. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks),
[services](https://dstack.ai/docs/services), and [fleets](https://dstack.ai/docs/concepts/fleets).
diff --git a/examples/distributed-training/axolotl/README.md b/examples/distributed-training/axolotl/README.md
index dd4b7cdb04..17efaf1e1a 100644
--- a/examples/distributed-training/axolotl/README.md
+++ b/examples/distributed-training/axolotl/README.md
@@ -3,14 +3,13 @@
This example walks you through how to run distributed fine-tuning using [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/axolotl-ai-cloud/axolotl){:target="_blank"} with `dstack`.
??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
+ Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
- $ dstack init
```
@@ -67,7 +66,7 @@ commands:
--machine_rank=$DSTACK_NODE_RANK \
--num_processes=$DSTACK_GPUS_NUM \
--num_machines=$DSTACK_NODES_NUM
-
+
resources:
gpu: 80GB:8
shm_size: 128GB
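The `accelerate launch` flags above are driven entirely by the `DSTACK_*` environment variables that dstack injects on each node. A minimal sketch of the rank arithmetic this implies (illustration only; the helper name and env-var fallbacks are not part of dstack's API):

```python
import os

def node_worker_ranks(node_rank: int, num_nodes: int, total_gpus: int) -> list:
    """Global ranks of the workers accelerate starts on one node.

    Mirrors the flags above: --machine_rank=$DSTACK_NODE_RANK,
    --num_processes=$DSTACK_GPUS_NUM, --num_machines=$DSTACK_NODES_NUM.
    """
    gpus_per_node = total_gpus // num_nodes
    return [node_rank * gpus_per_node + g for g in range(gpus_per_node)]

if __name__ == "__main__":
    # dstack sets these on every node of a distributed task; the
    # fallbacks here are for local illustration only.
    rank = int(os.environ.get("DSTACK_NODE_RANK", "0"))
    nodes = int(os.environ.get("DSTACK_NODES_NUM", "2"))
    gpus = int(os.environ.get("DSTACK_GPUS_NUM", "16"))
    print(node_worker_ranks(rank, nodes, gpus))
```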
@@ -93,10 +92,10 @@ $ WANDB_PROJECT=...
$ HUB_MODEL_ID=...
$ dstack apply -f examples/distributed-training/trl/fsdp.dstack.yml
- # BACKEND RESOURCES INSTANCE TYPE PRICE
- 1 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
- 2 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
-
+ # BACKEND RESOURCES INSTANCE TYPE PRICE
+ 1 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
+ 2 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
+
Submit the run trl-train-fsdp-distrib? [y/n]: y
Provisioning...
@@ -106,10 +105,10 @@ Provisioning...
## Source code
-The source-code of this example can be found in
+The source code of this example can be found in
[`examples/distributed-training/axolotl` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/distributed-training/axolotl).
!!! info "What's next?"
1. Read the [clusters](https://dstack.ai/docs/guides/clusters) guide
- 2. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks),
+ 2. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks),
[services](https://dstack.ai/docs/concepts/services), and [fleets](https://dstack.ai/docs/concepts/fleets)
diff --git a/examples/distributed-training/trl/README.md b/examples/distributed-training/trl/README.md
index 7ac67047e8..3e3977c89e 100644
--- a/examples/distributed-training/trl/README.md
+++ b/examples/distributed-training/trl/README.md
@@ -3,14 +3,13 @@
This example walks you through how to run distributed fine-tuning using [TRL :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/trl){:target="_blank"}, [Accelerate :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/accelerate){:target="_blank"} and [DeepSpeed :material-arrow-top-right-thin:{ .external }](https://github.com/deepspeedai/DeepSpeed){:target="_blank"}.
??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
+ Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
- $ dstack init
```
@@ -41,7 +40,7 @@ Once the fleet is created, define a distributed task configuration. Here's an ex
- WANDB_API_KEY
- MODEL_ID=meta-llama/Llama-3.1-8B
- HUB_MODEL_ID
-
+
commands:
- pip install transformers bitsandbytes peft wandb
- git clone https://github.com/huggingface/trl
@@ -98,7 +97,7 @@ Once the fleet is created, define a distributed task configuration. Here's an ex
- HUB_MODEL_ID
- MODEL_ID=meta-llama/Llama-3.1-8B
- ACCELERATE_LOG_LEVEL=info
-
+
commands:
- pip install transformers bitsandbytes peft wandb deepspeed
- git clone https://github.com/huggingface/trl
@@ -153,10 +152,10 @@ $ WANDB_API_KEY=...
$ HUB_MODEL_ID=...
$ dstack apply -f examples/distributed-training/trl/fsdp.dstack.yml
- # BACKEND RESOURCES INSTANCE TYPE PRICE
- 1 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
- 2 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
-
+ # BACKEND RESOURCES INSTANCE TYPE PRICE
+ 1 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
+ 2 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
+
Submit the run trl-train-fsdp-distrib? [y/n]: y
Provisioning...
@@ -166,11 +165,10 @@ Provisioning...
## Source code
-The source-code of this example can be found in
+The source code of this example can be found in
[`examples/distributed-training/trl` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/distributed-training/trl){:target="_blank"}.
!!! info "What's next?"
1. Read the [clusters](https://dstack.ai/docs/guides/clusters) guide
- 2. Check [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks),
+ 2. Check [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks),
[services](https://dstack.ai/docs/concepts/services), and [fleets](https://dstack.ai/docs/concepts/fleets)
-
diff --git a/examples/inference/nim/README.md b/examples/inference/nim/README.md
index ba68018000..fe520e36bd 100644
--- a/examples/inference/nim/README.md
+++ b/examples/inference/nim/README.md
@@ -3,19 +3,18 @@ title: NVIDIA NIM
description: "This example shows how to deploy DeepSeek-R1-Distill-Llama-8B to any cloud or on-premises environment using NVIDIA NIM and dstack."
---
-# NVIDIA NIM
+# NVIDIA NIM
This example shows how to deploy DeepSeek-R1-Distill-Llama-8B using [NVIDIA NIM :material-arrow-top-right-thin:{ .external }](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html){:target="_blank"} and `dstack`.
??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
+ Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
- $ dstack init
```
@@ -59,7 +58,7 @@ resources:
### Running a configuration
-To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
+To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
@@ -67,10 +66,10 @@ To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/referenc
$ NGC_API_KEY=...
$ dstack apply -f examples/inference/nim/.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 vultr ewr 6xCPU, 60GB, 1xA100 (40GB) no $1.199
- 2 vultr ewr 6xCPU, 60GB, 1xA100 (40GB) no $1.199
- 3 vultr nrt 6xCPU, 60GB, 1xA100 (40GB) no $1.199
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 vultr ewr 6xCPU, 60GB, 1xA100 (40GB) no $1.199
+ 2 vultr ewr 6xCPU, 60GB, 1xA100 (40GB) no $1.199
+ 3 vultr nrt 6xCPU, 60GB, 1xA100 (40GB) no $1.199
Submit the run serve-distill-deepseek? [y/n]: y
@@ -79,7 +78,7 @@ Provisioning...
```
-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
+If no gateway is created, the model will be available via the OpenAI-compatible endpoint
at `<dstack server URL>/proxy/models/<project name>/`.
@@ -107,12 +106,12 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
is available at `https://gateway.<gateway domain>/`.
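The two endpoint shapes described above can be captured in one small helper; a sketch under the assumption that the placeholder names (server URL, project, gateway domain) match your deployment:

```python
from typing import Optional

def model_endpoint(server_url: str, project: str, gateway_domain: Optional[str] = None) -> str:
    """Base URL of the OpenAI-compatible endpoint.

    Without a gateway, requests go through the dstack server's proxy;
    with a gateway, they go to the gateway's domain.
    """
    if gateway_domain is not None:
        return f"https://gateway.{gateway_domain}/"
    return f"{server_url}/proxy/models/{project}/"

if __name__ == "__main__":
    print(model_endpoint("http://127.0.0.1:3000", "main"))
```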
## Source code
-The source-code of this example can be found in
+The source code of this example can be found in
[`examples/inference/nim` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/nim){:target="_blank"}.
## What's next?
diff --git a/examples/inference/sglang/README.md b/examples/inference/sglang/README.md
index f945e8db5d..f880ac30b7 100644
--- a/examples/inference/sglang/README.md
+++ b/examples/inference/sglang/README.md
@@ -3,14 +3,13 @@
This example shows how to deploy DeepSeek-R1-Distill-Llama 8B and 70B using [SGLang :material-arrow-top-right-thin:{ .external }](https://github.com/sgl-project/sglang){:target="_blank"} and `dstack`.
??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
+ Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
- $ dstack init
```
@@ -19,7 +18,7 @@ This example shows how to deploy DeepSeek-R1-Distill-Llama 8B and 70B using [SGL
Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B using SGLang.
=== "AMD"
-
+
```yaml
@@ -29,7 +28,7 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B
image: lmsysorg/sglang:v0.4.1.post4-rocm620
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
-
+
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
@@ -46,7 +45,7 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B
=== "NVIDIA"
-
+
```yaml
@@ -56,7 +55,7 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B
image: lmsysorg/sglang:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
-
+
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
@@ -81,9 +80,9 @@ To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/referenc
```shell
$ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 runpod EU-RO-1 24xCPU, 283GB, 1xMI300X (192GB) no $2.49
-
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 runpod EU-RO-1 24xCPU, 283GB, 1xMI300X (192GB) no $2.49
+
Submit the run deepseek-r1-amd? [y/n]: y
Provisioning...
@@ -119,12 +118,12 @@ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
```
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
is available at `https://gateway.<gateway domain>/`.
## Source code
-The source-code of this example can be found in
+The source code of this example can be found in
[`examples/llms/deepseek/sglang` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/sglang){:target="_blank"}.
## What's next?
diff --git a/examples/inference/tgi/README.md b/examples/inference/tgi/README.md
index 938154c24e..8630473dd9 100644
--- a/examples/inference/tgi/README.md
+++ b/examples/inference/tgi/README.md
@@ -8,14 +8,13 @@ description: "This example shows how to deploy Llama 4 Scout to any cloud or on-
This example shows how to deploy Llama 4 Scout with `dstack` using [HuggingFace TGI :material-arrow-top-right-thin:{ .external }](https://huggingface.co/docs/text-generation-inference/en/index){:target="_blank"}.
??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
+ Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
- $ dstack init
```
@@ -40,7 +39,7 @@ env:
# max_batch_prefill_tokens must be >= max_input_tokens
- MAX_BATCH_PREFILL_TOKENS=8192
commands:
- # Activate the virtual environment at /usr/src/.venv/
+ # Activate the virtual environment at /usr/src/.venv/
# as required by TGI's latest image.
- . /usr/src/.venv/bin/activate
- NUM_SHARD=$DSTACK_GPUS_NUM text-generation-launcher
@@ -64,7 +63,7 @@ resources:
### Running a configuration
-To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
+To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
@@ -72,9 +71,9 @@ To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/referenc
$ HF_TOKEN=...
$ dstack apply -f examples/inference/tgi/.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 vastai is-iceland 48xCPU, 128GB, 2xH200 (140GB) no $7.87
- 2 runpod EU-SE-1 40xCPU, 128GB, 2xH200 (140GB) no $7.98
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 vastai is-iceland 48xCPU, 128GB, 2xH200 (140GB) no $7.87
+ 2 runpod EU-SE-1 40xCPU, 128GB, 2xH200 (140GB) no $7.98
Submit the run llama4-scout? [y/n]: y
@@ -83,7 +82,7 @@ Provisioning...
```
-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
+If no gateway is created, the model will be available via the OpenAI-compatible endpoint
at `<dstack server URL>/proxy/models/<project name>/`.
@@ -111,12 +110,12 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
is available at `https://gateway.<gateway domain>/`.
## Source code
-The source-code of this example can be found in
+The source code of this example can be found in
[`examples/inference/tgi` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/tgi).
## What's next?
diff --git a/examples/inference/trtllm/README.md b/examples/inference/trtllm/README.md
index d84141a387..3d29ab0d91 100644
--- a/examples/inference/trtllm/README.md
+++ b/examples/inference/trtllm/README.md
@@ -9,14 +9,13 @@ This example shows how to deploy both DeepSeek R1 and its distilled version
using [TensorRT-LLM :material-arrow-top-right-thin:{ .external }](https://github.com/NVIDIA/TensorRT-LLM){:target="_blank"} and `dstack`.
??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
+ Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
- $ dstack init
```
@@ -72,8 +71,8 @@ To run it, pass the task configuration to `dstack apply`.
```shell
$ dstack apply -f examples/inference/trtllm/build-image.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 cudo ca-montreal-2 8xCPU, 25GB, (500.0GB) yes $0.1073
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 cudo ca-montreal-2 8xCPU, 25GB, (500.0GB) yes $0.1073
Submit the run build-image? [y/n]: y
@@ -93,7 +92,7 @@ Below is the service configuration that deploys DeepSeek R1 using the built Tens
name: serve-r1
# Specify the image built with `examples/inference/trtllm/build-image.dstack.yml`
- image: dstackai/tensorrt_llm:9b931c0f6305aefa3660e6fb84a76a42c0eef167
+ image: dstackai/tensorrt_llm:9b931c0f6305aefa3660e6fb84a76a42c0eef167
env:
- MAX_BATCH_SIZE=256
- MAX_NUM_TOKENS=16384
@@ -125,15 +124,15 @@ Below is the service configuration that deploys DeepSeek R1 using the built Tens
-To run it, pass the configuration to `dstack apply`.
+To run it, pass the configuration to `dstack apply`.
```shell
$ dstack apply -f examples/inference/trtllm/serve-r1.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 vastai is-iceland 192xCPU, 2063GB, 8xH200 (141GB) yes $25.62
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 vastai is-iceland 192xCPU, 2063GB, 8xH200 (141GB) yes $25.62
Submit the run serve-r1? [y/n]: y
@@ -149,7 +148,7 @@ To deploy DeepSeek R1 Distill Llama 8B, follow the steps below.
#### Convert and upload checkpoints
-Here’s the task config that converts a Hugging Face model to a TensorRT-LLM checkpoint format
+Here’s the task config that converts a Hugging Face model to a TensorRT-LLM checkpoint format
and uploads it to S3 using the provided AWS credentials.
@@ -168,7 +167,7 @@ and uploads it to S3 using the provided AWS credentials.
- AWS_DEFAULT_REGION
commands:
# nvcr.io/nvidia/tritonserver:25.01-trtllm-python-py3 container uses TensorRT-LLM version 0.17.0,
- # therefore we are using branch v0.17.0
+ # therefore we are using branch v0.17.0
- git clone --branch v0.17.0 --depth 1 https://github.com/triton-inference-server/tensorrtllm_backend.git
- git clone --branch v0.17.0 --single-branch https://github.com/NVIDIA/TensorRT-LLM.git
- git clone https://github.com/triton-inference-server/server.git
@@ -192,15 +191,15 @@ and uploads it to S3 using the provided AWS credentials.
-To run it, pass the configuration to `dstack apply`.
+To run it, pass the configuration to `dstack apply`.
```shell
$ dstack apply -f examples/inference/trtllm/convert-model.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 vastai us-iowa 12xCPU, 85GB, 1xA100 (40GB) yes $0.66904
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 vastai us-iowa 12xCPU, 85GB, 1xA100 (40GB) yes $0.66904
Submit the run convert-model? [y/n]: y
@@ -228,7 +227,7 @@ Here’s the task config that builds a TensorRT-LLM model and uploads it to S3 w
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION
- MAX_SEQ_LEN=8192 # Sum of Max Input Length & Max Output Length
- - MAX_INPUT_LEN=4096
+ - MAX_INPUT_LEN=4096
- MAX_BATCH_SIZE=256
- TRITON_MAX_BATCH_SIZE=1
- INSTANCE_COUNT=1
@@ -260,15 +259,15 @@ Here’s the task config that builds a TensorRT-LLM model and uploads it to S3 w
```
-To run it, pass the configuration to `dstack apply`.
+To run it, pass the configuration to `dstack apply`.
```shell
$ dstack apply -f examples/inference/trtllm/build-model.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 vastai us-iowa 12xCPU, 85GB, 1xA100 (40GB) yes $0.66904
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 vastai us-iowa 12xCPU, 85GB, 1xA100 (40GB) yes $0.66904
Submit the run build-model? [y/n]: y
@@ -302,25 +301,25 @@ Below is the service configuration that deploys DeepSeek R1 Distill Llama 8B.
- ./aws/install
- aws s3 sync s3://${S3_BUCKET_NAME}/tllm_engine_1gpu_bf16 ./tllm_engine_1gpu_bf16
- git clone https://github.com/triton-inference-server/server.git
- - python3 server/python/openai/openai_frontend/main.py --model-repository s3://${S3_BUCKET_NAME}/triton_model_repo --tokenizer tokenizer_dir --openai-port 8000
+ - python3 server/python/openai/openai_frontend/main.py --model-repository s3://${S3_BUCKET_NAME}/triton_model_repo --tokenizer tokenizer_dir --openai-port 8000
port: 8000
model: ensemble
resources:
gpu: A100:40GB
-
+
```
-To run it, pass the configuration to `dstack apply`.
+To run it, pass the configuration to `dstack apply`.
```shell
$ dstack apply -f examples/inference/trtllm/serve-distill.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 vastai us-iowa 12xCPU, 85GB, 1xA100 (40GB) yes $0.66904
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 vastai us-iowa 12xCPU, 85GB, 1xA100 (40GB) yes $0.66904
Submit the run serve-distill? [y/n]: y
@@ -331,7 +330,7 @@ Provisioning...
## Access the endpoint
-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
+If no gateway is created, the model will be available via the OpenAI-compatible endpoint
at `<dstack server URL>/proxy/models/<project name>/`.
@@ -360,12 +359,12 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
is available at `https://gateway.<gateway domain>/`.
## Source code
+The source code of this example can be found in
+The source-code of this example can be found in
[`examples/inference/trtllm` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/trtllm){:target="_blank"}.
## What's next?
diff --git a/examples/inference/vllm/README.md b/examples/inference/vllm/README.md
index 57c6758301..d646ea2874 100644
--- a/examples/inference/vllm/README.md
+++ b/examples/inference/vllm/README.md
@@ -7,14 +7,13 @@ description: "This example shows how to deploy Llama 3.1 to any cloud or on-prem
This example shows how to deploy Llama 3.1 8B with `dstack` using [vLLM :material-arrow-top-right-thin:{ .external }](https://docs.vllm.ai/en/latest/){:target="_blank"}.
??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
+ Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
- $ dstack init
```
@@ -60,14 +59,14 @@ resources:
### Running a configuration
-To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
+To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
```shell
$ dstack apply -f examples/inference/vllm/.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
+ # BACKEND REGION RESOURCES SPOT PRICE
1 runpod CA-MTL-1 18xCPU, 100GB, A5000:24GB yes $0.12
2 runpod EU-SE-1 18xCPU, 100GB, A5000:24GB yes $0.12
3 gcp us-west4 27xCPU, 150GB, A5000:24GB:2 yes $0.23
@@ -79,7 +78,7 @@ Provisioning...
```
-If no gateway is created, the model will be available via the OpenAI-compatible endpoint
+If no gateway is created, the model will be available via the OpenAI-compatible endpoint
at `<dstack server URL>/proxy/models/<project name>/`.
@@ -107,12 +106,12 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
is available at `https://gateway.<gateway domain>/`.
## Source code
-The source-code of this example can be found in
+The source code of this example can be found in
[`examples/inference/vllm` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/vllm).
## What's next?
diff --git a/examples/llms/deepseek/README.md b/examples/llms/deepseek/README.md
index b1390fd525..ac098fa70c 100644
--- a/examples/llms/deepseek/README.md
+++ b/examples/llms/deepseek/README.md
@@ -2,19 +2,18 @@
This example walks you through how to deploy and
train [Deepseek :material-arrow-top-right-thin:{ .external }](https://huggingface.co/deepseek-ai){:target="_blank"}
-models with `dstack`.
+models with `dstack`.
> We used Deepseek-R1 distilled models and Deepseek-V2-Lite, a 16B model with the same architecture as Deepseek-R1 (671B). Deepseek-V2-Lite retains MLA and DeepSeekMoE but requires less memory, making it ideal for testing and fine-tuning on smaller GPUs.
??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
+ Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
- $ dstack init
```
@@ -52,13 +51,13 @@ Here's an example of a service that deploys `Deepseek-R1-Distill-Llama-70B` usin
=== "vLLM"
-
+
```yaml
type: service
name: deepseek-r1-amd
-
+
image: rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
@@ -68,7 +67,7 @@ Here's an example of a service that deploys `Deepseek-R1-Distill-Llama-70B` usin
--max-model-len $MAX_MODEL_LEN
--trust-remote-code
port: 8000
-
+
model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
resources:
@@ -83,7 +82,7 @@ Note, when using `Deepseek-R1-Distill-Llama-70B` with `vLLM` with a 192GB GPU, w
Here's an example of a service that deploys `Deepseek-R1-Distill-Llama-70B`
using [TGI on Gaudi :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/tgi-gaudi){:target="_blank"}
-and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/HabanaAI/vllm-fork){:target="_blank"} (Gaudi fork) with Intel Gaudi 2.
+and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/HabanaAI/vllm-fork){:target="_blank"} (Gaudi fork) with Intel Gaudi 2.
> Neither [TGI on Gaudi :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/tgi-gaudi){:target="_blank"}
> nor [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/HabanaAI/vllm-fork){:target="_blank"} supports `Deepseek-V2-Lite`.
@@ -151,7 +150,7 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/Haban
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- HABANA_VISIBLE_DEVICES=all
- - OMPI_MCA_btl_vader_single_copy_mechanism=none
+ - OMPI_MCA_btl_vader_single_copy_mechanism=none
commands:
- git clone https://github.com/HabanaAI/vllm-fork.git
@@ -166,13 +165,13 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/Haban
port: 8000
```
-
+
### NVIDIA
Here's an example of a service that deploys `Deepseek-R1-Distill-Llama-8B`
using [SGLang :material-arrow-top-right-thin:{ .external }](https://github.com/sgl-project/sglang){:target="_blank"}
-and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-project/vllm){:target="_blank"} with NVIDIA GPUs.
+and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-project/vllm){:target="_blank"} with NVIDIA GPUs.
Both SGLang and vLLM also support `Deepseek-V2-Lite`.
=== "SGLang"
@@ -181,7 +180,7 @@ Both SGLang and vLLM also support `Deepseek-V2-Lite`.
```yaml
type: service
name: deepseek-r1-nvidia
-
+
image: lmsysorg/sglang:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
@@ -190,10 +189,10 @@ Both SGLang and vLLM also support `Deepseek-V2-Lite`.
--model-path $MODEL_ID
--port 8000
--trust-remote-code
-
+
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
-
+
resources:
gpu: 24GB
```
@@ -205,17 +204,17 @@ Both SGLang and vLLM also support `Deepseek-V2-Lite`.
```yaml
type: service
name: deepseek-r1-nvidia
-
+
image: vllm/vllm-openai:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- MAX_MODEL_LEN=4096
commands:
- vllm serve $MODEL_ID
- --max-model-len $MAX_MODEL_LEN
- port: 8000
+ --max-model-len $MAX_MODEL_LEN
+ port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
-
+
resources:
gpu: 24GB
```
@@ -253,9 +252,9 @@ To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/referenc
```shell
$ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml
- # BACKEND REGION RESOURCES SPOT PRICE
- 1 runpod EU-RO-1 24xCPU, 283GB, 1xMI300X (192GB) no $2.49
-
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 runpod EU-RO-1 24xCPU, 283GB, 1xMI300X (192GB) no $2.49
+
Submit the run deepseek-r1-amd? [y/n]: y
Provisioning...
@@ -291,7 +290,7 @@ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
```
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
+When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
is available at `https://gateway.