docs/blog/posts/amd-mi300x-inference-benchmark.md (+3 −3)
@@ -10,7 +10,7 @@ categories:
 
 # Benchmarking Llama 3.1 405B on 8x AMD MI300X GPUs
 
-At `dstack`, we've been adding support for AMD GPUs with [SSH fleets](../../docs/concepts/fleets.md#ssh),
+At `dstack`, we've been adding support for AMD GPUs with [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets),
 so we saw this as a great chance to test our integration by benchmarking AMD GPUs. Our friends at
 [Hot Aisle :material-arrow-top-right-thin:{ .external }](https://hotaisle.xyz/){:target="_blank"}, who build top-tier
 bare metal compute for AMD GPUs, kindly provided the hardware for the benchmark.
@@ -34,7 +34,7 @@ Here is the spec of the bare metal machine we got:
 
 ??? info "Set up an SSH fleet"
 
     Hot Aisle provided us with SSH access to the machine. To make it accessible via `dstack`,
-    we created an [SSH fleet](../../docs/concepts/fleets.md#ssh) using the following configuration:
+    we created an [SSH fleet](../../docs/concepts/fleets.md#ssh-fleets) using the following configuration:
 
     <div editor-title="hotaisle.dstack.yml">
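The contents of `hotaisle.dstack.yml` are not shown in this hunk. For orientation, an SSH fleet configuration of this kind generally follows the shape sketched below; the fleet name, login user, key path, and host address are placeholders, not the benchmark's actual values:

```yaml
type: fleet
# Illustrative name, not the one used in the benchmark
name: hotaisle-fleet

# SSH fleets point dstack at existing hosts reachable over SSH
ssh_config:
  user: ubuntu                # placeholder login user
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 203.0.113.10            # placeholder host address
```

Once applied with `dstack apply`, the host appears as an instance that runs can be scheduled onto.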
@@ -215,7 +215,7 @@ If you have questions, feedback, or want to help improve the benchmark, please r
 is the primary sponsor of this benchmark, and we are sincerely grateful for their hardware and support.
 
 If you'd like to use top-tier bare metal compute with AMD GPUs, we recommend going
-with Hot Aisle. Once you gain access to a cluster, it can be easily accessed via `dstack`'s [SSH fleet](../../docs/concepts/fleets.md#ssh) easily.
+with Hot Aisle. Once you gain access to a cluster, it can be easily accessed via `dstack`'s [SSH fleet](../../docs/concepts/fleets.md#ssh-fleets).
 
 ### RunPod
 
 If you’d like to use on-demand compute with AMD GPUs at affordable prices, you can configure `dstack` to
docs/docs/concepts/backends.md (+2 −3)
@@ -9,9 +9,8 @@ They can be configured via `~/.dstack/server/config.yml` or through the [project
 * [Container-based](#container-based) – use either `dstack`'s native integration with cloud providers or Kubernetes to orchestrate container-based runs; provisioning in this case is delegated to the cloud provider or Kubernetes.
 * [On-prem](#on-prem) – use `dstack`'s native support for on-prem servers without needing Kubernetes.
 
-??? info "dstack Sky"
-    If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
-    you can either configure your own backends or use the pre-configured backend that gives you access to compute from the GPU marketplace.
+!!! info "dstack Sky"
+    If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}, backend configuration is optional. dstack Sky lets you use pre-configured backends to access the GPU marketplace.
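As the hunk header above notes, backends are configured via `~/.dstack/server/config.yml`. A minimal sketch of such a file might look as follows; the project name and credentials mode are illustrative assumptions, not values taken from this PR:

```yaml
projects:
  - name: main
    backends:
      # One entry per configured cloud provider
      - type: aws
        creds:
          type: default   # use the default AWS credential chain
```

Additional backends are added as further list items under `backends`.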
docs/docs/concepts/fleets.md (+44 −53)
@@ -4,36 +4,41 @@ Fleets act both as pools of instances and as templates for how those instances a
 
 `dstack` supports two kinds of fleets:
 
-* [Standard fleets](#standard) – dynamically provisioned through configured backends; they are supported with any type of backends: [VM-based](backends.md#vm-based), [container-based](backends.md#container-based), and [Kubernetes](backends.md#kubernetes)
+* [Backend fleets](#backend) – dynamically provisioned through configured backends; they are supported with any type of backend: [VM-based](backends.md#vm-based) and [container-based](backends.md#container-based) (incl. [`kubernetes`](backends.md#kubernetes))
 * [SSH fleets](#ssh) – created using on-prem servers; do not require backends
 
-## Standard fleets { #standard }
+When you run `dstack apply` to start a dev environment, task, or service, `dstack` will reuse idle instances from an existing fleet whenever available.
 
-When you run `dstack apply` to start a dev environment, task, or service, `dstack` will reuse idle instances
-from an existing fleet whenever available.
+## Backend fleets { #backend-fleets }
 
-If no fleet meets the requirements or has idle capacity, `dstack` can create a new fleet on the fly.
-However, it’s generally better to define fleets explicitly in configuration files for greater control.
+If you configured [backends](backends.md), `dstack` can provision fleets on the fly.
+However, it’s recommended to define fleets explicitly.
 
 ### Apply a configuration
 
-Define a fleet configuration as a YAML file in your project directory. The file must have a
+To create a backend fleet, define a configuration as a YAML file in your project directory. The file must have a
 `.dstack.yml` extension (e.g. `.dstack.yml` or `fleet.dstack.yml`).
@@ … @@
-my-fleet  0  gcp (europe-west-1)  L4:24GB (spot)  $0.1624  idle  3 mins ago
-          1  gcp (europe-west-1)  L4:24GB (spot)  $0.1624  idle  3 mins ago
+FLEET     INSTANCE  BACKEND  GPU  PRICE  STATUS  CREATED
+my-fleet  -         -        -    -      -       -
 ```
 
 </div>
 
-Once the status of instances changes to `idle`, they can be used by dev environments, tasks, and services.
+`dstack` always keeps the minimum number of nodes provisioned. Additional instances, up to the maximum limit, are provisioned on demand.
 
-??? info "Container-based backends"
-    [Container-based](backends.md#container-based) backends don’t support pre-provisioning,
-    so `nodes` can only be set to a range starting with `0`.
-
-    This means instances are created only when a run starts, and once it finishes, they’re terminated and released back to the provider (either a cloud service or Kubernetes).
+!!! info "Container-based backends"
+    For [container-based](backends.md#container-based) backends (such as `kubernetes`, `runpod`, etc.), `nodes` must be defined as a range starting with `0`. In these cases, instances are provisioned on demand as needed.
 
-<div editor-title=".dstack.yml">
+<!-- TODO: Ensure the user sees the error or warning otherwise -->
 
-```yaml
-type: fleet
-# The name is optional, if not specified, generated randomly
-name: my-fleet
-
-# Specify the number of instances
-nodes: 0..2
-# Uncomment to ensure instances are inter-connected
-#placement: cluster
-
-resources:
-  gpu: 24GB
-```
+??? info "Target number of nodes"
 
-</div>
+    If `nodes` is defined as a range, you can start with more than the minimum number of instances by using the `target` parameter when creating the fleet.
 
-### Configuration options
+    <div editor-title=".dstack.yml">
 
-#### Nodes { #nodes }
+    ```yaml
+    type: fleet
 
-The `nodes` property controls how many instances to provision and maintain in the fleet:
+    name: my-fleet
 
-<div editor-title=".dstack.yml">
+    nodes:
+      min: 0
+      max: 2
 
-```yaml
-type: fleet
+      # Provision 2 instances initially
+      target: 2
 
-name: my-fleet
+    # Deprovision instances above the minimum if they remain idle
+    idle_duration: 1h
+    ```
 
-nodes:
-  min: 1  # Always maintain at least 1 idle instance. Can be 0.
-  max: 3  # (Optional) Do not allow more than 3 instances
-```
+    </div>
 
-</div>
+By default, when you submit a [dev environment](dev-environments.md), [task](tasks.md), or [service](services.md), `dstack` tries all available fleets. However, you can explicitly specify the [`fleets`](../reference/dstack.yml/dev-environment.md#fleets) in your run configuration
+or via [`--fleet`](../reference/cli/dstack/apply.md#fleet) with `dstack apply`.
 
-`dstack` ensures the fleet always has at least `nodes.min` instances, creating new instances in the background if necessary. If you don't need to keep instances in the fleet forever, you can set `nodes.min` to `0`. By default, `dstack apply` also provisions `nodes.min` instances. The `nodes.target` property allows provisioning more instances initially than needs to be maintained.
+### Configuration options
 
-#### Placement { #standard-placement }
+#### Placement { #backend-placement }
 
 To ensure instances are interconnected (e.g., for
 [distributed tasks](tasks.md#distributed-tasks)), set `placement` to `cluster`.
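Combining the `nodes` range shown earlier in this file with the `placement` option described here, a backend fleet of interconnected instances might be sketched as follows; the fleet name, node counts, and GPU size are illustrative, not values from the PR:

```yaml
type: fleet
name: my-cluster-fleet

nodes:
  min: 0
  max: 4

# Ensure instances are interconnected, e.g. for distributed tasks
placement: cluster

resources:
  gpu: 24GB
```

With `placement: cluster`, all instances are provisioned in the same backend region so they can reach each other over the interconnect.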
@@ -190,9 +181,9 @@ and their quantity. Examples: `nvidia` (one NVIDIA GPU), `A100` (one A100), `A10
 > If you’re unsure which offers (hardware configurations) are available from the configured backends, use the
 > [`dstack offer`](../reference/cli/dstack/offer.md#list-gpu-offers) command to list them.
 
-#### Blocks { #standard-blocks }
+#### Blocks { #backend-blocks }
 
-For standard fleets, `blocks` function the same way as in SSH fleets.
+For backend fleets, `blocks` function the same way as in SSH fleets.
 See the [`Blocks`](#ssh-blocks) section under SSH fleets for details on the blocks concept.
 
 <div editor-title=".dstack.yml">
@@ -272,13 +263,13 @@ retry:
 </div>
 
 !!! info "Reference"
-    Standard fleets support many more configuration options,
+    Backend fleets support many more configuration options,
docs/docs/quickstart.md (+15 −5)
@@ -16,7 +16,7 @@ $ mkdir quickstart && cd quickstart
 
 ## Create a fleet
 
-Before submitting runs, you need to create a fleet where new instances will be provisioned.
+If [backends](concepts/backends.md) are configured, `dstack` can create a new [backend fleet](concepts/fleets.md#backend-fleets) on the fly. However, it’s recommended to create fleets explicitly.
 
 <h3>Define a configuration</h3>
@@ -28,7 +28,15 @@ Create the following fleet configuration inside your project folder:
 type: fleet
 name: default
 
-nodes: 0..
+# Allow provisioning of up to 2 instances
+nodes: 0..2
+
+# Deprovision instances above the minimum if they remain idle
+idle_duration: 1h
+
+resources:
+  # Allow provisioning of up to 8 GPUs
+  gpu: 0..8
 ```
 
 </div>
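With the additions in this hunk applied, the resulting quickstart fleet configuration reads in full:

```yaml
type: fleet
name: default

# Allow provisioning of up to 2 instances
nodes: 0..2

# Deprovision instances above the minimum if they remain idle
idle_duration: 1h

resources:
  # Allow provisioning of up to 8 GPUs
  gpu: 0..8
```

Both `nodes` and `gpu` accept ranges, so provisioning scales between the stated minimum and maximum on demand.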
@@ -55,13 +63,15 @@ Create the fleet? [y/n]: y
 
 </div>
 
+Alternatively, you can create an [SSH fleet](concepts/fleets.md#ssh-fleets).
+
 ## Submit your first run
 
 `dstack` supports three types of run configurations.
 
 === "Dev environment"
 
-    A dev environment lets you provision an instance and access it with your desktop IDE.
+    A [dev environment](concepts/dev-environments.md) lets you provision an instance and access it with your desktop IDE.
 
     <h3>Define a configuration</h3>
@@ -117,7 +127,7 @@ Create the fleet? [y/n]: y
 
 === "Task"
 
-    A task allows you to schedule a job or run a web app. Tasks can be distributed and can forward ports.
+    A [task](concepts/tasks.md) allows you to schedule a job or run a web app. Tasks can be distributed and can forward ports.
 
     <h3>Define a configuration</h3>
@@ -181,7 +191,7 @@ Create the fleet? [y/n]: y
 
 === "Service"
 
-    A service allows you to deploy a model or any web app as an endpoint.
+    A [service](concepts/services.md) allows you to deploy a model or any web app as an endpoint.
examples/clusters/nccl-tests/README.md (+5 −23)
@@ -1,27 +1,9 @@
 # NCCL tests
 
-This example shows how to run distributed [NCCL tests :material-arrow-top-right-thin:{ .external }](https://github.com/NVIDIA/nccl-tests){:target="_blank"} with MPI using `dstack`.
+This example shows how to run [NCCL tests :material-arrow-top-right-thin:{ .external }](https://github.com/NVIDIA/nccl-tests){:target="_blank"} on a cluster using [distributed tasks](https://dstack.ai/docs/concepts/tasks#distributed-tasks).
 
-??? info "Fleet"
-    Before running NCCL tests, make sure to create a fleet with `placement: cluster`. Here's a fleet configuration suitable for this example:
-    …
-    > For more details on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide.
+!!! info "Prerequisites"
+    Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [managed fleet](https://dstack.ai/docs/concepts/fleets#backend-placement) or an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh-placement)).
 
 ## Running as a task
 
@@ -97,5 +79,5 @@ The source-code of this example can be found in
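The fleet configuration removed from this README is elided in the hunk above. A minimal cluster fleet of the kind the new prerequisite describes might be sketched as follows; the fleet name, node count, and GPU spec are illustrative assumptions, not the example's actual configuration:

```yaml
type: fleet
name: nccl-test-fleet

# Two interconnected nodes for the distributed task
nodes: 2
placement: cluster

resources:
  gpu: nvidia:8   # illustrative; any multi-GPU NVIDIA offer works
```

An SSH fleet over on-prem hosts with `placement: cluster` would satisfy the same prerequisite.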
examples/clusters/rccl-tests/README.md (+3 −20)
@@ -1,26 +1,9 @@
 # RCCL tests
 
-This example shows how to run distributed [RCCL tests :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rccl-tests){:target="_blank"} with MPI using `dstack`.
+This example shows how to run distributed [RCCL tests :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rccl-tests){:target="_blank"} using [distributed tasks](https://dstack.ai/docs/concepts/tasks#distributed-tasks).
 
-??? info "Fleet"
-    Before running RCCL tests, make sure to create a fleet with `placement: cluster`. Here's a fleet configuration suitable for this example:
-    …
-    > For more details on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide.
+!!! info "Prerequisites"
+    Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [managed fleet](https://dstack.ai/docs/concepts/fleets#backend-placement) or an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh-placement)).
examples/distributed-training/axolotl/README.md (+3 −32)
@@ -1,38 +1,9 @@
 # Axolotl
 
-This example walks you through how to run distributed fine-tune using [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/axolotl-ai-cloud/axolotl){:target="_blank"} with `dstack`.
+This example walks you through how to run distributed fine-tuning using [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/axolotl-ai-cloud/axolotl){:target="_blank"} and [distributed tasks](https://dstack.ai/docs/concepts/tasks#distributed-tasks).
 
-??? info "Prerequisites"
-    Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
-
-    <div class="termy">
-
-    ```shell
-    $ git clone https://github.com/dstackai/dstack
-    $ cd dstack
-    ```
-
-    </div>
-
-??? info "Fleet"
-    Before submitting distributed training runs, make sure to create a fleet with `placement: cluster`. Here's a fleet configuration suitable for this example:
-    …
-    > For more details on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide.
+!!! info "Prerequisites"
+    Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [managed fleet](https://dstack.ai/docs/concepts/fleets#backend-placement) or an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh-placement)).