Commit f966126
[Docs] Improve fleet documentation to reflect fleet-first UX changes
1 parent: 3c3b846

12 files changed: +82 -193 lines


docker/server/README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -39,7 +39,7 @@ Configuration is updated at ~/.dstack/config.yml
 ## Create SSH fleets
 
 If you want the `dstack` server to run containers on your on-prem servers,
-use [fleets](https://dstack.ai/docs/concepts/fleets#ssh).
+use [fleets](https://dstack.ai/docs/concepts/fleets#ssh-fleets).
 
 ## More information
````
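For reference, the SSH fleet that the corrected anchor points to is declared with a configuration along these lines (a sketch; the fleet name, user, key path, and host addresses are placeholders):

```yaml
type: fleet
# Placeholder name for the on-prem fleet
name: on-prem-fleet

# SSH fleets describe existing servers instead of provisioning new ones
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.0.2.10
    - 192.0.2.11
```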

docs/blog/posts/amd-mi300x-inference-benchmark.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -10,7 +10,7 @@ categories:
 
 # Benchmarking Llama 3.1 405B on 8x AMD MI300X GPUs
 
-At `dstack`, we've been adding support for AMD GPUs with [SSH fleets](../../docs/concepts/fleets.md#ssh),
+At `dstack`, we've been adding support for AMD GPUs with [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets),
 so we saw this as a great chance to test our integration by benchmarking AMD GPUs. Our friends at
 [Hot Aisle :material-arrow-top-right-thin:{ .external }](https://hotaisle.xyz/){:target="_blank"}, who build top-tier
 bare metal compute for AMD GPUs, kindly provided the hardware for the benchmark.
@@ -34,7 +34,7 @@ Here is the spec of the bare metal machine we got:
 ??? info "Set up an SSH fleet"
 
     Hot Aisle provided us with SSH access to the machine. To make it accessible via `dstack`,
-    we created an [SSH fleet](../../docs/concepts/fleets.md#ssh) using the following configuration:
+    we created an [SSH fleet](../../docs/concepts/fleets.md#ssh-fleets) using the following configuration:
 
     <div editor-title="hotaisle.dstack.yml">
 
@@ -215,7 +215,7 @@ If you have questions, feedback, or want to help improve the benchmark, please r
 is the primary sponsor of this benchmark, and we are sincerely grateful for their hardware and support.
 
 If you'd like to use top-tier bare metal compute with AMD GPUs, we recommend going
-with Hot Aisle. Once you gain access to a cluster, it can be easily accessed via `dstack`'s [SSH fleet](../../docs/concepts/fleets.md#ssh) easily.
+with Hot Aisle. Once you gain access to a cluster, it can be easily accessed via `dstack`'s [SSH fleet](../../docs/concepts/fleets.md#ssh-fleets).
 
 ### RunPod
 If you’d like to use on-demand compute with AMD GPUs at affordable prices, you can configure `dstack` to
````

docs/docs/concepts/backends.md

Lines changed: 2 additions & 3 deletions
````diff
@@ -9,9 +9,8 @@ They can be configured via `~/.dstack/server/config.yml` or through the [project
 * [Container-based](#container-based) – use either `dstack`'s native integration with cloud providers or Kubernetes to orchestrate container-based runs; provisioning in this case is delegated to the cloud provider or Kubernetes.
 * [On-prem](#on-prem) – use `dstack`'s native support for on-prem servers without needing Kubernetes.
 
-??? info "dstack Sky"
-    If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
-    you can either configure your own backends or use the pre-configured backend that gives you access to compute from the GPU marketplace.
+!!! info "dstack Sky"
+    If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}, backend configuration is optional. dstack Sky lets you use pre-configured backends to access the GPU marketplace.
 
 See the examples of backend configuration below.
````
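For context, the backends this hunk refers to are configured in `~/.dstack/server/config.yml`; a minimal sketch, assuming an AWS account with default credentials (the project name and backend type are illustrative):

```yaml
# ~/.dstack/server/config.yml (sketch)
projects:
- name: main
  backends:
  - type: aws
    creds:
      type: default
```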

docs/docs/concepts/fleets.md

Lines changed: 44 additions & 53 deletions
````diff
@@ -4,36 +4,41 @@ Fleets act both as pools of instances and as templates for how those instances a
 `dstack` supports two kinds of fleets:
 
-* [Standard fleets](#standard) – dynamically provisioned through configured backends; they are supported with any type of backends: [VM-based](backends.md#vm-based), [container-based](backends.md#container-based), and [Kubernetes](backends.md#kubernetes)
+* [Backend fleets](#backend-fleets) – dynamically provisioned through configured backends; they are supported with any type of backend: [VM-based](backends.md#vm-based) and [container-based](backends.md#container-based) (incl. [`kubernetes`](backends.md#kubernetes))
 * [SSH fleets](#ssh-fleets) – created using on-prem servers; do not require backends
 
-## Standard fleets { #standard }
+When you run `dstack apply` to start a dev environment, task, or service, `dstack` will reuse idle instances from an existing fleet whenever available.
 
-When you run `dstack apply` to start a dev environment, task, or service, `dstack` will reuse idle instances
-from an existing fleet whenever available.
+## Backend fleets { #backend-fleets }
 
-If no fleet meets the requirements or has idle capacity, `dstack` can create a new fleet on the fly.
-However, it’s generally better to define fleets explicitly in configuration files for greater control.
+If you have [backends](backends.md) configured, `dstack` can provision fleets on the fly.
+However, it’s recommended to define fleets explicitly.
 
 ### Apply a configuration
 
-Define a fleet configuration as a YAML file in your project directory. The file must have a
+To create a backend fleet, define a configuration as a YAML file in your project directory. The file must have a
 `.dstack.yml` extension (e.g. `.dstack.yml` or `fleet.dstack.yml`).
 
 <div editor-title="examples/misc/fleets/.dstack.yml">
 
 ```yaml
 type: fleet
 # The name is optional, if not specified, generated randomly
-name: my-fleet
+name: default-fleet
 
 # Can be a range or a fixed number
-nodes: 2
+# Allow provisioning of up to 2 instances
+nodes: 0..2
+
 # Uncomment to ensure instances are inter-connected
 #placement: cluster
+
+# Deprovision instances above the minimum if they remain idle
+idle_duration: 1h
 
 resources:
-  gpu: 24GB
+  # Allow provisioning of up to 8 GPUs
+  gpu: 0..8
 ```
 
 </div>
````
````diff
@@ -48,63 +53,49 @@ $ dstack apply -f examples/misc/fleets/.dstack.yml
 Provisioning...
 ---> 100%
 
-FLEET     INSTANCE  BACKEND              GPU             PRICE    STATUS  CREATED
-my-fleet  0         gcp (europe-west-1)  L4:24GB (spot)  $0.1624  idle    3 mins ago
-          1         gcp (europe-west-1)  L4:24GB (spot)  $0.1624  idle    3 mins ago
+FLEET          INSTANCE  BACKEND  GPU  PRICE  STATUS  CREATED
+default-fleet  -         -        -    -      -       -
 ```
 
 </div>
 
-Once the status of instances changes to `idle`, they can be used by dev environments, tasks, and services.
+`dstack` always keeps the minimum number of nodes provisioned. Additional instances, up to the maximum limit, are provisioned on demand.
 
-??? info "Container-based backends"
-    [Container-based](backends.md#container-based) backends don’t support pre-provisioning,
-    so `nodes` can only be set to a range starting with `0`.
-
-    This means instances are created only when a run starts, and once it finishes, they’re terminated and released back to the provider (either a cloud service or Kubernetes).
+!!! info "Container-based backends"
+    For [container-based](backends.md#container-based) backends (such as `kubernetes`, `runpod`, etc.), `nodes` must be defined as a range starting with `0`. In these cases, instances are provisioned on demand as needed.
 
-    <div editor-title=".dstack.yml">
+<!-- TODO: Ensure the user sees the error or warning otherwise -->
 
-    ```yaml
-    type: fleet
-    # The name is optional, if not specified, generated randomly
-    name: my-fleet
-
-    # Specify the number of instances
-    nodes: 0..2
-    # Uncomment to ensure instances are inter-connected
-    #placement: cluster
-
-    resources:
-      gpu: 24GB
-    ```
+??? info "Target number of nodes"
 
-    </div>
+    If `nodes` is defined as a range, you can start with more than the minimum number of instances by using the `target` parameter when creating the fleet.
 
-### Configuration options
+    <div editor-title=".dstack.yml">
 
-#### Nodes { #nodes }
+    ```yaml
+    type: fleet
 
-The `nodes` property controls how many instances to provision and maintain in the fleet:
+    name: my-fleet
 
-<div editor-title=".dstack.yml">
+    nodes:
+      min: 0
+      max: 2
 
-```yaml
-type: fleet
+      # Provision 2 instances initially
+      target: 2
 
-name: my-fleet
+    # Deprovision instances above the minimum if they remain idle
+    idle_duration: 1h
+    ```
 
-nodes:
-  min: 1  # Always maintain at least 1 idle instance. Can be 0.
-  target: 2  # (Optional) Provision 2 instances initially
-  max: 3  # (Optional) Do not allow more than 3 instances
-```
+    </div>
 
-</div>
+By default, when you submit a [dev environment](dev-environments.md), [task](tasks.md), or [service](services.md), `dstack` tries all available fleets. However, you can explicitly specify [`fleets`](../reference/dstack.yml/dev-environment.md#fleets) in your run configuration
+or via [`--fleet`](../reference/cli/dstack/apply.md#fleet) with `dstack apply`.
 
-`dstack` ensures the fleet always has at least `nodes.min` instances, creating new instances in the background if necessary. If you don’t need to keep instances in the fleet forever, you can set `nodes.min` to `0`. By default, `dstack apply` also provisions `nodes.min` instances. The `nodes.target` property allows provisioning more instances initially than needs to be maintained.
+### Configuration options
 
-#### Placement { #standard-placement }
+#### Placement { #backend-placement }
 
 To ensure instances are interconnected (e.g., for
 [distributed tasks](tasks.md#distributed-tasks)), set `placement` to `cluster`.
````
````diff
@@ -190,9 +181,9 @@ and their quantity. Examples: `nvidia` (one NVIDIA GPU), `A100` (one A100), `A10
 > If you’re unsure which offers (hardware configurations) are available from the configured backends, use the
 > [`dstack offer`](../reference/cli/dstack/offer.md#list-gpu-offers) command to list them.
 
-#### Blocks { #standard-blocks }
+#### Blocks { #backend-blocks }
 
-For standard fleets, `blocks` function the same way as in SSH fleets.
+For backend fleets, `blocks` function the same way as in SSH fleets.
 See the [`Blocks`](#ssh-blocks) section under SSH fleets for details on the blocks concept.
 
 <div editor-title=".dstack.yml">
````
````diff
@@ -272,13 +263,13 @@ retry:
 </div>
 
 !!! info "Reference"
-    Standard fleets support many more configuration options,
+    Backend fleets support many more configuration options,
     incl. [`backends`](../reference/dstack.yml/fleet.md#backends),
     [`regions`](../reference/dstack.yml/fleet.md#regions),
     and [`max_price`](../reference/dstack.yml/fleet.md#max_price),
     among [others](../reference/dstack.yml/fleet.md).
 
-## SSH fleets { #ssh }
+## SSH fleets { #ssh-fleets }
 
 If you have a group of on-prem servers accessible via SSH, you can create an SSH fleet.
````
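The `fleets` run-configuration parameter that this file's new text points to can be used like this (a sketch; the fleet name and command are placeholders):

```yaml
type: task
name: train
# Restrict the run to instances of a specific fleet (hypothetical name)
fleets: [default-fleet]
commands:
  - python train.py
```

Equivalently, the fleet can be selected at submit time with `dstack apply -f task.dstack.yml --fleet default-fleet`.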

docs/docs/quickstart.md

Lines changed: 15 additions & 5 deletions
````diff
@@ -16,7 +16,7 @@ $ mkdir quickstart && cd quickstart
 
 ## Create a fleet
 
-Before submitting runs, you need to create a fleet where new instances will be provisioned.
+If [backends](concepts/backends.md) are configured, `dstack` can create a new [backend fleet](concepts/fleets.md#backend-fleets) on the fly. However, it’s recommended to create fleets explicitly.
 
 <h3>Define a configuration</h3>
 
@@ -28,7 +28,15 @@ Create the following fleet configuration inside your project folder:
 type: fleet
 name: default
 
-nodes: 0..
+# Allow provisioning of up to 2 instances
+nodes: 0..2
+
+# Deprovision instances above the minimum if they remain idle
+idle_duration: 1h
+
+resources:
+  # Allow provisioning of up to 8 GPUs
+  gpu: 0..8
 ```
 
 </div>
@@ -55,13 +63,15 @@ Create the fleet? [y/n]: y
 
 </div>
 
+Alternatively, you can create an [SSH fleet](concepts/fleets.md#ssh-fleets).
+
 ## Submit your first run
 
 `dstack` supports three types of run configurations.
 
 === "Dev environment"
 
-    A dev environment lets you provision an instance and access it with your desktop IDE.
+    A [dev environment](concepts/dev-environments.md) lets you provision an instance and access it with your desktop IDE.
 
     <h3>Define a configuration</h3>
@@ -117,7 +127,7 @@ Create the fleet? [y/n]: y
 === "Task"
 
-    A task allows you to schedule a job or run a web app. Tasks can be distributed and can forward ports.
+    A [task](concepts/tasks.md) allows you to schedule a job or run a web app. Tasks can be distributed and can forward ports.
 
     <h3>Define a configuration</h3>
@@ -181,7 +191,7 @@ Create the fleet? [y/n]: y
 === "Service"
 
-    A service allows you to deploy a model or any web app as an endpoint.
+    A [service](concepts/services.md) allows you to deploy a model or any web app as an endpoint.
 
     <h3>Define a configuration</h3>
````
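As context for the quickstart tabs above, a minimal dev environment configuration looks roughly like this (a sketch; the name and resource values are illustrative):

```yaml
type: dev-environment
name: vscode

# Open the provisioned instance in the desktop VS Code
ide: vscode

resources:
  gpu: 24GB
```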

examples/clusters/nccl-tests/README.md

Lines changed: 5 additions & 23 deletions
````diff
@@ -1,27 +1,9 @@
 # NCCL tests
 
-This example shows how to run distributed [NCCL tests :material-arrow-top-right-thin:{ .external }](https://github.com/NVIDIA/nccl-tests){:target="_blank"} with MPI using `dstack`.
+This example shows how to run [NCCL tests :material-arrow-top-right-thin:{ .external }](https://github.com/NVIDIA/nccl-tests){:target="_blank"} on a cluster using [distributed tasks](https://dstack.ai/docs/concepts/tasks#distributed-tasks).
 
-??? info "Fleet"
-    Before running NCCL tests, make sure to create a fleet with `placement: cluster`. Here's a fleet configuration suitable for this example:
-
-    <div editor-title="examples/clusters/nccl-tests/fleet.dstack.yml">
-
-    ```yaml
-    type: fleet
-    name: cluster-fleet
-
-    nodes: 2
-    placement: cluster
-
-    resources:
-      gpu: nvidia:1..8
-      shm_size: 16GB
-    ```
-
-    </div>
-
-> For more details on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide.
+!!! info "Prerequisites"
+    Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [backend fleet](https://dstack.ai/docs/concepts/fleets#backend-placement) or an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh-placement)).
 
 ## Running as a task
 
@@ -97,5 +79,5 @@ The source-code of this example can be found in
 
 ## What's next?
 
-1. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks),
-   [services](https://dstack.ai/docs/services), and [fleets](https://dstack.ai/docs/concepts/fleets).
+1. Check [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks),
+   [services](https://dstack.ai/docs/concepts/services), and [fleets](https://dstack.ai/docs/concepts/fleets).
````
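The prerequisite note above no longer inlines a fleet configuration; based on the one removed in this commit, a `placement: cluster` fleet can be sketched as:

```yaml
type: fleet
name: cluster-fleet

# Two interconnected nodes for the NCCL tests
nodes: 2
placement: cluster

resources:
  gpu: nvidia:1..8
  shm_size: 16GB
```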

examples/clusters/rccl-tests/README.md

Lines changed: 3 additions & 20 deletions
````diff
@@ -1,26 +1,9 @@
 # RCCL tests
 
-This example shows how to run distributed [RCCL tests :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rccl-tests){:target="_blank"} with MPI using `dstack`.
+This example shows how to run distributed [RCCL tests :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rccl-tests){:target="_blank"} using [distributed tasks](https://dstack.ai/docs/concepts/tasks#distributed-tasks).
 
-??? info "Fleet"
-    Before running RCCL tests, make sure to create a fleet with `placement: cluster`. Here's a fleet configuration suitable for this example:
-
-    <div editor-title="examples/clusters/rccl-tests/fleet.dstack.yml">
-
-    ```yaml
-    type: fleet
-    name: cluster-fleet
-
-    nodes: 2
-    placement: cluster
-
-    resources:
-      gpu: MI300X:8
-    ```
-
-    </div>
-
-> For more details on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide.
+!!! info "Prerequisites"
+    Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [backend fleet](https://dstack.ai/docs/concepts/fleets#backend-placement) or an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh-placement)).
 
 
 ## Running as a task
````

examples/distributed-training/axolotl/README.md

Lines changed: 3 additions & 32 deletions
````diff
@@ -1,38 +1,9 @@
 # Axolotl
 
-This example walks you through how to run distributed fine-tuning using [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/axolotl-ai-cloud/axolotl){:target="_blank"} with `dstack`.
+This example walks you through how to run distributed fine-tuning using [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/axolotl-ai-cloud/axolotl){:target="_blank"} and [distributed tasks](https://dstack.ai/docs/concepts/tasks#distributed-tasks).
 
-??? info "Prerequisites"
-    Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
-
-    <div class="termy">
-
-    ```shell
-    $ git clone https://github.com/dstackai/dstack
-    $ cd dstack
-    ```
-    </div>
-
-??? info "Fleet"
-    Before submitting distributed training runs, make sure to create a fleet with `placement: cluster`. Here's a fleet configuration suitable for this example:
-
-    <div editor-title="examples/distributed-training/axolotl/fleet.dstack.yml">
-
-    ```yaml
-    type: fleet
-    name: axolotl-fleet
-
-    nodes: 2
-    placement: cluster
-
-    resources:
-      gpu: 80GB:8
-      shm_size: 128GB
-    ```
-
-    </div>
-
-> For more details on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide.
+!!! info "Prerequisites"
+    Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [backend fleet](https://dstack.ai/docs/concepts/fleets#backend-placement) or an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh-placement)).
 
 ## Define a configuration
````