Fleet-first docs #3242
Changes from 7 commits
b193f0f
9490efa
9c89041
c0695ea
4199fc0
a449185
b641470
1534e4d
3a2d076
0c0c9d4
3c3b846
f966126
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -190,11 +190,9 @@ See more Docker examples [here](https://github.com/dstackai/dstack/tree/master/e | |
| ### Creation policy | ||
|
|
||
| By default, when you run `dstack apply` with a dev environment, task, or service, | ||
| `dstack` reuses `idle` instances from an existing [fleet](../concepts/fleets.md). | ||
| If no `idle` instances match the requirements, `dstack` automatically creates a new fleet | ||
| using configured backends. | ||
| If no `idle` instances from the available fleets meet the requirements, `dstack` provisions a new instance using configured backends. | ||
|
|
||
| To ensure `dstack apply` doesn't create a new fleet but reuses an existing one, | ||
| To ensure `dstack apply` doesn't provision a new instance but reuses an existing one, | ||
| pass `-R` (or `--reuse`) to `dstack apply`. | ||
|
|
||
| <div class="termy"> | ||
|
|
@@ -205,16 +203,14 @@ $ dstack apply -R -f examples/.dstack.yml | |
|
|
||
| </div> | ||
|
|
||
| Or, set [`creation_policy`](../reference/dstack.yml/dev-environment.md#creation_policy) to `reuse` in the run configuration. | ||
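For illustration, the `creation_policy` option described above might look like this in a dev environment configuration. This is a minimal sketch: the `name` and `ide` values are assumptions chosen for the example and are not taken from this PR.

```yaml
type: dev-environment
# Hypothetical run name, for illustration only
name: vscode

ide: vscode

# Only reuse idle instances from existing fleets;
# do not provision a new instance if none match
creation_policy: reuse
```

With this setting, the run should behave as if `-R` were passed to `dstack apply` on every invocation.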
|
|
||
| ### Idle duration | ||
|
|
||
| If a fleet is created automatically, it stays `idle` for 5 minutes by default and can be reused within that time. | ||
| If the fleet is not reused within this period, it is automatically terminated. | ||
| If a run provisions a new instance, the instance stays `idle` for 5 minutes by default and can be reused within that time. | ||
|
Contributor
Haven't we dropped the defaults for
Collaborator
Author
Dropped max_duration default long ago but not idle_duration
This comment was marked as resolved.
Contributor
Ahh, it's something else. Should I create a new issue about dropping
Collaborator
Author
I think with |
||
| If the instance is not reused within this period, it is automatically terminated. | ||
| To change the default idle duration, set | ||
| [`idle_duration`](../reference/dstack.yml/fleet.md#idle_duration) in the run configuration (e.g., `0s`, `1m`, or `off` for | ||
| unlimited). | ||
|
|
||
| > For greater control over fleet provisioning, configuration, and lifecycle management, it is recommended to use | ||
| > [fleets](../concepts/fleets.md) directly. | ||
| [`idle_duration`](../reference/dstack.yml/fleet.md#idle_duration) in the run configuration (e.g., `0s`, `1m`, or `off` for unlimited). | ||
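As a rough sketch of the setting described above, a run configuration that overrides the idle duration could look like the following. The task `name` and the command are hypothetical placeholders; only `idle_duration` and its example values come from the text.

```yaml
type: task
# Hypothetical task name and command, for illustration only
name: train

commands:
  - python train.py

# Keep an instance provisioned by this run idle for one minute
# before it is terminated (the text above states a 5-minute default)
idle_duration: 1m
```

Per the text, `0s` would terminate the instance as soon as the run finishes, and `off` would keep it idle indefinitely.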
|
|
||
| ## Volumes | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| # Quickstart | ||
|
|
||
| > Before using `dstack`, ensure you've [installed](installation/index.md) the server, or signed up for [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}. | ||
| > Before using `dstack`, ensure you've [installed](installation/index.md) the server, or signed up for [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"} | ||
|
|
||
| ## Set up a directory | ||
|
|
||
|
|
@@ -14,6 +14,46 @@ $ mkdir quickstart && cd quickstart | |
|
|
||
| </div> | ||
|
|
||
| ## Create a fleet | ||
|
|
||
| Before submitting runs, you need to create a fleet where new instances will be provisioned. | ||
|
|
||
| ### Define a configuration | ||
|
r4victor marked this conversation as resolved.
Outdated
|
||
|
|
||
| Create the following fleet configuration inside your project folder: | ||
|
|
||
| <div editor-title="fleet.dstack.yml"> | ||
|
|
||
| ```yaml | ||
| type: fleet | ||
| name: default-fleet | ||
|
r4victor marked this conversation as resolved.
Outdated
|
||
| nodes: 0.. | ||
|
Contributor
Perhaps, I'd also add
Contributor
See #3249
Collaborator
Author
maybe once #3249 is fixed then? |
||
| ``` | ||
|
|
||
| </div> | ||
|
|
||
| ### Apply the configuration | ||
|
r4victor marked this conversation as resolved.
Outdated
|
||
|
|
||
| Apply the configuration via [`dstack apply`](reference/cli/dstack/apply.md): | ||
|
|
||
| <div class="termy"> | ||
|
|
||
| ```shell | ||
| $ dstack apply -f fleet.dstack.yml | ||
|
|
||
| # BACKEND REGION RESOURCES SPOT PRICE | ||
| 1 gcp us-west4 2xCPU, 8GB, 100GB (disk) yes $0.010052 | ||
| 2 azure westeurope 2xCPU, 8GB, 100GB (disk) yes $0.0132 | ||
| 3 gcp europe-central2 2xCPU, 8GB, 100GB (disk) yes $0.013248 | ||
|
|
||
| Fleet default-fleet does not exist yet. | |
| Create the fleet? [y/n]: y | ||
| FLEET INSTANCE BACKEND RESOURCES PRICE STATUS CREATED | ||
| default-fleet - - - - - 10:36 | |
| ``` | ||
|
|
||
| </div> | ||
|
|
||
| ## Submit your first run | ||
|
|
||
| `dstack` supports three types of run configurations. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| type: fleet | ||
| name: my-efa-fleet | ||
|
|
||
| nodes: 2 | ||
| placement: cluster | ||
|
|
||
| resources: | ||
| gpu: H100:8 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| type: fleet | ||
| name: cluster-fleet | ||
|
|
||
| nodes: 2 | ||
| placement: cluster | ||
|
|
||
| resources: | ||
| gpu: nvidia:1..8 | ||
| shm_size: 16GB |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| type: fleet | ||
| name: cluster-fleet | ||
|
|
||
| nodes: 2 | ||
| placement: cluster | ||
|
|
||
| resources: | ||
| gpu: MI300X:8 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| type: fleet | ||
| name: axolotl-fleet | ||
|
|
||
| nodes: 2 | ||
| placement: cluster | ||
|
|
||
| resources: | ||
| gpu: 80GB:8 | ||
| shm_size: 128GB |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| type: fleet | ||
| name: ray-ragen-cluster-fleet | ||
|
|
||
| nodes: 2 | ||
| placement: cluster | ||
|
|
||
| resources: | ||
| gpu: 80GB:8 | ||
| shm_size: 128GB |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,11 +15,27 @@ This example walks you through how to run distributed fine-tune using [TRL :mate | |
|
|
||
| ## Create fleet | ||
|
|
||
| Before submitting distributed training runs, make sure to create a fleet with a `placement` set to `cluster`. | ||
| Before submitting distributed training runs, make sure to create a fleet with `placement: cluster`. Here's a fleet configuration suitable for this example: | ||
|
Contributor
In
Contributor
Also, I'd probably make this section collapsed by default, as it repeats everywhere.
Collaborator
Author
Assumed users may already have a suitable fleet since cluster is not required – for the same reason Tasks, Services, Dev environments pages don't have Create fleet. But we can add Create fleet section everywhere if you like that. |
||
|
|
||
| > For more detials on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide. | ||
| <div editor-title="examples/distributed-training/trl/fleet.dstack.yml"> | ||
|
|
||
| ## Define a configurtation | ||
| ```yaml | ||
| type: fleet | ||
| name: trl-train-fleet | ||
|
|
||
| nodes: 2 | ||
| placement: cluster | ||
|
|
||
| resources: | ||
| gpu: 80GB:8 | ||
| shm_size: 128GB | ||
| ``` | ||
|
|
||
| </div> | ||
|
|
||
| > For more details on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide. | ||
|
|
||
| ## Define a configuration | ||
|
|
||
| Once the fleet is created, define a distributed task configuration. Here's an example of such a task. | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| type: fleet | ||
| name: trl-train-fleet | ||
|
|
||
| nodes: 2 | ||
| placement: cluster | ||
|
|
||
| resources: | ||
| gpu: 80GB:8 | ||
| shm_size: 128GB |
In Fleets, I would probably make it more visible that the user can set either a fixed number of nodes or a range. Currently we only show a fixed number. A range is going to be an even more popular choice. I would show both and explicitly tell why one or the other should be used. Let me know if you want me to update it.
And the range example should also mention `idle_duration` explicitly.
Feel free to push a commit
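One possible shape for the range example discussed in this thread is sketched below. It is not part of the PR: the fleet `name`, the upper bound of the range, the `idle_duration` value, and the `resources` spec are all assumptions picked for illustration, and the comments describe the intended behavior as understood from the docs changes above rather than verified output.

```yaml
type: fleet
# Hypothetical fleet name, for illustration only
name: range-fleet

# A range instead of a fixed number: start with no instances
# and allow runs to provision up to two on demand
nodes: 0..2

# Stated explicitly, as suggested in the review:
# terminate instances that stay idle for longer than an hour
idle_duration: 1h

resources:
  gpu: 24GB
```

A fixed value such as `nodes: 2` (as in the cluster fleets in this PR) suits workloads that need all instances up front, while a range suits fleets that should scale from zero as runs arrive.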