feature: implement replica groups in service configurations
```yaml
type: service
port: 8000
commands: ["python app.py"]

replica_groups:
  - name: l40s-gpu
    replicas: 1..3  # autoscalable
    resources:
      gpu: L40S

  - name: h100-gpu
    replicas: 2  # fixed
    regions: [us-east]
    resources:
      gpu: H100
```
- Added the ability to define multiple replica groups with distinct configurations, including resource requirements and autoscaling behavior.
- Updated relevant documentation to reflect the new replica groups feature.
- Enhanced CLI output to display job plans with group names for better clarity.
- Ensured backward compatibility by excluding `replica_groups` from the serialized JSON when not set.
- Added tests to validate the functionality and backward compatibility of replica groups.
This change allows for more flexible service configurations, enabling users to manage different types of resources and scaling strategies within a single service.
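The two compatibility behaviors above (mutual exclusivity and JSON exclusion) can be sketched as follows. This is an illustrative model only — `ReplicaGroup`, `ServiceConfig`, and `parse_replicas` are hypothetical names, not dstack's actual classes:

```python
# Hypothetical sketch -- names and shapes are illustrative, not dstack's
# actual models. It shows: `replicas` and `replica_groups` are mutually
# exclusive, and `replica_groups` is omitted from the serialized JSON
# when not set (backward compatibility).
import json
import re
from dataclasses import dataclass, field
from typing import Optional, Union


def parse_replicas(spec: Union[int, str]) -> tuple:
    """An int means a fixed count; 'min..max' means autoscalable."""
    if isinstance(spec, int):
        return spec, spec
    m = re.fullmatch(r"(\d+)\.\.(\d+)", str(spec).strip())
    if m is None:
        raise ValueError(f"invalid replicas spec: {spec!r}")
    lo, hi = int(m.group(1)), int(m.group(2))
    if lo > hi:
        raise ValueError("min replicas must not exceed max replicas")
    return lo, hi


@dataclass
class ReplicaGroup:
    name: str
    replicas: Union[int, str]
    resources: dict = field(default_factory=dict)


@dataclass
class ServiceConfig:
    port: int
    replicas: Optional[int] = None
    replica_groups: Optional[list] = None

    def __post_init__(self) -> None:
        # The two fields are mutually exclusive.
        if self.replicas is not None and self.replica_groups is not None:
            raise ValueError("replicas and replica_groups are mutually exclusive")

    def to_json(self) -> str:
        # Exclude replica_groups from the JSON when not set, so older
        # clients see exactly the shape they saw before.
        data = {"port": self.port}
        if self.replicas is not None:
            data["replicas"] = self.replicas
        if self.replica_groups is not None:
            data["replica_groups"] = [vars(g) for g in self.replica_groups]
        return json.dumps(data)
```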
`contributing/RUNS-AND-JOBS.md` (1 addition, 1 deletion)
```diff
@@ -13,7 +13,7 @@ Runs are created from run configurations. There are three types of run configurations:
 2. `task` — runs the user's bash script until completion.
 3. `service` — runs the user's bash script and exposes a port through [dstack-proxy](PROXY.md).
 
-A run can spawn one or multiple jobs, depending on the configuration. A task that specifies multiple `nodes` spawns a job for every node (a multi-node task). A service that specifies multiple `replicas` spawns a job for every replica. A job submission is always assigned to one particular instance. If a job fails and the configuration allows retrying, the server creates a new job submission for the job.
+A run can spawn one or multiple jobs, depending on the configuration. A task that specifies multiple `nodes` spawns a job for every node (a multi-node task). A service that specifies multiple `replicas` or `replica_groups` spawns a job for every replica. Each job in a replica group is tagged with `replica_group_name` to track which group it belongs to. A job submission is always assigned to one particular instance. If a job fails and the configuration allows retrying, the server creates a new job submission for the job.
```
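The run-to-jobs expansion described in that paragraph can be sketched as below. This is a hypothetical helper, not dstack's actual planner; the dict keys are assumptions made for illustration:

```python
# Hypothetical sketch of the run-to-jobs expansion -- not dstack's actual
# planner. A service spawns one job per replica; jobs that come from a
# replica group carry `replica_group_name`.
def plan_service_jobs(config: dict) -> list:
    jobs = []
    groups = config.get("replica_groups")
    if groups:
        for group in groups:
            # Start at the group's minimum; autoscaling adds replicas later.
            for replica in range(group["min_replicas"]):
                jobs.append({
                    "job_num": len(jobs),
                    "replica_num": replica,
                    "replica_group_name": group["name"],
                })
    else:
        # Plain `replicas`: no group tag on the jobs.
        for replica in range(config.get("replicas", 1)):
            jobs.append({
                "job_num": len(jobs),
                "replica_num": replica,
                "replica_group_name": None,
            })
    return jobs
```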
`docs/docs/concepts/services.md` (60 additions, 0 deletions)
@@ -160,6 +160,66 @@ Setting the minimum number of replicas to `0` allows the service to scale down t

> The `scaling` property requires creating a [gateway](gateways.md).

### Replica Groups (Advanced)

For advanced use cases, you can define multiple **replica groups** with different instance types, resources, and configurations within a single service. This is useful when you want to:

- Run different GPU types in the same service (e.g., H100 for primary, RTX5090 for overflow)
- Configure different backends or regions per replica type

[…]

```yaml
# Define multiple replica groups with different configurations
replica_groups:
  - name: primary
    replicas: 1  # Always 1 H100 (fixed)
    resources:
      gpu: H100:1
    backends: [aws]
    regions: [us-west-2]

  - name: overflow
    replicas: 0..5  # Autoscales 0-5 RTX5090s
    resources:
      gpu: RTX5090:1
    backends: [runpod]

scaling:
  metric: rps
  target: 10
```
In this example:

- The `primary` group always runs 1 H100 replica on AWS (fixed, never scaled)
- The `overflow` group scales 0-5 RTX5090 replicas on RunPod based on load
- Scale operations only affect groups with autoscaling ranges (min != max)

Each replica group can override any [profile parameter](../reference/profiles.yml.md) including `backends`, `regions`, `instance_types`, `spot_policy`, etc. Group-level settings override service-level settings.

> **Note:** When using `replica_groups`, you cannot use the simple `replicas` field. They are mutually exclusive.
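The override precedence described above (group-level settings win, unset group fields inherit from the service) can be illustrated with a small merge. `effective_profile` is a hypothetical name, not a dstack API:

```python
# Illustrative only -- `effective_profile` is a hypothetical helper, not a
# dstack function. Group-level profile settings override service-level
# ones; fields the group leaves unset (None) are inherited.
def effective_profile(service_params: dict, group_params: dict) -> dict:
    merged = dict(service_params)
    merged.update({k: v for k, v in group_params.items() if v is not None})
    return merged
```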
**When to use replica groups:**

- You need different GPU types in the same service
- Different replicas should run in different regions or clouds
- Some replicas should be fixed while others autoscale
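The rule that scale operations skip fixed groups (min == max) can be sketched as below. This is an assumption-laden illustration — `apply_scale` and the dict keys are hypothetical, not dstack internals:

```python
# Hypothetical sketch of the scaling rule: groups with a fixed count
# (min == max) are never resized; autoscalable groups absorb the change,
# clamped to their configured range.
def apply_scale(groups: list, delta: int) -> list:
    result = []
    for group in groups:
        group = dict(group)  # copy so the caller's groups stay untouched
        if group["min_replicas"] != group["max_replicas"]:  # autoscalable
            desired = group["current_replicas"] + delta
            group["current_replicas"] = max(
                group["min_replicas"], min(group["max_replicas"], desired)
            )
        result.append(group)
    return result
```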
### Model

If the service is running a chat model with an OpenAI-compatible interface,