Commit eece104

feature: implement replica groups in service configurations
```yaml
type: service
port: 8000
commands: ["python app.py"]
replica_groups:
  - name: l40s-gpu
    replicas: 1..3  # autoscalable
    resources:
      gpu: L40S
  - name: h100-gpu
    replicas: 2  # fixed
    regions: [us-east]
    resources:
      gpu: H100
```

- Added the ability to define multiple replica groups with distinct configurations, including resource requirements and autoscaling behavior.
- Updated relevant documentation to reflect the new replica groups feature.
- Enhanced CLI output to display job plans with group names for better clarity.
- Ensured backward compatibility by excluding replica groups from JSON when not set.
- Added tests to validate the functionality and backward compatibility of replica groups.

This change allows for more flexible service configurations, enabling users to manage different types of resources and scaling strategies within a single service.
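The `replicas: 1..3` versus `replicas: 2` notation above distinguishes autoscalable ranges from fixed counts. A minimal sketch of how such a field could be parsed (a hypothetical helper for illustration, not dstack's actual parser):

```python
from typing import NamedTuple, Union


class ReplicaRange(NamedTuple):
    min: int
    max: int

    @property
    def is_autoscalable(self) -> bool:
        # A group participates in autoscaling only when min != max.
        return self.min != self.max


def parse_replicas(value: Union[int, str]) -> ReplicaRange:
    """Parse a replicas spec: an int (or "2") is a fixed count, "1..3" is a range."""
    if isinstance(value, int):
        return ReplicaRange(value, value)
    lo, sep, hi = str(value).partition("..")
    if sep:
        return ReplicaRange(int(lo), int(hi))
    return ReplicaRange(int(lo), int(lo))
```

With this, `parse_replicas("0..5")` yields an autoscalable range while `parse_replicas(2)` yields a fixed group.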
1 parent d4e0c75 commit eece104

File tree: 22 files changed, +2330 −105

contributing/AUTOSCALING.md (+2 −0)

```diff
@@ -11,6 +11,8 @@
 - STEP 7: `scale_run_replicas` terminates or starts replicas.
 - `SUBMITTED` and `PROVISIONING` replicas get terminated before `RUNNING`.
 - Replicas are terminated by descending `replica_num` and launched by ascending `replica_num`.
+- For services with `replica_groups`, only groups with autoscaling ranges (min != max) participate in scaling.
+- Scale operations respect per-group minimum and maximum constraints.
 
 ## RPSAutoscaler
 
```
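The two added rules can be illustrated with a toy scaler (names and structure are illustrative only; dstack's real `scale_run_replicas` differs): fixed groups (min == max) are skipped entirely, and each autoscalable group's replica count is clamped to its `[min, max]` bounds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    name: str
    min: int
    max: int
    current: int


def scale_groups(groups: List[Group], desired_delta: int) -> List[Group]:
    """Distribute a replica delta across groups, skipping fixed groups
    (min == max) and clamping each group to its [min, max] range."""
    remaining = desired_delta
    for g in groups:
        if g.min == g.max:
            continue  # fixed group: never scaled
        target = g.current + remaining
        clamped = max(g.min, min(g.max, target))
        remaining -= clamped - g.current
        g.current = clamped
        if remaining == 0:
            break
    return groups
```

Scaling up by 3 with a fixed group and a `0..5` group leaves the fixed group untouched and absorbs the delta in the autoscalable one; scaling down stops at the group's minimum.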

contributing/RUNS-AND-JOBS.md (+1 −1)

```diff
@@ -13,7 +13,7 @@ Runs are created from run configurations. There are three types of run configura
 2. `task` — runs the user's bash script until completion.
 3. `service` — runs the user's bash script and exposes a port through [dstack-proxy](PROXY.md).
 
-A run can spawn one or multiple jobs, depending on the configuration. A task that specifies multiple `nodes` spawns a job for every node (a multi-node task). A service that specifies multiple `replicas` spawns a job for every replica. A job submission is always assigned to one particular instance. If a job fails and the configuration allows retrying, the server creates a new job submission for the job.
+A run can spawn one or multiple jobs, depending on the configuration. A task that specifies multiple `nodes` spawns a job for every node (a multi-node task). A service that specifies multiple `replicas` or `replica_groups` spawns a job for every replica. Each job in a replica group is tagged with `replica_group_name` to track which group it belongs to. A job submission is always assigned to one particular instance. If a job fails and the configuration allows retrying, the server creates a new job submission for the job.
 
 ## Run's Lifecycle
 
```

docs/docs/concepts/services.md (+60 −0)

````diff
@@ -160,6 +160,66 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
 
 > The `scaling` property requires creating a [gateway](gateways.md).
 
+### Replica Groups (Advanced)
+
+For advanced use cases, you can define multiple **replica groups** with different instance types, resources, and configurations within a single service. This is useful when you want to:
+
+- Run different GPU types in the same service (e.g., H100 for primary, RTX5090 for overflow)
+- Configure different backends or regions per replica type
+- Set different autoscaling behavior per group
+
+<div editor-title="service.dstack.yml">
+
+```yaml
+type: service
+name: llama31-service
+
+python: 3.12
+env:
+  - HF_TOKEN
+commands:
+  - uv pip install vllm
+  - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
+port: 8000
+
+# Define multiple replica groups with different configurations
+replica_groups:
+  - name: primary
+    replicas: 1  # Always 1 H100 (fixed)
+    resources:
+      gpu: H100:1
+    backends: [aws]
+    regions: [us-west-2]
+
+  - name: overflow
+    replicas: 0..5  # Autoscales 0-5 RTX5090s
+    resources:
+      gpu: RTX5090:1
+    backends: [runpod]
+
+scaling:
+  metric: rps
+  target: 10
+```
+
+</div>
+
+In this example:
+
+- The `primary` group always runs 1 H100 replica on AWS (fixed, never scaled)
+- The `overflow` group scales 0-5 RTX5090 replicas on RunPod based on load
+- Scale operations only affect groups with autoscaling ranges (min != max)
+
+Each replica group can override any [profile parameter](../reference/profiles.yml.md) including `backends`, `regions`, `instance_types`, `spot_policy`, etc. Group-level settings override service-level settings.
+
+> **Note:** When using `replica_groups`, you cannot use the simple `replicas` field. They are mutually exclusive.
+
+**When to use replica groups:**
+
+- You need different GPU types in the same service
+- Different replicas should run in different regions or clouds
+- Some replicas should be fixed while others autoscale
+
 ### Model
 
 If the service is running a chat model with an OpenAI-compatible interface,
````
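The override rule documented above ("group-level settings override service-level settings") amounts to a shallow merge; a hedged sketch of that behavior (a hypothetical helper operating on plain dicts, not dstack's actual code):

```python
def effective_settings(service_level: dict, group_level: dict) -> dict:
    """Group-level keys win over service-level keys; unset (None) group
    values fall back to the service-level defaults."""
    merged = dict(service_level)
    merged.update({k: v for k, v in group_level.items() if v is not None})
    return merged
```

For example, a group that sets only `regions` keeps the service-level `spot_policy` while replacing the service-level `regions`.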

docs/docs/reference/dstack.yml/service.md (+16 −0)

```diff
@@ -10,6 +10,22 @@ The `service` configuration type allows running [services](../../concepts/servic
       type:
         required: true
 
+### `replica_groups`
+
+Define multiple replica groups with different configurations within a single service.
+
+> **Note:** Cannot be used together with `replicas`.
+
+#### `replica_groups[n]`
+
+#SCHEMA# dstack._internal.core.models.configurations.ReplicaGroup
+    overrides:
+      show_root_heading: false
+      type:
+        required: true
+
+Each replica group inherits from [ProfileParams](../profiles.yml.md) and can override any profile parameter including `backends`, `regions`, `instance_types`, `spot_policy`, etc.
+
 ### `model` { data-toc-label="model" }
 
 === "OpenAI"
```
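The mutual-exclusion note above can be enforced with a simple check. A sketch only (the real validation lives in dstack's pydantic configuration models):

```python
def validate_replica_config(config: dict) -> None:
    # `replicas` and `replica_groups` are mutually exclusive (sketch;
    # the actual check is part of dstack's configuration models).
    if config.get("replicas") is not None and config.get("replica_groups"):
        raise ValueError("`replicas` and `replica_groups` cannot be used together")
```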

src/dstack/_internal/cli/utils/run.py (+150 −34)

```diff
@@ -119,7 +119,32 @@ def th(s: str) -> str:
     if include_run_properties:
         props.add_row(th("Configuration"), run_spec.configuration_path)
         props.add_row(th("Type"), run_spec.configuration.type)
-    props.add_row(th("Resources"), pretty_req)
+
+    from dstack._internal.core.models.configurations import ServiceConfiguration
+
+    if (
+        include_run_properties
+        and isinstance(run_spec.configuration, ServiceConfiguration)
+        and run_spec.configuration.replica_groups
+    ):
+        groups_info = []
+        for group in run_spec.configuration.replica_groups:
+            group_parts = [f"[cyan]{group.name}[/cyan]"]
+
+            if group.replicas.min == group.replicas.max:
+                group_parts.append(f"×{group.replicas.max}")
+            else:
+                group_parts.append(f"×{group.replicas.min}..{group.replicas.max}")
+                group_parts.append("[dim](autoscalable)[/dim]")
+
+            group_parts.append(f"[dim]({group.resources.pretty_format()})[/dim]")
+
+            groups_info.append(" ".join(group_parts))
+
+        props.add_row(th("Replica groups"), "\n".join(groups_info))
+    else:
+        props.add_row(th("Resources"), pretty_req)
+
     props.add_row(th("Spot policy"), spot_policy)
     props.add_row(th("Max price"), max_price)
     if include_run_properties:
@@ -138,45 +163,130 @@ def th(s: str) -> str:
     offers.add_column("INSTANCE TYPE", style="grey58", no_wrap=True, ratio=2)
     offers.add_column("PRICE", style="grey58", ratio=1)
     offers.add_column()
+
+    # For replica groups, show offers from all job plans
+    if len(run_plan.job_plans) > 1:
+        # Multiple jobs - aggregate offers from all groups
+        all_offers = []
+        groups_with_no_offers = []
+        total_offers_count = 0
+
+        for jp in run_plan.job_plans:
+            group_name = jp.job_spec.replica_group_name or "default"
+            if jp.total_offers == 0:
+                groups_with_no_offers.append(group_name)
+            for offer in jp.offers[:max_offers] if max_offers else jp.offers:
+                all_offers.append((group_name, offer))
+            total_offers_count += jp.total_offers
+
+        # Sort by price
+        all_offers.sort(key=lambda x: x[1].price)
+        if max_offers:
+            all_offers = all_offers[:max_offers]
+
+        # Show groups with no offers FIRST
+        for group_name in groups_with_no_offers:
+            offers.add_row(
+                "",
+                f"[cyan]{group_name}[/cyan]:",
+                "[red]No matching instance offers available.[/red]\n"
+                "Possible reasons: https://dstack.ai/docs/guides/troubleshooting/#no-offers",
+                "",
+                "",
+                "",
+                style="secondary",
+            )
+
+        # Then show groups with offers
+        for i, (group_name, offer) in enumerate(all_offers, start=1):
+            r = offer.instance.resources
 
-    job_plan.offers = job_plan.offers[:max_offers] if max_offers else job_plan.offers
+            availability = ""
+            if offer.availability in {
+                InstanceAvailability.NOT_AVAILABLE,
+                InstanceAvailability.NO_QUOTA,
+                InstanceAvailability.IDLE,
+                InstanceAvailability.BUSY,
+            }:
+                availability = offer.availability.value.replace("_", " ").lower()
+            instance = offer.instance.name
+            if offer.total_blocks > 1:
+                instance += f" ({offer.blocks}/{offer.total_blocks})"
+
+            # Add group name prefix for multi-group display
+            backend_display = f"[cyan]{group_name}[/cyan]: {offer.backend.replace('remote', 'ssh')} ({offer.region})"
+
+            offers.add_row(
+                f"{i}",
+                backend_display,
+                r.pretty_format(include_spot=True),
+                instance,
+                f"${offer.price:.4f}".rstrip("0").rstrip("."),
+                availability,
+                style=None if i == 1 or not include_run_properties else "secondary",
+            )
+
+        if total_offers_count > len(all_offers):
+            offers.add_row("", "...", style="secondary")
+    else:
+        # Single job - original logic
+        job_plan.offers = job_plan.offers[:max_offers] if max_offers else job_plan.offers
 
-    for i, offer in enumerate(job_plan.offers, start=1):
-        r = offer.instance.resources
+        for i, offer in enumerate(job_plan.offers, start=1):
+            r = offer.instance.resources
 
-        availability = ""
-        if offer.availability in {
-            InstanceAvailability.NOT_AVAILABLE,
-            InstanceAvailability.NO_QUOTA,
-            InstanceAvailability.IDLE,
-            InstanceAvailability.BUSY,
-        }:
-            availability = offer.availability.value.replace("_", " ").lower()
-        instance = offer.instance.name
-        if offer.total_blocks > 1:
-            instance += f" ({offer.blocks}/{offer.total_blocks})"
-        offers.add_row(
-            f"{i}",
-            offer.backend.replace("remote", "ssh") + " (" + offer.region + ")",
-            r.pretty_format(include_spot=True),
-            instance,
-            f"${offer.price:.4f}".rstrip("0").rstrip("."),
-            availability,
-            style=None if i == 1 or not include_run_properties else "secondary",
-        )
-    if job_plan.total_offers > len(job_plan.offers):
-        offers.add_row("", "...", style="secondary")
+            availability = ""
+            if offer.availability in {
+                InstanceAvailability.NOT_AVAILABLE,
+                InstanceAvailability.NO_QUOTA,
+                InstanceAvailability.IDLE,
+                InstanceAvailability.BUSY,
+            }:
+                availability = offer.availability.value.replace("_", " ").lower()
+            instance = offer.instance.name
+            if offer.total_blocks > 1:
+                instance += f" ({offer.blocks}/{offer.total_blocks})"
+            offers.add_row(
+                f"{i}",
+                offer.backend.replace("remote", "ssh") + " (" + offer.region + ")",
+                r.pretty_format(include_spot=True),
+                instance,
+                f"${offer.price:.4f}".rstrip("0").rstrip("."),
+                availability,
+                style=None if i == 1 or not include_run_properties else "secondary",
+            )
+        if job_plan.total_offers > len(job_plan.offers):
+            offers.add_row("", "...", style="secondary")
 
     console.print(props)
     console.print()
-    if len(job_plan.offers) > 0:
+
+    # Check if we have offers to display
+    has_offers = False
+    if len(run_plan.job_plans) > 1:
+        has_offers = any(len(jp.offers) > 0 for jp in run_plan.job_plans)
+    else:
+        has_offers = len(job_plan.offers) > 0
+
+    if has_offers:
         console.print(offers)
-        if job_plan.total_offers > len(job_plan.offers):
-            console.print(
-                f"[secondary] Shown {len(job_plan.offers)} of {job_plan.total_offers} offers, "
-                f"${job_plan.max_price:3f}".rstrip("0").rstrip(".")
-                + "max[/]"
-            )
+        # Show summary for multi-job plans
+        if len(run_plan.job_plans) > 1:
+            if total_offers_count > len(all_offers):
+                max_price_overall = max((jp.max_price for jp in run_plan.job_plans if jp.max_price), default=None)
+                if max_price_overall:
+                    console.print(
+                        f"[secondary] Shown {len(all_offers)} of {total_offers_count} offers, "
+                        f"${max_price_overall:3f}".rstrip("0").rstrip(".")
+                        + " max[/]"
+                    )
+        else:
+            if job_plan.total_offers > len(job_plan.offers):
+                console.print(
+                    f"[secondary] Shown {len(job_plan.offers)} of {job_plan.total_offers} offers, "
+                    f"${job_plan.max_price:3f}".rstrip("0").rstrip(".")
+                    + " max[/]"
+                )
         console.print()
     else:
         console.print(NO_OFFERS_WARNING)
@@ -233,8 +343,14 @@ def get_runs_table(
             if verbose and latest_job_submission.inactivity_secs:
                 inactive_for = format_duration_multiunit(latest_job_submission.inactivity_secs)
                 status += f" (inactive for {inactive_for})"
+
+            job_name_parts = [f" replica={job.job_spec.replica_num}"]
+            if job.job_spec.replica_group_name:
+                job_name_parts.append(f"[cyan]group={job.job_spec.replica_group_name}[/cyan]")
+            job_name_parts.append(f"job={job.job_spec.job_num}")
+
             job_row: Dict[Union[str, int], Any] = {
-                "NAME": f" replica={job.job_spec.replica_num} job={job.job_spec.job_num}"
+                "NAME": " ".join(job_name_parts)
                 + (
                     f" deployment={latest_job_submission.deployment_num}"
                     if show_deployment_num
```
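The multi-group branch above pools offers from every job plan, tags each with its group name, and sorts by price. Stripped of the Rich-table details, the core logic is roughly the following (plain dicts stand in for the real plan and offer models):

```python
def aggregate_offers(job_plans, max_offers=None):
    """Pool offers across job plans, tag each with its group name,
    record groups with no offers, and sort the pool by ascending price."""
    all_offers, no_offer_groups = [], []
    for jp in job_plans:
        group = jp.get("replica_group_name") or "default"
        if not jp["offers"]:
            no_offer_groups.append(group)
        for offer in jp["offers"]:
            all_offers.append((group, offer))
    all_offers.sort(key=lambda pair: pair[1]["price"])
    if max_offers:
        all_offers = all_offers[:max_offers]
    return all_offers, no_offer_groups
```

Groups with zero offers are collected separately so the CLI can surface them first, before the priced rows.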

src/dstack/_internal/core/compatibility/runs.py (+3 −0)

```diff
@@ -151,6 +151,9 @@ def get_run_spec_excludes(run_spec: RunSpec) -> IncludeExcludeDictType:
         configuration_excludes["schedule"] = True
     if profile is not None and profile.schedule is None:
         profile_excludes.add("schedule")
+    # Exclude replica_groups for backward compatibility with older servers
+    if isinstance(configuration, ServiceConfiguration) and configuration.replica_groups is None:
+        configuration_excludes["replica_groups"] = True
     configuration_excludes["repos"] = True
 
     if configuration_excludes:
```
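The backward-compatibility exclusion boils down to "never emit fields that were never set." A minimal sketch of what the exclude map achieves, as a generic dict transform (not dstack's serializer):

```python
def to_wire(config: dict) -> dict:
    """Drop unset (None) fields so older servers that predate
    `replica_groups` never receive the unknown key."""
    return {k: v for k, v in config.items() if v is not None}
```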
