docs/docs/concepts/services.md
3 additions & 12 deletions
@@ -233,16 +233,6 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
 ??? info "Disaggregated serving"
 Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
 
-### Model
-
-If the service is running a chat model with an OpenAI-compatible interface (i.e., `/v1/chat/completions`),
-set the [`model`](../reference/dstack.yml/service.md#model) property to make the model accessible via `dstack`'s
-global OpenAI-compatible endpoint, and also accessible via `dstack`'s UI.
-
-When `model` is set, `dstack` automatically configures [`probes`](#probes) to verify model health.
-To customize or disable this, set `probes` explicitly.
-
-
 ### Authorization
 
 By default, the service enables authorization, meaning the service endpoint requires a `dstack` user token.
@@ -341,8 +331,6 @@ Probes are executed for each service replica while the replica is `running`. A p
 ??? info "Model"
 If you set the [`model`](#model) property but don't explicitly configure `probes`,
 `dstack` automatically configures a default probe that tests the model using the `/v1/chat/completions` API.
-This default probe sends a minimal chat completion request to verify the model is responding correctly.
-
 To disable probes entirely when `model` is set, explicitly set `probes` to an empty list.
 
 See the [reference](../reference/dstack.yml/service.md#probes) for more probe configuration options.
@@ -442,6 +430,9 @@ Limits apply to the whole service (all replicas) and per client (by IP). Clients
 If the service runs a model with an OpenAI-compatible interface, you can set the [`model`](#model) property to make the model accessible through `dstack`'s chat UI on the `Models` page.
 In this case, `dstack` will use the service's `/v1/chat/completions` endpoint.
 
+When `model` is set, `dstack` automatically configures [`probes`](#probes) to verify model health.
+To customize or disable this, set `probes` explicitly.
+
 ### Resources
 
 If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a
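The interaction between `model` and `probes` that this change documents could look roughly like the service configuration below. This is a minimal sketch and not part of the diff: the service name, the `vllm serve` command, the model name, the port, and the GPU size are illustrative assumptions. Only the `model` property and the empty `probes` list come from the documentation text itself.

```yaml
type: service
# Hypothetical service name, for illustration only
name: chat-model

# Assumed serving setup; any server exposing /v1/chat/completions would do
python: 3.12
commands:
  - uv pip install vllm
  - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct
port: 8000

# Makes the model available in dstack's chat UI; per the docs text,
# setting this also configures a default probe
model: meta-llama/Meta-Llama-3.1-8B-Instruct

# An explicit empty list disables probes entirely, as the "Model" note describes
probes: []

resources:
  # Assumed memory size, matching the `24GB` example in the Resources section
  gpu: 24GB
```

Leaving out the `probes` key keeps the default probe that `dstack` configures when `model` is set. Listing probes explicitly replaces that default.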