
Commit 1254c68

[Blog] How Toffee streamlines inference and cuts GPU costs with dstack
Minor edit
1 parent 46a3687


docs/blog/posts/toffee.md

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ They needed **a unified orchestration layer** that:
 
 > *Since we switched to `dstack`, we’ve cut the overhead of GPU-cloud orchestration by more than 50%. What used to take hours of custom Terraform + CLI scripting now deploys in minutes with a single declarative config — freeing us to focus on modelling, not infrastructure.*
 >
-> *— Nikita Shupeyko, AI/ML & Cloud Infrastructure Architect at Toffee*
+> *[Nikita Shupeyko](https://www.linkedin.com/in/nikita-shupeyko/), AI/ML & Cloud Infrastructure Architect at Toffee*
 
 Toffee primarily uses these `dstack` components:
 
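For readers who have not seen a `dstack` configuration, the "single declarative config" mentioned in the quote typically looks like the sketch below. The service name, image, model, and GPU size here are illustrative assumptions, not Toffee's actual setup.

```yaml
# Minimal illustrative .dstack.yml sketch -- the name, image, model, and GPU size
# are assumptions for illustration, not Toffee's real configuration.
type: service                      # long-running inference endpoint managed by dstack
name: llm-inference                # hypothetical service name
image: vllm/vllm-openai:latest     # hypothetical serving image
env:
  - MODEL=meta-llama/Llama-3.1-8B-Instruct   # hypothetical model
commands:
  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
port: 8000                         # port exposed by the service
resources:
  gpu: 24GB                        # any offer with at least 24 GB of GPU memory
```

Submitting such a file is then a single CLI call (in recent `dstack` releases, `dstack apply -f <config>.dstack.yml`), which is the workflow the quote contrasts with hand-rolled Terraform and scripts.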

@@ -70,7 +70,7 @@ Beyond orchestration, Toffee relies on `dstack`’s UI as a central observability
 
 > *Thanks to dstack’s seamless integration with GPU neoclouds like RunPod and Vast.ai, we’ve been able to shift most workloads off hyperscalers — reducing our effective GPU spend by roughly 2–3× without changing a single line of model code.*
 >
-> *— Nikita Shupeyko, Machine Learning Platform Engineer at Toffee*
+> *[Nikita Shupeyko](https://www.linkedin.com/in/nikita-shupeyko/), AI/ML & Cloud Infrastructure Architect at Toffee*
 
 Before adopting `dstack`, there were serious drawbacks:
 
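The shift away from hyperscalers described in the quote comes from `dstack`'s backend configuration: the server can be pointed at GPU marketplaces such as RunPod and Vast.ai alongside, or instead of, traditional clouds. A minimal sketch of such a server config, assuming the standard `runpod` and `vastai` backend types and placeholder credentials:

```yaml
# Illustrative ~/.dstack/server/config.yml sketch -- the project name and API keys
# are placeholders, not Toffee's real credentials.
projects:
  - name: main
    backends:
      - type: runpod               # RunPod GPU cloud backend
        creds:
          type: api_key
          api_key: <RUNPOD_API_KEY>    # placeholder
      - type: vastai               # Vast.ai marketplace backend
        creds:
          type: api_key
          api_key: <VASTAI_API_KEY>    # placeholder
```

Once backends are registered this way, runs and services defined in configs like the earlier sketch can be scheduled onto whichever registered provider has a matching GPU offer, without touching model code.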
