
Commit 1254c68

[Blog] How Toffee streamlines inference and cuts GPU costs with dstack
Minor edit
1 parent 46a3687


docs/blog/posts/toffee.md

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ They needed **a unified orchestration layer** that:
 
 > *Since we switched to `dstack`, we’ve cut the overhead of GPU-cloud orchestration by more than 50%. What used to take hours of custom Terraform + CLI scripting now deploys in minutes with a single declarative config — freeing us to focus on modelling, not infrastructure.*
 >
-> *— Nikita Shupeyko, AI/ML & Cloud Infrastructure Architect at Toffee*
+> *[Nikita Shupeyko](https://www.linkedin.com/in/nikita-shupeyko/), AI/ML & Cloud Infrastructure Architect at Toffee*
 
 Toffee primarily uses these `dstack` components:
 
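For readers who have not seen a `dstack` configuration, the "single declarative config" mentioned in the quote typically looks like the sketch below. The service name, image, model, and GPU size here are illustrative assumptions, not Toffee's actual setup.

```yaml
# Minimal illustrative .dstack.yml sketch -- the name, image, model, and GPU size
# are assumptions for illustration, not Toffee's real configuration.
type: service                      # long-running inference endpoint managed by dstack
name: llm-inference                # hypothetical service name
image: vllm/vllm-openai:latest     # hypothetical serving image
env:
  - MODEL=meta-llama/Llama-3.1-8B-Instruct   # hypothetical model
commands:
  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
port: 8000                         # port exposed by the service
resources:
  gpu: 24GB                        # any offer with at least 24 GB of GPU memory
```

Submitting such a file is then a single CLI call (in recent `dstack` releases, `dstack apply -f <config>.dstack.yml`), which is the workflow the quote contrasts with hand-rolled Terraform and scripts.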

@@ -70,7 +70,7 @@ Beyond orchestration, Toffee relies on `dstack`’s UI as a central observability
 
 > *Thanks to dstack’s seamless integration with GPU neoclouds like RunPod and Vast.ai, we’ve been able to shift most workloads off hyperscalers — reducing our effective GPU spend by roughly 2–3× without changing a single line of model code.*
 >
-> *— Nikita Shupeyko, Machine Learning Platform Engineer at Toffee*
+> *[Nikita Shupeyko](https://www.linkedin.com/in/nikita-shupeyko/), AI/ML & Cloud Infrastructure Architect at Toffee*
 
 Before adopting `dstack`, there were serious drawbacks:
 
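The shift away from hyperscalers described in the quote comes from `dstack`'s backend configuration: the server can be pointed at GPU marketplaces such as RunPod and Vast.ai alongside, or instead of, traditional clouds. A minimal sketch of such a server config, assuming the standard `runpod` and `vastai` backend types and placeholder credentials:

```yaml
# Illustrative ~/.dstack/server/config.yml sketch -- the project name and API keys
# are placeholders, not Toffee's real credentials.
projects:
  - name: main
    backends:
      - type: runpod               # RunPod GPU cloud backend
        creds:
          type: api_key
          api_key: <RUNPOD_API_KEY>    # placeholder
      - type: vastai               # Vast.ai marketplace backend
        creds:
          type: api_key
          api_key: <VASTAI_API_KEY>    # placeholder
```

Once backends are registered this way, runs and services defined in configs like the earlier sketch can be scheduled onto whichever registered provider has a matching GPU offer, without touching model code.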
