Two-track guide (Pods and Serverless) with three-tab code examples showing RunPod API, Vast CLI, and Vast REST API side-by-side. Covers instance creation, Docker config, networking, storage, logs, lifecycle management, PyWorker migration, and a full API/CLI reference table.
The serverless migration section previously led with a PyWorker vs RunPod handler code comparison, making it look like writing custom PyWorker code was required. Now it leads with pre-built templates (vLLM, TGI, ComfyUI), adds a "Calling Your Endpoint" section with SDK client code from the official quickstart, and links to PyWorker docs for advanced users only.
```bash Vast API
# Create an endpoint
curl -X POST "https://console.vast.ai/api/v0/endptjobs/" \
  -H "Authorization: Bearer $VAST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint_name": "my-llm-endpoint",
    "max_workers": 5,
    "cold_workers": 1,
    "target_util": 0.9
  }'

# Create a workergroup
curl -X POST "https://console.vast.ai/api/v0/workergroups/" \
  -H "Authorization: Bearer $VAST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint_name": "my-llm-endpoint",
    "template_hash": "<TEMPLATE_HASH>",
    "gpu_ram": 24
  }'
```
</CodeGroup>
> **Review comment:** We should not be including the API in guides.
## Migrating from Serverless
RunPod **Serverless** lets you deploy a handler function that scales to zero: you send a request, RunPod spins up a worker, runs your handler, and tears it down. You pay per second of compute, not for idle GPUs.

> **Review comment:** Why are we advertising why RunPod is better?
Vast **Serverless** serves the same purpose (autoscaling inference without managing instances), but the architecture is different. Instead of wrapping a handler function, you pick a pre-built template (vLLM, TGI, ComfyUI) and Vast runs it behind a managed proxy that handles routing, queueing, and autoscaling. For most migrations, no custom code is needed.

> **Review comment:** The serverless templates are just that: templates. By no means is the Vast Serverless platform fundamentally built around them; they mostly serve as examples. We expect people to implement their own templates and API wrappers and to configure their own endpoints. Yes, you can use the templates out of the box, but that is more akin to using RunPod's pre-built templates. You are absolutely implementing handlers for Vast Serverless, just in a different way than on RunPod.
- Replace cloud.vast.ai/search/ → cloud.vast.ai/create/ (3 occurrences)
- Replace "25–50% discount/savings" → "up to 50%" to match official docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
> @wbrennan899 @LucasArmandVast I will get more eyeballs on this
1. **Your existing Runpod images will likely work as-is.** Most Runpod-compatible Docker images run on Vast with minimal or no modification.
2. **Often cheaper for the same GPU.** Marketplace competition drives prices down. You'll frequently find the same hardware at lower rates than fixed-tier providers.
3. **You pick the individual machine, not just the GPU type.** Every offer shows reliability score, network speed, CPU, location, and other critical specs. Two A100s at the same price can be very different machines. Vast gives you the data to choose the right one.
4. **Bandwidth is metered.** Runpod includes free bandwidth; on Vast, egress is charged per GB at a rate shown on each offer (typically much lower than AWS).
5. **Set your disk size right at launch.** Resizing requires recreating the container. Storage is cheap, so err on the side of more space.
> **Review comment:** Vast charges for both inbound and outbound data transfers. Users will often pull large models, so we should be very transparent here.
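The reviewer's point about metered transfer in both directions is easy to make concrete. A back-of-envelope sketch; the per-GB rate below is a hypothetical placeholder, since the real rate is shown on each offer:

```python
# Rough transfer-cost estimate for a migration. The rate is a placeholder;
# per-GB pricing varies by offer, so check the rate on each Vast offer card.
def transfer_cost(gb_in: float, gb_out: float, rate_per_gb: float) -> float:
    """Vast meters both inbound (e.g. pulling model weights) and outbound data."""
    return (gb_in + gb_out) * rate_per_gb

# Example: pull a 140 GB model once, serve 20 GB of responses,
# at a hypothetical $0.02/GB.
cost = transfer_cost(140, 20, 0.02)
print(f"${cost:.2f}")  # -> $3.20
```

The inbound term matters most at launch: a single large model pull can dominate the first day's transfer bill.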
| Pod | Instance | Docker container with exclusive GPU access |
| Serverless Endpoint -> Worker | Serverless Endpoint -> Workergroup -> Worker | Vast has managed autoscaling inference; see [Migrating from Serverless](#migrating-from-serverless) |
| Template | Template / Docker image | Specify a Docker image and configuration at launch |
| Network Volume | (Local) Volume | Vast volumes are currently local to one machine, not network-portable; see [Storage](#storage) |
> **Review comment:** Our volumes are more closely related to RunPod's volume disk. The benefit is attachment to any one of many GPUs in a single node.
| Hub | [Model Library](/documentation/serverless/getting-started-with-serverless) + [Template Library](/documentation/templates/introduction) | Vast has official templates for specific models in addition to base templates for lower-level control |
> **Review comment:** "Official templates for many popular inference engines and applications, along with specific model configs through the Model Library," or similar?
If you have a working Runpod template, you likely already have a Docker image that works on Vast. Most Runpod-compatible images run as-is: just specify the image in the `--image` flag.

To minimize cold start times:

- Use **Vast base images**, which are pre-cached on many hosts
> **Review comment:** Needs further explanation and a link to the base-image GitHub/Docker Hub repos; the docs there should help users.
Both platforms provide proxy access to services. On Runpod, proxy URLs are static: `https://<POD_ID>-<PORT>.proxy.runpod.net`. On Vast, there are two proxy mechanisms:

- **HTTP/HTTPS proxy**: instances using [Vast base images](https://github.com/vast-ai/base-image/) get auto-generated Cloudflare tunnel URLs (`https://four-random-words.trycloudflare.com`) per open port via the [Instance Portal](/documentation/instances/connect/instance-portal).
> **Review comment:** Cloudflare tunnels are best-effort and may not always be available. Users can configure their instance to use the built-in Jupyter cert to ensure TLS.
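The difference between the two URL schemes discussed above is that the Runpod one can be derived client-side, while Vast tunnel hostnames are randomly generated and must be read from the Instance Portal. A tiny illustrative helper; the pod ID is a made-up placeholder:

```python
# Builds a Runpod-style static proxy URL from its two inputs.
# Vast tunnel URLs have no equivalent constructor: the
# "four-random-words" hostname is assigned server-side per port.
def runpod_proxy_url(pod_id: str, port: int) -> str:
    return f"https://{pod_id}-{port}.proxy.runpod.net"

print(runpod_proxy_url("abc123xyz", 8888))
# -> https://abc123xyz-8888.proxy.runpod.net
```

Migration scripts that template Runpod URLs this way need to switch to reading the tunnel URL from the instance's metadata or portal instead.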
Runpod **Serverless** lets you deploy a handler function that scales to zero: you send a request, Runpod spins up a worker, runs your handler, and tears it down. You pay per second of compute, not for idle GPUs.

Vast **Serverless** delivers autoscaling inference at marketplace rates: no usage tiers, no hidden surcharges, just per-second billing across 68+ GPU types globally. Rather than wrapping a handler function, you select a pre-built template (vLLM, TGI, ComfyUI) and Vast handles routing, queueing, and autoscaling automatically.

**Pricing:** Runpod charges a premium for Serverless GPU time on top of the base instance cost. On Vast, Serverless workers run on the same marketplace instances you'd rent directly: you pay the same rate, just with autoscaling on top.
> **Review comment:** Just want to point out that having "—" everywhere makes it very obvious that this text was LLM-generated. Even if it wasn't, people will assume it was. That's not necessarily a problem, but it feels a bit unprofessional to me. Just my opinion, though.
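The pricing contrast in the hunk above comes down to whether a premium multiplier is applied to GPU time. A back-of-envelope sketch; the hourly rate and premium are hypothetical placeholders, not real prices from either platform:

```python
# Compares per-second GPU billing with and without a serverless premium.
# All numbers are illustrative placeholders.
def job_cost(seconds: float, hourly_rate: float, premium: float = 0.0) -> float:
    """Cost of `seconds` of GPU time at `hourly_rate` $/hr, with an
    optional multiplicative serverless premium (e.g. 0.30 = +30%)."""
    return seconds / 3600 * hourly_rate * (1 + premium)

base = job_cost(900, 1.20)             # 15 min at a $1.20/hr marketplace rate
marked_up = job_cost(900, 1.20, 0.30)  # same GPU time with a 30% premium
print(f"{base:.3f} vs {marked_up:.3f}")  # -> 0.300 vs 0.390
```

At scale the multiplier is the whole story: the same burst of inference traffic costs `1 + premium` times as much wherever a serverless surcharge applies.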
## Summary

Adds a comprehensive migration guide for users moving GPU workloads from RunPod to Vast.ai. The guide covers the full surface area: account setup, instance management (Pods → Instances), serverless migration, networking, and API/CLI reference.

## What's included

- `docs.json` under Examples → Migrations

## Revisions (post-review)

- Removed `Vast API` curl blocks from tutorial sections; users are nudged toward the `vastai` CLI and Python SDK instead.
- CodeGroups with only one remaining block were unwrapped.
- A note at the API reference table directs API users to the reference docs.