diff --git a/docs.json b/docs.json
index 49168dd..4a50f73 100644
--- a/docs.json
+++ b/docs.json
@@ -201,6 +201,13 @@
"examples/serving-infrastructure/sglang-router-vast"
]
},
+ {
+ "group": "Migration Guides",
+ "icon": "arrow-right-arrow-left",
+ "pages": [
+ "examples/migrations/runpod-to-vast"
+ ]
+ },
{
"group": "AI Agents",
"icon": "robot",
diff --git a/examples/migrations/runpod-to-vast.mdx b/examples/migrations/runpod-to-vast.mdx
new file mode 100644
index 0000000..58f1721
--- /dev/null
+++ b/examples/migrations/runpod-to-vast.mdx
@@ -0,0 +1,758 @@
+---
+title: "Migrate from Runpod to Vast.ai"
+slug: "migrate-from-runpod"
+createdAt: "Mon Mar 17 2026 16:00:00 GMT+0000 (Coordinated Universal Time)"
+updatedAt: "Tue Mar 18 2026 16:00:00 GMT+0000 (Coordinated Universal Time)"
+---
+
+
+
+A guide for users moving GPU workloads from Runpod to Vast.ai. Whether you use Runpod Pods for training and development or Runpod Serverless for inference, this guide maps every concept to its Vast.ai equivalent and walks you through the transition.
+
+**What you gain:** transparent marketplace pricing, a broader GPU selection with detailed per-machine specs, and the ability to shop across hosts for the best price-to-performance ratio.
+
+**What is different:** you have more control. You choose your specific host, see exactly what you are getting, and set the reliability level you need — rather than accepting an opaque allocation. This guide addresses each of these differences head-on so there are no surprises.
+
+
+**What Runpod users need to know about Vast.ai:**
+
+1. **Your existing Runpod images may work as-is** — Many Runpod-compatible Docker images run on Vast with little or no modification.
+2. **Often cheaper for the same GPU** — Marketplace competition drives prices down. You'll frequently find the same hardware at lower rates than fixed-tier providers.
+3. **You pick the individual machine, not just the GPU type** — Every offer shows reliability score, network speed, CPU, location, and other critical specs. Two A100s at the same price can be very different machines. Vast gives you the data to choose the right one.
+4. **Bandwidth is transparent** — Egress is billed per GB at a rate shown on each offer, so you pay only for what you actually use. Providers that advertise "free" bandwidth build those costs into their GPU rates; Vast's marketplace pricing makes compute and bandwidth costs separately visible.
+5. **Set your disk size right at launch** — Resizing requires recreating the container. Storage is cheap — err on the side of more space.
+
+
+## In This Guide
+
+1. [Concept Mapping](#concept-mapping) — Runpod terms → Vast equivalents
+2. [Account Setup](#account-setup)
+3. **[Migrating from Pods](#migrating-from-pods)** — instances, Docker config, storage, networking, SSH, logs, lifecycle/cost
+4. **[Migrating from Serverless](#migrating-from-serverless)** — endpoints, PyWorker
+5. [API and CLI Reference](#api-and-cli-reference) — full side-by-side table
+
+## Concept Mapping
+
+| Runpod | Vast.ai | Notes |
+|---|---|---|
+| Pod | Instance | Docker container with exclusive GPU access |
+| Serverless Endpoint -> Worker | Serverless Endpoint -> Workergroup -> Worker | Vast has managed autoscaling inference — see [Migrating from Serverless](#migrating-from-serverless) |
+| Community Cloud / Secure Cloud | Verified Machines (default) / Secure Cloud | Vast defaults to verified machines; Secure Cloud filters to datacenter-grade hosts |
+| Template | Template / Docker image | Specify a Docker image and configuration at launch |
+| Network Volume | (Local) Volume | Vast local volumes are high-performance on-machine storage; for data that moves across hosts, use object storage (S3, R2, GCS) — see [Storage](#storage) |
+| Hub | [Model Library](/documentation/serverless/getting-started-with-serverless) + [Template Library](/documentation/templates/introduction) | Vast has official templates for specific models in addition to base templates for lower-level control |
+| Pod API | Vast REST API / `vastai` CLI | Full programmatic control over instances |
+| GPU Type selector | Search filters (CUDA, VRAM, price/hr) | Vast is a marketplace — you search and filter offers |
+| On-Demand Pod | On-Demand Instance | Fixed pricing, guaranteed resources |
+| Spot Pod | Interruptible Instance | You set a max $/hr; higher-priority on-demand renters can displace you |
+| Savings Plan | Reserved Instance | Pre-pay an on-demand instance for up to 50% discount |
+| Runpod Console | [Vast Console](https://cloud.vast.ai) | Web UI for managing instances, billing, and templates |
+
+## Account Setup
+
+1. **Create an account** at [cloud.vast.ai](https://cloud.vast.ai)
+2. **Add credits** — Similar to Runpod, Vast is prepaid. Add funds via the [Billing page](https://cloud.vast.ai/billing/) before renting.
+3. **Add your SSH public key** at [cloud.vast.ai/manage-keys/](https://cloud.vast.ai/manage-keys/). If you do not have one, generate it with `ssh-keygen -t ed25519`. Keys are applied at container creation time — if you forgot, use the SSH key button on the instance card to add one without recreating.
+
+That's all you need to get started via the console. **If you plan to use the CLI or REST API**, also:
+
+4. **Generate an API key** at [API Keys](https://cloud.vast.ai/api-keys/) and authenticate:
+
+```bash Vast CLI
+pip install vastai
+vastai set api-key <API_KEY>
+```
+
+## Migrating from Pods
+
+A Runpod **Pod** is a Docker container running on a GPU-equipped machine — you pick a GPU type and a template (Docker image + config), and Runpod assigns you a machine from its managed fleet. The Vast.ai equivalent is an **Instance**: also a Docker container with exclusive GPU access, but rented from an open marketplace of independent hosts rather than a single provider.
+
+The core workflow is the same — pick a GPU, choose an image, launch — but how you find that GPU is different.
+
+### Finding and Creating Instances
+
+Runpod presents a curated list of GPUs at set prices. Vast.ai is a marketplace: hosts list their machines with specs and asking prices, and you search through available offers using filters. Two A100 80GB offers at the same price can be on very different machines — Vast surfaces reliability scores, network speeds, and location for every offer so you can pick the right one for your workload.
+
+**Always check these before renting:**
+- **Reliability score** — historical uptime percentage. Look for 0.95+ for production workloads. If reliability is critical, filter for Secure Cloud machines only.
+- **Network speed** — `inet_down` and `inet_up` in Mbps. Matters for model downloads and data transfer.
+- **Geolocation** — filter by region for latency-sensitive workloads.
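The checklist above can also be applied programmatically. A minimal sketch, assuming you have parsed `vastai search offers --raw` JSON into a list of dicts — the field names (`reliability2`, `inet_down`, `dph_total`) follow the raw offer schema, but verify them against your CLI version:

```python
import json

# Illustrative offers in the shape of `vastai search offers --raw` output;
# field names are assumptions -- check them against your CLI version.
offers = json.loads("""[
  {"id": 101, "gpu_name": "RTX 4090", "reliability2": 0.99,
   "inet_down": 850, "geolocation": "US", "dph_total": 0.42},
  {"id": 102, "gpu_name": "RTX 4090", "reliability2": 0.88,
   "inet_down": 120, "geolocation": "PL", "dph_total": 0.31}
]""")

def shortlist(offers, min_reliability=0.95, min_inet_down=500):
    """Keep offers meeting the reliability and bandwidth bar, cheapest first."""
    ok = [o for o in offers
          if o["reliability2"] >= min_reliability
          and o["inet_down"] >= min_inet_down]
    return sorted(ok, key=lambda o: o["dph_total"])

print([o["id"] for o in shortlist(offers)])  # offer 102 fails both checks
```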
+
+
+
+ 1. Go to [Search](https://cloud.vast.ai/create/)
+ 2. Use the GPU type, VRAM, reliability, and region filters to narrow results
+ 3. Review each offer's reliability score and network speed before renting
+ 4. Click **Rent** on your chosen offer and configure image, disk, and Docker options in the dialog
+
+
+
+
+ ```bash Runpod API
+ curl -X POST "https://rest.runpod.io/v1/pods" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "my-pod",
+ "imageName": "pytorch/pytorch:latest",
+ "gpuTypeIds": ["NVIDIA GeForce RTX 4090"],
+ "containerDiskInGb": 50,
+ "cloudType": "SECURE"
+ }'
+ ```
+
+ ```bash Vast CLI
+ # Step 1: Search for matching offers
+ vastai search offers 'gpu_name = RTX_4090 reliability > 0.95' \
+ --order "dph_base" --limit 5
+
+ # Step 2: Create instance from an offer ID returned by step 1
+ vastai create instance <OFFER_ID> \
+ --image pytorch/pytorch:latest \
+ --disk 50
+ ```
+
+
+
+
+ Common GPU searches:
+
+ ```bash Vast CLI
+ # RTX 4090 (24GB) — popular for inference
+ vastai search offers 'gpu_name = RTX_4090' --order "dph_base" --limit 5
+
+ # A100 80GB — training and large model inference
+ vastai search offers 'gpu_name in ["A100 SXM4","A100 PCIE"] gpu_ram >= 80' --order "dph_base" --limit 5
+
+ # H100 — highest performance
+ vastai search offers 'gpu_name in ["H100 SXM","H100 NVL"]' --order "dph_base" --limit 5
+
+ # Multi-GPU for distributed training
+ vastai search offers 'num_gpus >= 4 gpu_ram >= 80' --order "dph_base" --limit 5
+
+ # US only
+ vastai search offers 'gpu_ram >= 24 geolocation = US' --order "dph_base" --limit 10
+ ```
+
+
+
+### Docker Environment
+
+#### Images
+
+If you have a working Runpod template, you likely already have a Docker image that works on Vast. Many Runpod-compatible images run with little or no modification — just pass the image name via the `--image` flag (or the Image field in the console).
+
+To minimize cold start times:
+- Use **Vast base images** which are pre-cached on many hosts
+- Use smaller, optimized images where possible
+- For very large images, build on top of a pre-cached base
+
+#### Environment Variables
+
+On Runpod, environment variables are set in the template UI or passed as a JSON object in the API. Vast works the same way.
+
+
+
+ In the instance creation dialog or template editor, set environment variables using the GUI fields or the **Docker Options** field with Docker syntax:
+
+ ```
+ -e HF_TOKEN=hf_xxxxx -e MODEL_NAME=meta-llama/Llama-3-8B
+ ```
+
+
+
+
+ ```bash Runpod API
+ curl -X POST "https://rest.runpod.io/v1/pods" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "my-pod",
+ "imageName": "your-image:latest",
+ "gpuTypeIds": ["NVIDIA GeForce RTX 4090"],
+ "env": {
+ "HF_TOKEN": "hf_xxxxx",
+ "MODEL_NAME": "meta-llama/Llama-3-8B"
+ }
+ }'
+ ```
+
+ ```bash Vast CLI
+ vastai create instance <OFFER_ID> \
+ --image your-image:latest \
+ --env "-e HF_TOKEN=hf_xxxxx -e MODEL_NAME=meta-llama/Llama-3-8B" \
+ --disk 100
+ ```
+
+
+
+
+ The Vast CLI `--env` flag takes **raw Docker run options**, not just `KEY=VALUE` pairs. Port mappings go here too — for example: `--env "-e MY_VAR=foo -p 8080:8080"`.
+
+
+
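If you are scripting a migration, a Runpod-style `env` object can be flattened into this raw-options string with a few lines of Python (`shlex.quote` guards values containing spaces or shell metacharacters; the helper name is ours, not part of any SDK):

```python
import shlex

def to_vast_env(env, ports=()):
    """Flatten a Runpod-style env mapping (plus optional ports) into the
    raw Docker options string expected by `vastai create instance --env`."""
    parts = [f"-e {k}={shlex.quote(str(v))}" for k, v in env.items()]
    parts += [f"-p {p}:{p}" for p in ports]
    return " ".join(parts)

print(to_vast_env({"HF_TOKEN": "hf_xxxxx"}, ports=[8000]))
# -e HF_TOKEN=hf_xxxxx -p 8000:8000
```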
+
+#### Entrypoint Arguments
+
+Runpod's "Docker Command" field passes arguments to the container's ENTRYPOINT (`"dockerStartCmd"` in the API). On Vast, use `--args`.
+
+
+
+ In the template editor or instance creation dialog, enter entrypoint arguments in the **Docker Options** field.
+
+
+
+
+ ```bash Runpod API
+ curl -X POST "https://rest.runpod.io/v1/pods" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "vllm-server",
+ "imageName": "vllm/vllm-openai:latest",
+ "gpuTypeIds": ["NVIDIA GeForce RTX 4090"],
+ "containerDiskInGb": 40,
+ "ports": ["8000/http"],
+ "dockerStartCmd": ["--model", "Qwen/Qwen2.5-7B-Instruct", "--max-model-len", "4096"]
+ }'
+ ```
+
+ ```bash Vast CLI
+ vastai create instance <OFFER_ID> \
+ --image vllm/vllm-openai:latest \
+ --env "-p 8000:8000" \
+ --disk 40 \
+ --args --model Qwen/Qwen2.5-7B-Instruct --max-model-len 4096
+ ```
+
+
+
+
+
+#### Startup Scripts
+
+Vast has an on-start script that runs a shell command *after* the container starts. Runpod does not have a direct equivalent — the closest is baking the command into a custom Docker image.
+
+
+
+ In the template editor or instance creation dialog, enter your startup commands in the **On-start Script** field.
+
+
+ ```bash Vast CLI
+ vastai create instance <OFFER_ID> \
+ --image pytorch/pytorch:latest \
+ --env "-p 8000:8000" \
+ --disk 50 \
+ --onstart-cmd "cd /workspace && pip install -r requirements.txt && python server.py"
+ ```
+
+
+
+#### Converting a Runpod Template
+
+Runpod templates bundle an image, environment, ports, and a Docker command into a reusable config. Here is how each field maps to Vast:
+
+| Runpod template field | Vast console | Vast CLI flag |
+|---|---|---|
+| Container Image (`imageName`) | Image field | `--image` |
+| Container Disk (`containerDiskInGb`) | Disk field | `--disk` |
+| Exposed Ports (`"8000/http"`) | Docker Options: `-p 8000:8000` | `--env "-p 8000:8000"` |
+| Environment Variables (`env: {...}`) | Docker Options: `-e KEY=VALUE` | `--env "-e KEY=VALUE"` |
+| Docker Command (`dockerStartCmd`) | Docker Options (entrypoint args) | `--args` |
+| *(no equivalent)* | On-start Script | `--onstart-cmd` |
+
+**Full example** — a Runpod vLLM template converted to Vast:
+
+
+
+```bash Runpod (API)
+curl -X POST "https://rest.runpod.io/v1/pods" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "imageName": "vllm/vllm-openai:latest",
+ "containerDiskInGb": 40,
+ "ports": ["8000/http"],
+ "env": {"HF_TOKEN": "hf_xxxxx"},
+ "dockerStartCmd": ["--model", "meta-llama/Llama-3.1-8B-Instruct"]
+ }'
+```
+
+```bash Vast CLI
+vastai create instance <OFFER_ID> \
+ --image vllm/vllm-openai:latest \
+ --disk 40 \
+ --env "-p 8000:8000 -e HF_TOKEN=hf_xxxxx" \
+ --args --model meta-llama/Llama-3.1-8B-Instruct
+```
+
+
+
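The field mapping above can be mechanized. A sketch that turns a Runpod pod spec (the JSON you would POST to `/v1/pods`) into an argument list for `vastai create instance` — the function name is ours, and offer selection is still up to you, so the offer ID is a parameter:

```python
def runpod_to_vast_args(spec, offer_id):
    """Map a Runpod pod-creation spec onto `vastai create instance` arguments,
    following the field table above. Ports like "8000/http" become -p mappings."""
    env_opts = [f"-e {k}={v}" for k, v in spec.get("env", {}).items()]
    port_opts = [f"-p {p.split('/')[0]}:{p.split('/')[0]}"
                 for p in spec.get("ports", [])]
    args = ["vastai", "create", "instance", str(offer_id),
            "--image", spec["imageName"],
            "--disk", str(spec.get("containerDiskInGb", 50))]
    if env_opts or port_opts:
        args += ["--env", " ".join(env_opts + port_opts)]
    if spec.get("dockerStartCmd"):
        args += ["--args", *spec["dockerStartCmd"]]
    return args

spec = {
    "imageName": "vllm/vllm-openai:latest",
    "containerDiskInGb": 40,
    "ports": ["8000/http"],
    "env": {"HF_TOKEN": "hf_xxxxx"},
    "dockerStartCmd": ["--model", "meta-llama/Llama-3.1-8B-Instruct"],
}
print(" ".join(runpod_to_vast_args(spec, offer_id=1234567)))
```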
+Once you have a working configuration, save it as a reusable **Vast template** in the [console](https://cloud.vast.ai/templates/) — see the [Templates guide](/documentation/templates/introduction) for details.
+
+### Storage
+
+For model weights and datasets, the recommended approach is to pull from object storage on boot — this keeps your instance stateless and works across any host:
+
+- **Object storage** (S3, R2, GCS) — pull weights on boot with `aws s3 cp`, `rclone`, or a similar tool. The most flexible option.
+- **[Cloud Sync](/documentation/instances/storage/cloud-sync)** — Vast's built-in sync tool supports S3, Google Drive, Backblaze, and Dropbox. Access it via the console or `vastai cloud copy`. Docker instances only; with S3, use bucket-scoped IAM credentials rather than account-level credentials.
+
+
+Vast local volumes are tied to the physical machine they were created on. For data that needs to persist across instances or move to a new host, use cloud object storage (S3, R2, GCS) — a more portable and provider-agnostic approach than any single vendor's proprietary network volume.
+
+
+### Networking and Ports
+
+Both platforms provide proxy access to services. On Runpod, proxy URLs are static: `https://<pod-id>-<port>.proxy.runpod.net`. On Vast, there are two proxy mechanisms:
+
+- **HTTP/HTTPS proxy** — instances using [Vast base images](https://github.com/vast-ai/base-image/) get auto-generated Cloudflare tunnel URLs (`https://four-random-words.trycloudflare.com`) per open port via the [Instance Portal](/documentation/instances/connect/instance-portal).
+- **SSH proxy** — instances using SSH-compatible images support proxy SSH through Vast's proxy server, which works even on machines without open ports. Direct SSH (faster) is preferred when available.
+
+Instances also have direct access via a **random external port** on the host's public IP, which you discover after launch.
+
+#### Declaring Ports at Launch
+
+
+
+ In the instance creation dialog or template editor, enter port mappings in the **Docker Options** field:
+
+ ```
+ -p 8000:8000 -p 8080:8080
+ ```
+
+
+
+
+ ```bash Runpod API
+ # Ports configured in template UI or API "ports" field
+ # e.g. "ports": ["8000/http", "22/tcp"]
+ # Accessed via proxy URL: https://<pod-id>-8000.proxy.runpod.net
+ ```
+
+ ```bash Vast CLI
+ vastai create instance \
+ --image your-image:latest \
+ --env "-p 8000:8000 -p 8080:8080" \
+ --disk 50
+ ```
+
+
+
+
+
+**Limits:** Maximum 64 open ports per container. For a stable external port number, use internal ports above 70000 — these are identity-mapped (the external port matches the internal port).
+
+#### Discovering Your External Port
+
+
+
+ After the instance starts, click the **Open Ports** button on the instance card to see the external port mapping.
+
+
+
+
+ ```bash Runpod
+ # Access via proxy URL (automatic)
+ curl https://<pod-id>-8000.proxy.runpod.net/v1/models
+ ```
+
+ ```bash Vast CLI
+ vastai show instance <INSTANCE_ID> --raw | jq '.ports'
+ ```
+
+
+
+
+
+Vast also sets a `VAST_TCP_PORT_<internal-port>` environment variable inside the container for each mapped port (for example, `VAST_TCP_PORT_8000` holds the external port mapped to internal port 8000). Use these in your application code to construct external URLs.
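For example, an app listening on internal port 8000 can derive its external URL at runtime. A sketch assuming the platform injects variables in the `VAST_TCP_PORT_` pattern plus a `PUBLIC_IPADDR` variable — verify both names on your image:

```python
import os

# Simulated Vast environment for illustration; on a real instance these
# are injected by the platform (names assumed -- verify on your image).
os.environ.setdefault("PUBLIC_IPADDR", "142.214.185.187")
os.environ.setdefault("VAST_TCP_PORT_8000", "20544")

def external_url(internal_port, scheme="http"):
    """Build the externally reachable URL for a mapped internal port."""
    host = os.environ["PUBLIC_IPADDR"]
    port = os.environ[f"VAST_TCP_PORT_{internal_port}"]
    return f"{scheme}://{host}:{port}"

print(external_url(8000))  # http://142.214.185.187:20544
```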
+
+### Connecting to Your Instance
+
+#### SSH
+
+On Runpod, you SSH into a pod using the connection info from the console. On Vast, SSH uses key-only authentication (make sure you've added your public key in [Account Setup](#account-setup)).
+
+
+
+ Click the **SSH** button on the instance card to see the full connection command, then paste it into your terminal.
+
+
+
+
+ ```bash Runpod
+ # SSH connection string from Runpod console
+ ssh <pod-id>@ssh.runpod.io -i ~/.ssh/id_ed25519
+ ```
+
+ ```bash Vast CLI
+ # Get the full SSH connection command
+ vastai ssh-url <INSTANCE_ID>
+ # Example output: ssh -p 20544 root@142.214.185.187
+
+ # Connect directly
+ $(vastai ssh-url <INSTANCE_ID>)
+ ```
+
+
+
+
+
+Vast instances connect to a tmux session by default. Use `Ctrl+B C` to open a new window and `Ctrl+B N` to cycle between windows. To disable tmux, create `~/.no_auto_tmux` inside the container.
+
+**Port forwarding** works the same as any SSH connection:
+
+```bash Bash
+ssh -p <EXTERNAL_PORT> root@<INSTANCE_IP> -L 8080:localhost:8080
+```
+
+#### Jupyter / IDE Access
+
+Both platforms support JupyterLab. On Vast, Jupyter is available out of the box with two access modes:
+
+- **Proxy mode** (default): Click the Jupyter button in the [Vast console](https://cloud.vast.ai). Works immediately, no setup needed.
+- **Direct HTTPS**: Faster connection that bypasses the proxy. Requires installing the Vast TLS root certificate (`jvastai_root.cer`) from the [Vast console](https://cloud.vast.ai).
+
+Both platforms support VS Code / Cursor via Remote-SSH.
+
+#### Logs
+
+
+
+ Click the **Logs** button on the instance card to view live output.
+
+
+ ```bash Vast CLI
+ vastai logs <INSTANCE_ID>
+
+ # Tail the last 50 lines
+ vastai logs <INSTANCE_ID> --tail 50
+ ```
+
+
+ Vast logs may take a short time to become available after an instance starts. If you see "waiting on logs" messages, retry after a moment.
+
+
+
+
+### Instance Lifecycle and Cost
+
+#### Start, Stop, and Destroy
+
+
+
+ Use the buttons on the instance card: **Stop** to pause compute, **Destroy** to delete the instance and stop all charges.
+
+
+
+
+ ```bash Runpod API
+ # Stop pod
+ curl -X POST "https://rest.runpod.io/v1/pods/$POD_ID/stop" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY"
+
+ # Terminate pod
+ curl -X DELETE "https://rest.runpod.io/v1/pods/$POD_ID" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY"
+ ```
+
+ ```bash Vast CLI
+ # Stop — pauses compute, storage keeps billing
+ vastai stop instance <INSTANCE_ID>
+
+ # Destroy — stops all charges, deletes all data
+ vastai destroy instance <INSTANCE_ID>
+ ```
+
+
+
+
+
+| Action | Compute Charges | Storage Charges | Data Preserved |
+|---|---|---|---|
+| **Stop** | Stop | **Continue** | Yes |
+| **Destroy** | Stop | Stop | **No** |
+
+**If you are done with an instance, destroy it.** Stopped instances continue to cost money for storage. Only stop instances you plan to resume soon.
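The arithmetic makes the point. Rates below are illustrative, not quotes — check the offer card for the actual $/GB/month:

```python
# Illustrative rates -- real storage pricing varies per offer.
storage_gb = 100
storage_rate_per_gb_month = 0.10   # $/GB/month, hypothetical
months_forgotten = 2

# A stopped instance keeps billing for its disk the whole time.
stopped_cost = storage_gb * storage_rate_per_gb_month * months_forgotten
print(f"Stopped but forgotten: ${stopped_cost:.2f}")

# A destroyed instance costs $0 ongoing; the cost of resuming later is
# re-downloading data, which on a fast offer is minutes, not dollars.
```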
+
+#### On-Demand vs Interruptible
+
+| Type | Runpod Equivalent | Price | Interruption Risk |
+|---|---|---|---|
+| On-Demand | On-Demand Pod | Standard marketplace rate | None — guaranteed |
+| Interruptible | Spot Pod | Often significantly cheaper | Can be displaced by on-demand renters |
+
+
+
+ On the [Search](https://cloud.vast.ai/create/) page, use the **Instance Type** toggle to switch between on-demand and interruptible offers before renting.
+
+
+
+
+ ```bash Runpod API
+ # On-Demand (Secure Cloud)
+ curl -X POST "https://rest.runpod.io/v1/pods" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "my-pod",
+ "imageName": "pytorch/pytorch:latest",
+ "gpuTypeIds": ["NVIDIA GeForce RTX 4090"],
+ "cloudType": "SECURE"
+ }'
+
+ # Spot (Community Cloud)
+ curl -X POST "https://rest.runpod.io/v1/pods" \
+ -H "Authorization: Bearer $RUNPOD_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "my-pod",
+ "imageName": "pytorch/pytorch:latest",
+ "gpuTypeIds": ["NVIDIA GeForce RTX 4090"],
+ "cloudType": "COMMUNITY"
+ }'
+ ```
+
+ ```bash Vast CLI
+ # On-demand — guaranteed, higher price
+ vastai search offers 'gpu_ram >= 24' --type=on-demand --order "dph_base" --limit 10
+
+ # Interruptible — cheapest, may be interrupted
+ vastai search offers 'gpu_ram >= 24' --type=bid --order "dph_base" --limit 10
+ ```
+
+
+
+
+
+Use interruptible instances for batch inference, training with checkpointing, or any workload that can be restarted if interrupted. Save checkpoints frequently.
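A minimal checkpoint-and-resume pattern for interruptible instances — on restart, the loop picks up from the last saved step. The path and cadence are illustrative:

```python
import json
import os

CKPT = "/tmp/ckpt.json"   # on a real instance, put this on your persistent disk

def load_step():
    """Return the last checkpointed step, or 0 on a fresh start."""
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)["step"]
    return 0

def save_step(step):
    tmp = CKPT + ".tmp"                 # write-then-rename so an interruption
    with open(tmp, "w") as f:           # mid-write cannot corrupt the checkpoint
        json.dump({"step": step}, f)
    os.replace(tmp, CKPT)

start = load_step()
for step in range(start, start + 100):
    # ... one unit of work (train batch, inference chunk, etc.) ...
    if step % 25 == 0:
        save_step(step)
save_step(start + 100)
```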
+
+#### Reserved Instances
+
+If you run an on-demand instance for days or weeks, convert it to a **reserved instance** for up to 50% savings. Reserved instances lock in a discounted rate in exchange for a time commitment:
+
+
+
+ Not all machines support reserved pricing. To find eligible machines before renting, go to [Search](https://cloud.vast.ai/create/) and switch the **On-demand** filter to **Reserved**. After renting, go to the **Instances** page and click the **green discount badge** on your instance card to open the pre-payment dialog.
+
+
+ ```bash Vast CLI
+  # Pre-pay $100 toward the discounted reserved rate
+  vastai prepay instance <INSTANCE_ID> 100
+ ```
+
+
+
+
+Reserved instances cannot migrate between hosts. If the host machine goes down, your reservation is tied to that machine.
+
+
+### Next Steps
+
+
+
+ How Vast instances work — GPU access, billing, and connectivity
+
+
+ How compute, storage, and bandwidth charges work
+
+
+
+## Migrating from Serverless
+
+Runpod **Serverless** lets you deploy a handler function that scales to zero — you send a request, Runpod spins up a worker, runs your handler, and tears it down. You pay per second of compute, not for idle GPUs.
+
+Vast **Serverless** delivers autoscaling inference at marketplace rates — no usage tiers, no hidden surcharges, just per-second billing across 68+ GPU types globally. Vast handles routing, queueing, and autoscaling automatically. You can deploy using a pre-built template (vLLM, TGI, ComfyUI) or implement a custom handler with PyWorker — analogous to Runpod's handler pattern.
+
+**Pricing:** Runpod charges a premium for Serverless GPU time on top of the base instance cost. On Vast, Serverless workers run on the same marketplace instances you'd rent directly — you pay the same rate, just with autoscaling on top.
+
+### Vast Serverless Architecture
+
+The system has three layers:
+
+| Component | Purpose |
+|---|---|
+| **Endpoint** | Routes requests, manages autoscaling |
+| **Workergroup** | Defines what code runs and how workers are recruited |
+| **Worker** | Individual GPU instance running your model via PyWorker |
+
+### Deployment Options
+
+Vast Serverless supports two first-class deployment paths:
+
+**Pre-built Templates** — select a template that runs a standard model server behind Vast's managed proxy. Vast provides templates for common frameworks:
+- **vLLM** — LLM inference with OpenAI-compatible API
+- **TGI** — HuggingFace Text Generation Inference
+- **ComfyUI** — Image generation workflows
+
+**PyWorker (Custom Handlers)** — implement a handler function in Python, analogous to Runpod's handler pattern. PyWorker gives you full control over request preprocessing, model loading, and response formatting. See the [PyWorker documentation](/documentation/serverless/creating-new-pyworkers) to get started.
+
+### Creating Endpoints and Workergroups
+
+
+
+ 1. Go to [Serverless](https://cloud.vast.ai/serverless/) in the console
+ 2. Click **New Endpoint** and configure name, max workers, and scaling parameters
+ 3. Add a workergroup by selecting a pre-built template (vLLM, TGI, ComfyUI) and GPU requirements
+
+
+
+
+ ```bash Runpod
+ # Create a serverless endpoint from a template
+ runpodctl create endpoint \
+ --name "my-llm-endpoint" \
+ --templateId "runpod-vllm" \
+ --gpuIds "NVIDIA RTX A5000" \
+ --minWorkers 0 \
+ --maxWorkers 5
+ ```
+
+ ```bash Vast CLI
+ # Create an endpoint with scaling parameters
+ vastai create endpoint \
+ --endpoint_name "my-llm-endpoint" \
+ --max_workers 5 \
+ --cold_workers 1 \
+ --target_util 0.9
+
+ # Create a workergroup using a pre-built vLLM template
+ vastai create workergroup \
+ --endpoint_name "my-llm-endpoint" \
+    --template_hash <TEMPLATE_HASH> \
+ --gpu_ram 24
+ ```
+
+
+
+
+
+### Calling Your Endpoint
+
+Install the Vast SDK:
+
+```bash
+pip install vastai_sdk
+```
+
+
+
+```python Runpod
+import runpod
+import os
+
+runpod.api_key = os.environ["RUNPOD_API_KEY"]
+endpoint = runpod.Endpoint("your-endpoint-id")
+
+result = endpoint.run_sync({"input": {"prompt": "Explain quantum computing"}})
+print(result["output"])
+```
+
+```python Vast.ai
+import asyncio
+from vastai import Serverless
+
+MAX_TOKENS = 100
+
+async def main():
+ client = Serverless() # Uses VAST_API_KEY environment variable
+ endpoint = await client.get_endpoint(name="vLLM-Qwen3-8B")
+
+ payload = {
+ "model": "Qwen/Qwen3-8B",
+ "prompt": "Explain quantum computing in simple terms",
+ "max_tokens": MAX_TOKENS,
+ "temperature": 0.7
+ }
+
+ result = await endpoint.request("/v1/completions", payload, cost=MAX_TOKENS)
+ print(result["response"]["choices"][0]["text"])
+
+ await client.close()
+
+asyncio.run(main())
+```
+
+
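Serverless workers can cold-start or be interrupted mid-request, so it is worth wrapping calls in a small retry loop. The helper below is generic asyncio, not part of the Vast SDK — pass it a coroutine factory such as `lambda: endpoint.request(...)`:

```python
import asyncio

async def with_retries(make_call, attempts=3, base_delay=1.0):
    """Run make_call() up to `attempts` times with exponential backoff."""
    for i in range(attempts):
        try:
            return await make_call()
        except Exception:
            if i == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** i)

# Demo with a flaky stand-in for endpoint.request: fails twice, then succeeds.
calls = {"n": 0}
async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("cold start")
    return "ok"

print(asyncio.run(with_retries(flaky, base_delay=0.01)))  # ok
```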
+
+### Next Steps
+
+
+
+ Deploy your first serverless endpoint
+
+
+ Pay-as-you-go billing, cold workers, and endpoint suspension
+
+
+
+## API and CLI Reference
+
+
+For most use cases, the `vastai` CLI or the [Python SDK](https://docs.vast.ai/sdk) is simpler than
+calling the REST API directly. The table below maps operations to their API endpoints for
+reference — see the [API Reference](/api-reference) for full documentation.
+
+
+| Task | Runpod API | Vast CLI | Vast API |
+|---|---|---|---|
+| Authenticate | `Authorization: Bearer <api-key>` | `vastai set api-key <api-key>` | `Authorization: Bearer <api-key>` |
+| Search GPUs | `GET /v1/pods/gpu-types` | `vastai search offers '<query>'` | `POST /api/v0/bundles/` |
+| Create instance | `POST /v1/pods` | `vastai create instance <offer-id> --image <image>` | `PUT /api/v0/asks/<offer-id>/` |
+| List instances | `GET /v1/pods` | `vastai show instances` | `GET /api/v0/instances/` |
+| Show instance | `GET /v1/pods/<pod-id>` | `vastai show instance <instance-id>` | `GET /api/v0/instances/<instance-id>/` |
+| Start instance | `POST /v1/pods/<pod-id>/start` | `vastai start instance <instance-id>` | `PUT /api/v0/instances/<instance-id>/` `{"state":"running"}` |
+| Stop instance | `POST /v1/pods/<pod-id>/stop` | `vastai stop instance <instance-id>` | `PUT /api/v0/instances/<instance-id>/` `{"state":"stopped"}` |
+| Destroy instance | `DELETE /v1/pods/<pod-id>` | `vastai destroy instance <instance-id>` | `DELETE /api/v0/instances/<instance-id>/` |
+| View logs | *(console only)* | `vastai logs <instance-id>` | `PUT /api/v0/instances/request_logs/<instance-id>/` (returns S3 URL) |
+| SSH connection | `ssh <pod-id>@ssh.runpod.io` | `vastai ssh-url <instance-id>` | — |
+| Create endpoint | — | `vastai create endpoint --endpoint_name "x"` | `POST /api/v0/endptjobs/` |
+| Create workergroup | — | `vastai create workergroup --endpoint_name "x"` | `POST /api/v0/workergroups/` |
+| Reserve instance | — | `vastai prepay instance <instance-id> <amount>` | — |
+
+For programmatic usage, the Vast CLI supports `--raw` for JSON output that can be parsed in scripts.
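For example, piping `--raw` output through `jq` — the JSON below is an illustrative stand-in for what `vastai show instances --raw` returns, and field names such as `actual_status` may differ by CLI version:

```shell
# Stand-in for live output of: vastai show instances --raw
# (field names are illustrative -- check your CLI version's actual schema)
raw='[{"id": 123, "gpu_name": "RTX 4090", "actual_status": "running"},
      {"id": 456, "gpu_name": "A100 SXM4", "actual_status": "exited"}]'

# Print the IDs of running instances only
echo "$raw" | jq -r '.[] | select(.actual_status == "running") | .id'
```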
+
+## Next Steps
+
+- [CLI Reference](/cli/commands) — full command reference for the `vastai` CLI
+- [API Reference](/api-reference) — REST API documentation
+- [Community Discord](https://discord.gg/vast) — get help from the Vast.ai community
+- [24/7 Technical Support](https://cloud.vast.ai) — live chat support available around the clock from the Vast console