LLM Inference Bake-Off

LLM serving with vLLM and SGLang on Modal.

Overview

Run four experiments, each measuring a different feature:

1. Baseline comparison
2. Multi-token prediction
3. Prefix caching
4. Concurrent requests

Experiments 1-3 each send a single request and stream the output to a text UI built on FastAPI.

For more detail, see the blog post on Tensorlabbet.

Setup

# Install dependencies
uv sync

# Authenticate with Modal (this requires an account)
uv run modal setup

Deployment

Experiments 1, 2 and 3

uv run modal serve src/llminferencebakeoff/serve.py

This single command deploys everything:

HuggingFace Transformers backend (naive baseline, no continuous batching)
SGLang backend
vLLM backend with PagedAttention
3-way comparison UI

Open the URL to watch a real-time performance comparison across all three backends.

Modify config.py to run the different experiments.

First request behavior:

GPU containers spin up on-demand when you click "Generate Comparison"
HuggingFace Transformers: ~35-40s (model loading only)
SGLang/vLLM: ~60-90s (model loading + CUDA graph compilation)
Subsequent requests (within 5 minutes): ~4-5s for all backends
Containers automatically scale down after 5 minutes of inactivity
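The cold and warm timings above can be checked by timing a streamed response yourself. The following is a minimal sketch, not code from this repository: `fake_token_stream` is a hypothetical stand-in for a backend's token stream, and `time_stream` records time to first token and total time for any iterable of tokens.

```python
import time
from typing import Iterable, Iterator


def fake_token_stream(
    tokens: list[str], first_delay: float, per_token_delay: float
) -> Iterator[str]:
    """Hypothetical stand-in for a streaming backend: slow first token, then steady output."""
    for i, token in enumerate(tokens):
        time.sleep(first_delay if i == 0 else per_token_delay)
        yield token


def time_stream(stream: Iterable[str]) -> dict:
    """Consume a token stream and record time to first token and total time."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    total = time.perf_counter() - start
    ttft = (first_token_at - start) if first_token_at is not None else None
    return {
        "tokens": count,
        "ttft_s": ttft,
        "total_s": total,
        "tokens_per_s": count / total if total > 0 else 0.0,
    }


if __name__ == "__main__":
    # A long first delay mimics a cold start; later tokens arrive quickly.
    print(time_stream(fake_token_stream(["Hello", ",", " world"], 0.05, 0.01)))
```

To time a real backend, replace the fake generator with an iterator over the streamed chunks of the HTTP response.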

Cost warning: min_containers=1 keeps all three GPU containers running continuously. Stop the app with Ctrl+C, or run modal app stop LlmInferenceBakeOff, when it is not in use.
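To get a feel for what always-on containers cost, the arithmetic is containers times hours times hourly rate. The sketch below assumes one GPU per container. The rate is a placeholder, not Modal's actual pricing, so substitute the current price for your GPU type.

```python
def idle_cost_usd(num_containers: int, hours: float, usd_per_gpu_hour: float) -> float:
    """Cost of keeping `num_containers` single-GPU containers warm for `hours`."""
    if num_containers < 0 or hours < 0 or usd_per_gpu_hour < 0:
        raise ValueError("inputs must be non-negative")
    return num_containers * hours * usd_per_gpu_hour


if __name__ == "__main__":
    # Placeholder rate for illustration only; NOT a real Modal price.
    PLACEHOLDER_USD_PER_GPU_HOUR = 1.0
    cost = idle_cost_usd(
        num_containers=3, hours=24, usd_per_gpu_hour=PLACEHOLDER_USD_PER_GPU_HOUR
    )
    print(f"Three warm containers for one day: ${cost:.2f}")
```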

Experiment 4

uv run modal run src/llminferencebakeoff/benchmark_concurrent.py

Note that the HuggingFace Transformers backend is currently commented out to save costs.
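The idea behind a concurrent-request benchmark can be sketched with asyncio. This is an illustration only, not the contents of benchmark_concurrent.py: `fake_request` is a hypothetical stand-in that sleeps instead of calling a backend, and `run_concurrent` fires all requests at once and reports wall time, mean latency, and request throughput.

```python
import asyncio
import time


async def fake_request(latency_s: float) -> float:
    """Hypothetical stand-in for one inference request; returns its observed latency."""
    start = time.perf_counter()
    await asyncio.sleep(latency_s)
    return time.perf_counter() - start


async def run_concurrent(num_requests: int, latency_s: float) -> dict:
    """Fire `num_requests` requests at once and summarize wall time and latency."""
    start = time.perf_counter()
    latencies = await asyncio.gather(
        *(fake_request(latency_s) for _ in range(num_requests))
    )
    wall = time.perf_counter() - start
    return {
        "requests": num_requests,
        "wall_s": wall,
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else 0.0,
        "requests_per_s": num_requests / wall if wall > 0 else 0.0,
    }


if __name__ == "__main__":
    # Sweep the concurrency level to see how throughput scales.
    for n in (1, 4, 16):
        print(asyncio.run(run_concurrent(n, latency_s=0.05)))
```

With a real backend, per-request latency usually rises with concurrency, so both mean latency and requests per second are worth reporting at each level.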
