Clean up inference benchmark #10

Open
lewtun wants to merge 8 commits into main from clean-up

Member

lewtun commented May 18, 2026

Current results

| Model | Backend | Batch | bp/s |
| --- | --- | --- | ---: |
| Carbon-500M | vLLM | dynamic | 43,129.69 |
| GENERator 1.2B | vLLM | dynamic | 21,584.02 |
| Carbon-500M-remote | vLLM | dynamic | 20,831.06 |
| Carbon-3B | vLLM | dynamic | 18,915.19 |
| GENERator 3B | vLLM | dynamic | 14,161.35 |
| Carbon-8B-remote | vLLM | dynamic | 12,319.61 |
| Carbon-3B-remote | vLLM | dynamic | 12,288.37 |
| Carbon-8B | vLLM | dynamic | 12,036.17 |
| Evo2 1B | Vortex | 16 | 685.76 |
| Evo2 7B | Vortex | 8 | 281.51 |
| Evo2 20B | Vortex | 4 | 178.20 |
| Evo2 40B | Vortex | 4 | 86.28 |

lewtun changed the title from "Remove deprecatd stuf" to "Clean up inference benchmark" on May 18, 2026
Member Author

lewtun commented May 19, 2026

Results if we run inference over 128 prompts instead of 16

| Model | Status | Backend | Batch | GPUs | bp/s | Speedup vs Evo2 7B |
| --- | --- | --- | --- | ---: | ---: | ---: |
| Carbon-500M | completed | vllm | dynamic | 1 | 116,167.88 | 419.00x |
| Carbon-500M-remote | completed | vllm | dynamic | 1 | 105,515.62 | 380.58x |
| Carbon-3B | completed | vllm | dynamic | 1 | 75,587.50 | 272.64x |
| GENERATOR-v2-eukaryote-3b-base | completed | vllm | dynamic | 1 | 70,053.39 | 252.67x |
| Carbon-3B-remote | completed | vllm | dynamic | 1 | 68,619.78 | 247.50x |
| Carbon-8B | completed | vllm | dynamic | 1 | 57,367.15 | 206.92x |
| Carbon-8B-remote | completed | vllm | dynamic | 1 | 43,735.18 | 157.75x |
| Evo2 1B Base | completed | evo2 | 16 | 1 | 700.55 | 2.53x |
| Evo2 7B | completed | evo2 | 8 | 1 | 277.25 | 1.00x |
| Evo2 20B | completed | evo2 | 4 | 1 | 176.85 | 0.64x |
| Evo2 40B | completed | evo2 | 4 | 2 | 87.73 | 0.32x |
| GENERATOR-v2-eukaryote-1.2b-base | skipped | vllm | dynamic | 1 | | |

Member Author

lewtun commented May 19, 2026

Results after tuning the Evo2 batch size to the maximum that fits on 1 x H100

| Model | Status | Backend | Batch | GPUs | bp/s | Speedup vs Evo2 7B |
| --- | --- | --- | --- | ---: | ---: | ---: |
| Carbon-500M | completed | vllm | dynamic | 1 | 116,167.88 | 267.38x |
| Carbon-500M-remote | completed | vllm | dynamic | 1 | 105,515.62 | 242.86x |
| Carbon-3B | completed | vllm | dynamic | 1 | 75,587.50 | 173.98x |
| GENERATOR-v2-eukaryote-3b-base | completed | vllm | dynamic | 1 | 70,053.39 | 161.24x |
| Carbon-3B-remote | completed | vllm | dynamic | 1 | 68,619.78 | 157.94x |
| Carbon-8B | completed | vllm | dynamic | 1 | 57,367.15 | 132.04x |
| Carbon-8B-remote | completed | vllm | dynamic | 1 | 43,735.18 | 100.66x |
| Evo2 1B Base | completed | evo2 | 34 | 1 | 1,380.15 | 3.18x |
| Evo2 7B | completed | evo2 | 13 | 1 | 434.47 | 1.00x |
| Evo2 20B | completed | evo2 | 4 | 1 | 178.14 | 0.41x |
| Evo2 40B | completed | evo2 | 4 | 2 | 87.75 | 0.20x |
| GENERATOR-v2-eukaryote-1.2b-base | skipped | vllm | dynamic | 1 | | |
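
For anyone comparing this table with the previous comment: the Carbon bp/s numbers are unchanged, and the speedup column drops only because the Evo2 7B baseline got faster with the larger batch. A minimal sketch of that arithmetic, assuming the speedup column is plain bp/s divided by the Evo2 7B bp/s (the `speedup` helper is hypothetical, not taken from the benchmark code):

```python
def speedup(bp_per_s: float, baseline_bp_per_s: float) -> float:
    """Throughput relative to a baseline (assumed definition of the speedup column)."""
    return bp_per_s / baseline_bp_per_s


# bp/s values copied from the two result tables in this thread.
carbon_500m = 116_167.88
evo2_7b_batch_8 = 277.25    # Evo2 7B before batch-size tuning
evo2_7b_batch_13 = 434.47   # Evo2 7B after batch-size tuning

print(f"{speedup(carbon_500m, evo2_7b_batch_8):.2f}x")   # 419.00x
print(f"{speedup(carbon_500m, evo2_7b_batch_13):.2f}x")  # 267.38x
```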

Member Author

lewtun commented May 19, 2026

Throughput (bp/s) as a function of the number of prompts

| Prompts | Carbon-500M | Carbon-3B | Carbon-8B | Gen 1.2B | Gen 3B | Evo2 1B | Evo2 7B | Evo2 20B | Evo2 40B |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 3,416.8 | 1,489.3 | 811.7 | 1,712.4 | 1,079.5 | 42.3 | 33.3 | 44.0 | 21.8 |
| 2 | 5,395.2 | 2,916.8 | 1,737.5 | 3,075.0 | 2,032.2 | 86.5 | 68.8 | 86.0 | 43.0 |
| 4 | 11,071.5 | 5,561.6 | 3,274.8 | 5,825.4 | 3,864.2 | 176.8 | 132.9 | 169.4 | 85.1 |
| 8 | 21,723.7 | 11,309.6 | 6,576.0 | 11,383.8 | 7,510.8 | 351.6 | 269.0 | 175.9 | 85.7 |
| 16 | 43,443.4 | 20,109.9 | 12,675.7 | 21,399.1 | 14,067.0 | 705.3 | 267.1 | 176.4 | 84.9 |
| 32 | 74,971.4 | 40,027.8 | 22,560.9 | 37,086.5 | 25,590.9 | 1,409.7 | 360.9 | 177.2 | 87.2 |
| 64 | 110,803.6 | 65,218.3 | 39,248.0 | 64,184.6 | 43,046.3 | 1,436.9 | 441.9 | 177.9 | 87.9 |
| 128 | 161,581.9 | 100,126.9 | 46,435.8 | 104,055.0 | 68,300.9 | 1,435.3 | 432.3 | 180.5 | 85.6 |
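
One way to read the table above is to look for the prompt count after which doubling the number of prompts stops helping. A rough sketch, assuming a 10% gain per doubling as the cutoff (the threshold and the `saturation_point` helper are illustrative choices, not part of the benchmark):

```python
PROMPTS = [1, 2, 4, 8, 16, 32, 64, 128]

# Columns copied from the table above (bp/s).
CARBON_500M = [3416.8, 5395.2, 11071.5, 21723.7, 43443.4, 74971.4, 110803.6, 161581.9]
EVO2_20B = [44.0, 86.0, 169.4, 175.9, 176.4, 177.2, 177.9, 180.5]


def saturation_point(prompts, bp_per_s, min_gain=1.10):
    """Return the first prompt count where doubling yields less than min_gain, else None.

    min_gain=1.10 (a 10% improvement per doubling) is an assumed cutoff.
    """
    for i in range(len(prompts) - 1):
        if bp_per_s[i + 1] / bp_per_s[i] < min_gain:
            return prompts[i]
    return None


print(saturation_point(PROMPTS, EVO2_20B))     # 4
print(saturation_point(PROMPTS, CARBON_500M))  # None
```

With this cutoff, Evo2 20B plateaus from 4 prompts onward, while Carbon-500M is still scaling at 128 prompts.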

lewtun and others added 7 commits May 19, 2026 14:29
Add per-spec GPU allocation so Evo2 40B can reserve two visible GPUs while one-GPU serving benchmarks continue to run on individual devices. Also label GENERator vLLM runs from the selected model name.

Co-authored-by: Codex <codex@openai.com>
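
For reviewers, the per-spec GPU allocation described in this commit can be pictured roughly as below. This is only a sketch of the idea under stated assumptions: `BenchmarkSpec`, `allocate_gpus`, and `cuda_env` are hypothetical names, and the actual implementation in the PR may differ. The only real interface used is the `CUDA_VISIBLE_DEVICES` environment variable.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkSpec:
    """Hypothetical stand-in for a benchmark spec; only the fields needed here."""
    name: str
    num_gpus: int = 1


def allocate_gpus(specs, visible_gpus):
    """Greedily hand each spec the next `num_gpus` free device IDs."""
    free = list(visible_gpus)
    allocation = {}
    for spec in specs:
        if spec.num_gpus > len(free):
            raise RuntimeError(f"not enough free GPUs for {spec.name}")
        allocation[spec.name] = free[: spec.num_gpus]
        free = free[spec.num_gpus :]
    return allocation


def cuda_env(devices):
    """Environment fragment restricting a subprocess to the given devices."""
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(d) for d in devices)}


specs = [BenchmarkSpec("Evo2 40B", num_gpus=2), BenchmarkSpec("Carbon-500M")]
allocation = allocate_gpus(specs, visible_gpus=[0, 1, 2, 3])
print(allocation)                        # {'Evo2 40B': [0, 1], 'Carbon-500M': [2]}
print(cuda_env(allocation["Evo2 40B"]))  # {'CUDA_VISIBLE_DEVICES': '0,1'}
```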
Record output_bp_per_second in serving benchmark summary rows using the model family's output-token-to-base-pair ratio.

Co-authored-by: Codex <codex@openai.com>
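
As a sketch of the conversion this commit describes: output tokens per second are scaled by a per-family base-pairs-per-token ratio. The family keys and ratio values below are illustrative assumptions, not the values used in the PR, and `output_bp_per_second` here is a standalone hypothetical helper rather than the actual summary-row code.

```python
# Assumed, illustrative ratios: base pairs emitted per output token, keyed by tokenizer style.
BP_PER_OUTPUT_TOKEN = {
    "single_nucleotide": 1.0,  # one base pair per token
    "six_mer": 6.0,            # six base pairs per token
}


def output_bp_per_second(output_tokens_per_second: float, family: str) -> float:
    """Convert token throughput to base-pair throughput for a model family."""
    return output_tokens_per_second * BP_PER_OUTPUT_TOKEN[family]


print(output_bp_per_second(100.0, "single_nucleotide"))  # 100.0
print(output_bp_per_second(100.0, "six_mer"))            # 600.0
```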
Add commands for dry-run, Carbon vLLM, speculative decoding, full-node comparisons, and direct Evo2 serving benchmarks. Fix the serving wrapper's Evo2 helper path so the documented wrapper can launch it.

Co-authored-by: Codex <codex@openai.com>