Clean up inference benchmark by lewtun · Pull Request #10 · huggingface/carbon

lewtun · 2026-05-18T13:31:02Z

Current results

Model	Backend	Batch	bp/s
Carbon-500M	vLLM	dynamic	43,129.69
GENERator 1.2B	vLLM	dynamic	21,584.02
Carbon-500M-remote	vLLM	dynamic	20,831.06
Carbon-3B	vLLM	dynamic	18,915.19
GENERator 3B	vLLM	dynamic	14,161.35
Carbon-8B-remote	vLLM	dynamic	12,319.61
Carbon-3B-remote	vLLM	dynamic	12,288.37
Carbon-8B	vLLM	dynamic	12,036.17
Evo2 1B	Vortex	16	685.76
Evo2 7B	Vortex	8	281.51
Evo2 20B	Vortex	4	178.20
Evo2 40B	Vortex	4	86.28

lewtun · 2026-05-19T07:31:48Z

Results when running inference over 128 prompts instead of 16:

Model	Status	Backend	Batch	GPUs	bp/s	Speedup vs Evo2 7B
Carbon-500M	completed	vllm	dynamic	1	116,167.88	419.00x
Carbon-500M-remote	completed	vllm	dynamic	1	105,515.62	380.58x
Carbon-3B	completed	vllm	dynamic	1	75,587.50	272.64x
GENERATOR-v2-eukaryote-3b-base	completed	vllm	dynamic	1	70,053.39	252.67x
Carbon-3B-remote	completed	vllm	dynamic	1	68,619.78	247.50x
Carbon-8B	completed	vllm	dynamic	1	57,367.15	206.92x
Carbon-8B-remote	completed	vllm	dynamic	1	43,735.18	157.75x
Evo2 1B Base	completed	evo2	16	1	700.55	2.53x
Evo2 7B	completed	evo2	8	1	277.25	1.00x
Evo2 20B	completed	evo2	4	1	176.85	0.64x
Evo2 40B	completed	evo2	4	2	87.73	0.32x
GENERATOR-v2-eukaryote-1.2b-base	skipped	vllm	dynamic	1	—	—
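The speedup column is each model's bp/s divided by the Evo2 7B bp/s. Below is a minimal Python sketch of that arithmetic using a subset of the rows above. The `speedups` helper is a hypothetical name and is not taken from the PR. Recomputing from the rounded bp/s values shown here can differ from the table in the second decimal place.

```python
# Speedup relative to a reference model, as in "Speedup vs Evo2 7B":
#   speedup = model_bp_per_s / reference_bp_per_s
# Values copied from the 128-prompt table (already rounded to 2 decimals).
throughput_bp_s = {
    "Carbon-500M": 116_167.88,
    "Carbon-3B": 75_587.50,
    "Carbon-8B": 57_367.15,
    "Evo2 1B Base": 700.55,
    "Evo2 7B": 277.25,
    "Evo2 40B": 87.73,
}


def speedups(throughput: dict[str, float], reference: str) -> dict[str, float]:
    """Return each model's throughput divided by the reference model's throughput."""
    base = throughput[reference]
    return {name: bp_s / base for name, bp_s in throughput.items()}


if __name__ == "__main__":
    ranked = sorted(speedups(throughput_bp_s, "Evo2 7B").items(), key=lambda kv: -kv[1])
    for name, s in ranked:
        print(f"{name:<14} {s:8.2f}x")
```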

lewtun · 2026-05-19T08:13:40Z

Results after tuning the Evo2 batch sizes to the maximum that fits on 1 x H100:

Model	Status	Backend	Batch	GPUs	bp/s	Speedup vs Evo2 7B
Carbon-500M	completed	vllm	dynamic	1	116,167.88	267.38x
Carbon-500M-remote	completed	vllm	dynamic	1	105,515.62	242.86x
Carbon-3B	completed	vllm	dynamic	1	75,587.50	173.98x
GENERATOR-v2-eukaryote-3b-base	completed	vllm	dynamic	1	70,053.39	161.24x
Carbon-3B-remote	completed	vllm	dynamic	1	68,619.78	157.94x
Carbon-8B	completed	vllm	dynamic	1	57,367.15	132.04x
Carbon-8B-remote	completed	vllm	dynamic	1	43,735.18	100.66x
Evo2 1B Base	completed	evo2	34	1	1,380.15	3.18x
Evo2 7B	completed	evo2	13	1	434.47	1.00x
Evo2 20B	completed	evo2	4	1	178.14	0.41x
Evo2 40B	completed	evo2	4	2	87.75	0.20x
GENERATOR-v2-eukaryote-1.2b-base	skipped	vllm	dynamic	1	—	—
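The PR does not say how the maximum batch size was found. One common approach, sketched here under the assumption that memory use grows monotonically with batch size, is to double the batch until it no longer fits and then binary-search the boundary. `max_batch_size` and the `fits` predicate are hypothetical; in a real run, `fits(b)` might attempt one generation step at batch size `b` and return `False` on `torch.cuda.OutOfMemoryError`.

```python
from typing import Callable


def max_batch_size(fits: Callable[[int], bool], upper_limit: int = 4096) -> int:
    """Largest batch size b <= upper_limit for which fits(b) is True.

    Assumes fits() is monotone: if a batch of size b fits in memory, every
    smaller batch fits too. Returns 0 if even a batch of 1 does not fit.
    """
    if not fits(1):
        return 0
    # Exponential phase: double the batch until it fails or passes the limit.
    lo, hi = 1, 2
    while hi <= upper_limit and fits(hi):
        lo, hi = hi, hi * 2
    # Invariant from here on: lo fits; hi fails or is above upper_limit.
    hi = min(hi, upper_limit + 1)
    # Binary search strictly between lo and hi.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `max_batch_size(lambda b: b <= 13)` returns 13 after a handful of probes rather than trying every size from 1 upward.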

lewtun · 2026-05-19T09:37:48Z

Throughput (bp/s) as a function of the number of prompts:

Prompts	Carbon-500M	Carbon-3B	Carbon-8B	GENERator 1.2B	GENERator 3B	Evo2 1B	Evo2 7B	Evo2 20B	Evo2 40B
1	3,416.8	1,489.3	811.7	1,712.4	1,079.5	42.3	33.3	44.0	21.8
2	5,395.2	2,916.8	1,737.5	3,075.0	2,032.2	86.5	68.8	86.0	43.0
4	11,071.5	5,561.6	3,274.8	5,825.4	3,864.2	176.8	132.9	169.4	85.1
8	21,723.7	11,309.6	6,576.0	11,383.8	7,510.8	351.6	269.0	175.9	85.7
16	43,443.4	20,109.9	12,675.7	21,399.1	14,067.0	705.3	267.1	176.4	84.9
32	74,971.4	40,027.8	22,560.9	37,086.5	25,590.9	1,409.7	360.9	177.2	87.2
64	110,803.6	65,218.3	39,248.0	64,184.6	43,046.3	1,436.9	441.9	177.9	87.9
128	161,581.9	100,126.9	46,435.8	104,055.0	68,300.9	1,435.3	432.3	180.5	85.6
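The Evo2 columns stop scaling linearly once the number of prompts exceeds the batch size, while the vLLM columns (dynamic batching) keep climbing up to 128 prompts. A simple model reproduces that shape if we assume Evo2 processes prompts in fixed-size batches, one after another, and each batch takes roughly the same time whether or not it is full. This is a back-of-the-envelope sketch, not the benchmark's code; `static_batch_throughput` is a hypothetical helper.

```python
import math


def static_batch_throughput(
    num_prompts: int, batch_size: int, bp_per_prompt: float, seconds_per_batch: float
) -> float:
    """Throughput (bp/s) when prompts run sequentially in fixed-size batches.

    Simplifying assumption: every batch takes the same wall-clock time,
    whether it is full or only partially filled.
    """
    num_batches = math.ceil(num_prompts / batch_size)
    total_bp = num_prompts * bp_per_prompt
    return total_bp / (num_batches * seconds_per_batch)


if __name__ == "__main__":
    # Unit values (1 bp per prompt, 1 s per batch) make the output the
    # throughput relative to a single prompt.
    for n in (1, 2, 4, 8, 16, 32, 64, 128):
        print(n, round(static_batch_throughput(n, 13, 1.0, 1.0), 2))
```

Assuming the sweep used the tuned Evo2 7B batch size of 13, the model predicts relative throughput of 8 at both 8 and 16 prompts, about 10.7 at 32, and 12.8 at 64 and 128. The Evo2 7B column is roughly 8.1x, 8.0x, 10.8x, 13.3x and 13.0x its 1-prompt value at those points.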


Remove deprecated stuff

43ac53b

lewtun changed the title from "Remove deprecated stuff" to "Clean up inference benchmark" on May 18, 2026

lewtun and others added 7 commits May 19, 2026 14:29

Merge branch 'main' into clean-up

3e8f16b

Merge branch 'main' into clean-up

743b84a

Support multi-GPU serving benchmark specs

9591313

Add per-spec GPU allocation so Evo2 40B can reserve two visible GPUs while one-GPU serving benchmarks continue to run on individual devices. Also label GENERator vLLM runs from the selected model name. Co-authored-by: Codex <codex@openai.com>
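A minimal sketch of per-spec GPU allocation, assuming each benchmark spec declares how many GPUs it needs and receives a disjoint set of device indices that can be exported as `CUDA_VISIBLE_DEVICES`. `BenchmarkSpec`, `num_gpus` and `allocate_gpus` are hypothetical names; the PR's actual implementation is not shown here. A real runner would also queue specs until enough GPUs free up instead of raising.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkSpec:
    # Hypothetical spec: a model name plus how many GPUs it must reserve.
    name: str
    num_gpus: int = 1


def allocate_gpus(specs: list[BenchmarkSpec], available: list[int]) -> dict[str, str]:
    """Greedily assign disjoint GPU indices to specs, in order.

    Returns a CUDA_VISIBLE_DEVICES-style string per spec, e.g. "0,1".
    Raises ValueError if a spec needs more GPUs than are still free.
    """
    free = list(available)
    assignment: dict[str, str] = {}
    for spec in specs:
        if spec.num_gpus > len(free):
            raise ValueError(
                f"{spec.name} needs {spec.num_gpus} GPUs, only {len(free)} free"
            )
        taken, free = free[: spec.num_gpus], free[spec.num_gpus :]
        assignment[spec.name] = ",".join(str(i) for i in taken)
    return assignment
```

On an 8-GPU node, allocating Evo2 40B (2 GPUs) followed by three single-GPU specs yields "0,1", "2", "3" and "4"; each string can be set as `CUDA_VISIBLE_DEVICES` in that spec's subprocess environment.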

Add bp throughput to serving summaries

ef58fb0

Record output_bp_per_second in serving benchmark summary rows using the model family's output-token-to-base-pair ratio. Co-authored-by: Codex <codex@openai.com>
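Because tokenizers map one token to different numbers of bases, tokens/s is not directly comparable across model families, whereas bases per second is. A minimal sketch of the conversion, assuming a fixed output-token-to-base-pair ratio per family; the ratio table and names below are illustrative placeholders, not the values or code used in the PR.

```python
# Illustrative bases-per-output-token ratios for two hypothetical
# tokenization schemes; the real per-family values are not shown in this PR.
BP_PER_OUTPUT_TOKEN = {
    "single-nucleotide": 1.0,  # one token per base
    "6-mer": 6.0,              # one token per six bases
}


def output_bp_per_second(output_tokens: int, elapsed_s: float, bp_per_token: float) -> float:
    """Convert generated tokens over a wall-clock interval into base pairs per second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return output_tokens * bp_per_token / elapsed_s
```

For example, 1,000 output tokens in 2 s gives 500 bp/s at 1 bp per token but 3,000 bp/s at 6 bp per token, even though the token rate is identical.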

Rename

eea32a4

Document serving inference benchmarks

75eb7f3

Add commands for dry-run, Carbon vLLM, speculative decoding, full-node comparisons, and direct Evo2 serving benchmarks. Fix the serving wrapper's Evo2 helper path so the documented wrapper can launch it. Co-authored-by: Codex <codex@openai.com>

clean up

eee0d19

lewtun wants to merge 8 commits into main from clean-up
