Commit 84ea97c
bench: prefill throughput script + 40× gap discovery
Adds scripts/test_prefill.sh and updates the throughput report with the
single biggest gap to llama.cpp: prompt prefill.
Today's quant.cpp processes the prompt one token at a time, reusing the
same single-token forward path as decode. llama.cpp instead runs prefill
as batched matrix-matrix matmuls — 30-50× faster here.
Concrete numbers (M1 Pro, 8 threads, ~450 prompt tokens):
| Model | quant.cpp (tok/s) | llama.cpp (tok/s) | Gap |
|---|---|---|---|
| Llama-3.2-1B Q8 | 10 | 359 | 35× |
| Llama-3.2-3B Q8 | 3 | 130 | 41× |
| Phi-3.5 Q4_K_M | 2 | 91 | 48× |
| Qwen3.5-4B Q4_K | 2 | 88 | 44× |
User-visible impact: a 1000-token prompt to Phi-3.5-mini takes ~10
minutes today. A batched-prefill path should make it under 15 seconds.
Marked as the next major engineering project for the engine.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 files changed (+97, −0): bench/results, scripts