
speed-bench: add Mac Studio M1 Ultra 64GB streaming numbers #350

Open
ozgursoy wants to merge 1 commit into antirez:streaming from ozgursoy:m1-ultra-64gb-streaming-bench


ozgursoy commented Jun 7, 2026

Real-hardware 64GB streaming datapoint. The "Flash on 64GB MacBooks" section
currently documents only a 128GB machine with 64GB locked away; this is an
actual 64GB box.

Machine: Mac Studio M1 Ultra, 64GB, macOS 26.5
Model: Flash Q2 (DeepSeek-V4-Flash-IQ2XXS-w2Q2K...imatrix, ~81GB)

Command:
    ./ds4-bench -m ds4flash.gguf \
      --ssd-streaming --ssd-streaming-cache-experts 32GB \
      --ctx-start 2048 --ctx-max 32768 --step-incr 2048 --gen-tokens 128 \
      --csv speed-bench/m1_ultra_64gb_stream.csv
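
For readers checking how many rows to expect in the CSV, here is a minimal sketch of the context sweep those flags describe. It assumes the sweep is a plain inclusive arithmetic progression from `--ctx-start` to `--ctx-max` in steps of `--step-incr`; that is an inference from the flag names and the "2K-32K" range reported below, not something confirmed from the ds4-bench source. `sweep_points` is a hypothetical helper, not part of the tool.

```python
def sweep_points(ctx_start: int, ctx_max: int, step_incr: int) -> list[int]:
    """Context sizes from ctx_start to ctx_max (inclusive) in steps of step_incr.

    Assumption: ds4-bench sweeps linearly and includes both endpoints.
    """
    return list(range(ctx_start, ctx_max + 1, step_incr))


# Values taken from the command above.
points = sweep_points(2048, 32768, 2048)
print(len(points), points[0], points[-1])  # prints: 16 2048 32768
```

Under that assumption the run produces 16 measurement points, one every 2K tokens.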

Results (2K-32K ctx): prefill ~108-118 t/s, generation ~5 t/s, both roughly
flat as context grows. Decode is SSD-bound: it does not scale with the Ultra
GPU, so generation stays close to what smaller 64GB Apple Silicon machines reach.

Bonus: simulated 32GB (--ssd-streaming --simulate-used-memory 32GB). With only
~32GB available, the expert cache can't stay resident for the 81GB model, so
decode collapses to ~0.17 t/s (prefill ~60 t/s), which is not practically
usable. I can add a CSV for it if useful.
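
To put the two decode rates in wall-clock terms, here is a back-of-the-envelope sketch using only the approximate figures quoted above (~5 t/s on the real 64GB box, ~0.17 t/s in the simulated 32GB run) and the 128 generated tokens from the command. The helper name `seconds_for` is hypothetical, and the inputs are the rounded numbers from this description, not values read from the CSV.

```python
GEN_TOKENS = 128  # --gen-tokens from the benchmark command


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds needed to produce `tokens` at a steady rate."""
    return tokens / tokens_per_second


# Approximate decode rates quoted in this description.
decode_64gb = seconds_for(GEN_TOKENS, 5.0)    # real 64GB machine
decode_32gb = seconds_for(GEN_TOKENS, 0.17)   # simulated 32GB run
slowdown = decode_32gb / decode_64gb

print(f"64GB: {decode_64gb:.1f} s for {GEN_TOKENS} tokens")
print(f"32GB (simulated): {decode_32gb:.0f} s for {GEN_TOKENS} tokens")
print(f"slowdown: ~{slowdown:.0f}x")
```

With these rounded inputs, a 128-token reply takes roughly 26 seconds on the 64GB machine and about 12.5 minutes in the simulated 32GB case, a slowdown of roughly 29x.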
