MLX LLM Benchmarks for Agentic Coding on Apple Silicon
Which local LLM runs best for coding on your Mac? Speed and quality benchmarks for MLX models, tested on real Apple hardware.
MLX Metal | int4 quantization | April 2026
Speed: 1024 prompt tokens, 100 generated tokens
Quality: 81 problems across coding, reasoning, tool calling, math, writing (3 runs each, majority vote)
API baseline: Claude Opus 4.6 scores 86.7% on the same quality benchmark (via Anthropic API, not local)
Best Models by Hardware
| Hardware | Best Quality | Best Balance | Best Speed |
|---|---|---|---|
| M4 Pro 64GB | Gemma 4 31B-it (13 tok/s, 72.2%) | LFM2-24B-A2B (117 tok/s, 67.3%) | Gemma 4 E2B-it (121 tok/s, 65.3%) |
| M5 Max 128GB | Qwen3-Coder-30B-A3B (129 tok/s, 75.5%) | Qwen3-Coder-30B-A3B (129 tok/s, 75.5%) | Gemma 4 E2B-it (205 tok/s, 67.8%) |
M4 Pro 64GB
| Model | RAM | Quality | Gen tok/s |
|---|---|---|---|
| Gemma 4 31B-it (31B dense) | 18.9 GiB | 72.2% | 13 |
| Qwen 3.5-27B (27B dense) | 18.8 GiB | 68.6% | 12 |
| LFM2-24B-A2B (2B MoE) | 14.2 GiB | 67.3% | 117 |
| Qwen 3.5-35B-A3B (3B MoE) | 21.9 GiB | 66.9% | 26 |
| Qwen3-Coder-30B-A3B (3B MoE) | 17.8 GiB | 65.7% | 80 |
| Gemma 4 E2B-it (2.3B dense) | 3.5 GiB | 65.3% | 121 |
| Gemma 4 26B-A4B-it (3.8B MoE) | 15.3 GiB | 64.1% | 65 |
| Qwen 3.5-9B (9B dense) | 7.3 GiB | 61.6% | 39 |
| Gemma 4 E4B-it (4.5B dense) | 5.0 GiB | 50.6% | 69 |
| Qwen 3.5-4B (4B dense) | 4.9 GiB | 50.6% | 67 |
| Qwen 2.5-Coder-3B (3B dense) | 2.6 GiB | 46.9% | 117 |
| Gemma 3-4B-it QAT (4B dense) | 3.5 GiB | 45.7% | 88 |
| GLM-4.7-Flash (3B MoE) | 17.6 GiB | 45.3% | 62 |
| Gemma 3-4B-it (4B dense) | 3.5 GiB | 44.5% | 88 |
| Nemotron-Nano-9B-v2 (9B dense) | 8.1 GiB | 42.9% | 47 |
| Qwen 3.5-2B (2B dense) | 3.3 GiB | 41.2% | 142 |
| Qwen 2.5-3B-it (3B dense) | 2.6 GiB | 37.6% | 111 |
| Qwen 3.5-0.8B (0.8B dense) | 2.5 GiB | 28.6% | 275 |
| DeepSeek-R1-0528-Qwen3-8B (8B dense) | 5.4 GiB | 17.1% | 51 |
Speed-only models (no quality scores yet)
| Model | RAM | Gen tok/s | Prefill tok/s |
|---|---|---|---|
| Qwen 2.5-Coder-0.5B (0.5B dense) | 1.1 GiB | 420 | 7054 |
| Qwen 2.5-0.5B-it (0.5B dense) | 1.1 GiB | 380 | 7182 |
| Qwen 3-0.6B-it (0.6B dense) | 1.3 GiB | 334 | 5329 |
| Gemma 3-1B-it (1B dense) | 1.6 GiB | 252 | 3679 |
| Gemma 3-1B-it QAT (1B dense) | 1.6 GiB | 250 | 3774 |
| Nemotron-3-Nano-4B (4B dense) | 4.5 GiB | 102 | 833 |
| DeepSeek-R1-Distill-7B (7B dense) | 5.1 GiB | 56 | 531 |
| Qwen 3-8B-it (8B dense) | 5.4 GiB | 51 | 452 |
| Gemma 3-12B-it QAT (12B dense) | 8.2 GiB | 32 | 280 |
| Qwen 3-14B-it (14B dense) | 9.1 GiB | 29 | 242 |
| Qwen 3.5-27B Opus Distilled (27B dense) | 16.9 GiB | 16 | 128 |
int8 speed results
| Model | RAM | Gen tok/s | Prefill tok/s |
|---|---|---|---|
| Qwen 2.5-Coder-0.5B (0.5B dense) | 1.3 GiB | 272 | 7032 |
| Qwen 2.5-0.5B-it (0.5B dense) | 1.3 GiB | 272 | 6790 |
| Qwen 3-0.6B-it (0.6B dense) | 1.6 GiB | 238 | 5183 |
| Qwen 3.5-0.8B (0.8B dense) | 2.9 GiB | 195 | 3494 |
| Gemma 3-1B-it QAT (1B dense) | 2.2 GiB | 171 | 3722 |
| Gemma 3-1B-it (1B dense) | 2.2 GiB | 169 | 3560 |
| Qwen 3.5-2B (2B dense) | 4.0 GiB | 93 | 1668 |
| Gemma 4 E2B-it (2.3B dense) | 5.8 GiB | 78 | 4265 |
| LFM2-24B-A2B (2B MoE) | 25.9 GiB | 75 | 1186 |
| Qwen 2.5-Coder-3B (3B dense) | 4.0 GiB | 72 | 1093 |
| Qwen 2.5-3B-it (3B dense) | 4.0 GiB | 68 | 1135 |
| Nemotron-3-Nano-4B (4B dense) | 6.5 GiB | 58 | 824 |
| Qwen3-Coder-30B-A3B (3B MoE) | 33.1 GiB | 54 | 865 |
| Gemma 3-4B-it QAT (4B dense) | 5.6 GiB | 52 | 911 |
| Gemma 3-4B-it (4B dense) | 5.6 GiB | 52 | 887 |
| Gemma 4 26B-A4B-it (3.8B MoE) | 27.7 GiB | 45 | 785 |
| Qwen 3.5-4B (4B dense) | 6.8 GiB | 43 | 684 |
| GLM-4.7-Flash (3B MoE) | 32.5 GiB | 43 | 674 |
| Gemma 4 E4B-it (4.5B dense) | 8.7 GiB | 42 | 1229 |
| Qwen 3-8B-it (8B dense) | 9.5 GiB | 30 | 454 |
| DeepSeek-R1-0528-Qwen3-8B (8B dense) | 9.5 GiB | 29 | 450 |
| Qwen 3.5-9B (9B dense) | 11.7 GiB | 24 | 383 |
| Qwen 3.5-35B-A3B (3B MoE) | 39.2 GiB | 22 | 716 |
| Gemma 3-12B-it QAT (12B dense) | 14.6 GiB | 18 | 280 |
| Qwen 3-14B-it (14B dense) | 16.4 GiB | 16 | 237 |
| Qwen 3.5-27B (27B dense) | 32.0 GiB | 7 | 108 |
| Gemma 4 31B-it (31B dense) | 34.1 GiB | 7 | 99 |
M5 Max 128GB
| Model | RAM | Quality | Gen tok/s |
|---|---|---|---|
| Qwen3-Coder-30B-A3B (3B MoE) | 17.8 GiB | 75.5% | 129 |
| Gemma 4 31B-it (31B dense) | 18.9 GiB | 72.7% | 17 |
| Qwen 3.5-27B (27B dense) | 18.8 GiB | 71.0% | 25 |
| LFM2-24B-A2B (2B MoE) | 14.2 GiB | 70.6% | 180 |
| Gemma 4 E2B-it (2.3B dense) | 3.5 GiB | 67.8% | 205 |
| Qwen 3.5-35B-A3B (3B MoE) | 21.9 GiB | 67.8% | 44 |
| Gemma 4 26B-A4B-it (3.8B MoE) | 15.3 GiB | 65.3% | 110 |
| Qwen 3.5-9B (9B dense) | 7.3 GiB | 64.5% | 79 |
| Qwen 3.5-4B (4B dense) | 4.9 GiB | 50.6% | 131 |
| Gemma 4 E4B-it (4.5B dense) | 5.0 GiB | 50.6% | 130 |
| Qwen 2.5-Coder-3B (3B dense) | 2.6 GiB | 48.2% | 226 |
| GLM-4.7-Flash (3B MoE) | 17.6 GiB | 48.2% | 96 |
| Gemma 3-4B-it (4B dense) | 3.5 GiB | 46.9% | 171 |
| Nemotron-Nano-9B-v2 (9B dense) | 8.1 GiB | 46.5% | 67 |
| Gemma 3-4B-it QAT (4B dense) | 3.5 GiB | 45.3% | 172 |
| Qwen 2.5-3B-it (3B dense) | 2.6 GiB | 38.4% | 209 |
| Qwen 3.5-2B (2B dense) | 3.3 GiB | 37.6% | 235 |
| Qwen 3.5-0.8B (0.8B dense) | 2.5 GiB | 24.9% | 409 |
| DeepSeek-R1-0528-Qwen3-8B (8B dense) | 5.4 GiB | 16.3% | 104 |
Speed-only models (no quality scores yet)
| Model | RAM | Gen tok/s | Prefill tok/s |
|---|---|---|---|
| Qwen 2.5-Coder-0.5B (0.5B dense) | 1.1 GiB | 611 | 20459 |
| Qwen 2.5-0.5B-it (0.5B dense) | 1.1 GiB | 542 | 19425 |
| Qwen 3-0.6B-it (0.6B dense) | 1.3 GiB | 527 | 17023 |
| Gemma 3-1B-it QAT (1B dense) | 1.6 GiB | 373 | 14613 |
| Gemma 3-1B-it (1B dense) | 1.6 GiB | 371 | 15039 |
| DeepSeek-R1-Distill-7B (7B dense) | 5.1 GiB | 114 | 3116 |
| Qwen 3-8B-it (8B dense) | 5.4 GiB | 103 | 2890 |
| Qwen 3-14B-it (14B dense) | 9.1 GiB | 60 | 1398 |
| Gemma 3-12B-it QAT (12B dense) | 8.2 GiB | 48 | 1275 |
| Qwen 3.5-27B Opus Distilled (27B dense) | 16.9 GiB | 28 | 436 |
int8 speed results
| Model | RAM | Gen tok/s | Prefill tok/s |
|---|---|---|---|
| Qwen 3-0.6B-it (0.6B dense) | 1.6 GiB | 422 | 16697 |
| Qwen 2.5-Coder-0.5B (0.5B dense) | 1.3 GiB | 416 | 20203 |
| Qwen 2.5-0.5B-it (0.5B dense) | 1.3 GiB | 411 | 18370 |
| Qwen 3.5-0.8B (0.8B dense) | 2.9 GiB | 291 | 9616 |
| Gemma 3-1B-it (1B dense) | 2.2 GiB | 277 | 14773 |
| Gemma 3-1B-it QAT (1B dense) | 2.2 GiB | 276 | 14428 |
| Qwen 3.5-2B (2B dense) | 4.0 GiB | 151 | 5177 |
| Gemma 4 E2B-it (2.3B dense) | 5.8 GiB | 147 | 14893 |
| Qwen 2.5-Coder-3B (3B dense) | 4.0 GiB | 144 | 6504 |
| Qwen 2.5-3B-it (3B dense) | 4.0 GiB | 134 | 6342 |
| LFM2-24B-A2B (2B MoE) | 25.9 GiB | 132 | 5297 |
| Gemma 3-4B-it QAT (4B dense) | 5.6 GiB | 104 | 5157 |
| Gemma 3-4B-it (4B dense) | 5.6 GiB | 104 | 5087 |
| Qwen3-Coder-30B-A3B (3B MoE) | 33.1 GiB | 96 | 3353 |
| Gemma 4 26B-A4B-it (3.8B MoE) | 27.7 GiB | 85 | 3208 |
| Qwen 3.5-4B (4B dense) | 6.8 GiB | 85 | 2584 |
| Gemma 4 E4B-it (4.5B dense) | 8.7 GiB | 85 | 6374 |
| GLM-4.7-Flash (3B MoE) | 32.5 GiB | 72 | 2565 |
| Qwen 3-8B-it (8B dense) | 9.5 GiB | 63 | 2818 |
| DeepSeek-R1-0528-Qwen3-8B (8B dense) | 9.5 GiB | 62 | 2719 |
| Qwen 3.5-9B (9B dense) | 11.7 GiB | 49 | 1511 |
| Qwen 3.5-35B-A3B (3B MoE) | 39.2 GiB | 45 | 2151 |
| Qwen 3-14B-it (14B dense) | 16.4 GiB | 34 | 1426 |
| Gemma 3-12B-it QAT (12B dense) | 14.6 GiB | 32 | 1060 |
| Qwen 3.5-27B Opus Distilled (27B dense) | 30.3 GiB | 17 | 434 |
| Qwen 3.5-27B (27B dense) | 32.0 GiB | 15 | 480 |
| Gemma 4 31B-it (31B dense) | 34.1 GiB | 9 | 544 |
M4 Pro 24GB (legacy)
| Model | RAM | Gen tok/s | Prefill tok/s |
|---|---|---|---|
| Qwen 2.5-0.5B-it (0.5B dense) | 1.4 GiB | 376 | 5530 |
| Qwen 2.5-Coder-0.5B (0.5B dense) | 1.4 GiB | 371 | 5464 |
| Qwen 3-0.6B-it (0.6B dense) | 1.6 GiB | 328 | 4277 |
| Gemma 3-1B-it QAT (1B dense) | 2.0 GiB | 222 | 2858 |
| Gemma 3-1B-it (1B dense) | 2.0 GiB | 222 | 2866 |
| Qwen 2.5-Coder-3B (3B dense) | 2.9 GiB | 118 | 1050 |
| Qwen 2.5-3B-it (3B dense) | 2.9 GiB | 112 | 1051 |
| Gemma 3-4B-it QAT (4B dense) | 3.9 GiB | 84 | 756 |
| Gemma 3-4B-it (4B dense) | 3.9 GiB | 81 | 739 |
| DeepSeek-R1-Distill-7B (7B dense) | 5.3 GiB | 57 | 476 |
| Qwen 3-8B-it (8B dense) | 5.6 GiB | 52 | 414 |
| DeepSeek-R1-0528-Qwen3-8B (8B dense) | 5.6 GiB | 52 | 416 |
| Gemma 3-12B-it QAT (12B dense) | 8.6 GiB | 31 | 256 |
| Qwen 3-14B-it (14B dense) | 9.3 GiB | 29 | 224 |
int8 speed results
| Model | RAM | Gen tok/s | Prefill tok/s |
|---|---|---|---|
| Qwen 2.5-0.5B-it (0.5B dense) | 1.6 GiB | 273 | 5745 |
| Qwen 2.5-Coder-0.5B (0.5B dense) | 1.6 GiB | 271 | 5717 |
| Qwen 3-0.6B-it (0.6B dense) | 1.8 GiB | 239 | 4323 |
| Gemma 3-1B-it QAT (1B dense) | 2.7 GiB | 156 | 2861 |
| Gemma 3-1B-it (1B dense) | 2.7 GiB | 155 | 2850 |
| Qwen 2.5-Coder-3B (3B dense) | 4.2 GiB | 72 | 1052 |
| Qwen 2.5-3B-it (3B dense) | 4.2 GiB | 67 | 1046 |
| Gemma 3-4B-it QAT (4B dense) | 6.1 GiB | 51 | 808 |
| Gemma 3-4B-it (4B dense) | 6.1 GiB | 50 | 744 |
| Qwen 3-8B-it (8B dense) | 9.7 GiB | 30 | 413 |
| DeepSeek-R1-0528-Qwen3-8B (8B dense) | 9.7 GiB | 30 | 413 |
| Gemma 3-12B-it QAT (12B dense) | 15.0 GiB | 18 | 254 |
Reading the Table
Models are sorted by Quality (best first), a weighted score across the 81 benchmark problems. Harder problems count more: Easy (1x), Hard (2x), Expert (3x), Tool Calling (3x). Each problem runs 3 times and the majority result counts. Models without quality scores are listed separately in the "Speed-only" section under each hardware heading.
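The benchmark's scoring code is not shown here, so the following is only a minimal sketch of how such a weighted score could be computed. It assumes each run is a plain pass/fail, that "majority vote" means at least 2 of 3 runs pass, and that the score is the percentage of weighted points earned. The category keys and function names are hypothetical.

```python
# Weights as stated in the text; the dictionary keys are illustrative labels.
WEIGHTS = {"easy": 1, "hard": 2, "expert": 3, "tool_calling": 3}


def majority_pass(runs):
    """Return True if more than half of the runs passed."""
    return sum(runs) * 2 > len(runs)


def weighted_quality(results):
    """Score a list of (category, [run1, run2, run3]) pairs as a percentage.

    Assumption: a problem earns its full weight when the majority of its
    runs pass, and nothing otherwise.
    """
    earned = 0
    possible = 0
    for category, runs in results:
        weight = WEIGHTS[category]
        possible += weight
        if majority_pass(runs):
            earned += weight
    return 100.0 * earned / possible if possible else 0.0


# Four made-up problems, one per category.
example = [
    ("easy", [True, True, False]),           # passes on majority
    ("hard", [False, False, True]),          # fails on majority
    ("expert", [True, True, True]),          # passes
    ("tool_calling", [True, False, False]),  # fails
]
print(f"{weighted_quality(example):.1f}%")
```

In this made-up example the model earns 4 of 9 weighted points, so a single Expert or Tool Calling failure costs three times as much as an Easy one.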
RAM is the int4 memory footprint (the int8 tables list the int8 footprint instead). Check this first to see what fits on your hardware.
Gen tok/s is generation speed at int4 with a 1024-token prompt. As a rule of thumb: 100+ is great for autocomplete, 50+ is comfortable for agentic coding, 30+ is usable for interactive chat, and below 15 feels slow.
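As an illustration of how the RAM and Gen tok/s columns can be read together, here is a small sketch that filters a few rows from the M4 Pro 64GB int4 table by a memory budget and labels each with the rule-of-thumb tiers above. The function names, the `budget_gib` parameter, and the "borderline" label for 15 to 29 tok/s (a range the rule of thumb does not name) are assumptions for the example.

```python
def speed_tier(gen_tok_s):
    """Map generation speed to the rule-of-thumb labels from the text."""
    if gen_tok_s >= 100:
        return "great for autocomplete"
    if gen_tok_s >= 50:
        return "comfortable for agentic coding"
    if gen_tok_s >= 30:
        return "usable for interactive chat"
    if gen_tok_s < 15:
        return "feels slow"
    # The text gives no label for 15-29 tok/s; this one is an assumption.
    return "borderline"


def shortlist(models, budget_gib):
    """Keep models whose footprint fits the budget, fastest first."""
    fitting = [m for m in models if m[1] <= budget_gib]
    fitting.sort(key=lambda m: m[2], reverse=True)
    return [(name, speed_tier(tok_s)) for name, _ram, tok_s in fitting]


# (model, RAM in GiB, Gen tok/s), copied from the M4 Pro 64GB int4 table.
M4_PRO_64GB_INT4 = [
    ("Gemma 4 31B-it", 18.9, 13),
    ("LFM2-24B-A2B", 14.2, 117),
    ("Gemma 4 E2B-it", 3.5, 121),
    ("Qwen 3.5-9B", 7.3, 39),
]

# budget_gib is whatever memory you are willing to give the model.
for name, tier in shortlist(M4_PRO_64GB_INT4, budget_gib=16.0):
    print(f"{name}: {tier}")
```

How much memory to leave for the OS, the editor, and the KV cache depends on the workload, so pick the budget conservatively rather than using the machine's full RAM.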