Conversation
Pull request overview
Adds initial Metal backend support for the Q1_0 quantization format in ggml’s Metal shaders and dispatch logic, enabling mul_mat/mat-vec pathways to run with Q1_0 weights.
Changes:
- Implement `Q1_0` dequantization helpers and dot-product logic in `ggml-metal.metal`.
- Add a dedicated `kernel_mul_mv_q1_0_f32` kernel and wire up `mul_mv_ext` + `mul_mm` template instantiations for `Q1_0`.
- Extend Metal op/device dispatch to recognize `GGML_TYPE_Q1_0` for `mul_mat` and pipeline selection.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `ggml/src/ggml-metal/ggml-metal.metal` | Adds Q1_0 dequantization, dot helper, new mat-vec kernel, and `mul_mv_ext`/`mul_mm` instantiations for Q1_0. |
| `ggml/src/ggml-metal/ggml-metal-ops.cpp` | Allows Q1_0 as a supported src0 type for the small-batch mat-vec fast path in `mul_mat`. |
| `ggml/src/ggml-metal/ggml-metal-impl.h` | Introduces threadgroup tuning constants (`N_R0_Q1_0`, `N_SG_Q1_0`) for Q1_0 mat-vec. |
| `ggml/src/ggml-metal/ggml-metal-device.cpp` | Adds Q1_0 cases to pipeline selection for `mul_mv` and `mul_mv_id`. |
```metal
// Calculates the inner product between part of a q1_0 block and 16 floats (yl); sumy is SUM(yl[i]).
// il indicates where the q1 quants begin (0, 16, 32, ..., 112 for a 128-element block).
// Q1_0 encodes weights as {+1, -1}: dot = d * (2 * Σ(yl[i] where bit=1) - sumy)
inline float block_q_n_dot_y(device const block_q1_0 * qb_curr, float sumy, thread float * yl, int il) {
```
In `block_q_n_dot_y` for Q1_0, the `yl` parameter is not modified; it can be declared as `thread const float *` to better document intent and give the compiler more freedom to optimize.
```diff
-inline float block_q_n_dot_y(device const block_q1_0 * qb_curr, float sumy, thread float * yl, int il) {
+inline float block_q_n_dot_y(device const block_q1_0 * qb_curr, float sumy, thread const float * yl, int il) {
```
Force-pushed from 4123e1f to 52fcb93.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Just a draft PR to run CI; the main PR will be for llama.cpp:
ggml-org#21528