Conversation
Pull request overview
Adds initial Metal backend support for the Q1_0 quantization format in ggml’s Metal shaders and dispatch logic, enabling mul_mat/mat-vec pathways to run with Q1_0 weights.
Changes:
- Implement `Q1_0` dequantization helpers and dot-product logic in `ggml-metal.metal`.
- Add a dedicated `kernel_mul_mv_q1_0_f32` kernel and wire up `mul_mv_ext` + `mul_mm` template instantiations for `Q1_0`.
- Extend Metal op/device dispatch to recognize `GGML_TYPE_Q1_0` for `mul_mat` and pipeline selection.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `ggml/src/ggml-metal/ggml-metal.metal` | Adds Q1_0 dequantization, dot helper, new mat-vec kernel, and `mul_mv_ext`/`mul_mm` instantiations for Q1_0. |
| `ggml/src/ggml-metal/ggml-metal-ops.cpp` | Allows Q1_0 as a supported src0 type for the small-batch mat-vec fast path in `mul_mat`. |
| `ggml/src/ggml-metal/ggml-metal-impl.h` | Introduces threadgroup tuning constants (`N_R0_Q1_0`, `N_SG_Q1_0`) for Q1_0 mat-vec. |
| `ggml/src/ggml-metal/ggml-metal-device.cpp` | Adds Q1_0 cases to pipeline selection for `mul_mv` and `mul_mv_id`. |
```metal
// Calculates the inner product between part of a q1_0 block and 16 floats (yl); sumy is SUM(yl[i]).
// il indicates where the q1 quants begin (0, 16, 32, ..., 112 for a 128-element block).
// Q1_0 encodes weights as {+1, -1}: dot = d * (2 * Σ(yl[i] where bit=1) - sumy)
inline float block_q_n_dot_y(device const block_q1_0 * qb_curr, float sumy, thread float * yl, int il) {
```
In `block_q_n_dot_y` for Q1_0, the `yl` parameter is not modified; it can be declared as `thread const float *` to better document intent and give the compiler more freedom to optimize.
```diff
-inline float block_q_n_dot_y(device const block_q1_0 * qb_curr, float sumy, thread float * yl, int il) {
+inline float block_q_n_dot_y(device const block_q1_0 * qb_curr, float sumy, thread const float * yl, int il) {
```
Force-pushed from 4123e1f to 52fcb93.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Just a draft PR to run CI; the main PR will be for llama.cpp:
ggml-org#21528