initial Q1_0 Metal backend #14

Closed
khosravipasha wants to merge 5 commits into master from q1-metal

khosravipasha (Collaborator) commented Apr 6, 2026

Just a draft PR to run CI; the main PR will be for llama.cpp:
ggml-org#21528

khosravipasha (Collaborator, Author)

After commit 2 (tuning the Q1_0 Metal kernels), it is much faster now.
[Screenshot 2026-04-06 at 08 26 36]

Copilot AI left a comment

Pull request overview

Adds initial Metal backend support for the Q1_0 quantization format in ggml’s Metal shaders and dispatch logic, enabling mul_mat/mat-vec pathways to run with Q1_0 weights.

Changes:

  • Implement Q1_0 dequantization helpers and dot-product logic in ggml-metal.metal.
  • Add a dedicated kernel_mul_mv_q1_0_f32 kernel and wire up mul_mv_ext + mul_mm template instantiations for Q1_0.
  • Extend Metal op/device dispatch to recognize GGML_TYPE_Q1_0 for mul_mat and pipeline selection.
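
As a rough illustration of what the Q1_0 dequantization helpers compute, here is a minimal CPU-side sketch in C++. It assumes a 128-element block with one scale and one sign bit per weight, where bit 1 maps to +d and bit 0 maps to -d. The struct `BlockQ1_0Sketch`, the constant `QK1_0_SKETCH`, the `float` scale, and the LSB-first bit order are all hypothetical choices made for this example; the actual `block_q1_0` layout in ggml may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, for illustration only: the real block_q1_0 in ggml
// may store the scale and the bits differently.
constexpr int QK1_0_SKETCH = 128;

struct BlockQ1_0Sketch {
    float   d;                      // per-block scale (assumed float here)
    uint8_t qs[QK1_0_SKETCH / 8];   // 1 bit per weight, LSB-first (assumed)
};

// Expand one block to floats: bit 1 -> +d, bit 0 -> -d.
inline void dequantize_block_q1_0_sketch(const BlockQ1_0Sketch & b, float * out) {
    assert(out != nullptr);
    for (int i = 0; i < QK1_0_SKETCH; ++i) {
        const int bit = (b.qs[i / 8] >> (i % 8)) & 1;
        out[i] = bit ? b.d : -b.d;
    }
}
```

A reference like this is mainly useful for checking a GPU kernel's output against a slow but obviously correct path.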

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| ggml/src/ggml-metal/ggml-metal.metal | Adds Q1_0 dequantization, dot helper, new mat-vec kernel, and mul_mv_ext/mul_mm instantiations for Q1_0. |
| ggml/src/ggml-metal/ggml-metal-ops.cpp | Allows Q1_0 as a supported src0 type for the small-batch mat-vec fast path in mul_mat. |
| ggml/src/ggml-metal/ggml-metal-impl.h | Introduces threadgroup tuning constants (N_R0_Q1_0, N_SG_Q1_0) for Q1_0 mat-vec. |
| ggml/src/ggml-metal/ggml-metal-device.cpp | Adds Q1_0 cases to pipeline selection for mul_mv and mul_mv_id. |
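
To make the role of the threadgroup tuning constants more concrete, the sketch below shows how two such parameters could determine the number of threadgroups for a mat-vec dispatch. It assumes `nr0` is the number of src0 rows handled per simdgroup and `nsg` is the number of simdgroups per threadgroup. The function name `mul_mv_threadgroups_sketch` is hypothetical, and the values used in the usage note are placeholders, not the values chosen for N_R0_Q1_0 or N_SG_Q1_0 in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Number of threadgroups needed to cover ne01 output rows when each simdgroup
// computes nr0 rows and each threadgroup runs nsg simdgroups (assumed meaning).
inline int64_t mul_mv_threadgroups_sketch(int64_t ne01, int nr0, int nsg) {
    assert(ne01 >= 0 && nr0 > 0 && nsg > 0);
    const int64_t rows_per_tg = static_cast<int64_t>(nr0) * nsg;
    return (ne01 + rows_per_tg - 1) / rows_per_tg;  // ceiling division
}
```

For example, with placeholder values `nr0 = 8` and `nsg = 2`, a matrix with 4096 rows would need 256 threadgroups. Larger values reduce the threadgroup count but increase per-threadgroup work, which is the trade-off such constants are tuned for.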

// function for calculate inner product between part of a q1_0 block and 16 floats (yl), sumy is SUM(yl[i])
// il indicates where the q1 quants begin (0, 16, 32, ..., 112 for 128-element block)
// Q1_0 encodes weights as {+1, -1}: dot = d * (2 * Σ(yl[i] where bit=1) - sumy)
inline float block_q_n_dot_y(device const block_q1_0 * qb_curr, float sumy, thread float * yl, int il) {

Copilot AI Apr 6, 2026

In block_q_n_dot_y for Q1_0, the yl parameter is not modified; it can be declared as thread const float * to better document intent and allow the compiler more freedom for optimization.

Suggested change
- inline float block_q_n_dot_y(device const block_q1_0 * qb_curr, float sumy, thread float * yl, int il) {
+ inline float block_q_n_dot_y(device const block_q1_0 * qb_curr, float sumy, thread const float * yl, int il) {
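
The identity in the kernel comment, dot = d * (2 * Σ(yl[i] where bit=1) - sumy), can be checked on the CPU with a small C++ sketch. It uses a read-only `yl` pointer, in the spirit of the suggested change, and compares the result against a naive version that expands every bit to +1 or -1. The struct `BlockQ1_0Sketch`, its `float` scale, and the LSB-first bit order are assumptions made for illustration; they are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

constexpr int QK1_0_SKETCH = 128;

// Hypothetical layout, for illustration only.
struct BlockQ1_0Sketch {
    float   d;                      // per-block scale (assumed float here)
    uint8_t qs[QK1_0_SKETCH / 8];   // 1 bit per weight, LSB-first (assumed)
};

// Partial dot product of 16 weights starting at element il (0, 16, ..., 112)
// with 16 activations yl; sumy is the precomputed sum of yl[0..15].
// yl is only read, so it is taken as a pointer to const.
inline float block_dot_y_sketch(const BlockQ1_0Sketch * qb, float sumy, const float * yl, int il) {
    assert(il >= 0 && il < QK1_0_SKETCH && il % 16 == 0);
    const uint8_t * qs = qb->qs + il / 8;   // 16 bits = 2 bytes
    float acc = 0.0f;                       // sum of yl[i] where bit == 1
    for (int i = 0; i < 16; ++i) {
        if ((qs[i / 8] >> (i % 8)) & 1) {
            acc += yl[i];
        }
    }
    return qb->d * (2.0f * acc - sumy);
}

// Naive reference: expand each bit to +1 or -1 and multiply.
inline float block_dot_y_naive(const BlockQ1_0Sketch * qb, const float * yl, int il) {
    float acc = 0.0f;
    for (int i = 0; i < 16; ++i) {
        const int idx = il + i;
        const int bit = (qb->qs[idx / 8] >> (idx % 8)) & 1;
        acc += (bit ? 1.0f : -1.0f) * yl[i];
    }
    return qb->d * acc;
}
```

The fast form needs only one conditional add per weight plus a single multiply at the end, because the sum over the -1 weights is recovered as sumy minus the sum over the +1 weights.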

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


khosravipasha and others added 2 commits April 6, 2026 15:10
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
khosravipasha deleted the q1-metal branch April 10, 2026 06:44