[pull] master from ggml-org:master by pull[bot] · Pull Request #1108 · LongLeCE/llama.cpp

pull · 2026-04-23T08:42:03Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

* shader(im2col): implement the im2col shader * shader(im2col): clean the formatting issues * shader(im2col): clean the editorconfig checker warning * fix(shader): address the workgroup issues of im2col and conv2d

* sycl : fused MoE mul_mat_vec_q for TG Create an MMVQ kernel so ggml_sycl_mul_mat_id can consolidate n_experts_used matmuls in a single kernel launch. The kernel also reads expert IDs directly, removing a per-call host sync. This is similar to the CUDA backend's ggml_cuda_mul_mat_vec_q* paths. All types supported in the current MMVQ are supported here as well: Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0 It will fall back to the existing per-expert path when src0 has been rewritten by opt_for_reorder(), and for any shape the fused path doesn't handle. test-backend-ops passes for supported type/shape combos. Benchmark: Qwen3-Next-35B-A3B Q4_K_M on Intel Arc B70 (SYCL0), baseline 707c0b7, 16k context, -fa 0. build/bin/llama-bench -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M \ -p 1024 -n 128 -d 16384 -ngl 99 -fa 0 -ub 2048 -r 2 -dev SYCL0 Before (3 runs on 707c0b7): | test | run 1 | run 2 | run 3 | | --------------- | ----------------:| ----------------:| ----------------:| | pp1024 @ d16384 | 533.26 ± 4.87 | 535.20 ± 2.78 | 524.27 ± 3.10 | | tg128 @ d16384 | 33.47 ± 0.02 | 33.31 ± 0.02 | 33.17 ± 0.05 | After (3 runs on 707c0b7 + this patch): | test | run 1 | run 2 | run 3 | | --------------- | ----------------:| ----------------:| ----------------:| | pp1024 @ d16384 | 534.06 ± 0.97 | 531.95 ± 0.02 | 520.94 ± 20.10 | | tg128 @ d16384 | 45.85 ± 0.21 | 45.95 ± 0.45 | 46.22 ± 0.12 | disclosure: Claude wrote it, but I reviewed and understand the implementation (albeit my C is a little rusty). * sycl: also support nvfp4 and mxfp4 expert types * sycl: terser comments/nested dispatch in response to review * sycl: more comment cleanup in mmvq.cpp/hpp --------- Co-authored-by: Debian <aaron@openllmi.net.bots.is>

…rt to GGUF (#22247) * Handle ModelOpt produced mixed precision model during convert to GGUF * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

… package. (#22078) * upgrade oneAPI to 2025.3.3 * update * seperate SYCL CI and support release binary package for ubuntu 24 * add dependence * remove wrong copy lines * add missed line * remove other task to test the release for SYCL * rm more for test release * fix file name * correct the error in running * support build for fp32/fp16 * rm ubuntu-24-sycl-fp16 for duplicated * refactor build setting * update guide for ubuntu 24 release package, restore the release.yml for other backend * user docker replace to install oneAPI * use download installation package to replace docker * use wget to download and install oneapi, replace the apt cmd * enable ccache for oneAPI installation * fix format error * enable cache for oneAPI installation * update guide * Update .github/workflows/release.yml Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update .github/workflows/release.yml Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update .github/workflows/build-sycl.yml Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update .github/workflows/release.yml Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Fixes #22237 — the find_library(MATH_LIBRARY m) result was being discarded and the target linked against the literal 'm' string. This prevents users from overriding the math library (e.g. for AMD AOCL) via CMake variables. Now the discovered MATH_LIBRARY is used directly.

* gitignore: add AGENTS.local Assisted-by: llama.cpp:local pi Signed-off-by: Georgi Gerganov <ggerganov@gmail.com> * gitignore: rename AGENTS.local to AGENTS.local.md Assisted-by: llama.cpp:local pi Signed-off-by: Georgi Gerganov <ggerganov@gmail.com> --------- Signed-off-by: Georgi Gerganov <ggerganov@gmail.com>

Constannnnnt and others added 7 commits April 22, 2026 20:17

ggml-webgpu: add support for im2col (#22259)

b76429a

* shader(im2col): implement the im2col shader * shader(im2col): clean the formatting issues * shader(im2col): clean the editorconfig checker warning * fix(shader): address the workgroup issues of im2col and conv2d

metal : fix event synchronization (#22260)

8635e22

pull Bot locked and limited conversation to collaborators Apr 23, 2026

pull Bot added the ⤵️ pull label Apr 23, 2026

pull Bot merged commit 8635e22 into LongLeCE:master Apr 23, 2026

github-actions Bot added documentation Improvements or additions to documentation Apple Metal python ggml SYCL devops WebGPU labels Apr 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] master from ggml-org:master#1108

[pull] master from ggml-org:master#1108
pull[bot] merged 7 commits intoLongLeCE:masterfrom
ggml-org:master

pull Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

pull Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

pull Bot commented Apr 23, 2026 •

edited

Loading