Pull requests: ggml-org/llama.cpp

common : allow --offline in llama download
#25091 opened Jun 28, 2026 by angt (Member)
sycl: fix check_graph_compatibility() to allow graphs for MoE decode (CONCAT dim!=3, MUL_MAT_ID fused path)
Labels: ggml, SYCL
#25089 opened Jun 28, 2026 by Captain-Tripps
5 tasks done
jinja: add --dump-prog for debugging jinja parser
Labels: jinja parser, testing
#25086 opened Jun 27, 2026 by ngxson (Collaborator)
hexagon: flash attention rework (optimizations, accuracy improvements, etc)
Labels: ggml, Hexagon
#25085 opened Jun 27, 2026 by max-krasnyansky (Member), Draft
ui: fix stop and reasoning skip in single-model mode
Labels: server/ui
#25084 opened Jun 27, 2026 by ServeurpersoCom (Contributor)
[SYCL] Use sycl func to fix AOT issue: error: Double type is not supported on this platform.
Labels: ggml, merge ready, SYCL
#25081 opened Jun 27, 2026 by arthw (Contributor)
server-fix : accept text tool outputs in responses API
Labels: server, testing
#25073 opened Jun 27, 2026 by jesco-absolut
server: add strict prompt cache RAM limit
Labels: server
#25070 opened Jun 27, 2026 by tarruda
docs: add Apple A-chipsets model size estimation guide
Labels: documentation
#25068 opened Jun 26, 2026 by Dhruvizzle101
sycl: add Q2_K to DMMV reorder path
Labels: ggml, SYCL
#25064 opened Jun 26, 2026 by malsbat (Contributor), Draft
Improve utilization (and tg t/s) by reducing K_QUANTS_PER_ITERATION to 1 on DMMV path
Labels: ggml, SYCL
#25063 opened Jun 26, 2026 by malsbat (Contributor)
server : drop the need for vendor subprocess.h
Labels: server
#25052 opened Jun 26, 2026 by mfuntowicz (Contributor)
vulkan: add allreduce function with cross-device CPU proxy and fix Tensor Parallel crash [EXPERIMENTAL]
Labels: ggml, Vulkan
#25051 opened Jun 26, 2026 by pwilkin (Member), Draft
ggml-cpu: replace cyclic chunk distribution with atomic work-stealing
Labels: ggml
#25048 opened Jun 26, 2026 by dnislno
server-stream: follow-up on SSE Replay Buffer (#23226)
Labels: documentation, server
#25047 opened Jun 26, 2026 by ServeurpersoCom (Contributor)
[SYCL] rename the env vars from disable to enable
Labels: documentation, ggml, merge ready, SYCL
#25042 opened Jun 26, 2026 by arthw (Contributor)
[SYCL] update Q&A for saving power
Labels: documentation, merge ready, SYCL
#25041 opened Jun 26, 2026 by arthw (Contributor)
ggml : unify tie-breaking to first index across all backends
Labels: Apple Metal, CUDA, ggml, SYCL, testing, Vulkan, WebGPU
#25032 opened Jun 26, 2026 by angt (Member), Draft
ggml : fix tensor-parallel + -ncmoe crash on MoE models
Labels: ggml
#25028 opened Jun 26, 2026 by liminfei-amd (Contributor)
1 task done
SYCL: add oneMKL GEMM flash attention for XMX-accelerated prompt proc…
Labels: ggml, SYCL
#25025 opened Jun 26, 2026 by johnkarlhill