Pull requests: ggml-org/llama.cpp
server : auto-insert media marker in embedding / multimodal prompts
server
#25093
opened Jun 28, 2026 by
TheOneWhoWill
sycl: fix check_graph_compatibility() to allow graphs for MoE decode (CONCAT dim!=3, MUL_MAT_ID fused path)
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#25089
opened Jun 28, 2026 by
Captain-Tripps
5 tasks done
jinja: add --dump-prog for debugging
jinja parser
Issues related to the jinja parser
testing
Everything test related
#25086
opened Jun 27, 2026 by
ngxson
Collaborator
hexagon: flash attention rework (optimizations, accuracy improvements, etc)
ggml
changes relating to the ggml tensor library for machine learning
Hexagon
#25085
opened Jun 27, 2026 by
max-krasnyansky
Member
•
Draft
ui: fix stop and reasoning skip in single-model mode
server/ui
#25084
opened Jun 27, 2026 by
ServeurpersoCom
Contributor
[SYCL] Use sycl func to fix AOT issue: error: Double type is not supported on this platform.
ggml
changes relating to the ggml tensor library for machine learning
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#25081
opened Jun 27, 2026 by
arthw
Contributor
server : fix 501 on multimodal models blocking text-only slot save/restore (#21133)
server
#25076
opened Jun 27, 2026 by
CHIPMUNK-T0T
1 task done
server-fix : accept text tool outputs in responses API
server
testing
Everything test related
#25073
opened Jun 27, 2026 by
jesco-absolut
docs: add Apple A-chipsets model size estimation guide
documentation
Improvements or additions to documentation
#25068
opened Jun 26, 2026 by
Dhruvizzle101
Improve utilization (and tg t/s) by reducing K_QUANTS_PER_ITERATION to 1 on DMMV path
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#25063
opened Jun 26, 2026 by
malsbat
Contributor
fix(context): draft model fit vs load inconsistency
server
#25056
opened Jun 26, 2026 by
wadealexc
server : drop the need for vendor subprocess.h
server
#25052
opened Jun 26, 2026 by
mfuntowicz
Contributor
ggml-cpu: replace cyclic chunk distribution with atomic work-stealing
ggml
changes relating to the ggml tensor library for machine learning
#25048
opened Jun 26, 2026 by
dnislno
server-stream: follow-up on SSE Replay Buffer (#23226)
documentation
Improvements or additions to documentation
server
#25047
opened Jun 26, 2026 by
ServeurpersoCom
Contributor
[SYCL] rename the env vars from disable to enable
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#25042
opened Jun 26, 2026 by
arthw
Contributor
[SYCL] update Q&A for saving power
documentation
Improvements or additions to documentation
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#25041
opened Jun 26, 2026 by
arthw
Contributor
ui: Custom Prompts library + improvements for System Message, MCP Prompts & Resources UI/UX
documentation
Improvements or additions to documentation
server/ui
ggml : unify tie-breaking to first index across all backends
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
CUDA
Related to the CUDA backend
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
testing
Everything test related
Vulkan
Issues specific to the Vulkan backend
WebGPU
ggml : fix tensor-parallel + -ncmoe crash on MoE models
ggml
changes relating to the ggml tensor library for machine learning
#25028
opened Jun 26, 2026 by
liminfei-amd
Contributor
1 task done
SYCL: add oneMKL GEMM flash attention for XMX-accelerated prompt proc…
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#25025
opened Jun 26, 2026 by
johnkarlhill