Pull requests: ggml-org/llama.cpp
Pull requests list
chat : add MiniMax M2 specialized tool-call handler
testing
Everything test related
#22106
opened Apr 19, 2026 by
doctorjei
[Speculative decoding] feat: add DFlash support
examples
model
Model specific
python
python script changes
server
#22105
opened Apr 19, 2026 by
ruixiang63
Draft
server: Add URL handling for multimodal_data on /completion endpoint
examples
server
#22104
opened Apr 19, 2026 by
cetarthoriphros
Contributor
feat: Support sarashina2.2-vision-3b model
examples
python
python script changes
#22103
opened Apr 19, 2026 by
samuraieng
fix: GLM-DSA crash in llama-tokenize when using vocab_only
#22102
opened Apr 19, 2026 by
ssam18
Contributor
mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech)
examples
python
python script changes
#22101
opened Apr 19, 2026 by
ReinforcedKnowledge
[WebGPU] Implement async tensor api and event api
devops
improvements to build systems and github actions
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#22099
opened Apr 18, 2026 by
nikhilJain17
Contributor
Draft
[SYCL] Add Zero-Copy path with Cache Flushing for Intel UMA (Lunar Lake/Meteor Lake)
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22098
opened Apr 18, 2026 by
i-Charlys
hip: bypass memory pool for flash attention f16 temp buffers
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
fix(chat): graceful degradation for malformed tool call arguments
#22089
opened Apr 18, 2026 by
huckstack
convert : support sentence-transformer 5.4 config files
python
python script changes
#22087
opened Apr 18, 2026 by
Bing-su
Contributor
[WIP]hexagon: hmx opt phase2
ggml
changes relating to the ggml tensor library for machine learning
Hexagon
mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change)
examples
testing
Everything test related
#22082
opened Apr 18, 2026 by
ngxson
Contributor
server: always include usage in streaming responses
examples
server
#22081
opened Apr 18, 2026 by
brywil
[SYCL] Update oneapi 2025.3.3, Separate SYCL build, release Ubuntu 24 package.
devops
improvements to build systems and github actions
documentation
Improvements or additions to documentation
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22078
opened Apr 18, 2026 by
NeoZhangJianyu
Contributor
common/autoparser : allow space after tool call
testing
Everything test related
#22073
opened Apr 18, 2026 by
aldehir
Contributor
sycl: Battlemage (BMG) optimizations — AOT, Q5_K reorder, PAD stride fix, new ops, oneMKL routing
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22066
opened Apr 17, 2026 by
aicss-genai
GGML: Allow static build with dynamic loaded backends
ggml
changes relating to the ggml tensor library for machine learning
#22059
opened Apr 17, 2026 by
ervanalb
2 tasks done
quant: handle shared-KV layer tensors in imatrix-dependent quantization
testing
Everything test related
#22054
opened Apr 17, 2026 by
ajfonthemove
3 tasks
CUDA: refactor mma data loading for AMD
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#22051
opened Apr 17, 2026 by
JohannesGaessler
Contributor
Reduce CPU overhead in meta backend: cache subgraph splits when cgraph is unchanged
ggml
changes relating to the ggml tensor library for machine learning
#22041
opened Apr 17, 2026 by
gaugarg-nv
Contributor