455 commits
370c06d
perf: FlashAttention 2nd MM uses TensorFMA and optimizations
marty1885 Mar 29, 2026
f5d1c41
hexagon: dma optimizations (mostly fixing regressions) (#21137)
max-krasnyansky Mar 29, 2026
582db50
cleanup: flashattention reorg
marty1885 Mar 29, 2026
656f770
perf: optimizations and fixes
marty1885 Mar 29, 2026
ec16a07
Optimize MOE GEMV kernel for BS > 1. (#20905)
gaugarg-nv Mar 29, 2026
69b2192
feat: L2SCP API and make FlashAttention support DV = 256 for gemma
marty1885 Mar 29, 2026
7c20367
add missing ROPE_FACTORS_LONG/SHORT for MiniCPM (#21150)
CISC Mar 29, 2026
24670b8
perf: parallelize norms beyond single row
marty1885 Mar 30, 2026
4db780e
feat: GATED_DELTA_NET support and relaxed L2_NORM requirment
marty1885 Mar 30, 2026
4ea3478
feat: loosen RMS_NORM, NORM, ROPE contingous req too
marty1885 Mar 30, 2026
e2b8b12
feat: repeat supports brocasting on dim 0 and loosen cont check
marty1885 Mar 30, 2026
243b7be
feat: FILL and DIAG operator
marty1885 Mar 30, 2026
23530ba
feat: loosen UNARY support chcek
marty1885 Mar 30, 2026
043d91a
feat: TRI support
marty1885 Mar 30, 2026
22da6a1
feat: SOLVE_TRI support
marty1885 Mar 30, 2026
04d62da
feat: basic SET support
marty1885 Mar 30, 2026
3fed43d
feat: loosen CONT req
marty1885 Mar 30, 2026
abf9a62
server: wrap headers for mcp proxy (#21072)
ngxson Mar 30, 2026
7524b04
perf: fp16_to_fp32 use ASM
marty1885 Mar 30, 2026
e2eb39e
ci : bump ty to 0.0.26 (#21156)
CISC Mar 30, 2026
28cbb11
feat: IMROPE support
marty1885 Mar 30, 2026
58f3e1e
feat: PAD support
marty1885 Mar 30, 2026
278521c
llama-model-loader: print warning when using overrides with mmap (#20…
am17an Mar 30, 2026
6623725
feat: global barrier
marty1885 Mar 30, 2026
389c7d4
webui: Fix branching logic on edit message (#21175)
allozaur Mar 30, 2026
cad2d38
rpc : fix misleading error log (#21184)
rgerganov Mar 30, 2026
64ac9ab
CUDA : Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 (#21…
ORippler Mar 30, 2026
e378631
fix: view must live on the same backend as backing tensor
marty1885 Mar 30, 2026
cc7ac95
feat: relax CONCAT in ET backend
marty1885 Mar 30, 2026
c522cd5
feat: dead simple CUMSUM implementation
marty1885 Mar 30, 2026
d3bd261
feat: basic SSM_CONV support
marty1885 Mar 30, 2026
29636c1
feat: loosen CONCAT req
marty1885 Mar 30, 2026
7a561b0
feat: relax GATED_DELTA_NET and add SET support proper
marty1885 Mar 30, 2026
6f4aa8b
cleanup: cleanup LCM math
marty1885 Mar 30, 2026
ead417f
jinja : handle empty expressions correctly (#20913)
zeph1912 Mar 30, 2026
24ab03c
feat: SWIGLU single input
marty1885 Mar 30, 2026
84ae843
CI : Enable CUDA and Vulkan ARM64 runners and fix CI/CD (#21122)
ehfd Mar 30, 2026
fe05d58
feat: SSM_SCAN support
marty1885 Mar 30, 2026
08f2145
opencl: add q4_K gemm and gemv kernels for Adreno (#20919)
shaofeiqi Mar 30, 2026
913c266
feat: el_map supports non aligned tensors in best effort
marty1885 Mar 31, 2026
5b93b8f
feat: basic GROUP_NORM support
marty1885 Mar 31, 2026
d5cf7ad
feat: loosen MUL_MAT capablities slightly
marty1885 Mar 31, 2026
faa2678
feat: loosen MUL_MAT and GET_ROWS and add IM2COL
marty1885 Mar 31, 2026
40ed356
feat: special case for softmax 1x1x1x1
marty1885 Mar 31, 2026
5ce013c
common : Disable backend sampling if reasoning budget is enabled (#21…
Galunid Mar 31, 2026
26dac84
vendor : update BoringSSL to 0.20260327.0 (#21211)
angt Mar 31, 2026
93cdc69
feat: loosen SOFT_MAX req in ET backend
marty1885 Mar 31, 2026
539444c
fix: el_map unaligned acse fixes
marty1885 Mar 31, 2026
dcedd0d
perf: optimize zero_acc_vec in flash_attn_ext_f16_me
marty1885 Mar 31, 2026
4453e77
server/webui: cleanup dual representation approach, simplify to opena…
pwilkin Mar 31, 2026
fcc2d59
fix: include API key in CORS proxy requests for MCP connections (#21193)
satishkc7 Mar 31, 2026
d8621ad
perf: use hart 1 for packing in MM and FA for FP16
marty1885 Mar 31, 2026
90aa83c
common: add bounds check in common_init_result::sampler to prevent se…
mtmcp Mar 31, 2026
62278ce
sycl : enhance fattn perf (#21185)
arthw Mar 31, 2026
41361c8
common : move up common_init() and fix Windows UTF-8 logs (#21176)
angt Mar 31, 2026
0be6c7c
ggml : bump version to 0.9.9 (ggml/1449)
ggerganov Mar 30, 2026
9281dd1
sync : ggml
ggerganov Mar 31, 2026
eec6f85
CI: Enable CPU and Vulkan ARM64 Release (#21207)
ehfd Mar 31, 2026
0b6ff47
fix: correct misspellings in code comments (#21217)
lainon1 Mar 31, 2026
624733d
common : gpt-oss handle builtin and unsolicited tool calls (#21213)
aldehir Mar 31, 2026
0acc453
feat: kernel semaphore
marty1885 Mar 31, 2026
81493cb
perf: better instruction sequence in FlashAttention
marty1885 Mar 31, 2026
4a00bbf
server: (webui) no more gzip compression (#21073)
ngxson Mar 31, 2026
632219a
CANN: fix multi-thread set_tensor race conditions (#20151)
hipudding Mar 31, 2026
6307ec0
common : cleanup logs and modernize the progress bar (#21215)
angt Mar 31, 2026
0fcb376
fix: Use lower-case proxy headers naming (#21235)
allozaur Mar 31, 2026
825eb91
ggml-webgpu: port all AOT operators to JIT (#20728)
abhijitramesh Mar 31, 2026
82764c3
ggml webgpu: quantized buffers to u32 + wider browser/device support …
reeselevine Apr 1, 2026
4951250
llama : refactor llama_model_quantize_params to expose a pure C inter…
EAddario Apr 1, 2026
8845816
CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)
anavp-nvidia Apr 1, 2026
73f6302
fix: gated_delta_net with proper masking
marty1885 Apr 1, 2026
865dd09
perf: better parallelization for GATED_DELTA_NET
marty1885 Apr 1, 2026
a8b13a4
perf: parallelize SSM_CONV over nr
marty1885 Apr 1, 2026
2b86e5c
ggml-cpu: fix fallback for RVV kernels without zvfh (#21157)
taimur-10x Apr 1, 2026
d43375f
ggml : fix RWKV ops thread assignment (#21226)
ggerganov Apr 1, 2026
88d5f8f
CUDA/HIP: Fix kernel slection for mmvq mmid kernel to align host sele…
IMbackK Apr 1, 2026
02e7e04
perf: vectorize SSM_CONV
marty1885 Apr 1, 2026
e1cb817
memory: respect unified KV cache in hybrid memory for eval tasks (#21…
mudler Apr 1, 2026
84f82e8
ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
michaelw9999 Apr 1, 2026
6b949d1
sycl : support nvfp4 type in mul_mat (#21227)
arthw Apr 1, 2026
296bc05
ggml : bump version to 0.9.10 (ggml/1454)
ggerganov Apr 1, 2026
6422036
sync : ggml
ggerganov Apr 1, 2026
0356e33
scripts: add function call test script (#21234)
ngxson Apr 1, 2026
744c0c7
llama : rotate activations for better quantization (#21038)
ggerganov Apr 1, 2026
1d6d4cf
fix: tool call parsing for LFM2 and LFM2.5 models (#21242)
jbuchananr Apr 1, 2026
8710e5f
hexagon: improve RMS_NORM and DIV accuracy (#21251)
aparmp-quic Apr 1, 2026
5a0ed51
Update Dawn version in WebGPU CI (#20784)
nikhilJain17 Apr 1, 2026
6de97b9
kleidiai: add CPU feature detection to CI run script (#20394)
martin-klacer-arm Apr 1, 2026
86221cf
CUDA: fix FA kernel selection logic (#21271)
JohannesGaessler Apr 1, 2026
12dbf1d
server: Bypass API Key validation for WebUI static bundle assets (#21…
allozaur Apr 1, 2026
95a6eba
opencl: fix leak in Adreno q8_0 path (#21212)
lhez Apr 1, 2026
c30e012
contrib : rewrite AGENTS.md, make it more clear about project values …
ngxson Apr 1, 2026
fbd441c
hexagon : add cumsum op support (#21246)
tboinovski1 Apr 2, 2026
4888137
sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (#21283)
arthw Apr 2, 2026
bc07d55
ggml : bump version to 0.9.11 (ggml/1456)
ggerganov Apr 2, 2026
dae2bf4
sync : ggml
ggerganov Apr 2, 2026
d6dac92
Ignore Transfer-Encoding header. (#20269)
crmky Apr 2, 2026
17193cc
kv-cache : do not quantize SWA KV cache (#21277)
ggerganov Apr 2, 2026
6137c32
chat : add Granite 4.0 chat template with correct tool_call role mapp…
jesus-talavera-ibm Apr 2, 2026
e15efe0
Relax prefill parser to allow space. (#21240)
pwilkin Apr 2, 2026
2233737
common : add commentary rules for gpt-oss-20b (#21286)
aldehir Apr 2, 2026
63f8fe0
model, mtmd: fix gguf conversion for audio/vision mmproj (#21309)
ngxson Apr 2, 2026
5803c8d
tests: allow exporting graph ops from HF file without downloading wei…
0cc4m Apr 2, 2026
a1cfb64
ggml-webgpu: add vectorized flash attention (#20709)
ArberSephirotheca Apr 2, 2026
7992aa7
tests : add unit test coverage for llama_tensor_get_type (#20112)
bartowski1182 Apr 2, 2026
5208e2d
fix: gemma 4 template (#21326)
pwilkin Apr 2, 2026
7c7d6ce
[HIP] Bump ROCm version to 7.2.1 (#21066)
slojosic-amd Apr 2, 2026
6b500e2
perf: optimize MUL_MAT for q8
marty1885 Apr 3, 2026
f49e917
ci : add AMD ZenDNN label to PR labeler (#21345)
z-vishal Apr 3, 2026
c2d00ab
Merge remote-tracking branch 'upstream/master' into backend-dev-2
marty1885 Apr 3, 2026
39b27f0
(revert) kv-cache : do not quantize SWA KV cache (#21332)
ggerganov Apr 3, 2026
57ace0d
chat : avoid including json in chat.h (#21306)
ggerganov Apr 3, 2026
21eeeae
feat: support Gemma 4
marty1885 Apr 3, 2026
0c58ba3
rpc : reuse compute graph buffers (#21299)
rgerganov Apr 3, 2026
b069b10
vocab: fix Gemma4 tokenizer (#21343)
pwilkin Apr 3, 2026
f1ac841
ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)
z-vishal Apr 3, 2026
f851fa5
fix: add openssl to nix dependencies (#21353) (#21355)
Tillerino Apr 3, 2026
43a4ee4
HIP: build eatch ci build test for a different architecture (#21337)
IMbackK Apr 3, 2026
d3416a4
fix: remove stale assert (#21369)
pwilkin Apr 3, 2026
887535c
ci: add more binary checks (#21349)
taronaeo Apr 3, 2026
1f34806
jinja: coerce input for string-specific filters (#21370)
CISC Apr 3, 2026
384c007
docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331)
jeromew Apr 3, 2026
277ff5f
docker : bump cuda12 to 12.9.1 (#20920)
M1DNYT3 Apr 3, 2026
aeb1783
fix: support multi-device
marty1885 Apr 3, 2026
af5c138
common : fix tool call type detection for nullable and enum schemas (…
sacredvoid Apr 3, 2026
f1f793a
common/parser: fix call ID detection (Mistral parser mostly) + atomic…
pwilkin Apr 3, 2026
50e0ad0
server: save and clear idle slots on new task (`--clear-idle`) (#20993)
yychyo Apr 3, 2026
e439700
ci: Add Windows Vulkan backend testing on Intel (#21292)
rillomas Apr 3, 2026
d006858
ggml-webgpu: move from parameter buffer pool to single buffer with of…
reeselevine Apr 3, 2026
b7ad48e
llama: add custom newline split for Gemma 4 (#21406)
am17an Apr 4, 2026
650bf14
llama-model: read final_logit_softcapping for Gemma 4 (#21390)
ssam18 Apr 4, 2026
d01f627
common : respect specified tag, only fallback when tag is empty (#21413)
angt Apr 4, 2026
9c69907
server: Fix undefined timing measurement errors in server context (#2…
thedanhoffman Apr 4, 2026
b863507
common : add gemma 4 specialized parser (#21418)
aldehir Apr 4, 2026
661e9ac
ci: fix vulkan workflow referencing non-existent action (#21442)
nisparks Apr 5, 2026
c08d28d
ci: lower cuda12 floor to 12.8.1 for broader host compatibility (#21438)
M1DNYT3 Apr 5, 2026
5d3a4a7
server : fix logging of build + system info (#21460)
ddh0 Apr 5, 2026
761797f
ci : use default RISE RISC-V Runners (#21263)
luhenry Apr 5, 2026
af76639
model : add HunyuanOCR support (#21395)
richarddd Apr 5, 2026
58190cc
llama : correct platform-independent loading of BOOL metadata (#21428)
anchortense Apr 5, 2026
25eec6f
hexagon: slight optimization for argosrt output init (#21463)
YardenTal44 Apr 6, 2026
f51fd36
sycl : handle other FA case (#21377)
arthw Apr 6, 2026
400ac8e
convert : set "add bos" == True for Gemma 4 (#21500)
ggerganov Apr 6, 2026
3979f2b
docs: add hunyuan-ocr gguf, also add test [no ci] (#21490)
ngxson Apr 6, 2026
482d862
server : handle unsuccessful sink.write in chunked stream provider (#…
lainon1 Apr 6, 2026
941146b
convert : fix block_ff_dim retrieval for lfm2 (#21508)
CISC Apr 6, 2026
4aa962e
vocab : add byte token handling to BPE detokenizer for Gemma4 (#21488)
aldehir Apr 6, 2026
94ca829
llama-bench: add `-fitc` and `-fitt` to arguments (#21304)
am17an Apr 6, 2026
15f786e
[CUDA ] Write an optimized flash_attn_stream_k_fixup kernel (#21159)
gaugarg-nv Apr 6, 2026
506200c
cli: fix stripping of \n in multiline input (#21485)
bipinyadav3175 Apr 6, 2026
2e1f0a8
ggml: add Q1_0 1-bit quantization support (CPU) (#21273)
khosravipasha Apr 6, 2026
d0a6dfe
ggml-webgpu: Add the support of `MUL_MAT_ID` (#21147)
yomaytk Apr 6, 2026
0033f53
docs: fix typo in build.md (emdawbwebgpu -> emdawnwebgpu) (#21518)
CastelDazur Apr 7, 2026
5c08f4a
feat: broader GLU support
marty1885 Apr 7, 2026
12f2b2b
feat: unary ops supports view
marty1885 Apr 7, 2026
0988acc
[SYCL] Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (#…
PMZFX Apr 7, 2026
d13ac81
fix: repair fp16 MM using matrix engine
marty1885 Apr 7, 2026
d1f82e3
Fix rtl text rendering (#21382)
Kabir08 Apr 7, 2026
ecce008
fix: Detect streaming state in reasoning content blocks (#21549)
allozaur Apr 7, 2026
71a81f6
ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210) …
aviallon Apr 7, 2026
482192f
webui : store reasoning_content so it is sent back in subsequent requ…
aldehir Apr 7, 2026
edd4d9b
vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl (#21029)
mkoker Apr 7, 2026
2a619f6
ggml: Vulkan build, Linux -- output error string for errno on fork fa…
tomoverlund Apr 7, 2026
22fc791
ggml : deprecate GGML_OP_ADD1 (#21363)
ggerganov Apr 7, 2026
e8f5082
server : fix restore for checkpoints with pos_min == 0 (#21510)
ggerganov Apr 7, 2026
a8ec0df
llama: remove per-arch tensor name lists (#21531)
JohannesGaessler Apr 7, 2026
0d049d6
unicode : add custom Qwen2 regex handler to fix segfault on long inpu…
nhs000 Apr 7, 2026
69c28f1
llama-server: fix model params not propagated (#21509)
taronaeo Apr 7, 2026
de1aa6f
CUDA: check for buffer overlap before fusing (#21566)
am17an Apr 7, 2026
957d717
ggml-webgpu: parameterize submission size and add iOS specific limits…
reeselevine Apr 7, 2026
4eb1951
kv-cache : support attention rotation for heterogeneous iSWA (#21513)
ggerganov Apr 7, 2026
93bdc61
gguf-py : fix missing comma after bad merge in tensor-mapping (#21558)
danbev Apr 7, 2026
66c4f9d
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (#21168)
iacopPBK Apr 7, 2026
c5ce4bc
CUDA: make cuda graphs props check faster (#21472)
am17an Apr 8, 2026
5c4aae6
devops: kleidiai: provide KleidiAI-Enabled ARM Release Artifact (#21259)
martin-klacer-arm Apr 8, 2026
97508ac
webui: fix syntax highlighting lost after streaming for non-common la…
hmblair Apr 8, 2026
09343c0
model : support step3-vl-10b (#21287)
forforever73 Apr 8, 2026
ece522f
chore: Remove legacy files (#21606)
allozaur Apr 8, 2026
3bd9aa1
chore: Update labeler to have separate labels for `server/webui` and …
allozaur Apr 8, 2026
ae65fbd
tests : remove obsolete .mjs script (#21615)
ggerganov Apr 8, 2026
85d482e
parser: fix MiniMax handling (#21573)
pwilkin Apr 8, 2026
87f4744
examples : disable cb_eval callback for --save-logits (#21553)
danbev Apr 8, 2026
198f64f
perf: handle large N GEMV better
marty1885 Apr 8, 2026
5764d7c
gemma : perform per-layer projections in the first layer (#21612)
ggerganov Apr 8, 2026
dcdcbad
metal: Q1_0 backend (#21528)
khosravipasha Apr 8, 2026
5473949
webgpu : Query for adapter support when registering WebGPU backend (#…
reeselevine Apr 8, 2026
3ba12fe
kv-cache : extend cache quantization checks (#21586)
Green-Sky Apr 8, 2026
e9fd962
Propose fix a couple of typos (#21581)
jeis4wpi Apr 8, 2026
4a05e0c
webui : send both backend_sampling == false/true (#18781)
ggerganov Apr 8, 2026
512b23f
perf: better q8_0 MM
marty1885 Apr 8, 2026
d9a12c8
vocab : remove </s> eog token if gemma4 (#21492)
aldehir Apr 8, 2026
6606000
server: respect the ignore eos flag (#21203)
ykhrustalev Apr 8, 2026
2dcb7f7
fix: free ctx_copy in ggml_opt_free to plug per-training-session leak…
RealOrko Apr 8, 2026
d12cc3d
CUDA: also store `node->src->data` ptrs for equality check (#21635)
am17an Apr 8, 2026
4293919
common : skip non-primary GGUF split files when selecting model (#21633)
angt Apr 9, 2026
8a132fa
vulkan: unify type macros to use Vx instead of _VECx (#21605)
0cc4m Apr 9, 2026
8a65a7a
ci: drop v5 `all:` composition from labeler.yml (#21627)
Marxist-Leninist Apr 9, 2026
b54cb2e
sycl : add flash-attn support for head size 512 (#21654)
qnixsynapse Apr 9, 2026
75511a8
webui: Add option to pre-encode conversation for faster next turns (#…
allozaur Apr 9, 2026
3ee9da0
server : fix grammar commandline args (#21543)
AUTOMATIC1111 Apr 9, 2026
9949ad0
fix: Model Selector choice sync (#21628)
allozaur Apr 9, 2026
5e9c635
metal : add missing mm-id specializations for q1_0 (#21662)
ggerganov Apr 9, 2026
243532e
jinja : support ensure_ascii=true, string repetition and int/float se…
kwajiehao Apr 9, 2026
0ec191e
vocab: add gemma4 tokenizer tests, fix edge case (#21534)
pwilkin Apr 9, 2026
501aeed
mtmd: support dots.ocr (#17575)
ngxson Apr 9, 2026
057dba3
model: fix multimodal padding token for gemma3n/gemma4 (#21625)
ngxson Apr 9, 2026
2622975
common : simplify autoparser tagged parser rules (#21216)
aldehir Apr 9, 2026
ddf03c6
common : fix ambiguous grammar rule in gemma4 (#21661)
aldehir Apr 9, 2026
4ef9301
webui: add "Send message on Enter" setting (#21577)
mourix Apr 9, 2026
c8ac02f
requirements : update transformers to 5.5.1 (#21617)
danbev Apr 9, 2026
009a113
ggml : check return value of CUB calls used in argsort and top-k (the…
fairydreaming Apr 9, 2026
d6f3030
ggml: backend-agnostic tensor parallelism (experimental) (#19378)
JohannesGaessler Apr 9, 2026
d132f22
HIP: add CDNA4 (gfx950) architecture support for MI350X/MI355X (#21570)
andyluo7 Apr 9, 2026
e34f042
CUDA: fuse muls (#21665)
am17an Apr 10, 2026
e095a48
common : add fluidity to the progress bar (#21671)
angt Apr 10, 2026
7b69125
vulkan: Support Q1_0 (#21539)
jeffbolznv Apr 10, 2026
3f8752b
docs : fix broken link to ggml-openvino in OPENVINO.md (#21709)
ibelem Apr 10, 2026
06622b5
perf: better set_rows
marty1885 Apr 10, 2026
d7ff074
common : enable reasoning budget sampler for gemma4 (#21697)
berkidem Apr 10, 2026
f989a6e
webui: Static build output improvements (#21667)
allozaur Apr 10, 2026
0893f50
common: mark --split-mode tensor as experimental (#21684)
JohannesGaessler Apr 10, 2026
fb38d6f
common : fix when loading a cached HF models with unavailable API (#2…
angt Apr 10, 2026
5dd1025
server : ignore --alias when using --models-preset (#21380)
angt Apr 10, 2026
e4fed9d
ggml-webgpu: address quantization precision and backend lifecycle man…
Constannnnnt Apr 10, 2026
bfd1f45
ggml-webgpu: support non-square subgroup matrix configs for Intel GPU…
SharmaRithik Apr 10, 2026
e62fa13
model : make Gemma 4 shared-KV tail attn_k tensors optional on load (…
MoonRide303 Apr 10, 2026
05b3caa
common : add callback interface for download progress (#21735)
angt Apr 10, 2026
3fc6506
common : better align to the updated official gemma4 template (#21704)
aldehir Apr 10, 2026
9aa2807
hexagon: improved Op queuing, buffer and cache management (#21705)
max-krasnyansky Apr 10, 2026
81069a8
hexagon: add support for linux on snapdragon (#21707)
tboinovski1 Apr 10, 2026
b136b62
fix: Fix broken structured output when using $refs in json_schema (#2…
Galunid Apr 10, 2026
a29e4c0
CUDA: also store node->src ne/nb for graph equality (#21736)
am17an Apr 11, 2026
660386f
py : Bump typer to latest to fix huggingface_hub issue (#21701)
bartowski1182 Apr 11, 2026
2b2cd57
ggml : fix a few instances of missing GGML_TYPE_Q1_0 cases (#21716)
CISC Apr 11, 2026
865ff06
TP: fix Qwen 3 Next data split (#21732)
JohannesGaessler Apr 11, 2026
af1127d
opencl: add basic support for q5_k (#21593)
shaofeiqi Apr 11, 2026
073bb2c
mtmd : add MERaLiON-2 multimodal audio support (#21756)
SiruiHe Apr 11, 2026
ff5ef82
CUDA: skip compilation of superfluous FA kernels (#21768)
JohannesGaessler Apr 11, 2026
6313acb
docs: add guide on how to add multimodal support (#21778)
ngxson Apr 12, 2026
9e209c5
fix: Proper messages rendering for "Show raw output" (#21672)
allozaur Apr 12, 2026
547765a
mtmd: add Gemma 4 audio conformer encoder support (#21421)
stephencox-ict Apr 12, 2026
aa4695c
mtmd: add gemma 4 test (vision + audio) [no ci] (#21806)
ngxson Apr 12, 2026
1e9d771
convert : force f16 or f32 on step3-vl conv weights (#21646)
CISC Apr 12, 2026
21a4933
mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) (#19441)
ngxson Apr 12, 2026
82764d8
mtmd: fix crash when sending image under 2x2 pixels (#21711)
mzsergiu Apr 12, 2026
873c825
sycl: disable Q1_0 in backend and cleanup unused variables (#21807)
qnixsynapse Apr 13, 2026
bafae27
Remove extra conditional check on debug mode. (#21798)
yomaytk Apr 13, 2026
4effc2b
Merge remote-tracking branch 'upstream/master' into backend-dev-2
marty1885 Apr 13, 2026
83ee00b
add back deleted files
marty1885 Apr 13, 2026
42c81a0
fix: repair after merge
marty1885 Apr 13, 2026
2 changes: 1 addition & 1 deletion .devops/cann.Dockerfile
@@ -4,7 +4,7 @@

# Define the CANN base image for easier version updates later
ARG CHIP_TYPE=910b
-ARG CANN_BASE_IMAGE=quay.io/ascend/cann:8.3.rc2-${CHIP_TYPE}-openeuler24.03-py3.11
+ARG CANN_BASE_IMAGE=quay.io/ascend/cann:8.5.0-${CHIP_TYPE}-openeuler24.03-py3.11

# ==============================================================================
# BUILD STAGE
13 changes: 8 additions & 5 deletions .devops/cpu.Dockerfile
@@ -1,11 +1,13 @@
-ARG UBUNTU_VERSION=22.04
+ARG UBUNTU_VERSION=24.04

FROM ubuntu:$UBUNTU_VERSION AS build

ARG TARGETARCH

RUN apt-get update && \
-apt-get install -y build-essential git cmake libssl-dev
+apt-get install -y gcc-14 g++-14 build-essential git cmake libssl-dev
+
+ENV CC=gcc-14 CXX=g++-14

WORKDIR /app

@@ -34,7 +36,7 @@ RUN mkdir -p /app/full \
FROM ubuntu:$UBUNTU_VERSION AS base

RUN apt-get update \
-&& apt-get install -y libgomp1 curl\
+&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
@@ -55,8 +57,9 @@ RUN apt-get update \
git \
python3 \
python3-pip \
-&& pip install --upgrade pip setuptools wheel \
-&& pip install -r requirements.txt \
+python3-wheel \
+&& pip install --break-system-packages --upgrade setuptools \
+&& pip install --break-system-packages -r requirements.txt \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
95 changes: 0 additions & 95 deletions .devops/cuda-new.Dockerfile

This file was deleted.

13 changes: 8 additions & 5 deletions .devops/cuda.Dockerfile
@@ -1,6 +1,6 @@
-ARG UBUNTU_VERSION=22.04
+ARG UBUNTU_VERSION=24.04
# This needs to generally match the container host's environment.
-ARG CUDA_VERSION=12.4.0
+ARG CUDA_VERSION=12.8.1
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

@@ -12,7 +12,9 @@ FROM ${BASE_CUDA_DEV_CONTAINER} AS build
ARG CUDA_DOCKER_ARCH=default

RUN apt-get update && \
-apt-get install -y build-essential cmake python3 python3-pip git libssl-dev libgomp1
+apt-get install -y gcc-14 g++-14 build-essential cmake python3 python3-pip git libssl-dev libgomp1
+
+ENV CC=gcc-14 CXX=g++-14 CUDAHOSTCXX=g++-14

WORKDIR /app

@@ -39,7 +41,7 @@ RUN mkdir -p /app/full \
FROM ${BASE_CUDA_RUN_CONTAINER} AS base

RUN apt-get update \
-&& apt-get install -y libgomp1 curl\
+&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
@@ -60,7 +62,8 @@ RUN apt-get update \
git \
python3 \
python3-pip \
-&& pip install --upgrade pip setuptools wheel \
+python3-wheel \
+&& pip install --break-system-packages --upgrade setuptools \
&& pip install --break-system-packages -r requirements.txt \
&& apt autoremove -y \
&& apt clean -y \
21 changes: 19 additions & 2 deletions .devops/intel.Dockerfile
@@ -1,4 +1,4 @@
-ARG ONEAPI_VERSION=2025.2.2-0-devel-ubuntu24.04
+ARG ONEAPI_VERSION=2025.3.2-0-devel-ubuntu24.04

## Build Image

@@ -33,8 +33,25 @@ RUN mkdir -p /app/full \

FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS base

+ARG IGC_VERSION=v2.30.1
+ARG IGC_VERSION_FULL=2_2.30.1+20950
+ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
+ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
+ARG IGDGMM_VERSION=22.9.0
+RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
+&& wget https://github.com/intel/intel-graphics-compiler/releases/download/$IGC_VERSION/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
+&& wget https://github.com/intel/intel-graphics-compiler/releases/download/$IGC_VERSION/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
+&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
+&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-opencl-icd-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
+&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
+&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/libze-intel-gpu1-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
+&& wget https://github.com/intel/compute-runtime/releases/download/$COMPUTE_RUNTIME_VERSION/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+&& dpkg --install *.deb
+
RUN apt-get update \
-&& apt-get install -y libgomp1 curl\
+&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
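
Two details of the new download step in `.devops/intel.Dockerfile` can be checked with plain shell expansion, without any network access. The sketch below first shows how the `ARG` values combine into one release URL, then suggests that the final `dpkg --install *.deb` glob would not pick up the `.ddeb` debug-symbol files, since their names do not end in `.deb`. The two placeholder file names and the variable names `url`, `work`, and `matched` are assumptions made for illustration; they are not part of the change.

```shell
# Part 1: expand one download URL from the ARG values in the hunk above.
IGC_VERSION=v2.30.1
IGC_VERSION_FULL=2_2.30.1+20950
url="https://github.com/intel/intel-graphics-compiler/releases/download/$IGC_VERSION/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb"
echo "$url"

# Part 2: check which files a "*.deb" glob matches.
# The two file names below are empty placeholders, not real packages.
work="$(mktemp -d)"
cd "$work"
touch example_1.0_amd64.deb example-dbgsym_1.0_amd64.ddeb
matched="$(printf '%s\n' *.deb)"
echo "$matched"
```

If the glob behaves this way in the image as well, the `-dbgsym` packages are downloaded but not installed; whether that is intended is not stated in the change.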
2 changes: 1 addition & 1 deletion .devops/llama-cli-cann.Dockerfile
@@ -1,4 +1,4 @@
-ARG ASCEND_VERSION=8.1.RC1.alpha001-910b-openeuler22.03-py3.10
+ARG ASCEND_VERSION=8.5.0-910b-openeuler22.03-py3.10

FROM ascendai/cann:$ASCEND_VERSION AS build

2 changes: 1 addition & 1 deletion .devops/musa.Dockerfile
@@ -46,7 +46,7 @@ RUN mkdir -p /app/full \
FROM ${BASE_MUSA_RUN_CONTAINER} AS base

RUN apt-get update \
-&& apt-get install -y libgomp1 curl\
+&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
7 changes: 5 additions & 2 deletions .devops/nix/package.nix
@@ -16,7 +16,7 @@
rocmPackages,
vulkan-headers,
vulkan-loader,
-curl,
+openssl,
shaderc,
useBlas ?
builtins.all (x: !x) [
@@ -41,6 +41,7 @@
effectiveStdenv ? if useCuda then cudaPackages.backendStdenv else stdenv,
enableStatic ? effectiveStdenv.hostPlatform.isStatic,
precompileMetalShaders ? false,
+useWebUi ? true,
}:

let
@@ -159,11 +160,13 @@ effectiveStdenv.mkDerivation (finalAttrs: {
++ optionals useMpi [ mpi ]
++ optionals useRocm rocmBuildInputs
++ optionals useBlas [ blas ]
-++ optionals useVulkan vulkanBuildInputs;
+++ optionals useVulkan vulkanBuildInputs
+++ [ openssl ];

cmakeFlags =
[
(cmakeBool "LLAMA_BUILD_SERVER" true)
+(cmakeBool "LLAMA_BUILD_WEBUI" useWebUi)
(cmakeBool "BUILD_SHARED_LIBS" (!enableStatic))
(cmakeBool "CMAKE_SKIP_BUILD_RPATH" true)
(cmakeBool "GGML_NATIVE" false)
2 changes: 1 addition & 1 deletion .devops/openvino.Dockerfile
@@ -78,7 +78,7 @@ ARG http_proxy
ARG https_proxy

RUN apt-get update \
-&& apt-get install -y libgomp1 libtbb12 curl\
+&& apt-get install -y libgomp1 libtbb12 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
12 changes: 6 additions & 6 deletions .devops/rocm.Dockerfile
@@ -1,8 +1,8 @@
ARG UBUNTU_VERSION=24.04

# This needs to generally match the container host's environment.
-ARG ROCM_VERSION=7.2
-ARG AMDGPU_VERSION=7.2
+ARG ROCM_VERSION=7.2.1
+ARG AMDGPU_VERSION=7.2.1

# Target the ROCm build image
ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete
@@ -12,11 +12,11 @@ FROM ${BASE_ROCM_DEV_CONTAINER} AS build

# Unless otherwise specified, we make a fat build.
# This is mostly tied to rocBLAS supported archs.
-# check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html
+# check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.1/reference/system-requirements.html
# check https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/compatibility/compatibilityrad/native_linux/native_linux_compatibility.html
# check https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/compatibility/compatibilityryz/native_linux/native_linux_compatibility.html

-ARG ROCM_DOCKER_ARCH='gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1151;gfx1150;gfx1200;gfx1201'
+ARG ROCM_DOCKER_ARCH='gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1150;gfx1200;gfx1201'

# Set ROCm architectures
ENV AMDGPU_TARGETS=${ROCM_DOCKER_ARCH}
@@ -58,7 +58,7 @@ RUN mkdir -p /app/full \
FROM ${BASE_ROCM_DEV_CONTAINER} AS base

RUN apt-get update \
-&& apt-get install -y libgomp1 curl\
+&& apt-get install -y libgomp1 curl \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
@@ -79,7 +79,7 @@ RUN apt-get update \
git \
python3-pip \
python3 \
-python3-wheel\
+python3-wheel \
&& pip install --break-system-packages --upgrade setuptools \
&& pip install --break-system-packages -r requirements.txt \
&& apt autoremove -y \
17 changes: 10 additions & 7 deletions .devops/vulkan.Dockerfile
@@ -49,17 +49,20 @@ COPY --from=build /app/full /app

WORKDIR /app

+ENV PATH="/root/.venv/bin:/root/.local/bin:${PATH}"
+
+# Flag for compatibility with pip
+ARG UV_INDEX_STRATEGY="unsafe-best-match"
RUN apt-get update \
&& apt-get install -y \
build-essential \
+curl \
git \
-python3.13 \
-python3.13-dev \
-python3-pip \
-python3-wheel \
-&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.13 100 \
-&& pip install --break-system-packages --upgrade setuptools \
-&& pip install --break-system-packages -r requirements.txt \
+ca-certificates \
+&& curl -LsSf https://astral.sh/uv/install.sh | sh \
+&& uv python install 3.13 \
+&& uv venv --python 3.13 /root/.venv \
+&& uv pip install --python /root/.venv/bin/python -r requirements.txt \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
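
The `ENV PATH` line in this hunk puts `/root/.venv/bin` and `/root/.local/bin` ahead of the existing `PATH`, which is presumably what lets the later `uv` commands, and the venv's Python at runtime, be found by name. A minimal sketch of that lookup order follows. It uses a temporary directory as a stand-in for `/root/.venv/bin`; the stub script and the names `fake_venv` and `resolved` are assumptions for illustration only.

```shell
# A directory placed first in PATH wins command lookup.
# "$fake_venv" is a temporary stand-in for /root/.venv/bin.
fake_venv="$(mktemp -d)"

# Create a stub "python3" so the lookup has something to find there.
printf '#!/bin/sh\necho venv-python\n' > "$fake_venv/python3"
chmod +x "$fake_venv/python3"

# Prepend the directory, mirroring the ENV PATH line in the Dockerfile.
PATH="$fake_venv:$PATH"

# "command -v" reports which executable the shell would run.
resolved="$(command -v python3)"
echo "$resolved"
```

Because the venv directory comes first, the removed `update-alternatives` step is no longer needed to make `python3` point at the 3.13 interpreter.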
16 changes: 8 additions & 8 deletions .editorconfig
@@ -21,14 +21,6 @@ indent_style = tab
[prompts/*.txt]
insert_final_newline = unset

-[tools/server/public/*]
-indent_size = 2
-
-[tools/server/public/deps_*]
-trim_trailing_whitespace = unset
-indent_style = unset
-indent_size = unset
-
[tools/server/deps_*]
trim_trailing_whitespace = unset
indent_style = unset
@@ -53,6 +61,14 @@ charset = unset
trim_trailing_whitespace = unset
insert_final_newline = unset

+[tools/server/public/**]
+indent_style = unset
+indent_size = unset
+end_of_line = unset
+charset = unset
+trim_trailing_whitespace = unset
+insert_final_newline = unset
+
[benches/**]
indent_style = unset
indent_size = unset
4 changes: 4 additions & 0 deletions .gitattributes
@@ -0,0 +1,4 @@
+# Treat the generated single-file WebUI build as binary for diff purposes.
+# Git's pack-file delta compression still works (byte-level), but this prevents
+# git diff from printing the entire minified file on every change.
+tools/server/public/index.html -diff
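
One way to confirm the effect of the `-diff` attribute is `git check-attr`, which reports how Git resolves an attribute for a given path. The sketch below runs in a throwaway repository created with `mktemp`. Only the attribute line itself is taken from the change above; the variable names `repo` and `attr` are illustrative.

```shell
# Create an empty scratch repository so the real checkout is not touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Same rule as the new .gitattributes entry.
printf 'tools/server/public/index.html -diff\n' > .gitattributes

# Ask Git how it resolves the "diff" attribute for that path.
# The file does not need to exist for the lookup to work.
attr="$(git check-attr diff -- tools/server/public/index.html)"
echo "$attr"
```

With the attribute unset, `git diff` reports the file as a binary change instead of printing its contents.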
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/010-bug-compilation.yml
@@ -41,7 +41,7 @@ body:
attributes:
label: GGML backends
description: Which GGML backends do you know to be affected?
-options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
+options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, OpenVINO, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
multiple: true
validations:
required: true
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/011-bug-results.yml
@@ -42,7 +42,7 @@ body:
attributes:
label: GGML backends
description: Which GGML backends do you know to be affected?
-options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
+options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, OpenVINO, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
multiple: true
validations:
required: true