[pull] master from ggml-org:master by pull[bot] · Pull Request #856 · LongLeCE/llama.cpp

pull · 2026-02-06T20:42:02Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )


…#19310)

* ggml webgpu: port binary operators to use pre-wgsl
* Add binary.wgsl: unified shader with conditionals for all 4 ops
* Add gen_binary_shaders.cpp: build tool for using pre_wgsl preprocessor
* Remove bin_op.tmpl.wgsl and binary.wgsl (Python template)
* Update CMake to generate binary operator shaders at build time
* ggml-webgpu: migrate binary ops to JIT compilation with overlap handling
* port binary operators from AOT to pre-wgsl JIT compilation
* add src1=dst overlap handling for binary ops
* use compile-time workgroup size defines instead of runtime overrides
* ggml-webgpu: complete overlap handling for binary ops
* add support for inplace & overlap case in binding setup
* restructure conditional logic to handle all overlap cases
* ensure all buffer bindings are correctly assigned for edge cases
* ggml-webgpu: remove unused binary overlap cases
  Remove src0==src1 binary overlap case that never occurs in practice.
* keep INPLACE (src0==dst), OVERLAP (src1==dst), DEFAULT
* remove unused src0==src1 and all-same variant
* refactor wgsl to eliminate duplication
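For readers skimming the overlap handling described above, the three cases the commit keeps (INPLACE, OVERLAP, DEFAULT) can be pictured as a small classification step. The following is only a sketch under assumptions: `BufferView`, `BinaryVariant`, `same_storage`, and `classify_binary_op` are hypothetical names for illustration and are not taken from the ggml-webgpu sources.

```cpp
#include <cstddef>

// Hypothetical stand-in for a tensor's backing storage; not a ggml type.
struct BufferView {
    const void * base;   // start of the underlying buffer
    std::size_t  offset; // byte offset of the tensor within that buffer
};

// The three cases named in the commit message.
enum class BinaryVariant { Default, Inplace, Overlap };

// True when both views start at the same byte of the same buffer.
inline bool same_storage(const BufferView & a, const BufferView & b) {
    return a.base == b.base && a.offset == b.offset;
}

// Pick a variant for dst = op(src0, src1).
inline BinaryVariant classify_binary_op(const BufferView & src0,
                                        const BufferView & src1,
                                        const BufferView & dst) {
    if (same_storage(src0, dst)) return BinaryVariant::Inplace; // src0 == dst
    if (same_storage(src1, dst)) return BinaryVariant::Overlap; // src1 == dst
    return BinaryVariant::Default;                              // no aliasing with dst
}
```

The src0==src1 case is deliberately not distinguished here, mirroring the commit's note that it never occurs in practice.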

* gguf-py: Bump sentencepiece version
  There's a new version that's been out for a while that addresses the issues mentioned in #14200. There's a long chain of reasons I would like this change, but the short version is that it allows people who use both `sentencepiece` and `gguf` to take advantage of these fixes. On conda-forge, currently, it locks the version (since there is no notion of optional dependencies). Regardless, I don't think this should be too controversial.
* review feedback


* Support Step3.5-Flash
* fix: norm.weight + 1 (HF zero_centered=true)
* step35: simplify GGUF conversion + drop redundant rope KVs
* Address review feedback
* rename limits -> clamp
* Apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Apply suggestion from @CISC
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* rename swiglu limits -> swiglu clamp in LLM_KV
* avoid CI fail
* Apply suggestions from code review
* Apply suggestions from code review
* disabled KV shifting for LLM_ARCH_STEP35
* Apply suggestions from code review
* mistakenly removed cmath
* add model size && apply missed suggestion
* assert partial_rotary_factors
* fix CI errors:
* load freq_base_swa

---------

Co-authored-by: lvyichen <lvyichen@stepfun.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
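The "norm.weight + 1 (HF zero_centered=true)" item reads as if the checkpoint stores zero-centered norm scales, so that the effective scale is `1 + w`. Assuming that reading, and assuming the norm in question is an RMSNorm, a minimal sketch of the adjustment looks like the following. `shift_norm_weight` and `rms_norm` are hypothetical helpers written for illustration, and the `eps` default is an arbitrary placeholder; none of this is taken from the actual conversion code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: turn zero-centered norm scales into plain scales (w + 1).
std::vector<float> shift_norm_weight(const std::vector<float> & w_zero_centered) {
    std::vector<float> w(w_zero_centered.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        w[i] = w_zero_centered[i] + 1.0f;
    }
    return w;
}

// Plain RMSNorm: y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i].
std::vector<float> rms_norm(const std::vector<float> & x,
                            const std::vector<float> & w,
                            float eps = 1e-6f) { // eps is an assumed placeholder value
    assert(x.size() == w.size() && !x.empty());
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float inv_rms = 1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * inv_rms * w[i];
    }
    return y;
}
```

Applying the shift once at conversion time would let inference use the plain `rms_norm` path unchanged, which is presumably why the fix sits in the conversion step.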

ggerganov and others added 6 commits February 6, 2026 16:47

common : add common_speculative_is_compat() (#19270)

dfde599

* llama : add llama_memory_can_rm_suffix()
* Revert "llama : add llama_memory_can_rm_suffix()"
  This reverts commit d30e59b.
* spec : check if the target context is compatible for spec decoding

tests: reduce number of FA test permutations (#19381)

db6adb3

Only test non-F16 for head size 64 and 72 (one a multiple of QK, one not).
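The "one a multiple of QK, one not" remark is plain divisibility arithmetic. As a sketch only: the commit does not state the value of QK, so the block size of 32 below is an assumption for illustration, and `kAssumedQK` and `is_multiple_of_block` are hypothetical names.

```cpp
// Assumed block size for illustration; the commit message only says "QK".
constexpr int kAssumedQK = 32;

// True when head_size is a whole number of blocks.
constexpr bool is_multiple_of_block(int head_size, int block_size = kAssumedQK) {
    return block_size > 0 && head_size % block_size == 0;
}

// Under the assumed block size: 64 = 2 * 32, while 72 = 2 * 32 + 8.
static_assert(is_multiple_of_block(64), "64 is a whole number of blocks");
static_assert(!is_multiple_of_block(72), "72 leaves a remainder of 8");
```

Keeping one head size on each side of that boundary exercises both the aligned and the unaligned path while cutting the number of permutations.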

sycl: add F16 support for GGML_OP_CEIL (#19306)

537eadb

* Fix SYCL CEIL operator
* sycl: implement GGML_OP_CEIL

pull bot locked and limited conversation to collaborators Feb 6, 2026

pull bot added the ⤵️ pull label Feb 6, 2026

pull bot merged commit b831118 into LongLeCE:master Feb 6, 2026

github-actions bot added the documentation, testing, examples, python, ggml, SYCL, server, and model labels Feb 6, 2026

pull[bot] merged 6 commits into LongLeCE:master from ggml-org:master
