[pull] master from ggml-org:master#856
Merged
pull[bot] merged 6 commits intoLongLeCE:masterfrom Feb 6, 2026
Merged
Conversation
* llama : add llama_memory_can_rm_suffix() * Revert "llama : add llama_memory_can_rm_suffix()" This reverts commit d30e59b. * spec : check if the target context is compatible for spec decoding
Only test non-F16 for head size 64 and 72 (one a multiple of QK, one not).
* Fix SYCL CEIL operator * sycl: implement GGML_OP_CEIL
…#19310) * ggml webgpu: port binary operators to use pre-wgsl * Add binary.wgsl: unified shader with conditionals for all 4 ops * Add gen_binary_shaders.cpp: build tool for using pre_wgsl preprocessor * Remove bin_op.tmpl.wgsl and binary.wgsl (Python template) * Update CMake to generate binary operator shaders at build time * ggml-webgpu: migrate binary ops to JIT compilation with overlap handling * port binary operators from AOT to pre-wgsl JIT compilation * add src1=dst overlap handling for binary ops * use compile-time workgroup size defines instead of runtime overrides * ggml-webgpu: complete overlap handling for binary ops * add support for inplace & overlap case in binding setup * restructure conditional logic to handle all overlap cases * ensure all buffer bindings are correctly assigned for edge cases * ggml-webgpu: remove unused binary overlap cases Remove src0==src1 binary overlap case that never occurs in practice. * keep INPLACE (src0==dst), OVERLAP (src1==dst), DEFAULT * remove unused src0==src1 and all-same variant * refactor wgsl to eliminate duplication
* gguf-py: Bump sentencepiece version There's a new version that's been out for a while that addresses the issues mentioned in #14200. There's a long chain of reasons I would like this change, but the short version is that it allows people who use both `sentencepiece` and `gguf` to take advantage of these fixes. On conda-forge, currently, it locks the version (since there is no notion of optional dependencies). Regardless, I don't think this should be too controversial. * review feedback
* Support Step3.5-Flash * fix: norm.weight + 1 (HF zero_centered=true) * step35: simplify GGUF conversion + drop redundant rope KVs * Address review feedback * rename limits -> clamp * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Apply suggestion from @CISC Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * rename swiglu limits -> swiglu clamp in LLM_KV * avoid CI fail * Apply suggestions from code review * Apply suggestions from code review * disabled KV shifting for LLM_ARCH_STEP35 * Apply suggestions from code review * mistakenly removed cmath * add model size && apply missed suggestion * assert partial_rotary_factors * fix CI errors: * load freq_base_swa --------- Co-authored-by: lvyichen <lvyichen@stepfun.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )