
[pull] master from ggml-org:master #814

Merged
pull[bot] merged 4 commits into LongLeCE:master from ggml-org:master
Jan 25, 2026

@pull pull bot commented Jan 25, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

am17an and others added 4 commits January 25, 2026 23:25
* ggml-cpu: Use tiled FA for prompt-processing

FA performance on CPU suffers on long contexts because it essentially uses a vector kernel. This PR adds a tiled FA for prompt processing (PP). Tile sizes were tuned on a single-socket, 64-core AMD EPYC machine.

* fix out of bounds for mask

* skip rows that are fully masked

* skip tile if its mask is entirely -inf

* store mask in worksize

* check inf tile earlier
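
For readers unfamiliar with the technique, the following is a minimal sketch of tiled attention with an online softmax, including the "skip a fully masked tile" and "skip a fully masked row" checks named in the commit list. It is not the ggml-cpu kernel from this PR, which is not shown on this page. The function name `tiled_attention` and the tile sizes `TILE_Q` and `TILE_KV` are hypothetical, and the code assumes a single head, scalar float math, no threading, and a mask holding only 0 or -inf.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical tile sizes for illustration only (not the PR's tuned values).
constexpr std::size_t TILE_Q  = 4;
constexpr std::size_t TILE_KV = 8;

// Single-head attention: out = softmax(Q K^T / sqrt(d) + mask) V.
// Q: [n_q x d], K and V: [n_kv x d], mask: [n_q x n_kv], all row-major.
// Mask entries are assumed to be 0 (visible) or -inf (hidden).
std::vector<float> tiled_attention(const std::vector<float>& Q,
                                   const std::vector<float>& K,
                                   const std::vector<float>& V,
                                   const std::vector<float>& mask,
                                   std::size_t n_q, std::size_t n_kv, std::size_t d) {
    assert(Q.size() == n_q * d && K.size() == n_kv * d && V.size() == n_kv * d);
    assert(mask.size() == n_q * n_kv);

    const float NEG_INF = -std::numeric_limits<float>::infinity();
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    std::vector<float> out(n_q * d, 0.0f);

    for (std::size_t q0 = 0; q0 < n_q; q0 += TILE_Q) {
        const std::size_t q1 = std::min(q0 + TILE_Q, n_q);
        // Per-row running maximum and running softmax denominator.
        std::vector<float> row_max(q1 - q0, NEG_INF);
        std::vector<float> row_sum(q1 - q0, 0.0f);

        for (std::size_t k0 = 0; k0 < n_kv; k0 += TILE_KV) {
            const std::size_t k1 = std::min(k0 + TILE_KV, n_kv);

            // Check the mask before any dot products: an all -inf tile adds nothing.
            bool all_masked = true;
            for (std::size_t i = q0; i < q1 && all_masked; ++i)
                for (std::size_t j = k0; j < k1; ++j)
                    if (mask[i * n_kv + j] != NEG_INF) { all_masked = false; break; }
            if (all_masked) continue;

            for (std::size_t i = q0; i < q1; ++i) {
                float s[TILE_KV];  // scores of row i against this KV tile
                float tile_max = NEG_INF;
                for (std::size_t j = k0; j < k1; ++j) {
                    float dot = 0.0f;
                    for (std::size_t c = 0; c < d; ++c) dot += Q[i * d + c] * K[j * d + c];
                    s[j - k0] = dot * scale + mask[i * n_kv + j];
                    tile_max = std::max(tile_max, s[j - k0]);
                }
                if (tile_max == NEG_INF) continue;  // row is fully masked in this tile

                // Online softmax: rescale earlier partial results to the new maximum.
                const float new_max = std::max(row_max[i - q0], tile_max);
                const float correction = std::exp(row_max[i - q0] - new_max);
                row_sum[i - q0] *= correction;
                for (std::size_t c = 0; c < d; ++c) out[i * d + c] *= correction;

                for (std::size_t j = k0; j < k1; ++j) {
                    const float p = std::exp(s[j - k0] - new_max);
                    row_sum[i - q0] += p;
                    for (std::size_t c = 0; c < d; ++c) out[i * d + c] += p * V[j * d + c];
                }
                row_max[i - q0] = new_max;
            }
        }

        // Final normalization; rows with no visible key are left as zeros.
        for (std::size_t i = q0; i < q1; ++i)
            if (row_sum[i - q0] > 0.0f)
                for (std::size_t c = 0; c < d; ++c) out[i * d + c] /= row_sum[i - q0];
    }
    return out;
}
```

With a causal mask, roughly half of the (query tile, KV tile) pairs are entirely hidden, so checking the mask before the dot products avoids that work. Leaving fully masked rows as zeros is a choice made for this sketch, not something stated in the PR.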
…acOS (#19088)

Co-authored-by: chenbin11 <chenbin11@kuaishou.com>
@pull pull bot locked and limited conversation to collaborators Jan 25, 2026
@pull pull bot added the ⤵️ pull label Jan 25, 2026
@pull pull bot merged commit 0c21677 into LongLeCE:master Jan 25, 2026
2 checks passed

4 participants