Releases: joninco/llama.cpp

b6220

21 Aug 00:55
5682a37

sched : copy only the used experts when offloading prompt processing …

b6209

19 Aug 20:54
fb22dd0

opencl: mark `argsort` unsupported if cols exceed workgroup limit (#1…

b6170

15 Aug 08:46
4227c9b

CUDA: fix negative KV_max values in FA (#15321)

b6153

14 Aug 10:39
3ea913f

perplexity: give more information about constraints on failure (#15303)

* perplexity: give more information about constraints on failure

This checks whether the failure is caused by an insufficient -np value or an insufficient context size, and reports how much of each is needed.

* log formatting

* log error and return instead of storing max_seq_exceeded int

* check if s0 is zero for -np check

b6144

13 Aug 09:41
00f35d5

ggml : repack block_iq4_nlx8 (#14904)

ggml-ci