Releases · spacemit-com/llama.cpp

22 May 06:33

github-actions

spacemit-llama.cpp.riscv64.0.1.1

17ce6aa

v0.1.1 Latest

Bugfix

Fixed llama-server slot erase behavior for SMT multimodal models.

This update lets multimodal SMT backends clear slot context correctly through the /slots/{id}?action=erase API. It is intended for long-running services that must reset prompt and KV state between requests. Slot save and restore restrictions for multimodal mode remain unchanged.
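
As a rough illustration of the erase call described above, the following Python sketch builds and sends the request. The base URL, port, and slot id are placeholders, and the helper names `build_erase_request` and `erase_slot` are hypothetical. The sketch assumes the server accepts a POST on this path and replies with a JSON body, as upstream llama-server does; these notes do not spell out the response format.

```python
import json
import urllib.request


def build_erase_request(base_url: str, slot_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request for /slots/{id}?action=erase.

    base_url and slot_id are placeholders chosen for illustration.
    """
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action=erase"
    # An empty body is sent; the action is carried by the query string.
    return urllib.request.Request(url, data=b"", method="POST")


def erase_slot(base_url: str, slot_id: int, timeout: float = 10.0) -> dict:
    """Send the erase request and return the decoded JSON reply.

    Assumption: the server answers with a JSON object. Network and HTTP
    errors are raised as urllib.error.URLError / HTTPError.
    """
    req = build_erase_request(base_url, slot_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In a long-running service, a call such as `erase_slot("http://localhost:8080", 0)` could be issued between requests to reset slot 0, provided a server is listening at that address.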

What's Changed

feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
Add SpacemiT MTMD build workflow and documentation by @co-seven in #2
server: allow slot erase for SMT multimodal requests by @co-seven in #3
version: upgrading version number by @co-seven in #4

New Contributors

@co-seven made their first contribution in #1

Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.1

Contributors

co-seven

20 May 06:26

github-actions

spacemit-llama.cpp.riscv64.0.1.0

f798b42

v0.1.0

Feature

This release adds SMT backend support for llama-mtmd, enabling multimodal inference on SpacemiT platforms through a unified llama.cpp integration.

Functional Support

Added SMT backend integration for llama-mtmd
Added multimodal inference support in llama-server
Added end-to-end support for vision-language model inference
Added end-to-end support for speech recognition model inference
Added end-to-end support for OCR and document understanding model inference
Unified model serving flow for SMT-backed multimodal models within the llama.cpp runtime
Added slot context erase support for SMT multimodal server workloads to improve long-running service management

Model Support

FastVLM-0.5B
Qwen3-VL-30B-A3B
Qwen3.5-VL-0.8B
Qwen3.5-VL-2B
Qwen3.5-VL-4B
Qwen3-ASR-0.6B
PaddleOCR-VL-0.9B

What's Changed

feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
Add SpacemiT MTMD build workflow and documentation by @co-seven in #2

Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.0

Contributors

co-seven

17 Oct 04:07

github-actions

b6783

ceff6bb

b6783

SYCL SET operator optimized for F32 tensors (#16350)

* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes

* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups

* move SET op to standalone file, GPU-only implementation

* Update SYCL SET operator for F32

* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)

* fixed ggml-sycl.cpp

---------

Co-authored-by: Gitty Burstein <gitty@example.com>

23 Sep 08:59

github-actions

b6556

264f1b5

b6556

zdnn: refactor codebase + add docs (#16178)

* zdnn: initial matmul refactor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rm static from funcs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: update ggml-zdnn.h

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: change header files to hpp

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: switch to common.hpp

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: move mulmat forward around

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rm inline from utils

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* docs: add zDNN docs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

19 Sep 04:49

github-actions

b6517

69ffd89

b6517

ggml-amx : fix ggml_amx_init() on generic Linux (#16049)

Generalize Linux check to `__linux__` to support non-glibc systems (like musl).
Also, return `false` on unknown/untested OS.

Without this commit, the code compiles (with warnings) but fails:

    register_backend: registered backend CPU (1 devices)
    register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8488C)
    build: 6487 (51c4cac6) with x86_64-linux-musl-gcc (GCC) 15.1.0 for x86_64-linux-musl (debug)
    system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
    ....
    print_info: n_ctx_orig_yarn  = 262144
    print_info: rope_finetuned   = unknown
    print_info: model type       = 4B
    Illegal instruction (core dumped)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

18 Sep 03:31

github-actions

b6503

62c3b64

b6503

CANN: Remove print (#16044)

Signed-off-by: noemotiovon <757486878@qq.com>

18 Aug 12:13

github-actions

b6192

618575c

b6192

Fix broken build: require updated pip to support --break-system-packa…

13 Aug 07:23

github-actions

b6141

e71d48e

b6141

ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …

12 Aug 04:58

github-actions

b6135

25ff6f7

b6135

musa: fix failures in test-backend-ops for mul_mat_id op (#15236)

* musa: fix failures in test-backend-ops for mul_mat_id op

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Address review comments

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

25 Jul 09:44

github-actions

b5986

c12bbde

b5986

sched : fix multiple evaluations of the same graph with pipeline para…
