Skip to content

Releases: spacemit-com/llama.cpp

v0.1.1

22 May 06:33
17ce6aa

Choose a tag to compare

Bugfix

Fixed llama-server slot erase behavior for SMT multimodal models.

This update allows multimodal SMT backends to use the /slots/{id}?action=erase API to clear slot context correctly. The change is intended for long-running service scenarios where prompt/KV state must be reset between requests. Slot save and restore
restrictions for multimodal mode remain unchanged.

What's Changed

  • feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
  • Add SpacemiT MTMD build workflow and documentation by @co-seven in #2
  • server: allow slot erase for SMT multimodal requests by @co-seven in #3
  • version: upgrading version number by @co-seven in #4

New Contributors

Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.1

v0.1.0

20 May 06:26
f798b42

Choose a tag to compare

Feature

This release adds SMT backend support for llama-mtmd, enabling multimodal inference on SpacemiT platforms through a unified llama.cpp integration.

Functional Support

  • Added SMT backend integration for llama-mtmd
  • Added multimodal inference support in llama-server
  • Added end-to-end support for vision-language model inference
  • Added end-to-end support for speech recognition model inference
  • Added end-to-end support for OCR and document understanding model inference
  • Unified model serving flow for SMT-backed multimodal models within the llama.cpp runtime
  • Added slot context erase support for SMT multimodal server workloads to improve long-running service management

Model Support

  • FastVLM-0.5B
  • Qwen3-VL-30B-A3B
  • Qwen3.5-VL-0.8B
  • Qwen3.5-VL-2B
  • Qwen3.5-VL-4B
  • Qwen3-ASR-0.6B
  • PaddleOCR-VL-0.9B

What's Changed

  • feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
  • Add SpacemiT MTMD build workflow and documentation by @co-seven in #2

Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.0

b6783

17 Oct 04:07
ceff6bb

Choose a tag to compare

SYCL SET operator optimized for F32 tensors (#16350)

* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes

* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups

* move SET op to standalone file, GPU-only implementation

* Update SYCL SET operator for F32

* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)

* fixed ggml-sycl.cpp

---------

Co-authored-by: Gitty Burstein <gitty@example.com>

b6556

23 Sep 08:59
264f1b5

Choose a tag to compare

zdnn: refactor codebase + add docs (#16178)

* zdnn: initial matmul refactor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rm static from funcs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: update ggml-zdnn.h

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: change header files to hpp

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: switch to common.hpp

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: move mulmat forward around

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rm inline from utils

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* docs: add zDNN docs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

b6517

19 Sep 04:49
69ffd89

Choose a tag to compare

ggml-amx : fix ggml_amx_init() on generic Linux (#16049)

Generalize Linux check to `__linux__` to support non-glibc systems (like musl).
Also, return `false` on unknown/untested OS.

Without this commit, the code compiles (with warnings) but fails:

    register_backend: registered backend CPU (1 devices)
    register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8488C)
    build: 6487 (51c4cac6) with x86_64-linux-musl-gcc (GCC) 15.1.0 for x86_64-linux-musl (debug)
    system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
    ....
    print_info: n_ctx_orig_yarn  = 262144
    print_info: rope_finetuned   = unknown
    print_info: model type       = 4B
    Illegal instruction (core dumped)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

b6503

18 Sep 03:31
62c3b64

Choose a tag to compare

CANN: Remove print (#16044)

Signed-off-by: noemotiovon <757486878@qq.com>

b6192

18 Aug 12:13
618575c

Choose a tag to compare

Fix broken build: require updated pip to support --break-system-packa…

b6141

13 Aug 07:23
e71d48e

Choose a tag to compare

ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …

b6135

12 Aug 04:58
25ff6f7

Choose a tag to compare

musa: fix failures in test-backend-ops for mul_mat_id op (#15236)

* musa: fix failures in test-backend-ops for mul_mat_id op

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Address review comments

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

b5986

25 Jul 09:44
c12bbde

Choose a tag to compare

sched : fix multiple evaluations of the same graph with pipeline para…