Releases: spacemit-com/llama.cpp
Releases · spacemit-com/llama.cpp
v0.1.1
Bugfix
Fixed llama-server slot erase behavior for SMT multimodal models.
This update allows multimodal SMT backends to use the /slots/{id}?action=erase API to clear slot context correctly. The change is intended for long-running service scenarios where prompt/KV state must be reset between requests. Slot save and restore
restrictions for multimodal mode remain unchanged.
What's Changed
- feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
- Add SpacemiT MTMD build workflow and documentation by @co-seven in #2
- server: allow slot erase for SMT multimodal requests by @co-seven in #3
- version: upgrading version number by @co-seven in #4
New Contributors
Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.1
v0.1.0
Feature
This release adds SMT backend support for llama-mtmd, enabling multimodal inference on SpacemiT platforms through a unified llama.cpp integration.
Functional Support
- Added SMT backend integration for
llama-mtmd - Added multimodal inference support in
llama-server - Added end-to-end support for vision-language model inference
- Added end-to-end support for speech recognition model inference
- Added end-to-end support for OCR and document understanding model inference
- Unified model serving flow for SMT-backed multimodal models within the
llama.cppruntime - Added slot context erase support for SMT multimodal server workloads to improve long-running service management
Model Support
- FastVLM-0.5B
- Qwen3-VL-30B-A3B
- Qwen3.5-VL-0.8B
- Qwen3.5-VL-2B
- Qwen3.5-VL-4B
- Qwen3-ASR-0.6B
- PaddleOCR-VL-0.9B
What's Changed
- feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
- Add SpacemiT MTMD build workflow and documentation by @co-seven in #2
Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.0
b6783
SYCL SET operator optimized for F32 tensors (#16350) * SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes * sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups * move SET op to standalone file, GPU-only implementation * Update SYCL SET operator for F32 * ci: fix editorconfig issues (LF endings, trailing spaces, final newline) * fixed ggml-sycl.cpp --------- Co-authored-by: Gitty Burstein <gitty@example.com>
b6556
zdnn: refactor codebase + add docs (#16178) * zdnn: initial matmul refactor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rm static from funcs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update ggml-zdnn.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: change header files to hpp Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch to common.hpp Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move mulmat forward around Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rm inline from utils Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: add zDNN docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
b6517
ggml-amx : fix ggml_amx_init() on generic Linux (#16049)
Generalize Linux check to `__linux__` to support non-glibc systems (like musl).
Also, return `false` on unknown/untested OS.
Without this commit, the code compiles (with warnings) but fails:
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8488C)
build: 6487 (51c4cac6) with x86_64-linux-musl-gcc (GCC) 15.1.0 for x86_64-linux-musl (debug)
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
....
print_info: n_ctx_orig_yarn = 262144
print_info: rope_finetuned = unknown
print_info: model type = 4B
Illegal instruction (core dumped)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
b6503
CANN: Remove print (#16044) Signed-off-by: noemotiovon <757486878@qq.com>
b6192
Fix broken build: require updated pip to support --break-system-packa…
b6141
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …
b6135
musa: fix failures in test-backend-ops for mul_mat_id op (#15236) * musa: fix failures in test-backend-ops for mul_mat_id op Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b5986
sched : fix multiple evaluations of the same graph with pipeline para…