server : fix 501 on multimodal models blocking text-only slot save/restore (#21133) by CHIPMUNK-T0T · Pull Request #25076 · ggml-org/llama.cpp

CHIPMUNK-T0T · 2026-06-27T11:04:11Z

Overview

When a multimodal model (e.g. Qwen3.5) is loaded with an mmproj, the llama server returns HTTP 501 for /slots save / restore / erase unconditionally, even for text-only conversations, because the check looks at model capability (mctx) rather than the slot's actual content.
As a result, the prefill cache cannot be stored or reused even for text-only conversations, which leads to slow TTFT for long prompts.
This PR gates these operations on the slot's content (has_media()) instead: a text-only slot is allowed, and only a slot that holds media is rejected.
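
The difference between the two checks can be illustrated with a small Python model. This is only a sketch of the behavior described above, not the server's C++ code: `Slot`, `gate_old`, and `gate_new` are hypothetical names, and the only detail taken from the PR is that the new check depends on `has_media()` rather than on whether an mmproj is loaded.

```python
from dataclasses import dataclass, field

HTTP_OK = 200
HTTP_NOT_IMPLEMENTED = 501


@dataclass
class Slot:
    """Hypothetical stand-in for a server slot (not the real C++ type)."""
    text_tokens: list = field(default_factory=list)
    media_chunks: list = field(default_factory=list)

    def has_media(self) -> bool:
        # True only when the slot actually holds image/audio chunks
        return len(self.media_chunks) > 0


def gate_old(mmproj_loaded: bool, slot: Slot) -> int:
    # Old behavior as described: reject whenever the server is multimodal,
    # regardless of what the slot contains
    return HTTP_NOT_IMPLEMENTED if mmproj_loaded else HTTP_OK


def gate_new(mmproj_loaded: bool, slot: Slot) -> int:
    # New behavior as described: reject only when the slot itself holds media
    return HTTP_NOT_IMPLEMENTED if slot.has_media() else HTTP_OK


text_slot = Slot(text_tokens=[1, 2, 3])
image_slot = Slot(text_tokens=[1, 2], media_chunks=["image-chunk"])

print(gate_old(True, text_slot), gate_new(True, text_slot))
print(gate_old(True, image_slot), gate_new(True, image_slot))
```

In this model a text-only slot on an mmproj server goes from 501 to 200, while a slot holding an image stays at 501 under both checks.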

Additional information

This PR helps downstream consumers (e.g. Ollama) that reuse a prefilled cache across requests via slot save/restore on a multimodal model, which was previously blocked whenever an mmproj was loaded.
Only the text-only case is newly allowed; all other behavior is unchanged — slots that actually hold media are still rejected, and text-only (non-multimodal) servers are unaffected.
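
For reference, a client-side sketch of the slot requests this affects. It assumes the server's documented `POST /slots/{id_slot}?action=save|restore|erase` endpoints with a JSON `filename` field, and a server started with `--slot-save-path`. The base URL, slot id, file name, and the helper names `build_slot_request` and `run_slot_action` are illustrative assumptions, not part of this PR.

```python
import json
import urllib.error
import urllib.request


def build_slot_request(base_url, slot_id, action, filename=None):
    """Build (but do not send) a POST request for a slot action."""
    if action not in ("save", "restore", "erase"):
        raise ValueError(f"unsupported slot action: {action}")
    if action != "erase" and filename is None:
        raise ValueError(f"action '{action}' needs a filename")
    url = f"{base_url}/slots/{slot_id}?action={action}"
    # save and restore carry a file name; erase needs no payload
    payload = {} if action == "erase" else {"filename": filename}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_slot_action(request):
    """Send the request and return the HTTP status code (needs a live server)."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A rejected slot action (e.g. 501) surfaces here as an HTTPError
        return err.code


# Hypothetical values: adjust the URL, slot id, and file name to your setup
save_req = build_slot_request("http://localhost:8080", 0, "save", "prefix.bin")
print(save_req.get_method(), save_req.full_url)
```

Against a server with an mmproj loaded, `run_slot_action(save_req)` would be expected to return 501 before this change and succeed afterwards for a text-only slot.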

Notes:

The restore path keeps using the existing llama_state_seq_* format. Since that format cannot store media chunks, a restored slot is always text-only, so no media gate is needed there.
On save, get_text_tokens() is used instead of get_tokens() (which asserts !has_mtmd); the slot is confirmed media-free first.
Serializing image/audio chunks is out of scope here and could be a follow-up.
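
The ordering on the save path (confirm the slot is media-free, then read the text tokens) can be sketched as a Python model. `SlotTokens` and `save_slot` are hypothetical. The only behavior taken from the notes above is that `get_tokens()` asserts `!has_mtmd` and that `get_text_tokens()` is used once the slot is known to hold no media. What `get_text_tokens()` would return for a slot with media is not stated in the PR and is not modeled.

```python
class SlotTokens:
    """Hypothetical model of a slot's token container (not the real C++ class)."""

    def __init__(self, text_tokens, media_chunks=(), has_mtmd=False):
        self.text_tokens = list(text_tokens)
        self.media_chunks = list(media_chunks)
        self.has_mtmd = has_mtmd  # True when the server has an mmproj loaded

    def has_media(self):
        return len(self.media_chunks) > 0

    def get_tokens(self):
        # Per the notes: this getter asserts !has_mtmd, so it cannot be used
        # on a multimodal server even when the slot is text-only
        if self.has_mtmd:
            raise AssertionError("get_tokens() requires !has_mtmd")
        return self.text_tokens

    def get_text_tokens(self):
        # Only called after has_media() returned False in this sketch
        return list(self.text_tokens)


def save_slot(tokens):
    """Return (status, tokens_to_write) for a save request."""
    if tokens.has_media():
        # The existing state format cannot store media chunks
        return 501, None
    return 200, tokens.get_text_tokens()


text_only = SlotTokens([10, 11, 12], has_mtmd=True)
with_image = SlotTokens([10, 11], media_chunks=["image-chunk"], has_mtmd=True)

print(save_slot(text_only))
print(save_slot(with_image))
```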

Testing:

New tests on a multimodal model (tinygemma3): text-only save / restore / erase succeed, while saving a slot that holds an image is rejected with HTTP 501. The existing text-only tests are unchanged.
Local run: clean build (-j8), test_slot_save.py passing 5/5.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: Yes
- The original motivation came from me. I use Ollama and wanted faster decoding and TTFT. I had AI tools (Codex and Claude Code) help investigate the issue and assess its impact and scope, but I made the final decision on every implementation change myself.
- I have reviewed all changes, understand them fully, and can explain any line without AI assistance.

ISSUE #21133

ggml-gh-bot · 2026-06-27T11:08:29Z

Hi @CHIPMUNK-T0T, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

CHIPMUNK-T0T · 2026-06-27T13:35:11Z

Thanks for the notice.

To clarify my AI usage: I am a native Japanese speaker, so I wrote the original explanation and reasoning in Japanese and used AI assistance to translate and polish the English text. The design, implementation decisions, final patch, and tests are mine. I reviewed the final English description and code myself, and I can explain and maintain the changes.

If this still does not satisfy the project policy, I am happy to shorten or rewrite the PR description in simpler wording.

fix 501 on multimodal models blocking text-only slot save/restore

09349e7

CHIPMUNK-T0T requested a review from a team as a code owner June 27, 2026 11:04

github-actions Bot added the server label Jun 27, 2026

CHIPMUNK-T0T wants to merge 1 commit into ggml-org:master from CHIPMUNK-T0T:feat/mtmd-slot-save-restore
