server : fix 501 on multimodal models blocking text-only slot save/restore (#21133) #25076

Open
CHIPMUNK-T0T wants to merge 1 commit into ggml-org:master from CHIPMUNK-T0T:feat/mtmd-slot-save-restore

@CHIPMUNK-T0T CHIPMUNK-T0T commented Jun 27, 2026

Overview

When I loaded a multimodal model (e.g. Qwen3.5) with an mmproj, the llama server returned HTTP 501 for slot operations unconditionally, even for text-only conversations, because the check looked at model capability (mctx) rather than the slot's actual content.
As a result, the prefill cache could not be stored or reused even for text-only conversations, which leads to slow TTFT for long prompts.
The affected operations were /slots save / restore / erase on a server with --mmproj loaded. With this PR they gate on the slot's content (has_media()) instead: a text-only slot is allowed, and only a slot that holds media is rejected.
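
The difference between the old and new gates can be modeled in a few lines. This is a toy Python sketch, not the server's C++ code: `Slot`, `old_gate`, and `new_gate` are hypothetical names, and only `has_media()` and the HTTP 501 status come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    # Hypothetical stand-in for a server slot: text tokens plus any media chunks.
    text_tokens: list = field(default_factory=list)
    media_chunks: list = field(default_factory=list)

    def has_media(self) -> bool:
        # True only when the slot actually holds image/audio content.
        return len(self.media_chunks) > 0


def old_gate(mmproj_loaded: bool, slot: Slot) -> int:
    """Capability-based check: rejects whenever an mmproj is loaded."""
    return 501 if mmproj_loaded else 200


def new_gate(mmproj_loaded: bool, slot: Slot) -> int:
    """Content-based check: rejects only slots that actually hold media."""
    return 501 if slot.has_media() else 200
```

With an mmproj loaded, a text-only slot yields 501 under `old_gate` and 200 under `new_gate`, while a slot holding an image yields 501 under both. On a server without an mmproj no slot can hold media, so the two gates agree.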

Additional information

This PR helps downstream consumers (e.g. Ollama) that reuse a prefilled cache across requests via slot save/restore on a multimodal model, which was previously blocked whenever an mmproj was loaded.
Only the text-only case is newly allowed; all other behavior is unchanged: slots that actually hold media are still rejected, and text-only (non-multimodal) servers are unaffected.
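
For reference, a downstream client might drive save / restore / erase roughly as follows. This is a minimal sketch, assuming the endpoint shape `POST /slots/{id}?action=...` with a JSON `filename` body as I understand it from the llama.cpp server README, and a server started with `--slot-save-path`. The helper names, host, port, slot id, and file name are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def slot_request(base_url: str, slot_id: int, action: str,
                 filename: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a slot save/restore/erase request."""
    if action not in ("save", "restore", "erase"):
        raise ValueError(f"unknown slot action: {action}")
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    body = None
    headers = {}
    if action in ("save", "restore"):
        # Assumption: save and restore take the cache file name in a JSON body.
        if filename is None:
            raise ValueError("save and restore need a filename")
        body = json.dumps({"filename": filename}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(req: urllib.request.Request):
    """Send the request and return (status, parsed JSON body or {})."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        # A 501 from the media gate surfaces here as an HTTPError.
        return err.code, {}
```

A caller would then run something like `send(slot_request("http://localhost:8080", 0, "save", "chat.bin"))` after a long text-only prompt, and later issue the matching `restore` to skip the prefill. With this PR, that sequence is expected to succeed on a server with `--mmproj` loaded as long as the slot holds no media.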

Notes:

  • The restore path keeps using the existing llama_state_seq_* format. Since that format cannot store media chunks, a restored slot is always text-only, so no media gate is needed there.
  • On save, get_text_tokens() is used instead of get_tokens() (which asserts !has_mtmd); the slot is confirmed media-free first.
  • Serializing image/audio chunks is out of scope here and could be a follow-up.
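
The accessor choice in the second note can be illustrated with a toy model. This is a sketch in Python rather than the server's C++; `ToyServerTokens`, `NotSupported`, and `tokens_for_save` are hypothetical names, and the behavior of `get_tokens()` is modeled only on the note's statement that it asserts `!has_mtmd`.

```python
class NotSupported(Exception):
    """Hypothetical error that the server would map to HTTP 501."""
    status = 501


class ToyServerTokens:
    """Hypothetical stand-in for the server's token container."""

    def __init__(self, has_mtmd, tokens, media_chunks=()):
        self.has_mtmd = has_mtmd              # capability: an mmproj is loaded
        self.tokens = list(tokens)            # text tokens held by the slot
        self.media_chunks = list(media_chunks)

    def has_media(self):
        return bool(self.media_chunks)

    def get_tokens(self):
        # Models the assertion from the note: it fails on any multimodal
        # server, regardless of what the slot actually holds.
        if self.has_mtmd:
            raise AssertionError("get_tokens() requires !has_mtmd")
        return self.tokens

    def get_text_tokens(self):
        return self.tokens


def tokens_for_save(st):
    """Confirm the slot is media-free first, then take the text tokens."""
    if st.has_media():
        raise NotSupported("slot holds media and cannot be serialized")
    return st.get_text_tokens()
```

In this model, a text-only slot on a multimodal server passes `tokens_for_save` but would trip the assertion in `get_tokens()`, which is why the save path needs the other accessor once the capability check is removed.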

Testing:

  • New tests on a multimodal model (tinygemma3): text-only save / restore / erase succeed, while saving a slot that holds an image is rejected with HTTP 501. The existing text-only tests are unchanged.
  • Local run: clean build (-j8), test_slot_save.py passing 5/5.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes
    • The original motivation came from me: I use Ollama and wanted faster decoding and TTFT. I had AI tools (Codex and Claude Code) help investigate and assess the impact and scope, but I made the final decision on every implementation change myself.
    • I have reviewed all changes, understand them fully, and can explain any line without AI assistance.

ISSUE #21133

@CHIPMUNK-T0T CHIPMUNK-T0T requested a review from a team as a code owner June 27, 2026 11:04
@ggml-gh-bot

ggml-gh-bot Bot commented Jun 27, 2026

Hi @CHIPMUNK-T0T, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

@CHIPMUNK-T0T

Author

Thanks for the notice.

To clarify my AI usage: I am a native Japanese speaker, so I wrote the original explanation and reasoning in Japanese and used AI assistance to translate and polish the English text. The design, implementation decisions, final patch, and tests are mine. I reviewed the final English description and code myself, and I can explain and maintain the changes.

If this still does not satisfy the project policy, I am happy to shorten or rewrite the PR description in simpler wording.
