server : fix 501 on multimodal models blocking text-only slot save/restore (#21133) #25076
CHIPMUNK-T0T wants to merge 1 commit
Hi @CHIPMUNK-T0T, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
Thanks for the notice. To clarify my AI usage: I am a native Japanese speaker, so I wrote the original explanation and reasoning in Japanese and used AI assistance to translate and polish the English text. The design, implementation decisions, final patch, and tests are mine. I reviewed the final English description and code myself, and I can explain and maintain the changes. If this still does not satisfy the project policy, I am happy to shorten or rewrite the PR description in simpler wording.
Overview
When I loaded a multimodal model (e.g. Qwen3.5) with an mmproj, the llama server returned HTTP 501 for slot save/restore unconditionally, even for text-only conversations, because the check looked at model capability (mctx) rather than the slot's actual content.
As a result, the prefill cache could not be stored or reused even for text-only conversations, which led to slow time to first token (TTFT) for long prompts.
The affected operations were /slots save / restore / erase on a server with --mmproj loaded. These now gate on the slot's content (has_media()): a text-only slot is allowed, and only a slot that holds media is rejected.
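The difference between the old and new checks can be illustrated with a minimal, self-contained sketch. It is not the actual patch: `slot_chunk`, `slot_state`, `gate_by_capability`, and `gate_by_content` are hypothetical stand-ins for the server's real types and handlers, and only the name `has_media()` is taken from the description above.

```cpp
#include <vector>

// Hypothetical stand-in for one cached prompt chunk; not a llama.cpp type.
struct slot_chunk {
    bool is_media; // true for an image/audio chunk, false for text tokens
};

// Hypothetical stand-in for a server slot's cached content.
struct slot_state {
    std::vector<slot_chunk> chunks;

    // True if at least one cached chunk is media.
    bool has_media() const {
        for (const auto & c : chunks) {
            if (c.is_media) {
                return true;
            }
        }
        return false;
    }
};

constexpr int HTTP_OK              = 200;
constexpr int HTTP_NOT_IMPLEMENTED = 501;

// Old behavior (as described): reject whenever an mmproj is loaded,
// regardless of what the slot actually holds.
int gate_by_capability(bool mmproj_loaded, const slot_state & /*slot*/) {
    return mmproj_loaded ? HTTP_NOT_IMPLEMENTED : HTTP_OK;
}

// New behavior (as described): reject only when the slot holds media.
int gate_by_content(const slot_state & slot) {
    return slot.has_media() ? HTTP_NOT_IMPLEMENTED : HTTP_OK;
}
```

In this sketch, a text-only slot on an mmproj-enabled server is rejected by the capability check but accepted by the content check, while a slot that holds an image chunk is rejected by both.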
Additional information
This PR helps downstream consumers (e.g. Ollama) that reuse a prefilled cache across requests via slot save/restore on a multimodal model, which was previously blocked whenever an mmproj was loaded.
Only the text-only case is newly allowed; all other behavior is unchanged — slots that actually hold media are still rejected, and text-only (non-multimodal) servers are unaffected.
Notes:
Testing:
Requirements
ISSUE #21133