server : hint preserve_thinking when supported by chat template#25079
ggerganov wants to merge 3 commits into
A cap seems more applicable for consistency, but I think either way works for something this simple. As far as other templates go, I believe only Qwen 3.6 supports this for now. There are no competing variables to enable this.
@aldehir I'm thinking about a broader case where we can somehow make this compatible with other templates. It seems like most templates only preserve I think such a feature would still be quite useful. From time to time I've seen issues asking for such a feature. Update: seems like GLM-4.7 has
I see. We can generalize this feature with capabilities, i.e. storing the field name and normalizing the value (…). That said, I'm not sure how easy it would be to generalize the implementation for templates that don't natively support this. I also don't know the impact, since those models are likely not trained to retain thinking.
I scanned through the templates inside That being said, it's still technically a hack, just not a (too) messy one. I'll try to push a PoC tomorrow to see how it goes.
Detect whether the chat template supports the 'preserve_thinking' kwarg (by checking for its presence in the template source) and print a hint suggesting that users enable it via --chat-template-kwargs. This is particularly useful for models like Qwen3.6, where preserve_thinking is recommended but many users are unaware of the option.
ref: https://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking
Assisted-by: pi:llama.cpp/Qwen3.6-27B
Print a hint to enable the preserve_thinking kwarg when the template supports it.
ref: https://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking
Assisted-by: pi:llama.cpp/Qwen3.6-27B
Hmm, OK, so hacking it is more complicated than I thought, so I ended up abandoning it. My idea was to record any In any case, I added #25105, which simply translates a generic
Overview
ref #24093 (comment)
Print a hint to enable the preserve_thinking kwarg when the chat template supports it.
Requirements