server : hint preserve_thinking when supported by chat template by ggerganov · Pull Request #25079 · ggml-org/llama.cpp

ggerganov · 2026-06-27T14:40:02Z

Overview

Print a hint to enable preserve_thinking kwarg when the chat template supports it.

# llama serve -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q4_K_M
0.00.571.359 I cmn  common_param: common_params_print_info: verbosity = 3 (adjust with the `-lv N` CLI arg)
0.00.573.173 I srv    load_model: loading model 'unsloth/Qwen3.6-27B-MTP-GGUF:Q4_K_M'
0.08.345.590 I srv    load_model: initializing, n_slots = 4, n_ctx_slot = 131072, kv_unified = 'true'
0.08.378.594 W srv          init: chat template supports 'preserve_thinking' - consider using --chat-template-kwargs "{\"preserve_thinking\": true}" (ref: https://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking)
0.08.378.603 I srv  llama_server: model loaded
0.08.378.607 I srv  llama_server: listening on http://0.0.0.0:8013

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES. pi:llama.cpp/Qwen3.6-27B

ngxson · 2026-06-27T15:59:44Z

IMO it's could better be one of the jinja caps (to make it more generic), although I'm not sure how other templates support this function (i.e. do they use the same preserve_thinking, or another mechanism?), cc @pwilkin @aldehir if you have any insights on this

aldehir · 2026-06-27T19:09:52Z

A cap seems more applicable for consistency but I think either way works for something this simple.

As far as other templates go, I believe only Qwen 3.6 supports this for now. There's no competing variables to enable this.

ngxson · 2026-06-27T21:32:22Z

@aldehir I'm thinking about a more broader case where we can somehow make this compatible with other templates. It seems like most templates only preserve reasoning_content for last assistant message, not the whole history. We may need a hack to make it work (I'm investigating that)

I think such feature would still be quite useful. From time to time I've seen issues asking for such feature

Update: seems like GLM-4.7 has clear_thinking that is the opposite of preserve_thinking

aldehir · 2026-06-28T00:00:36Z

I see. We can generalize this feature with capabilities, i.e. storing the field name and normalizing the value (clear_thinking requires false while preserve_thinking needs true).

That said, not sure how easy it would be to generalize implementation for templates that don't natively support this. I also don't know the impact since they are likely not trained to retain thinking.

ngxson · 2026-06-28T00:10:22Z

I scanned through templates inside models/templates and realize that most models check for last_user_index or last_user_idx to remove the reasoning content, so I'm quite confident that the solution could be quite simple: always force last_user_index=0 and ignore value from set statement. (So, we need to add a notion of "read-only" key)

That being said, it's still technically a hack, but just not a (too) messy one. Will try to push a PoC tomorrow to see how it goes.

Detect if the chat template supports the 'preserve_thinking' kwarg (by checking for its presence in the template source) and print a hint suggesting users enable it via --chat-template-kwargs. This is particularly useful for models like Qwen3.6 where preserve_thinking is recommended but many users are unaware of the option. ref: https://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking Assisted-by: pi:llama.cpp/Qwen3.6-27B

Print a hint to enable preserve_thinking kwarg when the template supports it. ref: https://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking Assisted-by: pi:llama.cpp/Qwen3.6-27B

ngxson · 2026-06-28T15:43:41Z

Hmm ok so hacking it is more complicated than I though, so I ended up abandon it. My idea was to record any if_statement and try to flip them to see which one control the reasoning output, then force them to true later.

In anyway, I added #25105 that simply translate a generic --reasoning-preserve flag into model-specific flag, I've found 3 of them:

preserve_thinking
clear_thinking (GLM-4.7)
truncate_history_thinking (NVIDIA-Nemotron-3-Nano-30B-A3B-BF16)

github-actions Bot added the server label Jun 27, 2026

ggerganov force-pushed the gg/preserve-thinking-hint branch from aec9522 to eae7149 Compare June 27, 2026 14:41

ggerganov added 3 commits June 28, 2026 08:53

server : hint preserve_thinking when supported by chat template

84ff9ef

Print a hint to enable preserve_thinking kwarg when the template supports it. ref: https://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking Assisted-by: pi:llama.cpp/Qwen3.6-27B

server : hint preserve_thinking when supported by chat template

adba174

Print a hint to enable preserve_thinking kwarg when the template supports it. ref: https://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking Assisted-by: pi:llama.cpp/Qwen3.6-27B

ggerganov force-pushed the gg/preserve-thinking-hint branch from 8cb25f9 to adba174 Compare June 28, 2026 05:55

ngxson mentioned this pull request Jun 28, 2026

jinja, chat: add --reasoning-preserve flag #25105

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

server : hint preserve_thinking when supported by chat template#25079

server : hint preserve_thinking when supported by chat template#25079
ggerganov wants to merge 3 commits into
masterfrom
gg/preserve-thinking-hint

ggerganov commented Jun 27, 2026 •

edited

Loading

Uh oh!

ngxson commented Jun 27, 2026 •

edited

Loading

Uh oh!

aldehir commented Jun 27, 2026

Uh oh!

ngxson commented Jun 27, 2026 •

edited

Loading

Uh oh!

aldehir commented Jun 28, 2026

Uh oh!

ngxson commented Jun 28, 2026 •

edited

Loading

Uh oh!

ngxson commented Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

ggerganov commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Requirements

Uh oh!

ngxson commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aldehir commented Jun 27, 2026

Uh oh!

ngxson commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aldehir commented Jun 28, 2026

Uh oh!

ngxson commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ngxson commented Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ggerganov commented Jun 27, 2026 •

edited

Loading

ngxson commented Jun 27, 2026 •

edited

Loading

ngxson commented Jun 27, 2026 •

edited

Loading

ngxson commented Jun 28, 2026 •

edited

Loading