Commit 53386d2
fix(server): revert Qwen3.5 think block injection (3/7 → 5/7)
The official enable_thinking=False method (injecting <think></think>
in ChatML) made RLV Acme results WORSE:
With injection: 3/7
Without (logit suppression only): 5/7
The <think></think> block in the prompt confused the model's
response pattern, causing it to output document sections instead
of extracted answers.
quant.h's logit suppression (ba8a615) is the correct approach:
it prevents thinking mode without altering the prompt structure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>1 parent ba8a615 commit 53386d2
1 file changed
+6
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
105 | 105 | | |
106 | 106 | | |
107 | 107 | | |
108 | | - | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
109 | 113 | | |
| 114 | + | |
110 | 115 | | |
111 | 116 | | |
112 | 117 | | |
| |||
0 commit comments