Strip duplicate channel markers from message headers by bbrowning · Pull Request #97 · openai/harmony

bbrowning · 2026-02-10T16:46:10Z

gpt-oss models occasionally emit extra <|channel|> markers in message headers during deep multi-turn conversations. Instead of erroring, the parser now strips all duplicate channel markers after extracting the first one, using the same boundary logic (stop at whitespace or '<') that is used when extracting the first channel marker.

This also marks parse_header_from_string as pub(crate) to allow for direct unit testing, and adds tests of the previous and new behavior to make sure that channels are extracted properly.

I have personally observed this in multiple real-world scenarios with vLLM and gpt-oss-20b specifically. gpt-oss-120b appears far less prone to this type of failure in following its format. See vllm-project/vllm#32587 and vllm-project/vllm#31677 for vLLM community reports of this as well.

This still takes the first channel marker in the header as before. Any subsequent ones are dropped vs replacing the first we saw. This was preferred to throwing an error because in the vLLM code paths here we're handling the models generated response so anything we can clean up or fix in real-world inference scenarios without ambiguity or loss seem like easy wins to improve the reliability of the models in the wild.

gpt-oss models occasionally emit extra <|channel|> markers in message headers during deep multi-turn conversations. Instead of erroring, the parser now strips all duplicate channel markers after extracting the first one, using the same boundary logic (stop at whitespace or '<') that is used when extracting the first channel marker. This also marks `parse_header_from_string` as `pub(crate)` to allow for direct unit testing, and adds tests of the previous and new behavior to make sure that channels are extracted properly.

neon12345 · 2026-02-18T06:11:45Z

I tried this patch and still get errors from wrong tool call parsing. There are more open pull requests regarding the harmony tool call parsing in the vllm repo. Perhaps it is better to make a combined effort towards a more robust parser?

bbrowning · 2026-02-24T20:30:00Z

@neon12345 This change is focused on fixing one targeted failure mode I've observed with gpt-oss-20b in multi-turn scenarios with vLLM. However, you're right that it doesn't fix all potential tool call parsing scenarios. If you want to open an issue in this repository (if you're hitting Harmony errors) or in the vllm repository (if you're getting generic vllm issues with gpt-oss models), I'm happy to take a look and see if we can track down the root cause of that.

So, I'm +1 to making parsing more robust. But, I also don't want larger and longer-term plans around the entire strategy and how vLLM integrates with this library to delay shipping targeted fixes that impact users today.

We're seeing a lot of failures with gpt-oss-20b and gpt-oss-120b models in vLLM where we get <|channel|> markers leaking into tool call names and the model outputting reasoning after <|constrain|> tokens in very long multi-turn conversations. These are fairly trivial to reproduce with gpt-oss-20b using BFCL multi_turn_base as a simple eval, and occasionally show up with gpt-oss-120b as well. In SWE Bench Verified with gpt-oss-120b, this shows up a lot more. Here's an example of the kind of errors we see across 500 tests cases with SWE Bench Verified and gpt-oss-120b: ``` HarmonyErrors (special token leaks in message parsing): 122x Unknown role: assistant<|channel|>commentary 25x Unknown role: assistant<|channel|>bash<|channel|>commentary 24x unexpected tokens remaining in message header: Some("to=functions.bash") 3x Unknown role: final 2x Unknown role: assistant<|channel|>bash<|channel|>analysis 2x unexpected tokens remaining in message header: Some("<|constrain|>commentary") 2x Unknown role: assistant<|channel|>analysis 1x unexpected tokens remaining in message header: Some("<|constrain|>control<|end|><|start|>assistant<|channel|>commentary" 1x unexpected tokens remaining in message header: Some("bash") 1x unexpected tokens remaining in message header: Some("to=functions.bash # We'll create conftest.py with needed code<|end| 1x unexpected tokens remaining in message header: Some("to=functions.bash (We attempt to") 1x unexpected tokens remaining in message header: Some("<|constrain|>Complete_TASK_AND_SUBMIT_FINAL_OUTPUT &&") 1x could not decode header 1x Unknown role: assistant<|channel|>bash<|channel|>commentary,json 1x unexpected tokens remaining in message header: Some("to=functions.bash\"I'm thinking the repository is not trusted. We'l 1x unexpected tokens remaining in message header: Some("to=functions.bash by using git checkout...<|end|><|start|>assistant 1x unexpected tokens remaining in message header: Some("to=functions.bash as<|end|><|start|>assistant<|channel|>commentary" ``` Through experimentation and closely monitoring what the models output versus the tokens we send back in as input, I discovered two important things. First, for tool calls output to the commentary channel, the models always output the recipient and channel order backwards from what encoding.rs was doing. So, this swaps the order of the channel and recipient when outputting tool calls to the commentary channel to match what the model does, which presumably is a closer match to the training data. In anecdotal testing this seems to greatly reduce if not eliminate all the extra <|channel|> markers we were seeing in the model output. So, this change may obviate the need for openai#97. Second, we caught examples of the model emitting reasoning text after a <|constrain|> token in long multi-turn conversations. A real-world example of that we saw: `<|start|>assistant<|channel|>commentary to=functions.echo<|constrain|>2.0? Actually we need to write content. Use echo.<|end|><|start|>assistant<|channel|>commentary to=functions.echo<|constrain|>json<|message|>{"content": "2.0000", "file_name": "result.txt"}<|call|>`. To fix this, instead of ever passing in a bare content-type (such as `json`), this PR always emits the `<|constrain|>` special token before that content type. In long multi-turn scenarios this appears to solve the problem of the model incorrectly reasoning after that token. This is perhaps also partially a side-effect of roundtripping this model's outputs via APIs that have no provision to include a "content type" that the client can send back in, so we loose the details of whether the model output a `json` or `<|constrain|>json` and have to reconstruct as best we can on the inference side. We'll do some more testing on our side, but I wanted to go ahead and open this PR to get some eyes on the issue and any feedback about these fixes.

We're seeing a lot of failures with gpt-oss-20b and gpt-oss-120b models in vLLM where we get <|channel|> markers leaking into tool call names and the model outputting reasoning after <|constrain|> tokens in very long multi-turn conversations. These are fairly trivial to reproduce with gpt-oss-20b using BFCL multi_turn_base as a simple eval, and occasionally show up with gpt-oss-120b as well. In SWE Bench Verified with gpt-oss-120b, this shows up a lot more. Here's an example of the kind of errors we see across 500 tests cases with SWE Bench Verified and gpt-oss-120b: ``` HarmonyErrors (special token leaks in message parsing): 122x Unknown role: assistant<|channel|>commentary 25x Unknown role: assistant<|channel|>bash<|channel|>commentary 24x unexpected tokens remaining in message header: Some("to=functions.bash") 3x Unknown role: final 2x Unknown role: assistant<|channel|>bash<|channel|>analysis 2x unexpected tokens remaining in message header: Some("<|constrain|>commentary") 2x Unknown role: assistant<|channel|>analysis 1x unexpected tokens remaining in message header: Some("<|constrain|>control<|end|><|start|>assistant<|channel|>commentary" 1x unexpected tokens remaining in message header: Some("bash") 1x unexpected tokens remaining in message header: Some("to=functions.bash # We'll create conftest.py with needed code<|end| 1x unexpected tokens remaining in message header: Some("to=functions.bash (We attempt to") 1x unexpected tokens remaining in message header: Some("<|constrain|>Complete_TASK_AND_SUBMIT_FINAL_OUTPUT &&") 1x could not decode header 1x Unknown role: assistant<|channel|>bash<|channel|>commentary,json 1x unexpected tokens remaining in message header: Some("to=functions.bash\"I'm thinking the repository is not trusted. We'l 1x unexpected tokens remaining in message header: Some("to=functions.bash by using git checkout...<|end|><|start|>assistant 1x unexpected tokens remaining in message header: Some("to=functions.bash as<|end|><|start|>assistant<|channel|>commentary" ``` Through experimentation and closely monitoring what the models output versus the tokens we send back in as input, I discovered two important things. First, for tool calls output to the commentary channel, the models always output the recipient and channel order backwards from what encoding.rs was doing. So, this swaps the order of the channel and recipient when outputting tool calls to the commentary channel to match what the model does, which presumably is a closer match to the training data. In anecdotal testing this seems to greatly reduce if not eliminate all the extra <|channel|> markers we were seeing in the model output. So, this change may obviate the need for openai#97. Second, we caught examples of the model emitting reasoning text after a <|constrain|> token in long multi-turn conversations. A real-world example of that we saw: `<|start|>assistant<|channel|>commentary to=functions.echo<|constrain|>2.0? Actually we need to write content. Use echo.<|end|><|start|>assistant<|channel|>commentary to=functions.echo<|constrain|>json<|message|>{"content": "2.0000", "file_name": "result.txt"}<|call|>`. To fix this, instead of ever passing in a bare content-type (such as `json`), this PR always emits the `<|constrain|>` special token before that content type. In long multi-turn scenarios this appears to solve the problem of the model incorrectly reasoning after that token. This is perhaps also partially a side-effect of roundtripping this model's outputs via APIs that have no provision to include a "content type" that the client can send back in, so we loose the details of whether the model output a `json` or `<|constrain|>json` and have to reconstruct as best we can on the inference side. We'll do some more testing on our side, but I wanted to go ahead and open this PR to get some eyes on the issue and any feedback about these fixes. Signed-off-by: Ben Browning <bbrownin@redhat.com>

bbrowning mentioned this pull request Feb 10, 2026

[Bugfix] Sanitize malformed tool call recipients in Harmony parser vllm-project/vllm#31677

Open

5 tasks

bbrowning mentioned this pull request Apr 3, 2026

[Bug]: gpt-oss special tokens leak into tool names and responses despite reasoning_parser: "openai_gptoss" vllm-project/vllm#32587

Open

1 task

bbrowning mentioned this pull request Apr 17, 2026

Fix encoding of channel, recipient, and constrain #114

Open

bbrowning mentioned this pull request May 13, 2026

Fix encoding of channel, recipient, and constrain oss-harmony/harmony#6

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strip duplicate channel markers from message headers#97

Strip duplicate channel markers from message headers#97
bbrowning wants to merge 1 commit into
openai:mainfrom
bbrowning:strip-extra-channels

bbrowning commented Feb 10, 2026

Uh oh!

neon12345 commented Feb 18, 2026

Uh oh!

bbrowning commented Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bbrowning commented Feb 10, 2026

Uh oh!

neon12345 commented Feb 18, 2026

Uh oh!

bbrowning commented Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants