openai: sanitize NUL/C0 bytes from reasoning_content (carved from #41) #114
Open
n-WN wants to merge 1 commit into
Kimi K2.6 has been observed emitting a stray `\x00` mid-`reasoning_content` chain-of-thought and then refusing the same string on replay with HTTP 400 "the reasoning_content at position N must be a valid UTF-8 string: string contains \x00".

Filter NUL and other C0 control bytes (`\t`, `\n`, `\r` preserved, DEL dropped) at extraction time so the multi-turn round-trip stays clean for every downstream consumer (request serializer, session_store, trace emitter) without per-call-site guards.

The sanitizer is applied to both extraction sources:
- the dedicated `reasoning_content` / `reasoning` field
- the `<think>…</think>` fallback inside `content`

Originally bundled with the empty-turn-synth work in #41; carving it out so the clean v16 puffer fix can land independently of the kimi-oauth credentials in that branch.

Traced to qemu-startup retry-03 on the honest Kimi TB2 run, which 400'd on replay until the same patch was applied to the v16 binary.
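For readers who want the filter contract in concrete form, here is a minimal sketch of what such a sanitizer could look like. The name `sanitize_reasoning_text` comes from this PR, but the signature (`&str` in, `String` out) and the body are assumptions for illustration, not the code in the diff. It spells out the C0 range explicitly rather than using `char::is_control`, because that method also matches C1 controls, which the description does not mention.

```rust
/// Illustrative sketch only: the signature and body are assumed, not taken
/// from the PR diff.
///
/// Drops C0 control characters (U+0000..=U+001F) and DEL (U+007F), while
/// keeping tab, line feed, and carriage return. All other characters,
/// including non-ASCII text, pass through unchanged.
pub fn sanitize_reasoning_text(input: &str) -> String {
    input
        .chars()
        .filter(|&c| match c {
            // Whitespace controls that are meaningful in reasoning text.
            '\t' | '\n' | '\r' => true,
            // Remaining C0 controls (including NUL) and DEL are dropped.
            '\u{00}'..='\u{1F}' | '\u{7F}' => false,
            // Everything else is kept as-is.
            _ => true,
        })
        .collect()
}
```

Filtering once at extraction time, as the description proposes, means the stored string is already safe to replay, so serializers and trace emitters do not each need their own guard.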
Summary
- Strip NUL (`\x00`) and other C0 control bytes from extracted `reasoning_content` so the multi-turn round-trip stays clean. `\t`, `\n`, `\r` are preserved; DEL is dropped.
- Apply the filter to both extraction sources: the dedicated `reasoning_content` / `reasoning` field, and the `<think>…</think>` fallback inside `content`.
- Expose `sanitize_reasoning_text` so other crates (request serializers, session-store writers) can apply the same filter when data arrives via a path other than `extract_chat_completions_reasoning`.

Motivation
Kimi K2.6 has been observed emitting a stray `\x00` inside its own `reasoning_content` chain-of-thought, then refusing the same string on replay the next turn with HTTP 400: "the reasoning_content at position N must be a valid UTF-8 string: string contains \x00".

This is the same fix that shipped in the v16 puffer-kimi binary as part of the honest TB2 run (24/89 PASS). It is carved out here to land standalone, since the original #41 branch bundles kimi-oauth credentials that don't belong on master.

Traced to `qemu-startup retry-03` failures on the honest Kimi TB2 run, where every retry 400'd on replay until this sanitizer was added to the build.

Test plan
- `cargo test --package puffer-provider-openai --test reasoning_extraction`: 9 tests pass (7 existing + 2 new sanitizer tests + 1 unit test for `sanitize_reasoning_text` directly)
- `cargo check --package puffer-provider-openai --package puffer-core`: clean
- `strips_nul_byte_from_reasoning_content` confirms NUL is removed while `\t` and `\n` survive
- `strips_control_bytes_from_think_block` confirms the `<think>` fallback path is also sanitized
- `sanitize_preserves_whitespace_and_strips_del` documents the exact filter contract
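
To illustrate why both extraction paths need the filter, the following is a rough sketch of a two-source extractor. The helper names `sanitize` and `extract_reasoning_sketch` and their signatures are hypothetical stand-ins; the real `extract_chat_completions_reasoning` in this crate may be structured quite differently. The sketch assumes the dedicated field wins when present and non-empty, and that the fallback takes the first `<think>…</think>` block in `content`.

```rust
/// Assumed filter contract from the PR description: drop C0 controls and DEL,
/// keep tab, line feed, and carriage return. (Hypothetical helper.)
fn sanitize(input: &str) -> String {
    input
        .chars()
        .filter(|&c| {
            matches!(c, '\t' | '\n' | '\r')
                || !matches!(c, '\u{00}'..='\u{1F}' | '\u{7F}')
        })
        .collect()
}

/// Hypothetical extractor: prefer the dedicated reasoning field, otherwise
/// fall back to the first `<think>…</think>` block inside `content`.
/// Whichever source is used, the result goes through the same sanitizer.
fn extract_reasoning_sketch(dedicated: Option<&str>, content: &str) -> Option<String> {
    let raw = match dedicated {
        Some(text) if !text.is_empty() => text,
        _ => {
            // Byte offsets are safe to slice on: both tags are pure ASCII.
            let start = content.find("<think>")? + "<think>".len();
            let end = start + content[start..].find("</think>")?;
            &content[start..end]
        }
    };
    Some(sanitize(raw))
}
```

Routing both branches through a single `sanitize` call at the end is one way to make sure a control byte cannot slip through the fallback path, which is the property the `<think>`-block test above is checking.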