openai: sanitize NUL/C0 bytes from reasoning_content (carved from #41) by n-WN · Pull Request #114 · berabuddies/puffer

n-WN · 2026-05-13T14:13:21Z

Summary

Filter NUL (\x00) and other C0 control bytes from extracted reasoning_content so the multi-turn round-trip stays clean. \t, \n, \r are preserved; DEL is dropped.
Applies to both extraction sources: the dedicated reasoning_content / reasoning field, and the <think>…</think> fallback inside content.
Exports sanitize_reasoning_text so other crates (request serializers, session-store writers) can apply the same filter when data arrives via a path other than extract_chat_completions_reasoning.
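
The filter contract described above can be sketched as follows. This is a minimal illustration and not the code from this PR: the name `sanitize_reasoning_text` comes from the summary, but its signature here (`&str` in, `String` out) is an assumption, and `extract_think_block` is a hypothetical helper that only shows how the `<think>` fallback path could reuse the same filter.

```rust
/// Sketch of the filter contract: drop C0 control characters
/// (U+0000..=U+001F) and DEL (U+007F), but keep tab, LF, and CR.
/// The signature is an assumption; the PR only names the function.
pub fn sanitize_reasoning_text(input: &str) -> String {
    input
        .chars()
        .filter(|&c| match c {
            '\t' | '\n' | '\r' => true,              // preserved whitespace
            '\u{00}'..='\u{1f}' | '\u{7f}' => false, // NUL, other C0, DEL
            _ => true,                               // everything else passes through
        })
        .collect()
}

/// Hypothetical helper (not from the PR): take the first
/// `<think>…</think>` block out of `content` and sanitize it.
pub fn extract_think_block(content: &str) -> Option<String> {
    let start = content.find("<think>")? + "<think>".len();
    let end = start + content[start..].find("</think>")?;
    Some(sanitize_reasoning_text(&content[start..end]))
}
```

The sketch matches on explicit ranges rather than calling `char::is_control`, because `is_control` also covers the C1 range (U+0080..=U+009F), which the summary does not mention.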

Motivation

Kimi K2.6 has been observed emitting a stray \x00 inside its own reasoning_content chain-of-thought, then refusing the same string on replay the next turn with:

HTTP 400 "the reasoning_content at position N must be a valid UTF-8 string: string contains \x00"

This is the same fix that shipped in the v16 puffer-kimi binary as part of the honest TB2 run (24/89 PASS) — carved out here to land standalone, since the original #41 branch bundles kimi-oauth credentials that don't belong on master.

Traced to qemu-startup retry-03 failures on the honest Kimi TB2 run, where every retry 400'd on replay until this sanitizer was added to the build.

Test plan

cargo test --package puffer-provider-openai --test reasoning_extraction — 9 tests pass (7 existing + 2 new sanitizer tests + 1 unit test for sanitize_reasoning_text directly)
cargo check --package puffer-provider-openai --package puffer-core — clean
New test strips_nul_byte_from_reasoning_content confirms NUL is removed while \t and \n survive
New test strips_control_bytes_from_think_block confirms the <think> fallback path is also sanitized
New test sanitize_preserves_whitespace_and_strips_del documents the exact filter contract

Kimi K2.6 has been observed emitting a stray \x00 mid-`reasoning_content` chain-of-thought and then refusing the same string on replay with HTTP 400 "the reasoning_content at position N must be a valid UTF-8 string: string contains \x00". Filter NUL and other C0 control bytes (\t \n \r preserved, DEL dropped) at extraction time so the multi-turn round-trip stays clean for every downstream consumer (request serializer, session_store, trace emitter) without per-call-site guards.

The sanitizer is applied to both extraction sources:

- the dedicated `reasoning_content` / `reasoning` field
- the `<think>…</think>` fallback inside `content`

Originally bundled with the empty-turn-synth work in #41; carving it out so the clean v16 puffer fix can land independently of the kimi-oauth credentials in that branch. Traced to qemu-startup retry-03 on the honest Kimi TB2 run, which 400'd on replay until the same patch was applied to the v16 binary.

n-WN wants to merge 1 commit into master from n-WN/reasoning-control-byte-sanitize
