fix: preserve streamed reasoning deltas in LiteLLM adapter

lorenzbaraldi · GWeale · copybara-github · commit b9625bfd7092 · 2026-06-25T16:42:19.000-07:00
Merge #4952

### Link to Issue or Description of Change

**1. Link to an existing issue (if applicable):**

**2. Or, if no issue exists, describe the change:**

Fixes: #5645

**Problem:**
In `LiteLlm` message conversion, reasoning parts were joined with an injected newline:

`reasoning_content = _NEW_LINE.join(text for text in reasoning_texts if text)`

For providers that stream reasoning in delta fragments (for example, vLLM-style reasoning chunks), this mutates the original stream by inserting extra separators, so the reconstructed reasoning can differ from the provider's output.

**Solution:**
Preserve reasoning text exactly as streamed by concatenating fragments without adding separators:

`reasoning_content = "".join(text for text in reasoning_texts if text)`

This avoids corrupting chunked reasoning while still preserving explicit newlines already present in fragments.

Also added targeted regression tests to lock in the behavior:
- `test_content_to_message_param_preserves_chunked_reasoning_deltas`
- `test_content_to_message_param_preserves_reasoning_newlines`

### Testing Plan

**Unit Tests:**
- [x] I have added or updated unit tests for my change.
- [x] All unit tests pass locally.

Summary of local `pytest` runs:
1. `python -m pytest tests/unittests/models/test_litellm.py -k "content_to_message_param_assistant_thought_and_content_message or preserves_chunked_reasoning_deltas or preserves_reasoning_newlines"`
   - Result: `3 passed, 244 deselected`
2. `python -m pytest tests/unittests/models/test_litellm.py -k "preserves_chunked_reasoning_deltas or preserves_reasoning_newlines"`
   - Result: `2 passed, 245 deselected`

**Manual End-to-End (E2E) Tests:**

### Checklist
- [x] I have read the [CONTRIBUTING.md](https://github.com/google/adk-python/blob/main/CONTRIBUTING.md) document.
- [x] I have performed a self-review of my own code.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [x] I have added tests that prove my fix is effective or that my feature works.
- [x] New and existing unit tests pass locally with my changes.
- [ ] I have manually tested my changes end-to-end.
- [ ] Any dependent changes have been merged and published in downstream modules.

### Additional context
Scope is intentionally minimal and low risk:
- 1-line behavior change in reasoning-content reconstruction.
- 2 regression tests added.
- Anthropic `thinking_blocks` path is unchanged.

Co-authored-by: George Weale <gweale@google.com>
COPYBARA_INTEGRATE_REVIEW=#4952 from lorenzbaraldi:fix/reasoning-accumulation 5a09d55
PiperOrigin-RevId: 938260836
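
### Illustration (not part of the patch)

The difference between the two join strategies can be sketched outside the adapter. This is a minimal standalone example, not the adapter code: the fragment list and the helper names `join_with_newlines` and `join_verbatim` are hypothetical, and it assumes `_NEW_LINE` is a single `"\n"`.

```python
# Hypothetical reasoning fragments, as a provider might stream them.
# The empty string stands in for an empty delta, which both joins skip.
fragments = ["Hel", "lo", "", " wor", "ld\n", "next line"]


def join_with_newlines(parts):
    """Old behavior (assuming _NEW_LINE == "\n"): a separator per fragment."""
    return "\n".join(text for text in parts if text)


def join_verbatim(parts):
    """New behavior: concatenate fragments exactly as received."""
    return "".join(text for text in parts if text)


old = join_with_newlines(fragments)
new = join_verbatim(fragments)

print(repr(old))  # 'Hel\nlo\n wor\nld\n\nnext line'
print(repr(new))  # 'Hello world\nnext line'
```

With the old join, words are split across lines and the explicit newline is doubled. With the verbatim join, only the newline that was already in a fragment survives.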
diff --git a/src/google/adk/models/lite_llm.py b/src/google/adk/models/lite_llm.py
@@ -1030,7 +1030,9 @@ async def _content_to_message_param(
       ):
         reasoning_texts.append(_decode_inline_text_data(part.inline_data.data))
 
-    reasoning_content = _NEW_LINE.join(text for text in reasoning_texts if text)
+    # Preserve reasoning deltas exactly as received. Injecting separators
+    # between fragments can corrupt provider-streamed thinking text.
+    reasoning_content = "".join(text for text in reasoning_texts if text)
     return ChatCompletionAssistantMessage(
         role=role,
         content=final_content,
diff --git a/tests/unittests/models/test_litellm.py b/tests/unittests/models/test_litellm.py
@@ -2217,6 +2217,38 @@ async def test_content_to_message_param_assistant_thought_and_content_message():
   assert message["reasoning_content"] == "internal reasoning"
 
 
+@pytest.mark.asyncio
+async def test_content_to_message_param_preserves_chunked_reasoning_deltas():
+  thought_part_1 = types.Part.from_text(text="Hel")
+  thought_part_1.thought = True
+  thought_part_2 = types.Part.from_text(text="lo")
+  thought_part_2.thought = True
+  content = types.Content(
+      role="assistant", parts=[thought_part_1, thought_part_2]
+  )
+
+  message = await _content_to_message_param(content)
+
+  assert message["role"] == "assistant"
+  assert message["content"] is None
+  assert message["reasoning_content"] == "Hello"
+
+
+@pytest.mark.asyncio
+async def test_content_to_message_param_preserves_reasoning_newlines():
+  thought_part_1 = types.Part.from_text(text="line 1\n")
+  thought_part_1.thought = True
+  thought_part_2 = types.Part.from_text(text="line 2")
+  thought_part_2.thought = True
+  content = types.Content(
+      role="assistant", parts=[thought_part_1, thought_part_2]
+  )
+
+  message = await _content_to_message_param(content)
+
+  assert message["reasoning_content"] == "line 1\nline 2"
+
+
 @pytest.mark.asyncio
 async def test_content_to_message_param_function_call():
   content = types.Content(