
feat(llm): add streaming tool call accumulation and LLMResponse parity#1789

Merged
Pouyanpi merged 4 commits into develop from feat/langchain-decouple/stack-11-streaming-parity on Apr 16, 2026
Conversation

Collaborator

@Pouyanpi Pouyanpi commented Apr 14, 2026

Description

addresses #1760 (comment)

part of #1760

must be merged after #1773

Summary by CodeRabbit

  • New Features
    • Streaming responses now include complete metadata: reasoning chains, tool calls, model information, finish reasons, request IDs, and token usage statistics
    • Improved tool calling during streaming with better handling of fragmented tool call arguments across multiple chunks
    • Enhanced context preservation and metadata tracking during streaming operations
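The "fragmented tool call arguments" handling mentioned above can be illustrated with a minimal sketch. The helper name and fragment shape here are hypothetical, not the adapter's actual API; the point is that partial JSON argument strings are concatenated per tool-call index across chunks:

```python
def accumulate_tool_call_chunks(acc, fragments):
    """Merge streamed tool-call fragments into per-index accumulator entries.

    Each fragment may carry a partial name, a partial JSON args string, an id,
    and an index; argument strings are concatenated across chunks.
    """
    for frag in fragments:
        idx = frag.get("index") or 0
        entry = acc.setdefault(idx, {"id": None, "name": "", "args": ""})
        if frag.get("id"):
            entry["id"] = frag["id"]
        if frag.get("name"):
            entry["name"] = frag["name"]
        if frag.get("args"):
            # Arguments arrive as JSON text split across chunks; append raw.
            entry["args"] += frag["args"]
    return acc
```

Once the stream signals completion, each entry's `args` string can be parsed as JSON in one piece.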

@Pouyanpi Pouyanpi self-assigned this Apr 14, 2026
@Pouyanpi Pouyanpi added the enhancement New feature or request label Apr 14, 2026
@Pouyanpi Pouyanpi added this to the v0.22.0 milestone Apr 14, 2026
Contributor

greptile-apps bot commented Apr 14, 2026

Greptile Summary

This PR adds streaming tool call accumulation to LangChainLLMAdapter.stream_async and establishes parity between the streaming and non-streaming paths in _stream_llm_call (setting tool_calls_var, reasoning_trace_var, and llm_response_metadata_var). The addressed issues from previous review threads — stale tool_calls_var, missing reasoning_trace_var population, silent JSON parse failure, and missing test assertion — all appear resolved.

Confidence Score: 5/5

Safe to merge; all previously flagged P0/P1 issues are resolved and remaining findings are edge-case P2 suggestions.

All three previously-threaded issues (stale tool_calls_var, missing reasoning_trace_var, silent JSON failure) are addressed. The two remaining comments are P2: an edge-case where non-standard providers might not emit finish_reason 'tool_calls', and a documentation gap about the single-emission contract for delta_tool_calls. Neither blocks correctness for supported providers.
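The streaming/non-streaming parity described here boils down to a set-or-clear rule for the context variables. A minimal sketch, with stand-in `ContextVar` definitions (the real ones live in nemoguardrails, not here):

```python
import contextvars

# Hypothetical stand-ins for the context variables named in the summary.
tool_calls_var = contextvars.ContextVar("tool_calls", default=None)
reasoning_trace_var = contextvars.ContextVar("reasoning_trace", default=None)

def finalize_streaming_context(tool_calls, reasoning_content):
    # Parity rule: set each var to the accumulated value when present,
    # or explicitly clear it to None so stale state from a previous
    # call cannot leak into this one.
    tool_calls_var.set(tool_calls if tool_calls else None)
    reasoning_trace_var.set(reasoning_content if reasoning_content else None)
```

The explicit `None` branch is what fixes the "stale tool_calls_var" issue: skipping the set when there are no tool calls would leave the previous call's value visible.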

nemoguardrails/integrations/langchain/llm_adapter.py — tool call finalization guard at line 232

Important Files Changed

Filename Overview
nemoguardrails/actions/llm/utils.py Adds tool_calls_var, reasoning_trace_var population and clearing in the streaming path; all previously flagged parity gaps addressed.
nemoguardrails/integrations/langchain/llm_adapter.py Adds tool call chunk accumulation in stream_async and _finalize_tool_call_acc; finalization is gated on finish_reason == 'tool_calls' which may silently drop accumulated calls for non-standard providers.
tests/test_actions_llm_utils.py New tests cover tool call accumulation, reasoning accumulation (including reasoning_trace_var assertion), stale-var clearing, and provider metadata storage.
tests/test_langchain_llm_adapter.py New TestStreamingToolCalls class covers single/parallel tool calls, invalid JSON fallback, and text-only stream; covers all new stream_async logic paths.

Sequence Diagram

```mermaid
sequenceDiagram
    participant C as Caller (llm_call)
    participant S as _stream_llm_call
    participant A as LangChainLLMAdapter.stream_async
    participant H as StreamingHandler

    C->>S: call with streaming_handler
    S->>A: async for chunk in stream_async(...)
    loop Per chunk
        A->>A: accumulate tool_call_chunks → tool_call_acc
        A->>A: on finish_reason=="tool_calls": _finalize_tool_call_acc()
        A-->>S: yield LLMResponseChunk (delta_content, delta_tool_calls, delta_reasoning, ...)
        S->>S: accumulate reasoning, tool_calls, usage, model, etc.
        S->>H: push_chunk(content, metadata)
    end
    S->>S: llm_response_metadata_var.set(...)
    S->>H: finish()
    S->>S: tool_calls_var.set([...]) or set(None)
    S->>S: reasoning_trace_var.set(reasoning_content or None)
    S-->>C: return LLMResponse(content, reasoning, tool_calls, ...)
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemoguardrails/integrations/langchain/llm_adapter.py
Line: 232-234

Comment:
**Accumulated tool calls silently dropped on non-standard finish reason**

Finalization only fires when `response_chunk.finish_reason == "tool_calls"`. If a provider populates `tool_call_chunks` across multiple chunks but terminates with `"stop"` (or any other finish reason, or no finish reason at all), the entire `tool_call_acc` is silently discarded and no `delta_tool_calls` is ever emitted. Standard OpenAI and Anthropic providers always use `"tool_calls"` / `"tool_use"` (mapped to `"tool_calls"`), so this is safe today, but a missed finalization guard would cause hard-to-debug data loss for non-standard providers.

Consider also flushing on stream end if `tool_call_acc` is non-empty:

```python
            if response_chunk.finish_reason == "tool_calls" and tool_call_acc:
                response_chunk.delta_tool_calls = _finalize_tool_call_acc(tool_call_acc)
                tool_call_acc.clear()

            yield response_chunk

        # Flush any accumulated tool calls not emitted via finish_reason
        if tool_call_acc:
            log.warning(
                "Stream ended with %d accumulated tool call(s) but no 'tool_calls' finish reason; "
                "emitting anyway.",
                len(tool_call_acc),
            )
            yield LLMResponseChunk(delta_tool_calls=_finalize_tool_call_acc(tool_call_acc))
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemoguardrails/actions/llm/utils.py
Line: 116-118

Comment:
**Last-write-wins on `delta_tool_calls` across chunks**

`tool_calls` is overwritten on every chunk that carries `delta_tool_calls`, so only the final emitting chunk's value survives. The current `LangChainLLMAdapter` is designed to emit one consolidated `delta_tool_calls` list on the last chunk (the `finish_reason == "tool_calls"` chunk), so this is safe in practice. However, any future `LLMModel` adapter that progressively yields partial tool call lists across multiple chunks would have all but the last silently discarded, which is easy to miss. A brief comment documenting the expected single-emission contract would make this fragile assumption explicit.

How can I resolve this? If you propose a fix, please make it concise.
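Beyond a documenting comment, the last-write-wins assumption could be removed entirely with a defensive merge. A hypothetical helper (the name and the list-of-dicts shape are assumptions, not the current code):

```python
def merge_delta_tool_calls(accumulated, delta):
    """Extend rather than overwrite, so a future adapter that emits partial
    tool-call lists across several chunks does not silently lose the
    earlier entries."""
    if not delta:
        return accumulated
    return (accumulated or []) + list(delta)
```

With the current single-emission adapter this behaves identically to overwriting, since only one chunk ever carries `delta_tool_calls`.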

Reviews (5): Last reviewed commit: "fix(llm): separate provider metadata fro..."


codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 96.77419% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...moguardrails/integrations/langchain/llm_adapter.py 93.10% 2 Missing ⚠️


@Pouyanpi Pouyanpi requested a review from tgasser-nv April 14, 2026 13:10
@Pouyanpi Pouyanpi force-pushed the feat/langchain-decouple/stack-7-provider-registry branch from cfe3d8d to af30ae5 Compare April 16, 2026 08:50
Base automatically changed from feat/langchain-decouple/stack-7-provider-registry to develop April 16, 2026 08:58
Split accumulated_metadata into streaming_handler_metadata (wrapped structure
for StreamingHandler) and accumulated_provider_metadata (flat dict matching
non-streaming path). Fixes schema parity for llm_response_metadata_var.

Follow-up: streaming_handler_metadata still uses the wrapped structure from
_extract_chunk_metadata ({"provider_metadata": ..., "usage": ...}). If output
rails or streaming consumers need raw provider metadata, that function should
be revisited.
@Pouyanpi Pouyanpi force-pushed the feat/langchain-decouple/stack-11-streaming-parity branch from a89e626 to fbf78bb Compare April 16, 2026 09:00

coderabbitai bot commented Apr 16, 2026

📝 Walkthrough

Walkthrough

The changes expand streaming LLM call handling to accumulate and propagate additional chunk-derived fields (reasoning, tool_calls, model, finish_reason, request_id, usage) across streaming chunks. Tool calls are accumulated from fragmented stream chunks and finalized when finish_reason indicates tool completion. Provider metadata aggregation is refactored, and token statistics are updated from usage data. Tests validate the new streaming accumulation behavior for both core utilities and the LangChain adapter.

Changes

Cohort / File(s) Summary
Core LLM Streaming Enhancements
nemoguardrails/actions/llm/utils.py, nemoguardrails/integrations/langchain/llm_adapter.py
Expanded streaming call handling to accumulate and propagate chunk-derived fields including reasoning, tool_calls, model, finish_reason, request_id, and usage. Tool-call fragments are accumulated from streamed chunks and finalized based on finish_reason. Provider metadata aggregation refactored and token statistics updated from usage data.
Streaming Test Coverage
tests/test_actions_llm_utils.py, tests/test_langchain_llm_adapter.py
Added comprehensive test suites validating streaming accumulation behavior: tool call aggregation and preservation of completion state, reasoning aggregation from delta fragments, text-only fallback behavior, request ID extraction, context variable management, and provider metadata handling. Includes helper functions for fabricating streaming chunks and collecting adapter output.
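The "token statistics are updated from usage data" part of the walkthrough reduces to summing per-chunk usage dicts. A minimal sketch under the assumption that usage arrives as flat numeric dicts (the helper name is hypothetical):

```python
def accumulate_usage(totals, chunk_usage):
    """Sum per-chunk token usage dicts into running totals.

    Non-numeric values (e.g. nested provider details) are skipped so a
    single odd chunk cannot corrupt the aggregate.
    """
    for key, value in (chunk_usage or {}).items():
        if isinstance(value, (int, float)):
            totals[key] = totals.get(key, 0) + value
    return totals
```

Summing each chunk's usage (rather than taking the last value) matches providers that report incremental usage; providers that report one final usage payload simply contribute a single term.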

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant LangChainAdapter as LangChain<br/>Adapter
    participant LLMUtils as LLM Utils<br/>Streaming
    participant ContextVars as Context<br/>Variables

    Client->>LangChainAdapter: stream_async()
    loop For each chunk
        LangChainAdapter->>LangChainAdapter: Accumulate tool_call fragments<br/>from chunk.tool_call_chunks
        LangChainAdapter->>LangChainAdapter: Convert to LLMResponseChunk
        alt finish_reason == "tool_calls"
            LangChainAdapter->>LangChainAdapter: Finalize accumulated tool_calls<br/>into delta_tool_calls
        end
        LangChainAdapter-->>Client: LLMResponseChunk
    end

    Client->>LLMUtils: _stream_llm_call()
    loop For each chunk from stream
        LLMUtils->>LLMUtils: Accumulate fields:<br/>content, reasoning, tool_calls<br/>model, finish_reason, request_id
        LLMUtils->>LLMUtils: Update provider_metadata
        alt reasoning accumulated
            LLMUtils->>ContextVars: Set reasoning_trace_var
        end
        alt tool_calls accumulated
            LLMUtils->>ContextVars: Set tool_calls_var
        end
        LLMUtils->>ContextVars: Set llm_response_metadata_var
        LLMUtils-->>Client: Accumulated response data
    end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning PR introduces major streaming tool call features but PR description lacks any test results, test coverage documentation, or verification of regression testing. Update PR description to document test results, test coverage for streaming features, regression test verification, and backward compatibility confirmation.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: adding streaming tool call accumulation and achieving LLMResponse parity across streaming and non-streaming paths.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoguardrails/integrations/langchain/llm_adapter.py`:
- Around line 388-392: The warning logs raw_args on JSON parse failures which
can leak PII/secrets; update the exception handler in the JSON parsing block
(where json.loads(raw_args) is called) to NOT include raw_args in the log and
instead log only stable identifiers such as entry["name"], idx, and the
size/length of raw_args (e.g., len(raw_args) or 0 when empty), then set
args_dict = {} as before; ensure the logging call references the same
identifiers used elsewhere to make the message useful without exposing payload
content.

In `@tests/test_actions_llm_utils.py`:
- Around line 611-639: The suite is leaking llm_response_metadata_var across
tests; update the test fixture that currently restores reasoning/tool-call state
(the autouse fixture referenced by these tests) to also reset
llm_response_metadata_var to a clean state before/after each test (e.g., call
llm_response_metadata_var.set(None) or use the ContextVar.Token from
llm_response_metadata_var.set and reset it in teardown). Ensure this change
affects tests including test_provider_metadata_stored_flat and
test_clears_metadata_var_when_none so the metadata contextvar cannot carry stale
values between tests.

📥 Commits

Reviewing files that changed from the base of the PR and between 5c2c1ea and fbf78bb.

📒 Files selected for processing (4)
  • nemoguardrails/actions/llm/utils.py
  • nemoguardrails/integrations/langchain/llm_adapter.py
  • tests/test_actions_llm_utils.py
  • tests/test_langchain_llm_adapter.py

Comment on lines +388 to +392

```python
        try:
            args_dict = json.loads(raw_args) if raw_args else {}
        except json.JSONDecodeError:
            log.warning("Failed to parse tool call arguments for '%s' (index %d): %r", entry["name"], idx, raw_args)
            args_dict = {}
```


⚠️ Potential issue | 🟠 Major

Don't log raw tool-call arguments on parse failures.

raw_args can contain user-derived tool parameters, so echoing the full payload in a warning risks leaking PII/secrets into logs. Log only stable identifiers like tool name, index, and payload size.

🔒 Proposed fix
```diff
         try:
             args_dict = json.loads(raw_args) if raw_args else {}
         except json.JSONDecodeError:
-            log.warning("Failed to parse tool call arguments for '%s' (index %d): %r", entry["name"], idx, raw_args)
+            log.warning(
+                "Failed to parse tool call arguments for '%s' (index %d, %d bytes)",
+                entry["name"],
+                idx,
+                len(raw_args),
+            )
             args_dict = {}
```

Comment on lines +611 to +639
async def test_provider_metadata_stored_flat(self):
model = _make_chunk_model(
[
LLMResponseChunk(
delta_content="hi",
provider_metadata={"system_fingerprint": "fp_abc"},
finish_reason="stop",
),
]
)

await _stream_llm_call(model, "test", StreamingHandler(), stop=None)

metadata = llm_response_metadata_var.get()
assert metadata == {"system_fingerprint": "fp_abc"}

@pytest.mark.asyncio
async def test_clears_metadata_var_when_none(self):
llm_response_metadata_var.set({"stale": True})

model = _make_chunk_model(
[
LLMResponseChunk(delta_content="no metadata", finish_reason="stop"),
]
)

await _stream_llm_call(model, "test", StreamingHandler(), stop=None)

assert llm_response_metadata_var.get() is None

⚠️ Potential issue | 🟡 Minor

Reset llm_response_metadata_var between tests.

These new cases mutate the metadata contextvar, but the autouse fixture still only restores reasoning/tool-call state. That makes the suite order-dependent and can mask regressions with stale metadata.

🧪 Fixture update

```diff
 @pytest.fixture(autouse=True)
 def reset_context_vars():
     reasoning_token = reasoning_trace_var.set(None)
     tool_calls_token = tool_calls_var.set(None)
+    metadata_token = llm_response_metadata_var.set(None)

     yield

+    llm_response_metadata_var.reset(metadata_token)
     reasoning_trace_var.reset(reasoning_token)
     tool_calls_var.reset(tool_calls_token)
```

@Pouyanpi Pouyanpi merged commit f2d4492 into develop Apr 16, 2026
7 checks passed
@Pouyanpi Pouyanpi deleted the feat/langchain-decouple/stack-11-streaming-parity branch April 16, 2026 09:09
