
feat(llm): add streaming tool call accumulation and LLMResponse parity#1789

Merged
Pouyanpi merged 4 commits into develop from feat/langchain-decouple/stack-11-streaming-parity on Apr 16, 2026
Conversation

Collaborator

@Pouyanpi Pouyanpi commented Apr 14, 2026

Description

addresses #1760 (comment)

part of #1760

must be merged after #1773

Summary by CodeRabbit

  • New Features
    • Streaming responses now include complete metadata: reasoning chains, tool calls, model information, finish reasons, request IDs, and token usage statistics
    • Improved tool calling during streaming with better handling of fragmented tool call arguments across multiple chunks
    • Enhanced context preservation and metadata tracking during streaming operations
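The "fragmented tool call arguments" handling mentioned above can be illustrated with a minimal sketch. The helper name and fragment shape here are hypothetical, not the adapter's actual API; the point is that partial JSON argument strings are concatenated per tool-call index across chunks:

```python
def accumulate_tool_call_chunks(acc, fragments):
    """Merge streamed tool-call fragments into per-index accumulator entries.

    Each fragment may carry a partial name, a partial JSON args string, an id,
    and an index; argument strings are concatenated across chunks.
    """
    for frag in fragments:
        idx = frag.get("index") or 0
        entry = acc.setdefault(idx, {"id": None, "name": "", "args": ""})
        if frag.get("id"):
            entry["id"] = frag["id"]
        if frag.get("name"):
            entry["name"] = frag["name"]
        if frag.get("args"):
            # Arguments arrive as JSON text split across chunks; append raw.
            entry["args"] += frag["args"]
    return acc
```

Once the stream signals completion, each entry's `args` string can be parsed as JSON in one piece.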

@Pouyanpi Pouyanpi self-assigned this Apr 14, 2026
@Pouyanpi Pouyanpi added the enhancement New feature or request label Apr 14, 2026
@Pouyanpi Pouyanpi added this to the v0.22.0 milestone Apr 14, 2026
Contributor

greptile-apps bot commented Apr 14, 2026

Greptile Summary

This PR adds streaming tool call accumulation to LangChainLLMAdapter.stream_async and establishes parity between the streaming and non-streaming paths in _stream_llm_call (setting tool_calls_var, reasoning_trace_var, and llm_response_metadata_var). The addressed issues from previous review threads — stale tool_calls_var, missing reasoning_trace_var population, silent JSON parse failure, and missing test assertion — all appear resolved.

Confidence Score: 5/5

Safe to merge; all previously flagged P0/P1 issues are resolved and remaining findings are edge-case P2 suggestions.

All three previously-threaded issues (stale tool_calls_var, missing reasoning_trace_var, silent JSON failure) are addressed. The two remaining comments are P2: an edge-case where non-standard providers might not emit finish_reason 'tool_calls', and a documentation gap about the single-emission contract for delta_tool_calls. Neither blocks correctness for supported providers.
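The streaming/non-streaming parity described here boils down to a set-or-clear rule for the context variables. A minimal sketch, with stand-in `ContextVar` definitions (the real ones live in nemoguardrails, not here):

```python
import contextvars

# Hypothetical stand-ins for the context variables named in the summary.
tool_calls_var = contextvars.ContextVar("tool_calls", default=None)
reasoning_trace_var = contextvars.ContextVar("reasoning_trace", default=None)

def finalize_streaming_context(tool_calls, reasoning_content):
    # Parity rule: set each var to the accumulated value when present,
    # or explicitly clear it to None so stale state from a previous
    # call cannot leak into this one.
    tool_calls_var.set(tool_calls if tool_calls else None)
    reasoning_trace_var.set(reasoning_content if reasoning_content else None)
```

The explicit `None` branch is what fixes the "stale tool_calls_var" issue: skipping the set when there are no tool calls would leave the previous call's value visible.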

nemoguardrails/integrations/langchain/llm_adapter.py — tool call finalization guard at line 232

Important Files Changed

Filename Overview
nemoguardrails/actions/llm/utils.py Adds tool_calls_var, reasoning_trace_var population and clearing in the streaming path; all previously flagged parity gaps addressed.
nemoguardrails/integrations/langchain/llm_adapter.py Adds tool call chunk accumulation in stream_async and _finalize_tool_call_acc; finalization is gated on finish_reason == 'tool_calls' which may silently drop accumulated calls for non-standard providers.
tests/test_actions_llm_utils.py New tests cover tool call accumulation, reasoning accumulation (including reasoning_trace_var assertion), stale-var clearing, and provider metadata storage.
tests/test_langchain_llm_adapter.py New TestStreamingToolCalls class covers single/parallel tool calls, invalid JSON fallback, and text-only stream; covers all new stream_async logic paths.

Sequence Diagram

```mermaid
sequenceDiagram
    participant C as Caller (llm_call)
    participant S as _stream_llm_call
    participant A as LangChainLLMAdapter.stream_async
    participant H as StreamingHandler

    C->>S: call with streaming_handler
    S->>A: async for chunk in stream_async(...)
    loop Per chunk
        A->>A: accumulate tool_call_chunks → tool_call_acc
        A->>A: on finish_reason=="tool_calls": _finalize_tool_call_acc()
        A-->>S: yield LLMResponseChunk (delta_content, delta_tool_calls, delta_reasoning, ...)
        S->>S: accumulate reasoning, tool_calls, usage, model, etc.
        S->>H: push_chunk(content, metadata)
    end
    S->>S: llm_response_metadata_var.set(...)
    S->>H: finish()
    S->>S: tool_calls_var.set([...]) or set(None)
    S->>S: reasoning_trace_var.set(reasoning_content or None)
    S-->>C: return LLMResponse(content, reasoning, tool_calls, ...)
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemoguardrails/integrations/langchain/llm_adapter.py
Line: 232-234

Comment:
**Accumulated tool calls silently dropped on non-standard finish reason**

Finalization only fires when `response_chunk.finish_reason == "tool_calls"`. If a provider populates `tool_call_chunks` across multiple chunks but terminates with `"stop"` (or any other finish reason, or no finish reason at all), the entire `tool_call_acc` is silently discarded and no `delta_tool_calls` is ever emitted. Standard OpenAI and Anthropic providers always use `"tool_calls"` / `"tool_use"` (mapped to `"tool_calls"`), so this is safe today, but a missed finalization guard would cause hard-to-debug data loss for non-standard providers.

Consider also flushing on stream end if `tool_call_acc` is non-empty:

```python
            if response_chunk.finish_reason == "tool_calls" and tool_call_acc:
                response_chunk.delta_tool_calls = _finalize_tool_call_acc(tool_call_acc)
                tool_call_acc.clear()

            yield response_chunk

        # Flush any accumulated tool calls not emitted via finish_reason
        if tool_call_acc:
            log.warning(
                "Stream ended with %d accumulated tool call(s) but no 'tool_calls' finish reason; "
                "emitting anyway.",
                len(tool_call_acc),
            )
            yield LLMResponseChunk(delta_tool_calls=_finalize_tool_call_acc(tool_call_acc))
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemoguardrails/actions/llm/utils.py
Line: 116-118

Comment:
**Last-write-wins on `delta_tool_calls` across chunks**

`tool_calls` is overwritten on every chunk that carries `delta_tool_calls`, so only the final emitting chunk's value survives. The current `LangChainLLMAdapter` is designed to emit one consolidated `delta_tool_calls` list on the last chunk (the `finish_reason == "tool_calls"` chunk), so this is safe in practice. However, any future `LLMModel` adapter that progressively yields partial tool call lists across multiple chunks would have all but the last silently discarded, which is easy to miss. A brief comment documenting the expected single-emission contract would make this fragile assumption explicit.

How can I resolve this? If you propose a fix, please make it concise.
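Beyond a documenting comment, the last-write-wins assumption could be removed entirely with a defensive merge. A hypothetical helper (the name and the list-of-dicts shape are assumptions, not the current code):

```python
def merge_delta_tool_calls(accumulated, delta):
    """Extend rather than overwrite, so a future adapter that emits partial
    tool-call lists across several chunks does not silently lose the
    earlier entries."""
    if not delta:
        return accumulated
    return (accumulated or []) + list(delta)
```

With the current single-emission adapter this behaves identically to overwriting, since only one chunk ever carries `delta_tool_calls`.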

Reviews (5): Last reviewed commit: "fix(llm): separate provider metadata fro..."


codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 96.77419% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...moguardrails/integrations/langchain/llm_adapter.py 93.10% 2 Missing ⚠️


@Pouyanpi Pouyanpi requested a review from tgasser-nv April 14, 2026 13:10
@Pouyanpi Pouyanpi force-pushed the feat/langchain-decouple/stack-7-provider-registry branch from cfe3d8d to af30ae5 Compare April 16, 2026 08:50
Base automatically changed from feat/langchain-decouple/stack-7-provider-registry to develop April 16, 2026 08:58
Split accumulated_metadata into streaming_handler_metadata (wrapped structure
for StreamingHandler) and accumulated_provider_metadata (flat dict matching
non-streaming path). Fixes schema parity for llm_response_metadata_var.

Follow-up: streaming_handler_metadata still uses the wrapped structure from
_extract_chunk_metadata ({"provider_metadata": ..., "usage": ...}). If output
rails or streaming consumers need raw provider metadata, that function should
be revisited.
@Pouyanpi Pouyanpi force-pushed the feat/langchain-decouple/stack-11-streaming-parity branch from a89e626 to fbf78bb Compare April 16, 2026 09:00

coderabbitai bot commented Apr 16, 2026

📝 Walkthrough

Walkthrough

The changes expand streaming LLM call handling to accumulate and propagate additional chunk-derived fields (reasoning, tool_calls, model, finish_reason, request_id, usage) across streaming chunks. Tool calls are accumulated from fragmented stream chunks and finalized when finish_reason indicates tool completion. Provider metadata aggregation is refactored, and token statistics are updated from usage data. Tests validate the new streaming accumulation behavior for both core utilities and the LangChain adapter.

Changes

Cohort / File(s) Summary
Core LLM Streaming Enhancements
nemoguardrails/actions/llm/utils.py, nemoguardrails/integrations/langchain/llm_adapter.py
Expanded streaming call handling to accumulate and propagate chunk-derived fields including reasoning, tool_calls, model, finish_reason, request_id, and usage. Tool-call fragments are accumulated from streamed chunks and finalized based on finish_reason. Provider metadata aggregation refactored and token statistics updated from usage data.
Streaming Test Coverage
tests/test_actions_llm_utils.py, tests/test_langchain_llm_adapter.py
Added comprehensive test suites validating streaming accumulation behavior: tool call aggregation and preservation of completion state, reasoning aggregation from delta fragments, text-only fallback behavior, request ID extraction, context variable management, and provider metadata handling. Includes helper functions for fabricating streaming chunks and collecting adapter output.
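The "token statistics are updated from usage data" part of the walkthrough reduces to summing per-chunk usage dicts. A minimal sketch under the assumption that usage arrives as flat numeric dicts (the helper name is hypothetical):

```python
def accumulate_usage(totals, chunk_usage):
    """Sum per-chunk token usage dicts into running totals.

    Non-numeric values (e.g. nested provider details) are skipped so a
    single odd chunk cannot corrupt the aggregate.
    """
    for key, value in (chunk_usage or {}).items():
        if isinstance(value, (int, float)):
            totals[key] = totals.get(key, 0) + value
    return totals
```

Summing each chunk's usage (rather than taking the last value) matches providers that report incremental usage; providers that report one final usage payload simply contribute a single term.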

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant LangChainAdapter as LangChain<br/>Adapter
    participant LLMUtils as LLM Utils<br/>Streaming
    participant ContextVars as Context<br/>Variables

    Client->>LangChainAdapter: stream_async()
    loop For each chunk
        LangChainAdapter->>LangChainAdapter: Accumulate tool_call fragments<br/>from chunk.tool_call_chunks
        LangChainAdapter->>LangChainAdapter: Convert to LLMResponseChunk
        alt finish_reason == "tool_calls"
            LangChainAdapter->>LangChainAdapter: Finalize accumulated tool_calls<br/>into delta_tool_calls
        end
        LangChainAdapter-->>Client: LLMResponseChunk
    end

    Client->>LLMUtils: _stream_llm_call()
    loop For each chunk from stream
        LLMUtils->>LLMUtils: Accumulate fields:<br/>content, reasoning, tool_calls<br/>model, finish_reason, request_id
        LLMUtils->>LLMUtils: Update provider_metadata
        alt reasoning accumulated
            LLMUtils->>ContextVars: Set reasoning_trace_var
        end
        alt tool_calls accumulated
            LLMUtils->>ContextVars: Set tool_calls_var
        end
        LLMUtils->>ContextVars: Set llm_response_metadata_var
        LLMUtils-->>Client: Accumulated response data
    end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning PR introduces major streaming tool call features but PR description lacks any test results, test coverage documentation, or verification of regression testing. Update PR description to document test results, test coverage for streaming features, regression test verification, and backward compatibility confirmation.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: adding streaming tool call accumulation and achieving LLMResponse parity across streaming and non-streaming paths.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoguardrails/integrations/langchain/llm_adapter.py`:
- Around line 388-392: The warning logs raw_args on JSON parse failures which
can leak PII/secrets; update the exception handler in the JSON parsing block
(where json.loads(raw_args) is called) to NOT include raw_args in the log and
instead log only stable identifiers such as entry["name"], idx, and the
size/length of raw_args (e.g., len(raw_args) or 0 when empty), then set
args_dict = {} as before; ensure the logging call references the same
identifiers used elsewhere to make the message useful without exposing payload
content.

In `@tests/test_actions_llm_utils.py`:
- Around line 611-639: The suite is leaking llm_response_metadata_var across
tests; update the test fixture that currently restores reasoning/tool-call state
(the autouse fixture referenced by these tests) to also reset
llm_response_metadata_var to a clean state before/after each test (e.g., call
llm_response_metadata_var.set(None) or use the ContextVar.Token from
llm_response_metadata_var.set and reset it in teardown). Ensure this change
affects tests including test_provider_metadata_stored_flat and
test_clears_metadata_var_when_none so the metadata contextvar cannot carry stale
values between tests.

📥 Commits

Reviewing files that changed from the base of the PR and between 5c2c1ea and fbf78bb.

📒 Files selected for processing (4)
  • nemoguardrails/actions/llm/utils.py
  • nemoguardrails/integrations/langchain/llm_adapter.py
  • tests/test_actions_llm_utils.py
  • tests/test_langchain_llm_adapter.py

Comment on lines +388 to +392

```python
        try:
            args_dict = json.loads(raw_args) if raw_args else {}
        except json.JSONDecodeError:
            log.warning("Failed to parse tool call arguments for '%s' (index %d): %r", entry["name"], idx, raw_args)
            args_dict = {}
```


⚠️ Potential issue | 🟠 Major

Don't log raw tool-call arguments on parse failures.

raw_args can contain user-derived tool parameters, so echoing the full payload in a warning risks leaking PII/secrets into logs. Log only stable identifiers like tool name, index, and payload size.

🔒 Proposed fix
```diff
         try:
             args_dict = json.loads(raw_args) if raw_args else {}
         except json.JSONDecodeError:
-            log.warning("Failed to parse tool call arguments for '%s' (index %d): %r", entry["name"], idx, raw_args)
+            log.warning(
+                "Failed to parse tool call arguments for '%s' (index %d, %d bytes)",
+                entry["name"],
+                idx,
+                len(raw_args),
+            )
             args_dict = {}
```

Comment on lines +611 to +639
async def test_provider_metadata_stored_flat(self):
model = _make_chunk_model(
[
LLMResponseChunk(
delta_content="hi",
provider_metadata={"system_fingerprint": "fp_abc"},
finish_reason="stop",
),
]
)

await _stream_llm_call(model, "test", StreamingHandler(), stop=None)

metadata = llm_response_metadata_var.get()
assert metadata == {"system_fingerprint": "fp_abc"}

@pytest.mark.asyncio
async def test_clears_metadata_var_when_none(self):
llm_response_metadata_var.set({"stale": True})

model = _make_chunk_model(
[
LLMResponseChunk(delta_content="no metadata", finish_reason="stop"),
]
)

await _stream_llm_call(model, "test", StreamingHandler(), stop=None)

assert llm_response_metadata_var.get() is None

⚠️ Potential issue | 🟡 Minor

Reset llm_response_metadata_var between tests.

These new cases mutate the metadata contextvar, but the autouse fixture still only restores reasoning/tool-call state. That makes the suite order-dependent and can mask regressions with stale metadata.

🧪 Fixture update

```diff
 @pytest.fixture(autouse=True)
 def reset_context_vars():
     reasoning_token = reasoning_trace_var.set(None)
     tool_calls_token = tool_calls_var.set(None)
+    metadata_token = llm_response_metadata_var.set(None)

     yield

+    llm_response_metadata_var.reset(metadata_token)
     reasoning_trace_var.reset(reasoning_token)
     tool_calls_var.reset(tool_calls_token)
```

@Pouyanpi Pouyanpi merged commit f2d4492 into develop Apr 16, 2026
7 checks passed
@Pouyanpi Pouyanpi deleted the feat/langchain-decouple/stack-11-streaming-parity branch April 16, 2026 09:09
