Conversation
Greptile Summary
This PR adds streaming tool call accumulation to the LangChain adapter's `stream_async` path and propagates the accumulated results through the streaming LLM call utilities.
| Filename | Overview |
|---|---|
| nemoguardrails/actions/llm/utils.py | Adds tool_calls_var, reasoning_trace_var population and clearing in the streaming path; all previously flagged parity gaps addressed. |
| nemoguardrails/integrations/langchain/llm_adapter.py | Adds tool call chunk accumulation in stream_async and _finalize_tool_call_acc; finalization is gated on finish_reason == 'tool_calls' which may silently drop accumulated calls for non-standard providers. |
| tests/test_actions_llm_utils.py | New tests cover tool call accumulation, reasoning accumulation (including reasoning_trace_var assertion), stale-var clearing, and provider metadata storage. |
| tests/test_langchain_llm_adapter.py | New TestStreamingToolCalls class covers single/parallel tool calls, invalid JSON fallback, and text-only stream; covers all new stream_async logic paths. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant C as Caller (llm_call)
    participant S as _stream_llm_call
    participant A as LangChainLLMAdapter.stream_async
    participant H as StreamingHandler
    C->>S: call with streaming_handler
    S->>A: async for chunk in stream_async(...)
    loop Per chunk
        A->>A: accumulate tool_call_chunks → tool_call_acc
        A->>A: on finish_reason=="tool_calls", call _finalize_tool_call_acc()
        A-->>S: yield LLMResponseChunk (delta_content, delta_tool_calls, delta_reasoning, ...)
        S->>S: accumulate reasoning, tool_calls, usage, model, etc.
        S->>H: push_chunk(content, metadata)
    end
    S->>S: llm_response_metadata_var.set(...)
    S->>H: finish()
    S->>S: tool_calls_var.set([...]) or set(None)
    S->>S: reasoning_trace_var.set(reasoning_content or None)
    S-->>C: return LLMResponse(content, reasoning, tool_calls, ...)
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemoguardrails/integrations/langchain/llm_adapter.py
Line: 232-234
Comment:
**Accumulated tool calls silently dropped on non-standard finish reason**
Finalization only fires when `response_chunk.finish_reason == "tool_calls"`. If a provider populates `tool_call_chunks` across multiple chunks but terminates with `"stop"` (or any other finish reason, or no finish reason at all), the entire `tool_call_acc` is silently discarded and no `delta_tool_calls` is ever emitted. Standard OpenAI and Anthropic providers always use `"tool_calls"` / `"tool_use"` (mapped to `"tool_calls"`), so this is safe today, but a missed finalization guard would cause hard-to-debug data loss for non-standard providers.
Consider also flushing on stream end if `tool_call_acc` is non-empty (and clearing it after a normal finalization so the end-of-stream flush cannot double-emit):
```python
if response_chunk.finish_reason == "tool_calls" and tool_call_acc:
    response_chunk.delta_tool_calls = _finalize_tool_call_acc(tool_call_acc)
    tool_call_acc.clear()  # prevent the end-of-stream flush from double-emitting
yield response_chunk

# After the loop: flush any accumulated tool calls not emitted via finish_reason
if tool_call_acc:
    log.warning(
        "Stream ended with %d accumulated tool call(s) but no 'tool_calls' finish reason; "
        "emitting anyway.",
        len(tool_call_acc),
    )
    yield LLMResponseChunk(delta_tool_calls=_finalize_tool_call_acc(tool_call_acc))
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemoguardrails/actions/llm/utils.py
Line: 116-118
Comment:
**Last-write-wins on `delta_tool_calls` across chunks**
`tool_calls` is overwritten on every chunk that carries `delta_tool_calls`, so only the final emitting chunk's value survives. The current `LangChainLLMAdapter` is designed to emit one consolidated `delta_tool_calls` list on the last chunk (the `finish_reason == "tool_calls"` chunk), so this is safe in practice. However, any future `LLMModel` adapter that progressively yields partial tool call lists across multiple chunks would have all but the last silently discarded, which is easy to miss. A brief comment documenting the expected single-emission contract would make this fragile assumption explicit.
How can I resolve this? If you propose a fix, please make it concise.
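Beyond a documenting comment, one concise hardening option is to merge partial lists by index instead of overwriting. The helper below is a hypothetical sketch, assuming OpenAI-style entries with an `index` key and streamed string `arguments` fragments; it is not the current adapter code.

```python
def merge_delta_tool_calls(accumulated, delta_tool_calls):
    """Merge a chunk's partial tool-call list into a running accumulator.

    Entries are keyed by "index"; argument fragments are concatenated,
    while other keys (name, id) keep their first-seen values.
    """
    for call in delta_tool_calls or []:
        idx = call.get("index", 0)
        if idx in accumulated:
            existing = accumulated[idx]
            # Concatenate streamed JSON argument fragments in arrival order.
            existing["arguments"] = existing.get("arguments", "") + call.get("arguments", "")
        else:
            accumulated[idx] = dict(call)
    return accumulated
```

An adapter that emits one consolidated list on the final chunk still works unchanged: its single list merges into an empty accumulator.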
Force-pushed from cfe3d8d to af30ae5:
Split accumulated_metadata into streaming_handler_metadata (wrapped structure
for StreamingHandler) and accumulated_provider_metadata (flat dict matching
non-streaming path). Fixes schema parity for llm_response_metadata_var.
Follow-up: streaming_handler_metadata still uses the wrapped structure from
_extract_chunk_metadata ({"provider_metadata": ..., "usage": ...}). If output
rails or streaming consumers need raw provider metadata, that function should
be revisited.
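As a rough illustration of the two shapes the commit message distinguishes (field names like `system_fingerprint` are taken from the tests in this PR; the exact keys are provider-dependent):

```python
# Wrapped structure produced by _extract_chunk_metadata for StreamingHandler:
streaming_handler_metadata = {
    "provider_metadata": {"system_fingerprint": "fp_abc"},
    "usage": {"prompt_tokens": 10, "completion_tokens": 5},
}

# Flat dict stored in llm_response_metadata_var, matching the non-streaming path:
accumulated_provider_metadata = {"system_fingerprint": "fp_abc"}
```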
Force-pushed from a89e626 to fbf78bb.
📝 Walkthrough
The changes expand streaming LLM call handling to accumulate and propagate additional chunk-derived fields (content, reasoning, tool calls, model, finish reason, request ID, and provider metadata).
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant LangChainAdapter as LangChain<br/>Adapter
    participant LLMUtils as LLM Utils<br/>Streaming
    participant ContextVars as Context<br/>Variables
    Client->>LangChainAdapter: stream_async()
    loop For each chunk
        LangChainAdapter->>LangChainAdapter: Accumulate tool_call fragments<br/>from chunk.tool_call_chunks
        LangChainAdapter->>LangChainAdapter: Convert to LLMResponseChunk
        alt finish_reason == "tool_calls"
            LangChainAdapter->>LangChainAdapter: Finalize accumulated tool_calls<br/>into delta_tool_calls
        end
        LangChainAdapter-->>Client: LLMResponseChunk
    end
    Client->>LLMUtils: _stream_llm_call()
    loop For each chunk from stream
        LLMUtils->>LLMUtils: Accumulate fields:<br/>content, reasoning, tool_calls<br/>model, finish_reason, request_id
        LLMUtils->>LLMUtils: Update provider_metadata
        alt reasoning accumulated
            LLMUtils->>ContextVars: Set reasoning_trace_var
        end
        alt tool_calls accumulated
            LLMUtils->>ContextVars: Set tool_calls_var
        end
        LLMUtils->>ContextVars: Set llm_response_metadata_var
        LLMUtils-->>Client: Accumulated response data
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemoguardrails/integrations/langchain/llm_adapter.py`:
- Around line 388-392: The warning logs raw_args on JSON parse failures which
can leak PII/secrets; update the exception handler in the JSON parsing block
(where json.loads(raw_args) is called) to NOT include raw_args in the log and
instead log only stable identifiers such as entry["name"], idx, and the
size/length of raw_args (e.g., len(raw_args) or 0 when empty), then set
args_dict = {} as before; ensure the logging call references the same
identifiers used elsewhere to make the message useful without exposing payload
content.
In `@tests/test_actions_llm_utils.py`:
- Around line 611-639: The suite is leaking llm_response_metadata_var across
tests; update the test fixture that currently restores reasoning/tool-call state
(the autouse fixture referenced by these tests) to also reset
llm_response_metadata_var to a clean state before/after each test (e.g., call
llm_response_metadata_var.set(None) or use the ContextVar.Token from
llm_response_metadata_var.set and reset it in teardown). Ensure this change
affects tests including test_provider_metadata_stored_flat and
test_clears_metadata_var_when_none so the metadata contextvar cannot carry stale
values between tests.
📒 Files selected for processing (4)
- nemoguardrails/actions/llm/utils.py
- nemoguardrails/integrations/langchain/llm_adapter.py
- tests/test_actions_llm_utils.py
- tests/test_langchain_llm_adapter.py
```python
try:
    args_dict = json.loads(raw_args) if raw_args else {}
except json.JSONDecodeError:
    log.warning("Failed to parse tool call arguments for '%s' (index %d): %r", entry["name"], idx, raw_args)
    args_dict = {}
```
Don't log raw tool-call arguments on parse failures.
raw_args can contain user-derived tool parameters, so echoing the full payload in a warning risks leaking PII/secrets into logs. Log only stable identifiers like tool name, index, and payload size.
🔒 Proposed fix

```diff
 try:
     args_dict = json.loads(raw_args) if raw_args else {}
 except json.JSONDecodeError:
-    log.warning("Failed to parse tool call arguments for '%s' (index %d): %r", entry["name"], idx, raw_args)
+    log.warning(
+        "Failed to parse tool call arguments for '%s' (index %d, %d bytes)",
+        entry["name"],
+        idx,
+        len(raw_args),
+    )
     args_dict = {}
```
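The same advice can be packaged as a standalone helper. This is an illustrative sketch (the name `parse_tool_args` is hypothetical, not part of the adapter), showing a parser that logs only the tool name, index, and payload size on failure:

```python
import json
import logging

log = logging.getLogger(__name__)


def parse_tool_args(name, idx, raw_args):
    """Parse streamed tool-call arguments without logging the raw payload."""
    try:
        return json.loads(raw_args) if raw_args else {}
    except json.JSONDecodeError:
        # Log stable identifiers only; raw_args may contain PII or secrets.
        log.warning(
            "Failed to parse tool call arguments for '%s' (index %d, %d bytes)",
            name,
            idx,
            len(raw_args),
        )
        return {}
```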
```python
    async def test_provider_metadata_stored_flat(self):
        model = _make_chunk_model(
            [
                LLMResponseChunk(
                    delta_content="hi",
                    provider_metadata={"system_fingerprint": "fp_abc"},
                    finish_reason="stop",
                ),
            ]
        )

        await _stream_llm_call(model, "test", StreamingHandler(), stop=None)

        metadata = llm_response_metadata_var.get()
        assert metadata == {"system_fingerprint": "fp_abc"}

    @pytest.mark.asyncio
    async def test_clears_metadata_var_when_none(self):
        llm_response_metadata_var.set({"stale": True})

        model = _make_chunk_model(
            [
                LLMResponseChunk(delta_content="no metadata", finish_reason="stop"),
            ]
        )

        await _stream_llm_call(model, "test", StreamingHandler(), stop=None)

        assert llm_response_metadata_var.get() is None
```
Reset llm_response_metadata_var between tests.
These new cases mutate the metadata contextvar, but the autouse fixture still only restores reasoning/tool-call state. That makes the suite order-dependent and can mask regressions with stale metadata.
🧪 Fixture update

```diff
 @pytest.fixture(autouse=True)
 def reset_context_vars():
     reasoning_token = reasoning_trace_var.set(None)
     tool_calls_token = tool_calls_var.set(None)
+    metadata_token = llm_response_metadata_var.set(None)
     yield
+    llm_response_metadata_var.reset(metadata_token)
     reasoning_trace_var.reset(reasoning_token)
     tool_calls_var.reset(tool_calls_token)
```
Description
addresses #1760 (comment)
part of #1760
must be merged after #1773