
Feat/context summarization#61

Open
offerakrabi wants to merge 20 commits into main from
feat/context-summarization

Conversation

Collaborator

@offerakrabi offerakrabi commented Mar 12, 2026

Feature Pull Request

Related Issue

Description

Add context summarization to Cuga in the server (demo), SDK, and Supervisor.

Type of Changes

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Testing

  • I have tested this feature locally
  • I have added tests that prove my feature works
  • All new and existing tests passed

Documentation

  • I have updated the documentation accordingly
  • I have added docstrings to new functions/classes

Checklist

  • My code follows the code style of this project
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation

Summary by CodeRabbit

  • New Features

    • Conversation context summarizer and token-counting utilities for smarter context trimming and metrics.
    • Public helper to apply context summarization before model calls.
  • Improvements

    • Unified pre-invocation context management across agent and supervisor flows to preserve continuity.
    • Safer handling of optional model/config attributes and proactive step-limit enforcement.
    • Enhanced logging around summarization and model invocation.
  • Tests

    • Extensive unit, integration, and end-to-end tests for summarization, token counting, and multi-agent scenarios.
  • Settings

    • New configurable context summarization options.

Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
@coderabbitai

coderabbitai bot commented Mar 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds context summarization and token-counting utilities, integrates pre-LLM context management into ChatAgent/CugaLite/CugaSupervisor call paths, extends AgentState with manage_message_context and summarization metrics, and adds extensive tests plus configuration for context summarization and token accounting.

Changes

Cohort / File(s) / Summary

Core Summarization & Token Tools
  • Files: src/cuga/backend/cuga_graph/utils/context_summarizer.py, src/cuga/backend/cuga_graph/utils/token_counter.py, src/cuga/backend/cuga_graph/utils/message_utils.py
  • Summary: Add ContextSummarizer (middleware orchestration, triggers, metrics), TokenCounter (token estimation, model sizing, usage), and the message normalizer convert_to_proper_message_type.

Agent State & Context Management
  • Files: src/cuga/backend/cuga_graph/state/agent_state.py
  • Summary: Add manage_message_context, _summarize_message_list, and last_summarization_metrics; wire ContextSummarizer/ActivityTracker and provide a sliding-window fallback.

Node Integrations
  • Files: src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py, src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py, src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py
  • Summary: Extract model/model_name/tools/system_prompt and call state.manage_message_context before model invocation; CugaLite replaces in-place message mutations with summarized effective_chat_messages and non-mutating modified_chat_messages.

CugaLite Step & Step-limit Flow
  • Files: src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • Summary: Replace in-place chat message mutations with building modified_chat_messages; compute new_step_count explicitly, enforce step limits proactively, and use the updated messages for sandbox/command flows.

Context Summarization Helper
  • Files: src/cuga/backend/cuga_graph/utils/context_management_utils.py
  • Summary: Add an apply_context_summarization wrapper that runs summarization on a temporary AgentState and logs/tracks summarization metrics to an optional tracker.

Config & Defensive Fixes
  • Files: src/cuga/settings.toml, src/cuga/backend/llm/utils/helpers.py, src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py
  • Summary: Add a context_summarization config block, replace direct attribute access with getattr(..., False) guards, and add a clarifying comment in combined_tool_provider.

Tests — Unit / Integration / E2E / SDK
  • Files: tests/unit/.../test_context_summarizer.py, tests/unit/test_token_counter.py, tests/integration/test_context_summarization.py, src/cuga/sdk_core/tests/test_context_summarization_sdk.py, src/system_tests/e2e/test_context_summarization_e2e.py
  • Summary: Add extensive unit, integration, SDK, and E2E tests covering summarizer triggers, metrics, TokenCounter behavior, tracker integration, supervisor flows, edge cases, and real-model scenarios.
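The token accounting described above can be sketched roughly as follows. This is a minimal illustration only, assuming a 4-characters-per-token heuristic and a 128k-token context window; the function names echo the summary table but are not the project's verified API.

```python
# Hypothetical sketch of token accounting for context trimming; the real
# TokenCounter in token_counter.py may use a model-specific tokenizer instead
# of the rough 4-chars-per-token heuristic used here.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def count_total_context_tokens(messages, system_prompt: str = "", tools=None):
    """Return (total_tokens, usage_pct) against an assumed context window."""
    context_window = 128_000  # assumed model context size
    total = estimate_tokens(system_prompt)
    for msg in messages:
        total += estimate_tokens(str(msg.get("content", "")))
    for tool in tools or []:
        total += estimate_tokens(str(tool))
    return total, total / context_window * 100

tokens, pct = count_total_context_tokens(
    [{"role": "user", "content": "x" * 400}], system_prompt="You are helpful."
)
```

A summarization trigger would then compare `pct` (or `tokens`) against a configured threshold before each model call.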

Sequence Diagram

sequenceDiagram
    participant Client
    participant Node as Agent Node
    participant State as AgentState
    participant Counter as TokenCounter
    participant Summarizer as ContextSummarizer
    participant LLM as Model

    Client->>Node: invoke(request)
    Node->>Node: extract model, model_name, tools, system_prompt
    Node->>State: manage_message_context(model, model_name, tools, system_prompt)
    State->>Counter: count_total_context_tokens(messages, tools, system_prompt)
    Counter-->>State: tokens, usage_pct
    State->>Summarizer: should_summarize?(tokens, usage_pct, messages)
    alt trigger satisfied
        Summarizer->>Summarizer: invoke middleware -> summarized_messages, metrics
        Summarizer-->>State: summarized_messages, metrics
    else not triggered
        State-->>Node: original messages
    end
    State-->>Node: updated chat_messages
    Node->>LLM: invoke(updated messages, tools)
    LLM-->>Node: response
    Node-->>Client: result
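In code, the diagram's decision flow might look roughly like this. It is a self-contained sketch with toy counter/summarizer classes; all names, thresholds, and the 1000-token window are illustrative stand-ins for the real TokenCounter/ContextSummarizer, not the project's actual implementation.

```python
# Toy collaborators mirroring the sequence diagram above.
class SimpleCounter:
    def count(self, messages, system_prompt=""):
        total = sum(len(m) // 4 for m in messages) + len(system_prompt) // 4
        return total, total / 1000 * 100  # (tokens, usage_pct)

class SimpleSummarizer:
    trigger_pct = 50.0  # summarize once half the window is used

    def should_summarize(self, tokens, usage_pct, messages):
        return usage_pct >= self.trigger_pct and len(messages) > 2

    def summarize(self, messages):
        # Collapse everything but the last two turns into one placeholder.
        kept = messages[-2:]
        dropped = len(messages) - len(kept)
        return [f"[summary of {dropped} earlier messages]"] + kept, {"dropped": dropped}

class AgentState:
    def __init__(self, messages):
        self.chat_messages = messages
        self.last_summarization_metrics = None

    def manage_message_context(self, counter, summarizer, system_prompt=""):
        tokens, usage_pct = counter.count(self.chat_messages, system_prompt)
        if summarizer.should_summarize(tokens, usage_pct, self.chat_messages):
            self.chat_messages, self.last_summarization_metrics = summarizer.summarize(
                self.chat_messages
            )
        return self.chat_messages  # what the node passes to the model

state = AgentState(["x" * 400] * 10)  # ~1000 tokens -> trigger fires
messages_for_model = state.manage_message_context(SimpleCounter(), SimpleSummarizer())
```

When the trigger does not fire, the original messages are returned unchanged, matching the `else` branch of the diagram.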

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I nibble tokens, tidy the trail,
I hop through prompts and trim the tale,
Threads stay short while meaning stays bright,
Summaries snug for the model's flight,
Little hops, big insight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name / Status / Explanation / Resolution

  • Title check — ❓ Inconclusive: The title 'Feat/context summarization' is vague and uses a convention-based prefix ('Feat/') rather than being a clear, complete sentence. Resolution: Rewrite the title as a complete sentence describing the main change, such as 'Add context summarization for conversation management' or 'Implement intelligent context summarization across agent layers'.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 96.50%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (3)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)

914-920: Misleading log statement references tools that are always None.

tools_for_later_context is hardcoded to None on line 914, but the log on lines 917-920 references its length as if it could have tools. This could confuse debugging.

🧹 Proposed fix for clarity
             # Note: In CugaLite, tools are managed via sandbox context, not state
             # For context summarization, we'll pass None since tools aren't directly available
-            tools_for_later_context = None

             logger.debug(
                 f"CugaLite: Calling manage_message_context with model_name={model_name}, "
-                f"{len(tools_for_later_context) if tools_for_later_context else 0} tools, "
+                f"tools=None (managed via sandbox), "
                 f"system_prompt={len(dynamic_prompt) if dynamic_prompt else 0} chars"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
914 - 920, The logger.debug message is misleading because
tools_for_later_context is explicitly set to None before logging; update the
logging to reflect the actual state by removing or changing the length check on
tools_for_later_context and instead log a clear value (e.g., "0 tools" or "no
tools") or the correct variable that may contain tools; locate the block around
tools_for_later_context, the call to manage_message_context, and the
logger.debug statement (symbols: tools_for_later_context,
manage_message_context, logger.debug, dynamic_prompt, model_name) and modify the
message to accurately report tools presence and dynamic_prompt length.
src/system_tests/e2e/test_context_summarization_e2e.py (1)

117-117: Remove developer signature comment.

Same issue as in message_utils.py - the # Made with Bob comment should be removed from production code.

🧹 Proposed fix
 if __name__ == "__main__":
     unittest.main()
-
-# Made with Bob
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/system_tests/e2e/test_context_summarization_e2e.py` at line 117, Remove
the developer signature line "# Made with Bob" from the end of the test module
test_context_summarization_e2e.py (delete the comment line so it is not present
in production/test code); no other changes are needed.
src/cuga/backend/cuga_graph/utils/message_utils.py (1)

80-80: Remove developer signature comment.

The comment # Made with Bob appears to be a personal signature and should be removed from production code.

🧹 Proposed fix
-
-# Made with Bob
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/utils/message_utils.py` at line 80, Remove the
personal signature comment "# Made with Bob" from the end of the module in
message_utils.py; delete that standalone comment line so no developer signature
remains in production code, and run formatting/linting to ensure no trailing
blank-line or style issues remain after removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 940-942: The code directly assigns state.chat_messages =
temp_agent_state.chat_messages (and then computes final_message_count), which
mutates LangGraph-managed state; instead create and return a Command that
encapsulates the updated chat_messages (and any derived final_message_count if
needed) rather than mutating state in-place. Replace the direct assignment
around temp_agent_state.chat_messages in the cuga_lite_graph handler with
constructing an appropriate Command (e.g., UpdateState or similar command used
elsewhere in the graph) that carries the new chat_messages payload so the graph
runtime can apply the state change consistently.

In `@src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py`:
- Around line 438-440: The code mutates state.supervisor_chat_messages directly
(assigning temp_agent_state.chat_messages), which can bypass LangGraph's
expected state-update flow; instead, stop the in-place assignment and return the
updated messages via the node's Command update payload (use
Command.update({"supervisor_chat_messages": temp_agent_state.chat_messages})) or
the node's equivalent update mechanism so LangGraph can persist/merge the new
value; modify the handler that does the assignment (referenced by
state.supervisor_chat_messages and temp_agent_state.chat_messages) to emit that
update rather than mutating the state object directly.

In `@src/cuga/backend/cuga_graph/state/agent_state.py`:
- Around line 1057-1068: The code enables ContextSummarizer when
settings.context_summarization.enabled and model/model_name exist but doesn't
handle the case where ContextSummarizer returns middleware=None; that causes
_summarize_message_list to be a no-op and the sliding-window fallback to be
skipped. Modify the branch that constructs ContextSummarizer (in agent_state.py
around the ContextSummarizer call) to detect when summarizer.middleware is falsy
(or summarizer is unusable) and in that case set summarizer to None (or skip
using it) so the existing sliding-window codepath is used; specifically guard
uses of ContextSummarizer and calls to _summarize_message_list so they only
occur when summarizer.middleware is present, otherwise fall back to the old
windowing logic.
- Around line 1069-1085: The supervisor_chat_messages list is not being
summarized and can grow indefinitely; update the summarization path to call
self._summarize_message_list for supervisor_chat_messages (similar to
chat_messages/chat_agent_messages) inside the same block where chat_messages and
chat_agent_messages are summarized, passing the summarizer, tools, system_prompt
and an appropriate store_metrics flag (likely False) and a unique name like
"supervisor_chat_messages", so apply_message_sliding_window + summarization both
manage that history.

In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py`:
- Around line 396-414: The current loop in the context summarizer rebuilds
messages with only content, dropping metadata like tool_calls,
additional_kwargs, response_metadata, and id; update the logic that creates
sanitized messages (after convert_to_proper_message_type and
self._sanitize_content) to preserve all existing message fields by using each
LangChain message's model_copy(update={"content": sanitized_content}) instead of
constructing new AIMessage/HumanMessage/SystemMessage instances, so tool_calls
and other metadata remain available (agent_loop.py checks if msg.tool_calls).

In `@src/cuga/backend/cuga_graph/utils/message_utils.py`:
- Around line 60-68: The current ToolMessage construction in message_utils.py
only pulls tool_call_id from message.additional_kwargs, which misses cases where
the message has a direct tool_call_id attribute; update the logic to first try
getattr(message, 'tool_call_id', None) (or message.tool_call_id) and if that is
None/empty fall back to message.additional_kwargs.get('tool_call_id',
'unknown'), then pass that resolved tool_call_id into the ToolMessage(...) call
so ToolMessage receives the correct identifier.

In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Around line 297-308: The test currently swallows failures and only prints
warnings when reading the agent checkpoint (calls to
agent.graph.checkpointer.get) so it doesn't assert that summarization reduced
history; change the two try/except blocks around agent.graph.checkpointer.get
(and subsequent access to checkpoint -> "channel_values" -> "chat_messages") to
assert or raise on exceptions instead of printing, and add definitive assertions
that summarization occurred (e.g., assert that
len(state_dict.get("chat_messages", [])) is less than the preloaded message
count and/or that the chat history contains a summary marker or removed original
messages) so the test fails if the checkpoint still contains the full preloaded
history; update both occurrences noted in the comment (around the first block
with message_count_before and the later block) to enforce these checks.

In `@src/system_tests/e2e/test_context_summarization_e2e.py`:
- Around line 25-31: The test environment variables in test_env_vars are using
keys that ContextSummarizer doesn't read
(DYNACONF_CONTEXT_SUMMARIZATION__TRIGGER_STRATEGY and
DYNACONF_CONTEXT_SUMMARIZATION__MESSAGE_COUNT_THRESHOLD), so update the mapping
to use the implementation's expected env var names (e.g., replace those two
entries with DYNACONF_CONTEXT_SUMMARIZATION__TRIGGER_MESSAGES="3"; optionally
ensure DYNACONF_CONTEXT_SUMMARIZATION__TRIGGER_FRACTION or __TRIGGER_TOKENS are
set if needed by tests) so ContextSummarizer (the component reading
trigger_messages/trigger_fraction/trigger_tokens) will pick up the intended
configuration.

In `@tests/integration/test_context_summarization.py`:
- Around line 478-484: The test sets DYNACONF_CONTEXT_SUMMARIZATION trigger env
vars but doesn't set ENABLED, making it environment-dependent; update the setup
in tests/integration/test_context_summarization.py (the block that saves
original_fraction and original_keep and calls settings.reload()) to also set
os.environ["DYNACONF_CONTEXT_SUMMARIZATION__ENABLED"] = "true" before calling
settings.reload(), and mirror the same change in the other setup location
referenced (around the block at lines 519-520) so the supervisor flow always
exercises summarization regardless of external env.

In `@tests/unit/test_context_summarizer.py`:
- Around line 480-487: The current test only asserts that len(results) +
len(errors) == 10 which passes even if all workers failed; update the assertions
in tests/unit/test_context_summarizer.py to require at least one successful
result and validate results are non-empty before iterating: keep the existing
total-count check for results + errors == 10, then add an assertion like assert
len(results) > 0 (or assert len(errors) < 10) to ensure not all workers failed,
and then continue to verify each (result, metrics) pair from results is a
non-empty list and a dict respectively; reference the variables results and
errors and the loop over for result, metrics in results to locate where to add
the new assertion.

---

Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 914-920: The logger.debug message is misleading because
tools_for_later_context is explicitly set to None before logging; update the
logging to reflect the actual state by removing or changing the length check on
tools_for_later_context and instead log a clear value (e.g., "0 tools" or "no
tools") or the correct variable that may contain tools; locate the block around
tools_for_later_context, the call to manage_message_context, and the
logger.debug statement (symbols: tools_for_later_context,
manage_message_context, logger.debug, dynamic_prompt, model_name) and modify the
message to accurately report tools presence and dynamic_prompt length.

In `@src/cuga/backend/cuga_graph/utils/message_utils.py`:
- Line 80: Remove the personal signature comment "# Made with Bob" from the end
of the module in message_utils.py; delete that standalone comment line so no
developer signature remains in production code and run formatting/linting to
ensure no trailing blank-line or style issues remain after removal.

In `@src/system_tests/e2e/test_context_summarization_e2e.py`:
- Line 117: Remove the developer signature line "# Made with Bob" from the end
of the test module test_context_summarization_e2e.py (delete the comment line so
it is not present in production/test code); no other changes are needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 20c42a3e-67a6-4ca6-ab9a-69f81b034fc3

📥 Commits

Reviewing files that changed from the base of the PR and between 8527c2e and efc7945.

📒 Files selected for processing (14)
  • src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/cuga_graph/utils/context_summarizer.py
  • src/cuga/backend/cuga_graph/utils/message_utils.py
  • src/cuga/backend/cuga_graph/utils/token_counter.py
  • src/cuga/backend/llm/utils/helpers.py
  • src/cuga/sdk_core/tests/test_context_summarization_sdk.py
  • src/cuga/settings.toml
  • src/system_tests/e2e/test_context_summarization_e2e.py
  • tests/integration/test_context_summarization.py
  • tests/unit/test_context_summarizer.py
  • tests/unit/test_token_counter.py

Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
… feat/context-summarization

# Conflicts:
#	tests/integration/test_context_summarization.py

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)

1025-1030: ⚠️ Potential issue | 🔴 Critical

The approval path loses the summarized chat history—this is a critical bug.

The approval handler is passed the original state with unsummarized state.chat_messages. When it builds and returns its Command via append_chat_messages_with_step_limit(state, ...), it appends only to the original chat history, not the summarized effective_chat_messages that was prepared for the model in the normal flow.

Required fix: Before calling check_and_create_approval_interrupt(), update state.chat_messages to modified_chat_messages (which is built from the summarized effective_chat_messages). Alternatively, pass the summarized messages explicitly to the approval handler as a separate parameter.

Current flow in call_model (lines 1025-1033)
if settings.policy.enabled:
    approval_command = await ToolApprovalHandler.check_and_create_approval_interrupt(
        state, code, content, config  # state.chat_messages still unsummarized
    )
    if approval_command:
        return approval_command  # Returns before modified_chat_messages is used

# This only runs if no approval is needed
updated_messages = modified_chat_messages + [AIMessage(content=content)]
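A stripped-down illustration of the fix this comment asks for: make the summarized history visible to the approval path before delegating. All names here are toy stand-ins, not the project's actual ToolApprovalHandler API.

```python
# Self-contained illustration of the bug pattern: a handler that reads
# state.chat_messages sees the stale, unsummarized history unless the caller
# swaps in the summarized list first.
class State:
    def __init__(self, chat_messages):
        self.chat_messages = chat_messages

def approval_handler(state):
    # Appends to whatever history the state currently holds.
    return state.chat_messages + ["[approval interrupt]"]

def call_model(state, effective_chat_messages):
    # The fix the review describes: update state.chat_messages to the
    # summarized history before the approval path can consume it.
    state.chat_messages = effective_chat_messages
    return approval_handler(state)

state = State(["m1", "m2", "m3", "m4"])
summarized = ["[summary of m1-m3]", "m4"]
result = call_model(state, summarized)
# result now builds on the summarized history, not the original four messages
```

The alternative mentioned in the comment is equivalent: pass the summarized list to the handler as an explicit parameter instead of mutating the state first.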
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
1025 - 1030, The approval path is using the original unsummarized
state.chat_messages, so approvals lose the summarized history; before calling
ToolApprovalHandler.check_and_create_approval_interrupt (from call_model),
replace state.chat_messages with the prepared modified_chat_messages (built from
effective_chat_messages) or pass the summarized messages into
check_and_create_approval_interrupt as an extra parameter, and ensure any
internal use of append_chat_messages_with_step_limit in the approval flow
consumes the summarized modified_chat_messages (or the new parameter) rather
than the original state.chat_messages so the approval Command preserves the
summarized chat history.
🧹 Nitpick comments (1)
tests/unit/test_context_summarizer.py (1)

562-577: Assert the tool-call payload survives the summarize/restore path.

This only checks list shape, so it stays green if the retained AIMessage loses tool_calls or if the empty-content tool-call turn gets rewritten during sanitization. Given the new typing/sanitization logic, this test should assert that at least one preserved AIMessage still has the original tool_calls payload and content == "".
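The suggested any(...)-style assertion could look like the sketch below. FakeAIMessage is a stub standing in for LangChain's AIMessage, and the payload values ("calculator", {"expression": "2+2"}, "call_123") come from this review comment, not from verified test code.

```python
# Sketch of the suggested assertion helper for the summarize/restore test.
class FakeAIMessage:
    def __init__(self, content, tool_calls=None):
        self.content = content
        self.tool_calls = tool_calls or []

def has_preserved_tool_call(result_messages):
    """True if some message kept content == "" and the original tool_calls payload."""
    return any(
        getattr(m, "content", None) == ""
        and any(
            tc.get("name") == "calculator"
            and tc.get("args") == {"expression": "2+2"}
            and tc.get("id") == "call_123"
            for tc in (getattr(m, "tool_calls", None) or [])
        )
        for m in result_messages
    )

# In the test, after summarize/restore:
#   assert has_preserved_tool_call(result_messages)
```

This fails loudly if tool_calls are dropped or if the empty-content tool-call turn is rewritten during sanitization.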

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_context_summarizer.py` around lines 562 - 577, The test
currently only checks shape; update it to assert that the tool-call payload
survives summarize/restore by locating at least one AIMessage in result_messages
(from ContextSummarizer.summarize_messages) whose content == "" and whose
tool_calls contains the original payload (e.g. an entry with name "calculator"
and args {"expression":"2+2"} and id "call_123"); use a boolean any(...) style
check over result_messages to assert this condition so the test fails if
tool_calls are dropped or the empty-content AIMessage is rewritten.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py`:
- Around line 232-233: The guard using getattr(settings.advanced_features,
'registry', False) changes the intended default (True) when advanced_features is
partially mocked; update the checks in combined_tool_provider.py (the registry
guards around the registry-aware logic such as in get_tools() / any
registry-path branches) to either read the setting directly
(settings.advanced_features.registry) or use getattr(..., True) so the default
remains True, ensuring registry apps/tools continue to be included and tracker
tools keep their original path.

In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py`:
- Around line 377-380: The sanitization currently turns exact empty message
strings into "[empty message]", mutating legitimate empty tool calls; update
_sanitize_content() so it first checks the original content and if it is exactly
an empty string (""), return "" unchanged (so AIMessage(content="") stays
empty); otherwise perform the existing strip and only replace whitespace-only or
None-like content with the "[empty message]" placeholder; ensure callers like
_convert_messages_to_typed() will then preserve true empty tool-call turns.

In `@src/cuga/backend/cuga_graph/utils/message_utils.py`:
- Around line 45-51: The AIMessage reconstruction in message_utils.py drops
tool_calls, causing downstream code (agent_loop.py, controller.py) that accesses
msg.tool_calls to break; update the AIMessage creation in the branch handling
msg_type == 'ai' or 'AIMessage' to forward the tool_calls attribute from the
incoming message (use getattr(message, 'tool_calls', None)) when instantiating
AIMessage so tool-enabled workflows preserve tool call data.

In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Around line 650-659: The test currently only asserts result2 is not None, so
add a negative assertion that validates isolation: after calling
agent.invoke("What's my name?") without thread_id, assert that result2.answer
does not contain "Alice" (or is equal to a phrase indicating no memory), and/or
inspect persisted state to confirm no stored name for the anonymous thread;
update the assertions around agent.invoke in this test (result1 and result2) to
explicitly check result1.answer contains "Alice" and result2.answer does NOT
contain "Alice" so failures surface if state is accidentally shared.
- Around line 60-61: Tests create a fixed thread_id ("test-context-basic") which
can collide with existing checkpoints and make assertions order-dependent;
update the test(s) that instantiate CugaAgent and set thread_id (e.g., the
occurrences around CugaAgent and the thread_id variables at the shown locations)
to generate a unique id per run using uuid4() (assign thread_id =
str(uuid.uuid4()) or equivalent) so each test uses an isolated history; apply
the same change to the other occurrences noted (lines ~120, ~282, ~489).

---

Outside diff comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1025-1030: The approval path is using the original unsummarized
state.chat_messages, so approvals lose the summarized history; before calling
ToolApprovalHandler.check_and_create_approval_interrupt (from call_model),
replace state.chat_messages with the prepared modified_chat_messages (built from
effective_chat_messages) or pass the summarized messages into
check_and_create_approval_interrupt as an extra parameter, and ensure any
internal use of append_chat_messages_with_step_limit in the approval flow
consumes the summarized modified_chat_messages (or the new parameter) rather
than the original state.chat_messages so the approval Command preserves the
summarized chat history.

---

Nitpick comments:
In `@tests/unit/test_context_summarizer.py`:
- Around line 562-577: The test currently only checks shape; update it to assert
that the tool-call payload survives summarize/restore by locating at least one
AIMessage in result_messages (from ContextSummarizer.summarize_messages) whose
content == "" and whose tool_calls contains the original payload (e.g. an entry
with name "calculator" and args {"expression":"2+2"} and id "call_123"); use a
boolean any(...) style check over result_messages to assert this condition so
the test fails if tool_calls are dropped or the empty-content AIMessage is
rewritten.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 790fe723-2064-460f-8c82-51166eca3d63

📥 Commits

Reviewing files that changed from the base of the PR and between efc7945 and 956b78e.

📒 Files selected for processing (9)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/cuga_graph/utils/context_summarizer.py
  • src/cuga/backend/cuga_graph/utils/message_utils.py
  • src/cuga/sdk_core/tests/test_context_summarization_sdk.py
  • src/system_tests/e2e/test_context_summarization_e2e.py
  • tests/integration/test_context_summarization.py
  • tests/unit/test_context_summarizer.py

Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/cuga/backend/cuga_graph/utils/context_summarizer.py (2)

443-459: Use f-string conversion flag instead of str(e).

Ruff RUF010 flags str(e) inside f-strings. The idiomatic approach is to use {e!s} or simply {e} (which implicitly calls __str__).

Suggested fix
-            raise RuntimeError(f"middleware_invocation_failed: Missing required imports - {str(e)}") from e
+            raise RuntimeError(f"middleware_invocation_failed: Missing required imports - {e}") from e
         except ValueError as e:
             logger.error(f"Value error during middleware invocation: {e}")
             logger.error(f"Traceback: {traceback.format_exc()}")
-            raise RuntimeError(f"middleware_invocation_failed: Invalid value - {str(e)}") from e
+            raise RuntimeError(f"middleware_invocation_failed: Invalid value - {e}") from e
         except AttributeError as e:
             logger.error(f"Attribute error during middleware invocation: {e}")
             logger.error(f"Traceback: {traceback.format_exc()}")
-            raise RuntimeError(f"middleware_invocation_failed: Missing attribute - {str(e)}") from e
+            raise RuntimeError(f"middleware_invocation_failed: Missing attribute - {e}") from e
         except Exception as e:
             logger.error(f"Unexpected error during middleware invocation: {e}")
             logger.error(f"Traceback: {traceback.format_exc()}")
             logger.error("Falling back to keeping recent messages")
-            raise RuntimeError(f"middleware_invocation_failed: {str(e)}") from e
+            raise RuntimeError(f"middleware_invocation_failed: {e}") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py` around lines 443 -
459, The exception handlers in the middleware invocation block (the except
ImportError/ValueError/AttributeError/Exception clauses) use str(e) inside
f-strings; replace those occurrences with the f-string conversion flag (e.g.,
{e!s}) or simply {e} to satisfy RUF010. Update the raise RuntimeError(...) calls
in the middleware invocation error handling so the messages use {e!s} (or {e})
instead of str(e), keeping the same error text and chaining (from e).

56-64: Config is cached at init time; dynamic reloads won't affect existing instances.

self.config captures the configuration object at construction. If settings.reload() is called after a ContextSummarizer instance exists (e.g., in a long-lived agent), the cached reference won't reflect the new values. The tests work because they create new agents after reloading, but production use with config hot-reloading could behave unexpectedly.

Consider reading from settings.context_summarization directly in methods where dynamic config is important, or document that instances must be recreated after config changes.
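
The difference can be sketched with a toy settings object (the class names here are illustrative stand-ins, not the real Dynaconf API):

```python
# Toy stand-in for a hot-reloadable settings object; illustrative only.
class Settings:
    def __init__(self):
        self.context_summarization = {"enabled": True}

settings = Settings()

class CachedSummarizer:
    def __init__(self):
        # Snapshot at construction: a later reload is invisible here.
        self.config = settings.context_summarization

class LiveSummarizer:
    @property
    def config(self):
        # Always reads the current value, so reloads take effect.
        return settings.context_summarization

cached = CachedSummarizer()
live = LiveSummarizer()
settings.context_summarization = {"enabled": False}  # simulate settings.reload()
print(cached.config["enabled"])  # True — stale snapshot
print(live.config["enabled"])    # False — reflects the reload
```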

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py` around lines 56 -
64, The constructor for ContextSummarizer currently caches the config into
self.config which prevents runtime reloads from taking effect; change this by
removing the cached self.config and instead access
settings.context_summarization dynamically (or implement a config property that
returns settings.context_summarization) wherever the config is used (e.g., the
__init__ block that checks config.enabled, the _init_middleware call, and any
methods reading self.config); update references to self.config in
ContextSummarizer to use the live settings object so hot-reloads apply without
recreating instances.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Line 672: Remove the stray personal/debug comment "# Made with Bob" from the
test file so the test suite contains only relevant test code and comments;
locate the line containing that exact string in the
test_context_summarization_sdk.py test module and delete it (no other changes
needed).

---

Nitpick comments:
In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py`:
- Around line 443-459: The exception handlers in the middleware invocation block
(the except ImportError/ValueError/AttributeError/Exception clauses) use str(e)
inside f-strings; replace those occurrences with the f-string conversion flag
(e.g., {e!s}) or simply {e} to satisfy RUF010. Update the raise
RuntimeError(...) calls in the middleware invocation error handling so the
messages use {e!s} (or {e}) instead of str(e), keeping the same error text and
chaining (from e).
- Around line 56-64: The constructor for ContextSummarizer currently caches the
config into self.config which prevents runtime reloads from taking effect;
change this by removing the cached self.config and instead access
settings.context_summarization dynamically (or implement a config property that
returns settings.context_summarization) wherever the config is used (e.g., the
__init__ block that checks config.enabled, the _init_middleware call, and any
methods reading self.config); update references to self.config in
ContextSummarizer to use the live settings object so hot-reloads apply without
recreating instances.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f52f44f6-6ba5-452c-a1fa-3a06b83b25ef

📥 Commits

Reviewing files that changed from the base of the PR and between 956b78e and 04fb472.

📒 Files selected for processing (4)
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py
  • src/cuga/backend/cuga_graph/utils/context_summarizer.py
  • src/cuga/backend/cuga_graph/utils/message_utils.py
  • src/cuga/sdk_core/tests/test_context_summarization_sdk.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cuga/backend/cuga_graph/utils/message_utils.py

tracker_apps = tracker.apps

registry_apps = []
if settings.advanced_features.registry:
Contributor


Why did you change this? This works by default too.

Collaborator Author

@offerakrabi offerakrabi Mar 16, 2026


@sami-marreed one of the unit tests failed because of this.

After reviewing this again, I agree there's no need for this; reverted it.

Collaborator Author


@coderabbitai
what do you think about this?

The issue is that **the test is creating a `CombinedToolProvider` directly** at [`supervisor_config.py:153-154`](src/cuga/supervisor_utils/supervisor_config.py:153-154), which then imports and uses the `settings` object from [`config.py`](src/cuga/config.py:1).

**The real problem:**

When `CombinedToolProvider.initialize()` is called in the test context, it accesses `settings.advanced_features.registry` at [`combined_tool_provider.py:233`](src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py:233) **before** the Dynaconf validators have been applied.

**Why it works locally but fails in CI:**

- **Locally**: Your environment likely has the full settings loaded with validators applied during normal app startup
- **In CI**: The test runs in isolation, and the `settings` object might be imported fresh without validators being registered/validated yet

**The actual bug:**

The code at [`supervisor_config.py:153-154`](src/cuga/supervisor_utils/supervisor_config.py:153-154) creates a `CombinedToolProvider` without ensuring the settings are properly initialized with validators. The test environment doesn't go through the normal app initialization that would call `settings.validators.validate_all()`.

**The real fix needed:**

The test (or the `load_supervisor_config` function) needs to ensure settings validators are applied before creating the `CombinedToolProvider`. This is a **test setup issue**, not a code issue - the validators exist but aren't being applied in the test environment before the settings are accessed.

The test should either:
1. Mock the settings object
2. Ensure validators are applied before calling `load_supervisor_config`
3. Or the `CombinedToolProvider` should use `getattr(settings.advanced_features, 'registry', True)` as a defensive fallback



🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check config.py for validators and settings setup
fd -e py 'config.py' src/cuga --exec grep -n "validators\|advanced_features\|registry\|Validator" {} +

# Check supervisor_config.py around lines 140-170
fd -e py 'supervisor_config.py' --exec sed -n '140,175p' {} +

Length of output: 7133


@offerakrabi — Your root cause analysis is largely accurate, but there's an important nuance worth highlighting.

The validate_all() timing issue is real, but narrower than stated:

Looking at config.py lines 244 and 264, validate_all() is called at module initialization level, not inside a function. This means that any import of settings from config.py should already have the registry default applied — if the module completes initialization cleanly.

The actual failure scenarios in CI are likely one of these:

  1. settings.toml is absent or inaccessible in the CI sandbox — Dynaconf can't load the file, so advanced_features might not exist as a namespace at all, and the Validator default hasn't been applied yet if validate_all() throws or is never reached.

  2. A test uses importlib.reload(config) or monkeypatch on settings — This can create a fresh settings object that bypasses the module-level validate_all() call, exactly as you described.

  3. Import order race — if combined_tool_provider.py is imported during test collection before config.py finishes initializing (e.g. circular import path), the settings object exists but validate_all() hasn't yet run.

On the three proposed fixes:

Option / Tradeoff:
  • Mock settings: cleanest for isolation; requires updating every test that creates CombinedToolProvider
  • Ensure validate_all() runs before load_supervisor_config: correct semantically, but validate_all() should already run at import, so this adds noise
  • getattr(settings.advanced_features, 'registry', True): most defensive; handles all edge cases at the source with minimal risk

Since you've already reverted the comment-only change, Option 3 (getattr fallback) is the most pragmatic fix at the call site — it's the same pattern used in helpers.py in this very PR (getattr(model_config, 'enable_format', False)). If the settings object is fully initialized, getattr has zero overhead. If not, it safely defaults to True, matching the validator's declared default.

The test-level fixes (Options 1 & 2) are more correct architecturally, but they require discipline across all future tests that touch CombinedToolProvider. The getattr approach makes the code resilient regardless of how tests set up their environment.
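
As a hedged illustration of that fallback, with toy objects standing in for the real settings module:

```python
# Toy stand-ins for the real Dynaconf settings; illustrative only.
class AdvancedFeatures:
    pass  # 'registry' attribute absent, as if validators never ran

class Settings:
    advanced_features = AdvancedFeatures()

settings = Settings()

# Defensive read: falls back to the validator's declared default (True)
# when the attribute hasn't been applied yet.
registry_enabled = getattr(settings.advanced_features, 'registry', True)
print(registry_enabled)  # True

# Once validators have applied a value, getattr simply returns it.
settings.advanced_features.registry = False
print(getattr(settings.advanced_features, 'registry', True))  # False
```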


✏️ Learnings added
Learnt from: offerakrabi
Repo: cuga-project/cuga-agent PR: 61
File: src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py:232-232
Timestamp: 2026-03-16T09:35:22.010Z
Learning: In `src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py`, `settings.advanced_features.registry` access in `CombinedToolProvider.initialize()` can fail in CI test environments if Dynaconf validators haven't been applied yet. The validator with `default=True` is declared in `config.py` (line 117) and `validate_all()` is called at module level (line 264). The preferred defensive fix is `getattr(settings.advanced_features, 'registry', True)` consistent with patterns used elsewhere in the codebase (e.g., `helpers.py`).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Collaborator Author


fixed it and ran unit tests again, now passing
https://github.com/cuga-project/cuga-agent/actions/runs/23139403491



Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +811 to +839
# Get configurable values from config
configurable = config.get("configurable", {}) if config else {}
current_callbacks = configurable.get("callbacks", base_callbacks or [])
active_model = configurable.get("llm") or base_model

# ── Context management BEFORE building messages_for_model ────────────
effective_chat_messages = state.chat_messages or []
try:
    initial_message_count = len(effective_chat_messages)
    model_name = getattr(active_model, 'model_name', 'gpt-4')
    tools_for_later_context = None

    logger.debug(
        f"CugaLite: Calling manage_message_context with model_name={model_name}, "
        f"0 tools, system_prompt={len(dynamic_prompt) if dynamic_prompt else 0} chars"
    )

    temp_agent_state = AgentState(
        input="",
        url="",
        chat_messages=list(effective_chat_messages),  # pass a copy
        variables_storage=state.variables_storage,
        variable_counter_state=state.variable_counter_state,
        variable_creation_order=state.variable_creation_order,
    )

    temp_agent_state.manage_message_context(
        model=active_model,
        model_name=model_name,
Contributor


Please simplify this code to a one-liner that uses a helper. It seems this file hasn't settled on whether to mutate the chat messages; maybe create a helper class that produces the summary in one line. After that update, I think it's preferable to keep the summarization logic here minimal.

@coderabbitai please advice on this



@sami-marreed, great point. The block can be dramatically reduced by extracting everything into a reusable helper. Here's the recommended approach:


Step 1 — Create a helper function in a shared utils module

Add a function like apply_context_summarization to a shared location (e.g., src/cuga/backend/cuga_graph/utils/context_management_utils.py or directly on AgentState as a @staticmethod). The helper owns:

  • the temporary AgentState construction
  • the manage_message_context call
  • metric logging and tracker recording
# src/cuga/backend/cuga_graph/utils/context_management_utils.py

from typing import Any, List, Optional
from langchain_core.messages import BaseMessage
from loguru import logger


def apply_context_summarization(
    messages: List[BaseMessage],
    model: Any,
    *,
    system_prompt: Optional[str] = None,
    tools: Optional[List[Any]] = None,
    tracker: Optional[Any] = None,
) -> List[BaseMessage]:
    """Summarize messages in-place via a temporary AgentState.

    Returns the (possibly summarized) message list; falls back to the
    original list on any error.
    """
    from cuga.backend.cuga_graph.state.agent_state import AgentState

    if not messages:
        return messages

    model_name = getattr(model, 'model_name', 'gpt-4')
    try:
        temp = AgentState(input="", url="", chat_messages=list(messages))
        temp.manage_message_context(
            model=model,
            model_name=model_name,
            tools=tools,
            system_prompt=system_prompt,
        )
        summarized = temp.chat_messages or messages
        _log_and_track(messages, summarized, temp, tracker)
        return summarized
    except Exception as e:
        logger.exception(f"Context summarization failed, using original messages: {e}")
        return messages


def _log_and_track(original, summarized, temp_state, tracker):
    if len(summarized) < len(original):
        logger.info(
            f"Context summarization: {len(original)} → {len(summarized)} messages"
        )
    metrics = (
        getattr(temp_state, 'last_summarization_metrics', None) or {}
    ).get('chat_messages')
    if not metrics:
        return
    logger.info(
        f"📊 Summarization: messages {metrics['before']['message_count']} → "
        f"{metrics['after']['message_count']}, "
        f"saved {metrics['tokens_saved']} tokens "
        f"({(1 - metrics['compression_ratio']):.1%} reduction)"
    )
    if tracker:
        try:
            import json
            from cuga.backend.activity_tracker.tracker import Step
            tracker.collect_step(Step(name="ContextSummarization", data=json.dumps(metrics)))
        except Exception as e:
            logger.debug(f"Failed to record summarization in tracker: {e}")

Step 2 — Reduce the call site to a one-liner

The entire ~90-line block in call_model shrinks to:

effective_chat_messages = apply_context_summarization(
    state.chat_messages or [],
    active_model,
    system_prompt=dynamic_prompt,
    tracker=tracker,
)

Step 3 — Reuse in the supervisor

The same helper can replace the equivalent block in cuga_supervisor_graph.py, so the summarization contract is defined in exactly one place.


Key benefits:

  • cuga_lite_graph.py stays focused on orchestration; summarization details live in the helper.
  • The helper is independently testable.
  • Both the lite graph and the supervisor share the same behaviour without duplication.

# CONTEXT SUMMARIZATION - Manage context before LLM invocation
# ============================================================================
try:
    initial_message_count = len(state.supervisor_chat_messages or [])
Contributor


)

# Perform summarization
messages, summary_metrics = summarizer.summarize_messages(messages)
Contributor


Make this async, using ainvoke internally.

Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (5)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (2)

970-970: Minor: Consider iterable unpacking for list concatenation.

Static analysis suggests using [*modified_chat_messages, AIMessage(content=content)] instead of modified_chat_messages + [AIMessage(content=content)]. This is a minor style preference and doesn't affect correctness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` at line 970,
Replace the list concatenation that builds updated_messages using
modified_chat_messages + [AIMessage(content=content)] with iterable unpacking
for style consistency: construct updated_messages by unpacking
modified_chat_messages and appending AIMessage(content=content) via
[*modified_chat_messages, AIMessage(content=content)]; update the assignment
where updated_messages is defined (referencing the variables
modified_chat_messages and AIMessage(content=content) in cuga_lite_graph.py).

969-1025: Step limit logic is duplicated between code and non-code paths.

The step limit checking logic (lines 973-987 and 1011-1025) is nearly identical. The existing append_chat_messages_with_step_limit helper handles this, but it operates on state.chat_messages rather than accepting an input list.

Consider either:

  1. Extracting a helper that accepts the base message list as a parameter, or
  2. Accept the duplication as acceptable since it's localized to this function
💡 Potential refactor for step limit helper
def check_step_limit(current_step: int, max_steps: int, limit_setting: int) -> Optional[str]:
    """Check if step limit is exceeded. Returns error message if exceeded, None otherwise."""
    limit = max_steps if max_steps is not None else limit_setting
    if current_step > limit:
        return (
            f"Maximum step limit ({limit}) reached. "
            f"The task has exceeded the allowed number of execution cycles. "
            f"Please simplify your request or break it into smaller tasks."
        )
    return None

Then use in both paths:

error_msg = check_step_limit(new_step_count, max_steps, settings.advanced_features.cuga_lite_max_steps)
if error_msg:
    logger.warning(f"Step limit reached: {new_step_count}")
    error_ai_message = AIMessage(content=error_msg)
    return create_error_command(...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines
969 - 1025, Duplicate step-limit checking exists in the code and non-code
branches; refactor by extracting a helper that takes the base messages list and
new_step_count (or modify append_chat_messages_with_step_limit to accept a
messages parameter) to centralize the logic used in the blocks that build
updated_messages and call create_error_command; the helper should compute limit
using max_steps or settings.advanced_features.cuga_lite_max_steps, log the
warning with the new_step_count, construct the AIMessage error and return the
create_error_command result (or None if under limit) so both places (where
AIMessage(content=content) and planning_response are appended) call the single
helper instead of duplicating the if new_step_count > limit block.
src/cuga/backend/cuga_graph/utils/context_management_utils.py (1)

115-121: Potential KeyError if metrics structure is incomplete.

The metrics dictionary access assumes a specific structure with nested keys (before, after, message_count, token_count, etc.). If the summarizer returns partial metrics, this will raise a KeyError and the logging will fail silently due to the outer try/except.

Consider using .get() with defaults for robustness:

💡 Suggested defensive access pattern
     # Log detailed metrics
+    before = metrics.get('before', {})
+    after = metrics.get('after', {})
     logger.info(
-        f"📊 Summarization: messages {metrics['before']['message_count']} → "
-        f"{metrics['after']['message_count']}, "
-        f"saved {metrics['tokens_saved']} tokens "
+        f"📊 Summarization: messages {before.get('message_count', '?')} → "
+        f"{after.get('message_count', '?')}, "
+        f"saved {metrics.get('tokens_saved', '?')} tokens "
-        f"({(1 - metrics['compression_ratio']):.1%} reduction)"
+        f"({(1 - metrics.get('compression_ratio', 1.0)):.1%} reduction)"
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/utils/context_management_utils.py` around lines
115 - 121, The logging block using logger.info accesses nested keys on metrics
(e.g., metrics['before']['message_count'], metrics['after']['message_count'],
metrics['tokens_saved'], metrics['compression_ratio']) which can raise KeyError
if the summarizer returns partial metrics; change the access to use safe .get()
lookups with sensible defaults (e.g., metrics.get('before',
{}).get('message_count', 0), metrics.get('after', {}).get('message_count', 0),
metrics.get('tokens_saved', 0), metrics.get('compression_ratio', 0.0)) and
format the message using those safe values so the logger.info call never fails
even when keys are missing.
src/cuga/sdk_core/tests/test_context_summarization_sdk.py (1)

380-387: Consider documenting why 60% threshold was chosen.

The assertion requires at least 3 out of 5 checks to pass. While the threshold is reasonable for handling LLM variability, a brief comment explaining the rationale would help future maintainers understand why this specific threshold was selected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py` around lines 380 -
387, Add a short explanatory comment above the pass_threshold declaration that
documents why 60% (3 out of 5) was chosen — e.g., to balance LLM variability and
preserve key info while avoiding flakiness — referencing the variables
pass_threshold, checks_passed, total_checks, and messages so future maintainers
understand the tradeoff and can adjust the threshold with context. Make the
comment one or two sentences and place it immediately above the line setting
pass_threshold.
src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py (1)

288-320: Three separate iterations over chain.steps could be consolidated.

The code iterates through chain.steps three times (lines 290-295, 298-302, 305-315) to extract model, tools, and system_prompt. Consolidating these into a single loop would be more efficient and maintainable.

♻️ Consolidated extraction loop
         try:
-            # Try to extract model from chain
-            if hasattr(self.chain, 'steps'):
-                for step in self.chain.steps:
-                    if hasattr(step, 'bound') and hasattr(step.bound, 'model_name'):
-                        model = step.bound
-                        model_name = step.bound.model_name
-                        break
-
-            # Try to extract tools from chain
-            if hasattr(self.chain, 'steps'):
-                for step in self.chain.steps:
-                    if hasattr(step, 'tools'):
-                        tools_for_context = step.tools
-                        break
-
-            # Try to extract system prompt from chain
             if hasattr(self.chain, 'steps'):
                 for step in self.chain.steps:
-                    if hasattr(step, 'prompt'):
+                    # Extract model
+                    if model is None and hasattr(step, 'bound') and hasattr(step.bound, 'model_name'):
+                        model = step.bound
+                        model_name = step.bound.model_name
+                    # Extract tools
+                    if tools_for_context is None and hasattr(step, 'tools'):
+                        tools_for_context = step.tools
+                    # Extract system prompt
+                    if system_prompt_text is None and hasattr(step, 'prompt'):
                         prompt_template = step.prompt
                         if hasattr(prompt_template, 'messages'):
                             for msg_template in prompt_template.messages:
                                 if hasattr(msg_template, 'prompt') and hasattr(
                                     msg_template.prompt, 'template'
                                 ):
                                     system_prompt_text = msg_template.prompt.template
                                     break
         except Exception as e:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py` around lines
288 - 320, The three separate iterations over self.chain.steps should be merged
into a single loop that scans each step once and extracts model (step.bound and
step.bound.model_name into model/model_name), tools (step.tools into
tools_for_context), and system prompt (step.prompt -> prompt_template and
iterate prompt_template.messages to pull msg_template.prompt.template into
system_prompt_text); implement the single-pass by checking each attribute per
step, assigning when found, and breaking early when all three targets are
populated, and keep the current try/except and logger.warning behavior for
failures (references: self.chain, chain.steps, model/model_name,
tools_for_context, prompt_template, system_prompt_text).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py`:
- Around line 288-320: The three separate iterations over self.chain.steps
should be merged into a single loop that scans each step once and extracts model
(step.bound and step.bound.model_name into model/model_name), tools (step.tools
into tools_for_context), and system prompt (step.prompt -> prompt_template and
iterate prompt_template.messages to pull msg_template.prompt.template into
system_prompt_text); implement the single-pass by checking each attribute per
step, assigning when found, and breaking early when all three targets are
populated, and keep the current try/except and logger.warning behavior for
failures (references: self.chain, chain.steps, model/model_name,
tools_for_context, prompt_template, system_prompt_text).

In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Line 970: Replace the list concatenation that builds updated_messages using
modified_chat_messages + [AIMessage(content=content)] with iterable unpacking
for style consistency: construct updated_messages by unpacking
modified_chat_messages and appending AIMessage(content=content) via
[*modified_chat_messages, AIMessage(content=content)]; update the assignment
where updated_messages is defined (referencing the variables
modified_chat_messages and AIMessage(content=content) in cuga_lite_graph.py).
- Around line 969-1025: Duplicate step-limit checking exists in the code and
non-code branches; refactor by extracting a helper that takes the base messages
list and new_step_count (or modify append_chat_messages_with_step_limit to
accept a messages parameter) to centralize the logic used in the blocks that
build updated_messages and call create_error_command; the helper should compute
limit using max_steps or settings.advanced_features.cuga_lite_max_steps, log the
warning with the new_step_count, construct the AIMessage error and return the
create_error_command result (or None if under limit) so both places (where
AIMessage(content=content) and planning_response are appended) call the single
helper instead of duplicating the if new_step_count > limit block.

In `@src/cuga/backend/cuga_graph/utils/context_management_utils.py`:
- Around line 115-121: The logging block using logger.info accesses nested keys
on metrics (e.g., metrics['before']['message_count'],
metrics['after']['message_count'], metrics['tokens_saved'],
metrics['compression_ratio']) which can raise KeyError if the summarizer returns
partial metrics; change the access to use safe .get() lookups with sensible
defaults (e.g., metrics.get('before', {}).get('message_count', 0),
metrics.get('after', {}).get('message_count', 0), metrics.get('tokens_saved',
0), metrics.get('compression_ratio', 0.0)) and format the message using those
safe values so the logger.info call never fails even when keys are missing.

In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Around line 380-387: Add a short explanatory comment above the pass_threshold
declaration that documents why 60% (3 out of 5) was chosen — e.g., to balance
LLM variability and preserve key info while avoiding flakiness — referencing the
variables pass_threshold, checks_passed, total_checks, and messages so future
maintainers understand the tradeoff and can adjust the threshold with context.
Make the comment one or two sentences and place it immediately above the line
setting pass_threshold.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ea3acd66-d457-4606-9731-77120c9ee78f

📥 Commits

Reviewing files that changed from the base of the PR and between 04fb472 and 5e9d802.

📒 Files selected for processing (6)
  • src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py
  • src/cuga/backend/cuga_graph/utils/context_management_utils.py
  • src/cuga/sdk_core/tests/test_context_summarization_sdk.py

Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py (1)

328-339: ⚠️ Potential issue | 🟠 Major

The summarized history never reaches the model.

manage_message_context() operates on state.chat_messages, but Lines 333-336 still send the separate chat_messages argument into ainvoke(). If those differ—or once summarization replaces the state field—the actual LLM call still uses the unsummarized history, so this feature is effectively bypassed in regular-chat mode.

💡 Proposed fix
+        state.chat_messages = list(chat_messages)
         await state.manage_message_context(
             model=model, model_name=model_name, tools=tools_for_context, system_prompt=system_prompt_text
         )
+        effective_chat_messages = state.chat_messages or []
         logger.info("ChatAgent: manage_message_context completed successfully")

         res = await self.chain.ainvoke(
             {
-                "conversation": self.map_chat_messages(chat_messages),
+                "conversation": self.map_chat_messages(effective_chat_messages),
                 "variables_history": state.variables_manager.get_variables_summary(last_n=10),
                 "apps_list": apps_list or "No apps available",
             }
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py` around lines
328 - 339, The current call to self.chain.ainvoke uses the original
chat_messages variable, bypassing any summarization manage_message_context
applied to state.chat_messages; change the payload passed to self.chain.ainvoke
to use the updated state.chat_messages (e.g., call
self.map_chat_messages(state.chat_messages) instead of
self.map_chat_messages(chat_messages)) so the LLM receives the post-managed
history, keeping the rest of the payload (variables_history via
state.variables_manager.get_variables_summary and apps_list) unchanged.
♻️ Duplicate comments (2)
src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py (1)

399-411: ⚠️ Potential issue | 🟠 Major

Keep supervisor_chat_messages local until the node returns.

This still mutates state.supervisor_chat_messages in place—first with the summarized list, then again when injecting variables. LangGraph only persists returned updates, so retries/checkpoint restores can diverge from what the model actually saw. Mirror the Lite path here: build effective_chat_messages / modified_chat_messages locally and only send them back via Command.update.

Also applies to: 457-485

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py`
around lines 399 - 411, The code currently mutates
state.supervisor_chat_messages in place when calling
apply_context_summarization; instead, keep a local variable (e.g.,
effective_chat_messages or modified_chat_messages) to receive the summarized
list from apply_context_summarization and use that local variable for subsequent
injection of variables and LLM invocation, then return the updated chat messages
only via the node's Command.update so state.supervisor_chat_messages is not
mutated until the node explicitly returns; apply the same change for the other
occurrence around the 457-485 block referenced.
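The "keep it local, return via the node's update" pattern the review asks for can be sketched as follows. This is a minimal stand-in: `State`, `summarize`, and `supervisor_node` are illustrative names, and a real LangGraph node would return `langgraph.types.Command(update={...})` rather than a plain dict.

```python
# Minimal sketch of the pattern: work on a local copy, never mutate
# state.supervisor_chat_messages in place, and hand changes back only
# through the returned update (a plain dict stands in for Command.update).
from dataclasses import dataclass, field


@dataclass
class State:
    supervisor_chat_messages: list = field(default_factory=list)


def summarize(messages):
    # Stand-in for apply_context_summarization.
    return messages[-2:]


def supervisor_node(state: State) -> dict:
    # Build the effective message list locally.
    effective = summarize(list(state.supervisor_chat_messages))
    effective.append("assistant: ...")  # e.g. inject variables / model reply
    # Only the returned update is persisted by the graph runtime, so
    # retries and checkpoint restores stay consistent with what the model saw.
    return {"supervisor_chat_messages": effective}


state = State(supervisor_chat_messages=["m1", "m2", "m3"])
update = supervisor_node(state)
assert state.supervisor_chat_messages == ["m1", "m2", "m3"]  # state untouched
```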
tests/unit/test_context_summarizer.py (1)

487-497: ⚠️ Potential issue | 🟡 Minor

This still stays green when most workers fail.

The new assertion only proves one task succeeded. If 9/10 concurrent calls raise, this test still passes and the thread-safety regression stays hidden.

🧪 Proposed fix
         # All should complete
         assert len(results) + len(errors) == 10
-
-        # Ensure at least one worker succeeded
-        assert len(results) > 0, "All workers failed - expected at least one successful result"
+        assert not errors, f"Concurrent summarize_messages calls failed: {errors!r}"
+        assert len(results) == 10
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_context_summarizer.py` around lines 487 - 497, The current
test only requires at least one successful worker, which lets runs with 9/10
failures pass; change the assertion to require a meaningful success threshold
(e.g., majority or >= 50%) by asserting len(results) > len(errors) or
len(results) >= 5 instead of assert len(results) > 0, and keep the existing
checks that iterate over the results; update the test in
test_context_summarizer.py where results and errors are asserted and validated
so the test fails when most concurrent calls raise.
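The stricter concurrency assertion can be sketched like this. `summarize_stub` is a stand-in for the real `summarize_messages`; the point is the harness shape: collect both results and errors from all workers, then require zero failures rather than "at least one success".

```python
# Sketch of the stricter assertion: all 10 concurrent workers must succeed,
# so a thread-safety regression that breaks 9/10 calls fails the test.
from concurrent.futures import ThreadPoolExecutor


def summarize_stub(messages):
    # Stand-in for summarize_messages.
    return f"summary of {len(messages)} messages"


results, errors = [], []


def worker(i):
    try:
        results.append(summarize_stub([f"msg-{i}"] * (i + 1)))
    except Exception as e:  # harness collects failures instead of raising
        errors.append(e)


with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(worker, range(10)))

# All should complete, and none may fail.
assert len(results) + len(errors) == 10
assert not errors, f"Concurrent summarize calls failed: {errors!r}"
assert len(results) == 10
```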
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/test_context_summarizer.py`:
- Around line 334-346: The test incorrectly uses a bare Mock for
mock_middleware_error so hasattr(self.middleware, 'abefore_model') returns a
callable Mock and the code attempts the async path; change the mock to
explicitly provide an asynchronous abefore_model (AsyncMock) that raises or
remove abefore_model attribute so _invoke_middleware on ContextSummarizer falls
back to calling the synchronous before_model and triggers the intended
Exception("Middleware error"); update the patch of
SummarizationMiddleware/return_value accordingly to use mock_middleware_error
with either an AsyncMock abefore_model that raises or with abefore_model deleted
and a before_model that raises to exercise the before_model error handling path.

---

Outside diff comments:
In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py`:
- Around line 328-339: The current call to self.chain.ainvoke uses the original
chat_messages variable, bypassing any summarization manage_message_context
applied to state.chat_messages; change the payload passed to self.chain.ainvoke
to use the updated state.chat_messages (e.g., call
self.map_chat_messages(state.chat_messages) instead of
self.map_chat_messages(chat_messages)) so the LLM receives the post-managed
history, keeping the rest of the payload (variables_history via
state.variables_manager.get_variables_summary and apps_list) unchanged.

---

Duplicate comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py`:
- Around line 399-411: The code currently mutates state.supervisor_chat_messages
in place when calling apply_context_summarization; instead, keep a local
variable (e.g., effective_chat_messages or modified_chat_messages) to receive
the summarized list from apply_context_summarization and use that local variable
for subsequent injection of variables and LLM invocation, then return the
updated chat messages only via the node's Command.update so
state.supervisor_chat_messages is not mutated until the node explicitly returns;
apply the same change for the other occurrence around the 457-485 block
referenced.

In `@tests/unit/test_context_summarizer.py`:
- Around line 487-497: The current test only requires at least one successful
worker, which lets runs with 9/10 failures pass; change the assertion to require
a meaningful success threshold (e.g., majority or >= 50%) by asserting
len(results) > len(errors) or len(results) >= 5 instead of assert len(results) >
0, and keep the existing checks that iterate over the results; update the test
in test_context_summarizer.py where results and errors are asserted and
validated so the test fails when most concurrent calls raise.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1cc37e1d-efc5-437b-82f6-5b5dea06664a

📥 Commits

Reviewing files that changed from the base of the PR and between 5e9d802 and 5b25494.

📒 Files selected for processing (8)
  • src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py
  • src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
  • src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py
  • src/cuga/backend/cuga_graph/state/agent_state.py
  • src/cuga/backend/cuga_graph/utils/context_management_utils.py
  • src/cuga/backend/cuga_graph/utils/context_summarizer.py
  • tests/integration/test_context_summarization.py
  • tests/unit/test_context_summarizer.py

Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Comment on lines +282 to +310
# Extract model and tools from chain for context management
model = None
model_name = None
tools_for_context = None
system_prompt_text = None

try:
    # Try to extract model from chain
    if hasattr(self.chain, 'steps'):
        for step in self.chain.steps:
            if hasattr(step, 'bound') and hasattr(step.bound, 'model_name'):
                model = step.bound
                model_name = step.bound.model_name
                break

    # Try to extract tools from chain
    if hasattr(self.chain, 'steps'):
        for step in self.chain.steps:
            if hasattr(step, 'tools'):
                tools_for_context = step.tools
                break

    # Try to extract system prompt from chain
    if hasattr(self.chain, 'steps'):
        for step in self.chain.steps:
            if hasattr(step, 'prompt'):
                prompt_template = step.prompt
                if hasattr(prompt_template, 'messages'):
                    for msg_template in prompt_template.messages:
Contributor


What is this code for?

Collaborator Author


Here it extracts the data needed for context summarization: model name, tools, and prompts (needed for the token estimation).

I can improve this by extracting the data once at chat agent initialization.
Also, I see there was a small fix I made in cuga_lite_graph.py that I forgot to apply here as well.
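Moving the introspection to initialization could look like the sketch below. The `steps`, `bound`, `model_name`, and `tools` attributes mirror the ones probed in the reviewed snippet; the `ChatAgent` shape and the stand-in chain are hypothetical.

```python
# Hypothetical sketch: walk chain.steps once in __init__ and cache the
# results, instead of re-probing the chain on every invocation.
from types import SimpleNamespace


class ChatAgent:
    def __init__(self, chain):
        self.chain = chain
        self.model = None
        self.model_name = None
        self.tools_for_context = None
        for step in getattr(chain, "steps", []):
            bound = getattr(step, "bound", None)
            if self.model is None and bound is not None and hasattr(bound, "model_name"):
                self.model, self.model_name = bound, bound.model_name
            if self.tools_for_context is None and hasattr(step, "tools"):
                self.tools_for_context = step.tools


# Stand-in chain for illustration only.
fake_model = SimpleNamespace(model_name="gpt-test")
chain = SimpleNamespace(steps=[
    SimpleNamespace(bound=fake_model),
    SimpleNamespace(tools=["search"]),
])
agent = ChatAgent(chain)
```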

sami-marreed and others added 7 commits March 16, 2026 23:47
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>