Conversation
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Note: Reviews paused. This branch is under active development, so to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds context summarization and token-counting utilities, integrates pre-LLM context management into ChatAgent/CugaLite/CugaSupervisor call paths, extends AgentState with manage_message_context and summarization metrics, and adds extensive tests plus configuration for context summarization and token accounting.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Node as Agent Node
    participant State as AgentState
    participant Counter as TokenCounter
    participant Summarizer as ContextSummarizer
    participant LLM as Model
    Client->>Node: invoke(request)
    Node->>Node: extract model, model_name, tools, system_prompt
    Node->>State: manage_message_context(model, model_name, tools, system_prompt)
    State->>Counter: count_total_context_tokens(messages, tools, system_prompt)
    Counter-->>State: tokens, usage_pct
    State->>Summarizer: should_summarize?(tokens, usage_pct, messages)
    alt trigger satisfied
        Summarizer->>Summarizer: invoke middleware -> summarized_messages, metrics
        Summarizer-->>State: summarized_messages, metrics
    else not triggered
        State-->>Node: original messages
    end
    State-->>Node: updated chat_messages
    Node->>LLM: invoke(updated messages, tools)
    LLM-->>Node: response
    Node-->>Client: result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 10
🧹 Nitpick comments (3)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)
914-920: Misleading log statement references tools that are always `None`.

`tools_for_later_context` is hardcoded to `None` on line 914, but the log on lines 917-920 references its length as if it could have tools. This could confuse debugging.

🧹 Proposed fix for clarity

```diff
 # Note: In CugaLite, tools are managed via sandbox context, not state
 # For context summarization, we'll pass None since tools aren't directly available
-tools_for_later_context = None
 logger.debug(
     f"CugaLite: Calling manage_message_context with model_name={model_name}, "
-    f"{len(tools_for_later_context) if tools_for_later_context else 0} tools, "
+    f"tools=None (managed via sandbox), "
     f"system_prompt={len(dynamic_prompt) if dynamic_prompt else 0} chars"
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines 914 - 920, The logger.debug message is misleading because tools_for_later_context is explicitly set to None before logging; update the logging to reflect the actual state by removing or changing the length check on tools_for_later_context and instead log a clear value (e.g., "0 tools" or "no tools") or the correct variable that may contain tools; locate the block around tools_for_later_context, the call to manage_message_context, and the logger.debug statement (symbols: tools_for_later_context, manage_message_context, logger.debug, dynamic_prompt, model_name) and modify the message to accurately report tools presence and dynamic_prompt length.

src/system_tests/e2e/test_context_summarization_e2e.py (1)
117-117: Remove developer signature comment.

Same issue as in `message_utils.py`: the `# Made with Bob` comment should be removed from production code.

🧹 Proposed fix

```diff
 if __name__ == "__main__":
     unittest.main()
-
-# Made with Bob
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/system_tests/e2e/test_context_summarization_e2e.py` at line 117, Remove the developer signature line "# Made with Bob" from the top of the test module test_context_summarization_e2e.py (delete the comment line so it is not present in production/test code); no other changes are needed.

src/cuga/backend/cuga_graph/utils/message_utils.py (1)
80-80: Remove developer signature comment.

The comment `# Made with Bob` appears to be a personal signature and should be removed from production code.

🧹 Proposed fix

```diff
-
-# Made with Bob
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/utils/message_utils.py` at line 80, Remove the personal signature comment "# Made with Bob" from the top of the module in message_utils.py; delete that standalone comment line so no developer signature remains in production code and run formatting/linting to ensure no trailing blank-line or style issues remain after removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 940-942: The code directly assigns state.chat_messages =
temp_agent_state.chat_messages (and then computes final_message_count), which
mutates LangGraph-managed state; instead create and return a Command that
encapsulates the updated chat_messages (and any derived final_message_count if
needed) rather than mutating state in-place. Replace the direct assignment
around temp_agent_state.chat_messages in the cuga_lite_graph handler with
constructing an appropriate Command (e.g., UpdateState or similar command used
elsewhere in the graph) that carries the new chat_messages payload so the graph
runtime can apply the state change consistently.
In `@src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py`:
- Around line 438-440: The code mutates state.supervisor_chat_messages directly
(assigning temp_agent_state.chat_messages), which can bypass LangGraph's
expected state-update flow; instead, stop the in-place assignment and return the
updated messages via the node's Command update payload (use
Command.update({"supervisor_chat_messages": temp_agent_state.chat_messages})) or
the node's equivalent update mechanism so LangGraph can persist/merge the new
value; modify the handler that does the assignment (referenced by
state.supervisor_chat_messages and temp_agent_state.chat_messages) to emit that
update rather than mutating the state object directly.
In `@src/cuga/backend/cuga_graph/state/agent_state.py`:
- Around line 1057-1068: The code enables ContextSummarizer when
settings.context_summarization.enabled and model/model_name exist but doesn't
handle the case where ContextSummarizer returns middleware=None; that causes
_summarize_message_list to be a no-op and the sliding-window fallback to be
skipped. Modify the branch that constructs ContextSummarizer (in agent_state.py
around the ContextSummarizer call) to detect when summarizer.middleware is falsy
(or summarizer is unusable) and in that case set summarizer to None (or skip
using it) so the existing sliding-window codepath is used; specifically guard
uses of ContextSummarizer and calls to _summarize_message_list so they only
occur when summarizer.middleware is present, otherwise fall back to the old
windowing logic.
- Around line 1069-1085: The supervisor_chat_messages list is not being
summarized and can grow indefinitely; update the summarization path to call
self._summarize_message_list for supervisor_chat_messages (similar to
chat_messages/chat_agent_messages) inside the same block where chat_messages and
chat_agent_messages are summarized, passing the summarizer, tools, system_prompt
and an appropriate store_metrics flag (likely False) and a unique name like
"supervisor_chat_messages", so apply_message_sliding_window + summarization both
manage that history.
In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py`:
- Around line 396-414: The current loop in the context summarizer rebuilds
messages with only content, dropping metadata like tool_calls,
additional_kwargs, response_metadata, and id; update the logic that creates
sanitized messages (after convert_to_proper_message_type and
self._sanitize_content) to preserve all existing message fields by using each
LangChain message's model_copy(update={"content": sanitized_content}) instead of
constructing new AIMessage/HumanMessage/SystemMessage instances, so tool_calls
and other metadata remain available (agent_loop.py checks if msg.tool_calls).
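The preservation argument above hinges on copy-with-update keeping unrelated fields intact. LangChain messages are pydantic models, so the real fix uses `model_copy(update=...)`; a stdlib dataclass with `replace()` shows the same behavior (field names here are illustrative):

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Msg:
    """Stand-in for an AIMessage carrying metadata beyond `content`."""
    content: str
    tool_calls: list = field(default_factory=list)
    id: Optional[str] = None


def sanitize(msg: Msg, clean: str) -> Msg:
    # replace(), like pydantic's model_copy(update=...), copies every field
    # not explicitly overridden, so tool_calls and id survive sanitization,
    # unlike rebuilding the message from content alone.
    return replace(msg, content=clean)


original = Msg(content="  raw  ", tool_calls=[{"name": "calc", "id": "call_1"}], id="m1")
cleaned = sanitize(original, "raw")
```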
In `@src/cuga/backend/cuga_graph/utils/message_utils.py`:
- Around line 60-68: The current ToolMessage construction in message_utils.py
only pulls tool_call_id from message.additional_kwargs, which misses cases where
the message has a direct tool_call_id attribute; update the logic to first try
getattr(message, 'tool_call_id', None) (or message.tool_call_id) and if that is
None/empty fall back to message.additional_kwargs.get('tool_call_id',
'unknown'), then pass that resolved tool_call_id into the ToolMessage(...) call
so ToolMessage receives the correct identifier.
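The resolution order described above can be sketched like this; `LegacyMessage` is a hypothetical stand-in for the incoming message object, with only the attribute names taken from the comment:

```python
class LegacyMessage:
    """Hypothetical message carrying tool_call_id in one of two places."""

    def __init__(self, tool_call_id=None, additional_kwargs=None):
        if tool_call_id is not None:
            self.tool_call_id = tool_call_id
        self.additional_kwargs = additional_kwargs or {}


def resolve_tool_call_id(message) -> str:
    # Prefer the direct attribute; fall back to additional_kwargs; else "unknown".
    direct = getattr(message, "tool_call_id", None)
    if direct:
        return direct
    return message.additional_kwargs.get("tool_call_id", "unknown")
```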
In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Around line 297-308: The test currently swallows failures and only prints
warnings when reading the agent checkpoint (calls to
agent.graph.checkpointer.get) so it doesn't assert that summarization reduced
history; change the two try/except blocks around agent.graph.checkpointer.get
(and subsequent access to checkpoint -> "channel_values" -> "chat_messages") to
assert or raise on exceptions instead of printing, and add definitive assertions
that summarization occurred (e.g., assert that
len(state_dict.get("chat_messages", [])) is less than the preloaded message
count and/or that the chat history contains a summary marker or removed original
messages) so the test fails if the checkpoint still contains the full preloaded
history; update both occurrences noted in the comment (around the first block
with message_count_before and the later block) to enforce these checks.
In `@src/system_tests/e2e/test_context_summarization_e2e.py`:
- Around line 25-31: The test environment variables in test_env_vars are using
keys that ContextSummarizer doesn't read
(DYNACONF_CONTEXT_SUMMARIZATION__TRIGGER_STRATEGY and
DYNACONF_CONTEXT_SUMMARIZATION__MESSAGE_COUNT_THRESHOLD), so update the mapping
to use the implementation's expected env var names (e.g., replace those two
entries with DYNACONF_CONTEXT_SUMMARIZATION__TRIGGER_MESSAGES="3"; optionally
ensure DYNACONF_CONTEXT_SUMMARIZATION__TRIGGER_FRACTION or __TRIGGER_TOKENS are
set if needed by tests) so ContextSummarizer (the component reading
trigger_messages/trigger_fraction/trigger_tokens) will pick up the intended
configuration.
In `@tests/integration/test_context_summarization.py`:
- Around line 478-484: The test sets DYNACONF_CONTEXT_SUMMARIZATION trigger env
vars but doesn't set ENABLED, making it environment-dependent; update the setup
in tests/integration/test_context_summarization.py (the block that saves
original_fraction and original_keep and calls settings.reload()) to also set
os.environ["DYNACONF_CONTEXT_SUMMARIZATION__ENABLED"] = "true" before calling
settings.reload(), and mirror the same change in the other setup location
referenced (around the block at lines 519-520) so the supervisor flow always
exercises summarization regardless of external env.
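A sketch of that test setup; the env-key names come from the comment above, while whether `settings.reload()` actually picks them up depends on the project's Dynaconf wiring:

```python
import os


def enable_summarization_env():
    # Force summarization on regardless of the external environment,
    # then pick a low trigger so short test conversations summarize.
    os.environ["DYNACONF_CONTEXT_SUMMARIZATION__ENABLED"] = "true"
    os.environ["DYNACONF_CONTEXT_SUMMARIZATION__TRIGGER_MESSAGES"] = "3"
    # settings.reload() would be called here in the real test setup.


enable_summarization_env()
```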
In `@tests/unit/test_context_summarizer.py`:
- Around line 480-487: The current test only asserts that len(results) +
len(errors) == 10 which passes even if all workers failed; update the assertions
in tests/unit/test_context_summarizer.py to require at least one successful
result and validate results are non-empty before iterating: keep the existing
total-count check for results + errors == 10, then add an assertion like assert
len(results) > 0 (or assert len(errors) < 10) to ensure not all workers failed,
and then continue to verify each (result, metrics) pair from results is a
non-empty list and a dict respectively; reference the variables results and
errors and the loop over for result, metrics in results to locate where to add
the new assertion.
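The strengthened assertions might look like this, run here against illustrative data standing in for the real worker output:

```python
# Illustrative worker output: 7 successes, 3 failures.
results = [(["summary", "tail"], {"tokens_saved": 120})] * 7
errors = ["worker timeout"] * 3

# Existing check: every worker is accounted for.
assert len(results) + len(errors) == 10
# New check: the test must fail if *all* workers failed.
assert len(results) > 0, "all workers failed"
# Validate each successful (result, metrics) pair.
for result, metrics in results:
    assert isinstance(result, list) and result, "empty result list"
    assert isinstance(metrics, dict)
```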
---
Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 914-920: The logger.debug message is misleading because
tools_for_later_context is explicitly set to None before logging; update the
logging to reflect the actual state by removing or changing the length check on
tools_for_later_context and instead log a clear value (e.g., "0 tools" or "no
tools") or the correct variable that may contain tools; locate the block around
tools_for_later_context, the call to manage_message_context, and the
logger.debug statement (symbols: tools_for_later_context,
manage_message_context, logger.debug, dynamic_prompt, model_name) and modify the
message to accurately report tools presence and dynamic_prompt length.
In `@src/cuga/backend/cuga_graph/utils/message_utils.py`:
- Line 80: Remove the personal signature comment "# Made with Bob" from the top
of the module in message_utils.py; delete that standalone comment line so no
developer signature remains in production code and run formatting/linting to
ensure no trailing blank-line or style issues remain after removal.
In `@src/system_tests/e2e/test_context_summarization_e2e.py`:
- Line 117: Remove the developer signature line "# Made with Bob" from the top
of the test module test_context_summarization_e2e.py (delete the comment line so
it is not present in production/test code); no other changes are needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 20c42a3e-67a6-4ca6-ab9a-69f81b034fc3
📒 Files selected for processing (14)
- src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py
- src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
- src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py
- src/cuga/backend/cuga_graph/state/agent_state.py
- src/cuga/backend/cuga_graph/utils/context_summarizer.py
- src/cuga/backend/cuga_graph/utils/message_utils.py
- src/cuga/backend/cuga_graph/utils/token_counter.py
- src/cuga/backend/llm/utils/helpers.py
- src/cuga/sdk_core/tests/test_context_summarization_sdk.py
- src/cuga/settings.toml
- src/system_tests/e2e/test_context_summarization_e2e.py
- tests/integration/test_context_summarization.py
- tests/unit/test_context_summarizer.py
- tests/unit/test_token_counter.py
src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py (outdated comment, resolved)
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
… feat/context-summarization # Conflicts: # tests/integration/test_context_summarization.py
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (1)
1025-1030: ⚠️ Potential issue | 🔴 Critical

The approval path loses the summarized chat history; this is a critical bug.

The approval handler is passed the original `state` with unsummarized `state.chat_messages`. When it builds and returns its `Command` via `append_chat_messages_with_step_limit(state, ...)`, it appends only to the original chat history, not the summarized `effective_chat_messages` that was prepared for the model in the normal flow.

Required fix: Before calling `check_and_create_approval_interrupt()`, update `state.chat_messages` to `modified_chat_messages` (which is built from the summarized `effective_chat_messages`). Alternatively, pass the summarized messages explicitly to the approval handler as a separate parameter.

Current flow in call_model (lines 1025-1033)

```python
if settings.policy.enabled:
    approval_command = await ToolApprovalHandler.check_and_create_approval_interrupt(
        state, code, content, config  # state.chat_messages still unsummarized
    )
    if approval_command:
        return approval_command  # Returns before modified_chat_messages is used

# This only runs if no approval is needed
updated_messages = modified_chat_messages + [AIMessage(content=content)]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines 1025 - 1030, The approval path is using the original unsummarized state.chat_messages, so approvals lose the summarized history; before calling ToolApprovalHandler.check_and_create_approval_interrupt (from call_model), replace state.chat_messages with the prepared modified_chat_messages (built from effective_chat_messages) or pass the summarized messages into check_and_create_approval_interrupt as an extra parameter, and ensure any internal use of append_chat_messages_with_step_limit in the approval flow consumes the summarized modified_chat_messages (or the new parameter) rather than the original state.chat_messages so the approval Command preserves the summarized chat history.
🧹 Nitpick comments (1)
tests/unit/test_context_summarizer.py (1)
562-577: Assert the tool-call payload survives the summarize/restore path.

This only checks list shape, so it stays green if the retained `AIMessage` loses `tool_calls` or if the empty-content tool-call turn gets rewritten during sanitization. Given the new typing/sanitization logic, this test should assert that at least one preserved `AIMessage` still has the original `tool_calls` payload and `content == ""`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_context_summarizer.py` around lines 562 - 577, The test currently only checks shape; update it to assert that the tool-call payload survives summarize/restore by locating at least one AIMessage in result_messages (from ContextSummarizer.summarize_messages) whose content == "" and whose tool_calls contains the original payload (e.g. an entry with name "calculator" and args {"expression":"2+2"} and id "call_123"); use a boolean any(...) style check over result_messages to assert this condition so the test fails if tool_calls are dropped or the empty-content AIMessage is rewritten.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py`:
- Around line 232-233: The guard using getattr(settings.advanced_features,
'registry', False) changes the intended default (True) when advanced_features is
partially mocked; update the checks in combined_tool_provider.py (the registry
guards around the registry-aware logic such as in get_tools() / any
registry-path branches) to either read the setting directly
(settings.advanced_features.registry) or use getattr(..., True) so the default
remains True, ensuring registry apps/tools continue to be included and tracker
tools keep their original path.
In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py`:
- Around line 377-380: The sanitization currently turns exact empty message
strings into "[empty message]", mutating legitimate empty tool calls; update
_sanitize_content() so it first checks the original content and if it is exactly
an empty string (""), return "" unchanged (so AIMessage(content="") stays
empty); otherwise perform the existing strip and only replace whitespace-only or
None-like content with the "[empty message]" placeholder; ensure callers like
_convert_messages_to_typed() will then preserve true empty tool-call turns.
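A sketch of the suggested ordering for the sanitizer; the empty-string special case and the placeholder text come from the comment, while the rest of the helper's shape is an assumption:

```python
def sanitize_content(content):
    # A true empty string marks a legitimate tool-call turn: keep it as-is.
    if content == "":
        return ""
    # Only None-like or whitespace-only content becomes the placeholder.
    if content is None or not str(content).strip():
        return "[empty message]"
    return content.strip() if isinstance(content, str) else content
```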
In `@src/cuga/backend/cuga_graph/utils/message_utils.py`:
- Around line 45-51: The AIMessage reconstruction in message_utils.py drops
tool_calls, causing downstream code (agent_loop.py, controller.py) that accesses
msg.tool_calls to break; update the AIMessage creation in the branch handling
msg_type == 'ai' or 'AIMessage' to forward the tool_calls attribute from the
incoming message (use getattr(message, 'tool_calls', None)) when instantiating
AIMessage so tool-enabled workflows preserve tool call data.
In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Around line 650-659: The test currently only asserts result2 is not None, so
add a negative assertion that validates isolation: after calling
agent.invoke("What's my name?") without thread_id, assert that result2.answer
does not contain "Alice" (or is equal to a phrase indicating no memory), and/or
inspect persisted state to confirm no stored name for the anonymous thread;
update the assertions around agent.invoke in this test (result1 and result2) to
explicitly check result1.answer contains "Alice" and result2.answer does NOT
contain "Alice" so failures surface if state is accidentally shared.
- Around line 60-61: Tests create a fixed thread_id ("test-context-basic") which
can collide with existing checkpoints and make assertions order-dependent;
update the test(s) that instantiate CugaAgent and set thread_id (e.g., the
occurrences around CugaAgent and the thread_id variables at the shown locations)
to generate a unique id per run using uuid4() (assign thread_id =
str(uuid.uuid4()) or equivalent) so each test uses an isolated history; apply
the same change to the other occurrences noted (lines ~120, ~282, ~489).
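The isolation fix is a one-liner; `thread_id` mirrors the variable name in the tests:

```python
import uuid

# A fresh id per test run avoids colliding with existing checkpoints
# and keeps assertions order-independent.
thread_id = str(uuid.uuid4())
```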
---
Outside diff comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Around line 1025-1030: The approval path is using the original unsummarized
state.chat_messages, so approvals lose the summarized history; before calling
ToolApprovalHandler.check_and_create_approval_interrupt (from call_model),
replace state.chat_messages with the prepared modified_chat_messages (built from
effective_chat_messages) or pass the summarized messages into
check_and_create_approval_interrupt as an extra parameter, and ensure any
internal use of append_chat_messages_with_step_limit in the approval flow
consumes the summarized modified_chat_messages (or the new parameter) rather
than the original state.chat_messages so the approval Command preserves the
summarized chat history.
---
Nitpick comments:
In `@tests/unit/test_context_summarizer.py`:
- Around line 562-577: The test currently only checks shape; update it to assert
that the tool-call payload survives summarize/restore by locating at least one
AIMessage in result_messages (from ContextSummarizer.summarize_messages) whose
content == "" and whose tool_calls contains the original payload (e.g. an entry
with name "calculator" and args {"expression":"2+2"} and id "call_123"); use a
boolean any(...) style check over result_messages to assert this condition so
the test fails if tool_calls are dropped or the empty-content AIMessage is
rewritten.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 790fe723-2064-460f-8c82-51166eca3d63
📒 Files selected for processing (9)
- src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py
- src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
- src/cuga/backend/cuga_graph/state/agent_state.py
- src/cuga/backend/cuga_graph/utils/context_summarizer.py
- src/cuga/backend/cuga_graph/utils/message_utils.py
- src/cuga/sdk_core/tests/test_context_summarization_sdk.py
- src/system_tests/e2e/test_context_summarization_e2e.py
- tests/integration/test_context_summarization.py
- tests/unit/test_context_summarizer.py
src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py (outdated comment, resolved)
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/cuga/backend/cuga_graph/utils/context_summarizer.py (2)
443-459: Use f-string conversion flag instead of `str(e)`.

Ruff RUF010 flags `str(e)` inside f-strings. The idiomatic approach is to use `{e!s}` or simply `{e}` (which implicitly calls `__str__`).

Suggested fix

```diff
-    raise RuntimeError(f"middleware_invocation_failed: Missing required imports - {str(e)}") from e
+    raise RuntimeError(f"middleware_invocation_failed: Missing required imports - {e}") from e
 except ValueError as e:
     logger.error(f"Value error during middleware invocation: {e}")
     logger.error(f"Traceback: {traceback.format_exc()}")
-    raise RuntimeError(f"middleware_invocation_failed: Invalid value - {str(e)}") from e
+    raise RuntimeError(f"middleware_invocation_failed: Invalid value - {e}") from e
 except AttributeError as e:
     logger.error(f"Attribute error during middleware invocation: {e}")
     logger.error(f"Traceback: {traceback.format_exc()}")
-    raise RuntimeError(f"middleware_invocation_failed: Missing attribute - {str(e)}") from e
+    raise RuntimeError(f"middleware_invocation_failed: Missing attribute - {e}") from e
 except Exception as e:
     logger.error(f"Unexpected error during middleware invocation: {e}")
     logger.error(f"Traceback: {traceback.format_exc()}")
     logger.error("Falling back to keeping recent messages")
-    raise RuntimeError(f"middleware_invocation_failed: {str(e)}") from e
+    raise RuntimeError(f"middleware_invocation_failed: {e}") from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py` around lines 443 - 459, The exception handlers in the middleware invocation block (the except ImportError/ValueError/AttributeError/Exception clauses) use str(e) inside f-strings; replace those occurrences with the f-string conversion flag (e.g., {e!s}) or simply {e} to satisfy RUF010. Update the raise RuntimeError(...) calls in the middleware invocation error handling so the messages use {e!s} (or {e}) instead of str(e), keeping the same error text and chaining (from e).
56-64: Config is cached at init time; dynamic reloads won't affect existing instances.

`self.config` captures the configuration object at construction. If `settings.reload()` is called after a `ContextSummarizer` instance exists (e.g., in a long-lived agent), the cached reference won't reflect the new values. The tests work because they create new agents after reloading, but production use with config hot-reloading could behave unexpectedly.

Consider reading from `settings.context_summarization` directly in methods where dynamic config is important, or document that instances must be recreated after config changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py` around lines 56 - 64, The constructor for ContextSummarizer currently caches the config into self.config which prevents runtime reloads from taking effect; change this by removing the cached self.config and instead access settings.context_summarization dynamically (or implement a config property that returns settings.context_summarization) wherever the config is used (e.g., the __init__ block that checks config.enabled, the _init_middleware call, and any methods reading self.config); update references to self.config in ContextSummarizer to use the live settings object so hot-reloads apply without recreating instances.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Line 672: Remove the stray personal/debug comment "# Made with Bob" from the
test file so the test suite contains only relevant test code and comments;
locate the line containing that exact string in the
test_context_summarization_sdk.py test module and delete it (no other changes
needed).
---
Nitpick comments:
In `@src/cuga/backend/cuga_graph/utils/context_summarizer.py`:
- Around line 443-459: The exception handlers in the middleware invocation block
(the except ImportError/ValueError/AttributeError/Exception clauses) use str(e)
inside f-strings; replace those occurrences with the f-string conversion flag
(e.g., {e!s}) or simply {e} to satisfy RUF010. Update the raise
RuntimeError(...) calls in the middleware invocation error handling so the
messages use {e!s} (or {e}) instead of str(e), keeping the same error text and
chaining (from e).
- Around line 56-64: The constructor for ContextSummarizer currently caches the
config into self.config which prevents runtime reloads from taking effect;
change this by removing the cached self.config and instead access
settings.context_summarization dynamically (or implement a config property that
returns settings.context_summarization) wherever the config is used (e.g., the
__init__ block that checks config.enabled, the _init_middleware call, and any
methods reading self.config); update references to self.config in
ContextSummarizer to use the live settings object so hot-reloads apply without
recreating instances.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f52f44f6-6ba5-452c-a1fa-3a06b83b25ef
📒 Files selected for processing (4)
- src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py
- src/cuga/backend/cuga_graph/utils/context_summarizer.py
- src/cuga/backend/cuga_graph/utils/message_utils.py
- src/cuga/sdk_core/tests/test_context_summarization_sdk.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cuga/backend/cuga_graph/utils/message_utils.py
```python
tracker_apps = tracker.apps

registry_apps = []
if settings.advanced_features.registry:
```
Why did you change this? This works by default too.
@sami-marreed in one of the unit tests it failed because of this
after reviewing this again, i agree, no need for this, reverted it
@coderabbitai
what do you think about this?
The issue is that **the test is creating a `CombinedToolProvider` directly** at [`supervisor_config.py:153-154`](src/cuga/supervisor_utils/supervisor_config.py:153-154), which then imports and uses the `settings` object from [`config.py`](src/cuga/config.py:1).
**The real problem:**
When `CombinedToolProvider.initialize()` is called in the test context, it accesses `settings.advanced_features.registry` at [`combined_tool_provider.py:233`](src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py:233) **before** the Dynaconf validators have been applied.
**Why it works locally but fails in CI:**
- **Locally**: Your environment likely has the full settings loaded with validators applied during normal app startup
- **In CI**: The test runs in isolation, and the `settings` object might be imported fresh without validators being registered/validated yet
**The actual bug:**
The code at [`supervisor_config.py:153-154`](src/cuga/supervisor_utils/supervisor_config.py:153-154) creates a `CombinedToolProvider` without ensuring the settings are properly initialized with validators. The test environment doesn't go through the normal app initialization that would call `settings.validators.validate_all()`.
**The real fix needed:**
The test (or the `load_supervisor_config` function) needs to ensure settings validators are applied before creating the `CombinedToolProvider`. This is a **test setup issue**, not a code issue - the validators exist but aren't being applied in the test environment before the settings are accessed.
The test should either:
1. Mock the settings object
2. Ensure validators are applied before calling `load_supervisor_config`
3. Or the `CombinedToolProvider` should use `getattr(settings.advanced_features, 'registry', True)` as a defensive fallback
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check config.py for validators and settings setup
fd -e py 'config.py' src/cuga --exec grep -n "validators\|advanced_features\|registry\|Validator" {} +
# Check supervisor_config.py around lines 140-170
fd -e py 'supervisor_config.py' --exec sed -n '140,175p' {} +
```

Length of output: 7133
@offerakrabi — Your root cause analysis is largely accurate, but there's an important nuance worth highlighting.
The validate_all() timing issue is real, but narrower than stated:
Looking at config.py lines 244 and 264, validate_all() is called at module initialization level, not inside a function. This means that any import of settings from config.py should already have the registry default applied — if the module completes initialization cleanly.
The actual failure scenarios in CI are likely one of these:
- `settings.toml` is absent or inaccessible in the CI sandbox — Dynaconf can't load the file, so `advanced_features` might not exist as a namespace at all, and the `Validator` default hasn't been applied yet if `validate_all()` throws or is never reached.
- A test uses `importlib.reload(config)` or `monkeypatch` on `settings` — this can create a fresh `settings` object that bypasses the module-level `validate_all()` call, exactly as you described.
- Import order race — if `combined_tool_provider.py` is imported during test collection before `config.py` finishes initializing (e.g. a circular import path), the `settings` object exists but `validate_all()` hasn't yet run.
On the three proposed fixes:
| Option | Tradeoff |
|---|---|
| Mock `settings` | Cleanest for isolation; requires updating every test that creates `CombinedToolProvider` |
| Ensure `validate_all()` before `load_supervisor_config` | Correct semantically, but `validate_all()` should already run at import — adds noise |
| `getattr(settings.advanced_features, 'registry', True)` | Most defensive; handles all edge cases at the source with minimal risk |
Since you've already reverted the comment-only change, Option 3 (the `getattr` fallback) is the most pragmatic fix at the call site — it's the same pattern used in `helpers.py` in this very PR (`getattr(model_config, 'enable_format', False)`). If the settings object is fully initialized, `getattr` has zero overhead. If not, it safely defaults to `True`, matching the validator's declared default.
The test-level fixes (Options 1 & 2) are more correct architecturally, but they require discipline across all future tests that touch CombinedToolProvider. The getattr approach makes the code resilient regardless of how tests set up their environment.
✏️ Learnings added
Learnt from: offerakrabi
Repo: cuga-project/cuga-agent PR: 61
File: src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py:232-232
Timestamp: 2026-03-16T09:35:22.010Z
Learning: In `src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py`, `settings.advanced_features.registry` access in `CombinedToolProvider.initialize()` can fail in CI test environments if Dynaconf validators haven't been applied yet. The validator with `default=True` is declared in `config.py` (line 117) and `validate_all()` is called at module level (line 264). The preferred defensive fix is `getattr(settings.advanced_features, 'registry', True)` consistent with patterns used elsewhere in the codebase (e.g., `helpers.py`).
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
There was a problem hiding this comment.
Fixed it and ran the unit tests again; now passing.
https://github.com/cuga-project/cuga-agent/actions/runs/23139403491
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
```python
# Get configurable values from config
configurable = config.get("configurable", {}) if config else {}
current_callbacks = configurable.get("callbacks", base_callbacks or [])
active_model = configurable.get("llm") or base_model

# ── Context management BEFORE building messages_for_model ────────────
effective_chat_messages = state.chat_messages or []
try:
    initial_message_count = len(effective_chat_messages)
    model_name = getattr(active_model, 'model_name', 'gpt-4')
    tools_for_later_context = None

    logger.debug(
        f"CugaLite: Calling manage_message_context with model_name={model_name}, "
        f"0 tools, system_prompt={len(dynamic_prompt) if dynamic_prompt else 0} chars"
    )

    temp_agent_state = AgentState(
        input="",
        url="",
        chat_messages=list(effective_chat_messages),  # pass a copy
        variables_storage=state.variables_storage,
        variable_counter_state=state.variable_counter_state,
        variable_creation_order=state.variable_creation_order,
    )

    temp_agent_state.manage_message_context(
        model=active_model,
        model_name=model_name,
```
There was a problem hiding this comment.
Please simplify this code to a one-liner that uses a helper. I wonder if in the current file you haven't decided yet whether to mutate the chat messages. Maybe create a helper class that does a one-line summary. After that update, I think it's preferable to keep the summarization logic here minimal.
@coderabbitai please advise on this
There was a problem hiding this comment.
@sami-marreed, great point. The block can be dramatically reduced by extracting everything into a reusable helper. Here's the recommended approach:
Step 1 — Create a helper function in a shared utils module
Add a function like `apply_context_summarization` to a shared location (e.g., `src/cuga/backend/cuga_graph/utils/context_management_utils.py`, or directly on `AgentState` as a `@staticmethod`). The helper owns:
- the temporary `AgentState` construction
- the `manage_message_context` call
- metric logging and tracker recording
```python
# src/cuga/backend/cuga_graph/utils/context_management_utils.py
import json
from typing import Any, List, Optional

from langchain_core.messages import BaseMessage
from loguru import logger


def apply_context_summarization(
    messages: List[BaseMessage],
    model: Any,
    *,
    system_prompt: Optional[str] = None,
    tools: Optional[List[Any]] = None,
    tracker: Optional[Any] = None,
) -> List[BaseMessage]:
    """Summarize messages via a temporary AgentState.

    Returns the (possibly summarized) message list; falls back to the
    original list on any error.
    """
    from cuga.backend.cuga_graph.state.agent_state import AgentState

    if not messages:
        return messages

    model_name = getattr(model, 'model_name', 'gpt-4')
    try:
        temp = AgentState(input="", url="", chat_messages=list(messages))
        temp.manage_message_context(
            model=model,
            model_name=model_name,
            tools=tools,
            system_prompt=system_prompt,
        )
        summarized = temp.chat_messages or messages
        _log_and_track(messages, summarized, temp, tracker)
        return summarized
    except Exception as e:
        logger.exception(f"Context summarization failed, using original messages: {e}")
        return messages


def _log_and_track(original, summarized, temp_state, tracker):
    if len(summarized) < len(original):
        logger.info(
            f"Context summarization: {len(original)} → {len(summarized)} messages"
        )
    metrics = (
        getattr(temp_state, 'last_summarization_metrics', None) or {}
    ).get('chat_messages')
    if not metrics:
        return
    logger.info(
        f"📊 Summarization: messages {metrics['before']['message_count']} → "
        f"{metrics['after']['message_count']}, "
        f"saved {metrics['tokens_saved']} tokens "
        f"({(1 - metrics['compression_ratio']):.1%} reduction)"
    )
    if tracker:
        try:
            from cuga.backend.activity_tracker.tracker import Step

            tracker.collect_step(Step(name="ContextSummarization", data=json.dumps(metrics)))
        except Exception as e:
            logger.debug(f"Failed to record summarization in tracker: {e}")
```

Step 2 — Reduce the call site to a one-liner

The entire ~90-line block in `call_model` shrinks to:
```python
effective_chat_messages = apply_context_summarization(
    state.chat_messages or [],
    active_model,
    system_prompt=dynamic_prompt,
    tracker=tracker,
)
```

Step 3 — Reuse in the supervisor
The same helper can replace the equivalent block in `cuga_supervisor_graph.py`, so the summarization contract is defined in exactly one place.
Key benefits:
- `cuga_lite_graph.py` stays focused on orchestration; summarization details live in the helper.
- The helper is independently testable.
- Both the lite graph and the supervisor share the same behaviour without duplication.
```python
# CONTEXT SUMMARIZATION - Manage context before LLM invocation
# ============================================================================
try:
    initial_message_count = len(state.supervisor_chat_messages or [])
```
There was a problem hiding this comment.
same comment applies as https://github.com/cuga-project/cuga-agent/pull/61/changes#r2937317534
```python
)

# Perform summarization
messages, summary_metrics = summarizer.summarize_messages(messages)
```
There was a problem hiding this comment.
make async with ainvoke internally
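One possible shape for that change, sketched under assumptions (the real middleware may expose an async `abefore_model`; if it only has a sync `before_model`, the call is pushed off the event loop — the class and method names here mirror the snippets in this thread but are not the actual implementation):

```python
# Sketch of an async summarize path. Assumption: the middleware may expose
# an async `abefore_model`; otherwise we fall back to the sync `before_model`
# executed in a worker thread so the event loop is not blocked.
import asyncio
from typing import Any, Dict, List, Tuple


class ContextSummarizer:
    def __init__(self, middleware: Any) -> None:
        self.middleware = middleware

    async def asummarize_messages(self, messages: List[Any]) -> Tuple[List[Any], Dict[str, int]]:
        abefore = getattr(self.middleware, 'abefore_model', None)
        if abefore is not None:
            result = await abefore(messages)  # native async path
        else:
            # run the sync middleware off the event loop
            result = await asyncio.to_thread(self.middleware.before_model, messages)
        return result, {'message_count': len(result)}


class EchoMiddleware:
    """Minimal sync middleware stand-in for the demo below."""

    def before_model(self, messages):
        return messages


msgs, metrics = asyncio.run(ContextSummarizer(EchoMiddleware()).asummarize_messages(['a', 'b']))
```

Callers in async nodes would then `await summarizer.asummarize_messages(messages)` instead of calling the sync variant.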
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
There was a problem hiding this comment.
🧹 Nitpick comments (5)
src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py (2)

970-970: Minor: Consider iterable unpacking for list concatenation. Static analysis suggests using `[*modified_chat_messages, AIMessage(content=content)]` instead of `modified_chat_messages + [AIMessage(content=content)]`. This is a minor style preference and doesn't affect correctness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` at line 970, Replace the list concatenation that builds updated_messages using modified_chat_messages + [AIMessage(content=content)] with iterable unpacking for style consistency: construct updated_messages by unpacking modified_chat_messages and appending AIMessage(content=content) via [*modified_chat_messages, AIMessage(content=content)]; update the assignment where updated_messages is defined (referencing the variables modified_chat_messages and AIMessage(content=content) in cuga_lite_graph.py).

969-1025: Step limit logic is duplicated between code and non-code paths. The step limit checking logic (lines 973-987 and 1011-1025) is nearly identical. The existing `append_chat_messages_with_step_limit` helper handles this, but it operates on `state.chat_messages` rather than accepting an input list. Consider either:
- Extracting a helper that accepts the base message list as a parameter, or
- Accepting the duplication as acceptable since it's localized to this function

💡 Potential refactor for step limit helper

```python
def check_step_limit(current_step: int, max_steps: int, limit_setting: int) -> Optional[str]:
    """Check if step limit is exceeded. Returns error message if exceeded, None otherwise."""
    limit = max_steps if max_steps is not None else limit_setting
    if current_step > limit:
        return (
            f"Maximum step limit ({limit}) reached. "
            f"The task has exceeded the allowed number of execution cycles. "
            f"Please simplify your request or break it into smaller tasks."
        )
    return None
```

Then use in both paths:

```python
error_msg = check_step_limit(new_step_count, max_steps, settings.advanced_features.cuga_lite_max_steps)
if error_msg:
    logger.warning(f"Step limit reached: {new_step_count}")
    error_ai_message = AIMessage(content=error_msg)
    return create_error_command(...)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py` around lines 969 - 1025, Duplicate step-limit checking exists in the code and non-code branches; refactor by extracting a helper that takes the base messages list and new_step_count (or modify append_chat_messages_with_step_limit to accept a messages parameter) to centralize the logic used in the blocks that build updated_messages and call create_error_command; the helper should compute limit using max_steps or settings.advanced_features.cuga_lite_max_steps, log the warning with the new_step_count, construct the AIMessage error and return the create_error_command result (or None if under limit) so both places (where AIMessage(content=content) and planning_response are appended) call the single helper instead of duplicating the if new_step_count > limit block.

src/cuga/backend/cuga_graph/utils/context_management_utils.py (1)

115-121: Potential `KeyError` if metrics structure is incomplete. The metrics dictionary access assumes a specific structure with nested keys (`before`, `after`, `message_count`, `token_count`, etc.). If the summarizer returns partial metrics, this will raise a `KeyError` and the logging will fail silently due to the outer try/except. Consider using `.get()` with defaults for robustness:

💡 Suggested defensive access pattern

```diff
 # Log detailed metrics
+before = metrics.get('before', {})
+after = metrics.get('after', {})
 logger.info(
-    f"📊 Summarization: messages {metrics['before']['message_count']} → "
-    f"{metrics['after']['message_count']}, "
-    f"saved {metrics['tokens_saved']} tokens "
+    f"📊 Summarization: messages {before.get('message_count', '?')} → "
+    f"{after.get('message_count', '?')}, "
+    f"saved {metrics.get('tokens_saved', '?')} tokens "
     f"({(1 - metrics['compression_ratio']):.1%} reduction)"
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/utils/context_management_utils.py` around lines 115 - 121, The logging block using logger.info accesses nested keys on metrics (e.g., metrics['before']['message_count'], metrics['after']['message_count'], metrics['tokens_saved'], metrics['compression_ratio']) which can raise KeyError if the summarizer returns partial metrics; change the access to use safe .get() lookups with sensible defaults (e.g., metrics.get('before', {}).get('message_count', 0), metrics.get('after', {}).get('message_count', 0), metrics.get('tokens_saved', 0), metrics.get('compression_ratio', 0.0)) and format the message using those safe values so the logger.info call never fails even when keys are missing.

src/cuga/sdk_core/tests/test_context_summarization_sdk.py (1)

380-387: Consider documenting why the 60% threshold was chosen. The assertion requires at least 3 out of 5 checks to pass. While the threshold is reasonable for handling LLM variability, a brief comment explaining the rationale would help future maintainers understand why this specific threshold was selected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py` around lines 380 - 387, Add a short explanatory comment above the pass_threshold declaration that documents why 60% (3 out of 5) was chosen — e.g., to balance LLM variability and preserve key info while avoiding flakiness — referencing the variables pass_threshold, checks_passed, total_checks, and messages so future maintainers understand the tradeoff and can adjust the threshold with context. Make the comment one or two sentences and place it immediately above the line setting pass_threshold.

src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py (1)

288-320: Three separate iterations over `chain.steps` could be consolidated. The code iterates through `chain.steps` three times (lines 290-295, 298-302, 305-315) to extract model, tools, and system_prompt. Consolidating these into a single loop would be more efficient and maintainable.

♻️ Consolidated extraction loop

```diff
 try:
-    # Try to extract model from chain
-    if hasattr(self.chain, 'steps'):
-        for step in self.chain.steps:
-            if hasattr(step, 'bound') and hasattr(step.bound, 'model_name'):
-                model = step.bound
-                model_name = step.bound.model_name
-                break
-
-    # Try to extract tools from chain
-    if hasattr(self.chain, 'steps'):
-        for step in self.chain.steps:
-            if hasattr(step, 'tools'):
-                tools_for_context = step.tools
-                break
-
-    # Try to extract system prompt from chain
     if hasattr(self.chain, 'steps'):
         for step in self.chain.steps:
-            if hasattr(step, 'prompt'):
+            # Extract model
+            if model is None and hasattr(step, 'bound') and hasattr(step.bound, 'model_name'):
+                model = step.bound
+                model_name = step.bound.model_name
+            # Extract tools
+            if tools_for_context is None and hasattr(step, 'tools'):
+                tools_for_context = step.tools
+            # Extract system prompt
+            if system_prompt_text is None and hasattr(step, 'prompt'):
                 prompt_template = step.prompt
                 if hasattr(prompt_template, 'messages'):
                     for msg_template in prompt_template.messages:
                         if hasattr(msg_template, 'prompt') and hasattr(
                             msg_template.prompt, 'template'
                         ):
                             system_prompt_text = msg_template.prompt.template
                             break
 except Exception as e:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py` around lines 288 - 320, The three separate iterations over self.chain.steps should be merged into a single loop that scans each step once and extracts model (step.bound and step.bound.model_name into model/model_name), tools (step.tools into tools_for_context), and system prompt (step.prompt -> prompt_template and iterate prompt_template.messages to pull msg_template.prompt.template into system_prompt_text); implement the single-pass by checking each attribute per step, assigning when found, and breaking early when all three targets are populated, and keep the current try/except and logger.warning behavior for failures (references: self.chain, chain.steps, model/model_name, tools_for_context, prompt_template, system_prompt_text).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py`:
- Around line 288-320: The three separate iterations over self.chain.steps
should be merged into a single loop that scans each step once and extracts model
(step.bound and step.bound.model_name into model/model_name), tools (step.tools
into tools_for_context), and system prompt (step.prompt -> prompt_template and
iterate prompt_template.messages to pull msg_template.prompt.template into
system_prompt_text); implement the single-pass by checking each attribute per
step, assigning when found, and breaking early when all three targets are
populated, and keep the current try/except and logger.warning behavior for
failures (references: self.chain, chain.steps, model/model_name,
tools_for_context, prompt_template, system_prompt_text).
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`:
- Line 970: Replace the list concatenation that builds updated_messages using
modified_chat_messages + [AIMessage(content=content)] with iterable unpacking
for style consistency: construct updated_messages by unpacking
modified_chat_messages and appending AIMessage(content=content) via
[*modified_chat_messages, AIMessage(content=content)]; update the assignment
where updated_messages is defined (referencing the variables
modified_chat_messages and AIMessage(content=content) in cuga_lite_graph.py).
- Around line 969-1025: Duplicate step-limit checking exists in the code and
non-code branches; refactor by extracting a helper that takes the base messages
list and new_step_count (or modify append_chat_messages_with_step_limit to
accept a messages parameter) to centralize the logic used in the blocks that
build updated_messages and call create_error_command; the helper should compute
limit using max_steps or settings.advanced_features.cuga_lite_max_steps, log the
warning with the new_step_count, construct the AIMessage error and return the
create_error_command result (or None if under limit) so both places (where
AIMessage(content=content) and planning_response are appended) call the single
helper instead of duplicating the if new_step_count > limit block.
In `@src/cuga/backend/cuga_graph/utils/context_management_utils.py`:
- Around line 115-121: The logging block using logger.info accesses nested keys
on metrics (e.g., metrics['before']['message_count'],
metrics['after']['message_count'], metrics['tokens_saved'],
metrics['compression_ratio']) which can raise KeyError if the summarizer returns
partial metrics; change the access to use safe .get() lookups with sensible
defaults (e.g., metrics.get('before', {}).get('message_count', 0),
metrics.get('after', {}).get('message_count', 0), metrics.get('tokens_saved',
0), metrics.get('compression_ratio', 0.0')) and format the message using those
safe values so the logger.info call never fails even when keys are missing.
In `@src/cuga/sdk_core/tests/test_context_summarization_sdk.py`:
- Around line 380-387: Add a short explanatory comment above the pass_threshold
declaration that documents why 60% (3 out of 5) was chosen — e.g., to balance
LLM variability and preserve key info while avoiding flakiness — referencing the
variables pass_threshold, checks_passed, total_checks, and messages so future
maintainers understand the tradeoff and can adjust the threshold with context.
Make the comment one or two sentences and place it immediately above the line
setting pass_threshold.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ea3acd66-d457-4606-9731-77120c9ee78f
📒 Files selected for processing (6)
- src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py
- src/cuga/backend/cuga_graph/nodes/cuga_lite/combined_tool_provider.py
- src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
- src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py
- src/cuga/backend/cuga_graph/utils/context_management_utils.py
- src/cuga/sdk_core/tests/test_context_summarization_sdk.py
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py (1)

328-339: ⚠️ Potential issue | 🟠 Major — The summarized history never reaches the model.

`manage_message_context()` operates on `state.chat_messages`, but lines 333-336 still send the separate `chat_messages` argument into `ainvoke()`. If those differ—or once summarization replaces the state field—the actual LLM call still uses the unsummarized history, so this feature is effectively bypassed in regular-chat mode.

💡 Proposed fix

```diff
+state.chat_messages = list(chat_messages)
 await state.manage_message_context(
     model=model, model_name=model_name, tools=tools_for_context, system_prompt=system_prompt_text
 )
+effective_chat_messages = state.chat_messages or []
 logger.info("ChatAgent: manage_message_context completed successfully")
 res = await self.chain.ainvoke(
     {
-        "conversation": self.map_chat_messages(chat_messages),
+        "conversation": self.map_chat_messages(effective_chat_messages),
         "variables_history": state.variables_manager.get_variables_summary(last_n=10),
         "apps_list": apps_list or "No apps available",
     }
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py` around lines 328 - 339, The current call to self.chain.ainvoke uses the original chat_messages variable, bypassing any summarization manage_message_context applied to state.chat_messages; change the payload passed to self.chain.ainvoke to use the updated state.chat_messages (e.g., call self.map_chat_messages(state.chat_messages) instead of self.map_chat_messages(chat_messages)) so the LLM receives the post-managed history, keeping the rest of the payload (variables_history via state.variables_manager.get_variables_summary and apps_list) unchanged.
♻️ Duplicate comments (2)
src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py (1)

399-411: ⚠️ Potential issue | 🟠 Major — Keep `supervisor_chat_messages` local until the node returns.

This still mutates `state.supervisor_chat_messages` in place—first with the summarized list, then again when injecting variables. LangGraph only persists returned updates, so retries/checkpoint restores can diverge from what the model actually saw. Mirror the Lite path here: build `effective_chat_messages`/`modified_chat_messages` locally and only send them back via `Command.update`.

Also applies to: 457-485

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py` around lines 399 - 411, The code currently mutates state.supervisor_chat_messages in place when calling apply_context_summarization; instead, keep a local variable (e.g., effective_chat_messages or modified_chat_messages) to receive the summarized list from apply_context_summarization and use that local variable for subsequent injection of variables and LLM invocation, then return the updated chat messages only via the node's Command.update so state.supervisor_chat_messages is not mutated until the node explicitly returns; apply the same change for the other occurrence around the 457-485 block referenced.

tests/unit/test_context_summarizer.py (1)

487-497: ⚠️ Potential issue | 🟡 Minor — This still stays green when most workers fail.

The new assertion only proves one task succeeded. If 9/10 concurrent calls raise, this test still passes and the thread-safety regression stays hidden.

🧪 Proposed fix

```diff
 # All should complete
 assert len(results) + len(errors) == 10
-
-# Ensure at least one worker succeeded
-assert len(results) > 0, "All workers failed - expected at least one successful result"
+assert not errors, f"Concurrent summarize_messages calls failed: {errors!r}"
+assert len(results) == 10
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_context_summarizer.py` around lines 487 - 497, The current test only requires at least one successful worker, which lets runs with 9/10 failures pass; change the assertion to require a meaningful success threshold (e.g., majority or >= 50%) by asserting len(results) > len(errors) or len(results) >= 5 instead of assert len(results) > 0, and keep the existing checks that iterate over the results; update the test in test_context_summarizer.py where results and errors are asserted and validated so the test fails when most concurrent calls raise.
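The keep-it-local pattern requested for the supervisor node above can be sketched like this (the `Command` class below is a hypothetical stand-in for LangGraph's, and `summarize` stands in for the `apply_context_summarization` helper proposed earlier in the thread):

```python
# Sketch: summarize into a local variable and persist only via the returned
# update, never by mutating the state field in place. `Command` here is a
# stand-in dataclass, not the real LangGraph type.
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Command:
    update: Dict[str, Any] = field(default_factory=dict)


def summarize(messages: List[str]) -> List[str]:
    # trivial stand-in for apply_context_summarization: keep the tail
    return messages[-2:] if len(messages) > 2 else messages


def supervisor_node(state_messages: List[str]) -> Command:
    # work on a local copy; the caller's list is never touched
    effective = summarize(list(state_messages))
    effective = effective + ['<injected variables>']
    # persist only via the returned update
    return Command(update={'supervisor_chat_messages': effective})
```

Because the node never writes through to the state field, a retry or checkpoint restore replays from exactly what was last returned.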
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/test_context_summarizer.py`:
- Around line 334-346: The test incorrectly uses a bare Mock for
mock_middleware_error so hasattr(self.middleware, 'abefore_model') returns a
callable Mock and the code attempts the async path; change the mock to
explicitly provide an asynchronous abefore_model (AsyncMock) that raises or
remove abefore_model attribute so _invoke_middleware on ContextSummarizer falls
back to calling the synchronous before_model and triggers the intended
Exception("Middleware error"); update the patch of
SummarizationMiddleware/return_value accordingly to use mock_middleware_error
with either an AsyncMock abefore_model that raises or with abefore_model deleted
and a before_model that raises to exercise the before_model error handling path.
---
Outside diff comments:
In `@src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py`:
- Around line 328-339: The current call to self.chain.ainvoke uses the original
chat_messages variable, bypassing any summarization manage_message_context
applied to state.chat_messages; change the payload passed to self.chain.ainvoke
to use the updated state.chat_messages (e.g., call
self.map_chat_messages(state.chat_messages) instead of
self.map_chat_messages(chat_messages)) so the LLM receives the post-managed
history, keeping the rest of the payload (variables_history via
state.variables_manager.get_variables_summary and apps_list) unchanged.
---
Duplicate comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py`:
- Around line 399-411: The code currently mutates state.supervisor_chat_messages
in place when calling apply_context_summarization; instead, keep a local
variable (e.g., effective_chat_messages or modified_chat_messages) to receive
the summarized list from apply_context_summarization and use that local variable
for subsequent injection of variables and LLM invocation, then return the
updated chat messages only via the node's Command.update so
state.supervisor_chat_messages is not mutated until the node explicitly returns;
apply the same change for the other occurrence around the 457-485 block
referenced.
In `@tests/unit/test_context_summarizer.py`:
- Around line 487-497: The current test only requires at least one successful
worker, which lets runs with 9/10 failures pass; change the assertion to require
a meaningful success threshold (e.g., majority or >= 50%) by asserting
len(results) > len(errors) or len(results) >= 5 instead of assert len(results) >
0, and keep the existing checks that iterate over the results; update the test
in test_context_summarizer.py where results and errors are asserted and
validated so the test fails when most concurrent calls raise.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1cc37e1d-efc5-437b-82f6-5b5dea06664a
📒 Files selected for processing (8)
- src/cuga/backend/cuga_graph/nodes/chat/chat_agent/chat_agent.py
- src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py
- src/cuga/backend/cuga_graph/nodes/cuga_supervisor/cuga_supervisor_graph.py
- src/cuga/backend/cuga_graph/state/agent_state.py
- src/cuga/backend/cuga_graph/utils/context_management_utils.py
- src/cuga/backend/cuga_graph/utils/context_summarizer.py
- tests/integration/test_context_summarization.py
- tests/unit/test_context_summarizer.py
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
```python
# Extract model and tools from chain for context management
model = None
model_name = None
tools_for_context = None
system_prompt_text = None

try:
    # Try to extract model from chain
    if hasattr(self.chain, 'steps'):
        for step in self.chain.steps:
            if hasattr(step, 'bound') and hasattr(step.bound, 'model_name'):
                model = step.bound
                model_name = step.bound.model_name
                break

    # Try to extract tools from chain
    if hasattr(self.chain, 'steps'):
        for step in self.chain.steps:
            if hasattr(step, 'tools'):
                tools_for_context = step.tools
                break

    # Try to extract system prompt from chain
    if hasattr(self.chain, 'steps'):
        for step in self.chain.steps:
            if hasattr(step, 'prompt'):
                prompt_template = step.prompt
                if hasattr(prompt_template, 'messages'):
                    for msg_template in prompt_template.messages:
```
There was a problem hiding this comment.
What is this code for?
There was a problem hiding this comment.
Here it extracts the data needed for context summarization: things like the model name, tools, and prompts (needed for token estimation).
I can improve this by extracting the data at chat agent initialization instead.
Also, I see there was a small fix I made in cuga_lite_graph.py that I forgot to apply here as well.
…ontext-summarization
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
# Conflicts: # .secrets.baseline
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Signed-off-by: Offer Akrabi <offer.akrabi@il.ibm.com>
Feature Pull Request
Related Issue
Description
Add context summarization to Cuga in the server (demo), SDK, and Supervisor.
Type of Changes
Testing
Documentation
Checklist
Summary by CodeRabbit
New Features
Improvements
Tests
Settings