14 changes: 14 additions & 0 deletions src/cuga/backend/cuga_graph/nodes/shared/base_agent.py
@@ -155,6 +155,20 @@ def get_chain(
schema = APIPlannerOutputWX
parser = PydanticOutputParser(pydantic_object=schema)
if wx_json_mode == "response_format":
    # vLLM's xgrammar guided decoding fails to compile an FSM for
    # schemas with $defs/$ref (vllm#21148), returning empty content
    # (completion_tokens~=2, finish_reason=stop). Only apply the
    # prompt-based fallback when the schema actually has that shape;
    # flat schemas (e.g. PlanControllerOutput, NextAgentPlan) keep
    # working under guided decoding and are left on the existing
    # with_structured_output path.
    if "$defs" in schema.model_json_schema():
        logger.debug(
            "Schema has $defs/$ref; using prompt-based JSON parsing "
            "for watsonx (response_format triggers empty content)"
        )
        chain = prompt_template | llm | parser
        return chain.with_retry(stop_after_attempt=3)
    return BaseAgent.create_validated_structured_output_chain(llm, schema, prompt_template)
elif wx_json_mode == "function_calling" or wx_json_mode == "json_mode":
    chain = prompt_template | llm.with_structured_output(schema, method=wx_json_mode)
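As an aside, the `"$defs"` gate in the diff can be illustrated with a toy pair of pydantic models. `Nested` and `Flat` below are hypothetical stand-ins for `APIPlannerOutputWX` and `PlanControllerOutput` (not repo code): pydantic v2 emits a `$defs`/`$ref` section only when one model references another, which is exactly the shape the guided decoder chokes on.

```python
# Toy illustration (not PR code): pydantic only emits $defs/$ref when a
# model embeds another model, so a top-level key check is enough to
# distinguish the two schema shapes the diff's comment describes.
from pydantic import BaseModel


class Step(BaseModel):
    name: str


class Nested(BaseModel):  # stand-in for APIPlannerOutputWX: refers to Step
    steps: list[Step]


class Flat(BaseModel):  # stand-in for PlanControllerOutput: primitives only
    next_agent: str


assert "$defs" in Nested.model_json_schema()      # would take the fallback
assert "$defs" not in Flat.model_json_schema()    # stays on guided decoding
```

The check is deliberately shallow: a top-level `"$defs"` key is present whenever pydantic has to factor out a referenced submodel, so no recursive schema walk is needed.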
Comment on lines 157 to 174
coderabbitai (Bot) commented on Mar 26, 2026


⚠️ Potential issue | 🟠 Major

Don't keep function_calling on the known-broken watsonx path.

The new inline note says watsonx function_calling returns no tool_calls, but Line 167 still routes that mode through llm.with_structured_output(...). Any caller that explicitly selects wx_json_mode="function_calling" will keep seeing the old failure. Either send that mode through the same prompt+parser fallback or reject it with a clear error.
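A minimal sketch of the reviewer's "reject it with a clear error" option (my own illustration, not PR code; `check_wx_json_mode` and `BROKEN_WX_MODES` are hypothetical names):

```python
# Hedged sketch of option 2 from the review: fail loudly on the modes
# known to hit the vLLM guided-decoding bug, instead of silently
# returning empty content. Names and message text are illustrative.
BROKEN_WX_MODES = {"function_calling", "json_mode"}


def check_wx_json_mode(wx_json_mode: str) -> str:
    if wx_json_mode in BROKEN_WX_MODES:
        raise NotImplementedError(
            f"wx_json_mode={wx_json_mode!r} hits the vLLM guided-decoding "
            "failure on ChatWatsonx (vllm#21148); use 'response_format' "
            "(prompt-based fallback) instead"
        )
    return wx_json_mode
```

The other option the reviewer offers is simply widening the `response_format` branch's `prompt_template | llm | parser` fallback to also cover `"function_calling"`; either way, a caller can no longer be routed into the broken `with_structured_output` path unnoticed.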

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `src/cuga/backend/cuga_graph/nodes/shared/base_agent.py` around lines 157-168, the branch still sends wx_json_mode == "function_calling" into llm.with_structured_output, which is known-broken for watsonx. Update the wx_json_mode dispatch so that "function_calling" is no longer routed into llm.with_structured_output: either treat it the same as "response_format" by using the prompt_template | llm | parser fallback with chain.with_retry(...), or explicitly raise a clear error when wx_json_mode == "function_calling", so callers cannot accidentally end up on the broken path.

Collaborator Author


Valid point. However, no caller in the codebase explicitly passes wx_json_mode="function_calling" for watsonx today — the default is "response_format" and the only explicit overrides are "no_format" (for code_agent, reflection, etc.).

The function_calling and json_mode branches at line 167 are pre-existing code that was not modified by this PR. They would only be hit if a caller explicitly opts in.

That said, if someone does explicitly pass function_calling for watsonx, they'd hit the same vLLM issue. We can address that in a follow-up if needed, but changing pre-existing untouched branches is out of scope for this fix.



@haroldship, that's a fair and reasonable scoping decision — the current PR is a focused fix for the response_format empty-content bug, and touching untouched pre-existing branches would expand its scope unnecessarily.

Would you like me to open a follow-up GitHub issue to track the risk that wx_json_mode="function_calling" (and "json_mode") on ChatWatsonx will hit the same vLLM guided-decoding failure if a caller ever opts in? That way it's documented and won't be forgotten.


🧠 Learnings used
Learnt from: haroldship
Repo: cuga-project/cuga-agent PR: 67
File: src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py:377-397
Timestamp: 2026-03-23T08:37:18.878Z
Learning: In `src/cuga/backend/cuga_graph/nodes/cuga_lite/cuga_lite_graph.py`, when `PromptUtils.find_tools` raises an `OutputParserException` or other exception (i.e., the shortlister LLM returns malformed/empty JSON), the correct behaviour is to return a short user-facing error string to the agent — NOT a fallback tool list. This was an explicit design decision: returning all tools on LLM failure bloats the agent context and masks bugs; the error string lets the agent LLM decide whether to retry or proceed. This is distinct from the soft-filtering fallback (which applies when `filter_criteria` is provided but no tools match the criteria).
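The behaviour this learning describes can be sketched in a few lines (illustrative only: `shortlist`, `find_tools_safe`, and the stubbed `OutputParserException` are my own names, not the repo's `PromptUtils.find_tools`):

```python
# Hedged sketch of the learning: on a parse failure from the shortlister
# LLM, return a short user-facing error string -- never a fallback
# "all tools" list, which would bloat context and mask the bug.
class OutputParserException(Exception):
    """Stub standing in for langchain's OutputParserException."""


def shortlist(raw: str) -> list[str]:
    if not raw.strip():
        raise OutputParserException("empty LLM output")
    return raw.split(",")


def find_tools_safe(raw: str):
    try:
        return shortlist(raw)
    except Exception:
        # Deliberately NOT `return all_tools`: the error string lets the
        # agent LLM decide whether to retry or proceed without tools.
        return "Error: tool shortlisting failed; retry or proceed without tools."


assert find_tools_safe("a,b") == ["a", "b"]
assert find_tools_safe("  ").startswith("Error:")
```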

Collaborator Author


yes open the issue



🧩 Analysis chain

🏁 Script executed:

gh issue create \
  --repo cuga-project/cuga-agent \
  --title "fix(watsonx): guard function_calling and json_mode against vLLM guided-decoding failure" \
  --body "## Background

PR `#89` fixed the \`response_format\` empty-content bug on \`ChatWatsonx\` by switching to a prompt-based \`PydanticOutputParser\` fallback. During review it was noted that the \`function_calling\` and \`json_mode\` branches in \`BaseAgent.get_chain\` (\`src/cuga/backend/cuga_graph/nodes/shared/base_agent.py\`, ~line 167) still route through \`llm.with_structured_output(...)\`, which is known-broken on vLLM versions prior to 0.8.2 (see [vllm#15236](https://github.com/vllm-project/vllm/issues/15236) and [vllm#21148](https://github.com/vllm-project/vllm/issues/21148)).

## Risk

No caller currently passes \`wx_json_mode=\"function_calling\"\` or \`wx_json_mode=\"json_mode\"\` explicitly for a \`ChatWatsonx\` LLM — the default is \`\"response_format\"\` and current overrides use \`\"no_format\"\`. However, if a future caller opts in to either of these modes, they will silently hit the same vLLM guided-decoding failure (empty content / no tool_calls returned).

## Suggested Fix

Either:
1. Route \`\"function_calling\"\` and \`\"json_mode\"\` through the same \`prompt_template | llm | parser\` fallback with \`.with_retry(stop_after_attempt=3)\` (same approach as the \`\"response_format\"\` fix in PR `#89`), **or**
2. Raise a clear \`ValueError\` / \`NotImplementedError\` when \`wx_json_mode\` is \`\"function_calling\"\` or \`\"json_mode\"\` on \`ChatWatsonx\`, so callers get an explicit error rather than a silent failure.

This fix should be **reverted or revisited** once IBM upgrades the deployed vLLM to a fixed version (0.8.2+).

## References

- PR `#89` (fix for \`response_format\` mode): https://github.com/cuga-project/cuga-agent/pull/89
- Review comment: https://github.com/cuga-project/cuga-agent/pull/89#discussion_r2996097287
- Issue `#88` (original empty-content bug): https://github.com/cuga-project/cuga-agent/issues/88
- Requested by: `@haroldship`
" \
  --assignee haroldship

Length of output: 52


@haroldship, done! Follow-up issue #90 has been opened and assigned to you:

👉 #90

It documents the latent risk that wx_json_mode="function_calling" and "json_mode" on ChatWatsonx will silently hit the same vLLM guided-decoding failure, links back to this PR and issue #88, and proposes the two remediation options (prompt+parser fallback or an explicit error). The note to revisit once IBM upgrades to vLLM 0.8.2+ is included as well.


