Background
PR #89 fixed the `response_format` empty-content bug on `ChatWatsonx` by switching to a prompt-based `PydanticOutputParser` fallback. During review it was noted that the `function_calling` and `json_mode` branches in `BaseAgent.get_chain` (`src/cuga/backend/cuga_graph/nodes/shared/base_agent.py`, ~line 167) still route through `llm.with_structured_output(...)`, which is known to be broken on vLLM versions prior to 0.8.2 (see vllm#15236 and vllm#21148).
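For reference, the prompt-based fallback that PR #89 introduced for `"response_format"` can be built roughly as below. This is a minimal sketch, not the repository's actual code: `AgentOutput`, the prompt text, and the `build_prompt_based_chain` helper are all illustrative assumptions.

```python
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel


class AgentOutput(BaseModel):  # hypothetical schema, for illustration only
    answer: str


def build_prompt_based_chain(llm):
    """Parse JSON out of the model's plain-text reply instead of relying on
    vLLM guided decoding (llm.with_structured_output), which returns empty
    content on vLLM < 0.8.2."""
    parser = PydanticOutputParser(pydantic_object=AgentOutput)
    # Inject the JSON schema into the prompt as format instructions.
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "Answer the user.\n{format_instructions}"),
            ("human", "{input}"),
        ]
    ).partial(format_instructions=parser.get_format_instructions())
    # Retry a few times in case the model emits malformed JSON.
    return (prompt | llm | parser).with_retry(stop_after_attempt=3)
```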
Risk

No caller currently passes `wx_json_mode="function_calling"` or `wx_json_mode="json_mode"` explicitly for a `ChatWatsonx` LLM — the default is `"response_format"` and current overrides use `"no_format"`. However, if a future caller opts in to either of these modes, they will silently hit the same vLLM guided-decoding failure (empty content / no `tool_calls` returned).

Suggested Fix
Either:
"function_calling"and"json_mode"through the sameprompt_template | llm | parserfallback with.with_retry(stop_after_attempt=3)(same approach as the"response_format"fix in PR fix(watsonx): use prompt-based JSON parsing instead of response_format #89), orValueError/NotImplementedErrorwhenwx_json_modeis"function_calling"or"json_mode"onChatWatsonx, so callers get an explicit error rather than a silent failure.This fix should be reverted or revisited once IBM upgrades the deployed vLLM to a fixed version (0.8.2+).
References
- Fix for `response_format` mode: fix(watsonx): use prompt-based JSON parsing instead of response_format #89