
fix(watsonx): guard function_calling and json_mode against vLLM guided-decoding failure #90

@coderabbitai

Description


Background

PR #89 fixed the response_format empty-content bug on ChatWatsonx by switching to a prompt-based PydanticOutputParser fallback. During review it was noted that the function_calling and json_mode branches in BaseAgent.get_chain (src/cuga/backend/cuga_graph/nodes/shared/base_agent.py, ~line 167) still route through llm.with_structured_output(...), which is known to be broken on vLLM versions prior to 0.8.2 (see vllm#15236 and vllm#21148).

Risk

No caller currently passes wx_json_mode="function_calling" or wx_json_mode="json_mode" explicitly for a ChatWatsonx LLM — the default is "response_format" and current overrides use "no_format". However, if a future caller opts in to either of these modes, they will silently hit the same vLLM guided-decoding failure (empty content / no tool_calls returned).

Suggested Fix

Either:

  1. Route "function_calling" and "json_mode" through the same prompt_template | llm | parser fallback with .with_retry(stop_after_attempt=3) (the same approach as the response_format fix in PR #89), or
  2. Raise a clear ValueError / NotImplementedError when wx_json_mode is "function_calling" or "json_mode" on ChatWatsonx, so callers get an explicit error rather than a silent failure.
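A minimal sketch of option 2's guard. The names follow the issue (`wx_json_mode`, the broken mode strings); the standalone helper and its signature are hypothetical, since the real check would live inside `BaseAgent.get_chain`:

```python
# Hypothetical sketch of the option-2 guard: modes known to trip vLLM
# guided decoding on ChatWatsonx fail fast instead of silently
# returning empty content / no tool_calls.

BROKEN_WX_JSON_MODES = {"function_calling", "json_mode"}


def check_wx_json_mode(wx_json_mode: str, is_watsonx: bool) -> str:
    """Validate wx_json_mode before building the chain.

    Raises NotImplementedError for modes that rely on
    llm.with_structured_output(), which is broken on vLLM < 0.8.2
    (vllm#15236, vllm#21148).
    """
    if is_watsonx and wx_json_mode in BROKEN_WX_JSON_MODES:
        raise NotImplementedError(
            f"wx_json_mode={wx_json_mode!r} routes through "
            "llm.with_structured_output(), which fails on vLLM < 0.8.2; "
            "use 'response_format' (prompt-based parsing) instead."
        )
    return wx_json_mode
```

The guard only fires for ChatWatsonx, so non-watsonx callers keep the existing structured-output path untouched.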

Whichever option is chosen, it should be revisited (and potentially reverted) once IBM upgrades the deployed vLLM to a fixed version (0.8.2+).
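Since the workaround is meant to be temporary, the guard could be gated on the deployed vLLM version. A sketch under assumptions (how the version string is obtained, and that a plain dotted version is reported, are not specified by the issue):

```python
# Hypothetical version gate: only apply the guard while the deployed
# vLLM predates the guided-decoding fix in 0.8.2.

def needs_guided_decoding_workaround(vllm_version: str) -> bool:
    """Return True when vllm_version is older than 0.8.2.

    Compares only the first three numeric components, so suffixes
    like "0.8.2.post1" are ignored.
    """
    parts = tuple(int(p) for p in vllm_version.split(".")[:3])
    return parts < (0, 8, 2)
```

A release that removes the guard entirely is simpler than a runtime gate; the gate is only worth it if multiple vLLM deployments with different versions must be supported at once.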


Metadata

Labels

bug (Something isn't working), component: wxo (watsonx Orchestrate integration), priority: critical (Must be addressed immediately)
