fix(sglang): parse tool_call function arguments before applying the chat template #10558

Open
pos-ei-don wants to merge 1 commit into mudler:master from pos-ei-don:fix-sglang-tool-call-arguments-parsing

@pos-ei-don


What

Apply the same function.arguments pre-parsing in the sglang backend that the vllm backend received in #10256.

Why

The OpenAI wire format carries function.arguments as a JSON-encoded string, but chat templates (e.g. Qwen3-Coder) iterate over it as a mapping. The vllm backend was already fixed (#10256); the sglang backend still passed the assistant's tool_calls straight through after json.loads, which decodes the tool_calls list but leaves each nested function.arguments as a string.

Without this fix the second turn of any tool-using session crashes inside the tokenizer's Jinja chat template:

TypeError: Can only get item pairs from a mapping.
  File ".../backends/cuda13-nvidia-l4t-arm64-sglang/backend.py", line 343, in _build_prompt
      return self.tokenizer.apply_chat_template(messages_dicts, **template_kwargs)
  File ".../transformers/utils/chat_template_utils.py", line 573, in render_jinja_template
  File ".../jinja2/filters.py", line 249, in do_items
      raise TypeError("Can only get item pairs from a mapping.")

This is the deterministic outcome of the standard OpenAI turn structure once a tool is involved:

[
  {"role": "user",      "content": "..."},
  {"role": "assistant", "content": null, "tool_calls": [{"id": "...", "type": "function",
                                                          "function": {"name": "calc",
                                                                       "arguments": "{\"x\":1}"}}]},
  {"role": "tool",      "tool_call_id": "...", "content": "..."}
]

The template iterates over the key/value pairs of tool_calls[0].function.arguments (the traceback ends in Jinja's items filter, do_items) and fails because arguments is still the raw JSON string from the wire.
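The failure mode can be illustrated without a model or tokenizer. The sketch below is a stdlib-only stand-in, not Jinja itself: items_like_jinja is a hypothetical helper that mimics the mapping check behind the error message in the traceback above.

```python
import json
from collections.abc import Mapping


def items_like_jinja(value):
    """Hypothetical stand-in for Jinja's `items` filter: only mappings pass."""
    if not isinstance(value, Mapping):
        raise TypeError("Can only get item pairs from a mapping.")
    return list(value.items())


# What the OpenAI wire format carries in function.arguments: a JSON string.
wire_arguments = '{"x":1}'

try:
    items_like_jinja(wire_arguments)
    failure = None
except TypeError as exc:
    failure = str(exc)

# Decoding the string first yields a dict, which the check accepts.
parsed_pairs = items_like_jinja(json.loads(wire_arguments))

print(failure)
print(parsed_pairs)
```

The string form fails the mapping check, while the decoded dict passes it. That is the reason for decoding arguments before the template is rendered.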

Reproducer (curl)

Two requests against any sglang-backed model with a tool-aware chat template:

# 1) ask for a tool call → returns {"finish_reason":"tool_calls", "tool_calls":[{...}]}
curl -s http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' \
  -d '{"model":"<model>","messages":[{"role":"user","content":"7*8?"}],
       "tools":[{"type":"function","function":{"name":"calc","description":"eval","parameters":{
                  "type":"object","properties":{"expression":{"type":"string"}},"required":["expression"]}}}]}'

# 2) post the tool result → 500: "TypeError: Can only get item pairs from a mapping."
curl -s http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' \
  -d '{"model":"<model>","messages":[
        {"role":"user","content":"7*8?"},
        {"role":"assistant","content":null,
         "tool_calls":[{"id":"call_x","type":"function",
                        "function":{"name":"calc","arguments":"{\"expression\":\"7*8\"}"}}]},
        {"role":"tool","tool_call_id":"call_x","content":"56"}]}'

With this patch applied, request (2) returns http=200 and finish_reason:"stop" with the assistant's follow-up answer.
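For anyone scripting the reproducer, here is a minimal Python sketch of how the body of request (2) is assembled from the response to request (1). It assumes the response follows the OpenAI chat-completions shape (choices[0].message.tool_calls). build_followup_request is a hypothetical helper, and first_response is a hand-written stand-in, not captured server output.

```python
import json


def build_followup_request(model, user_content, first_response, tool_output):
    """Assemble the second-turn body by echoing the assistant's tool_calls."""
    assistant = first_response["choices"][0]["message"]
    first_call = assistant["tool_calls"][0]
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": user_content},
            # The tool_calls are echoed back unchanged, so `arguments`
            # is still the JSON-encoded string received from the server.
            {"role": "assistant", "content": None,
             "tool_calls": assistant["tool_calls"]},
            {"role": "tool", "tool_call_id": first_call["id"],
             "content": tool_output},
        ],
    }


# Hand-written stand-in for the response to request (1); an assumption.
first_response = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_x",
                "type": "function",
                "function": {"name": "calc",
                             "arguments": "{\"expression\":\"7*8\"}"},
            }],
        },
    }]
}

body = build_followup_request("<model>", "7*8?", first_response, "56")
print(json.dumps(body, indent=2))
```

A client that echoes the assistant message this way always sends arguments as a string, so the backend has to decode it; the client is not doing anything wrong.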

Tested on

  • LocalAI v4.5.4 (image: localai/localai:v4.5.4-nvidia-l4t-arm64-cuda-13)
  • sglang backend cuda13-nvidia-l4t-arm64-sglang (lmsysorg/sglang v0.5.14)
  • Model: saricles/Qwen3-Coder-Next-NVFP4-GB10 (W4A4 NVFP4 / compressed-tensors)
  • Hardware: NVIDIA DGX Spark (GB10, sm_121)

Same coder.yaml; only the backend: field was swapped between cuda13-nvidia-l4t-arm64-vllm and cuda13-nvidia-l4t-arm64-sglang.

Risk

Pure-Python, ~17-line change inside _messages_to_dicts, gated on isinstance(...) checks. No behaviour change for requests without tool_calls. No change to template kwargs, parsers, or the sglang engine path.
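The sketch below shows what isinstance-gated pre-parsing of this kind could look like. It is not the code from this PR: parse_tool_call_arguments is a hypothetical standalone function. Copying the input, leaving malformed JSON untouched and only accepting a decoded dict are all assumptions made for illustration.

```python
import copy
import json


def parse_tool_call_arguments(messages):
    """Return a copy of `messages` with string tool-call arguments decoded."""
    normalised = copy.deepcopy(messages)
    for message in normalised:
        if not isinstance(message, dict):
            continue
        tool_calls = message.get("tool_calls")
        if not isinstance(tool_calls, list):
            continue  # messages without tool_calls pass through unchanged
        for call in tool_calls:
            function = call.get("function") if isinstance(call, dict) else None
            if not isinstance(function, dict):
                continue
            arguments = function.get("arguments")
            if not isinstance(arguments, str):
                continue  # already a mapping, or absent
            try:
                decoded = json.loads(arguments)
            except ValueError:
                continue  # assumption: leave malformed JSON as-is
            if isinstance(decoded, dict):
                function["arguments"] = decoded
    return normalised


messages = [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_x", "type": "function",
         "function": {"name": "calc", "arguments": "{\"x\":1}"}}]},
    {"role": "tool", "tool_call_id": "call_x", "content": "..."},
]
ready_for_template = parse_tool_call_arguments(messages)
print(ready_for_template[1]["tool_calls"][0]["function"]["arguments"])
```

Because every step is guarded by a type check, messages without tool_calls, and arguments that are already mappings, come out identical to the input.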

Cross-refs

…hat template

OpenAI wire format carries `function.arguments` as a JSON-encoded string,
but chat templates (e.g. Qwen3-Coder) iterate over it as a mapping. The
vllm backend already parses arguments before applying the chat template
(PR mudler#10256); this mirrors that fix in the sglang backend.

Without this fix the second turn of any tool-using session (assistant
returns tool_calls, user posts `role:"tool"` result, model is invoked
with arguments still as a string) crashes inside transformers' Jinja
chat-template rendering with:

  TypeError: Can only get item pairs from a mapping.
  File ".../transformers/utils/chat_template_utils.py", in render_jinja_template
  File ".../jinja2/filters.py", in do_items
      raise TypeError("Can only get item pairs from a mapping.")

Reproduced on `lmsysorg/sglang:v0.5.14` via LocalAI v4.5.4 with
`saricles/Qwen3-Coder-Next-NVFP4-GB10` (W4A4 NVFP4 / compressed-tensors)
on NVIDIA DGX Spark (GB10, sm_121).

After the patch, a tool-call roundtrip (assistant tool_calls -> tool
result -> assistant final answer) returns http=200 with the expected
follow-up content; no behaviour change on requests that don't carry
tool_calls.