fix(sglang): parse tool_call function arguments before applying the chat template by pos-ei-don · Pull Request #10558 · mudler/LocalAI

pos-ei-don · 2026-06-27T17:28:57Z

What

Apply the same function.arguments pre-parsing in the sglang backend that the vllm backend received in #10256.

Why

The OpenAI wire format carries function.arguments as a JSON-encoded string, but chat templates (e.g. Qwen3-Coder) iterate over it as a mapping. The vllm backend was already fixed (#10256); the sglang backend still passed the assistant's tool_calls straight through after json.loads, leaving function.arguments as a string.

Without this fix the second turn of any tool-using session crashes inside the tokenizer's Jinja chat template:

TypeError: Can only get item pairs from a mapping.
  File ".../backends/cuda13-nvidia-l4t-arm64-sglang/backend.py", line 343, in _build_prompt
      return self.tokenizer.apply_chat_template(messages_dicts, **template_kwargs)
  File ".../transformers/utils/chat_template_utils.py", line 573, in render_jinja_template
  File ".../jinja2/filters.py", line 249, in do_items
      raise TypeError("Can only get item pairs from a mapping.")

This is the deterministic outcome of the standard OpenAI turn structure once a tool is involved:

[
  {"role": "user",      "content": "..."},
  {"role": "assistant", "content": null, "tool_calls": [{"id": "...", "type": "function",
                                                          "function": {"name": "calc",
                                                                       "arguments": "{\"x\":1}"}}]},
  {"role": "tool",      "tool_call_id": "...", "content": "..."}
]

The template iterates with the equivalent of for k, v in tool_calls[0].function.arguments.items() and fails because arguments is still the raw JSON string from the wire.
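The failure mode can be illustrated without loading a model or a tokenizer. The sketch below is stdlib-only; items_like_jinja is a hypothetical stand-in that mimics the mapping check behind the error message in the traceback, not Jinja's own code.

```python
import json
from collections.abc import Mapping


def items_like_jinja(value):
    """Hypothetical stand-in for a template's `items` filter: mappings only."""
    if not isinstance(value, Mapping):
        raise TypeError("Can only get item pairs from a mapping.")
    return list(value.items())


# What arrives on the wire: a JSON-encoded string, not a dict.
wire_arguments = "{\"x\":1}"

try:
    items_like_jinja(wire_arguments)
    failed = False
except TypeError:
    failed = True  # same class of error as in the traceback above

# Decoding the string first yields a mapping the template can iterate.
parsed_pairs = items_like_jinja(json.loads(wire_arguments))
```

Decoding once before rendering is enough; the template itself does not need to change.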

Reproducer (curl)

Two requests against any sglang-backed model with a tool-aware chat template:

# 1) ask for a tool call → returns {"finish_reason":"tool_calls", "tool_calls":[{...}]}
curl -s http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' \
  -d '{"model":"<model>","messages":[{"role":"user","content":"7*8?"}],
       "tools":[{"type":"function","function":{"name":"calc","description":"eval","parameters":{
                  "type":"object","properties":{"expression":{"type":"string"}},"required":["expression"]}}}]}'

# 2) post the tool result → 500: "TypeError: Can only get item pairs from a mapping."
curl -s http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' \
  -d '{"model":"<model>","messages":[
        {"role":"user","content":"7*8?"},
        {"role":"assistant","content":null,
         "tool_calls":[{"id":"call_x","type":"function",
                        "function":{"name":"calc","arguments":"{\"expression\":\"7*8\"}"}}]},
        {"role":"tool","tool_call_id":"call_x","content":"56"}]}'

With this patch applied, request (2) returns http=200 and finish_reason:"stop" with the assistant's follow-up answer.
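For scripted regression checks, the same two-step flow can be driven from Python. This is a rough sketch, not part of the patch: build_second_turn and post_chat are hypothetical helper names, the URL is the one used in the curl calls, and the assistant message is assumed to be taken from choices[0].message of the first response, as in the OpenAI response format.

```python
import json
import urllib.request

# Same endpoint as the curl reproducer above.
URL = "http://localhost:8080/v1/chat/completions"


def build_second_turn(model, user_content, assistant_message, tool_output):
    """Build request (2): user turn, assistant tool_calls turn, one tool result."""
    call_id = assistant_message["tool_calls"][0]["id"]
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": user_content},
            # Sent back unchanged, so function.arguments is still a JSON string.
            assistant_message,
            {"role": "tool", "tool_call_id": call_id, "content": tool_output},
        ],
    }


def post_chat(payload):
    """POST a payload and return the decoded JSON body.

    urllib raises urllib.error.HTTPError on a 500 response, which is what
    the unpatched sglang backend is reported to return for request (2).
    """
    request = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling post_chat(build_second_turn(...)) against a running instance would then be expected to raise before the patch and return a finish_reason of "stop" after it, matching the curl results above.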

Tested on

LocalAI v4.5.4 (image: localai/localai:v4.5.4-nvidia-l4t-arm64-cuda-13)
sglang backend cuda13-nvidia-l4t-arm64-sglang (lmsysorg/sglang v0.5.14)
Model: saricles/Qwen3-Coder-Next-NVFP4-GB10 (W4A4 NVFP4 / compressed-tensors)
Hardware: NVIDIA DGX Spark (GB10, sm_121)

Same coder.yaml; only the backend: field was swapped between cuda13-nvidia-l4t-arm64-vllm and cuda13-nvidia-l4t-arm64-sglang:

vllm backend: tool roundtrip OK (because #10256, fix(vllm): parse tool_call function arguments before applying the chat template, is applied there)
sglang backend before patch: http=500 TypeError
sglang backend after patch: http=200 finish_reason:"stop"

Risk

Pure-Python, ~17-line change inside _messages_to_dicts, gated on isinstance(...) checks. No behaviour change for requests without tool_calls. No change to template kwargs, parsers, or the sglang engine path.
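The shape of such a change can be sketched as follows. This is an illustration under stated assumptions, not the actual diff: parse_tool_call_arguments is a hypothetical standalone helper, whereas the patch lives inside _messages_to_dicts, and the handling of malformed or non-object JSON (left untouched here) is an assumption.

```python
import json
from typing import Any


def parse_tool_call_arguments(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Decode string-valued function.arguments in place; leave the rest untouched."""
    for message in messages:
        tool_calls = message.get("tool_calls")
        if not isinstance(tool_calls, list):
            continue  # no tool_calls: nothing changes for this message
        for call in tool_calls:
            function = call.get("function") if isinstance(call, dict) else None
            if not isinstance(function, dict):
                continue
            arguments = function.get("arguments")
            if not isinstance(arguments, str):
                continue  # already a mapping (or absent)
            try:
                parsed = json.loads(arguments)
            except json.JSONDecodeError:
                continue  # assumption: malformed JSON is passed through as-is
            if isinstance(parsed, dict):
                function["arguments"] = parsed
    return messages
```

Every step is guarded by an isinstance check, so messages without tool_calls, and arguments that are already mappings, pass through unchanged.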

Cross-refs

Mirrors the fix that landed in the vllm backend, fix(vllm): parse tool_call function arguments before applying the chat template (#10256): same wire-format reason, same shape of fix.
Reported in our project tracker as a hard blocker for sglang-backed tool sessions.

…hat template

OpenAI wire format carries `function.arguments` as a JSON-encoded string, but chat templates (e.g. Qwen3-Coder) iterate over it as a mapping. The vllm backend already parses arguments before applying the chat template (PR mudler#10256); this mirrors that fix in the sglang backend.

Without this fix the second turn of any tool-using session (assistant returns tool_calls, user posts `role:"tool"` result, model is invoked with arguments still as a string) crashes inside transformers' Jinja chat-template rendering with:

TypeError: Can only get item pairs from a mapping.
  File ".../transformers/utils/chat_template_utils.py", in render_jinja_template
  File ".../jinja2/filters.py", in do_items
      raise TypeError("Can only get item pairs from a mapping.")

Reproduced on `lmsysorg/sglang:v0.5.14` via LocalAI v4.5.4 with `saricles/Qwen3-Coder-Next-NVFP4-GB10` (W4A4 NVFP4 / compressed-tensors) on NVIDIA DGX Spark (GB10, sm_121).

After the patch, a tool-call roundtrip (assistant tool_calls -> tool result -> assistant final answer) returns http=200 with the expected follow-up content; no behaviour change on requests that don't carry tool_calls.

pos-ei-don wants to merge 1 commit into mudler:master from pos-ei-don:fix-sglang-tool-call-arguments-parsing
