fix(sglang): parse tool_call function arguments before applying the chat template #10558
Open
pos-ei-don wants to merge 1 commit into
…hat template

OpenAI wire format carries `function.arguments` as a JSON-encoded string, but chat templates (e.g. Qwen3-Coder) iterate over it as a mapping. The vllm backend already parses arguments before applying the chat template (PR mudler#10256); this mirrors that fix in the sglang backend.

Without this fix the second turn of any tool-using session (assistant returns tool_calls, user posts `role:"tool"` result, model is invoked with arguments still as a string) crashes inside transformers' Jinja chat-template rendering with:

    TypeError: Can only get item pairs from a mapping.
      File ".../transformers/utils/chat_template_utils.py", in render_jinja_template
      File ".../jinja2/filters.py", in do_items
        raise TypeError("Can only get item pairs from a mapping.")

Reproduced on `lmsysorg/sglang:v0.5.14` via LocalAI v4.5.4 with `saricles/Qwen3-Coder-Next-NVFP4-GB10` (W4A4 NVFP4 / compressed-tensors) on NVIDIA DGX Spark (GB10, sm_121).

After the patch, a tool-call roundtrip (assistant tool_calls -> tool result -> assistant final answer) returns http=200 with the expected follow-up content; no behaviour change on requests that don't carry tool_calls.
What
Apply the same `function.arguments` pre-parsing in the sglang backend that the vllm backend received in #10256.

Why
OpenAI wire format carries `function.arguments` as a JSON-encoded string, but chat templates (e.g. Qwen3-Coder) iterate over it as a mapping. The vllm backend was already fixed (#10256); the sglang backend still serialised the assistant's `tool_calls` straight through after `json.loads`, leaving `function.arguments` as a string.

Without this fix the second turn of any tool-using session crashes inside the tokenizer's Jinja chat template with `TypeError: Can only get item pairs from a mapping.`
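The pre-parsing idea can be sketched in a few lines of Python. This is a minimal illustration, not the actual patch: the helper name `parse_tool_call_arguments` is hypothetical, and the real change lives inside `_messages_to_dicts`. The sketch assumes messages are plain dicts in OpenAI chat format and leaves anything that is not a JSON object string untouched.

```python
import json
from typing import Any


def parse_tool_call_arguments(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the messages with string function.arguments decoded to dicts.

    Hypothetical helper for illustration only; not the code from this PR.
    """
    out = []
    for msg in messages:
        msg = dict(msg)  # shallow copy so the caller's message is not mutated
        tool_calls = msg.get("tool_calls")
        if isinstance(tool_calls, list):
            new_calls = []
            for call in tool_calls:
                if isinstance(call, dict) and isinstance(call.get("function"), dict):
                    fn = dict(call["function"])
                    args = fn.get("arguments")
                    if isinstance(args, str):
                        try:
                            parsed = json.loads(args)
                        except json.JSONDecodeError:
                            parsed = None  # malformed JSON: keep the raw string
                        if isinstance(parsed, dict):
                            fn["arguments"] = parsed
                    call = {**call, "function": fn}
                new_calls.append(call)
            msg["tool_calls"] = new_calls
        out.append(msg)
    return out


# Usage: after parsing, a template-style `.items()` loop over arguments works.
messages = [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "...", "type": "function",
         "function": {"name": "calc", "arguments": "{\"x\":1}"}}]},
    {"role": "tool", "tool_call_id": "...", "content": "..."},
]
fixed = parse_tool_call_arguments(messages)
for key, value in fixed[1]["tool_calls"][0]["function"]["arguments"].items():
    print(key, value)
```

Gating every step on `isinstance` checks means messages without `tool_calls`, or with arguments that are already a mapping, pass through unchanged.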
The crash is the deterministic outcome of the standard OpenAI turn structure once a tool is involved:

    [
      {"role": "user", "content": "..."},
      {"role": "assistant", "content": null, "tool_calls": [{"id": "...", "type": "function", "function": {"name": "calc", "arguments": "{\"x\":1}"}}]},
      {"role": "tool", "tool_call_id": "...", "content": "..."}
    ]

The template tries `for k,v in tool_calls[0].function.arguments.items()` and fails because `arguments` is still the raw JSON string from the wire.

Reproducer (curl)
Two requests against any sglang-backed model with a tool-aware chat template:
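As a rough Python sketch of how the body of request (2) could be assembled from the reply to request (1): this assumes an OpenAI-compatible chat-completions request body, and the function name `build_followup_request`, its parameters, and the sample values are all hypothetical rather than taken from the PR. Note that `arguments` stays a JSON string on the wire; the parsing happens server-side in the backend.

```python
import json
from typing import Any


def build_followup_request(
    model: str,
    user_content: str,
    assistant_message: dict[str, Any],
    tool_output: str,
    tools: list[dict[str, Any]],
) -> dict[str, Any]:
    """Build the second-turn request body: user, assistant tool_calls, tool result.

    Hypothetical helper; the same tool_output is attached to every tool call
    to keep the sketch short.
    """
    messages: list[dict[str, Any]] = [
        {"role": "user", "content": user_content},
        assistant_message,  # replayed exactly as returned by request (1)
    ]
    for call in assistant_message.get("tool_calls", []):
        messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": tool_output}
        )
    return {"model": model, "messages": messages, "tools": tools}


# Usage with placeholder values (model name and call id are made up).
assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "calc", "arguments": "{\"x\":1}"}}
    ],
}
body = build_followup_request("example-model", "...", assistant, "...", tools=[])
payload = json.dumps(body)  # what would be POSTed as the request body
```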
With this patch applied, request (2) returns `http=200` and `finish_reason:"stop"` with the assistant's follow-up answer.

Tested on
- LocalAI v4.5.4 (`localai/localai:v4.5.4-nvidia-l4t-arm64-cuda-13`)
- `cuda13-nvidia-l4t-arm64-sglang` (lmsysorg/sglang v0.5.14)
- `saricles/Qwen3-Coder-Next-NVFP4-GB10` (W4A4 NVFP4 / compressed-tensors)

Same `coder.yaml`, only `backend:` swapped between `cuda13-nvidia-l4t-arm64-vllm` and `cuda13-nvidia-l4t-arm64-sglang`:

- `http=500 TypeError`
- `http=200 finish_reason:"stop"`

Risk
Pure-Python, ~17-line change inside `_messages_to_dicts`, gated on `isinstance(...)` checks. No behaviour change for requests without `tool_calls`. No change to template kwargs, parsers, or the sglang engine path.

Cross-refs