fix(server): support gemma-4's plain-text call:<verb>{} tool-call format#329
Closed
easel wants to merge 6 commits into
Closed
fix(server): support gemma-4's plain-text call:<verb>{} tool-call format#329easel wants to merge 6 commits into
easel wants to merge 6 commits into
Conversation
Plan + Codex review for running parse_tool_calls on accumulated
CONTENT-mode text so plain-text `call:<verb>{...}` invocations
(Gemma4) actually produce tool_use blocks instead of stop=end_turn.
Codex verdict: REVISE (residue hazard) → integrated as a new emitter-
level test guarding accumulated_text() span strip. Q4 rebuttal:
tool_allowed enforcement is already inside parse_tool_calls.
Captures the diagnosis (gemma forge 0/30 on 2026-05-30), the proposed sixth detection pattern, the relaxed-JSON arg parser sketch, the unit-test matrix, and codex's review (which forced reordering the new pattern to slot Luce-Org#5 ahead of the bare-JSON sweep to avoid interception of nested name/arguments-shaped args). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a sixth detection pattern to `parse_tool_calls` that recognizes
the plain-text tool invocations gemma emits in chat-completion content
(`call:get_country_info{country: "France"}` /
`call:execute-bead:read-file{path: "..."}` / etc).
The 2026-05-30 gemma full bench scored forge 0/30 because every row's
output carried these `call:<verb>{...}` invocations as text rather
than structured `tool_use` content blocks. None of the existing five
envelope-shaped detectors (`<tool_call>`, `<function=...>`,
`<tool_code>`, bare JSON) match the bare `call:` shape.
The new pattern:
- Anchors on a sentinel character (whitespace, comma, semicolon,
open/close bracket, etc.) before `call:` so narrative usages like
`narrative.call:foo` don't match.
- Supports namespaced verbs (`execute-bead:read-file`,
`default_api:fetch_sales_data`) and strips the namespace before
using the verb as the ToolCall name.
- Extracts the args block via a quote- and escape-aware balanced-brace
scanner that tolerates `"`, `'`, and `` ` `` string literals and
tracks `[]` depth alongside `{}`.
- Parses the args as strict JSON first, then falls back to a relaxed
rewrite that quotes bare identifier keys and normalizes single/
backtick quoted strings to double-quoted before retrying. Malformed
args drop the single invocation without crashing or polluting other
calls.
- Runs *before* the bare-JSON sweep so that inner args of the form
`call:outer{"name": "inner", "arguments": {}}` aren't hijacked into
a spurious `inner` ToolCall by pattern Luce-Org#6.
Downstream the existing wiring takes over: SseEmitter::accumulate
already calls parse_tool_calls; a non-empty ToolCall list flips
finish_reason to `tool_calls`, which the Anthropic /v1/messages
branch maps to `stop_reason="tool_use"` with `tool_use` content
blocks (http_server.cpp:2030-2090) and the OpenAI branch maps to
`choices[].message.tool_calls`.
The forge client-side workaround `_parse_plain_text_tool_calls`
shipping on feat/lucebox-docker (commit deba2fd) becomes redundant
once a server with this fix is deployed. It stays in place as
defense-in-depth for older deployed servers.
Test plan: 14 new C++ unit cases in test_server_unit.cpp covering
single / back-to-back / namespaced / snake- and kebab-case verbs;
tool-allowed filtering; mid-prose rejection vs. whitespace-led
acceptance; malformed args drop; inner `{}` inside string literals;
strict-JSON and relaxed-keys arg parsing; cleaned_text scrubbing;
the codex-requested inner `name`/`arguments` interception case; and
multi-line nested-array args mirroring the snapshot data. All pass
in a standalone driver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smoke-testing the post-PR-Luce-Org#323 image (lucebox-hub:cuda12 @ 8039911) on sindri's gemma-4-26b revealed a new emission pattern: the model sometimes outputs ``_call:get_country_info{...}`` with a leading underscore. This is a SentencePiece / chat-template tokenizer artifact that became visible after bragi's channel-token routing fix (commit 4b757d1) — the underscore is residual from the tokenizer's internal serialization that earlier handling stripped. Both parsers missed these invocations: * Server-side (tool_parser.cpp:182): the sentinel character class ``[\s,;:\(\[\{\}\)\]\>]`` did not include ``_``. Added. * Client-side (forge.py:32): ``\bcall:`` requires a word boundary before ``call``, but ``_`` is a word char so ``\b`` doesn't fire between ``_`` and ``c``. Replaced with explicit lookbehind on the same sentinel set (including ``_``). Net result: ``_call:foo{...}`` now parses to a tool_use the same way ``call:foo{...}`` does. Tradeoff: ``my_call:foo{}`` mid-identifier would also match, but real model outputs don't emit free-form ``my_call:`` text (tool names come from request tool defs). Tests: +2 cases in test_forge_grader.py (underscore alone, mixed back-to-back with both prefixed and bare). 16 → 18 forge_grader tests, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR Luce-Org#323's parser added pattern 6 (call:<verb>{...}) but the SSE emitter only invokes parse_tool_calls when mode_ == TOOL_BUFFER, which fires only on the literal <tool_call> XML opener. For models like gemma-4 that emit tool calls as plain text, the emitter stays in CONTENT mode and the parser is never called, so no tool_use blocks land in the response (finish_reason="stop" / stop_reason="end_turn"). Add a CONTENT-mode finalize branch that runs parse_tool_calls when the accumulated text contains a plausible `call:<verb>{` opener (checked via a cheap O(N) substring scan to avoid regex cost on no-tool responses). Matches are hoisted into tool_calls_, the covering spans are stripped from accumulated_text, and finish_reason flips to "tool_calls" so the existing Anthropic/OpenAI serialization paths emit proper tool_use content blocks. Pre-check accepts `_call:foo{` (SentencePiece underscore artifact) since `find("call:")` lands inside the `_call:` window — full validation is delegated to parse_tool_calls (tool_parser.cpp). Tests: +9 unit cases covering parsed/skipped/underscore/no-substring/ multi-call/malformed/no-double-fire-on-XML/empty-tools/preserving prior content paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
3 issues found across 6 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
easel
pushed a commit
to easel/lucebox-hub
that referenced
this pull request
Jun 1, 2026
# Conflicts: # server/src/server/tool_parser.cpp
easel
pushed a commit
to easel/lucebox-hub
that referenced
this pull request
Jun 1, 2026
Record the 2026-06-01 12:08 cron pass, including Luce-Org#329's move back to draft status, fresh containment counts, direct conflict probes for the remaining selective-port candidates, and the tmux-driven Codex Luce-Org#305 no-safe-slice report.
Three P2 issues flagged by cubic on commit 8218333: 1. tool_parser.cpp:248 — relaxed-JSON rewrite missed escaping inner `"` when normalizing single- or backtick-quoted strings. Content like `'he said "hi"'` rewrote to `"he said "hi""` (invalid JSON), silently dropping the whole tool call. Now escapes `"` to `\"` when inside a non-`"` string. 2. sse_emitter.cpp:49 — looks_like_plain_text_call() pre-check used isalpha for the first verb char, but the parser regex accepts [A-Za-z0-9_.:\-]. A `call:2nd_pass{...}` emission would pass the parser but skip the pre-check. Switched first-char test to isalnum so digit-led verbs reach the parser. 3. sse_emitter.cpp:703 — stripping accumulated_content_ desynced the Responses-format streaming finalization events (.output_text.done / .content_part.done / .completed) from the raw .delta events the client already received. Captured a responses_streamed_text snapshot at top of emit_finish before any strip, and threaded it through the four Responses finalization sites. Non-streaming accumulated_text() accessor continues to return the stripped version so the non-streaming response shape doesn't carry both text AND tool_use for the same span. Tests: +4 cases. - test_parse_call_verb_singlequote_with_inner_doublequote - test_parse_call_verb_backtick_with_inner_doublequote - test_emitter_content_mode_digit_start_verb_parsed - test_emitter_content_mode_responses_done_uses_pre_strip_text test_server_unit: 1693 -> 1707 assertions, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
easel
added a commit
to easel/lucebox-hub
that referenced
this pull request
Jun 2, 2026
…at/lucebox-docker Brings the post-review-feedback emitter wiring + 3 cubic P2 fixes onto feat/lucebox-docker for live testing against the lucebox-hub:cuda12 image. PR Luce-Org#329 itself remains targeting main. Folded: - e974ac3 docs(experiments): plan SSE emitter CONTENT-mode tool parse - 8218333 fix(server): wire sse_emitter to detect plain-text call:<verb>{} tools - ee9cd9e fix(server): address cubic PR Luce-Org#329 review feedback The three parser commits on the fix branch (d67a269, 8055201, 80e6e2a) have cherry-pick-equivalent counterparts already on feat/lucebox-docker (cdb8b9c, 12c50c0, 004a81b respectively) so they no-op in the merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> # Conflicts: # server/src/server/tool_parser.cpp # server/test/test_server_unit.cpp
easel
added a commit
to easel/lucebox-hub
that referenced
this pull request
Jun 2, 2026
…ebox-docker
Brings the soft-close logit-ratio peek mechanism onto feat/lucebox-docker
so the cuda12 image can be rebuilt with both the call:<verb>{} parser+
emitter fix (Luce-Org#329) AND the auto-thinking-cap dial available in a single
sweep.
Folded:
- 1552495 docs(experiments): plan soft-close thinking termination
- d799d00 feat(server): soft-close thinking termination via logit-ratio peek
Conflicts resolved:
- server/src/qwen35/qwen35_backend.cpp: do_ar_decode signature kept
HEAD's terse comment + soft-close's new bool *soft_forced_close_out
parameter.
- server/test/test_server_unit.cpp: concatenated HEAD's C2-gate tests
with soft-close's comparator/state-machine tests; merged both
RUN_TEST blocks.
Plumbing added in this merge (not on the source branch):
- DFLASH_THINK_SOFT_CLOSE_MIN_RATIO env var in entrypoint.sh, emitted
to the server CLI as --think-soft-close-min-ratio only when nonzero
(preserves byte-identical-when-disabled invariant).
- DflashRuntime.think_soft_close_min_ratio (float, default 0.0) in
lucebox types/config/docker_run so `lucebox config set
dflash.think_soft_close_min_ratio=0.5` propagates through.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Jun 3, 2026
Collaborator
Author
|
Superseded by the 8-PR split. The call:<verb>{} tool-call format is now split into:
Closing in favor of reviewing those two. 🤖 Generated with Claude Code |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds server-side support for the plain-text
call:<verb>{...}tool-call format that gemma-4 (and other no-XML-template models) emits. Without this,/v1/messagesreturns the call as raw text withstop_reason: end_turninstead of structuredtool_useblocks.Two layers, both needed:
tool_parser.cppgains pattern feat: add DFlash for Windows #6 forcall:<ns>?<verb>{relaxed-JSON args}, including the_call:SentencePiece tokenizer artifact.sse_emitter.cppinvokes the parser on stream finalize when the response stayed in CONTENT mode (no<tool_call>XML opener ever arrived), hoists matches intotool_calls_, and flipsfinish_reasontotool_calls.The parser layer was previously closed as #323 (folded into #285). PR #285 has both layers; this PR is the
main-targeted equivalent so the fix can land independently of the docker-stack merge.Tests
test_server_unit.cpp(single, back-to-back, namespaced, snake/kebab, sentinel anchoring, malformed args, string-quoted braces, strict + relaxed JSON args, scrubbed cleaned_text, inner-name/argumentsinterception, multi-line nested args).call:fast-path / empty-tools skip / malformed-drop / no-double-fire on<tool_call>XML / accumulated_text scrubbed / finish_reason flipped).test_server_unit: 1693 assertions, 0 failures.Out of scope
_parse_plain_text_tool_callsdefense-in-depth stub stays for now.