fix(server): support gemma-4's plain-text call:<verb>{} tool-call format by easel · Pull Request #329 · Luce-Org/lucebox-hub

easel · 2026-06-01T13:42:20Z

Summary

Adds server-side support for the plain-text call:<verb>{...} tool-call format that gemma-4 (and other no-XML-template models) emits. Without this, /v1/messages returns the call as raw text with stop_reason: end_turn instead of structured tool_use blocks.

Two layers, both needed:

Parser — tool_parser.cpp gains pattern #6 for call:<ns>?<verb>{relaxed-JSON args}, including the _call: SentencePiece tokenizer artifact.
Emitter — sse_emitter.cpp invokes the parser on stream finalize when the response stayed in CONTENT mode (no <tool_call> XML opener ever arrived), hoists matches into tool_calls_, and flips finish_reason to tool_calls.

The parser layer was previously closed as #323 (folded into #285). PR #285 has both layers; this PR is the main-targeted equivalent so the fix can land independently of the docker-stack merge.

Tests

14 parser unit tests in test_server_unit.cpp (single, back-to-back, namespaced, snake/kebab, sentinel anchoring, malformed args, string-quoted braces, strict + relaxed JSON args, scrubbed cleaned_text, inner-name/arguments interception, multi-line nested args).
9 emitter unit tests (parsed / underscore prefix / multi-call / no-call: fast-path / empty-tools skip / malformed-drop / no-double-fire on <tool_call> XML / accumulated_text scrubbed / finish_reason flipped).
Full test_server_unit: 1693 assertions, 0 failures.

Out of scope

Docker image rebuild + live e2e validation (operator post-merge).
The forge-side _parse_plain_text_tool_calls defense-in-depth stub stays for now.

Plan + Codex review for running parse_tool_calls on accumulated CONTENT-mode text so plain-text `call:<verb>{...}` invocations (Gemma4) actually produce tool_use blocks instead of stop=end_turn. Codex verdict: REVISE (residue hazard) → integrated as a new emitter-level test guarding accumulated_text() span strip. Q4 rebuttal: tool_allowed enforcement is already inside parse_tool_calls.

Captures the diagnosis (gemma forge 0/30 on 2026-05-30), the proposed sixth detection pattern, the relaxed-JSON arg parser sketch, the unit-test matrix, and codex's review (which forced reordering the new pattern to slot #5 ahead of the bare-JSON sweep to avoid interception of nested name/arguments-shaped args).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a sixth detection pattern to `parse_tool_calls` that recognizes the plain-text tool invocations gemma emits in chat-completion content (`call:get_country_info{country: "France"}` / `call:execute-bead:read-file{path: "..."}` / etc).

The 2026-05-30 gemma full bench scored forge 0/30 because every row's output carried these `call:<verb>{...}` invocations as text rather than structured `tool_use` content blocks. None of the existing five envelope-shaped detectors (`<tool_call>`, `<function=...>`, `<tool_code>`, bare JSON) match the bare `call:` shape.

The new pattern:

- Anchors on a sentinel character (whitespace, comma, semicolon, open/close bracket, etc.) before `call:` so narrative usages like `narrative.call:foo` don't match.
- Supports namespaced verbs (`execute-bead:read-file`, `default_api:fetch_sales_data`) and strips the namespace before using the verb as the ToolCall name.
- Extracts the args block via a quote- and escape-aware balanced-brace scanner that tolerates `"`, `'`, and `` ` `` string literals and tracks `[]` depth alongside `{}`.
- Parses the args as strict JSON first, then falls back to a relaxed rewrite that quotes bare identifier keys and normalizes single/backtick-quoted strings to double-quoted before retrying. Malformed args drop the single invocation without crashing or polluting other calls.
- Runs *before* the bare-JSON sweep so that inner args of the form `call:outer{"name": "inner", "arguments": {}}` aren't hijacked into a spurious `inner` ToolCall by pattern #6.

Downstream the existing wiring takes over: SseEmitter::accumulate already calls parse_tool_calls; a non-empty ToolCall list flips finish_reason to `tool_calls`, which the Anthropic /v1/messages branch maps to `stop_reason="tool_use"` with `tool_use` content blocks (http_server.cpp:2030-2090) and the OpenAI branch maps to `choices[].message.tool_calls`.

The forge client-side workaround `_parse_plain_text_tool_calls` shipping on feat/lucebox-docker (commit deba2fd) becomes redundant once a server with this fix is deployed. It stays in place as defense-in-depth for older deployed servers.

Test plan: 14 new C++ unit cases in test_server_unit.cpp covering single / back-to-back / namespaced / snake- and kebab-case verbs; tool-allowed filtering; mid-prose rejection vs. whitespace-led acceptance; malformed args drop; inner `{}` inside string literals; strict-JSON and relaxed-keys arg parsing; cleaned_text scrubbing; the codex-requested inner `name`/`arguments` interception case; and multi-line nested-array args mirroring the snapshot data. All pass in a standalone driver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
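The quote- and escape-aware balanced-brace scan described in this commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the code in tool_parser.cpp: the function name `scan_balanced_args` and its return convention (index one past the closing brace, or `std::nullopt` on an unbalanced block) are assumptions made for the example.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (name and signature are assumptions, not the PR's API).
// Given the index of an opening '{', returns the index one past the matching
// '}', or std::nullopt if the block never balances.
std::optional<std::size_t> scan_balanced_args(std::string_view text, std::size_t open) {
    if (open >= text.size() || text[open] != '{') return std::nullopt;
    std::string closers;  // stack of closers expected for the open brackets
    char quote = '\0';    // active string delimiter, '\0' when outside a string
    for (std::size_t i = open; i < text.size(); ++i) {
        const char c = text[i];
        if (quote != '\0') {
            if (c == '\\') { ++i; continue; }  // skip the escaped character
            if (c == quote) quote = '\0';      // string literal ends
            continue;                          // brackets inside strings are ignored
        }
        switch (c) {
            case '"': case '\'': case '`': quote = c; break;
            case '{': closers.push_back('}'); break;
            case '[': closers.push_back(']'); break;
            case '}': case ']':
                if (closers.empty() || closers.back() != c) return std::nullopt;
                closers.pop_back();
                if (closers.empty()) return i + 1;  // outermost brace closed
                break;
            default: break;
        }
    }
    return std::nullopt;  // ran out of text before the block closed
}
```

With this shape, `scan_balanced_args("{a: \"}\"} tail", 0)` stops after the second `}` rather than the one inside the string literal, which is the case the "inner `{}` inside string literals" test is meant to guard.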

Smoke-testing the post-PR-Luce-Org#323 image (lucebox-hub:cuda12 @ 8039911) on sindri's gemma-4-26b revealed a new emission pattern: the model sometimes outputs ``_call:get_country_info{...}`` with a leading underscore. This is a SentencePiece / chat-template tokenizer artifact that became visible after bragi's channel-token routing fix (commit 4b757d1): the underscore is residual from the tokenizer's internal serialization that earlier handling stripped.

Both parsers missed these invocations:

* Server-side (tool_parser.cpp:182): the sentinel character class ``[\s,;:\(\[\{\}\)\]\>]`` did not include ``_``. Added.
* Client-side (forge.py:32): ``\bcall:`` requires a word boundary before ``call``, but ``_`` is a word char so ``\b`` doesn't fire between ``_`` and ``c``. Replaced with explicit lookbehind on the same sentinel set (including ``_``).

Net result: ``_call:foo{...}`` now parses to a tool_use the same way ``call:foo{...}`` does.

Tradeoff: ``my_call:foo{}`` mid-identifier would also match, but real model outputs don't emit free-form ``my_call:`` text (tool names come from request tool defs).

Tests: +2 cases in test_forge_grader.py (underscore alone, mixed back-to-back with both prefixed and bare). 16 → 18 forge_grader tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
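The sentinel rule can be illustrated without a regex. The sketch below is a hand-rolled approximation, not the parser's actual implementation: `is_call_sentinel` and `find_anchored_calls` are hypothetical names, the sentinel set is copied from the character class quoted above plus `_`, and accepting `call:` at the very start of the text is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Assumed sentinel set: whitespace plus , ; : ( [ { } ) ] > and the added '_'.
bool is_call_sentinel(char c) {
    if (std::isspace(static_cast<unsigned char>(c))) return true;
    return std::string_view(",;:([{})]>_").find(c) != std::string_view::npos;
}

// Returns the offset of every "call:" that starts the text (assumption) or is
// immediately preceded by a sentinel character.
std::vector<std::size_t> find_anchored_calls(std::string_view text) {
    constexpr std::string_view kNeedle = "call:";
    std::vector<std::size_t> hits;
    for (std::size_t pos = text.find(kNeedle); pos != std::string_view::npos;
         pos = text.find(kNeedle, pos + 1)) {
        if (pos == 0 || is_call_sentinel(text[pos - 1])) hits.push_back(pos);
    }
    return hits;
}
```

Under these assumptions `narrative.call:foo{}` yields no hits because `.` is not a sentinel, while `_call:foo{}` and `my_call:foo{}` both match at the `c`, which is exactly the tradeoff the commit message accepts.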

PR Luce-Org#323's parser added pattern 6 (call:<verb>{...}) but the SSE emitter only invokes parse_tool_calls when mode_ == TOOL_BUFFER, which fires only on the literal <tool_call> XML opener. For models like gemma-4 that emit tool calls as plain text, the emitter stays in CONTENT mode and the parser is never called, so no tool_use blocks land in the response (finish_reason="stop" / stop_reason="end_turn").

Add a CONTENT-mode finalize branch that runs parse_tool_calls when the accumulated text contains a plausible `call:<verb>{` opener (checked via a cheap O(N) substring scan to avoid regex cost on no-tool responses). Matches are hoisted into tool_calls_, the covering spans are stripped from accumulated_text, and finish_reason flips to "tool_calls" so the existing Anthropic/OpenAI serialization paths emit proper tool_use content blocks.

Pre-check accepts `_call:foo{` (SentencePiece underscore artifact) since `find("call:")` lands inside the `_call:` window; full validation is delegated to parse_tool_calls (tool_parser.cpp).

Tests: +9 unit cases covering parsed/skipped/underscore/no-substring/multi-call/malformed/no-double-fire-on-XML/empty-tools/preserving prior content paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
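A pre-check of this kind might look like the sketch below. It reuses the name `looks_like_plain_text_call` that appears in the review follow-up, but the body is a guess at the shape rather than the code in sse_emitter.cpp; the verb character set `[A-Za-z0-9_.:\-]` and the alphanumeric first-character rule are taken from that follow-up commit.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Illustrative re-creation; the real function in sse_emitter.cpp may differ.
// Returns true if the text contains "call:" followed by a verb and then '{'.
bool looks_like_plain_text_call(std::string_view text) {
    constexpr std::string_view kNeedle = "call:";
    auto is_verb_char = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 ||
               c == '_' || c == '.' || c == ':' || c == '-';
    };
    std::size_t pos = 0;
    while ((pos = text.find(kNeedle, pos)) != std::string_view::npos) {
        std::size_t i = pos + kNeedle.size();
        // The first verb character must be alphanumeric (digits allowed).
        if (i < text.size() && std::isalnum(static_cast<unsigned char>(text[i]))) {
            while (i < text.size() && is_verb_char(text[i])) ++i;
            if (i < text.size() && text[i] == '{') return true;
        }
        // No "call:" starting inside the span just scanned can succeed either,
        // so resuming at i keeps the whole scan linear in the text length.
        pos = i;
    }
    return false;
}
```

The check deliberately ignores what precedes `call:`, so `_call:foo{` passes and a false positive such as `recall:x{` is left for the parser to reject; the point is only to skip the parser on responses that cannot contain a call.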

cubic-dev-ai

3 issues found across 6 files


# Conflicts: # server/src/server/tool_parser.cpp

Record the 2026-06-01 12:08 cron pass, including Luce-Org#329's move back to draft status, fresh containment counts, direct conflict probes for the remaining selective-port candidates, and the tmux-driven Codex Luce-Org#305 no-safe-slice report.

Three P2 issues flagged by cubic on commit 8218333:

1. tool_parser.cpp:248. The relaxed-JSON rewrite missed escaping inner `"` when normalizing single- or backtick-quoted strings. Content like `'he said "hi"'` rewrote to `"he said "hi""` (invalid JSON), silently dropping the whole tool call. Now escapes `"` to `\"` when inside a non-`"` string.
2. sse_emitter.cpp:49. The looks_like_plain_text_call() pre-check used isalpha for the first verb char, but the parser regex accepts [A-Za-z0-9_.:\-]. A `call:2nd_pass{...}` emission would pass the parser but skip the pre-check. Switched first-char test to isalnum so digit-led verbs reach the parser.
3. sse_emitter.cpp:703. Stripping accumulated_content_ desynced the Responses-format streaming finalization events (.output_text.done / .content_part.done / .completed) from the raw .delta events the client already received. Captured a responses_streamed_text snapshot at top of emit_finish before any strip, and threaded it through the four Responses finalization sites. Non-streaming accumulated_text() accessor continues to return the stripped version so the non-streaming response shape doesn't carry both text AND tool_use for the same span.

Tests: +4 cases.

- test_parse_call_verb_singlequote_with_inner_doublequote
- test_parse_call_verb_backtick_with_inner_doublequote
- test_emitter_content_mode_digit_start_verb_parsed
- test_emitter_content_mode_responses_done_uses_pre_strip_text

test_server_unit: 1693 -> 1707 assertions, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
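The first fix (escaping inner `"` while normalizing quotes) can be shown in isolation. The sketch below covers only the quote-normalization half of the relaxed rewrite and does not quote bare identifier keys; `normalize_quotes` is a hypothetical name, and turning `\'` or a backslash-escaped backtick into the bare character is an assumption about how escapes ought to be handled, not something the review states.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch: rewrites '...' and `...` string literals as "..." and
// escapes any double quote found inside them so the result stays valid JSON.
std::string normalize_quotes(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    char quote = '\0';  // active string delimiter, '\0' when outside a string
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (quote == '\0') {
            if (c == '"' || c == '\'' || c == '`') { quote = c; out.push_back('"'); }
            else out.push_back(c);
            continue;
        }
        if (c == '\\' && i + 1 < in.size()) {
            const char next = in[++i];
            // \' and \` are not JSON escapes; emit the bare character (assumption).
            if (next == '\'' || next == '`') out.push_back(next);
            else { out.push_back('\\'); out.push_back(next); }
            continue;
        }
        if (c == quote) { quote = '\0'; out.push_back('"'); continue; }
        if (c == '"') { out += "\\\""; continue; }  // inner " of a '...' or `...` string
        out.push_back(c);
    }
    return out;
}
```

Without the `c == '"'` branch, the input `'he said "hi"'` would come out as `"he said "hi""`, which is the invalid JSON that caused the whole tool call to be dropped.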

…at/lucebox-docker

Brings the post-review-feedback emitter wiring + 3 cubic P2 fixes onto feat/lucebox-docker for live testing against the lucebox-hub:cuda12 image. PR Luce-Org#329 itself remains targeting main.

Folded:
- e974ac3 docs(experiments): plan SSE emitter CONTENT-mode tool parse
- 8218333 fix(server): wire sse_emitter to detect plain-text call:<verb>{} tools
- ee9cd9e fix(server): address cubic PR Luce-Org#329 review feedback

The three parser commits on the fix branch (d67a269, 8055201, 80e6e2a) have cherry-pick-equivalent counterparts already on feat/lucebox-docker (cdb8b9c, 12c50c0, 004a81b respectively) so they no-op in the merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts:
# server/src/server/tool_parser.cpp
# server/test/test_server_unit.cpp

…ebox-docker

Brings the soft-close logit-ratio peek mechanism onto feat/lucebox-docker so the cuda12 image can be rebuilt with both the call:<verb>{} parser+emitter fix (Luce-Org#329) AND the auto-thinking-cap dial available in a single sweep.

Folded:
- 1552495 docs(experiments): plan soft-close thinking termination
- d799d00 feat(server): soft-close thinking termination via logit-ratio peek

Conflicts resolved:
- server/src/qwen35/qwen35_backend.cpp: do_ar_decode signature kept HEAD's terse comment + soft-close's new bool *soft_forced_close_out parameter.
- server/test/test_server_unit.cpp: concatenated HEAD's C2-gate tests with soft-close's comparator/state-machine tests; merged both RUN_TEST blocks.

Plumbing added in this merge (not on the source branch):
- DFLASH_THINK_SOFT_CLOSE_MIN_RATIO env var in entrypoint.sh, emitted to the server CLI as --think-soft-close-min-ratio only when nonzero (preserves byte-identical-when-disabled invariant).
- DflashRuntime.think_soft_close_min_ratio (float, default 0.0) in lucebox types/config/docker_run so `lucebox config set dflash.think_soft_close_min_ratio=0.5` propagates through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

easel · 2026-06-04T05:17:11Z

Superseded by the 8-PR split. The call:<verb>{} tool-call format is now split into:

PR feat(server): plain-text call:<verb>{} tool parsing (Gemma4) #340 — tool_parser.cpp Pattern 5 parser (with 25 unit tests)
PR feat(server): card-driven thinking control + reasoning_content channel + /props schema-4 #341 — sse_emitter.cpp Pattern-B streaming detection hooks (CONTENT-mode hoist + responses_streamed_text)

Closing in favor of reviewing those two.

🤖 Generated with Claude Code

easel and others added 5 commits May 31, 2026 22:59

cubic-dev-ai Bot reviewed Jun 1, 2026

Comment thread server/src/server/tool_parser.cpp

Comment thread server/src/server/sse_emitter.cpp Outdated

Comment thread server/src/server/sse_emitter.cpp

easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026

fix(server): integrate PR Luce-Org#329 tool-call detection

24b9e1e

# Conflicts: # server/src/server/tool_parser.cpp

easel marked this pull request as draft June 1, 2026 16:08

easel changed the title ~~fix(server): plain-text call:<verb>{} tool-call detection (parser + emitter wiring)~~ fix(server): support gemma-4's plain-text call:<verb>{} tool-call format Jun 1, 2026

This was referenced Jun 3, 2026

fix(server): split soft-close probe ids from inject ids #331

Closed

fix(server): plain-text call:verb spans must survive emit_finish malformed-parse + responses .done easel/lucebox-hub#1

Merged

easel closed this Jun 4, 2026

easel wants to merge 6 commits into Luce-Org:main from easel:fix/sse-emitter-content-mode-tool-parse
