fix(server): support gemma-4's plain-text call:<verb>{} tool-call format#329

Closed

easel wants to merge 6 commits into Luce-Org:main from easel:fix/sse-emitter-content-mode-tool-parse

@easel (Collaborator) commented Jun 1, 2026

Summary

Adds server-side support for the plain-text call:<verb>{...} tool-call format that gemma-4 (and other no-XML-template models) emits. Without this, /v1/messages returns the call as raw text with stop_reason: end_turn instead of structured tool_use blocks.

Two layers, both needed:

  • Parser: tool_parser.cpp gains a sixth detection pattern for call:<ns>?<verb>{relaxed-JSON args}, including the _call: SentencePiece tokenizer artifact.
  • Emitter: sse_emitter.cpp invokes the parser on stream finalize when the response stayed in CONTENT mode (no <tool_call> XML opener ever arrived), hoists matches into tool_calls_, and flips finish_reason to tool_calls.
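As a rough illustration of the `call:<ns>?<verb>{` shape, the sketch below reads a possibly namespaced verb that starts right after `call:`, requires a `{` immediately after it, and drops the namespace. It is not the code in tool_parser.cpp: `read_verb`, `VerbMatch`, and `is_verb_char` are hypothetical names, the verb character set `[A-Za-z0-9_.:\-]` is taken from the cubic review discussion further down, and splitting the namespace at the last `:` is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: true for the verb characters the parser regex is
// described as accepting, [A-Za-z0-9_.:\-].
bool is_verb_char(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    return std::isalnum(u) || c == '_' || c == '.' || c == ':' || c == '-';
}

struct VerbMatch {
    std::string name;       // verb with any namespace prefix removed
    std::size_t brace_pos;  // index of the '{' that opens the args block
};

// `start` is the index just past "call:". Reads the (optionally namespaced)
// verb and requires that it is immediately followed by '{'.
std::optional<VerbMatch> read_verb(const std::string& text, std::size_t start) {
    std::size_t i = start;
    while (i < text.size() && is_verb_char(text[i])) ++i;
    if (i == start || i >= text.size() || text[i] != '{') return std::nullopt;
    std::string verb = text.substr(start, i - start);
    // Assumption: the namespace is everything up to the last ':'.
    std::size_t colon = verb.rfind(':');
    if (colon != std::string::npos) verb = verb.substr(colon + 1);
    if (verb.empty()) return std::nullopt;  // e.g. "call:ns:{"
    return VerbMatch{verb, i};
}
```

With this split, `call:execute-bead:read-file{...}` yields the tool name `read-file` and `call:default_api:fetch_sales_data{...}` yields `fetch_sales_data`.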

The parser layer was previously closed as #323 (folded into #285). PR #285 has both layers; this PR is the main-targeted equivalent so the fix can land independently of the docker-stack merge.

Tests

  • 14 parser unit tests in test_server_unit.cpp (single, back-to-back, namespaced, snake/kebab, sentinel anchoring, malformed args, string-quoted braces, strict + relaxed JSON args, scrubbed cleaned_text, inner-name/arguments interception, multi-line nested args).
  • 9 emitter unit tests (parsed / underscore prefix / multi-call / no-call: fast-path / empty-tools skip / malformed-drop / no-double-fire on <tool_call> XML / accumulated_text scrubbed / finish_reason flipped).
  • Full test_server_unit: 1693 assertions, 0 failures.

Out of scope

  • Docker image rebuild + live e2e validation (operator post-merge).
  • The forge-side _parse_plain_text_tool_calls defense-in-depth stub stays for now.

easel and others added 5 commits May 31, 2026 22:59
Plan + Codex review for running parse_tool_calls on accumulated
CONTENT-mode text so plain-text `call:<verb>{...}` invocations
(Gemma4) actually produce tool_use blocks instead of stop=end_turn.

Codex verdict: REVISE (residue hazard) → integrated as a new emitter-
level test guarding accumulated_text() span strip. Q4 rebuttal:
tool_allowed enforcement is already inside parse_tool_calls.
Captures the diagnosis (gemma forge 0/30 on 2026-05-30), the proposed
sixth detection pattern, the relaxed-JSON arg parser sketch, the
unit-test matrix, and codex's review (which forced reordering the new
pattern into slot #5, ahead of the bare-JSON sweep, to avoid interception
of nested name/arguments-shaped args).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a sixth detection pattern to `parse_tool_calls` that recognizes
the plain-text tool invocations gemma emits in chat-completion content
(`call:get_country_info{country: "France"}` /
`call:execute-bead:read-file{path: "..."}` / etc).

The 2026-05-30 gemma full bench scored forge 0/30 because every row's
output carried these `call:<verb>{...}` invocations as text rather
than structured `tool_use` content blocks. None of the existing five
envelope-shaped detectors (`<tool_call>`, `<function=...>`,
`<tool_code>`, bare JSON) match the bare `call:` shape.

The new pattern:
- Anchors on a sentinel character (whitespace, comma, semicolon,
  open/close bracket, etc.) before `call:` so narrative usages like
  `narrative.call:foo` don't match.
- Supports namespaced verbs (`execute-bead:read-file`,
  `default_api:fetch_sales_data`) and strips the namespace before
  using the verb as the ToolCall name.
- Extracts the args block via a quote- and escape-aware balanced-brace
  scanner that tolerates `"`, `'`, and `` ` `` string literals and
  tracks `[]` depth alongside `{}`.
- Parses the args as strict JSON first, then falls back to a relaxed
  rewrite that quotes bare identifier keys and normalizes single/
  backtick quoted strings to double-quoted before retrying. Malformed
  args drop the single invocation without crashing or polluting other
  calls.
- Runs *before* the bare-JSON sweep so that inner args of the form
  `call:outer{"name": "inner", "arguments": {}}` aren't hijacked into
  a spurious `inner` ToolCall by pattern #6.
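A minimal sketch of the quote- and escape-aware balanced-brace scanner described in the list above, assuming the args block starts at a known `{`. `scan_balanced` is a hypothetical name, not the function in tool_parser.cpp; it treats `"`, `'` and `` ` `` as string delimiters, honours backslash escapes inside them, and keeps a single stack so `[]` and `{}` must nest properly.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// `open` must index a '{'. Returns the index one past the matching '}', or
// std::nullopt if the block is unbalanced or a bracket is mismatched.
std::optional<std::size_t> scan_balanced(const std::string& s, std::size_t open) {
    if (open >= s.size() || s[open] != '{') return std::nullopt;
    std::vector<char> closers;  // expected closing brackets, innermost last
    char quote = 0;             // active string delimiter, or 0 outside strings
    for (std::size_t i = open; i < s.size(); ++i) {
        char c = s[i];
        if (quote != 0) {
            if (c == '\\') { ++i; continue; }  // skip the escaped character
            if (c == quote) quote = 0;         // string literal closed
            continue;                          // brackets inside strings ignored
        }
        switch (c) {
            case '"': case '\'': case '`': quote = c; break;
            case '{': closers.push_back('}'); break;
            case '[': closers.push_back(']'); break;
            case '}': case ']':
                if (closers.empty() || closers.back() != c) return std::nullopt;
                closers.pop_back();
                if (closers.empty()) return i + 1;  // outer '{' closed
                break;
            default: break;
        }
    }
    return std::nullopt;  // ran off the end of the text: unbalanced
}
```

For `call:f{a: "}{", b: [1, {c: 2}]} tail` it returns the offset just past the final `}`, because the `}{` inside the string literal never touches the stack.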

Downstream the existing wiring takes over: SseEmitter::accumulate
already calls parse_tool_calls; a non-empty ToolCall list flips
finish_reason to `tool_calls`, which the Anthropic /v1/messages
branch maps to `stop_reason="tool_use"` with `tool_use` content
blocks (http_server.cpp:2030-2090) and the OpenAI branch maps to
`choices[].message.tool_calls`.
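The Anthropic branch of that wiring can be pictured with a toy mapper. This is a hypothetical sketch: `AnthropicMessage`, `to_anthropic`, and the simplified `ToolCall` are invented for illustration, and only the `tool_calls` to `tool_use` and `stop` to `end_turn` pairs come from this PR; folding every other finish_reason into `end_turn` is a simplification of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified types; the real serializer lives in http_server.cpp.
struct ToolCall { std::string name; std::string arguments_json; };

struct AnthropicMessage {
    std::vector<std::string> block_types;  // e.g. {"text", "tool_use"}
    std::string stop_reason;
};

// finish_reason "tool_calls" becomes stop_reason "tool_use" with one tool_use
// block per call; everything else is treated as "end_turn" here.
AnthropicMessage to_anthropic(const std::string& finish_reason,
                              const std::string& text,
                              const std::vector<ToolCall>& calls) {
    AnthropicMessage msg;
    if (!text.empty()) msg.block_types.push_back("text");
    if (finish_reason == "tool_calls") {
        for (std::size_t i = 0; i < calls.size(); ++i)
            msg.block_types.push_back("tool_use");
        msg.stop_reason = "tool_use";
    } else {
        msg.stop_reason = "end_turn";  // simplification for this sketch
    }
    return msg;
}
```

This is why the bug surfaced as `stop_reason: end_turn`: with no parsed calls, finish_reason never becomes `tool_calls`, and the raw `call:...` text lands in a text block.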

The forge client-side workaround `_parse_plain_text_tool_calls`
shipping on feat/lucebox-docker (commit deba2fd) becomes redundant
once a server with this fix is deployed. It stays in place as
defense-in-depth for older deployed servers.

Test plan: 14 new C++ unit cases in test_server_unit.cpp covering
single / back-to-back / namespaced / snake- and kebab-case verbs;
tool-allowed filtering; mid-prose rejection vs. whitespace-led
acceptance; malformed args drop; inner `{}` inside string literals;
strict-JSON and relaxed-keys arg parsing; cleaned_text scrubbing;
the codex-requested inner `name`/`arguments` interception case; and
multi-line nested-array args mirroring the snapshot data. All pass
in a standalone driver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smoke-testing the post-#323 image (lucebox-hub:cuda12 @ 8039911) on
sindri's gemma-4-26b revealed a new emission pattern: the model
sometimes outputs ``_call:get_country_info{...}`` with a leading
underscore. This is a SentencePiece / chat-template tokenizer artifact
that became visible after bragi's channel-token routing fix
(commit 4b757d1) — the underscore is residual from the tokenizer's
internal serialization that earlier handling stripped.

Both parsers missed these invocations:

* Server-side (tool_parser.cpp:182): the sentinel character class
  ``[\s,;:\(\[\{\}\)\]\>]`` did not include ``_``. Added.
* Client-side (forge.py:32): ``\bcall:`` requires a word boundary
  before ``call``, but ``_`` is a word char so ``\b`` doesn't fire
  between ``_`` and ``c``. Replaced with explicit lookbehind on the
  same sentinel set (including ``_``).

Net result: ``_call:foo{...}`` now parses to a tool_use the same way
``call:foo{...}`` does. Tradeoff: ``my_call:foo{}`` mid-identifier
would also match, but real model outputs don't emit free-form
``my_call:`` text (tool names come from request tool defs).
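The sentinel rule can be sketched as a check on the character just before each `call:`. This is an illustration under assumptions, not tool_parser.cpp or forge.py: `is_sentinel`, `anchored_call_offsets`, and `word_boundary_pattern_matches` are hypothetical, the set mirrors the ``[\s,;:\(\[\{\}\)\]\>]`` class plus the newly added `_` (with ASCII whitespace standing in for `\s`), and accepting a match at the very start of the text is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <regex>
#include <string>
#include <vector>

// Sentinel set: ASCII whitespace, , ; : ( [ { } ) ] > and the new '_'.
bool is_sentinel(char c) {
    return c != '\0' && std::strchr(" \t\r\n\f\v,;:([{})]>_", c) != nullptr;
}

// Offsets of every "call:" preceded by a sentinel (or at the start of text).
std::vector<std::size_t> anchored_call_offsets(const std::string& text) {
    std::vector<std::size_t> out;
    for (std::size_t pos = text.find("call:"); pos != std::string::npos;
         pos = text.find("call:", pos + 1)) {
        if (pos == 0 || is_sentinel(text[pos - 1])) out.push_back(pos);
    }
    return out;
}

// The old client-side style of pattern, reproduced with std::regex for
// comparison: \b cannot fire between '_' and 'c' (both are word characters).
bool word_boundary_pattern_matches(const std::string& text) {
    return std::regex_search(text, std::regex(R"(\bcall:)"));
}
```

`anchored_call_offsets` accepts `_call:foo{}` and, per the stated tradeoff, `my_call:foo{}`, but rejects `narrative.call:foo{}`; `word_boundary_pattern_matches("_call:foo{}")` is false, which is the miss the lookbehind change addresses.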

Tests: +2 cases in test_forge_grader.py (underscore alone, mixed
back-to-back with both prefixed and bare). 16 → 18 forge_grader
tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR Luce-Org#323's parser added pattern 6 (call:<verb>{...}) but the SSE emitter
only invokes parse_tool_calls when mode_ == TOOL_BUFFER, which fires
only on the literal <tool_call> XML opener. For models like gemma-4
that emit tool calls as plain text, the emitter stays in CONTENT mode
and the parser is never called, so no tool_use blocks land in the
response (finish_reason="stop" / stop_reason="end_turn").

Add a CONTENT-mode finalize branch that runs parse_tool_calls when
the accumulated text contains a plausible `call:<verb>{` opener
(checked via a cheap O(N) substring scan to avoid regex cost on
no-tool responses). Matches are hoisted into tool_calls_, the
covering spans are stripped from accumulated_text, and finish_reason
flips to "tool_calls" so the existing Anthropic/OpenAI serialization
paths emit proper tool_use content blocks.
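The CONTENT-mode finalize branch can be sketched with simplified stand-ins for the emitter state. Everything here is hypothetical: `EmitterState`, `Span`, `ParseResult`, and `finalize_content_mode` are invented names, the parser is injected as a callable instead of calling the real parse_tool_calls, and the spans it returns are assumed not to overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the emitter and parser types.
struct ToolCall { std::string name; std::string args; };
struct Span { std::size_t begin, end; };  // [begin, end) in the text
struct ParseResult { std::vector<ToolCall> calls; std::vector<Span> spans; };

enum class Mode { CONTENT, TOOL_BUFFER };

struct EmitterState {
    Mode mode = Mode::CONTENT;
    std::string accumulated_text;
    std::vector<ToolCall> tool_calls;
    std::string finish_reason = "stop";
};

// If the stream never left CONTENT mode, parse the accumulated text, hoist
// matches into tool_calls, strip the covered spans, and flip finish_reason.
template <typename Parser>
void finalize_content_mode(EmitterState& st, Parser parse) {
    if (st.mode != Mode::CONTENT) return;  // XML path already handled it
    ParseResult r = parse(st.accumulated_text);
    if (r.calls.empty()) return;
    // Erase spans right-to-left so earlier offsets stay valid.
    std::sort(r.spans.begin(), r.spans.end(),
              [](const Span& a, const Span& b) { return a.begin > b.begin; });
    for (const Span& s : r.spans)
        st.accumulated_text.erase(s.begin, s.end - s.begin);
    st.tool_calls.insert(st.tool_calls.end(), r.calls.begin(), r.calls.end());
    st.finish_reason = "tool_calls";
}
```

Returning early outside CONTENT mode is what keeps this branch from double-firing on responses that already went through the `<tool_call>` XML path.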

Pre-check accepts `_call:foo{` (SentencePiece underscore artifact)
since `find("call:")` lands inside the `_call:` window — full
validation is delegated to parse_tool_calls (tool_parser.cpp).
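A sketch of the cheap pre-check, assuming a `call:<verb>{` opener counts as plausible when the character after `call:` is alphanumeric and the run of verb characters ends in `{`. The function name matches looks_like_plain_text_call() from the later cubic follow-up, but the body is a hypothetical reconstruction; it uses `isalnum` for the first verb character, as that follow-up does, so digit-led verbs reach the parser.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the parser's verb class [A-Za-z0-9_.:\-].
bool is_verb_char(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    return std::isalnum(u) || c == '_' || c == '.' || c == ':' || c == '-';
}

// True if some "call:" is followed by an alphanumeric verb start and a run
// of verb characters ending in '{'. Full validation stays in the parser.
bool looks_like_plain_text_call(const std::string& text) {
    std::size_t pos = text.find("call:");
    while (pos != std::string::npos) {
        std::size_t i = pos + 5;  // index just past "call:"
        if (i < text.size() && std::isalnum(static_cast<unsigned char>(text[i]))) {
            while (i < text.size() && is_verb_char(text[i])) ++i;
            if (i < text.size() && text[i] == '{') return true;
        }
        // Any "call:" inside the verb run just scanned ends at the same
        // character, so resuming from i keeps the scan linear.
        pos = text.find("call:", std::max(pos + 1, i));
    }
    return false;
}
```

Because `find("call:")` also hits the `call:` inside `_call:`, the underscore artifact passes without special handling.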

Tests: +9 unit cases covering parsed/skipped/underscore/no-substring/
multi-call/malformed/no-double-fire-on-XML/empty-tools/preserving
prior content paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cubic-dev-ai (bot, Contributor) left a comment

3 issues found across 6 files

Reply with feedback, questions, or to request a fix.

Comment thread server/src/server/tool_parser.cpp
Comment thread server/src/server/sse_emitter.cpp Outdated
Comment thread server/src/server/sse_emitter.cpp
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
# Conflicts:
#	server/src/server/tool_parser.cpp
@easel easel marked this pull request as draft June 1, 2026 16:08
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Record the 2026-06-01 12:08 cron pass, including Luce-Org#329's move back to draft status, fresh containment counts, direct conflict probes for the remaining selective-port candidates, and the tmux-driven Codex Luce-Org#305 no-safe-slice report.
@easel easel changed the title fix(server): plain-text call:<verb>{} tool-call detection (parser + emitter wiring) fix(server): support gemma-4's plain-text call:<verb>{} tool-call format Jun 1, 2026
Three P2 issues flagged by cubic on commit 8218333:

1. tool_parser.cpp:248 — relaxed-JSON rewrite missed escaping inner
   `"` when normalizing single- or backtick-quoted strings. Content
   like `'he said "hi"'` rewrote to `"he said "hi""` (invalid JSON),
   silently dropping the whole tool call. Now escapes `"` to `\"`
   when inside a non-`"` string.

2. sse_emitter.cpp:49 — looks_like_plain_text_call() pre-check used
   isalpha for the first verb char, but the parser regex accepts
   [A-Za-z0-9_.:\-]. A `call:2nd_pass{...}` emission would pass the
   parser but skip the pre-check. Switched first-char test to
   isalnum so digit-led verbs reach the parser.

3. sse_emitter.cpp:703 — stripping accumulated_content_ desynced
   the Responses-format streaming finalization events
   (.output_text.done / .content_part.done / .completed) from the
   raw .delta events the client already received. Captured a
   responses_streamed_text snapshot at top of emit_finish before
   any strip, and threaded it through the four Responses
   finalization sites. Non-streaming accumulated_text() accessor
   continues to return the stripped version so the non-streaming
   response shape doesn't carry both text AND tool_use for the
   same span.
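For fix 1, a minimal relaxed-JSON normalizer might look like the following. It is a hypothetical sketch, not the tool_parser.cpp rewrite: it only recognises bare keys of the form `[A-Za-z_][A-Za-z0-9_]*` followed by `:`, and it performs just the rewrite step, leaving the strict-JSON first attempt and the final parse to the caller. The branch that matters is the one that turns an inner `"` into `\"` while copying a single- or backtick-quoted string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Quotes bare identifier keys and rewrites '...' / `...` strings as "...",
// escaping inner '"' so the output is still valid JSON.
std::string relax_to_json(const std::string& in) {
    std::string out;
    char quote = 0;  // delimiter of the string being copied, or 0
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (quote != 0) {
            if (c == '\\' && i + 1 < in.size()) {
                const char next = in[++i];
                if (next == '\'' || next == '`') {
                    out += next;      // \' and \` are not JSON escapes
                } else {
                    out += c;
                    out += next;      // keep \" \\ \n etc. as-is
                }
            } else if (c == quote) {
                out += '"';           // close every string with "
                quote = 0;
            } else if (c == '"') {
                out += "\\\"";        // inner " in a '...' or `...` string
            } else {
                out += c;
            }
            continue;
        }
        if (c == '"' || c == '\'' || c == '`') {
            out += '"';               // open every string with "
            quote = c;
            continue;
        }
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            std::size_t j = i;
            while (j < in.size() &&
                   (std::isalnum(static_cast<unsigned char>(in[j])) || in[j] == '_'))
                ++j;
            std::size_t k = j;
            while (k < in.size() && std::isspace(static_cast<unsigned char>(in[k]))) ++k;
            const std::string word = in.substr(i, j - i);
            const bool is_key = k < in.size() && in[k] == ':';
            out += is_key ? "\"" + word + "\"" : word;  // true/null stay bare
            i = j - 1;                // loop increment resumes at in[j]
            continue;
        }
        out += c;
    }
    return out;
}
```

With that branch, `{msg: 'he said "hi"'}` becomes `{"msg": "he said \"hi\""}` instead of the invalid `{"msg": "he said "hi""}` that caused the whole call to be dropped.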

Tests: +4 cases.
- test_parse_call_verb_singlequote_with_inner_doublequote
- test_parse_call_verb_backtick_with_inner_doublequote
- test_emitter_content_mode_digit_start_verb_parsed
- test_emitter_content_mode_responses_done_uses_pre_strip_text

test_server_unit: 1693 -> 1707 assertions, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 2, 2026
…at/lucebox-docker

Brings the post-review-feedback emitter wiring + 3 cubic P2 fixes onto
feat/lucebox-docker for live testing against the lucebox-hub:cuda12
image. PR Luce-Org#329 itself remains targeting main.

Folded:
- e974ac3 docs(experiments): plan SSE emitter CONTENT-mode tool parse
- 8218333 fix(server): wire sse_emitter to detect plain-text call:<verb>{} tools
- ee9cd9e fix(server): address cubic PR Luce-Org#329 review feedback

The three parser commits on the fix branch (d67a269, 8055201, 80e6e2a)
have cherry-pick-equivalent counterparts already on feat/lucebox-docker
(cdb8b9c, 12c50c0, 004a81b respectively) so they no-op in the merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts:
#	server/src/server/tool_parser.cpp
#	server/test/test_server_unit.cpp
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 2, 2026
…ebox-docker

Brings the soft-close logit-ratio peek mechanism onto feat/lucebox-docker
so the cuda12 image can be rebuilt with both the call:<verb>{} parser+
emitter fix (Luce-Org#329) AND the auto-thinking-cap dial available in a single
sweep.

Folded:
- 1552495 docs(experiments): plan soft-close thinking termination
- d799d00 feat(server): soft-close thinking termination via logit-ratio peek

Conflicts resolved:
- server/src/qwen35/qwen35_backend.cpp: do_ar_decode signature kept
  HEAD's terse comment + soft-close's new bool *soft_forced_close_out
  parameter.
- server/test/test_server_unit.cpp: concatenated HEAD's C2-gate tests
  with soft-close's comparator/state-machine tests; merged both
  RUN_TEST blocks.

Plumbing added in this merge (not on the source branch):
- DFLASH_THINK_SOFT_CLOSE_MIN_RATIO env var in entrypoint.sh, emitted
  to the server CLI as --think-soft-close-min-ratio only when nonzero
  (preserves byte-identical-when-disabled invariant).
- DflashRuntime.think_soft_close_min_ratio (float, default 0.0) in
  lucebox types/config/docker_run so `lucebox config set
  dflash.think_soft_close_min_ratio=0.5` propagates through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@easel (Collaborator, Author) commented Jun 4, 2026

Superseded by the 8-PR split. The call:<verb>{} tool-call format is now split into:

Closing in favor of reviewing those two.

🤖 Generated with Claude Code

@easel easel closed this Jun 4, 2026