feat(server): plain-text call:<verb>{} tool parsing (Gemma4)#340
Merged
## What
Extends server/src/server/tool_parser.{cpp,h} to parse Gemma's
plain-text call:<verb>{} emissions (also accepts the ``_call:``
tokenizer-artifact prefix) and render them as Anthropic tool_use +
tool_result blocks. Isolated to tool_parser; the streaming detection
hook in sse_emitter ships with Luce-Org#341. Adds 364 lines of C++ unit
coverage in test_server_unit.cpp plus the call-verb parser plan and
Gemma4-26B parser-fix writeup.
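
To make the parsing step concrete, here is a minimal sketch of scanning text for `call:<verb>{...}` spans with balanced braces. It is not the code in `tool_parser.cpp`: `CallSpan`, `is_verb_char`, `find_matching_brace` and `scan_call_verbs` are hypothetical names, the set of characters accepted in a verb is an assumption, and so is the choice to strip a namespace with `rfind(':')` and to fold a leading underscore into the span.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for illustration; not the PR's ToolParseResult.
struct CallSpan {
    std::string verb;   // tool name with any "ns:" prefix stripped
    std::string args;   // raw text of the argument object, braces included
    std::size_t begin;  // start of the span (includes a leading '_' if present)
    std::size_t end;    // offset of the matching '}'
};

// Assumed verb alphabet: letters, digits, '_', '-', '.', and ':' for namespaces.
inline bool is_verb_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 ||
           c == '_' || c == '-' || c == '.' || c == ':';
}

// Returns the index of the '}' that matches text[open], or npos if the braces
// never balance. Braces inside double-quoted strings are ignored.
inline std::size_t find_matching_brace(const std::string & text, std::size_t open) {
    assert(open < text.size() && text[open] == '{');
    int  depth     = 0;
    bool in_string = false;
    for (std::size_t i = open; i < text.size(); ++i) {
        const char c = text[i];
        if (in_string) {
            if (c == '\\') {
                ++i;  // skip the escaped character
            } else if (c == '"') {
                in_string = false;
            }
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '{') {
            ++depth;
        } else if (c == '}' && --depth == 0) {
            return i;
        }
    }
    return std::string::npos;
}

// Collects every well-formed call:<verb>{...} span. Word-boundary checks
// before "call:" are omitted to keep the sketch short.
inline std::vector<CallSpan> scan_call_verbs(const std::string & text) {
    std::vector<CallSpan> out;
    std::size_t pos = 0;
    while ((pos = text.find("call:", pos)) != std::string::npos) {
        const std::size_t verb_start = pos + 5;
        std::size_t brace_open = verb_start;
        while (brace_open < text.size() && is_verb_char(text[brace_open])) ++brace_open;
        if (brace_open >= text.size() || text[brace_open] != '{') {
            pos = verb_start;
            continue;
        }
        const std::size_t brace_close = find_matching_brace(text, brace_open);
        if (brace_close == std::string::npos) {
            pos = verb_start;
            continue;
        }
        std::string verb = text.substr(verb_start, brace_open - verb_start);
        const std::size_t colon = verb.rfind(':');
        if (colon != std::string::npos) verb = verb.substr(colon + 1);
        if (!verb.empty()) {
            // Treat a leading underscore as part of the span (assumed artifact shape).
            const std::size_t begin = (pos > 0 && text[pos - 1] == '_') ? pos - 1 : pos;
            out.push_back({verb, text.substr(brace_open, brace_close - brace_open + 1),
                           begin, brace_close});
        }
        pos = brace_close + 1;
    }
    return out;
}
```

Calling `scan_call_verbs` on a model response yields one `CallSpan` per emission; a caller would then map each span to a tool_use block and cut the span out of the visible text. Unbalanced input such as `call:broken{"a":1` produces no span rather than a partial one.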
## Why
Gemma4 emits tool calls as plain-text call:<verb>{...} rather than
structured JSON, which breaks the existing Anthropic tool_use pipeline
on agentic workloads. This parser closes that gap so Gemma4 can drive
coding-agent loops end-to-end.
## Dependencies
None - this PR is independent.
easel added a commit to easel/lucebox-hub that referenced this pull request on Jun 4, 2026:

…ps schema-4

## What
User-facing thinking-control API across the HTTP server surface:
- chat_template prefills a closed <think> block when thinking is off (Qwen3-gated) so the model skips the reasoning preamble without losing the assistant turn.
- http_server bumps /props schema 2 -> 4, adding build / model.target / model.draft / host blocks for client introspection.
- server_main adds --debug-thinking-logits and --think-soft-close-* flags plus image/host-info loaders for card-driven boot.
- sse_emitter routes Qwen3.6/Laguna think-mode output to the reasoning_content channel so reasoning never leaks into the user-visible content stream (Pattern-B call-verb streaming hook).
- Ships the model-card _schema.json, qwen3.6-27b and laguna-xs.2 cards, the /props OpenAPI doc, updated thinking-budget spec, and the thinking-control protocol/mechanism experiments.
- test_server_unit gets matching coverage (~1100 lines) for prefill, /props schema-4, and reasoning_content routing.
## Why
Gives clients a single, card-driven API to control thinking budgets, soft-close behavior, and reasoning visibility - and an introspectable /props surface to discover what the server supports.
## Dependencies
- Luce-Org#336 (server-layer-split): CMake/build references
- Luce-Org#338 (server-pflash-drafter): check_admission uses pflash_keep_ratio + pflash_on contracts
- Luce-Org#340 (server-call-verb): sse_emitter Pattern-B call-verb streaming hooks rely on tool_parser changes from Luce-Org#340
1 issue found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="server/src/server/tool_parser.cpp">
<violation number="1" location="server/src/server/tool_parser.cpp:582">
P1: Disallowed `call:<verb>{...}` spans are not shadowed, allowing pattern 6 to emit spurious inner tool calls.</violation>
</file>
    if (colon != std::string::npos) verb = verb.substr(colon + 1);
    if (verb.empty()) continue;

    add_call(verb, args, call_start, brace_close);
P1: Disallowed call:<verb>{...} spans are not shadowed, allowing pattern 6 to emit spurious inner tool calls.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/tool_parser.cpp, line 582:
<comment>Disallowed `call:<verb>{...}` spans are not shadowed, allowing pattern 6 to emit spurious inner tool calls.</comment>
<file context>
@@ -397,7 +545,45 @@ ToolParseResult parse_tool_calls(const std::string & text, const json & tools) {
+ if (colon != std::string::npos) verb = verb.substr(colon + 1);
+ if (verb.empty()) continue;
+
+ add_call(verb, args, call_start, brace_close);
+ }
+ }
</file context>
Suggested change

    -add_call(verb, args, call_start, brace_close);
    +if (tool_allowed(tools, verb)) {
    +    add_call(verb, args, call_start, brace_close);
    +} else {
    +    removals.push_back({call_start, brace_close});
    +}
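
The shadowing idea behind this suggestion can be sketched in isolation. The snippet below is an illustration under assumptions, not the parser's real data flow: `Span`, `Candidate`, `is_shadowed` and `resolve_calls` are hypothetical names, the allow-list is modeled as a `std::set`, and "pattern 6" is assumed to be a later pass that would otherwise match text nested inside an outer `call:<verb>{...}` span.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct Span {
    std::size_t begin;  // first offset covered by the span
    std::size_t end;    // last offset covered by the span (inclusive)
};

struct Candidate {
    std::string verb;
    Span        span;
};

// True when pos falls inside any span already claimed by an earlier pattern.
inline bool is_shadowed(const std::vector<Span> & removals, std::size_t pos) {
    for (const Span & r : removals) {
        if (pos >= r.begin && pos <= r.end) return true;
    }
    return false;
}

// Outer call:<verb>{...} candidates are processed first. Each one shadows its
// span whether or not its verb is allowed; only allowed verbs are emitted.
// Candidates from the later pattern are dropped when they start in a shadowed span.
inline std::vector<std::string> resolve_calls(const std::vector<Candidate> & outer,
                                              const std::vector<Candidate> & later,
                                              const std::set<std::string> &  allowed) {
    std::vector<std::string> emitted;
    std::vector<Span>        removals;
    for (const Candidate & c : outer) {
        assert(c.span.begin <= c.span.end);
        if (allowed.count(c.verb) != 0) emitted.push_back(c.verb);
        removals.push_back(c.span);  // shadow the span in both branches
    }
    for (const Candidate & c : later) {
        if (is_shadowed(removals, c.span.begin)) continue;  // nested in a claimed span
        if (allowed.count(c.verb) != 0) emitted.push_back(c.verb);
    }
    return emitted;
}
```

With a disallowed outer verb covering offsets 0 to 40 and an allowed inner candidate at 10 to 30, `resolve_calls` emits nothing, which is the behavior the review asks for. Without the `removals.push_back` in the disallowed case, the inner candidate would be emitted as a spurious tool call.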
This was referenced Jun 4, 2026
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 4, 2026
easel added a commit to easel/lucebox-hub that referenced this pull request on Jun 5, 2026
## Summary
Adds plain-text `call:<verb>{...}` tool-call parsing for Gemma4 emissions to the server's `tool_parser`. The parser also tolerates the `` `_call: `` tokenizer artifact prefix and renders Anthropic-style `tool_use` / `tool_result` blocks. Changes are localized to `tool_parser.{cpp,h}`; no streaming-path edits in this PR.

## Files
- `server/src/server/tool_parser.cpp`: parser implementation for the `call:<verb>{...}` form (+192 LOC)
- `server/src/server/tool_parser.h`: small surface additions for the new branch
- `server/test/test_server_unit.cpp`: unit tests covering the new parser branches (+172 LOC)

Single commit: `feat(server): plain-text call:<verb>{} tool parsing (Gemma4)`.

## Dependencies
None. This PR is independent of the other split PRs.

Note: while this PR's code is self-contained, the server target itself cross-references symbols introduced/moved by sibling PRs (#336 layer split, #338 pflash drafter, #339 soft-close, #341 thinking-control), so building the `server` CMake target from this branch in isolation will not produce a complete binary. The parser additions themselves do not depend on any sibling PR.

## Test plan
- `test_server_unit` (the new branches added in this PR) passes locally
- `call:<verb>{...}` (and the `` `_call: `` variant) end-to-end once the full server stack is assembled with sibling PRs
- `tool_parser` branches (JSON tool_use path, etc.)

Note: server PRs in this split cannot be validated as a runnable binary standalone due to CMake cross-references with the other split PRs; the unit tests added here are the only standalone validation.