feat(server): plain-text call:<verb>{} tool parsing (Gemma4) #340

Merged
davide221 merged 1 commit into Luce-Org:main from easel:feat/server-call-verb-parser on Jun 4, 2026

@easel easel commented Jun 3, 2026

Summary

Adds plain-text call:<verb>{...} tool-call parsing for Gemma4 emissions to the server's tool_parser. The parser also tolerates the `_call:` tokenizer-artifact prefix and renders Anthropic-style tool_use / tool_result blocks. Changes are localized to tool_parser.{cpp,h}; no streaming-path edits in this PR.
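
To make the target format concrete, here is a minimal sketch of scanning text for call:<verb>{...} spans, including the `_call:` variant. It is not the code in tool_parser.cpp: `CallSpan`, `match_brace`, and `scan_call_verbs` are hypothetical names, the verb character set is an assumption, and braces inside quoted strings are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not the PR's ToolParseResult.
struct CallSpan {
    std::string verb;   // tool name following "call:"
    std::string args;   // raw text between the outer braces
    std::size_t begin;  // offset of the first matched character
    std::size_t end;    // offset of the closing brace
};

// Return the index of the '}' that closes the '{' at `open`, or npos.
// Simplification: braces inside quoted strings are counted like any other.
static std::size_t match_brace(const std::string & s, std::size_t open) {
    int depth = 0;
    for (std::size_t i = open; i < s.size(); ++i) {
        if (s[i] == '{') ++depth;
        else if (s[i] == '}' && --depth == 0) return i;
    }
    return std::string::npos;
}

static std::vector<CallSpan> scan_call_verbs(const std::string & text) {
    std::vector<CallSpan> out;
    const std::string key = "call:";
    std::size_t pos = 0;
    while ((pos = text.find(key, pos)) != std::string::npos) {
        std::size_t begin = pos;
        // Fold a leading '_' (the "_call:" artifact) into the matched span.
        if (begin > 0 && text[begin - 1] == '_') --begin;

        // Assumed verb alphabet: letters, digits, '_' and '-'.
        const std::size_t v = pos + key.size();
        std::size_t e = v;
        while (e < text.size() &&
               (std::isalnum(static_cast<unsigned char>(text[e])) ||
                text[e] == '_' || text[e] == '-')) {
            ++e;
        }
        // Require a non-empty verb immediately followed by '{'.
        if (e == v || e >= text.size() || text[e] != '{') { pos = v; continue; }

        const std::size_t close = match_brace(text, e);
        if (close == std::string::npos) { pos = v; continue; }  // unterminated

        out.push_back({text.substr(v, e - v),
                       text.substr(e + 1, close - e - 1), begin, close});
        pos = close + 1;
    }
    return out;
}
```

The sketch matches "call:" anywhere, so a word such as "recall:" followed by a verb and braces would also match; a real parser would presumably need a boundary check.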

Files

  • server/src/server/tool_parser.cpp — parser implementation for the call:<verb>{...} form (+192 LOC)
  • server/src/server/tool_parser.h — small surface additions for the new branch
  • server/test/test_server_unit.cpp — unit tests covering the new parser branches (+172 LOC)

Single commit: feat(server): plain-text call:<verb>{} tool parsing (Gemma4).

Dependencies

None — this PR is independent of the other split PRs.

Note: while this PR's code is self-contained, the server target itself cross-references symbols introduced/moved by sibling PRs (#336 layer split, #338 pflash drafter, #339 soft-close, #341 thinking-control), so building the server CMake target from this branch in isolation will not produce a complete binary. The parser additions themselves do not depend on any sibling PR.

Test plan

  • test_server_unit (the new branches added in this PR) passes locally
  • Smoke-test a Gemma4 trace emitting call:<verb>{...} (and the `_call:` variant) end-to-end once the full server stack is assembled with sibling PRs
  • Confirm no regressions in existing tool_parser branches (JSON tool_use path, etc.)

Note: server PRs in this split cannot be validated as a runnable binary standalone due to CMake cross-references with the other split PRs; the unit tests added here are the only standalone validation.

## What

Extends server/src/server/tool_parser.{cpp,h} to parse Gemma's
plain-text call:<verb>{} emissions (also accepts the `_call:`
tokenizer-artifact prefix) and render them as Anthropic tool_use +
tool_result blocks. Isolated to tool_parser; the streaming detection
hook in sse_emitter ships with Luce-Org#341. Adds 364 lines of C++ unit
coverage in test_server_unit.cpp plus the call-verb parser plan and
Gemma4-26B parser-fix writeup.
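
For reference, a rough sketch of the target block shapes. The field names (`type`, `id`, `name`, `input`, `tool_use_id`, `content`) follow the Anthropic Messages API; the helpers `json_escape`, `render_tool_use`, and `render_tool_result` are hypothetical and not taken from this PR. The sketch assumes the call arguments have already been converted to a valid JSON object string.

```cpp
#include <cstdio>
#include <string>

// Escape a string for embedding inside a JSON string literal.
static std::string json_escape(const std::string & s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// `input_json` is assumed to be a valid JSON object already; it is
// spliced in verbatim rather than re-parsed.
static std::string render_tool_use(const std::string & id,
                                   const std::string & name,
                                   const std::string & input_json) {
    return "{\"type\":\"tool_use\",\"id\":\"" + json_escape(id) +
           "\",\"name\":\"" + json_escape(name) +
           "\",\"input\":" + input_json + "}";
}

// Render the matching result block, keyed by the same id.
static std::string render_tool_result(const std::string & tool_use_id,
                                      const std::string & content) {
    return "{\"type\":\"tool_result\",\"tool_use_id\":\"" +
           json_escape(tool_use_id) + "\",\"content\":\"" +
           json_escape(content) + "\"}";
}
```

A production implementation would more likely build these objects with the server's JSON library than by string concatenation; the manual version only keeps the sketch dependency-free.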

## Why

Gemma4 emits tool calls as plain-text call:<verb>{...} rather than
structured JSON, which breaks the existing Anthropic tool_use pipeline
on agentic workloads. This parser closes that gap so Gemma4 can drive
coding-agent loops end-to-end.

## Dependencies

None - this PR is independent.
@easel easel force-pushed the feat/server-call-verb-parser branch from 89e8505 to 4472aa9 Compare June 4, 2026 05:03
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…ps schema-4

## What

User-facing thinking-control API across the HTTP server surface:

- chat_template prefills a closed <think> block when thinking is off
  (Qwen3-gated) so the model skips the reasoning preamble without
  losing the assistant turn.
- http_server bumps /props schema 2 -> 4, adding build / model.target /
  model.draft / host blocks for client introspection.
- server_main adds --debug-thinking-logits and --think-soft-close-*
  flags plus image/host-info loaders for card-driven boot.
- sse_emitter routes Qwen3.6/Laguna think-mode output to the
  reasoning_content channel so reasoning never leaks into the
  user-visible content stream (Pattern-B call-verb streaming hook).
- Ships the model-card _schema.json, qwen3.6-27b and laguna-xs.2 cards,
  the /props OpenAPI doc, updated thinking-budget spec, and the
  thinking-control protocol/mechanism experiments.
- test_server_unit gets matching coverage (~1100 lines) for prefill,
  /props schema-4, and reasoning_content routing.
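
The prefill and routing items above can be sketched roughly as follows. This is an illustration only: `assistant_prefix` and `split_reasoning` are hypothetical helpers, the exact whitespace around the closed <think> block is an assumption, and the real sse_emitter works on a token stream rather than a complete string.

```cpp
#include <cstddef>
#include <string>
#include <utility>

static const std::string kThinkOpen  = "<think>";
static const std::string kThinkClose = "</think>";

// When thinking is off, prefill an already-closed think block so the
// model continues straight into the visible answer.
// The "\n\n" separators are an assumption for this sketch.
static std::string assistant_prefix(const std::string & prefix,
                                    bool thinking_enabled) {
    if (thinking_enabled) return prefix;
    return prefix + kThinkOpen + "\n\n" + kThinkClose + "\n\n";
}

// Split a complete response into (reasoning, content). Text inside the
// first <think>...</think> pair goes to the reasoning channel; the rest
// stays in the user-visible content.
static std::pair<std::string, std::string>
split_reasoning(const std::string & text) {
    const std::size_t open = text.find(kThinkOpen);
    if (open == std::string::npos) return {"", text};

    const std::size_t body  = open + kThinkOpen.size();
    const std::size_t close = text.find(kThinkClose, body);
    if (close == std::string::npos) {
        // Unterminated block: treat everything after the tag as reasoning.
        return {text.substr(body), text.substr(0, open)};
    }
    return {text.substr(body, close - body),
            text.substr(0, open) + text.substr(close + kThinkClose.size())};
}
```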

## Why

Gives clients a single, card-driven API to control thinking budgets,
soft-close behavior, and reasoning visibility - and an introspectable
/props surface to discover what the server supports.

## Dependencies

- Luce-Org#336 (server-layer-split): CMake/build references
- Luce-Org#338 (server-pflash-drafter): check_admission uses pflash_keep_ratio +
  pflash_on contracts
- Luce-Org#340 (server-call-verb): sse_emitter Pattern-B call-verb streaming hooks
  rely on tool_parser changes from Luce-Org#340
@easel easel marked this pull request as ready for review June 4, 2026 05:03
@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="server/src/server/tool_parser.cpp">

<violation number="1" location="server/src/server/tool_parser.cpp:582">
P1: Disallowed `call:<verb>{...}` spans are not shadowed, allowing pattern 6 to emit spurious inner tool calls.</violation>
</file>

if (colon != std::string::npos) verb = verb.substr(colon + 1);
if (verb.empty()) continue;

add_call(verb, args, call_start, brace_close);

P1: Disallowed call:<verb>{...} spans are not shadowed, allowing pattern 6 to emit spurious inner tool calls.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/tool_parser.cpp, line 582:

<comment>Disallowed `call:<verb>{...}` spans are not shadowed, allowing pattern 6 to emit spurious inner tool calls.</comment>

<file context>
@@ -397,7 +545,45 @@ ToolParseResult parse_tool_calls(const std::string & text, const json & tools) {
+            if (colon != std::string::npos) verb = verb.substr(colon + 1);
+            if (verb.empty()) continue;
+
+            add_call(verb, args, call_start, brace_close);
+        }
+    }
</file context>
Suggested change
-            add_call(verb, args, call_start, brace_close);
+            if (tool_allowed(tools, verb)) {
+                add_call(verb, args, call_start, brace_close);
+            } else {
+                removals.push_back({call_start, brace_close});
+            }
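
A small sketch of why the shadowing matters, under stated assumptions: `Span`, `tool_allowed_sketch`, `is_shadowed`, and `later_pass` are hypothetical stand-ins, the tool list is modeled as a plain vector of names rather than the JSON `tools` value, and the later pattern is reduced to a scan for "call:". The real removals bookkeeping in tool_parser.cpp may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Inclusive byte range of text that later patterns must not re-scan.
struct Span {
    std::size_t begin;
    std::size_t end;
};

// Stand-in for tool_allowed(tools, verb): here `tools` is just a list
// of permitted names.
static bool tool_allowed_sketch(const std::vector<std::string> & tools,
                                const std::string & verb) {
    return std::find(tools.begin(), tools.end(), verb) != tools.end();
}

static bool is_shadowed(const std::vector<Span> & removals, std::size_t pos) {
    for (const Span & r : removals) {
        if (pos >= r.begin && pos <= r.end) return true;
    }
    return false;
}

// Stand-in for a later pattern: report every "call:" offset that is not
// inside a shadowed span.
static std::vector<std::size_t>
later_pass(const std::string & text, const std::vector<Span> & removals) {
    std::vector<std::size_t> hits;
    for (std::size_t pos = text.find("call:"); pos != std::string::npos;
         pos = text.find("call:", pos + 1)) {
        if (!is_shadowed(removals, pos)) hits.push_back(pos);
    }
    return hits;
}
```

With the text `call:rm{call:ls{}}` and only `ls` allowed, an empty removals list lets the later pass see both offsets, including the inner `call:ls{}`. Recording the disallowed outer span (offsets 0 through 17) in removals hides both.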

easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
@davide221 davide221 merged commit f4eb504 into Luce-Org:main Jun 4, 2026
2 of 3 checks passed
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 5, 2026
…ps schema-4
