chore(local-inference): bump llama.cpp fork to gemma4-assistant arch support #9268

Merged
lalalune merged 1 commit into develop from chore/bump-fork-gemma4-assistant
Jun 24, 2026

@lalalune

Member

The runtime half of the Gemma-4 MTP drafter work. Bumps the llama.cpp fork submodule to the gemma4-assistant arch port (fork PR elizaOS/llama.cpp#32).

With this, the fused engine built from the fork can load mtp/drafter-<tier>.gguf — Google's official Gemma-4 MTP drafter head (gemma4-assistant arch, wired in #9256) — and run separate-drafter MTP.

Validated (Apple M4 Max, Metal)

  • The fork loads gemma4-assistant cleanly (no unknown-arch / missing-tensor errors), produces correct output, and gives ~1.1× decode speedup with the real amaranus/Gemma-4-E2B-it-qat-assistant-MTP-Q8_0 drafter via --spec-type draft-mtp.
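For readers reproducing the number, a minimal sketch of how a decode speedup figure like the one above can be computed, assuming both runs generate the same number of tokens. The function name `decode_speedup` and the timings are hypothetical placeholders, not measurements from this PR.

```python
def decode_speedup(baseline_seconds: float, drafted_seconds: float) -> float:
    """Ratio of decode wall time without the drafter to decode wall time with it.

    Assumes both runs decode the same number of tokens, so the token count
    cancels out of the tokens-per-second ratio.
    """
    if baseline_seconds <= 0 or drafted_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / drafted_seconds


# Placeholder timings for illustration only; NOT measured in this PR.
example = decode_speedup(baseline_seconds=2.0, drafted_seconds=1.6)
print(f"decode speedup: {example:.2f}x")
```

A value above 1.0 means the drafter run decoded faster than the baseline run.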

The prior gitlink, bcae29e65, is a clean ancestor of the new commit, so the bump is a fast-forward. It adds the metal-tbq attn-score fix and the gemma4-assistant arch (+419/−26 across 22 files). Re-point the submodule to the squash/merge commit once fork PR elizaOS/llama.cpp#32 lands.
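The ancestor claim can be re-checked locally. Below is a sketch, assuming a checkout of the fork is available on disk; the helper name `is_fast_forward`, the `run` parameter (added only so the function can be exercised without git), and any path or SHA passed in are illustrative and not part of this PR. It relies on `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second and 1 when it is not.

```python
import subprocess


def is_fast_forward(repo_dir, old_sha, new_sha, run=subprocess.run):
    """Return True if old_sha is an ancestor of new_sha in repo_dir.

    `run` defaults to subprocess.run; it is injectable only for testing.
    """
    cmd = ["git", "-C", repo_dir, "merge-base", "--is-ancestor", old_sha, new_sha]
    result = run(cmd, check=False)
    if result.returncode == 0:
        return True   # old_sha is reachable from new_sha: fast-forward
    if result.returncode == 1:
        return False  # histories diverged: not a fast-forward
    # Any other exit status means git itself failed (bad SHA, not a repo, ...).
    raise RuntimeError(f"git merge-base failed with status {result.returncode}")
```

Calling it with the submodule directory, `bcae29e65` as `old_sha`, and the new gitlink as `new_sha` should return `True` if the bump is a fast-forward as described.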

🤖 Generated with Claude Code

…support

Points the fork submodule at the gemma4-assistant port (validated: loads Google's
Gemma-4 MTP drafter + ~1.1x decode speedup via --spec-type draft-mtp on M4 Max
Metal). bcae29e65 (prior gitlink) is a clean ancestor, so this is a fast-forward
that adds the metal-tbq attn-score fix + the gemma4-assistant arch. Tracks fork
PR elizaOS/llama.cpp#32; re-point to the merged commit once that lands.

This is the runtime half of the Gemma-4 MTP drafter work — with this, the fused
engine built from the fork can load mtp/drafter-<tier>.gguf (the amaranus/Google
gemma-4-E2B-it-assistant head, wired in #9256) and run separate-drafter MTP.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@lalalune lalalune merged commit 0928b56 into develop Jun 24, 2026
7 of 29 checks passed
@lalalune lalalune deleted the chore/bump-fork-gemma4-assistant branch June 24, 2026 06:54