chore(local-inference): bump llama.cpp fork to gemma4-assistant arch support #9268
…support

Points the fork submodule at the gemma4-assistant port (validated: loads Google's Gemma-4 MTP drafter + ~1.1x decode speedup via --spec-type draft-mtp on M4 Max Metal).

bcae29e65 (prior gitlink) is a clean ancestor, so this is a fast-forward that adds the metal-tbq attn-score fix + the gemma4-assistant arch. Tracks fork PR elizaOS/llama.cpp#32; re-point to the merged commit once that lands.

This is the runtime half of the Gemma-4 MTP drafter work. With this, the fused engine built from the fork can load mtp/drafter-<tier>.gguf (the amaranus/Google gemma-4-E2B-it-assistant head, wired in #9256) and run separate-drafter MTP.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The runtime half of the Gemma-4 MTP drafter work. Bumps the llama.cpp fork submodule to the gemma4-assistant arch port (fork PR elizaOS/llama.cpp#32).

With this, the fused engine built from the fork can load mtp/drafter-<tier>.gguf, Google's official Gemma-4 MTP drafter head (gemma4-assistant arch, wired in #9256), and run separate-drafter MTP.

Validated (Apple M4 Max, Metal)

Loads gemma4-assistant cleanly (no unknown-arch / missing-tensor errors), produces correct output, and gives ~1.1× decode speedup with the real amaranus/Gemma-4-E2B-it-qat-assistant-MTP-Q8_0 drafter via --spec-type draft-mtp.

bcae29e65 (prior gitlink) is a clean ancestor → fast-forward (adds the metal-tbq attn-score fix + the gemma4-assistant arch, +419/−26 across 22 files). Re-point to the squash/merge commit once fork #32 lands.

🤖 Generated with Claude Code
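A reviewer who wants to confirm the fast-forward claim locally could use something like the sketch below. It relies on `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second. The helper name `is_fast_forward` is hypothetical and not part of this PR; the repository path and the new gitlink SHA are supplied by the caller, since neither is stated here.

```shell
# Sketch only: is_fast_forward is a hypothetical helper, not part of this PR.
# Usage: is_fast_forward <repo-dir> <old-commit> <new-commit>
# Succeeds (exit 0) when <old-commit> is an ancestor of <new-commit>,
# i.e. when moving the gitlink from old to new is a fast-forward.
is_fast_forward() {
  git -C "$1" merge-base --is-ancestor "$2" "$3"
}
```

Called from the parent repository with the fork submodule's checkout directory, `bcae29e65` as the old commit, and the new gitlink as the new commit, a zero exit status would match the "clean ancestor" statement above.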