feat: Gemma 4 cutover — eliza-1 Qwen→Gemma, eliza-1-only stack, cloud HF-proxy, dynamic-fit memory (#9033) by lalalune · Pull Request #9060 · elizaOS/eliza

lalalune · 2026-06-22T21:04:55Z

Cuts the eliza-1 local model line from the Qwen3.5/3.6 backbone to Gemma 4 end-to-end, and consolidates the local stack to eliza-1-only. Tracking: #9033. Supersedes #8794 (closed); resolves #8808 (closed, generic-GGUF removed); addresses #8807/#8809 (M10).

What's in this PR (all CPU/test-verified)

M2 — source-of-truth cutover: training registry → Gemma 4 E2B/E4B/12B/31B (eliza-1-2b/4b/9b/27b); memory_calc vocab 248320→262144; abliterate dense surgery; catalog/types/device-fit (tokenizerFamily gemma4, separate-drafter MTP, stock-q8_0 KV); voice EOT <|im_end|>→<end_of_turn>; vision family qwen3-vl→gemma-vl; AGENTS/native contracts. ~250 files swept (mechanical + load-bearing), strict KEEP of frozen Qwen3-ASR/OmniVoice/Embedding/turn-detector lineage + banned-name guards.
M6 — kernels: keep geometry-agnostic TurboQuant weight-quant; QJL/PolarQuant/turbo3_tcq KV kernels → optional for Gemma (stock q8_0 KV; head_dim=128-coupled). Dead Qwen-hybrid code removed (hybrid_cache, eagle3, serve_vllm hybrid). Manifest validator reconciled.
M9 — eliza-1-only: generic-GGUF backend + multi-model selection removed; RuntimeClass collapsed to fused-eliza1.
M10a — cloud HF-proxy: packages/cloud-api/v1/hf-proxy + shared/hf-proxy.ts — HF downloads route through Eliza Cloud (no local HF keys).
M10b — memory: dynamic fit-to-RAM quant/context selection (per-token q8_0 KV rate), LRU estimatedMb fix, desktop bench harness.
M8 (code): publish pipeline Gemma-synced; assemble_local_gemma_bundle.py builds a verified-loadable E2B bundle; legacy-tier purge helper.
M3/M4/M5 (native, in the submodule at the M3 tip): multi-backend libelizainference FFI seam + LiteRT/MLX/CoreML scaffolds (gated off; compile-verified).
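The M10b item describes fitting the quant and context to available RAM using a per-token q8_0 KV rate. Below is a minimal sketch of that kind of selection. It assumes ggml's q8_0 block layout (32 int8 values plus one fp16 scale, so 34 bytes per 32 elements) and a conservative KV estimate in which every layer caches the full context, which ignores Gemma's sliding-window layers. The function names, geometry, quant sizes and headroom are hypothetical and are not the plugin's actual values or API.

```typescript
// Minimal fit-to-RAM sketch. Geometry, quant sizes and headroom are illustrative
// assumptions, not eliza-1 catalog values.

// ggml q8_0 packs 32 int8 values plus one fp16 scale per block: 34 bytes per 32 elements.
const Q8_0_BYTES_PER_ELEMENT = 34 / 32;

interface KvGeometry {
  layers: number;
  kvHeads: number;
  headDim: number;
}

interface QuantOption {
  name: string;
  weightsMb: number; // size of the weights at this quant
}

interface FitChoice {
  quant: string;
  contextTokens: number;
  totalMb: number;
}

/** q8_0 KV bytes per cached token: K and V for every layer and KV head.
 *  Conservative: assumes every layer caches the full context (ignores sliding-window layers). */
function kvBytesPerToken(g: KvGeometry): number {
  return 2 * g.layers * g.kvHeads * g.headDim * Q8_0_BYTES_PER_ELEMENT;
}

/** Pick the highest-quality quant, then the largest context, whose total fits in RAM. */
function fitToRam(
  availableMb: number,
  geometry: KvGeometry,
  quants: QuantOption[], // best quality first
  contexts: number[], // largest first
  headroomMb = 512, // reserve for runtime buffers and the OS (assumed value)
): FitChoice | null {
  const kvMbPerToken = kvBytesPerToken(geometry) / (1024 * 1024);
  for (const q of quants) {
    for (const ctx of contexts) {
      const totalMb = q.weightsMb + ctx * kvMbPerToken + headroomMb;
      if (totalMb <= availableMb) return { quant: q.name, contextTokens: ctx, totalMb };
    }
  }
  return null; // nothing fits; the caller should refuse to load
}
```

The loop order encodes a preference for quant quality over context length. Swapping the two loops would instead keep the longest context and drop to a smaller quant first.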

Verification

@elizaos/plugin-local-inference: 179 files, 1869 tests pass, 0 fail (vitest).
@elizaos/shared: 65 files, 947 pass, 0 fail. cloud-api/cloud-shared/ui/benchmarks typecheck 0 errors.
Gemma 4 E2B runs on the fork (CPU): pp64 56.6 t/s / tg32 5.84 t/s; vision (mmproj) loads and reasons over images; the assembled bundle loads.

Deliberately NOT in this PR (gated, tracked in #9033)

The 603-commit llama.cpp upstream merge (CPU-verified and Gemma4-verified in fork PR llama.cpp#29, "Sync fork to upstream master (+604 commits)"). The submodule pointer stays at the M3 tip until GPU device verification.
HF elizaos/eliza-1 Gemma bundle upload — gated on the non-standard MTP-drafter safetensors→GGUF conversion (base ships MTP-disabled meanwhile).
GPU-kernel runtime verification (CUDA/Metal/Vulkan) + on-device (Mac/iOS/Pixel) — hardware-gated.

🤖 Generated with Claude Code

Gemma 4 E2B+E4B text gen + E2B vision (mmproj) + assembled-bundle load all verified through the fork on CPU (stock-q8_0 KV path). Device/CUDA/Metal lanes remain hardware-gated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…a auto-provision) Make Feed deployable as a single always-on Railway web service (reusing the existing Steward + cloud inference), with no external cron and no migrate step:

- startInternalCronLoop(): in-process scheduler (gated on ENABLE_INTERNAL_CRON_SCHEDULER=true) that fires the game loop's entry crons (game-tick fans out the rest) every 60s against the local server with the CRON_SECRET, so one web container runs the live game. Wired at boot in instrumentation.ts.
- railway.json: Nixpacks build (bun install + feed web build) + healthcheck.
- scripts/railway-start.sh: ensure the schema via drizzle-kit push (the migration history has parallel 0000 baselines that cannot apply to a fresh DB), then next start, for an auto-provisioned boot that "just works".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
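The startInternalCronLoop() bullet above describes a gated, idempotent in-process scheduler that calls the entry crons on the local server every 60s with the CRON_SECRET. The following is a minimal sketch of that shape, not Feed's code. The route path, the `Authorization: Bearer <secret>` header format, the injected `fetchFn`, and the returned `tick`/`stop` handle are all assumptions made so the loop can be exercised without waiting a full interval.

```typescript
// Sketch of an in-process cron scheduler in the spirit of startInternalCronLoop().
// Route paths, the Bearer header format and the option names are assumptions.

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<unknown>;

interface CronLoopOptions {
  env: Record<string, string | undefined>;
  baseUrl: string; // the local server this container is serving (hypothetical)
  entryRoutes: string[]; // entry crons only; game-tick fans out the rest
  fetchFn: FetchLike;
  intervalMs?: number; // defaults to 60s, as in the commit message
}

interface CronLoopHandle {
  tick: () => void; // fire all entry crons once (also what the interval calls)
  stop: () => void;
}

let active: CronLoopHandle | null = null;

function startInternalCronLoop(opts: CronLoopOptions): CronLoopHandle | null {
  // Gated: stays off unless explicitly enabled and a secret is configured.
  if (opts.env.ENABLE_INTERNAL_CRON_SCHEDULER !== "true") return null;
  const secret = opts.env.CRON_SECRET;
  if (!secret) return null;
  // Idempotent: a repeated boot call reuses the existing loop instead of starting a second one.
  if (active) return active;

  const tick = (): void => {
    for (const route of opts.entryRoutes) {
      try {
        opts
          .fetchFn(`${opts.baseUrl}${route}`, {
            method: "GET",
            headers: { Authorization: `Bearer ${secret}` },
          })
          .catch((err: unknown) => console.error(`cron ${route} failed`, err));
      } catch (err) {
        // A synchronous throw from fetchFn must not kill the interval.
        console.error(`cron ${route} failed`, err);
      }
    }
  };

  const timer = setInterval(tick, opts.intervalMs ?? 60_000);
  active = {
    tick,
    stop: () => {
      clearInterval(timer);
      active = null;
    },
  };
  return active;
}
```

Injecting `fetchFn` and exposing `tick` keeps the gate and idempotency checks unit-testable without real timers or a running server.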

…oud inference) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ernel contract (#9033) The M6 schema change (REQUIRED_KERNELS_BY_TIER → turboquant_q4 only; QJL/Polar/turbo3_tcq optional for Gemma's stock-q8_0 KV) left the validator + tests on the old Qwen contract:

- validator.ts: drop the 'ctx>64k requires turbo3_tcq' hard rule (Gemma handles long context via native windowed-SWA + shared-KV; turbo3_tcq is now optional).
- manifest.test.ts: assert the Gemma required set (turboquant_q4 only) and that turbo3_tcq is now accepted as optional for long context; retarget the rejection triggers.
- delete obsolete generic-gguf tests (backend-runtime-class.test.ts, assignment-not-servable-route.test.ts; M9 removed those code paths); trim the removed canServeRuntimeClassOnHost suite from assignment-validation.test.ts (its setAssignment boundary tests still own the non-eliza-1-rejection contract).

Gates: manifest 49/49, the manifest+assignment+catalog combo 72/72, typecheck 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
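The validator change above reduces the required kernel set to turboquant_q4 and drops the long-context turbo3_tcq rule. A minimal sketch of that contract follows. `REQUIRED_KERNELS_BY_TIER` is named in the commit, but the tier keys (taken from the M2 eliza-1 tier names), the manifest shape and the error strings are assumptions, not the real manifest schema.

```typescript
// Sketch of the post-M6 kernel contract. REQUIRED_KERNELS_BY_TIER is named in the commit;
// the manifest shape, tier keys and error messages here are hypothetical.

type Kernel = "turboquant_q4" | "qjl" | "polarquant" | "turbo3_tcq";

// Gemma keeps stock q8_0 KV, so only the weight-quant kernel is mandatory.
const REQUIRED_KERNELS_BY_TIER: Record<string, readonly Kernel[]> = {
  "eliza-1-2b": ["turboquant_q4"],
  "eliza-1-4b": ["turboquant_q4"],
  "eliza-1-9b": ["turboquant_q4"],
  "eliza-1-27b": ["turboquant_q4"],
};

interface KernelManifest {
  tier: string;
  contextLength: number;
  kernels: Kernel[];
}

function validateKernels(m: KernelManifest): string[] {
  const required = REQUIRED_KERNELS_BY_TIER[m.tier];
  if (!required) return [`unknown tier: ${m.tier}`];
  const errors = required
    .filter((k) => !m.kernels.includes(k))
    .map((k) => `missing required kernel: ${k}`);
  // Deliberately no "contextLength > 64k requires turbo3_tcq" rule: long context is handled
  // by Gemma's windowed SWA + shared KV, so QJL/PolarQuant/turbo3_tcq are optional extras.
  return errors;
}
```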

…d, idempotent) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t drift (#9033)

- memory-benchmark: plannedKvQuant qjl1_256→q8_0 (Gemma stock KV).
- fused-eliza1-no-regression: decideBackend no longer carries a runtimeClass field or a generic-gguf backend (eliza-1-only); assert backend==llama-cpp for both known and unknown catalog entries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
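The fused-eliza1-no-regression change above, together with M9, leaves one runtime class, one backend, and a rejection at the assignment boundary for anything that is not eliza-1. The sketch below illustrates that contract. `RuntimeClass`, `decideBackend` and `setAssignment` are names used in this PR, but the signatures, the `Map`-based assignment store and the id check are hypothetical.

```typescript
// Sketch of the eliza-1-only contract after M9. RuntimeClass, decideBackend and setAssignment
// are named in this PR, but these signatures and the tier-list check are illustrative.

type RuntimeClass = "fused-eliza1"; // the only remaining runtime class

const ELIZA1_IDS = ["eliza-1-2b", "eliza-1-4b", "eliza-1-9b", "eliza-1-27b"] as const;
type Eliza1Id = (typeof ELIZA1_IDS)[number];

interface BackendDecision {
  backend: "llama-cpp"; // no runtimeClass field, no generic-gguf alternative
}

function isEliza1Id(id: string): id is Eliza1Id {
  return (ELIZA1_IDS as readonly string[]).includes(id);
}

/** Known and unknown catalog entries alike resolve to llama-cpp. */
function decideBackend(_catalogId: string): BackendDecision {
  return { backend: "llama-cpp" };
}

/** Boundary check: non-eliza-1 models are rejected before they reach the runtime. */
function setAssignment(
  assignments: Map<string, Eliza1Id>,
  slot: string,
  modelId: string,
): RuntimeClass {
  if (!isEliza1Id(modelId)) {
    throw new Error(`only eliza-1 models can be assigned, got "${modelId}"`);
  }
  assignments.set(slot, modelId);
  return "fused-eliza1";
}
```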

…9033)

- eot-scorer: <|im_end|>→<end_of_turn>, <|im_start|>user→<start_of_turn>user (M2 EOT).
- text-bundle + mmproj-routing: same-file NextN MTP → separate official Gemma drafter (mtp/drafter-<slug>.gguf component + runtime.mtp.drafterFile, draftMax 4). Also fixed a real test-bundle drafter-path mismatch that the catalog change exposed.
- vision-describe: cache family qwen3-vl→gemma-vl (M2 vision default).
- downloader: HF bearer → Eliza Cloud API key via cloud HF-proxy (M10a).

Full plugin-local-inference suite via vitest: 179 files, 1869 pass, 0 fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
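The eot-scorer bullet above swaps the Qwen chat markers for Gemma's. One common way to use such markers for turn detection is to leave the user turn open and ask how likely the model is to emit the end-of-turn token next. A minimal sketch under that assumption follows. The marker strings come from this PR, while `NextTokenLogProb`, the function names and the 0.5 threshold are hypothetical stand-ins for whatever the runtime actually exposes.

```typescript
// Sketch of end-of-turn scoring with the Gemma turn markers from the M2 EOT change.
// NextTokenLogProb and the 0.5 threshold are hypothetical stand-ins for the runtime's API.

const START_OF_USER_TURN = "<start_of_turn>user"; // previously "<|im_start|>user" (Qwen)
const END_OF_TURN = "<end_of_turn>"; // previously "<|im_end|>" (Qwen)

/** Natural-log probability that `token` is the next token after `prompt`. */
type NextTokenLogProb = (prompt: string, token: string) => number;

/** Leave the user turn open so the next token indicates whether the speaker is done. */
function buildEotPrompt(partialUtterance: string): string {
  return `${START_OF_USER_TURN}\n${partialUtterance}`;
}

function endOfTurnProbability(partial: string, logProb: NextTokenLogProb): number {
  return Math.exp(logProb(buildEotPrompt(partial), END_OF_TURN));
}

function isEndOfTurn(partial: string, logProb: NextTokenLogProb, threshold = 0.5): boolean {
  return endOfTurnProbability(partial, logProb) >= threshold;
}
```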
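The downloader bullet above (and M10a) replaces the local HF bearer token with an Eliza Cloud API key, so the download goes through the cloud HF-proxy. Below is a minimal sketch of that URL and header rewrite. It assumes the proxy mirrors Hugging Face's `<repo>/resolve/<revision>/<path>` layout under a `/v1/hf-proxy` prefix. The base URL, the path layout and `buildHfProxyRequest` are hypothetical, not the actual cloud-api route contract.

```typescript
// Sketch: route a Hugging Face file download through an Eliza Cloud proxy instead of
// calling huggingface.co with a local HF token. The "/v1/hf-proxy" path layout and the
// function name are assumptions for illustration.

interface ProxyRequest {
  url: string;
  headers: Record<string, string>;
}

function buildHfProxyRequest(
  cloudBaseUrl: string, // Eliza Cloud API origin (hypothetical)
  cloudApiKey: string, // Eliza Cloud API key; no HF token is needed locally
  repo: string, // e.g. "elizaos/eliza-1"
  filePath: string, // path inside the repo
  revision = "main",
): ProxyRequest {
  // Encode each segment but keep the "/" separators intact.
  const encodeSegments = (p: string): string => p.split("/").map(encodeURIComponent).join("/");
  const base = cloudBaseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/v1/hf-proxy/${encodeSegments(repo)}/resolve/${encodeURIComponent(revision)}/${encodeSegments(filePath)}`,
    headers: { Authorization: `Bearer ${cloudApiKey}` },
  };
}
```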

…r develop merge (#9033) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-22T21:05:06Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

claude · 2026-06-22T23:42:45Z

Claude encountered an error.

lalalune and others added 9 commits June 22, 2026 13:22

docs(feed): Railway deploy runbook (single service, reuse Steward, cl…

b5956d7

test(feed): unit test the internal cron loop (fires entry crons, gate…

4e3074e

Merge remote-tracking branch 'origin/develop' into feat/gemma4-cutover

e6c60c8

fix(local-inference): re-apply Gemma manifest-validator contract afte…

c08e60a

greptile-apps Bot reviewed Jun 22, 2026

lalalune merged commit 3f3463d into develop Jun 22, 2026
30 checks passed

lalalune deleted the feat/gemma4-cutover branch June 22, 2026 21:06

This was referenced Jun 22, 2026

Gemma 4 cutover + multi-backend libelizainference (LiteRT/AICore/CoreML/MLX) + per-platform kernel optimization #9033

Closed

refactor(local-inference): finish M9 generic-GGUF cleanup + purge M8 legacy tier refs #9064

Merged

github-actions Bot added Docs Tests plugins labels Jun 22, 2026

lalalune mentioned this pull request Jun 23, 2026

docs(local-inference): Gemma GPU evidence via Vulkan (RTX 5080) — closes M6 FA-512 gap on a real GPU #9092

Merged