feat: Gemma 4 cutover — eliza-1 Qwen→Gemma, eliza-1-only stack, cloud HF-proxy, dynamic-fit memory (#9033) #9060

Merged: lalalune merged 9 commits into develop from feat/gemma4-cutover on Jun 22, 2026
@lalalune (Member)

Cuts the eliza-1 local model line from the Qwen3.5/3.6 backbone to Gemma 4 end-to-end, and consolidates the local stack to eliza-1-only. Tracking: #9033. Supersedes #8794 (closed); resolves #8808 (closed, generic-GGUF removed); addresses #8807/#8809 (M10).

What's in this PR (all CPU/test-verified)

  • M2 — source-of-truth cutover: training registry → Gemma 4 E2B/E4B/12B/31B (eliza-1-2b/4b/9b/27b); memory_calc vocab 248320→262144; abliterate dense surgery; catalog/types/device-fit (tokenizerFamily gemma4, separate-drafter MTP, stock-q8_0 KV); voice EOT <|im_end|>→<end_of_turn>; vision family qwen3-vl→gemma-vl; AGENTS/native contracts. ~250 files swept (mechanical + load-bearing), strict KEEP of frozen Qwen3-ASR/OmniVoice/Embedding/turn-detector lineage + banned-name guards.
  • M6 — kernels: keep geometry-agnostic TurboQuant weight-quant; QJL/PolarQuant/turbo3_tcq KV kernels → optional for Gemma (stock q8_0 KV; head_dim=128-coupled). Dead Qwen-hybrid code removed (hybrid_cache, eagle3, serve_vllm hybrid). Manifest validator reconciled.
  • M9 — eliza-1-only: generic-GGUF backend + multi-model selection removed; RuntimeClass collapsed to fused-eliza1.
  • M10a — cloud HF-proxy: packages/cloud-api/v1/hf-proxy + shared/hf-proxy.ts — HF downloads route through Eliza Cloud (no local HF keys).
  • M10b — memory: dynamic fit-to-RAM quant/context selection (per-token q8_0 KV rate), LRU estimatedMb fix, desktop bench harness.
  • M8 (code): publish pipeline Gemma-synced; assemble_local_gemma_bundle.py builds a verified-loadable E2B bundle; legacy-tier purge helper.
  • M3/M4/M5 (native, in the submodule at the M3 tip): multi-backend libelizainference FFI seam + LiteRT/MLX/CoreML scaffolds (gated off; compile-verified).
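
The M10b fit-to-RAM idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `fitToRam`, `kvBytesPerToken`, the type names, and any geometry or size numbers passed in are hypothetical. The only fixed fact used is that ggml's q8_0 format stores 32 values in 34 bytes. The sketch also ignores sliding-window layers, so it would overestimate KV memory for a model that uses them.

```typescript
// Hypothetical types for illustration; not the PR's actual API.
interface QuantOption {
  name: string;      // quant label, e.g. "Q4_K_M"
  weightsMb: number; // resident size of the weights in MB
}

interface KvGeometry {
  layers: number;
  kvHeads: number;
  headDim: number;
}

interface FitResult {
  quant: string;
  contextTokens: number;
  estimatedMb: number;
}

// ggml q8_0 packs 32 int8 values plus one fp16 scale into 34 bytes.
const Q8_0_BYTES_PER_ELEMENT = 34 / 32;

/** KV-cache bytes added per token: one K and one V vector for every layer. */
function kvBytesPerToken(g: KvGeometry): number {
  return 2 * g.layers * g.kvHeads * g.headDim * Q8_0_BYTES_PER_ELEMENT;
}

/**
 * Pick the first (quant, context) pair that fits the RAM budget.
 * `quants` is ordered best quality first, `contexts` largest first,
 * so quality is preferred over context length in this sketch.
 */
function fitToRam(
  budgetMb: number,
  quants: QuantOption[],
  contexts: number[],
  geometry: KvGeometry,
  minContext: number,
): FitResult | null {
  const perTokenMb = kvBytesPerToken(geometry) / (1024 * 1024);
  for (const q of quants) {
    for (const ctx of contexts) {
      if (ctx < minContext) continue; // too short to be useful
      const estimatedMb = q.weightsMb + perTokenMb * ctx;
      if (estimatedMb <= budgetMb) {
        return { quant: q.name, contextTokens: ctx, estimatedMb };
      }
    }
  }
  return null; // nothing fits; the caller decides how to degrade
}
```

A caller would pass the free-RAM budget and the candidate lists, then treat a `null` result as "no local configuration fits". Whether quality or context length should win when both cannot fit is a policy choice; this sketch arbitrarily prefers the higher-quality quant.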

Verification

  • @elizaos/plugin-local-inference: 179 files, 1869 tests pass, 0 fail (vitest).
  • @elizaos/shared: 65 files, 947 pass, 0 fail. cloud-api/cloud-shared/ui/benchmarks typecheck 0 errors.
  • Gemma 4 E2B runs on the fork (CPU): pp64 56.6 / tg32 5.84; vision (mmproj) loads + reasons; assembled bundle loads.

Deliberately NOT in this PR (gated, tracked in #9033)

  • The 603-commit llama.cpp upstream merge (CPU-verified; fork PR llama.cpp#29, "Sync fork to upstream master (+604 commits) — Gemma4 verified"). The submodule pointer stays at the M3 tip until GPU device-verify.
  • HF elizaos/eliza-1 Gemma bundle upload — gated on the non-standard MTP-drafter safetensors→GGUF conversion (base ships MTP-disabled meanwhile).
  • GPU-kernel runtime verification (CUDA/Metal/Vulkan) + on-device (Mac/iOS/Pixel) — hardware-gated.

🤖 Generated with Claude Code

lalalune and others added 9 commits June 22, 2026 13:22
Gemma 4 E2B+E4B text gen + E2B vision (mmproj) + assembled-bundle load all
verified through the fork on CPU (stock-q8_0 KV path). Device/CUDA/Metal lanes
remain hardware-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…a auto-provision)

Make Feed deployable as a single always-on Railway web service (reusing the
existing Steward + cloud inference), with no external cron and no migrate step:
- startInternalCronLoop(): in-process scheduler (gated on
  ENABLE_INTERNAL_CRON_SCHEDULER=true) that fires the game loop's entry crons
  (game-tick fans out the rest) every 60s against the local server with the
  CRON_SECRET — so one web container runs the live game. Wired at boot in
  instrumentation.ts.
- railway.json: Nixpacks build (bun install + feed web build) + healthcheck.
- scripts/railway-start.sh: ensure schema via drizzle-kit push (the migration
  history has parallel 0000 baselines that cannot apply to a fresh DB), then
  next start — auto-provisioned 'just works' boot.
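
A rough sketch of what such a gated in-process scheduler could look like. Only the names `startInternalCronLoop`, `ENABLE_INTERNAL_CRON_SCHEDULER`, and `CRON_SECRET` come from the commit message; the options object, the route path, the POST method, and sending the secret as a Bearer header are assumptions made for illustration.

```typescript
// Hypothetical shape of the loop; dependencies are injected so it can be tested.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<unknown>;

interface CronLoopOptions {
  env: Record<string, string | undefined>;
  baseUrl: string;       // the local server, e.g. "http://127.0.0.1:3000"
  entryPaths: string[];  // assumed entry cron routes
  fetchFn: FetchLike;
  intervalMs?: number;   // defaults to 60s, as in the commit message
  schedule?: (fn: () => void, ms: number) => unknown;
}

/** Returns true if the loop was started, false if it is gated off. */
function startInternalCronLoop(opts: CronLoopOptions): boolean {
  // Opt-in gate: do nothing unless explicitly enabled.
  if (opts.env.ENABLE_INTERNAL_CRON_SCHEDULER !== "true") return false;
  const secret = opts.env.CRON_SECRET;
  if (!secret) return false;

  const tick = (): void => {
    for (const path of opts.entryPaths) {
      // Assumption: the cron routes accept the secret as a Bearer token.
      opts
        .fetchFn(`${opts.baseUrl}${path}`, {
          method: "POST",
          headers: { Authorization: `Bearer ${secret}` },
        })
        .catch(() => {
          // A failed tick must not stop the loop; the next tick retries.
        });
    }
  };

  const schedule =
    opts.schedule ?? ((fn: () => void, ms: number) => setInterval(fn, ms));
  schedule(tick, opts.intervalMs ?? 60_000);
  return true;
}
```

Injecting `fetchFn` and `schedule` keeps the sketch testable without a running server or real timers; at boot the real `fetch` and the default `setInterval` would be used.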

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oud inference)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ernel contract (#9033)

The M6 schema change (REQUIRED_KERNELS_BY_TIER → turboquant_q4 only; QJL/Polar/
turbo3_tcq optional for Gemma's stock-q8_0 KV) left the validator + tests on the
old Qwen contract:
- validator.ts: drop the 'ctx>64k requires turbo3_tcq' hard rule (Gemma handles
  long context via native windowed-SWA + shared-KV; turbo3_tcq is now optional).
- manifest.test.ts: assert the Gemma required set (turboquant_q4 only), turbo3_tcq
  optional-when-long-context is now accepted; retarget rejection triggers.
- delete obsolete generic-gguf tests (backend-runtime-class.test.ts,
  assignment-not-servable-route.test.ts — M9 removed those code paths); trim the
  removed canServeRuntimeClassOnHost suite from assignment-validation.test.ts
  (its setAssignment boundary tests still own the non-eliza-1-rejection contract).

Gates: manifest 49/49, the manifest+assignment+catalog combo 72/72, typecheck 0.
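
For illustration, the reconciled kernel contract might reduce to something like the sketch below. `ManifestLike` and `validateKernels` are hypothetical names, and the real `validator.ts` and `REQUIRED_KERNELS_BY_TIER` are not reproduced here; the sketch only shows the rule as the commit message describes it: `turboquant_q4` is required, `turbo3_tcq` is optional, and there is no longer a long-context rule.

```typescript
// Hypothetical manifest shape; the real manifest carries many more fields.
interface ManifestLike {
  tier: string;
  contextTokens: number;
  kernels: string[];
}

// Per the commit message, only the weight-quant kernel is required for Gemma.
const REQUIRED_KERNELS: readonly string[] = ["turboquant_q4"];

/** Returns a list of validation errors; an empty list means the manifest passes. */
function validateKernels(manifest: ManifestLike): string[] {
  const errors: string[] = [];
  for (const kernel of REQUIRED_KERNELS) {
    if (!manifest.kernels.includes(kernel)) {
      errors.push(`missing required kernel: ${kernel}`);
    }
  }
  // Deliberately no "contextTokens > 64k requires turbo3_tcq" check:
  // long-context manifests pass with or without the optional KV kernels.
  return errors;
}
```

Under this sketch a long-context manifest is accepted whether or not it lists `turbo3_tcq`, and rejected only when `turboquant_q4` is absent.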

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d, idempotent)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t drift (#9033)

- memory-benchmark: plannedKvQuant qjl1_256→q8_0 (Gemma stock KV).
- fused-eliza1-no-regression: decideBackend no longer carries a runtimeClass
  field or a generic-gguf backend (eliza-1-only); assert backend==llama-cpp for
  both known + unknown catalog entries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…9033)

- eot-scorer: <|im_end|>→<end_of_turn>, <|im_start|>user→<start_of_turn>user (M2 EOT).
- text-bundle + mmproj-routing: same-file NextN MTP → separate official Gemma
  drafter (mtp/drafter-<slug>.gguf component + runtime.mtp.drafterFile, draftMax 4).
  Also fixed a real test-bundle drafter-path mismatch the catalog change exposed.
- vision-describe: cache family qwen3-vl→gemma-vl (M2 vision default).
- downloader: HF bearer → Eliza Cloud API-key via cloud HF-proxy (M10a).

Full plugin-local-inference suite via vitest: 179 files, 1869 pass, 0 fail.
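
As a small illustration of the turn-marker swap in the eot-scorer item, the sketch below maps the old ChatML-style markers to the Gemma ones named above. The helper names are hypothetical and the real eot-scorer is not shown in this PR page; only the four marker strings come from the commit message.

```typescript
// Marker strings taken from the commit message; helper names are assumptions.
const LEGACY_USER_START = "<|im_start|>user";
const LEGACY_END = "<|im_end|>";
const GEMMA_USER_START = "<start_of_turn>user";
const GEMMA_END = "<end_of_turn>";

/** Rewrite legacy turn markers in a prompt or fixture to the Gemma markers. */
function migrateTurnMarkers(text: string): string {
  return text
    .split(LEGACY_USER_START)
    .join(GEMMA_USER_START)
    .split(LEGACY_END)
    .join(GEMMA_END);
}

/** True when the text, ignoring trailing whitespace, ends with the Gemma end-of-turn marker. */
function endsWithEndOfTurn(text: string): boolean {
  return text.trimEnd().endsWith(GEMMA_END);
}
```

A helper like `migrateTurnMarkers` would mostly be useful for updating test fixtures; runtime code would normally emit the Gemma markers directly.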

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r develop merge (#9033)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@lalalune merged commit 3f3463d into develop on Jun 22, 2026 (30 checks passed)
@lalalune deleted the feat/gemma4-cutover branch on June 22, 2026 21:06
Development

Successfully merging this pull request may close these issues.

Local inference is bundle-locked to eliza-1 on desktop: surface real local model selection and a generic GGUF engine path
