feat: Gemma 4 cutover — eliza-1 Qwen→Gemma, eliza-1-only stack, cloud HF-proxy, dynamic-fit memory (#9033) #9060
Gemma 4 E2B+E4B text gen + E2B vision (mmproj) + assembled-bundle load all verified through the fork on CPU (stock-q8_0 KV path). Device/CUDA/Metal lanes remain hardware-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…a auto-provision)

Make Feed deployable as a single always-on Railway web service (reusing the existing Steward + cloud inference), with no external cron and no migrate step:

- startInternalCronLoop(): in-process scheduler (gated on ENABLE_INTERNAL_CRON_SCHEDULER=true) that fires the game loop's entry crons (game-tick fans out the rest) every 60s against the local server with the CRON_SECRET, so one web container runs the live game. Wired at boot in instrumentation.ts.
- railway.json: Nixpacks build (bun install + feed web build) + healthcheck.
- scripts/railway-start.sh: ensure schema via drizzle-kit push (the migration history has parallel 0000 baselines that cannot apply to a fresh DB), then next start, for an auto-provisioned 'just works' boot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
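The in-process scheduler described in this commit could look roughly like the sketch below. It is not the code from the PR: `CronLoopOptions`, `FetchLike`, `entryPaths`, and the Bearer-style header are illustrative assumptions, and only the two environment variable names and the 60s interval come from the commit message.

```typescript
// Sketch only. CronLoopOptions, FetchLike, entryPaths, and the Bearer header
// are assumptions; the env var names and 60s default come from the commit message.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

interface CronLoopOptions {
  env: Record<string, string | undefined>;
  baseUrl: string; // origin of the local server
  entryPaths: string[]; // entry cron routes; the rest fan out from these
  fetchImpl: FetchLike;
  intervalMs?: number; // defaults to 60 seconds
}

// Returns a stop function, or null when the scheduler is disabled or unconfigured.
function startInternalCronLoop(opts: CronLoopOptions): (() => void) | null {
  if (opts.env.ENABLE_INTERNAL_CRON_SCHEDULER !== "true") return null;
  const secret = opts.env.CRON_SECRET;
  if (!secret) return null;

  let inFlight = false;
  const tick = async (): Promise<void> => {
    if (inFlight) return; // skip this pass if the previous one is still running
    inFlight = true;
    try {
      for (const path of opts.entryPaths) {
        try {
          await opts.fetchImpl(`${opts.baseUrl}${path}`, {
            method: "POST",
            headers: { authorization: `Bearer ${secret}` },
          });
        } catch {
          // One failed cron call should not stop the loop; a real version would log it.
        }
      }
    } finally {
      inFlight = false;
    }
  };

  const timer = setInterval(() => void tick(), opts.intervalMs ?? 60_000);
  return () => clearInterval(timer);
}
```

Returning `null` when the flag is unset keeps the default deployment unchanged, and injecting `fetchImpl` lets the loop be exercised in tests without a running server.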
…oud inference) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ernel contract (#9033)

The M6 schema change (REQUIRED_KERNELS_BY_TIER → turboquant_q4 only; QJL/Polar/turbo3_tcq optional for Gemma's stock-q8_0 KV) left the validator + tests on the old Qwen contract:

- validator.ts: drop the 'ctx>64k requires turbo3_tcq' hard rule (Gemma handles long context via native windowed-SWA + shared-KV; turbo3_tcq is now optional).
- manifest.test.ts: assert the Gemma required set (turboquant_q4 only); turbo3_tcq optional-when-long-context is now accepted; retarget rejection triggers.
- delete obsolete generic-gguf tests (backend-runtime-class.test.ts, assignment-not-servable-route.test.ts; M9 removed those code paths); trim the removed canServeRuntimeClassOnHost suite from assignment-validation.test.ts (its setAssignment boundary tests still own the non-eliza-1-rejection contract).

Gates: manifest 49/49, the manifest+assignment+catalog combo 72/72, typecheck 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
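A minimal sketch of the relaxed kernel contract follows. It is an illustration under assumptions, not the real validator.ts: `ManifestLike`, `validateKernels`, and the lowercase `qjl`/`polar` identifiers are hypothetical, and the per-tier map named in the commit is flattened here into a single required list.

```typescript
// Sketch only. ManifestLike, validateKernels, and the "qjl"/"polar" ids are
// assumptions; the commit message names REQUIRED_KERNELS_BY_TIER, which is
// flattened into one list here for brevity.
const REQUIRED_KERNELS: readonly string[] = ["turboquant_q4"];
const OPTIONAL_KERNELS: readonly string[] = ["qjl", "polar", "turbo3_tcq"];

interface ManifestLike {
  contextLength: number;
  kernels: string[];
}

// Returns a list of human-readable errors; an empty list means the manifest passes.
function validateKernels(manifest: ManifestLike): string[] {
  const errors: string[] = [];
  for (const kernel of REQUIRED_KERNELS) {
    if (!manifest.kernels.includes(kernel)) {
      errors.push(`missing required kernel: ${kernel}`);
    }
  }
  for (const kernel of manifest.kernels) {
    const known = REQUIRED_KERNELS.includes(kernel) || OPTIONAL_KERNELS.includes(kernel);
    if (!known) errors.push(`unknown kernel: ${kernel}`);
  }
  // Deliberately no rule tying contextLength to turbo3_tcq: under the Gemma
  // contract that kernel is optional at any context length.
  return errors;
}
```

With this shape, a long-context manifest that declares only `turboquant_q4` validates cleanly, which is the behavior change the retargeted tests assert.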
…d, idempotent) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t drift (#9033)

- memory-benchmark: plannedKvQuant qjl1_256→q8_0 (Gemma stock KV).
- fused-eliza1-no-regression: decideBackend no longer carries a runtimeClass field or a generic-gguf backend (eliza-1-only); assert backend==llama-cpp for both known + unknown catalog entries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…9033)

- eot-scorer: <|im_end|>→<end_of_turn>, <|im_start|>user→<start_of_turn>user (M2 EOT).
- text-bundle + mmproj-routing: same-file NextN MTP → separate official Gemma drafter (mtp/drafter-<slug>.gguf component + runtime.mtp.drafterFile, draftMax 4). Also fixed a real test-bundle drafter-path mismatch the catalog change exposed.
- vision-describe: cache family qwen3-vl→gemma-vl (M2 vision default).
- downloader: HF bearer → Eliza Cloud API-key via cloud HF-proxy (M10a).

Full plugin-local-inference suite via vitest: 179 files, 1869 pass, 0 fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
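The end-of-turn marker swap in the eot-scorer bullet can be illustrated with a small sketch. The marker strings come from the commit message; `TurnMarkers`, `retargetMarkers`, and `endsWithEndOfTurn` are hypothetical helpers and do not reflect how the real scorer is implemented.

```typescript
// Sketch only. The marker strings come from the commit message; the type and
// both helper functions are illustrative assumptions.
interface TurnMarkers {
  userStart: string;
  endOfTurn: string;
}

const CHATML_MARKERS: TurnMarkers = {
  userStart: "<|im_start|>user",
  endOfTurn: "<|im_end|>",
};

const GEMMA_MARKERS: TurnMarkers = {
  userStart: "<start_of_turn>user",
  endOfTurn: "<end_of_turn>",
};

// Rewrites every occurrence of the old markers to the new ones (plain string match).
function retargetMarkers(text: string, from: TurnMarkers, to: TurnMarkers): string {
  return text
    .split(from.userStart)
    .join(to.userStart)
    .split(from.endOfTurn)
    .join(to.endOfTurn);
}

// True when the text, ignoring trailing whitespace, ends on the end-of-turn marker.
function endsWithEndOfTurn(text: string, markers: TurnMarkers): boolean {
  return text.replace(/\s+$/, "").endsWith(markers.endOfTurn);
}
```

For example, `retargetMarkers("<|im_start|>user\nhi<|im_end|>", CHATML_MARKERS, GEMMA_MARKERS)` yields `"<start_of_turn>user\nhi<end_of_turn>"`, the kind of fixture rewrite such a test retarget involves.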
…r develop merge (#9033) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cuts the eliza-1 local model line from the Qwen3.5/3.6 backbone to Gemma 4 end-to-end, and consolidates the local stack to eliza-1-only. Tracking: #9033. Supersedes #8794 (closed); resolves #8808 (closed, generic-GGUF removed); addresses #8807/#8809 (M10).
What's in this PR (all CPU/test-verified)
- `memory_calc` vocab 248320→262144; abliterate dense surgery; catalog/types/device-fit (`tokenizerFamily` `gemma4`, separate-drafter MTP, stock-q8_0 KV); voice EOT `<|im_end|>`→`<end_of_turn>`; vision family `qwen3-vl`→`gemma-vl`; AGENTS/native contracts. ~250 files swept (mechanical + load-bearing), strict KEEP of frozen Qwen3-ASR/OmniVoice/Embedding/turn-detector lineage + banned-name guards.
- `RuntimeClass` collapsed to `fused-eliza1`.
- `packages/cloud-api/v1/hf-proxy` + `shared/hf-proxy.ts`: HF downloads route through Eliza Cloud (no local HF keys).
- `estimatedMb` fix, desktop bench harness.
- `assemble_local_gemma_bundle.py` builds a verified-loadable E2B bundle; legacy-tier purge helper.
- `libelizainference` FFI seam + LiteRT/MLX/CoreML scaffolds (gated off; compile-verified).

Verification

- `@elizaos/plugin-local-inference`: 179 files, 1869 tests pass, 0 fail (vitest).
- `@elizaos/shared`: 65 files, 947 pass, 0 fail.
- cloud-api/cloud-shared/ui/benchmarks typecheck 0 errors.

Deliberately NOT in this PR (gated, tracked in #9033)

- `elizaos/eliza-1` Gemma bundle upload: gated on the non-standard MTP-drafter safetensors→GGUF conversion (base ships MTP-disabled meanwhile).

🤖 Generated with Claude Code