fix(kokoro): accept published GGUF tensor names + emit F16 weights (#9588) #36
Merged
The fused-lib loader required unprefixed dev tensor names (bert.embd.tok.weight,
pred.F0_proj.weight, dec.gen.conv_post.weight) while the published elizaOS Kokoro
bundles use the kokoro.* namespace and mainline llama.cpp names
(kokoro.bert.layer.attn_q.weight). The loader, the shipped GGUF, and the in-tree
converter all disagreed, so kokoro_init_from_file failed its weight sanity check
for every available GGUF and the engine silently fell through to OmniVoice/stub.
Two root causes, two fixes:
1. Tensor-name mismatch. Centralize the accepted name variants in
kokoro-tensor-names.h (published + mainline + legacy dev) and look them up via
require_tensor_any(). Required tensors are now a HARD load error instead of the
old "non-fatal during J2 — treat absent tensors as zero" path, which produced
shape-correct but acoustically degraded (noise) output and masked this bug.
2. All-F32 GGUFs load but synthesize noise. The converter now emits weight
matrices / conv kernels (ndim >= 2) as F16 and keeps biases/norms F32, matching
the dtype layout the fused forward pass expects. The stub emitter also writes
kokoro.gen.conv_post.{weight,bias} so it passes the new required-tensor check.
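The dtype rule in fix 2 can be sketched as a single predicate on tensor rank. This is only an illustration in C++: the enum and function names below are hypothetical and are not taken from the converter or from ggml.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two output types involved; this is NOT the
// real ggml type enum, just enough to express the rule.
enum class OutType { F16, F32 };

// Rule from the commit message: tensors with ndim >= 2 (weight matrices,
// conv kernels) are emitted as F16; 1-D tensors (biases, norms) stay F32.
constexpr OutType pick_output_type(std::size_t ndim) {
    return ndim >= 2 ? OutType::F16 : OutType::F32;
}
```

Keying the decision on rank rather than on tensor names keeps the rule independent of whichever naming schema a given GGUF uses.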
Adds test_kokoro_tensor_names.cpp (LLAMA_BUILD_TESTS-gated) asserting the alias
picker resolves published, mainline, and legacy schemas and returns null when a
tensor is genuinely absent. Closes the loader half of #9588; regenerating +
republishing the bundle GGUF with this converter is the remaining ops step.
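A minimal sketch of the alias lookup and the hard-error behavior described above. It is an illustration only: TensorMap, find_tensor_any, and require_tensor_any_sketch are hypothetical names, and the real require_tensor_any() signature in the fused lib may differ. The published alias kokoro.bert.token_embd.weight is an assumed expansion of the kokoro.bert.token_embd.* pattern.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded GGUF: tensor name -> opaque payload.
using TensorMap = std::map<std::string, int>;

// Returns a pointer to the first alias present in the file, or nullptr when
// none of the accepted names exists (the "genuinely absent" case).
const int * find_tensor_any(const TensorMap & tensors,
                            const std::vector<std::string> & aliases) {
    for (const auto & name : aliases) {
        auto it = tensors.find(name);
        if (it != tensors.end()) {
            return &it->second;
        }
    }
    return nullptr;
}

// Hard-error wrapper: a missing required tensor aborts the load instead of
// being treated as zeros.
const int & require_tensor_any_sketch(const TensorMap & tensors,
                                      const std::vector<std::string> & aliases) {
    const int * t = find_tensor_any(tensors, aliases);
    if (t == nullptr) {
        const std::string first = aliases.empty() ? "<no aliases>" : aliases.front();
        throw std::runtime_error("missing required tensor: " + first);
    }
    return *t;
}
```

Listing the aliases in preference order (published first, legacy last) means a file that happens to carry both schemas resolves to the published name.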
Fixes the loader/converter side of elizaOS/eliza#9588. The fused-lib loader required unprefixed dev tensor names while the published elizaOS Kokoro bundles use the kokoro.* / mainline llama.cpp namespace, so kokoro_init_from_file failed its weight sanity check for every available GGUF and the engine silently fell through to OmniVoice/stub.
Changes
- kokoro-tensor-names.h centralizes the accepted tensor-name variants (published kokoro.bert.token_embd.*, mainline kokoro.bert.layer.*, and legacy unprefixed bert.embd.tok.*); require_tensor_any() resolves them at load. Missing required tensors are now a hard load error instead of the old "non-fatal during J2 — treat absent tensors as zero" path that produced noise and masked the failure.
- The converter now emits weight matrices / conv kernels (ndim >= 2) as F16, while biases/norms stay F32. This is the dtype layout the fused forward pass expects; all-F32 GGUFs load but synthesize noise. The stub emitter also writes kokoro.gen.conv_post.{weight,bias} so it passes the new required-tensor check.
- test_kokoro_tensor_names.cpp (LLAMA_BUILD_TESTS-gated) covers the published, mainline, legacy, and missing-tensor cases.
Verification
- test_kokoro_tensor_names.cpp compiles and passes (clang++ -std=c++17).
- kokoro.cpp recompiles cleanly against the new header (make kokoro_lib, Apple Metal Release).
The matching eliza submodule pin bump is elizaOS/eliza#9684. That PR pins to the same delta on the eliza-tracked lineage; this PR lands it on main so future pin bumps don't regress it.
🤖 Generated with Claude Code