feat(elizainference): per-op backend seam + LiteRT C-API embed backend by lalalune · Pull Request #35 · elizaOS/llama.cpp

lalalune · 2026-06-24T23:06:13Z

Stacks on the M3 multi-backend seam (#27 / eliza/gemma-kv-swa-checkpoint-fix) and generalizes it from the streaming-LLM op to every on-device model op, plus lands the first real accelerator backend (LiteRT text embedding).

What

Generic per-op registry (backend-registry.h): one eliza_backend::Registry<F> holds the resolution policy — ELIZA_<MOD>_BACKEND/ELIZA_BACKEND hard-select → highest preference_rank() among available()+can_serve() → nullptr (the in-tree ggml path). Each modality adds a tiny factory interface + selector + a single FFI chokepoint that reuses it.
Per-op seams for embed, vision, asr, tts, eot — each routes to a backend that ships <bundle>/<modality>/* when present, else falls through to ggml. Inert by default: with no -DELIZA_ENABLE_* gate, nothing registers, select() returns nullptr, and every op is byte-identical to before.
LiteRT embed backend (backends/litert-embed-backend.cpp, gated ELIZA_ENABLE_LITERT) on the LiteRT Next C API (the C++ cc/ wrappers aren't standalone): env/model/compiled-model lifecycle + an NPU→GPU→CPU accelerator ladder (preference_rank 100/20/0) reading the in-graph-pooled [1,384] output. Serves <bundle>/embedding/*.tflite; auto-promotes to NPU on a Pixel-10/G5 or Qualcomm/MediaTek device, GPU-delegate (Mali) on a Tensor-G4. The WordPiece tokenizer + tensor binding are the one model-specific step (a converted all-MiniLM-L6-v2 .tflite + I/O signature exist; the binding is marked TODO(MANIFEST)).
Gate split: ELIZA_ENABLE_LITERT = the LiteRT C-API per-op backends (embed); new ELIZA_ENABLE_LITERT_LM = the streaming-LLM backend on the heavier LiteRT-LM Engine SDK (off until that SDK is built). This unblocks the C-API build, which otherwise tried to compile the LLM backend's litert::lm deps.
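The resolution policy described above (env hard-select, then highest preference_rank() among available()+can_serve(), then nullptr) can be sketched roughly as follows. This is a minimal illustration, not the contents of backend-registry.h: the EmbedBackendFactory interface, the add() method, the select() signature, FakeBackend, and the concrete variable name ELIZA_EMBED_BACKEND (one guessed expansion of ELIZA_<MOD>_BACKEND) are all assumptions. Whether a hard-select bypasses the availability checks, and what happens when the requested name is unknown, are also guesses.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical factory interface; the real per-modality interfaces may differ.
struct EmbedBackendFactory {
    virtual ~EmbedBackendFactory() = default;
    virtual const char* name() const = 0;
    virtual bool available() const = 0;                               // runtime/hardware usable?
    virtual bool can_serve(const std::string& bundle_dir) const = 0;  // bundle ships our files?
    virtual int preference_rank() const = 0;                          // higher wins
};

template <typename F>
class Registry {
public:
    void add(const F* factory) {
        assert(factory != nullptr);
        factories_.push_back(factory);
    }

    // Returns nullptr when nothing qualifies; the caller then takes the ggml path.
    const F* select(const char* modality_env, const std::string& bundle_dir) const {
        // 1. Hard-select: the per-modality variable first, then the global one.
        const char* forced = std::getenv(modality_env);
        if (forced == nullptr || *forced == '\0') forced = std::getenv("ELIZA_BACKEND");
        if (forced != nullptr && *forced != '\0') {
            for (const F* f : factories_) {
                if (std::string(f->name()) == forced) return f;
            }
            return nullptr;  // assumption: an unknown name falls through to ggml
        }
        // 2. Otherwise the highest rank among backends that are usable right now.
        const F* best = nullptr;
        for (const F* f : factories_) {
            if (!f->available() || !f->can_serve(bundle_dir)) continue;
            if (best == nullptr || f->preference_rank() > best->preference_rank()) best = f;
        }
        return best;  // 3. nullptr if no backend qualified
    }

private:
    std::vector<const F*> factories_;
};

// Illustrative stand-in for a backend; it serves any non-empty bundle path.
struct FakeBackend final : EmbedBackendFactory {
    FakeBackend(const char* n, bool avail, int rank) : name_(n), avail_(avail), rank_(rank) {}
    const char* name() const override { return name_; }
    bool available() const override { return avail_; }
    bool can_serve(const std::string& bundle_dir) const override { return !bundle_dir.empty(); }
    int preference_rank() const override { return rank_; }
    const char* name_;
    bool avail_;
    int rank_;
};

}  // namespace sketch
```

Registering three FakeBackend entries ranked 100/20/0 mirrors the NPU→GPU→CPU ladder: if the rank-100 entry reports available() == false, select() returns the rank-20 entry, and an empty registry always yields nullptr, which is the inert-by-default case.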

Verification

11/11 TUs compile (NDK 29 aarch64): the 5 inert per-op selectors with no gate, all 5 backend headers self-contained, and the gated litert-embed-backend.cpp against the staged LiteRT C SDK (load-bearing — fails without -I .../litert/include).
Adversarial review: inert-by-default + correct chokepoint placement/args confirmed across all 5 modalities; no correctness bugs.
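The inert-by-default property checked here can be pictured with a small sketch of a single chokepoint: when the selector returns nullptr, the call goes to the existing path unchanged. Every name below (EmbedBackend, ggml_embed, select_embed_backend, embed_chokepoint) is hypothetical and stands in for the real FFI entry points, whose signatures are not shown in this PR description.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Hypothetical runtime interface for an accelerator-backed embedder.
struct EmbedBackend {
    virtual ~EmbedBackend() = default;
    virtual std::vector<float> embed(const std::string& text) = 0;
};

// Stand-in for the existing in-tree ggml embedding path (dummy output).
inline std::vector<float> ggml_embed(const std::string& text) {
    return std::vector<float>(4, static_cast<float>(text.size()));
}

// Hypothetical selector: with nothing registered it always returns nullptr.
inline EmbedBackend* select_embed_backend(const std::vector<EmbedBackend*>& registered) {
    return registered.empty() ? nullptr : registered.front();
}

// The single chokepoint: use a backend if one was selected, else fall through.
inline std::vector<float> embed_chokepoint(const std::vector<EmbedBackend*>& registered,
                                           const std::string& text) {
    if (EmbedBackend* backend = select_embed_backend(registered)) {
        return backend->embed(text);
    }
    return ggml_embed(text);  // identical to calling the old path directly
}

// Illustrative backend used only to show the routed case.
struct ConstantBackend final : EmbedBackend {
    std::vector<float> embed(const std::string&) override {
        return std::vector<float>(4, -1.0f);
    }
};

}  // namespace sketch
```

With an empty registration list, embed_chokepoint returns exactly what ggml_embed returns, which is the behavior the review describes for builds without any -DELIZA_ENABLE_* gate.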

SESSION-OPS-TODO.md documents extending the same pattern to the session ops (vad/wakeword/speaker/diariz).

🤖 Generated with Claude Code

Generalize the M3 streaming-LLM seam to ALL on-device model ops.

A shared eliza_backend::Registry<F> (backend-registry.h) holds the resolution logic (ELIZA_<MOD>_BACKEND/ELIZA_BACKEND hard-select -> highest preference_rank among available()+can_serve() -> nullptr=ggml); each modality adds a tiny factory interface + selector + one FFI chokepoint. Wired for embed/vision/asr/tts/eot: each routes to a backend that ships <bundle>/<modality>/* when present, else falls through to the in-tree ggml path. Inert-by-default (no backend registered => select() returns nullptr => every op byte-identical to before).

First real backend: LiteRT text embedding (backends/litert-embed-backend.cpp, gated ELIZA_ENABLE_LITERT) on the LiteRT Next *C* API (the C++ cc/ wrappers are not standalone): env/model/compiled-model lifecycle + NPU->GPU->CPU accelerator ladder (rank 100/20/0) + reads the in-graph-pooled [1,384] output; the WordPiece tokenizer + tensor binding are the one model-specific step (MANIFEST-gated). Serves <bundle>/embedding/*.tflite; auto-promotes to NPU on Pixel-10/G5 or Qualcomm/MediaTek silicon, GPU-delegate (Mali) on a Tensor-G4.

Split the LiteRT gates: ELIZA_ENABLE_LITERT = the LiteRT C-API per-op backends (embed); new ELIZA_ENABLE_LITERT_LM = the streaming-LLM backend on the heavier LiteRT-LM Engine SDK (off until that SDK is built).

SESSION-OPS-TODO.md documents the vad/wakeword/speaker/diariz extension.

Verified: 11/11 TUs compile (inert selectors + self-contained headers + the gated embed backend against the LiteRT SDK); adversarial review confirms inert-by-default + correct chokepoints across all 5 modalities.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-24T23:06:22Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c53ac6a0-e57e-4a08-bee0-52d6749f0078

lalalune merged commit 9be54e3 into eliza/gemma-kv-swa-checkpoint-fix Jun 24, 2026
6 of 36 checks passed

github-actions Bot added the examples label Jun 24, 2026

lalalune merged 1 commit into eliza/gemma-kv-swa-checkpoint-fix from feat/per-op-backend-seam-litert-embed
