feat(omnivoice): token-by-token streaming vision describe (ABI v13) by lalalune · Pull Request #33 · elizaOS/llama.cpp

lalalune · 2026-06-24T07:38:13Z

Adds `eliza_inference_describe_image_stream_open` + `eliza_inference_vision_stream_supported` (ABI 12 → 13).

`_stream_open` runs the same mmproj prefill as `eliza_inference_describe_image` (`mtmd_tokenize` + `mtmd_helper_eval_chunks`), but instead of decoding into a buffer it returns an `EliLlmStream *` primed with the image+prompt KV. The caller then PULLS tokens with the existing `eliza_inference_llm_stream_next` loop and frees the handle via `eliza_inference_llm_stream_close`. This reuses the entire streaming-LLM machinery, so a vision description streams token-by-token through the same path as chat text.
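The open/next/close flow can be sketched from the caller's side as follows. This is a minimal illustration under stated assumptions: the PR names the three functions but does not show their parameters here, so the signatures (an image path and prompt for open, a caller-owned text buffer for next, and the "length / 0 at end / negative on error" return convention) are hypothetical. The stub bodies exist only so the sketch compiles and runs standalone; they replay fixed text instead of doing the mmproj prefill and decode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED signatures (hypothetical): the PR names these functions but their
 * parameters are not shown. next() is assumed to copy one text piece into buf
 * and return its length, 0 at end of stream, or a negative value on error. */
typedef struct EliLlmStream EliLlmStream;
EliLlmStream *eliza_inference_describe_image_stream_open(const char *image_path,
                                                         const char *prompt);
int eliza_inference_llm_stream_next(EliLlmStream *s, char *buf, size_t cap);
void eliza_inference_llm_stream_close(EliLlmStream *s);

/* STAND-IN bodies so the sketch runs on its own. The real open() performs the
 * mmproj prefill; this stub only replays a fixed list of text pieces. */
struct EliLlmStream { const char *const *pieces; size_t count, i; };
static const char *const k_pieces[] = {"A ", "cat ", "on ", "a ", "mat."};
static EliLlmStream g_stub = {k_pieces, sizeof k_pieces / sizeof k_pieces[0], 0};

EliLlmStream *eliza_inference_describe_image_stream_open(const char *image_path,
                                                         const char *prompt) {
    (void)image_path;
    (void)prompt;
    g_stub.i = 0; /* rewind the canned output */
    return &g_stub;
}

int eliza_inference_llm_stream_next(EliLlmStream *s, char *buf, size_t cap) {
    if (s->i >= s->count) return 0; /* end of stream */
    int n = snprintf(buf, cap, "%s", s->pieces[s->i++]);
    return (n < 0 || (size_t)n >= cap) ? -1 : n; /* -1: piece did not fit */
}

void eliza_inference_llm_stream_close(EliLlmStream *s) { (void)s; }

/* Caller side: open, PULL pieces one at a time, and always close.
 * Appends every piece to out (out_cap must be > 0); returns the number of
 * pieces pulled, or -1 if open/next fails or out is too small. */
static int describe_streaming(const char *image_path, const char *prompt,
                              char *out, size_t out_cap) {
    EliLlmStream *s = eliza_inference_describe_image_stream_open(image_path, prompt);
    if (s == NULL) return -1;
    char piece[256];
    size_t used = 0;
    int pieces = 0;
    out[0] = '\0';
    for (;;) {
        int n = eliza_inference_llm_stream_next(s, piece, sizeof piece);
        if (n == 0) break; /* finished */
        if (n < 0 || used + (size_t)n >= out_cap) { pieces = -1; break; }
        memcpy(out + used, piece, (size_t)n); /* a UI would render the piece here */
        used += (size_t)n;
        out[used] = '\0';
        pieces++;
    }
    eliza_inference_llm_stream_close(s); /* reached on every exit path */
    return pieces;
}
```

The design point is that `close` runs on every exit path, including errors, so the handle never leaks; a real UI would render each piece as soon as `next` returns it rather than only accumulating the text.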

Pull model (not a callback/push): the host event loop yields between `_next` steps, so chunks reach the UI live; a push callback would block the caller for the whole decode. The stream carries a greedy sampler + `ELIZA_VISION_MAX_TOKENS` cap and `mtp = null` (plain fixed-KV decode path).
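The decode side of the pull model can be illustrated with a single-step function that applies greedy (argmax) selection and a hard token cap, handing control back to the caller after every token. Everything here is a hypothetical stand-in: the toy logits function replaces a real decode, and `SKETCH_MAX_TOKENS`, `VOCAB` and `EOS_ID` are illustrative values, since the PR does not state the value of `ELIZA_VISION_MAX_TOKENS`.

```c
#include <assert.h>

/* HYPOTHETICAL values for illustration only; the PR does not state the value
 * of ELIZA_VISION_MAX_TOKENS, and a real vocab/EOS come from the model. */
enum { VOCAB = 4, EOS_ID = 0, SKETCH_MAX_TOKENS = 3 };

typedef struct {
    int generated; /* tokens emitted so far */
    int done;      /* set once EOS is sampled or the cap is reached */
    int last;      /* previous token, fed to the toy model */
} PullState;

/* Toy stand-in for a decode step: puts all the weight on (last + 1) % VOCAB. */
static void toy_logits(int last, float logits[VOCAB]) {
    for (int t = 0; t < VOCAB; t++) logits[t] = 0.0f;
    logits[(last + 1) % VOCAB] = 1.0f;
}

/* Greedy sampling: pick the index of the largest logit (first wins ties). */
static int greedy_argmax(const float *logits, int n) {
    int best = 0;
    for (int t = 1; t < n; t++)
        if (logits[t] > logits[best]) best = t;
    return best;
}

/* One PULL step: returns the next token id, or -1 once the stream is finished.
 * Control goes back to the caller after every token, so it can yield to its
 * event loop between calls. */
static int pull_next(PullState *st) {
    if (st->done || st->generated >= SKETCH_MAX_TOKENS) {
        st->done = 1; /* cap reached (or already finished) */
        return -1;
    }
    float logits[VOCAB];
    toy_logits(st->last, logits);
    int tok = greedy_argmax(logits, VOCAB);
    if (tok == EOS_ID) {
        st->done = 1; /* model chose to stop */
        return -1;
    }
    st->last = tok;
    st->generated++;
    return tok;
}
```

Starting from `last = 0`, the toy model yields 1, 2, 3 and then stops on the cap; starting from `last = 1`, it yields 2, 3 and then stops because it samples EOS. A push/callback design would run this same step inside its own loop and return only at the end, which is the blocking behaviour the pull model avoids.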

Additive + gated on the existing `-DELIZA_ENABLE_VISION`: a v12 caller is unaffected; a v12 library reports `vision_stream_supported() == 0`, so loaders fall back to the buffered `_describe_image`.
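On the loader side, the fallback can be sketched as a small selection function over resolved entry points. The `VisionEntryPoints` table, the `VisionPath` enum and the stub functions are hypothetical; a real loader would fill the table from the shared library (for example with `dlsym` or `GetProcAddress`). The sketch also assumes that a symbol missing from an older library should be handled the same way as `vision_stream_supported()` returning 0.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function-pointer type for entry points whose real signatures are not
 * shown in the PR. */
typedef void (*AnyFn)(void);

/* HYPOTHETICAL loader-side table. A real loader would resolve these from the
 * shared library; a symbol an older library does not export is NULL here. */
typedef struct {
    int (*vision_stream_supported)(void); /* eliza_inference_vision_stream_supported */
    AnyFn describe_image_stream_open;     /* eliza_inference_describe_image_stream_open */
    AnyFn describe_image;                 /* eliza_inference_describe_image (buffered) */
} VisionEntryPoints;

typedef enum { VISION_NONE, VISION_BUFFERED, VISION_STREAMING } VisionPath;

/* Stream only when the library exports the v13 entry points AND reports
 * support; otherwise fall back to the buffered describe call if present. */
static VisionPath choose_vision_path(const VisionEntryPoints *ep) {
    if (ep->vision_stream_supported != NULL && ep->describe_image_stream_open != NULL &&
        ep->vision_stream_supported() != 0)
        return VISION_STREAMING;
    if (ep->describe_image != NULL)
        return VISION_BUFFERED;
    return VISION_NONE;
}

/* Demo stand-ins for exercising choose_vision_path (hypothetical, not library code). */
static int stub_supported_yes(void) { return 1; }
static int stub_supported_no(void) { return 0; }
static void stub_entry(void) {}
```

For example, a table with all three entries and a `supported()` that returns 1 selects `VISION_STREAMING`; the same table with `supported()` returning 0, or a v12-style table whose two v13 entries are NULL, selects `VISION_BUFFERED`.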

Validated on Windows CPU with SmolVLM-500M (mtmd): streams 256 token chunks with real OCR. Consumed by elizaOS/eliza#9289 (JS cascade + handler wiring) — that PR degrades gracefully until this lands and the gitlink is bumped.

🤖 Generated with Claude Code

…Apple/Android-DL)

The kokoro subtree failed three distinct CI lanes, none of them backend-specific:

- PIC: kokoro_lib is a STATIC archive folded PRIVATE into the fused SHARED libelizainference.so, but it never set POSITION_INDEPENDENT_CODE, so ld rejected its objects on every BUILD_SHARED_LIBS=ON link ("recompile with -fPIC", R_X86_64_PC32 on x86-64 / R_AARCH64_ADR_PREL_PG_HI21 on arm64), breaking the openvino, sycl, vulkan and virtgpu builds. Set PIC ON, mirroring eliza_voice_classifiers in the sibling omnivoice subtree.
- Apple: kokoro-tts is a CLI harness, but CMake defaults Apple executables to MACOSX_BUNDLE, so `install(TARGETS kokoro-tts RUNTIME)` failed configure with "no BUNDLE DESTINATION for MACOSX_BUNDLE executable" on every ios/tvos/visionos/macos target. Force MACOSX_BUNDLE OFF.
- Android: kokoro.cpp called ggml_backend_cpu_init() directly, which is an undefined symbol under -DGGML_BACKEND_DL (the CPU backend is a loadable module). Switch to the registry API (ggml_backend_load_all() + ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_CPU, nullptr)), matching omnivoice; this works in both DL and statically-linked builds.

Compile-validated on MSVC (kokoro_lib builds); the Linux/Apple effects are CMake config changes plus a portable registry call, none of which needs a backend SDK to be correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add `eliza_inference_describe_image_stream_open` + `eliza_inference_vision_stream_supported` (ABI 12 -> 13). The open call runs the SAME mmproj prefill as `eliza_inference_describe_image` (mtmd_tokenize + mtmd_helper_eval_chunks) but, instead of decoding the whole description into a buffer, returns an `EliLlmStream *` primed with the image+prompt KV.

The caller then PULLS tokens with the existing `eliza_inference_llm_stream_next` loop and frees the handle with `eliza_inference_llm_stream_close`, reusing the entire streaming-LLM machinery, so a vision description streams token-by-token through the same path as chat text (a pull model, so the host event loop yields between steps; a callback/push model would block the caller for the whole decode). The returned stream carries a greedy sampler + ELIZA_VISION_MAX_TOKENS cap and no MTP engine (vision uses the plain fixed-KV decode path).

Additive + gated on the existing -DELIZA_ENABLE_VISION flag: a v12 caller is unaffected and a v12 library reports vision_stream_supported() == 0, so loaders fall back to the buffered _describe_image.

Validated on Windows CPU (SmolVLM-500M mtmd): streams 256 token chunks with real OCR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-24T07:38:22Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



lalalune and others added 2 commits June 22, 2026 13:13

lalalune mentioned this pull request Jun 24, 2026

EPIC — Computer Use × Vision: full-screen OCR, continuous low-token screen understanding, local vision models, full cross-platform control elizaOS/eliza#9105

Closed

github-actions Bot added the examples label Jun 24, 2026

lalalune merged commit 91fd05b into main Jun 24, 2026
19 of 39 checks passed

lalalune mentioned this pull request Jun 24, 2026

chore(local-inference): bump llama.cpp gitlink to ABI v13 (token-by-token vision) elizaOS/eliza#9507

Merged

lalalune merged 2 commits into main from feat/vision-stream-describe-abi-v13
