feat: implementation of multimodal runner #892
Open
NorbertKlockiewicz wants to merge 46 commits into main
3d22a30 to 0ac42b4
- Add UnifiedRunner that auto-detects PTE layout at load time (forward method → text-only, token_embedding+text_decoder → multimodal)
- Merge MultimodalLLM into LLM using UnifiedRunner
- VLMs now have full feature parity: multi-turn, countTextTokens, getMaxContextLength, setCountInterval, setTimeInterval
- Remove Runner, MultimodalRunner, MultimodalLLM classes
- Add sendMessageWithImage to LLMController and useLLM hook
- Remove useMultimodalLLM; callers use useLLM with isMultimodal: true
- Migrate multimodal_llm example app to useLLM

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
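The layout auto-detection described in this commit can be illustrated with a minimal sketch. It is not the PR's actual (C++) implementation: the function name `detectPteLayout`, the `PteLayout` type, the order in which the two layouts are checked, and the error thrown for an unrecognized layout are all assumptions made for illustration. Only the method names `forward`, `token_embedding`, and `text_decoder` come from the commit message.

```typescript
// Hypothetical sketch: classify a .pte program by the methods it exports.
// Only the three method names are taken from the commit message; everything
// else here (names, check order, error handling) is an assumption.
type PteLayout = "text-only" | "multimodal";

function detectPteLayout(methodNames: readonly string[]): PteLayout {
  const has = (name: string): boolean => methodNames.indexOf(name) !== -1;

  // Assumption: the multimodal layout is checked first.
  if (has("token_embedding") && has("text_decoder")) {
    return "multimodal";
  }
  if (has("forward")) {
    return "text-only";
  }
  // Assumption: an unrecognized layout is treated as a load-time error.
  throw new Error(
    "Unrecognized PTE layout; exported methods: " + methodNames.join(", ")
  );
}
```

Under these assumptions, `detectPteLayout(["forward"])` yields `"text-only"`, and a program exporting both `token_embedding` and `text_decoder` yields `"multimodal"`.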
…nd fix token generation bugs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…age cache Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…using generateMultimodal Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t before generateMultimodal
…emove tokenizerConfig guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ndMessage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…modal Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… C++ splits on placeholder Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e double reset, fix max_context_len fallback, require tokenizerConfigSource, pass tools in multimodal branch, capture callback by value
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… EOS IDs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g cache Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kenCount JSI Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… runner classes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ad image shape from model metadata Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mage_token from config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Description
Adds vision/multimodal support to useLLM: load a VLM by passing capabilities: ['vision'], then call sendMessage(text, { imagePath }) to send a message with an image. Under the hood, this introduces a pluggable encoder architecture (IEncoder / VisionEncoder), a dedicated MultimodalRunner, and a refactored BaseLLMRunner with cleaner ownership and shared state. It also exposes a getVisualTokenCount() JSI method for accurate token counting with images. The text-only path is unchanged.
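As a rough illustration of the call shape described above, the sketch below models only what the description states: `sendMessage(text, { imagePath })`. The `VisionLLM` and `SendMessageOptions` interfaces, the `describeImage` helper, and the `Promise<void>` return type are hypothetical stand-ins, not the library's actual typings; in the app, the object would come from useLLM loaded with capabilities: ['vision'].

```typescript
// Assumed shapes, inferred only from the PR description; not the real typings.
interface SendMessageOptions {
  imagePath?: string; // local path of the image to attach (assumption)
}

interface VisionLLM {
  // Assumption: sendMessage resolves once the message has been handled.
  sendMessage(text: string, options?: SendMessageOptions): Promise<void>;
}

// Hypothetical helper: send a text prompt together with an image.
async function describeImage(llm: VisionLLM, imagePath: string): Promise<void> {
  await llm.sendMessage("What is in this picture?", { imagePath });
}
```

A text-only message would presumably use the same method without the options argument, which matches the statement that the text-only path is unchanged.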
Documentation and Tests: to be written once the runner changes are accepted by reviewers.
Introduces a breaking change?
Type of change
Tested on
Testing instructions
Run the llm example app and select the multimodal llm screen. Select an image and prompt the model.
Screenshots
Related issues
Checklist
Additional notes