feat: implementation of multimodal runner by NorbertKlockiewicz · Pull Request #892 · software-mansion/react-native-executorch

NorbertKlockiewicz · 2026-03-02T09:12:48Z

Description

Adds vision/multimodal support to useLLM: load a VLM by passing capabilities: ['vision'], then call sendMessage(text, { imagePath }) to send a message with an image. Under the hood, this introduces a pluggable encoder architecture (IEncoder / VisionEncoder), a dedicated MultimodalRunner, and a refactored BaseLLMRunner with cleaner ownership and shared state. It also exposes a getVisualTokenCount() JSI method for accurate token counting with images. The text-only path is unchanged.
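A minimal sketch of how the call shape above could be typed so that the media argument is only accepted when the model was loaded with the vision capability. Everything here is a hypothetical stand-in (MediaArg, Message, SendMessage, makeSendMessage), not the library's actual exports. The sketch is also synchronous and only builds the user message, whereas the real hook runs inference.

```typescript
// Hypothetical stand-ins for illustration; not the library's actual exports.
type LLMCapability = 'vision';

interface MediaArg {
  imagePath: string;
}

interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
  mediaPath?: string;
}

// With 'vision' among the capabilities, sendMessage accepts an optional media
// object; without it, only text is accepted.
type SendMessage<C extends readonly LLMCapability[]> =
  'vision' extends C[number]
    ? (text: string, media?: MediaArg) => Message
    : (text: string) => Message;

function makeSendMessage<C extends readonly LLMCapability[]>(
  capabilities: C
): SendMessage<C> {
  const hasVision = capabilities.some((c) => c === 'vision');
  const send = (text: string, media?: MediaArg): Message => {
    // Runtime guard that mirrors the compile-time restriction.
    if (media !== undefined && !hasVision) {
      throw new Error('Model was loaded without the vision capability');
    }
    const message: Message = { role: 'user', content: text };
    if (media !== undefined) message.mediaPath = media.imagePath;
    return message;
  };
  // The conditional type is unresolved inside the generic function, so cast.
  return send as unknown as SendMessage<C>;
}

const sendToVlm = makeSendMessage(['vision'] as const);
const withImage = sendToVlm('What is in this picture?', {
  imagePath: 'file:///tmp/cat.jpg',
});

const sendToTextModel = makeSendMessage([] as const);
const textOnly = sendToTextModel('Hello');
// sendToTextModel('Hello', { imagePath: '...' }) is rejected by the type checker.
```

The conditional type keeps text-only callers from passing an image by accident, while the runtime check covers callers that bypass the types.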

Documentation and tests: to be written once the runner changes are accepted by reviewers.

Introduces a breaking change?

Yes
No

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

Run the llm example app and select the multimodal LLM screen. Select an image and prompt the model.

Screenshots

Related issues

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

- Add UnifiedRunner that auto-detects PTE layout at load time (forward method → text-only, token_embedding+text_decoder → multimodal)
- Merge MultimodalLLM into LLM using UnifiedRunner
- VLMs now have full feature parity: multi-turn, countTextTokens, getMaxContextLength, setCountInterval, setTimeInterval
- Remove Runner, MultimodalRunner, MultimodalLLM classes
- Add sendMessageWithImage to LLMController and useLLM hook
- Remove useMultimodalLLM — callers use useLLM with isMultimodal: true
- Migrate multimodal_llm example app to useLLM

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
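The layout auto-detection in the first note can be sketched as a lookup over the method names a loaded PTE exposes. This is an illustration only: detectRunnerKind and RunnerKind are hypothetical names, the real check lives in native code, and the order of the two checks is an assumption. Later commits in this PR delete UnifiedRunner in favor of explicit capabilities.

```typescript
// Hypothetical sketch; the real detection is implemented natively.
type RunnerKind = 'text' | 'multimodal';

function detectRunnerKind(methodNames: readonly string[]): RunnerKind {
  const methods = new Set(methodNames);
  // Assumption: the multimodal layout is checked first.
  if (methods.has('token_embedding') && methods.has('text_decoder')) {
    return 'multimodal';
  }
  if (methods.has('forward')) {
    return 'text';
  }
  throw new Error(
    `Unrecognized PTE layout; methods found: ${[...methods].join(', ') || '(none)'}`
  );
}

const textKind = detectRunnerKind(['forward']);
const vlmKind = detectRunnerKind(['token_embedding', 'text_decoder']);
```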

feat: initial implementation of multimodal runner with lfm vlm

0ac42b4

NorbertKlockiewicz force-pushed the @nk/lfm-vlm branch from 3d22a30 to 0ac42b4 Compare March 2, 2026 09:15

NorbertKlockiewicz and others added 28 commits March 2, 2026 11:13

feat: add conversational VLM demo with multimodal/text-only support a…

bf50ae2

…nd fix token generation bugs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: default UnifiedRunner temperature to 0.8 and topp to 0.9

1695f7e

feat: add NativeMessage struct and JSI conversion for message history

b660b0f

feat: declare generateMultimodal on LLM and register JSI binding

4331bde

fix: remove redundant unordered_map and vector includes from LLM.h

d6530e4

feat: implement generateMultimodal with per-turn chat template and im…

d261a45

…age cache Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add mediaPath to Message, remove sendMessageWithImage from LLMType

d91a64a

feat: replace sendMessageWithImage with sendMessage(msg, mediaPath?) …

49f5af6

…using generateMultimodal Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: use updatedHistory for multimodal routing, remove redundant rese…

d07ce65

…t before generateMultimodal

fix: skip system messages in generateMultimodal, clear imageUri after…

b29f74c

… send

feat: show image thumbnail in user message bubble when mediaPath is set

e1d0f08

fix: use resizeMode contain so full image is always visible in messag…

11cab57

…e bubble

refactor: derive isMultimodal from load param, unify load branches, r…

9ddd5d7

…emove tokenizerConfig guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: remove isMultimodal flag, inline generateMultimodal into se…

7d2ce9b

…ndMessage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: make tokenizerConfigSource required throughout load pipeline

87fa1f0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: prepend system prompt to multimodal history before generateMulti…

b398952

…modal Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: unify generate — Jinja renders prompt+<image> tokens in JS,…

a0b80e3

… C++ splits on placeholder Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: collect imagePaths from messageHistoryWithPrompt, not full history

13f631e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: typing

76f9c7c

feat: correctly calculate image tokens

ab8c088

fix: add missing import

c211ba9

fix: fall back to max_seq_len when model doesn't export max_context_len

0e29349

fix: address code review — error on image/placeholder mismatch, remov…

520233f

…e double reset, fix max_context_len fallback, require tokenizerConfigSource, pass tools in multimodal branch, capture callback by value

feat: dynamic sendMessage type based on flag

dfd1a81

fix: model stopping generation in the middle of its answer

3d67b66

feat: add LLMCapability type and parameterize LLMTypeMultimodal

2b26c5d

feat: update sendMessage to accept typed media object

8d1b4eb
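Commits a0b80e3 and 520233f above describe rendering the prompt with <image> placeholders in JS, splitting on the placeholder natively, and raising an error when the number of images and placeholders differ. A rough sketch of that splitting step, written in TypeScript for illustration although the PR does it in C++; splitOnImagePlaceholder and PromptSegment are hypothetical names.

```typescript
// Hypothetical sketch of the placeholder split; the PR implements this in C++.
type PromptSegment =
  | { kind: 'text'; text: string }
  | { kind: 'image'; path: string };

function splitOnImagePlaceholder(
  prompt: string,
  imagePaths: readonly string[],
  placeholder: string = '<image>'
): PromptSegment[] {
  const parts = prompt.split(placeholder);
  const placeholderCount = parts.length - 1;
  if (placeholderCount !== imagePaths.length) {
    throw new Error(
      `Prompt has ${placeholderCount} placeholder(s) but ${imagePaths.length} image path(s) were supplied`
    );
  }
  const segments: PromptSegment[] = [];
  parts.forEach((text, i) => {
    // Skip empty text runs, e.g. when the prompt starts with a placeholder.
    if (text.length > 0) segments.push({ kind: 'text', text });
    // Every part except the last is followed by one image.
    const path = imagePaths[i];
    if (path !== undefined) segments.push({ kind: 'image', path });
  });
  return segments;
}

const promptSegments = splitOnImagePlaceholder('Describe <image> briefly', [
  'file:///tmp/cat.jpg',
]);
```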

NorbertKlockiewicz and others added 16 commits March 3, 2026 11:46

feat: add LFM2_VL_1_6B and LFM2_VL_1_6B_QUANTIZED model constants

f3edf5d

feat: add IEncoder interface and VisionEncoder

6eba3f7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: address vision_encoder quality review issues

0819c20

feat: add BaseLLMRunner with shared state and load()

1de96bb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add TextRunner

e08b391

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add MultimodalRunner with plug-in encoder map

6703559

feat: wire capabilities through LLM.cpp, delete UnifiedRunner

a1edb3c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: forward capabilities from LLMController to native

7076a9f

feat: add logging, fix metadata application, fix module ownership and…

96525bc

… EOS IDs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: replace Image class with ImagePath + VisionEncoder embeddin…

b3ce27e

…g cache Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: add TextRunnerTests and VLMTests suites, register in CMake and …

ce6856d

…run_tests.sh

refactor: unify multimodal/text paths in sendMessage, add getVisualTo…

4184bb3

…kenCount JSI Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: replace example namespace with rnexecutorch::llm::runner in…

c88d97c

… runner classes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: collapse BaseLLMRunner constructor, deduplicate eos_ids, re…

c7357d3

…ad image shape from model metadata Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: comments etc.

69d454b

fix: cap VLM generation tokens, propagate encoder load errors, pass i…

6a3857b

…mage_token from config Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
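Commits 6eba3f7, 6703559 and b3ce27e above mention an IEncoder interface, a plug-in encoder map on MultimodalRunner, and an embedding cache in VisionEncoder. The sketch below shows one way those three ideas could fit together, in TypeScript rather than the PR's C++. All names and signatures here (Encoder, CachingEncoder, EncoderRegistry, encode) are assumptions, and caching by file path is a guess at the cache key.

```typescript
// Hypothetical sketch; names and signatures are assumptions, not the PR's C++ API.
type Modality = 'vision';

interface Encoder {
  encode(inputPath: string): number[];
}

// Wraps an encoder and reuses the embedding when the same path is seen again.
class CachingEncoder implements Encoder {
  private readonly inner: Encoder;
  private readonly cache = new Map<string, number[]>();

  constructor(inner: Encoder) {
    this.inner = inner;
  }

  encode(inputPath: string): number[] {
    const hit = this.cache.get(inputPath);
    if (hit !== undefined) return hit;
    const embedding = this.inner.encode(inputPath);
    this.cache.set(inputPath, embedding);
    return embedding;
  }
}

// Maps each modality to the encoder plugged in for it.
class EncoderRegistry {
  private readonly encoders = new Map<Modality, Encoder>();

  register(modality: Modality, encoder: Encoder): void {
    this.encoders.set(modality, encoder);
  }

  encode(modality: Modality, inputPath: string): number[] {
    const encoder = this.encoders.get(modality);
    if (encoder === undefined) {
      throw new Error(`No encoder registered for modality "${modality}"`);
    }
    return encoder.encode(inputPath);
  }
}

// Stub encoder that counts calls so the cache effect is observable.
let stubEncodeCalls = 0;
const stubEncoder: Encoder = {
  encode: (inputPath) => {
    stubEncodeCalls += 1;
    return [inputPath.length];
  },
};

const encoderRegistry = new EncoderRegistry();
encoderRegistry.register('vision', new CachingEncoder(stubEncoder));
```

Keeping the cache in a wrapper means the runner only sees the Encoder interface, so a multi-turn chat that re-renders the same image path does not re-run the encoder.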

NorbertKlockiewicz changed the title ~~feat: initial implementation of multimodal runner with lfm vlm~~ feat implementation of multimodal runner Mar 5, 2026

revert: remove TextRunnerTests and VLMTests suites

551a306

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

NorbertKlockiewicz marked this pull request as ready for review March 5, 2026 16:19

NorbertKlockiewicz requested review from benITo47, chmjkb and msluszniak March 5, 2026 16:20

msluszniak changed the title ~~feat implementation of multimodal runner~~ feat: implementation of multimodal runner Mar 5, 2026

msluszniak assigned NorbertKlockiewicz Mar 5, 2026

msluszniak added the feature PRs that implement a new feature label Mar 5, 2026

NorbertKlockiewicz wants to merge 46 commits into main from @nk/lfm-vlm
