VLM: fix EmbeddedInput shape handling, error recovery, and parallel loading by stikves · Pull Request #79 · apple/coreai-models

stikves · 2026-07-02T20:12:24Z

Summary

EmbeddedInput.tokenCount: correctly handle 2D [seq_len, hidden_dim] tensors (was returning hidden_dim instead of seq_len). Now a stored property computed once at init.
EmbeddedInput.seqLen(of:): centralized helper for seq_len extraction — scatterMerge and other call sites use one canonical path instead of inline shape checks.
scatterMerge: replace precondition(float16) with a thrown error for bfloat16 model compatibility.
LLMRunnerMain: add comment explaining sequential PreparedModel.prepare (parallel caused runtime errors).
Removed premature --max-tiles flag (tiling model not implemented yet).
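
As a rough illustration of the first two items, here is a minimal sketch of a centralized seq_len helper. It assumes the shape is available as a plain `[Int]`; `ShapeError` and `EmbeddedInputSketch` are hypothetical names, not the PR's actual types. The original bug corresponds to reading `shape[1]` on a 2D tensor, which yields hidden_dim.

```swift
// Hypothetical stand-ins for the PR's types; names and signatures are assumptions.
enum ShapeError: Error, Equatable {
    case unsupportedRank(Int)
}

/// Returns seq_len for a 2D [seq_len, hidden_dim] or
/// 3D [batch, seq_len, hidden_dim] shape.
func seqLen(of shape: [Int]) throws -> Int {
    switch shape.count {
    case 2: return shape[0]   // [seq_len, hidden_dim]
    case 3: return shape[1]   // [batch, seq_len, hidden_dim]
    default: throw ShapeError.unsupportedRank(shape.count)
    }
}

struct EmbeddedInputSketch {
    let shape: [Int]
    let tokenCount: Int   // stored property, computed once at init

    init(shape: [Int]) throws {
        self.shape = shape
        self.tokenCount = try seqLen(of: shape)
    }
}
```

With one helper, every call site resolves seq_len the same way instead of repeating inline rank checks.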

Test plan

Verify tokenCount returns correct seq_len for both 2D and 3D embeddings
Confirm bfloat16 model gets a catchable error instead of a crash
VLM inference still works end-to-end
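
The second test-plan item could look roughly like the sketch below, which swaps a trapping `precondition` for `guard`/`throw` so the caller can recover. `ElementType`, `ScatterMergeError`, `checkedScatterMerge`, and `describeMerge` are hypothetical names used only for illustration; the real scatterMerge signature is not shown in this PR page.

```swift
// Hypothetical element-type tag; the real code would inspect the tensor's data type.
enum ElementType { case float16, bfloat16, float32 }

enum ScatterMergeError: Error, Equatable {
    case unsupportedElementType(ElementType)
}

func checkedScatterMerge(elementType: ElementType) throws {
    // A precondition(elementType == .float16) here would trap and end the process.
    // Throwing instead leaves the decision to the caller.
    guard elementType == .float16 else {
        throw ScatterMergeError.unsupportedElementType(elementType)
    }
    // ... the float16 merge itself would run here ...
}

/// Caller-side view: the failure is now an ordinary, catchable error.
func describeMerge(_ elementType: ElementType) -> String {
    do {
        try checkedScatterMerge(elementType: elementType)
        return "merged"
    } catch {
        return "recoverable: \(error)"
    }
}
```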

carinapeng

This looks good to me overall, but before approving I think a few things should be fixed. One is how we read maxTiles. The other is consistency: the shape handling seems correct for the 2D case, but scatterMerge still produces 3D [1, seq, hidden].

And let's do the testing on the Qwen3 VL as well since that is released :)

…shape handling

- Remove --max-tiles CLI option (tiling model not implemented yet)
- Move seq_len extraction into EmbeddedInput.seqLen(of:) so scatterMerge and other call sites use one canonical shape resolution path
- tokenCount is now a stored property computed once at init

VLM: fix tokenCount for 2D tensors, replace precondition with throw

7fdd585

- EmbeddedInput.tokenCount: handle 2D [seq_len, hidden_dim] vs 3D [batch, seq_len, hidden_dim]
- scatterMerge: replace precondition(float16) with guard/throw for bfloat16 compatibility

stikves force-pushed the sukru/vlm-fixes branch from e87f992 to 7fdd585 July 2, 2026 20:20

stikves self-assigned this Jul 2, 2026

stikves marked this pull request as ready for review July 2, 2026 20:22

stikves requested review from carinapeng, kevchengcodes and tjia1818 July 2, 2026 20:35

kevchengcodes reviewed Jul 2, 2026

Comment thread swift/Sources/Tools/llm-runner/LLMRunnerMain.swift

Comment thread swift/Sources/CoreAILanguageModels/InferenceEngines/EmbeddedInput.swift Outdated

stikves and others added 2 commits July 2, 2026 14:34

Merge branch 'main' into sukru/vlm-fixes

ce7622b

Fix dangling @option for maxTiles (missing var declaration)

5cacbca

The @option attribute for "Maximum tiles" had no associated var, causing a compile error with ArgumentParser.

carinapeng reviewed Jul 3, 2026

Comment thread swift/Sources/Tools/llm-runner/LLMRunnerMain.swift Outdated

carinapeng reviewed Jul 3, 2026

Comment thread swift/Sources/CoreAILanguageModels/InferenceEngines/CoreAISequentialVLMEngine.swift

carinapeng reviewed Jul 3, 2026

Comment thread swift/Sources/CoreAILanguageModels/InferenceEngines/EmbeddedInput.swift Outdated

carinapeng reviewed Jul 3, 2026

stikves added 3 commits July 2, 2026 18:31

Simplify EmbeddedInput: assume 3D [batch, seq_len, hidden_dim] layout

8fde6f1

All current VLM models produce 3D embeddings. Drop the 2D fallback and the seqLen helper -- tokenCount is just shape[1].

EmbeddedInput: validate exactly 3D shape at init, throw on mismatch

b5aa78d
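
A minimal sketch of what this last commit describes, assuming the shape is a plain `[Int]`. `EmbeddedInput3DSketch` and `EmbeddedInputError` are hypothetical names, not the PR's actual declarations.

```swift
// Hypothetical error type; the PR's actual error is not shown on this page.
enum EmbeddedInputError: Error, Equatable {
    case expected3D(actual: [Int])
}

struct EmbeddedInput3DSketch {
    let shape: [Int]
    let tokenCount: Int   // seq_len, taken from shape[1]

    init(shape: [Int]) throws {
        // Accept only [batch, seq_len, hidden_dim]; anything else is rejected up front.
        guard shape.count == 3 else {
            throw EmbeddedInputError.expected3D(actual: shape)
        }
        self.shape = shape
        self.tokenCount = shape[1]
    }
}
```

Validating the rank once at init means later code can read `shape[1]` without repeating the check, and a 2D tensor fails loudly instead of silently reporting hidden_dim as the token count.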

stikves wants to merge 6 commits into apple:main from stikves:sukru/vlm-fixes

3 participants