feat(0406): Add Gemma4 Unified vision-language support by vbaddi · Pull Request #1036 · quic/efficient-transformers

vbaddi · 2026-06-04T08:03:36Z

Summary

Add QEff Gemma4 Unified wrappers for text, causal LM, and conditional-generation paths.
Wire Gemma4 Unified into model replacement, supported architecture, KV-cache, and auto-model flows.
Add dual-QPC (vision + language) export support and an image-text example for Gemma4 Unified.
Preserve prefill chunking behavior by keeping the compiled prefill seq_len independent of the image prompt length.
Use vision_config.num_soft_tokens for Gemma4 Unified vision token sizing.
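Why a fixed compiled prefill seq_len preserves chunking can be shown with a minimal sketch. It assumes the runtime pads the prompt up to a multiple of the compiled prefill length and runs it one chunk at a time. The helper name plan_prefill_chunks is hypothetical and not part of the QEfficient API. The 280 in the example mirrors num_soft_tokens, and the 20 text tokens are arbitrary.

```python
import math
from typing import List, Tuple


def plan_prefill_chunks(prompt_len: int, prefill_seq_len: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Hypothetical helper: split a prompt into fixed-size prefill chunks.

    The compiled prefill length stays fixed (e.g. 128). A longer prompt, such as
    one carrying 280 soft vision tokens, needs more chunks, not a graph
    recompiled to the prompt length.
    """
    if prefill_seq_len <= 0:
        raise ValueError("prefill_seq_len must be positive")
    num_chunks = max(1, math.ceil(prompt_len / prefill_seq_len))
    padded_len = num_chunks * prefill_seq_len
    # Each chunk covers positions [start, end) of the padded prompt.
    chunks = [(i * prefill_seq_len, (i + 1) * prefill_seq_len) for i in range(num_chunks)]
    return padded_len, chunks


# 280 soft vision tokens + 20 text tokens, compiled prefill length 128.
padded, chunks = plan_prefill_chunks(280 + 20, 128)
print(padded, len(chunks))  # 384 3
```

If the compiled prefill length were tied to the image prompt length, every image-bearing prompt would need its own graph shape. A fixed chunk size keeps one compiled shape and varies only the number of chunk iterations.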

Vision Model Note

Gemma4 Unified replaces the older heavy vision-tower design with a lightweight, encoder-free image embedding path. Image patches are embedded through embed_vision and then inserted into the language stream as soft vision tokens. The full model config uses num_soft_tokens=280, so export and runtime now size the retained vision embeddings from that value instead of falling back to older defaults. This matches the model behavior described in the Gemma4 visual guide (newsletter.maartengrootendorst.com).
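The "insert soft vision tokens into the language stream" step can be sketched as a scatter of vision embeddings into image placeholder positions. This is a minimal sketch under several assumptions: vision_embeds stands in for the output of the embed_vision step, the prompt contains exactly num_soft_tokens placeholder ids per image, and embeddings are plain Python lists rather than tensors. IMAGE_TOKEN_ID and merge_soft_vision_tokens are hypothetical names. The real wrapper works on tensors and reads its ids from the model config.

```python
from typing import List, Sequence

IMAGE_TOKEN_ID = -1    # hypothetical placeholder id marking an image slot
NUM_SOFT_TOKENS = 280  # vision_config.num_soft_tokens in the full model config


def merge_soft_vision_tokens(
    input_ids: Sequence[int],
    text_embeds: List[List[float]],
    vision_embeds: List[List[float]],
    num_soft_tokens: int = NUM_SOFT_TOKENS,
) -> List[List[float]]:
    """Replace image placeholder positions with vision embeddings, in order."""
    image_positions = [i for i, tok in enumerate(input_ids) if tok == IMAGE_TOKEN_ID]
    # Size the retained vision embeddings from num_soft_tokens, not a legacy default.
    if len(vision_embeds) != num_soft_tokens:
        raise ValueError(f"expected {num_soft_tokens} vision embeddings, got {len(vision_embeds)}")
    if len(image_positions) != num_soft_tokens:
        raise ValueError(f"expected {num_soft_tokens} image placeholders, got {len(image_positions)}")
    merged = [list(row) for row in text_embeds]  # copy so the caller's data is untouched
    for pos, vec in zip(image_positions, vision_embeds):
        merged[pos] = list(vec)
    return merged


# Tiny example with num_soft_tokens=2 and 1-dimensional embeddings.
ids = [5, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 7]
text = [[0.0], [0.0], [0.0], [1.0]]
vis = [[9.0], [8.0]]
out = merge_soft_vision_tokens(ids, text, vis, num_soft_tokens=2)
print(out)  # [[0.0], [9.0], [8.0], [1.0]]
```

Checking both counts against num_soft_tokens catches the failure mode the note describes, where a stale default would silently drop or misplace vision embeddings.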

Validation

Verified HF original PyTorch vs QEff modified PyTorch parity.
Verified QEff PyTorch vs ONNX Runtime parity for the language graph.
Ran a reduced cached-model E2E smoke test with QEFF_CTX_LEN=2048, QEFF_PREFILL_SEQ_LEN=128, and QEFF_GENERATION_LEN=64.
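A back-of-the-envelope check shows what the reduced smoke settings leave room for. It makes a simplifying assumption: the chunk-padded prompt and the generated tokens must together fit in QEFF_CTX_LEN KV-cache positions. The environment variable names come from the run above. The helpers read_int_env and fits_context are hypothetical and are not taken from the example script.

```python
import math
import os


def read_int_env(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    return int(os.environ.get(name, default))


def fits_context(prompt_len: int, ctx_len: int, prefill_seq_len: int, generation_len: int) -> bool:
    """True if the chunk-padded prompt plus generated tokens fit in ctx_len KV slots."""
    padded_prompt = math.ceil(prompt_len / prefill_seq_len) * prefill_seq_len
    return padded_prompt + generation_len <= ctx_len


ctx_len = read_int_env("QEFF_CTX_LEN", 2048)
prefill_seq_len = read_int_env("QEFF_PREFILL_SEQ_LEN", 128)
generation_len = read_int_env("QEFF_GENERATION_LEN", 64)

# 280 soft vision tokens + 40 text tokens pad to 384; 384 + 64 = 448 <= 2048.
print(fits_context(320, ctx_len, prefill_seq_len, generation_len))  # True with the smoke settings
```

Under this assumption, a single 280-token image leaves most of the 2048-position context free. The padding to 128-token chunks only matters near the upper bound, where a 1984-token prompt would pad to 2048 and leave no room for generation.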

cc: @anujgupt-github @quic-hemagnih @tchawada

Add QEff wrappers and transform wiring for Gemma4 Unified text and conditional-generation models. This adds encoder-free vision embedding support, Gemma4 Unified cache/mask handling for mixed sliding/full attention, dual-QPC export helpers, prefill-chunking length fixes, and dummy parity/ONNX quickcheck coverage.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
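The commit message mentions cache/mask handling for mixed sliding/full attention. The difference between the two layer kinds can be sketched with boolean masks. This is an illustrative sketch, not the PR's implementation. The layer-type labels "sliding_attention" and "full_attention", the layer pattern, and the choice to cap a sliding layer's KV cache at the window size are all assumptions made here. mask_for_layer and kv_cache_len are hypothetical names.

```python
from typing import List

Mask = List[List[bool]]  # mask[q][k] is True when query q may attend to key k


def causal_mask(seq_len: int) -> Mask:
    """Full (global) causal attention: each position sees itself and all earlier positions."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


def sliding_window_mask(seq_len: int, window: int) -> Mask:
    """Local causal attention: each position sees itself and the previous window - 1 positions."""
    return [[q - window < k <= q for k in range(seq_len)] for q in range(seq_len)]


def mask_for_layer(layer_type: str, seq_len: int, window: int) -> Mask:
    """Pick the mask for one decoder layer based on its (assumed) type label."""
    if layer_type == "sliding_attention":
        return sliding_window_mask(seq_len, window)
    if layer_type == "full_attention":
        return causal_mask(seq_len)
    raise ValueError(f"unknown layer type: {layer_type}")


def kv_cache_len(layer_type: str, ctx_len: int, window: int) -> int:
    """Assumed design: sliding layers keep at most `window` KV entries, full layers keep ctx_len."""
    return min(ctx_len, window) if layer_type == "sliding_attention" else ctx_len


# Made-up layer pattern for illustration only.
layer_types = ["sliding_attention", "sliding_attention", "full_attention"]
masks = [mask_for_layer(t, seq_len=5, window=2) for t in layer_types]
cache_lens = [kv_cache_len(t, ctx_len=2048, window=512) for t in layer_types]
print(cache_lens)  # [512, 512, 2048]
```

Keeping both mask builders behind one per-layer dispatch lets a single decoder loop serve both layer kinds. Only the mask and the cache length change per layer, not the attention math.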

vbaddi assigned vbaddi and tchawada Jun 4, 2026

vbaddi added the enhancement New feature or request label Jun 4, 2026

vbaddi added 2 commits June 4, 2026 13:39

nit: update modeling file to remove unused imports and fix lint

7e1d3f2

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit: update the transformers version and add license to __init__

d773e86

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>


vbaddi wants to merge 3 commits into release/v1.22.0_tmp from feature/gemma4-unified-support
