
feat(0406): Add Gemma4 Unified vision-language support #1036

Open: vbaddi wants to merge 3 commits into release/v1.22.0_tmp from feature/gemma4-unified-support

vbaddi (Contributor) commented Jun 4, 2026

Summary

  • Add QEff Gemma4 Unified wrappers for text, causal LM, and conditional-generation paths.
  • Wire Gemma4 Unified into model replacement, supported architecture, KV-cache, and auto-model flows.
  • Add dual-QPC vision/lang export support and an image-text example for Gemma4 Unified.
  • Preserve prefill chunking behavior by keeping the compile-time prefill seq_len independent of the image prompt length.
  • Use vision_config.num_soft_tokens for Gemma4 Unified vision token sizing.
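The prefill chunking point above can be illustrated with a minimal sketch. It assumes the prompt (soft vision tokens plus text tokens) is padded up to a whole number of fixed-size chunks, so the compiled prefill length never depends on how many image tokens the prompt carries. The helper name `plan_prefill_chunks` and the 20-token text length are hypothetical and are not taken from this PR; only `num_soft_tokens=280` and the prefill length of 128 come from the description.

```python
from typing import List, Tuple


def plan_prefill_chunks(prompt_len: int, prefill_seq_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges for fixed-size prefill chunks.

    Hypothetical helper for illustration only; not part of the QEff API.
    """
    if prompt_len <= 0 or prefill_seq_len <= 0:
        raise ValueError("prompt_len and prefill_seq_len must be positive")
    # Ceiling division: the last chunk is padded up to prefill_seq_len.
    num_chunks = -(-prompt_len // prefill_seq_len)
    return [(i * prefill_seq_len, (i + 1) * prefill_seq_len) for i in range(num_chunks)]


# 280 soft vision tokens plus an assumed 20 text tokens, compiled prefill length 128.
chunks = plan_prefill_chunks(prompt_len=280 + 20, prefill_seq_len=128)
padded_len = chunks[-1][1]  # total length after padding the final chunk
```

Under this scheme a longer image prompt only changes the number of chunks, not the shape each chunk is compiled for.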

Vision Model Note

Gemma4 Unified replaces the older, heavy vision-tower path with a lightweight, encoder-free image embedding path: image patches are embedded through embed_vision and then inserted into the language stream as soft vision tokens. The full model config sets num_soft_tokens=280, so export and runtime now size the retained vision embeddings from that value instead of falling back to older defaults. This matches the model behavior described in the Gemma4 visual guide (newsletter.maartengrootendorst.com).
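As a rough illustration of the soft-token insertion described above, the sketch below overwrites image-placeholder positions in the text embedding sequence with vision embeddings. It uses plain Python lists and a toy size (2 soft tokens instead of 280). The placeholder id, the function name `merge_soft_tokens`, and the one-embedding-per-placeholder layout are assumptions for illustration, not the actual Gemma4 or QEff implementation.

```python
IMAGE_TOKEN_ID = -1  # hypothetical placeholder id, not the real Gemma4 value


def merge_soft_tokens(input_ids, text_embeds, vision_embeds, image_token_id=IMAGE_TOKEN_ID):
    """Replace each image-placeholder position with the next vision embedding."""
    slots = [i for i, tok in enumerate(input_ids) if tok == image_token_id]
    if len(slots) != len(vision_embeds):
        raise ValueError(
            f"{len(slots)} placeholders but {len(vision_embeds)} vision embeddings"
        )
    merged = [list(row) for row in text_embeds]  # copy so the inputs stay untouched
    for slot, emb in zip(slots, vision_embeds):
        merged[slot] = list(emb)
    return merged


# Toy example: two soft vision tokens between two text tokens, hidden size 2.
ids = [5, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 7]
text = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.7, 0.7]]
vision = [[1.0, 2.0], [3.0, 4.0]]
merged = merge_soft_tokens(ids, text, vision)
```

The length check mirrors why the sizing matters: if the number of retained vision embeddings does not match the number of placeholder positions, the merge cannot be done consistently.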

Validation

  • Verified HF original PyTorch vs QEff modified PyTorch parity.
  • Verified QEff PyTorch vs ONNX Runtime parity for the language graph.
  • Ran a reduced cached-model E2E smoke test with QEFF_CTX_LEN=2048, QEFF_PREFILL_SEQ_LEN=128, and QEFF_GENERATION_LEN=64.
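For readers unfamiliar with this kind of parity check, a minimal sketch follows. It compares two flat lists of output values by maximum absolute difference. The function names and the tolerance of 1e-3 are assumptions chosen for illustration; the PR does not state which metric or threshold was used.

```python
def max_abs_diff(reference, candidate) -> float:
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def outputs_match(reference, candidate, atol: float = 1e-3) -> bool:
    """True when every element differs by at most atol (assumed tolerance)."""
    return max_abs_diff(reference, candidate) <= atol


# Toy logits standing in for, e.g., PyTorch vs ONNX Runtime outputs.
ref_logits = [0.10, -1.25, 3.00]
ort_logits = [0.1004, -1.2497, 3.0002]
parity_ok = outputs_match(ref_logits, ort_logits)
```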

cc: @anujgupt-github @quic-hemagnih @tchawada

Add QEff wrappers and transform wiring for Gemma4 Unified text and conditional-generation models.

This adds encoder-free vision embedding support, Gemma4 Unified cache/mask handling for mixed sliding/full attention, dual-QPC export helpers, prefill chunking length fixes, and dummy parity/ONNX quickcheck coverage.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
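To make the "mixed sliding/full attention" mask handling mentioned in the commit message concrete, here is a small sketch of the two mask types over a short sequence. The function name and the window convention (a query may attend to itself and the previous `sliding_window - 1` keys) are assumptions for illustration; the actual QEff mask construction may differ, for example in how image tokens are treated.

```python
def causal_mask(seq_len: int, sliding_window=None):
    """Build a boolean mask where mask[q][k] is True if query q may attend to key k.

    With sliding_window=None this is a full causal mask; otherwise each query
    only sees the most recent `sliding_window` positions, itself included.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            allowed = k <= q  # causal: never attend to future positions
            if sliding_window is not None:
                allowed = allowed and (q - k) < sliding_window
            row.append(allowed)
        mask.append(row)
    return mask


full = causal_mask(4)                     # full-attention layer
local = causal_mask(4, sliding_window=2)  # sliding-window layer
```

A model that mixes both layer types needs both masks, and correspondingly different KV-cache retention per layer, which is presumably why cache and mask handling are listed together.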
vbaddi added the enhancement (New feature or request) label on Jun 4, 2026
vbaddi added 2 commits June 4, 2026 13:39
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
