
feat: Qwen3.5 V2 hybrid runtime support#2

Draft
nuri-yoo wants to merge 1 commit into main from feat/qwen35-v2-runtime



nuri-yoo commented May 1, 2026

Summary

Adds Qwen3.5 (V2 hybrid Gated DeltaNet) runtime support to the Rust shell. Two pieces.

Changes

tvm-runtime: get_from_any_array Any-level helper — extracts a single Tensor (or ObjectRef) from a heterogeneous Array<Any> without needing the full Array<Tensor> downcast. Required by the V2 hybrid prefill/decode return shape [logits, kv_state, rnn_state] which mixes Tensor and state objects.
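The helper's contract can be illustrated with a small, self-contained Rust sketch. It does not use the tvm-runtime crate: `Tensor`, `ObjectRef`, `Any`, `FromAny` and `ExtractError` are hypothetical stand-ins, and the real `get_from_any_array` signature may differ. The idea it shows is that only the requested element is downcast, so a mixed `[logits, kv_state, rnn_state]` array can still yield its logits tensor even though a whole-array `Array<Tensor>` downcast would fail.

```rust
use std::fmt;

/// Hypothetical stand-in for a runtime tensor handle (not tvm-runtime's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct Tensor {
    pub shape: Vec<i64>,
}

/// Hypothetical stand-in for an opaque runtime object such as a KV or RNN state.
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectRef {
    pub type_key: String,
}

/// Hypothetical model of one element of a heterogeneous `Array<Any>`.
#[derive(Debug, Clone)]
pub enum Any {
    Tensor(Tensor),
    Object(ObjectRef),
}

/// Returned when the index is out of range or the element has the wrong kind.
#[derive(Debug, PartialEq)]
pub enum ExtractError {
    OutOfBounds { idx: usize, len: usize },
    TypeMismatch { idx: usize, expected: &'static str },
}

impl fmt::Display for ExtractError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExtractError::OutOfBounds { idx, len } => write!(f, "index {idx} out of bounds (len {len})"),
            ExtractError::TypeMismatch { idx, expected } => write!(f, "element {idx} is not a {expected}"),
        }
    }
}

/// Types that can be downcast from a single `Any` element.
pub trait FromAny: Sized {
    const KIND: &'static str;
    fn from_any(value: &Any) -> Option<Self>;
}

impl FromAny for Tensor {
    const KIND: &'static str = "Tensor";
    fn from_any(value: &Any) -> Option<Self> {
        match value {
            Any::Tensor(t) => Some(t.clone()),
            _ => None,
        }
    }
}

impl FromAny for ObjectRef {
    const KIND: &'static str = "ObjectRef";
    fn from_any(value: &Any) -> Option<Self> {
        match value {
            Any::Object(o) => Some(o.clone()),
            _ => None,
        }
    }
}

/// Downcasts only the element at `idx`; the other elements are never inspected.
pub fn get_from_any_array<T: FromAny>(arr: &[Any], idx: usize) -> Result<T, ExtractError> {
    let value = arr.get(idx).ok_or(ExtractError::OutOfBounds { idx, len: arr.len() })?;
    T::from_any(value).ok_or(ExtractError::TypeMismatch { idx, expected: T::KIND })
}

fn main() {
    // Mimics the V2 hybrid return shape [logits, kv_state, rnn_state]; the shape is illustrative.
    let outputs = vec![
        Any::Tensor(Tensor { shape: vec![1, 32] }),
        Any::Object(ObjectRef { type_key: "kv_state".into() }),
        Any::Object(ObjectRef { type_key: "rnn_state".into() }),
    ];
    let logits: Tensor = get_from_any_array(&outputs, 0).unwrap();
    let rnn: ObjectRef = get_from_any_array(&outputs, 2).unwrap();
    println!("logits shape = {:?}, rnn = {}", logits.shape, rnn.type_key);
    // Asking for a Tensor at a state slot is reported instead of panicking.
    if let Err(e) = get_from_any_array::<Tensor>(&outputs, 1) {
        println!("{e}");
    }
}
```

Making the target type a generic parameter keeps one entry point for both tensors and state objects, while the per-element check reports a precise index and expected kind when a caller misreads the return layout.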

3rdparty/tvm submodule bump — points to brekkylab/relax @ 307c92915 (brekkylab/relax#3) so the bundled runtime:

  • accepts V2 VM bytecode (tirx / tir alias)
  • exposes the flat C-ABI shim that tvm-runtime-sys relies on for cross-dylib linking

Verification

End-to-end on macOS arm64 + Metal:

  • Qwen3-0.6B / Qwen3-8B (V1 KvCache)
  • Qwen3.5-0.8B / 2B / 4B / 9B (V2 hybrid Gated DeltaNet)
  • BAAI/bge-m3 (V1 embedding)

Dependencies

Depends on brekkylab/relax#3 — submodule pointer here references that PR's head commit.

Test plan

  • Build with new submodule pointer (git submodule update --init --recursive)
  • Verify existing V1 artifacts still load via the Rust shell
  • Verify V2 hybrid rt.dylib loads and produces correct output through get_from_any_array

Two pieces:

1. tvm-runtime: add `get_from_any_array` Any-level helper.
   Lets callers extract a single Tensor (or ObjectRef) from a
   heterogeneous Array<Any> without needing the full Array<Tensor>
   downcast — required by the V2 hybrid prefill/decode return shape
   `[logits, kv_state, rnn_state]` which mixes Tensor and state
   objects.

2. 3rdparty/tvm: bump to brekkylab/relax @ 307c92915
   ("feat: Qwen3.5 V2 VM bytecode runtime support" — PR
   brekkylab/relax#3) so the bundled runtime accepts V2 VM bytecode
   ('tirx' / 'tir' alias) and exposes the flat C-ABI shim that the
   tvm-runtime-sys crate relies on for cross-dylib linking.

Verified end-to-end on macOS arm64 + Metal: Qwen3-{0.6B,8B} (V1
KvCache), Qwen3.5-{0.8B,2B,4B,9B} (V2 hybrid Gated DeltaNet),
BAAI/bge-m3 (V1 embedding) all load and run.