feat: Qwen3.5 V2 hybrid runtime support #2
Draft
nuri-yoo wants to merge 1 commit into
Two pieces:
1. tvm-runtime: add `get_from_any_array` Any-level helper.
Lets callers extract a single Tensor (or ObjectRef) from a
heterogeneous Array<Any> without needing the full Array<Tensor>
downcast — required by the V2 hybrid prefill/decode return shape
`[logits, kv_state, rnn_state]` which mixes Tensor and state
objects.
2. 3rdparty/tvm: bump to brekkylab/relax @ 307c92915
("feat: Qwen3.5 V2 VM bytecode runtime support" — PR
brekkylab/relax#3) so the bundled runtime accepts V2 VM bytecode
('tirx' / 'tir' alias) and exposes the flat C-ABI shim that the
tvm-runtime-sys crate relies on for cross-dylib linking.
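To illustrate why a per-element helper is needed, here is a minimal, self-contained Rust sketch. It assumes a lot. `Any`, `Tensor`, `ObjectRef`, `FromAny`, `GetError` and `downcast_all` are hypothetical stand-ins, not the `tvm-runtime` crate's real types. The `get_from_any_array` signature is also an assumption and is not the helper's actual API.

The sketch models the V2 return shape `[logits, kv_state, rnn_state]`. An all-or-nothing downcast to a vector of tensors fails on this mixed array. Extracting one slot at a time succeeds.

```rust
/// Hypothetical stand-ins for TVM's object model (NOT the real tvm-runtime types).
#[derive(Clone, Debug, PartialEq)]
pub struct Tensor {
    pub shape: Vec<i64>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct ObjectRef {
    pub type_key: String,
}

/// One slot of a heterogeneous `Array<Any>`.
#[derive(Clone, Debug, PartialEq)]
pub enum Any {
    Tensor(Tensor),
    Object(ObjectRef),
}

/// Fallible conversion from a single `Any` slot into a concrete type.
pub trait FromAny: Sized {
    fn from_any(value: &Any) -> Option<Self>;
}

impl FromAny for Tensor {
    fn from_any(value: &Any) -> Option<Self> {
        match value {
            Any::Tensor(t) => Some(t.clone()),
            _ => None,
        }
    }
}

impl FromAny for ObjectRef {
    fn from_any(value: &Any) -> Option<Self> {
        match value {
            Any::Object(o) => Some(o.clone()),
            _ => None,
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum GetError {
    OutOfBounds { index: usize, len: usize },
    TypeMismatch { index: usize },
}

/// Hypothetical signature: extract element `index` as `T` without
/// downcasting the whole array.
pub fn get_from_any_array<T: FromAny>(array: &[Any], index: usize) -> Result<T, GetError> {
    let slot = array
        .get(index)
        .ok_or(GetError::OutOfBounds { index, len: array.len() })?;
    T::from_any(slot).ok_or(GetError::TypeMismatch { index })
}

/// All-or-nothing downcast (the `Array<Tensor>` analogue). Fails if any slot is not a `T`.
pub fn downcast_all<T: FromAny>(array: &[Any]) -> Option<Vec<T>> {
    array.iter().map(T::from_any).collect()
}

fn main() {
    // Assumed V2 hybrid prefill/decode output: [logits, kv_state, rnn_state].
    let outputs = vec![
        Any::Tensor(Tensor { shape: vec![1, 1, 8] }),
        Any::Object(ObjectRef { type_key: "kv_state".to_string() }),
        Any::Object(ObjectRef { type_key: "rnn_state".to_string() }),
    ];

    // The full downcast fails because slots 1 and 2 are state objects.
    assert!(downcast_all::<Tensor>(&outputs).is_none());

    // Per-element extraction picks each slot with the type it actually holds.
    let logits: Tensor = get_from_any_array(&outputs, 0).expect("logits");
    let kv_state: ObjectRef = get_from_any_array(&outputs, 1).expect("kv_state");
    let rnn_state: ObjectRef = get_from_any_array(&outputs, 2).expect("rnn_state");
    println!("logits shape = {:?}", logits.shape);
    println!("states = {}, {}", kv_state.type_key, rnn_state.type_key);
}
```

Returning a typed error means a caller can tell a wrong slot type apart from a short array. For example, that could be a V1 model that returns fewer outputs.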
Verified end-to-end on macOS arm64 + Metal: Qwen3-{0.6B,8B} (V1
KvCache), Qwen3.5-{0.8B,2B,4B,9B} (V2 hybrid Gated DeltaNet),
BAAI/bge-m3 (V1 embedding) all load and run.
Summary
Adds Qwen3.5 (V2 hybrid Gated DeltaNet) runtime support to the Rust shell. Two pieces.
Changes
- `tvm-runtime`: `get_from_any_array` Any-level helper. It extracts a single `Tensor` (or `ObjectRef`) from a heterogeneous `Array<Any>` without needing the full `Array<Tensor>` downcast. This is required by the V2 hybrid prefill/decode return shape `[logits, kv_state, rnn_state]`, which mixes Tensor and state objects.
- `3rdparty/tvm` submodule bump: points to brekkylab/relax @ 307c92915 (brekkylab/relax#3) so the bundled runtime:
  - accepts V2 VM bytecode (`tirx` / `tir` alias)
  - exposes the flat C-ABI shim that the `tvm-runtime-sys` crate relies on for cross-dylib linking

Verification
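End-to-end on macOS arm64 + Metal; the following all load and run:
- Qwen3-{0.6B,8B} (V1 KvCache)
- Qwen3.5-{0.8B,2B,4B,9B} (V2 hybrid Gated DeltaNet)
- BAAI/bge-m3 (V1 embedding)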
Dependencies
Depends on brekkylab/relax#3: the submodule pointer here references that PR's head commit.
Test plan
- … (`git submodule update --init --recursive`)
- … `get_from_any_array`