[omni] Qwen3-Omni thinker: HF bridge + /generate rollout adapter #1
Open
yxs wants to merge 2 commits into
…>miles loop) First omni<->miles RL loop on the Qwen3-Omni-30B-A3B thinker (text MoE).
Summary
RL loop wiring the Qwen3-Omni-30B-A3B thinker (text MoE) into miles GRPO, against sglang-omni's rollout /generate (sgl-project/sglang-omni#785) and distributed weight-sync (sgl-project/sglang-omni#784) endpoints. Text-only thinker path.
Components
tools/extract_qwen3_omni_thinker.py: extracts the standalone Qwen3-MoE text backbone from the composite omni checkpoint (drops audio/visual, model_type=qwen3_moe) for the existing bridge.
.../megatron_to_hf/qwen3omni_moe.py: Megatron→HF converter, body.-prefixed (the namespace that sgl-project/sglang-omni#784, "[RL] distributed weight-sync", demuxes on); --model-name qwen3omni_moe.
miles/rollout/generate_hub/omni_thinker.py: thin rollout adapter that reuses miles' payload/parse and adds the omni fields (output_modalities=["text"], return_omni_rollout=False, repetition_penalty=1.0).
Notes
/generate emits temp-1 (pre-temperature) full-vocab logprobs. This was measured directly: with greedy decoding, the returned logprobs are identical at temperature 1.0 and 2.0. miles' train recompute divides logits by rollout_temperature, so the two agree only at temp=1; the adapter asserts rollout_temperature == 1.0.
repetition_penalty is forced to 1.0 (the trainer recompute can't replay a repetition penalty).
Point --sglang-router-ip/port at a standalone omni server; TIS + --get-mismatch-metrics absorb the gap.
Verification
tests/fast/test_qwen3_omni_thinker.py: 13 passed (ion-b200 miles image): extraction, body.* converter, adapter contract + guards.
omni_thinker.generate against a live sgl-project/sglang-omni#785 ("[RL] add Miles-compatible /generate rollout endpoint") omni server (Qwen3-Omni-30B-A3B on B200): input_ids rollout returns text, output_token_logprobs length == completion_tokens, cached_tokens is an int (no crash), finish_reason→status, weight_version echoed; logprob convention measured (temp-1).
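Illustration of the temperature note under Notes: a minimal, self-contained sketch of why a temp-1 server logprob and a trainer recompute that divides logits by rollout_temperature only line up at 1.0. It is not the adapter's code. The logits are made up, and log_softmax, recompute_gap and check_rollout_contract are hypothetical names introduced here for illustration only.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of softmax(logits / temperature), computed stably."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def recompute_gap(logits, token, rollout_temperature):
    """Absolute gap between a temp-1 logprob (what the Notes say /generate
    returns) and a recompute that divides logits by rollout_temperature."""
    server_logprob = log_softmax(logits)[token]
    trainer_logprob = log_softmax(logits, temperature=rollout_temperature)[token]
    return abs(trainer_logprob - server_logprob)


def check_rollout_contract(rollout_temperature, repetition_penalty):
    """Hypothetical guard mirroring the two constraints listed under Notes."""
    if rollout_temperature != 1.0:
        raise ValueError("rollout_temperature must be 1.0 to match temp-1 logprobs")
    if repetition_penalty != 1.0:
        raise ValueError("repetition_penalty must be 1.0; the recompute cannot replay it")


if __name__ == "__main__":
    # Made-up logits for one decoding step over a 4-token vocabulary.
    logits = [2.0, 1.0, 0.5, -1.0]
    token = 0  # index of the greedy pick
    for temperature in (1.0, 2.0):
        gap = recompute_gap(logits, token, temperature)
        print(f"rollout_temperature={temperature}: |trainer - server| = {gap:.4f}")
```

The gap is zero at rollout_temperature=1.0 and nonzero at 2.0, which is the reason a guard of this kind rejects any other temperature instead of leaving the mismatch to be corrected downstream.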