[model] fix: register both MTP submodule spellings in qwen3_next_bridge #11

Open: Zhichenzzz wants to merge 1 commit into bridge from fix/qwen3next-mtp-rename

Conversation

@Zhichenzzz

Megatron-LM renamed the MTP submodule transformer_layer -> mtp_model_layer (upstream NVIDIA main has no old-name API surface left). qwen3_next_bridge still registered only the old spelling, so MTP weights fail to map against a renamed Megatron-LM. By contrast, qwen35_vl_bridge and deepseek/common.py already expect the new name, and glm45_bridge handles both.

Register both spellings (glm45's approach) for all 12 MTP mappings (6 AutoMapping dict entries + QKV/GatedMLP/Replicated special mappings) so the bridge works with either Megatron-LM version. Mapping entries whose megatron param does not exist at runtime are never consulted, so the extra spelling is inert.

Validated: mapping_registry().megatron_to_hf_lookup resolves all 8 probe names (both spellings × router/QKV/expert-fc1/shared-gate) to the correct mapping types.
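The dual registration and the probe check described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the bridge's actual code: the `Mapping` dataclass, `both_spellings`, `lookup`, `probe_names`, and every parameter-name suffix in `TEMPLATES` are hypothetical stand-ins chosen for the example. Only the two submodule spellings come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from itertools import product
from typing import Optional

# The two spellings of the MTP submodule: old name first, renamed one second.
MTP_SPELLINGS = ("transformer_layer", "mtp_model_layer")


@dataclass(frozen=True)
class Mapping:
    """Hypothetical stand-in for a bridge mapping entry (not the real class)."""

    kind: str            # illustrative label, e.g. "auto", "qkv", "gated_mlp"
    megatron_param: str  # glob pattern over Megatron parameter names


def both_spellings(kind: str, template: str) -> list[Mapping]:
    """Expand one template into one entry per MTP submodule spelling."""
    return [Mapping(kind, template.format(mtp=s)) for s in MTP_SPELLINGS]


# Illustrative suffixes only (assumed); the real names live in the bridge.
TEMPLATES = {
    "router": ("auto", "mtp.layers.*.{mtp}.mlp.router.weight"),
    "qkv": ("qkv", "mtp.layers.*.{mtp}.self_attention.linear_qkv.weight"),
    "expert_fc1": ("gated_mlp", "mtp.layers.*.{mtp}.mlp.experts.linear_fc1.weight*"),
    "shared_gate": ("replicated", "mtp.layers.*.{mtp}.mlp.shared_experts.gate_weight"),
}

# Every mapping is registered twice, once per spelling.
REGISTRY = [m for kind, tpl in TEMPLATES.values() for m in both_spellings(kind, tpl)]


def lookup(name: str) -> Optional[Mapping]:
    """Return the first entry whose pattern matches a concrete parameter name."""
    for mapping in REGISTRY:
        if fnmatchcase(name, mapping.megatron_param):
            return mapping
    return None


def probe_names() -> dict[str, str]:
    """Concrete probe name -> expected kind, for both spellings x every template."""
    probes = {}
    for spelling, (kind, tpl) in product(MTP_SPELLINGS, TEMPLATES.values()):
        probes[tpl.format(mtp=spelling).replace("*", "0")] = kind
    return probes


if __name__ == "__main__":
    for name, expected in probe_names().items():
        found = lookup(name)
        assert found is not None and found.kind == expected, name
```

In this sketch, a pattern whose spelling is absent from the running Megatron-LM simply never matches any parameter name, which is the sense in which the extra entry is inert.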

Companion PRs: radixark/Megatron-LM#54 (the rename itself), radixark/miles#1307 (miles converters).

…bridge

Megatron-LM renamed the MTP submodule transformer_layer -> mtp_model_layer.
Register every qwen3_next MTP weight mapping (MoE router, layernorms, attention,
experts, shared expert) under both spellings so the bridge converts checkpoints
from either Megatron-LM version, mirroring the miles converter change.
Zhichenzzz force-pushed the fix/qwen3next-mtp-rename branch from 4728c32 to a06b06d on June 18, 2026 21:39