fix(0306): MoE prefill reductions for subfunction export#1028

Open
vbaddi wants to merge 2 commits into release/v1.22.0_tmp from fix/subfunction-reducesum-einsum
vbaddi (Contributor) commented Jun 3, 2026

Summary

Replaces selected MoE prefill and expert-aggregation reductions with equivalent einsum forms for GPT-OSS, GLM-MOE, and Qwen3-MoE. Adds tiny-model ONNX subfunction quickchecks that verify the exported decoder subfunctions contain Einsum nodes.
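To illustrate what "equivalent einsum forms" means for an expert-aggregation reduction, here is a minimal sketch. It is not the code from this PR: the tensor names, shapes, and subscript string are assumptions chosen for illustration, and NumPy stands in for PyTorch (`torch.einsum` accepts the same subscript string).

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, num_tokens, hidden = 4, 3, 8

# Hypothetical shapes: per-expert outputs (E, T, H) and per-token routing weights (T, E).
expert_out = rng.standard_normal((num_experts, num_tokens, hidden))
routing_w = rng.random((num_tokens, num_experts))

# Multiply-then-sum form: broadcast the weights, then reduce over the expert axis.
via_sum = (expert_out * routing_w.T[:, :, None]).sum(axis=0)

# Einsum form: contracting the shared index e expresses the same reduction as one op.
via_einsum = np.einsum("eth,te->th", expert_out, routing_w)

print(np.allclose(via_sum, via_einsum))
```

Both forms produce a `(T, H)` result; they differ only in how the reduction over the expert axis is written, which is what changes the operator that shows up in an exported graph.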

cc: @quic-rishinr @mohiso22

vbaddi self-assigned this Jun 3, 2026
vbaddi added bugfix 1.22 Release 1.22 candidate labels Jun 3, 2026
vbaddi force-pushed the fix/subfunction-reducesum-einsum branch from 829f6e9 to d44816b on June 3, 2026 18:23
vbaddi added 2 commits June 4, 2026 10:03
Replace ReduceSum-prone MoE prefill aggregation paths with equivalent einsum reductions for
GLM4-MoE, Qwen3-MoE, and GPT-OSS. Add tiny-model ONNX subfunction quickcheck coverage to
verify the exported decoder subfunctions include Einsum and expected MoE custom ops.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
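The quickcheck described in the commit message above can be pictured roughly as follows. This is a hedged sketch, not the test added by the PR: `functions_missing_op`, the `DecoderLayer` name filter, and the stand-in model are all hypothetical. It relies only on the fact that an ONNX `ModelProto` exposes `functions`, each with a `name` and a list of `node` entries carrying an `op_type`; the stand-in objects mimic that layout so the example runs without the `onnx` package.

```python
from types import SimpleNamespace


def functions_missing_op(model, op_type="Einsum", name_filter=""):
    """Return names of model functions (matching name_filter) that contain no node of op_type."""
    missing = []
    for fn in model.functions:
        if name_filter not in fn.name:
            continue
        if not any(node.op_type == op_type for node in fn.node):
            missing.append(fn.name)
    return missing


def _fn(name, ops):
    # Hypothetical stand-in for an ONNX FunctionProto: a name plus nodes with op_type.
    return SimpleNamespace(name=name, node=[SimpleNamespace(op_type=o) for o in ops])


# Hypothetical stand-in for a loaded ONNX model with three local functions.
fake_model = SimpleNamespace(functions=[
    _fn("DecoderLayer_0", ["MatMul", "Einsum", "Add"]),
    _fn("DecoderLayer_1", ["MatMul", "ReduceSum"]),
    _fn("Embedding", ["Gather"]),
])

print(functions_missing_op(fake_model, name_filter="DecoderLayer"))
```

With a real export, the same helper could be given the object returned by `onnx.load(path)`, and a test would assert that the returned list is empty.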
Revert Qwen3-MoE top-k probability normalization back to the original .sum(-1, keepdim=True) path, keeping only the
prefill expert-output reduction as einsum. Targeted Qwen3-MoE subfunction and parity tests pass.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
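For context on the second commit, the sketch below shows a top-k probability normalization written with a plain sum reduction, next to an einsum spelling of the same row sum. The values and names are hypothetical, NumPy again stands in for PyTorch (NumPy spells the flag `keepdims`, PyTorch `keepdim`), and the source does not state why this path was reverted.

```python
import numpy as np

# Hypothetical top-k routing probabilities for 3 tokens with k = 2.
topk_probs = np.array([[0.5, 0.3], [0.6, 0.1], [0.25, 0.25]])

# Plain reduction over the last axis, keeping it as a size-1 dimension.
denom_sum = topk_probs.sum(-1, keepdims=True)

# Einsum spelling of the same row sum, shown only for comparison.
denom_einsum = np.einsum("tk->t", topk_probs)[:, None]

# Normalized weights: each token's top-k probabilities now sum to 1.
normalized = topk_probs / denom_sum

print(np.allclose(denom_sum, denom_einsum), normalized.sum(-1))
```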
vbaddi force-pushed the fix/subfunction-reducesum-einsum branch from d44816b to d75e27d on June 4, 2026 04:33