Added MDP generation to QEff Compile by quic-mohmeh · Pull Request #930 · quic/efficient-transformers

quic-mohmeh · 2026-04-21T06:47:20Z

This PR adds the MDP generation required for disaggregated serving of Prefill. It supports both Pipeline Prefill + Tensor Slicing, as well as passing custom cores to the MDP generator.
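The thread does not show the MDP file format or the generator's partitioning logic, so the following is only a minimal sketch, under assumed names (`split_layers`, `build_stage_plan`, integer core IDs, all hypothetical), of the kind of layer-to-stage assignment a pipeline-partitioned Prefill needs. Each stage gets a contiguous block of decoder layers and is paired with either a default core list or a user-supplied one.

```python
from typing import Dict, List, Optional


def split_layers(num_layers: int, num_stages: int) -> List[range]:
    """Split layer indices 0..num_layers-1 into contiguous, near-equal stages."""
    if num_stages < 1 or num_stages > num_layers:
        raise ValueError("num_stages must be between 1 and num_layers")
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        # Earlier stages absorb the remainder, one extra layer each.
        size = base + (1 if s < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


def build_stage_plan(
    num_layers: int,
    num_stages: int,
    cores_per_stage: int,
    custom_cores: Optional[List[List[int]]] = None,
) -> List[Dict[str, List[int]]]:
    """Pair each pipeline stage with its layers and a core list (hypothetical IDs)."""
    layer_ranges = split_layers(num_layers, num_stages)
    if custom_cores is not None and len(custom_cores) != num_stages:
        raise ValueError("custom_cores must have one entry per stage")
    plan = []
    for s, layers in enumerate(layer_ranges):
        if custom_cores is not None:
            cores = list(custom_cores[s])  # user-supplied cores for this stage
        else:
            # Default: consecutive, non-overlapping blocks of core IDs.
            cores = list(range(s * cores_per_stage, (s + 1) * cores_per_stage))
        plan.append({"layers": list(layers), "cores": cores})
    return plan
```

For example, `build_stage_plan(10, 4, 2)` gives stages of 3, 3, 2 and 2 layers on cores [0, 1], [2, 3], [4, 5] and [6, 7]; passing `custom_cores` overrides only the core assignment, not the layer split.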

quic-mohmeh · 2026-04-24T05:44:47Z

Tested and working on the following model classes:

CodeLlama-7b-Instruct
falcon-7b-instruct
gemma-2-9b-it
gpt-oss-20b
granite-3.1-8b-instruct
Llama-3.2-1B-Instruct
Llama-3.2-3B
Phi-3-mini-4k-instruct

quic-rishinr · 2026-04-27T07:37:39Z

@mamtsing @ochougul please review the PR

quic-mohmeh · 2026-05-04T04:18:54Z

@quic-rishinr @mamtsing @ochougul A gentle reminder for review

ochougul · 2026-05-04T10:05:27Z

Add a warning that mdp_ts_num_partitions is ignored whenever seq_len==1.
Also, add a similar "ignored" warning when ts_num_devices>1 and seq_len> and mdp_ts_num_partitions>1.
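A minimal sketch of the first requested warning, using the parameter names from the review comment. The helper name `effective_mdp_ts_num_partitions` and the choice to signal "ignored" by returning `None` are assumptions, not the PR's code; the second condition is not sketched because its `seq_len` threshold is incomplete in the comment.

```python
import warnings
from typing import Optional


def effective_mdp_ts_num_partitions(
    seq_len: int, mdp_ts_num_partitions: Optional[int]
) -> Optional[int]:
    """Return the partition count to use; ignore it (with a warning) when seq_len == 1."""
    if mdp_ts_num_partitions is not None and seq_len == 1:
        # Decode-style compile (seq_len == 1): the MDP TS partition setting does not apply.
        warnings.warn(
            "mdp_ts_num_partitions is ignored when seq_len == 1",
            UserWarning,
            stacklevel=2,
        )
        return None
    return mdp_ts_num_partitions
```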

quic-mohmeh · 2026-05-14T04:53:17Z

To be tested on:

Qwen3 Dense
Qwen3 MOE
Test with subfunctions

quic-hemagnih · 2026-05-14T10:00:17Z

@quic-mohmeh Please let us know once your testing is complete, and incorporate the review comments. @quic-mamta, can you also please review it?

quic-mohmeh · 2026-05-18T06:37:22Z

Works on Qwen3-4B (Dense), without subfunctions.

quic-mohmeh · 2026-05-18T08:43:09Z

Also works on the Qwen3-30B-A3B (MoE) model, without subfunctions.

quic-mohmeh · 2026-05-22T05:42:34Z

Update example scripts for Qwen3 (with and without VL) and GPTOSS

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>

quic-rishinr · 2026-05-25T16:38:35Z

@quic-mohmeh Please rebase it on top of the release/v1.22.0_tmp branch.

quic-mohmeh · 2026-05-27T06:05:09Z

Tested with subfunctions on the following models:

CodeLlama-7b-Instruct-hf
falcon-7b-instruct
gemma-2-9b-it
granite-3.1-8b-instruct
Llama-3.1-70B-Instruct
Llama-3.2-1B-Instruct
Llama-3.2-3B
Phi-3-mini-4k-instruct
Qwen3-30B-A3B
GPTOSS-20B

quic-mohmeh · 2026-05-27T06:08:29Z

Also verified by the QEff team on the following models with subfunctions:

Qwen/Qwen3-235B-A22B (Karthikeya)
moonshotai/Kimi-K2-Instruct (Mamta)

quic-rishinr · 2026-05-27T08:22:10Z

@@ -0,0 +1,263 @@
+# -----------------------------------------------------------------------------


Please add unit tests and a test for MDP generation. Please make sure the tests are small and execute in under 5 to 10 seconds.

quic-mohmeh

@quic-rishinr As we discussed, there is no valid approach for comparing the compiler MDP dump with the QEff MDP dump; all we can ensure is that every node present in the compiler MDP dump is also present in the QEff MDP dump, in the correct order. For verification and testing, we might need to compile QPCs both without PP and with PP and compare the outputs. Let me know which direction you want to take for testing.
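The ordering criterion above amounts to an ordered-subsequence check. A minimal sketch follows, assuming node names have already been extracted from both dumps as plain strings (the dump parsing is not shown in this thread, and `first_missing_in_order` is a hypothetical name). A check like this runs in milliseconds, so it fits the small-test budget requested above.

```python
from typing import Iterable, Optional


def first_missing_in_order(
    compiler_nodes: Iterable[str], qeff_nodes: Iterable[str]
) -> Optional[str]:
    """Return the first compiler node not found, in order, in the QEff dump; None if all match."""
    qeff_iter = iter(qeff_nodes)
    for node in compiler_nodes:
        # `in` on an iterator consumes items up to and including the match,
        # so later compiler nodes can only match later QEff nodes (order is enforced).
        if node not in qeff_iter:
            return node
    return None
```

Returning the first offending node, rather than a bare boolean, makes a failing test point directly at the node that is missing or out of order.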

quic-akuruvil

@quic-mohmeh Have you verified the PP+TS combination with this, or does only PP work? When you say you have verified the above models, have you also verified output sanity?

quic-mohmeh · 2026-05-29T04:57:02Z

Since vLLM currently doesn't support PP+TS, I am unable to verify PP+TS. As for output validation, I have only skimmed the output; the main purpose of these tests is to check whether the model compiles with the generated MDP.

quic-mohmeh

Addressed the comments @quic-rishinr

quic-mohmeh · 2026-05-29T09:04:04Z

Addressed Rishin Comments

Signed-off-by: Mohit Mehta <mohmeh@qrc706r8-292-05.qualcomm.com>

quic-mohmeh · 2026-05-29T09:40:12Z

@quic-rishinr @quic-hemagnih
Needs to be tested with VLMs as well; please do not merge yet.

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>

quic-mohmeh · 2026-06-02T09:23:26Z

VLMs currently do not work with this MDP partition, both with and without subfunctions (E-P-D). I have to look into the compiler codebase.

quic-akuruvil · 2026-06-04T08:42:51Z

@quic-mohmeh Can you rebase your PR with latest

quic-mohmeh · 2026-06-04T11:03:32Z

Working on VLMs: tested on Qwen2.5VL 3B with and without subfunctions (PP4).

Resolved conflict in QEfficient/base/modeling_qeff.py:
- Kept mdp_num_partitions parameter (MDP disagg prefill feature)
- Updated num_speculative_tokens to Union[int, List[int]] from release branch

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>

quic-akuruvil · 2026-06-05T09:02:56Z

Hi @quic-mohmeh, when you say you tested on VLMs, have you enabled PP on the prefill part of the language model, or have you also checked PP on the vision part?

quic-mohmeh · 2026-06-05T09:09:27Z

Yes: PP on Prefill, TS on Encode and Decode.

quic-vargupt · 2026-06-05T09:12:44Z

Gemma4-26B-A4B Prefill compiled with subfunctions

quic-mohmeh force-pushed the mdp branch 3 times, most recently from e20d868 to f393d6e on April 22, 2026 08:42

quic-rishinr requested review from ochougul and quic-mamta April 24, 2026 05:40

ochougul reviewed May 4, 2026

quic-mamta reviewed May 20, 2026

Comment thread QEfficient/compile/mdp_generator.py

quic-rishinr requested review from ochougul and quic-mamta May 22, 2026 10:41

quic-rishinr added the 1.22 Release 1.22 candidate label May 22, 2026

quic-rishinr requested a review from quic-hemagnih May 22, 2026 10:41

quic-mohmeh force-pushed the mdp branch 2 times, most recently from 36cd668 to d34dbda on May 25, 2026 08:49

quic-mohmeh added 4 commits May 25, 2026 01:51

Added MDP generation to QEff Compile

8a28ceb

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>

Formatting and Linting

5b0adf8

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>

Add compiler options - 'stages'

6bef372

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>

Added support for layerwise export

f924a1d

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>

quic-mohmeh force-pushed the mdp branch from d34dbda to f924a1d on May 25, 2026 08:51

Merge branch 'main' into mdp

ebf2797

quic-rishinr changed the base branch from main to release/v1.22.0_tmp May 25, 2026 16:37

Merge branch 'release/v1.22.0_tmp' into mdp

6fb19c6

quic-rishinr requested changes May 27, 2026

quic-akuruvil reviewed May 27, 2026

quic-mohmeh commented May 29, 2026

Minor Changes

5b9fb71

Addressed Rishin Comments

Signed-off-by: Mohit Mehta <mohmeh@qrc706r8-292-05.qualcomm.com>

quic-mohmeh force-pushed the mdp branch from b8b66c6 to 5b9fb71 on May 29, 2026 09:12

quic-mohmeh force-pushed the mdp branch from d8ca262 to 4626809 on June 2, 2026 09:11

Add support for VLMs

e176948

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>

quic-mohmeh force-pushed the mdp branch from 4626809 to e176948 on June 2, 2026 09:12

Mohit Mehta added 2 commits June 4, 2026 23:41

Merge branch 'release/v1.22.0_tmp' into mdp

db73d16

Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>
