Repeatkv transform by quic-dhirajku · Pull Request #1037 · quic/efficient-transformers

quic-dhirajku · 2026-06-04T11:25:44Z

This PR improves repeat_kv to be more reliable and portable across model/config variants and quantized backends. It adds shared config-key resolution (for common head aliases), centralized repeat-value calculation from (num_devices, model_config), and consistent propagation of num_kv_heads_repeat into qaic_config and hashing so export/compile state matches the effective runtime settings. It also hardens transform behavior: MLA models are explicitly skipped with a warning (instead of attempting unsafe replication), and ReplicateKVHeadTransform.apply() now reports transformed status only when replication is actually applied.
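The skip-and-report behavior described above can be sketched roughly as follows. This is not the PR's `ReplicateKVHeadTransform.apply()`: the function names are hypothetical, the transform is reduced to a config update, and detecting MLA through a `kv_lora_rank` attribute is an assumption made only for illustration.

```python
import warnings
from types import SimpleNamespace


def is_mla_config(config):
    # Assumption: MLA-style configs expose a `kv_lora_rank` attribute.
    return getattr(config, "kv_lora_rank", None) is not None


def apply_replicate_kv(config, repeat):
    """Hypothetical stand-in for a transform's apply(); returns (config, transformed)."""
    # Nothing to do: report that no transformation happened.
    if repeat is None or repeat <= 1:
        return config, False
    # MLA models are skipped with a warning instead of being replicated.
    if is_mla_config(config):
        warnings.warn("KV-head replication skipped: MLA model detected")
        return config, False
    # Only here is replication actually applied, so only here is True reported.
    config.num_key_value_heads = config.num_key_value_heads * repeat
    return config, True
```

Returning `False` on every early exit lets a caller decide whether export or compile state needs to be refreshed.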
The main functional upgrade is quantized RepeatKV support. _duplicate_weights_for_linear_layer was updated to correctly handle packed layouts (instead of relying on generic reshape assumptions), including AWQ/GPTQ packed tensors and a full QuantLinearORT path (dequantize → replicate → resize buffers → repack). This directly fixes the observed failures in RepeatKV quantized tests (shape mismatch and MatMulNBits zero-point dtype issues), and the previously failing AWQ TinyLlama RepeatKV test now passes.
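For orientation, the dense (unquantized) form of the replication step can be sketched as below. This is a simplified illustration, not the PR's `_duplicate_weights_for_linear_layer`: the helper name is hypothetical, and the weight is assumed to be a plain list of rows laid out as `[num_kv_heads * head_dim, hidden]`. Packed AWQ/GPTQ tensors cannot be sliced this way directly, which is why the description above lists a dequantize, replicate, repack sequence for them.

```python
def replicate_kv_rows(weight, num_kv_heads, repeat):
    """Repeat each KV head's block of rows `repeat` times, keeping copies adjacent.

    `weight` is a list of rows with len(weight) == num_kv_heads * head_dim.
    """
    if num_kv_heads < 1 or len(weight) % num_kv_heads != 0:
        raise ValueError("row count must be a multiple of num_kv_heads")
    head_dim = len(weight) // num_kv_heads
    out = []
    for head in range(num_kv_heads):
        block = weight[head * head_dim:(head + 1) * head_dim]
        # Adjacent copies keep the query-head to KV-head grouping unchanged.
        for _ in range(repeat):
            out.extend(list(row) for row in block)
    return out
```

Copies of a head are kept next to each other (rather than tiling the whole matrix) so that each group of query heads still reads from a copy of its original KV head.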

…VLMs. Based on PR quic#625. Addressed most of the comments from the previous PR. The repeat check runs on only a subset of models during CI, primarily because the configs of these models differ. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>

…ng with changes made for the new transforms. TODO: Check whether the ONNX directory path name differs. Check whether the list of classes used for mapping covers all the models that we support. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>

…oder Wrappers were added to the string mapping list to enable dummy model export for CI. Added a guard against applying ReplicateKVTransform more than once when the Encoder or Decoder Wrapper has already applied it. Modeling files were updated to access the config in EncoderWrapper as well. Added infrastructure for CausalLM and VLM checks in the repeatKV CI tests. Moved the APIRunner instantiation in the CausalLM script so that the updated input shapes are used. Similarly, commented out the export call in the VLM script, since compile already invokes export with the updated changes. TODO: Confirm the RepeatKV changes that were made for the DeepSeekV3 model; they are currently removed in favor of a generic approach. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>

Allowed generic, name-based transformation of head counts (num_attention_heads, n_heads, n_head, etc.). Added minor edits and utilities for this task. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
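A name-based lookup of this kind might look like the sketch below. The helper names are hypothetical and are not taken from the PR; the attention-head aliases come from the commit message, while the KV-head aliases are assumed for illustration.

```python
from types import SimpleNamespace

# Aliases named in the commit message.
ATTENTION_HEAD_KEYS = ("num_attention_heads", "n_heads", "n_head")
# Assumed KV-head aliases (not listed in the commit message).
KV_HEAD_KEYS = ("num_key_value_heads", "num_kv_heads", "n_kv_heads")


def resolve_config_key(config, candidates):
    """Return the first candidate attribute name set on `config`, or None."""
    for name in candidates:
        if getattr(config, name, None) is not None:
            return name
    return None


def get_head_counts(config):
    """Return (num_attention_heads, num_kv_heads) regardless of key spelling."""
    attn_key = resolve_config_key(config, ATTENTION_HEAD_KEYS)
    if attn_key is None:
        raise ValueError("no attention-head key found on config")
    num_heads = getattr(config, attn_key)
    kv_key = resolve_config_key(config, KV_HEAD_KEYS)
    # Assumption: a config without a KV-head key uses one KV head per query head.
    num_kv_heads = getattr(config, kv_key) if kv_key else num_heads
    return num_heads, num_kv_heads
```

Returning the resolved key name, rather than only the value, lets a caller write the updated head count back under the same spelling the model config already uses.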

Edited the changes as suggested by quic-mamta. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>

This reverts commit b40a34d.

Added a method to calculate the best possible repeat_kv count from the model config and the number of devices. Added a repeat_kv path for AWQ-quantized models. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
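The commit does not spell out the selection rule, so the following is only one plausible heuristic, with a hypothetical function name: pick the smallest repeat factor that makes the replicated KV-head count divisible by the number of devices while still dividing the attention-head count evenly, and fall back to 1 (no replication) when no such factor exists.

```python
def compute_kv_repeat(num_devices, num_attention_heads, num_kv_heads):
    """Smallest repeat factor that lets KV heads split evenly across devices.

    Assumed heuristic for illustration; returns 1 when no factor qualifies.
    """
    if num_devices < 1 or num_kv_heads < 1:
        raise ValueError("num_devices and num_kv_heads must be positive")
    if num_attention_heads % num_kv_heads != 0:
        raise ValueError("num_attention_heads must be a multiple of num_kv_heads")
    max_repeat = num_attention_heads // num_kv_heads
    for repeat in range(1, max_repeat + 1):
        total_kv_heads = num_kv_heads * repeat
        evenly_sharded = total_kv_heads % num_devices == 0
        groups_intact = num_attention_heads % total_kv_heads == 0
        if evenly_sharded and groups_intact:
            return repeat
    return 1
```

For example, under this heuristic a model with 32 attention heads and 8 KV heads needs no replication on 4 devices, but needs a factor of 2 on 16 devices.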

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>

quic-rishinr added the 1.22 label (Release 1.22 candidate) Jun 5, 2026

quic-rishinr requested review from ochougul and vbaddi June 5, 2026 04:45

quic-dhirajku added 7 commits June 5, 2026 11:49

Added RepeatKVTransform operations needed for DeepseekV3ForCausalLM.

3d14efd

Addressed Internal Code Review comments.

7b60678

Revert "Addressed Internal Code Review comments."

76ff96c

Addressed comments.

12ca65e

quic-dhirajku force-pushed the repeatkv_transform branch from cb92eaf to 12ca65e Compare June 5, 2026 06:22

quic-dhirajku marked this pull request as ready for review June 5, 2026 06:24

Renamed num_kv_heads_repeat to num_replicate_kv_heads as suggested.

008adc7

quic-dhirajku force-pushed the repeatkv_transform branch 2 times, most recently from fe75c92 to 008adc7 Compare June 5, 2026 08:30

quic-rishinr mentioned this pull request Jun 5, 2026

Repeatkv transform #997

Closed

quic-dhirajku wants to merge 8 commits into quic:release/v1.22.0_tmp from quic-dhirajku:repeatkv_transform

2 participants
