✨[Feature] Support IAttention based quantization for MHA #4167

@narendasan

Description

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

The IAttention APIs introduce new code paths for handling quantization scales in attention, which should unlock further performance. We need to enable using these APIs from ModelOpt-generated checkpoints. I suspect the implementation will be similar to how we handle Convolution and its layer-specific quantization.

Describe alternatives you've considered

Additional context
