
fix npu compatibility #13465

Open
HsiaWinter wants to merge 3 commits into huggingface:main from HsiaWinter:add_npu_compatibility

Conversation

@HsiaWinter
Contributor

What does this PR do?

Fix attention_mask broadcasting for NPU compatibility

@github-actions github-actions bot added models size/S PR with diff < 50 LOC labels Apr 14, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks for the PR, I left one question


if attention_mask.ndim == 4:
# NPU does not support automatic broadcasting for this type; the mask must be expanded.
if attention_mask.device.type == 'npu' and attention_mask.shape[1:3] == (1, 1):
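The quoted condition only detects the problematic mask shape; the fix then has to expand the singleton dimensions explicitly. A minimal sketch of that expansion (the helper name and the use of a known query length are assumptions for illustration, not the PR's exact code):

```python
import torch

def expand_mask_for_npu(attention_mask: torch.Tensor, seq_len_q: int) -> torch.Tensor:
    # Hypothetical helper: the NPU fused-attention operator rejects
    # [B, 1, 1, S] masks, so broadcast the query dimension explicitly
    # to produce an accepted layout such as [B, 1, S_q, S_k].
    if attention_mask.ndim == 4 and attention_mask.shape[1:3] == (1, 1):
        attention_mask = attention_mask.expand(-1, -1, seq_len_q, -1)
    return attention_mask

mask = torch.zeros(2, 1, 1, 16)                 # [B, 1, 1, S] -- rejected on NPU
expanded = expand_mask_for_npu(mask, seq_len_q=16)
print(expanded.shape)                           # torch.Size([2, 1, 16, 16])
```

`expand` returns a broadcast view without copying data, so the extra dimension costs no memory; masks that already have an accepted shape pass through unchanged.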
Collaborator

@yiyixuxu yiyixuxu Apr 14, 2026


can we verify that if we explicitly set the backend to npu, this would also work?

def _native_npu_attention(



When a mask of shape [batch, seq_len] or [batch, 1, 1, seq_len] is passed, the operator fails with an error similar to:
get unsupported atten_mask shape, the shape is [B, 1, 1, S]
Only shapes like [B, N, S, S], [B, 1, S, S], [1, 1, S, S], or [S, S] are accepted.

The _native_npu_attention function operates correctly as it leverages _maybe_modify_attn_mask_npu to reshape the attention mask from [batch_size, seq_len_k] to [batch_size, 1, seq_len_q, seq_len_k]. This reshaped format is compatible with the NPU backend.
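The reshaping described above can be sketched as follows (a standalone illustration of the idea; the function name is an assumption and this is not the diffusers `_maybe_modify_attn_mask_npu` source):

```python
import torch

def reshape_2d_mask_for_npu(attention_mask: torch.Tensor, seq_len_q: int) -> torch.Tensor:
    # Illustrative sketch: turn a [batch_size, seq_len_k] padding mask
    # into the [batch_size, 1, seq_len_q, seq_len_k] layout that the
    # NPU fused-attention backend accepts.
    batch_size, seq_len_k = attention_mask.shape
    return attention_mask.view(batch_size, 1, 1, seq_len_k).expand(
        batch_size, 1, seq_len_q, seq_len_k
    )

mask = torch.ones(4, 32, dtype=torch.bool)      # [batch, seq_len_k]
out = reshape_2d_mask_for_npu(mask, seq_len_q=32)
print(out.shape)                                # torch.Size([4, 1, 32, 32])
```

This yields one of the shapes the operator lists as supported ([B, 1, S, S]), which is why the `_native_npu_attention` path works while the default path, which passes the mask through unexpanded, fails.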

Reference:
Ascend NPU fusion attention API:
https://www.hiascend.com/document/detail/zh/Pytorch/730/apiref/torchnpuCustomsapi/docs/context/torch_npu-npu_fusion_attention.md

Collaborator


When a mask of shape [batch, seq_len] or [batch, 1, 1, seq_len] is passed, the operator fails with an error

just want to make sure we're on the same page, could you share a code example that would produce this error on npu? specifically, I'd like to know if you are running the default attention backend, i.e. without wrapping your model call inside with attention_backend("_native_npu")


Labels

models size/S PR with diff < 50 LOC


4 participants