Fix attention_mask broadcasting for NPU compatibility #13451

Open
chang-zhijie wants to merge 3 commits into huggingface:main from chang-zhijie:main
@chang-zhijie
This PR fixes the "unsupported atten_mask shape" error raised when running attention on Ascend NPU devices.

Problem:
The NPU's fused attention operator (e.g., npu_fusion_attention) does not broadcast attention masks automatically.
When a mask of shape [batch, seq_len] or [batch, 1, 1, seq_len] is passed, the operator fails with an error similar to:
get unsupported atten_mask shape, the shape is [B, 1, 1, S]
Only shapes such as [B, N, S, S], [B, 1, S, S], [1, 1, S, S], or [S, S] are accepted (B = batch, N = num_heads, S = seq_len).
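
To make the shape constraint concrete, here is a minimal sketch of a pre-flight check. It is built only from the four shapes quoted above; `is_accepted_mask_shape` is a hypothetical helper written for illustration, not part of torch_npu or diffusers, and the operator's real validation may accept other shapes or apply further rules.

```python
from typing import Sequence


def is_accepted_mask_shape(
    shape: Sequence[int], batch: int, num_heads: int, seq_len: int
) -> bool:
    """Return True if `shape` matches one of the layouts quoted in the error text.

    Hypothetical helper: it mirrors the list in this PR description only,
    not the operator's actual validation logic.
    """
    b, n, s = batch, num_heads, seq_len
    accepted = {
        (b, n, s, s),  # [B, N, S, S]
        (b, 1, s, s),  # [B, 1, S, S]
        (1, 1, s, s),  # [1, 1, S, S]
        (s, s),        # [S, S]
    }
    return tuple(shape) in accepted


# The broadcastable forms described above are not in the accepted set.
broadcastable_4d_ok = is_accepted_mask_shape((2, 1, 1, 16), batch=2, num_heads=8, seq_len=16)
padding_2d_ok = is_accepted_mask_shape((2, 16), batch=2, num_heads=8, seq_len=16)
# The fully expanded form is accepted.
expanded_ok = is_accepted_mask_shape((2, 8, 16, 16), batch=2, num_heads=8, seq_len=16)
```

A check like this could run before the attention call to surface the mismatch with a clearer message than the operator's own error.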

Solution:
When running on NPU (is_torch_npu_available()), explicitly expand the mask to [batch, num_heads, seq_len, seq_len] to satisfy the operator’s shape constraints.
For non‑NPU devices, the mask is kept in broadcastable form ([batch, 1, 1, seq_len]) for efficiency.
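
The expansion described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR's diff: `prepare_attention_mask` and its `expand_for_npu` flag are hypothetical names, and the flag stands in for the `is_torch_npu_available()` check so the snippet runs on any device.

```python
import torch


def prepare_attention_mask(
    attention_mask: torch.Tensor, num_heads: int, expand_for_npu: bool
) -> torch.Tensor:
    """Reshape a [batch, seq_len] padding mask for use in attention.

    Hypothetical helper. `expand_for_npu` stands in for the
    is_torch_npu_available() check mentioned in the PR description.
    """
    batch, seq_len = attention_mask.shape
    # Broadcastable form: [batch, 1, 1, seq_len].
    mask = attention_mask[:, None, None, :]
    if expand_for_npu:
        # Explicit form: [batch, num_heads, seq_len, seq_len].
        # expand() returns a view and does not copy data; whether the NPU
        # operator also needs .contiguous() is not stated in the PR.
        mask = mask.expand(batch, num_heads, seq_len, seq_len)
    return mask


padding_mask = torch.ones(2, 5, dtype=torch.bool)
padding_mask[0, -1] = False  # last token of the first sample is padding

npu_mask = prepare_attention_mask(padding_mask, num_heads=4, expand_for_npu=True)
default_mask = prepare_attention_mask(padding_mask, num_heads=4, expand_for_npu=False)
```

Because `expand()` only creates a view, the expanded mask costs no extra memory until something materializes it, which is one reason to keep the broadcastable form on devices that support it.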

Reference:
Ascend NPU fusion attention API:
https://www.hiascend.com/document/detail/zh/Pytorch/730/apiref/torchnpuCustomsapi/docs/context/torch_npu-npu_fusion_attention.md

Tested:
✅ NPU (Ascend) with explicit expansion – no shape error
✅ GPU / CPU with broadcastable mask – no performance regression

@chang-zhijie

@yiyixuxu This PR is a follow-up to #13432, adding NPU hardware support for ERNIE-Image. Could you please help review it? Thanks!
