FlashMaskV3 Single-node Speed Optimization by Enigmatisms · Pull Request #119 · PaddlePaddle/flash-attention

Enigmatisms · 2026-03-26T12:40:51Z

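This PR consists of the following three parts:

For the global sliding window mask, fix the register spilling in the backward pass caused by multiple for loops and inlined lambdas, so that large tile sizes can be applied successfully. This resolves the backward performance bottleneck for global sliding window.
Adjust how the scheduler barrier is used, which improves forward performance for hdim128.
@xxyux's earlier PR: Optimize fwd hdim64 #90. It optimizes register allocation and tunes the tile size for hdim64.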

umiswing · 2026-04-09T02:46:42Z

csrc/flashmask_v2/flash_bwd_launch_template.h

                } else {
-                    if ((params.seqlen_q >= 1024 || params.seqlen_k >= 1024) && !(Has_lt_end && Has_ut_start)) {
+                    if (params.seqlen_q >= 1024 || params.seqlen_k >= 1024) {
                    run_mha_bwd_dispatch<Arch, T, 64, 128, 128, Is_causal, Is_local, Has_softcap, Is_flashmask_, Has_lt_end, Has_ut_start, Deterministic, Is_blockmask_, 2, 2, true, false, true, 2, 1, 2, 1, false>(params, stream);


https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/kernels/gpu/flash_attn_v3_grad_kernel.cu#L1124

The corresponding change also needs to be made here.
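
To make the effect of the one-line diff above concrete, here is a minimal sketch of the two predicates as plain `constexpr` functions. The names `MaskFlags`, `takes_branch_before` and `takes_branch_after` are hypothetical and do not exist in the repository; only the boolean expressions are copied from the diff, and in the real code `Has_lt_end` and `Has_ut_start` are template parameters rather than runtime fields.

```cpp
// Hypothetical stand-ins for the compile-time flags used in the dispatch.
struct MaskFlags {
    bool has_lt_end;    // mirrors the Has_lt_end template flag
    bool has_ut_start;  // mirrors the Has_ut_start template flag
};

// Condition on the removed ('-') line of the diff: long sequences take the
// shown branch unless both Has_lt_end and Has_ut_start are set.
constexpr bool takes_branch_before(int seqlen_q, int seqlen_k, MaskFlags f) {
    return (seqlen_q >= 1024 || seqlen_k >= 1024) &&
           !(f.has_lt_end && f.has_ut_start);
}

// Condition on the added ('+') line of the diff: the mask flags no longer
// affect the choice, only the sequence lengths do.
constexpr bool takes_branch_after(int seqlen_q, int seqlen_k, MaskFlags /*unused*/) {
    return seqlen_q >= 1024 || seqlen_k >= 1024;
}

// The only inputs whose outcome changes: a long sequence with both flags set.
static_assert(!takes_branch_before(4096, 4096, MaskFlags{true, true}), "excluded before");
static_assert(takes_branch_after(4096, 4096, MaskFlags{true, true}), "included after");

// Other flag combinations and short sequences behave the same as before.
static_assert(takes_branch_before(4096, 4096, MaskFlags{true, false}) ==
              takes_branch_after(4096, 4096, MaskFlags{true, false}), "unchanged");
static_assert(!takes_branch_after(512, 512, MaskFlags{true, true}), "short sequences unchanged");
```

Any caller that selects a backward configuration with the old predicate would pick a different branch for exactly the first case above, which is presumably why the linked Paddle kernel needs the matching change.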

umiswing · 2026-04-14T14:47:30Z

LGTM

Enigmatisms and others added 4 commits March 26, 2026 10:52

[Fix] Fix flawed logic with simpler unittest logic alignment

592a9e4

[Compat] Backward compatible with other mask types

486a4b5

[Trial] Test UseSchedulerBarrier heuristic for hd128

1ec0f7c

fine-tuned tile size & regitser for fwd_hdim64

f52f35c

umiswing approved these changes Apr 9, 2026

GuoxiaWang approved these changes Apr 14, 2026

GuoxiaWang merged commit a44cf15 into PaddlePaddle:main Apr 14, 2026

GuoxiaWang merged 4 commits into PaddlePaddle:main from Enigmatisms:new_optim
