Pull requests: PaddlePaddle/flash-attention

Add FA4 varlen.
#129 opened Apr 13, 2026 by baoqiwen

bwd support (192, 128) for sm100
#123 opened Apr 3, 2026 by baoqiwen

Tune registers
#122 opened Apr 1, 2026 by baoqiwen

FlashMaskV3 Single-node Speed Optimization
#119 opened Mar 26, 2026 by Enigmatisms

Add rrattn estimate func and interface
#117 opened Mar 13, 2026 by LLSGYN

Support Global Sliding Window (num_vec == 4) on FM4 BWD
#111 opened Mar 3, 2026 by umiswing (Member)

adapt to torch version flashmaskv4
#103 opened Jan 28, 2026 by clouds1238

add_flashmask_cpbalance
#99 opened Dec 30, 2025 by starcrown001

fine-tuned tile size & register for fwd_hdim64
#92 opened Nov 14, 2025 by xxyux

fix fa2 flashmask oob read
#67 opened Jun 26, 2025 by umiswing (Member), Draft

[WIP] fa3 varlen fix int32 overflow
#65 opened Jun 19, 2025 by umiswing (Member)

optimize skip block calculate in bwd
#49 opened Aug 28, 2024 by GuoxiaWang (Collaborator)

[BugFix] fix_mask error using unpadding api
#41 opened Apr 23, 2024 by wwbitejotunn

Fa cmake extends op
#31 opened Dec 14, 2023 by AnnaTrainingG

Fa cmake
#29 opened Dec 6, 2023 by AnnaTrainingG

[WIP] Sparse seqparallel
#9 opened Jun 8, 2023 by zkh2016

add block sparse api
#7 opened May 27, 2023 by kuizhiqing (Member)