Releases: HKUSTDial/flash-sparse-attention

v2.0.1

28 Apr 09:48
d21c3cf

What's Changed

Full Changelog: v2.0.0...v2.0.1

v2.0.0

23 Mar 03:33
969a280

What's Changed

  • Improve numerical stability in sparse attention with sink auxiliary logits by @LoserCheems in #220
  • [PERFORMANCE OPTIMIZATION] Flash Sparse Attention by @LoserCheems in #221
  • [BUG FIX] Refactor block min/max calculations by @LoserCheems in #223
  • [BUG FIX] Improve packed GQA handling by @LoserCheems in #224
  • Add utility functions for device management and input validation by @LoserCheems in #225
  • [PERFORMANCE OPTIMIZATION] Triton Sparse Base Forward Kernel with Gate-Based Sparsity by @LoserCheems in #226
  • [FEATURE] Enhance forward combine kernel and split attention by @LoserCheems in #227
  • Improves softmax stability with log2 scaling by @LoserCheems in #228 (see the sketch below)
  • Renames variables and refactors functions for clarity by @LoserCheems in #229
  • Improve performance and configuration for SM90 forward path by @LoserCheems in #231
  • Refactor rescaling logic in online_softmax and rescale_o functions by @LoserCheems in #232
  • [BUG FIX] Improve forward kernel configuration and validation by @LoserCheems in #233
  • Refactor qheads_per_kvhead calculations for clarity by @LoserCheems in #234
  • [FEATURE SUPPORT] Add Triton backward support by @LoserCheems in #235
  • [FEATURE SUPPORT] Add Configurable Sparse Gate Modes and Adaptive Thresholding in Triton Forward Kernel by @LoserCheems in #236
  • Refactor log_sigmoid function for improved performance and accuracy by @LoserCheems in #237
  • [FEATURE SUPPORT] Add Configurable Sparse Gate Modes and Adaptive Thresholding in Triton Backward Kernel by @LoserCheems in #238
  • Enhance forward kernel for block range and masking logic by @LoserCheems in #239
  • Refactor backward kernels for clarity and optimization by @LoserCheems in #240
  • [BUG FIX] Update launch configuration for RTX Pro 6000 by @LoserCheems in #241
  • Add benchmark functions for Triton attention operations by @LoserCheems in #242
  • [FEATURE SUPPORT] Enable Softmax-Threshold Block Skipping in Triton Dense/Sparse Forward Attention by @LoserCheems in #243
  • [BUG FIX] Improve clarity and accuracy in gating mechanisms by @LoserCheems in #244
  • [BUG FIX] Update stride parameters for consistency by @LoserCheems in #245
  • Add softmax threshold parameter for enhanced flexibility by @LoserCheems in #246
  • [FEATURE] Implement dense attention with masking support by @LoserCheems in #247
  • Enhance sparse attention implementation and documentation by @LoserCheems in #248
  • [FEATURE] Implement gated attention mechanism and enhance performance by @LoserCheems in #249
  • Update project structure and dependencies by @LoserCheems in #250
  • [BUG FIX] Improve error reporting and occupancy in benchmarks by @LoserCheems in #251
  • Update repository URLs and improve documentation by @LoserCheems in #252
  • Refactor benchmark tests to simplify tensor initialization by @LoserCheems in #253
  • Refactor test utilities and add CUDA tensor operation tests by @LoserCheems in #254
  • Refactor masking logic in backward kernel functions by @LoserCheems in #255
  • Refactor GitHub Actions workflows for package building and publishing by @LoserCheems in #256

Full Changelog: v1.2.4...v2.0.0
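
The log2 scaling from #228 is a standard numerical trick in flash-attention-style kernels: since exp(x) = 2^(x · log2 e), folding log2 e into the softmax scale lets the inner loop use the hardware-native exp2 while the usual running-max rescaling keeps the result exact. A minimal NumPy sketch of the general technique, not this project's kernel code:

```python
import numpy as np

LOG2_E = 1.4426950408889634  # log2(e), so exp(x) == 2.0 ** (x * LOG2_E)

def online_softmax_log2(score_blocks, scale):
    """Streaming softmax over blocks of scores, carried out in base 2
    so a GPU kernel could use the fast exp2 instruction throughout."""
    m, d, probs = -np.inf, 0.0, []  # running max, denominator, block numerators
    for s in score_blocks:
        s = s * (scale * LOG2_E)        # fold log2(e) into the scale once
        m_new = max(m, float(s.max()))
        alpha = 2.0 ** (m - m_new)      # rescale old state when the max moves
        probs = [p * alpha for p in probs]
        d = d * alpha + (2.0 ** (s - m_new)).sum()
        probs.append(2.0 ** (s - m_new))
        m = m_new
    return np.concatenate(probs) / d

x = np.random.randn(1024).astype(np.float32)
ref = np.exp(x - x.max()); ref /= ref.sum()           # ordinary softmax
out = online_softmax_log2(np.split(x, 8), scale=1.0)
np.testing.assert_allclose(out, ref, rtol=1e-4, atol=1e-8)
```

Multiplying the old state by 2^(m − m_new) whenever the running max moves is the kind of rescaling bookkeeping that #232 refactors.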

v1.2.4

20 Dec 14:05
bd824a7

Last attn_mask version

We will adopt a new strategy to alleviate the memory bottleneck of attn_mask: this is the last version that accepts attn_mask, and future versions will not pass it. The sketch below shows why a dense mask becomes a problem at long context.
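
For a sense of scale, a dense attn_mask materializes a (batch, heads, query_len, key_len) tensor, so its footprint grows quadratically with sequence length. A back-of-the-envelope sketch with illustrative shapes (hypothetical numbers, not a measurement of this library):

```python
def mask_bytes(batch, heads, q_len, k_len, itemsize=1):
    """Bytes needed to materialize a dense (B, H, Lq, Lk) attention
    mask; itemsize=1 for bool, 2 for fp16/bf16."""
    return batch * heads * q_len * k_len * itemsize

# Even a boolean mask explodes at long context (batch=1, 32 heads):
for seqlen in (4096, 16384, 65536):
    gib = mask_bytes(1, 32, seqlen, seqlen) / 2**30
    print(f"seqlen={seqlen:>6}: {gib:8.2f} GiB")
# seqlen=  4096:     0.50 GiB
# seqlen= 16384:     8.00 GiB
# seqlen= 65536:   128.00 GiB
```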

What's Changed

Full Changelog: v1.2.3...v1.2.4

v1.2.3

09 Nov 15:55
b746952

What's Changed

  • Add selectable masking strategies for attention by @LoserCheems in #204
  • Refactor attention block smoothing for consistency by @LoserCheems in #205
  • Optimize Triton version: GQA, mask/bias broadcasting, skip inactive tiles, and stability fixes by @LoserCheems in #200
  • [FEATURE SUPPORT] Triton special compact dynamic-mask attention: 1.6× faster fwd+bwd, numerically equivalent by @LoserCheems in #206
  • Fix documentation and references for Flash Sparse Attention by @LoserCheems in #207

Full Changelog: v1.2.2...v1.2.3

v1.2.2

05 Nov 08:10

What's Changed

  • [FEATURE SUPPORT] Robust dBias accumulation for seqlen_q_bias == 1 by @LoserCheems in #194
  • [FEATURE SUPPORT] Centralize dynamic mask creation for FDMA by @LoserCheems in #197
  • Update documentation to use mask utility in examples by @LoserCheems in #198
  • Fix attention bias calculation and dbias handling by @LoserCheems in #199
  • Add block-wise smoothing to attention mask by @LoserCheems in #201
  • [FEATURE SUPPORT] Move scaling out of streaming loops, bias-initialized acc_s, and fix dQ double-scaling by @LoserCheems in #203

Full Changelog: v1.2.1...v1.2.2

v1.2.1

16 Oct 04:51

What's Changed

  • Implement variable-length attention with mask and bias support by @LoserCheems in #185
  • Add issue/PR templates by @LoserCheems in #186
  • [FEATURE SUPPORT] Variable-Length Attention with Padding-Free Execution by @LoserCheems in #188
  • [FEATURE SUPPORT] Broadcastable 4D mask/bias, 128-rounded key length, stride-0 broadcasting, and dbias reductions by @LoserCheems in #190 (stride-0 broadcasting is sketched below)
  • Refactor bias initialization and enhance bias computation in FlashDMAttnFunc by @LoserCheems in #191
  • Fix attention_mask and attention_bias shape descriptions and remove redundant checks by @LoserCheems in #192
  • Enhance bias gradient accumulation in backward pass by @LoserCheems in #193

Full Changelog: v1.2.0...v1.2.1
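
Of these, stride-0 broadcasting rests on a standard PyTorch mechanism: expand() turns size-1 dimensions into stride-0 views, so a broadcastable mask/bias can be consumed at full 4D shape without ever copying it. A minimal sketch with illustrative shapes (not this library's API):

```python
import torch

B, H, Lq, Lk = 2, 8, 128, 128

# Bias stored once per (head, key) row, broadcast over batch and query length.
bias = torch.randn(1, H, 1, Lk)

# expand() returns a view: broadcast dims get stride 0, no data is copied.
bias4d = bias.expand(B, H, Lq, Lk)
print(bias4d.shape)     # torch.Size([2, 8, 128, 128])
print(bias4d.stride())  # (0, 128, 0, 1): stride 0 on the broadcast dims
print(bias4d.data_ptr() == bias.data_ptr())  # True, same storage
```

Because every (batch, query) position reads the same underlying bias row, the backward pass must reduce the incoming gradient back over the broadcast dimensions, which is what the dbias reductions in the PR title refer to.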

v1.2.0

01 Oct 16:58

What's Changed

  • [BUG FIX] Fix mask/bias memory access and vectorization issues in kernels by @LoserCheems in #182

Full Changelog: v1.1.9...v1.2.0

v1.1.9

22 Sep 16:19

What's Changed

  • Refactor attention mask and bias handling for efficiency by @LoserCheems in #177
  • [BUG FIX] SM80 NaN in bias.grad when both mask and bias are enabled by @LoserCheems in #179

Full Changelog: v1.1.8...v1.1.9

v1.1.8

21 Sep 02:05
ad7a3ab

What's Changed

Full Changelog: v1.1.7...v1.1.8

v1.1.7

20 Sep 18:30
a73c635

What's Changed

Full Changelog: v1.1.6...v1.1.7