Block-sparse attention primitive #21

@scttfrdmn

Description

The gather-matmul-scatter pattern in bsr_spmm is architecturally identical to sparse attention: Longformer's local+global attention, BigBird's random+local+global, or any structured-sparse transformer mask. A BSRMatrix with a sliding-window block pattern IS a local-attention mask.
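To make the "sliding-window block pattern IS a local-attention mask" claim concrete, here is a minimal sketch of building such a pattern. This uses `scipy.sparse.bsr_matrix` purely as a stand-in for trnsparse's `BSRMatrix` (whose constructor API is not shown in this issue), so the names `sliding_window_pattern` and `pattern_to_bsr` are illustrative, not part of any existing API:

```python
import numpy as np
from scipy.sparse import bsr_matrix  # stand-in for trnsparse's BSRMatrix

def sliding_window_pattern(n_blocks: int, window: int) -> np.ndarray:
    """Boolean block mask: block (i, j) is kept iff |i - j| <= window."""
    idx = np.arange(n_blocks)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def pattern_to_bsr(mask: np.ndarray, block: int) -> bsr_matrix:
    """Expand a boolean block-level mask into a BSR matrix of all-ones blocks."""
    n_blocks = mask.shape[0]
    rows, cols = np.nonzero(mask)  # row-major order matches BSR layout
    data = np.ones((len(rows), block, block), dtype=np.float32)
    # indptr[i] = number of nonzero blocks in block-rows < i
    indptr = np.zeros(n_blocks + 1, dtype=np.int64)
    np.add.at(indptr, rows + 1, 1)
    indptr = np.cumsum(indptr)
    n = n_blocks * block
    return bsr_matrix((data, cols, indptr), shape=(n, n))

# A 6-block sequence with a 1-block window on each side: tridiagonal block pattern.
m = pattern_to_bsr(sliding_window_pattern(6, 1), block=4)
```

Dilated or global-token variants would only change the boolean predicate inside `sliding_window_pattern`; the BSR expansion step is identical.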

Acceptance:

  • docs/sparse_attention.md — writeup showing how to build BSRMatrix patterns for common attention variants (local window, dilated, global tokens)
  • examples/block_sparse_attention.py — minimal reference: build the pattern, compute softmax(Q @ K.T) @ V with bsr_spmm for the masked parts, verify against a dense-mask reference
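As a sketch of what the verification in `examples/block_sparse_attention.py` could look like: compute attention block-by-block over the allowed pattern (the gather-matmul-scatter loop below is a plain-numpy stand-in for the `bsr_spmm` call) and check it against a dense `-inf`-masked reference. All function names here are hypothetical:

```python
import numpy as np

def dense_masked_attention(Q, K, V, mask):
    """Reference: dense scores with disallowed positions set to -inf pre-softmax."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def block_sparse_attention(Q, K, V, block_mask, block):
    """Compute scores only for blocks allowed by block_mask (gather-matmul-scatter)."""
    n = Q.shape[0]
    scores = np.full((n, n), -np.inf)
    for bi, bj in zip(*np.nonzero(block_mask)):
        ri = slice(bi * block, (bi + 1) * block)
        rj = slice(bj * block, (bj + 1) * block)
        # gather the (bi, bj) tile, matmul, scatter into the score matrix
        scores[ri, rj] = Q[ri] @ K[rj].T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
block, n_blocks, d = 4, 6, 8
Q, K, V = (rng.standard_normal((block * n_blocks, d)) for _ in range(3))
idx = np.arange(n_blocks)
block_mask = np.abs(idx[:, None] - idx[None, :]) <= 1  # sliding window
elem_mask = np.kron(block_mask, np.ones((block, block), dtype=bool))
out_sparse = block_sparse_attention(Q, K, V, block_mask, block)
out_dense = dense_masked_attention(Q, K, V, elem_mask)
```

The two outputs should match to floating-point tolerance; in the real example the inner loop is replaced by a single `bsr_spmm` over the masked score blocks.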

No new kernel — this is framing + example code. The claim is that trnsparse's BSR path already provides the primitive; block-sparse attention is a consumer.

Depends on #18.


Labels

enhancement (New feature or request), neuron (Requires AWS Neuron / Trainium hardware)
