Turn existing kernels into structured, documented examples #2

@sandlbn

We already have a large set of Triton kernels (e.g. 14_Gemm_*, 81_Gemm_*, 95_Matmul_*, FlashAttention, etc.), but they are currently:

  • flat (no structure)
  • undocumented
  • hard to navigate for new users
  • not clearly categorized by use case or optimization pattern

Instead of adding new examples, we should convert a subset of existing kernels into curated, documented examples.

Something like:

GEMM

  • 14_Gemm_Divide_Sum_Scaling
  • 39_Gemm_Scale_BatchNorm

Fused / complex pipelines

  • 81_Gemm_Swish_Divide_Clamp_Tanh_Clamp
  • 95_Matmul_Add_Swish_Tanh_GELU_Hardtanh

Reduction / normalization

  • 84_Gemm_BatchNorm_Scaling_Softmax

Attention

  • 1_FlashAttention_Fwd

Mixed ops

  • 55_Matmul_MaxPool_Sum_Scale
  • 68_Matmul_Min_Subtract

Plus, add an EXAMPLES.md file.
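
As a rough illustration of how EXAMPLES.md could be generated from the grouping above, here is a minimal sketch. The category names and kernel names are copied from this issue. The `examples/<kernel_name>/README.md` layout, the Markdown heading style, and the function name `render_examples_md` are assumptions made for illustration only, not something the repository defines today.

```python
# Category -> kernel names, copied from the grouping proposed in this issue.
CATEGORIES = {
    "GEMM": [
        "14_Gemm_Divide_Sum_Scaling",
        "39_Gemm_Scale_BatchNorm",
    ],
    "Fused / complex pipelines": [
        "81_Gemm_Swish_Divide_Clamp_Tanh_Clamp",
        "95_Matmul_Add_Swish_Tanh_GELU_Hardtanh",
    ],
    "Reduction / normalization": ["84_Gemm_BatchNorm_Scaling_Softmax"],
    "Attention": ["1_FlashAttention_Fwd"],
    "Mixed ops": [
        "55_Matmul_MaxPool_Sum_Scale",
        "68_Matmul_Min_Subtract",
    ],
}


def render_examples_md(categories):
    """Return the text of an EXAMPLES.md index, one section per category."""
    lines = ["# Examples", ""]
    for category, kernels in categories.items():
        lines.append(f"## {category}")
        lines.append("")
        for name in kernels:
            # Hypothetical layout: each curated example lives in
            # examples/<kernel_name>/ with its own README.md.
            lines.append(f"- [{name}](examples/{name}/README.md)")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_examples_md(CATEGORIES))
```

Keeping the mapping in one place would mean a kernel is added to the index by editing a single dict entry. Whether the index should be generated or maintained by hand is an open design choice.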
