Turn existing kernels into structured, documented examples

We already have a large set of Triton kernels (e.g. 14_Gemm_*, 81_Gemm_*, 95_Matmul_*, FlashAttention), but they are currently:

- flat (no structure)
- undocumented
- hard to navigate for new users
- not clearly categorized by use case or optimization pattern

Instead of adding new examples, we should convert a subset of the existing kernels into curated, documented examples.


Something like:

GEMM

- 14_Gemm_Divide_Sum_Scaling
- 39_Gemm_Scale_BatchNorm

Fused / complex pipelines

- 81_Gemm_Swish_Divide_Clamp_Tanh_Clamp
- 95_Matmul_Add_Swish_Tanh_GELU_Hardtanh

Reduction / normalization

- 84_Gemm_BatchNorm_Scaling_Softmax

Attention

- 1_FlashAttention_Fwd

Mixed ops

- 55_Matmul_MaxPool_Sum_Scale
- 68_Matmul_Min_Subtract

Plus, add an EXAMPLES.md file.
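As a rough illustration of how the grouping above could drive EXAMPLES.md, here is a minimal sketch that renders an index from a category mapping. The kernel names and categories are taken from the list above; everything else is an assumption made for illustration: the `examples/` root, the `slugify` directory naming scheme, and the one-directory-per-kernel layout are hypothetical and not something the repository defines today.

```python
import re

# Category -> kernel names, copied from the grouping proposed in this issue.
CATEGORIES = {
    "GEMM": ["14_Gemm_Divide_Sum_Scaling", "39_Gemm_Scale_BatchNorm"],
    "Fused / complex pipelines": [
        "81_Gemm_Swish_Divide_Clamp_Tanh_Clamp",
        "95_Matmul_Add_Swish_Tanh_GELU_Hardtanh",
    ],
    "Reduction / normalization": ["84_Gemm_BatchNorm_Scaling_Softmax"],
    "Attention": ["1_FlashAttention_Fwd"],
    "Mixed ops": ["55_Matmul_MaxPool_Sum_Scale", "68_Matmul_Min_Subtract"],
}


def slugify(category):
    """Turn a category title into a directory name (hypothetical naming scheme)."""
    return "_".join(re.findall(r"[a-z0-9]+", category.lower()))


def render_examples_md(categories, root="examples"):
    """Render a markdown index with one section per category.

    `root` is an assumed top-level directory; adjust it to the real layout.
    """
    lines = ["# Examples", ""]
    for category, kernels in categories.items():
        lines.append(f"## {category}")
        lines.append("")
        for name in kernels:
            # Assumes each kernel lives in <root>/<category-slug>/<kernel-name>/.
            lines.append(f"- [{name}]({root}/{slugify(category)}/{name}/)")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_examples_md(CATEGORIES))
```

Keeping the mapping in one place means the directory layout and the index cannot drift apart; a short per-kernel description could be added to the mapping later if the index should also document what each example demonstrates.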
