TorchPlusPlus/todo.md at master · AmanSwar/TorchPlusPlus · GitHub

59 lines (48 loc) · 835 Bytes

(both backward and forward)

attention
- vanilla
- grouped query
- multihead
- multi latent
- paged attention
- ring attention
- vertical slash attention
- kv_cache
activation (including quantized version too)
- SiLU
- GeLU
- GeLU-tanh
- FATReLU
- SwiGLU
gemm
- gptq gemm
- marlin optimized gptq
- nvfp4_scaled_mm_kernels
- awq gemm kernels
- allspark qgemm w8a16
- cutlass w4a8 gemm
- gguf gemm kernel
- int 8 gemm kernel
MoE
- moe_gemm
- grouped top k
- top k softmax
- moe align and sum
- moe permute
- marlin moe
- moe permute unpermute
Sparse
- sparse gemm
- 2:4 sparsity kernel
DLops
- RMSNorm
- LayerNorm
- softmax
- fused Linear layers with activation
- RoPE
blas
- HGEMV
- reduction
  - sum
  - max
- vector addition