Skip to content

Latest commit

 

History

History
59 lines (48 loc) · 835 Bytes

File metadata and controls

59 lines (48 loc) · 835 Bytes

(both backward and forward)

  • attention

    • vanilla
    • grouped query
    • multihead
    • multi latent
    • paged attention
    • ring attention
    • vertical slash attention
    • kv_cache
  • activation (including quantized version too)

    • SiLU
    • GeLU
    • GeLU-tanh
    • FATReLU
    • SwiGLU
  • gemm

    • gptq gemm
    • marlin optimized gptq
    • nvfp4_scaled_mm_kernels
    • awq gemm kernels
    • allspark qgemm w8a16
    • cutlass w4a8 gemm
    • gguf gemm kernel
    • int 8 gemm kernel
  • MoE

    • moe_gemm
    • grouped top k
    • top k softmax
    • moe align and sum
    • moe permute
    • marlin moe
    • moe permute unpermute
  • Sparse

    • sparse gemm
    • 2:4 sparsity kernel
  • DLops

    • RMSNorm
    • LayerNorm
    • softmax
    • fused Linear layers with activation
    • RoPE
  • blas

    • HGEMV
    • reduction
      • sum
      • max
    • vector addition