Pure PyTorch + 🤗 Transformers reimplementation of Megalodon (CEMA + chunked attention): readable, hackable, and requires no custom CUDA kernels
pytorch rope ema pytorch-implementation linear-attention efficient-transformers llm sub-quadratic-attention long-context-modeling streaming-inference complex-ema
Updated Jan 12, 2026 - Python