operator-fusion

Here are 5 public repositories matching this topic...

AICL-Lab / triton-fused-ops

Fused Triton kernels for Transformer inference: RMSNorm+RoPE, Gated MLP, FP8 GEMM — CPU-testable references, autotuning, and benchmarking

Updated May 25, 2026
Python

maharab549 / ai-accelerator-compiler

A specialized compiler that optimizes deep learning models for AI accelerators with operator fusion, memory optimization, and hardware-specific passes.

machine-learning ai deep-learning compiler optimization accelerator pytorch mlir operator-fusion

Updated Mar 1, 2026
Python

debanjan06 / latency-serve-edge

Native Rust edge inference engine with zero-copy memmap2 tensor loading, register-fused Linear+ReLU kernels, and scenario-aware MoE routing via rayon work-stealing — achieving 352µs lightweight and 1.39ms dense expert execution.

rust high-performance zero-copy parallel-computing systems-programming inference-engine edge-computing memory-mapping mixture-of-experts operator-fusion

Updated May 31, 2026
Rust

stefan-lafon / tensor-morph

TensorMorph is an AI-assisted MLIR compiler for TOSA graph optimization and operator fusion.

compiler graph-optimization mlir operator-fusion ml-compiler tosa tensor-optimization ai-assisted-compilation

Updated Jan 12, 2026
C++

dbhan08 / inferc

C++17 ONNX inference optimizer + CPU runtime for Apple Silicon. Operator fusion via IR passes, Accelerate AMX-backed sgemm; benched vs ONNX Runtime CPU EP on DistilBERT (1.26x baseline speedup, 6.99x ORT on raw MatMul).

machine-learning protobuf compiler cpp transformer accelerate cpp17 amx inference-engine sgemm ml-infrastructure onnx gelu distilbert apple-silicon operator-fusion

Updated May 29, 2026
C++
