fp8

Here are 43 public repositories matching this topic...

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

python machine-learning deep-learning gpu cuda pytorch jax fp8 fp4

Updated Jun 6, 2026
Python

NVIDIA / cudnn-frontend

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

Updated Jun 5, 2026
Python

Azure / MS-AMP

Microsoft Automatic Mixed Precision Library

deep-learning gpu amp pytorch transformer mixed-precision fp8

Updated Dec 1, 2025
Python

intel / neural-speed

An innovative library for efficient LLM inference via low-bit quantization

Updated Aug 30, 2024
C++

aredden / flux-fp8-api

Flux diffusion model implementation that uses quantized FP8 matmuls, with the remaining layers using faster half-precision accumulation; about 2x faster on consumer devices.

flux pytorch quantization diffusion fast-inference fp8

Updated Oct 12, 2024
Python

Sandermage / genesis-vllm-patches

vLLM patcher for Qwen3.6 on consumer NVIDIA — Qwen3.6-35B-A3B-FP8 (192 tok/s, +68% over stock) + Qwen3.6-27B-int4-AutoRound + 256K context. 126 patches: TurboQuant k8v4 KV, MTP/DFlash spec-decode, FULL cudagraph, hybrid GDN streaming, structured boot summary, one-command installer, 1958 tests. v7.72.2.

Updated May 12, 2026
Python

graphcore-research / jax-scalify

JAX Scalify: end-to-end scaled arithmetics

jax low-precision llm fp8

Updated Oct 30, 2024
Python

MerkyorLynn / lynn-engine

Lynn-native LLM inference engine for NVIDIA Blackwell · W4A8/NVFP4 quantization · hand-written CUDA/Triton kernels · MoE · speculative decoding

Updated Jun 4, 2026
Python

tashiscool / fp8-mps-metal

FP8 Metal compute kernels for Apple Silicon MPS — fixing what PyTorch doesn't support yet. FLUX/SD3.5/ComfyUI on Mac.

flux metal pytorch mps quantization apple-silicon fp8 stable-diffusion comfyui m4-pro

Updated Feb 8, 2026
Python

massif-01 / vllm_benchmark_block_fp8

Automated Triton w8a8 block FP8 kernel tuning tool for vLLM. Auto-detects model architecture, supports Qwen3-Coder-30B-A3B-Instruct-FP8/DeepSeek-V3/custom models, multi-GPU parallel tuning, and generates optimized kernel configs for quantization.

triton performance-tuning kernel-tuning fp8 vllm

Updated May 15, 2026
Python

zpqiu / rl-infra-notes

Source-code-level analysis of LLM RL training infrastructure: async RL, weight sync, FP8, MoE routing

reinforcement-learning rl moe distributed-training megatron llm fp8 vllm llm-training async-rl

Updated May 10, 2026
HTML

theogravity / dual-rtx-6000-blackwell-qwen3.6-27b-fp8

Optimized vLLM setup for Qwen3.6-27B-FP8 on dual RTX PRO 6000 Blackwell (192 GB GDDR7, no NVLink): config, benchmark sweep results, and a custom chat template with thinking mode off by default.

benchmark blackwell fp8 vllm local-llm llm-inference speculative-decoding qwen3 multi-token-prediction rtx-pro-6000

Updated May 10, 2026
Shell

MurrellGroup / Microfloats.jl

Narrow precision floating point types

floating-point minifloat microfloat fp6 fp8 fp4

Updated Jun 3, 2026
Julia

stlin256 / CUDABurner

A stress and benchmark utility for NVIDIA GPUs. Measures performance across various precisions (FP64, FP32, TF32, FP16, INT8) and monitors real-time vitals like power, temperature, and clock speeds.

benchmarking performance sparsity cpp hpc cuda nvidia stress-testing fp16 int8 gpu-benchmark fp32 fp64 fp8 fp4 tf32 tf16

Updated Dec 12, 2025
C++

maeddesg / vulkanforge

LLM inference engine for AMD RDNA4 — Rust + Vulkan compute shaders, gguf & native FP8.

rust machine-learning amd vulkan inference mesa llm fp8 gguf rdna4 gfx1201 gemma4

Updated Jun 5, 2026
Rust

klessydra / spike-with-minifloat-fp8-support

Spike, a RISC-V ISA Simulator with added 8-bit vector floating point support

spike riscv minifloat fp8

Updated Sep 12, 2025
C

zerfoo / zerfoo

Pure Go machine learning framework. Train, run, and serve ML models with go build. Zero CGo.

go golang machine-learning deep-learning neural-network transformer fp16 autodiff distributed-training float16 onnx graph-ml ml-framework fp8 float8

Updated May 8, 2026
Go

AICL-Lab / triton-fused-ops

Fused Triton kernels for Transformer inference: RMSNorm+RoPE, Gated MLP, FP8 GEMM — CPU-testable references, autotuning, and benchmarking

Updated May 25, 2026
Python

pathcosmos / EVAFRILL-Mo

Hybrid Mamba-2 + Transformer 2.94B LLM (Nemotron-H style) — Korean 3B model pretrained from scratch on 7× NVIDIA B200 GPUs with SFT + DPO alignment

transformer sft dpo pretraining fp8 korean-llm nemotron hybrid-architecture mamba2 nvidia-b200

Updated Mar 26, 2026
Python

jcartu / qwen36-27b-blackwell-inference-study

Systematic 24-hour benchmark study of Qwen3.6-27B inference on dual NVIDIA RTX PRO 6000 Blackwell SM120 (TP=2). 8 experiments comparing repne/vllm fork vs upstream vLLM across FP8/BF16/NVFP4/Q8_0 quants and MTP/DFlash speculative decoding. Peak: 2,083 tok/s at c=32. Quality: KLD vs BF16 = 0.0018 (noise floor).

benchmark inference blackwell bf16 fp8 vllm qwen speculative-decoding qwen3 nvfp4 rtx-pro-6000

Updated Jun 3, 2026
Python
