A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
PyPI: https://pypi.org/project/nvfuser
nvFuser provides pre-built wheels for Python 3.10 and 3.12, available through multiple channels depending on your PyTorch version requirements.
Nightly nvFuser wheels are built against PyTorch:nightly and published to
https://pypi.nvidia.com:
pip install --pre nvfuser-cuXXY --extra-index-url https://pypi.nvidia.com
Note
nvFuser supports CUDA 12.6+. cuXXY denotes the CUDA major version XX and minor
version Y. For example, with CUDA 12.8, use nvfuser-cu128.
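The cuXXY naming rule can be sketched as a small helper. The function name `nvfuser_package_name` is hypothetical and not part of nvFuser; it only formats the package name as described in the note and rejects CUDA versions older than 12.6.

```python
def nvfuser_package_name(cuda_version: str) -> str:
    """Map a CUDA version such as "12.8" to a wheel name (hypothetical helper)."""
    parts = cuda_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    # The note above states that nvFuser supports CUDA 12.6 and newer.
    if (major, minor) < (12, 6):
        raise ValueError(f"nvFuser supports CUDA 12.6+, got {cuda_version}")
    return f"nvfuser-cu{major}{minor}"


if __name__ == "__main__":
    print(nvfuser_package_name("12.8"))
```

The result can be passed to the pip command in place of nvfuser-cuXXY.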
To install nvFuser with a compatible PyTorch nightly build:
pip install --pre "nvfuser-cu128[torch]" --extra-index-url https://pypi.nvidia.com
Warning
Installing with the [torch] extra will replace your existing PyTorch
installation with a compatible nightly build.
Stable wheels are built against PyTorch stable releases and published to both
https://pypi.org and https://pypi.nvidia.com. Select the package matching your
CUDA Toolkit version:
pip install nvfuser-cu128-torch29
Releases are published on the 1st and 15th of each month, and when significant changes are introduced. For legacy versions, see PyPI.
Recommendation: Use the latest nvFuser build with the most recent CUDA Toolkit and PyTorch versions for optimal performance and features.
Important
Stable nvFuser release wheels are not guaranteed to be compatible with PyTorch nightly builds. Select the appropriate package for your environment.
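One way to decide which package family applies is to check whether the installed PyTorch is a nightly build. The sketch below is illustrative only: `is_nightly_version` is a hypothetical helper, and it assumes that nightly PyTorch version strings carry a `.devYYYYMMDD` segment (for example `2.10.0.dev20250101+cu128`).

```python
import re


def is_nightly_version(version: str) -> bool:
    """Return True when a version string looks like a nightly (dev) build."""
    # Assumption: nightly builds contain ".dev" followed by a date stamp.
    return re.search(r"\.dev\d+", version) is not None


if __name__ == "__main__":
    # In a real environment you would pass torch.__version__ instead.
    print(is_nightly_version("2.9.0+cu128"))
    print(is_nightly_version("2.10.0.dev20250101+cu128"))
```

If the check reports a nightly build, the nightly nvFuser wheel is the safer choice; otherwise pick the stable wheel matching your PyTorch release.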
- C++20 compliant compiler: GCC >= 13.1 or Clang >= 19
- Python >= 3.10
- CMake >= 3.18
- Ninja
- CUDA Toolkit >= 12.6 (recommend 12.8+)
- PyTorch >= 2.9 (recommend latest stable/nightly release)
- pybind11 >= 3.0
- LLVM >= 18.1
Note
- PyTorch MUST be built with CUDA support.
- The PyTorch CUDA version MUST match the CUDA Toolkit version.
nvidia-matmul-heuristics (enhanced matmul scheduling)
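The two PyTorch/CUDA requirements can be checked before building. This is a minimal sketch, not part of nvFuser: both helper names are hypothetical, and "match" is assumed to mean equal major and minor versions. It relies on `torch.version.cuda` being `None` for CPU-only PyTorch builds and on `nvcc --version` printing a line such as `release 12.8, V12.8.93`.

```python
import re
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract "MAJOR.MINOR" from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_versions_match(torch_cuda: Optional[str], toolkit_cuda: Optional[str]) -> bool:
    """True only if PyTorch was built with CUDA and matches the toolkit version."""
    if torch_cuda is None or toolkit_cuda is None:
        return False  # CPU-only PyTorch, or no CUDA Toolkit found
    # Assumption: compare major.minor only.
    return torch_cuda.split(".")[:2] == toolkit_cuda.split(".")[:2]


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 12.8, V12.8.93"
    print(cuda_versions_match("12.8", parse_nvcc_release(sample)))
```

In a real environment, pass `torch.version.cuda` as the first argument and the output of `nvcc --version` (for example via `subprocess.run`) to `parse_nvcc_release`.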
- Clone the repository and initialize submodules:
  git clone --recursive https://github.com/NVIDIA/Fuser.git
  cd Fuser
  If you already cloned without --recursive, initialize submodules:
  git submodule update --init --recursive
- Install Python dependencies:
  pip install -r requirements.txt
- Build and install nvFuser:
  pip install --no-build-isolation -e python -v
The build system will automatically validate all dependencies and provide helpful error messages if anything is missing.
You can customize the build using environment variables:
Build Configuration:
- MAX_JOBS=<n> - Control compilation parallelism (e.g., MAX_JOBS=8).
- NVFUSER_BUILD_BUILD_TYPE - Build in Debug, RelWithDebInfo, or Release mode.
- NVFUSER_BUILD_DIR=<path> - Custom build directory (default: ./python/build).
- NVFUSER_BUILD_INSTALL_DIR=<path> - Custom install directory (default: ./nvfuser).
Build Targets:
- NVFUSER_BUILD_NO_PYTHON=1 - Skip Python bindings.
- NVFUSER_BUILD_NO_TEST=1 - Skip C++ tests.
- NVFUSER_BUILD_NO_BENCHMARK=1 - Skip benchmarks.
Advanced Options:
- NVFUSER_BUILD_WITH_UCC=1 - Enable UCC support for multi-device operations.
- NVFUSER_BUILD_WITHOUT_DISTRIBUTED=1 - Build without multi-device support.
- NVFUSER_BUILD_CPP_STANDARD=<n> - Specify C++ standard (default: 20).
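If you drive the build from a script, the variables listed above can be assembled programmatically. The following is a sketch under stated assumptions: `build_invocation` is a hypothetical helper, it covers only a few of the variables, and it reuses the editable install command from the build steps.

```python
import os


def build_invocation(max_jobs=None, build_type=None, no_test=False, no_benchmark=False):
    """Return (env, argv) for the editable install (hypothetical helper)."""
    env = dict(os.environ)  # start from the current environment
    if max_jobs is not None:
        env["MAX_JOBS"] = str(max_jobs)
    if build_type is not None:
        if build_type not in ("Debug", "RelWithDebInfo", "Release"):
            raise ValueError(f"unsupported build type: {build_type}")
        env["NVFUSER_BUILD_BUILD_TYPE"] = build_type
    if no_test:
        env["NVFUSER_BUILD_NO_TEST"] = "1"
    if no_benchmark:
        env["NVFUSER_BUILD_NO_BENCHMARK"] = "1"
    argv = ["pip", "install", "--no-build-isolation", "-e", "python", "-v"]
    return env, argv


if __name__ == "__main__":
    env, argv = build_invocation(max_jobs=8, build_type="Debug")
    print(" ".join(argv))
```

To run the build, pass the result to `subprocess.run(argv, env=env, check=True)` from the repository root.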
Example with custom options:
MAX_JOBS=8 NVFUSER_BUILD_BUILD_TYPE=Debug pip install --no-build-isolation -e python -v
Test your installation with a simple import:
python -c "import nvfuser; print('nvFuser successfully imported from:', nvfuser.__file__)"
Run the Python test suite:
pytest tests/python/
Run C++ tests (if built):
./build/bin/test_nvfuser
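The import check can also be done without loading the module, which is useful in setup scripts that should report a missing install rather than crash. This sketch uses the standard library's `importlib.util.find_spec`; the helper name `find_module_origin` is hypothetical.

```python
import importlib.util
from typing import Optional


def find_module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would load from, or None if not installed."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    origin = find_module_origin("nvfuser")
    if origin is None:
        print("nvfuser is not installed in this environment")
    else:
        print("nvfuser found at:", origin)
```

This only confirms that the package is discoverable; the import command and the test suites remain the real check that the build works.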