This repository contains benchmark and accuracy tests for vkdispatch. It is intended to evaluate vkdispatch itself for paper-quality comparisons and repeatable measurements.
The primary user-facing entrypoint for running benchmarks is run_all_tests.py.
Required tools: Python 3, git, wget, tar, and bash
Required Python packages: numpy and matplotlib
vkdispatch can be built from source or installed from PyPI:
pip install vkdispatch
This package includes the core vkdispatch library and the Vulkan backend. The OpenCL and CUDA backends can be optionally enabled by installing the pyopencl and cuda-python packages, respectively.
CUDA benchmark helper binaries and cuFFT/cuFFTDx comparisons require a CUDA toolkit installation (version 12 or higher) that provides nvcc. If CUDA_HOME is set, run_all_tests.py checks CUDA_HOME/bin/nvcc first; otherwise it uses nvcc from PATH.
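The lookup order described above can be sketched as follows. This is a minimal illustration, not the code in run_all_tests.py: the function name find_nvcc is hypothetical, and falling back to PATH when CUDA_HOME is set but contains no usable nvcc is an assumption.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def find_nvcc(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a path to nvcc, or None if it cannot be found (POSIX naming)."""
    env = os.environ if env is None else env

    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        candidate = Path(cuda_home) / "bin" / "nvcc"
        # Assumption: a CUDA_HOME without an executable nvcc falls through to PATH.
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)

    # shutil.which searches the given PATH string for an executable named nvcc.
    return shutil.which("nvcc", path=env.get("PATH"))


if __name__ == "__main__":
    print(find_nvcc() or "nvcc not found")
```

Running a check like this before a CUDA benchmark run can show which toolkit would be picked up when several are installed.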
CUDA accuracy helpers and cuFFTDx benchmarks rely on downloaded NVIDIA dependencies in dependencies/. These are automatically downloaded on the first run of run_all_tests.py and pinned to specific revisions for repeatability:
- NVIDIA MathDx version 25.06.1
- cutlass pinned to commit e6e2cc29f5e7611dfc6af0ed6409209df0068cf2
- CUDALibrarySamples pinned to commit a94482ebecf8b16d5b83ab276b7db3a84979f0e5 (used by the tests/conv_scaled_nvidia/ test suite)
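To confirm that local copies match these pins, a check along the following lines may help. It is a sketch under assumptions: it presumes each dependency is its own git checkout at dependencies/<name>/ (the directory names are hypothetical, and a dependency unpacked from an archive would have no git metadata to query). MathDx is identified by a release version rather than a commit, so it is not covered.

```python
import subprocess
from pathlib import Path

# Commit hashes quoted in the list above; the directory names are assumptions.
PINNED_COMMITS = {
    "cutlass": "e6e2cc29f5e7611dfc6af0ed6409209df0068cf2",
    "CUDALibrarySamples": "a94482ebecf8b16d5b83ab276b7db3a84979f0e5",
}


def head_commit(repo_dir):
    """Return the full hash of HEAD for the git checkout at repo_dir."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def find_mismatches(actual, pinned=PINNED_COMMITS):
    """Map each repo whose commit differs from its pin to (expected, found)."""
    return {
        name: (expected, actual.get(name))
        for name, expected in pinned.items()
        if actual.get(name) != expected
    }


if __name__ == "__main__":
    deps = Path("dependencies")  # assumed layout: dependencies/<name>/
    actual = {
        name: head_commit(deps / name)
        for name in PINNED_COMMITS
        if (deps / name).is_dir()
    }
    for name, (expected, found) in find_mismatches(actual).items():
        print(f"{name}: expected {expected}, found {found}")
```

A dependency that is missing locally is reported with a found value of None.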
run_all_tests.py is the main entrypoint for benchmark and accuracy generation. It accepts flags to specify which backends to test and whether to run validation-only paths for CUDA benchmarks. Some example invocations:
# For all 3 backends
python3 run_all_tests.py --vulkan --opencl --cuda
# For Vulkan only (this is the only backend that includes VkFFT)
python3 run_all_tests.py --vulkan
# For OpenCL only
python3 run_all_tests.py --opencl
# For CUDA only, with normal benchmarks
python3 run_all_tests.py --cuda
There is also a validation-only path for CUDA benchmarks. Instead of running the normal throughput benchmarks, it compares the cuFFTDx fused convolution outputs against the standard cuFFT + pointwise reference. To run this path:
python3 run_all_tests.py --validate --cuda
Current built-in benchmark settings in run_all_tests.py:
DATA_SIZE = 2**27
ITER_COUNT = 200
BATCH_SIZE = 20
REPEATS = 5
Notes about first-time setup:
- Raw benchmark outputs are first written into tests/<suite>/test_results/, then copied and merged into the top-level test_results/ tree
- test_results/ is created locally when the benchmarks run; it does not exist in the public repository until you generate results
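The copy-and-merge step can be pictured with the sketch below. It is illustrative only: merge_results is a hypothetical name, and the assumption that suite outputs are merged directly into one tree (with same-named files overwritten) may not match how run_all_tests.py arranges the final layout.

```python
import shutil
from pathlib import Path


def merge_results(repo_root):
    """Copy every tests/<suite>/test_results/ tree into <repo_root>/test_results/."""
    root = Path(repo_root)
    merged = root / "test_results"
    copied = []
    for suite_results in sorted(root.glob("tests/*/test_results")):
        if suite_results.is_dir():
            # dirs_exist_ok=True (Python 3.8+) merges into an existing tree;
            # files with the same relative path are overwritten.
            shutil.copytree(suite_results, merged, dirs_exist_ok=True)
            copied.append(suite_results.parent.name)
    return copied  # names of the suites whose results were merged
```

Calling merge_results(".") from the repository root returns the suite names it found, which makes it easy to spot a suite that produced no output.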
To remove generated result directories before a fresh run:
bash clean_results.sh
This removes the top-level test_results/ directory and the suite-local tests/*/test_results/ directories listed in the script.
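Before cleaning, it can be useful to see which result directories currently exist. The sketch below only lists them and deletes nothing. It is not part of the repository: result_dirs is a hypothetical helper, and it matches every tests/*/test_results/ directory, whereas clean_results.sh removes only the directories it lists explicitly.

```python
from pathlib import Path


def result_dirs(repo_root):
    """List existing result directories under repo_root, top-level first."""
    root = Path(repo_root)
    candidates = [root / "test_results", *sorted(root.glob("tests/*/test_results"))]
    return [path for path in candidates if path.is_dir()]


if __name__ == "__main__":
    # Dry run: print the directories, leave them untouched.
    for path in result_dirs("."):
        print(path)
```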