A fast, memory-efficient image processing library implemented in C++ that uses parallel programming on the CPU, with GPU acceleration via CUDA. Bindings generated with nanobind provide a Pythonic interface to the library.
The supported image processing operations are:
- Grayscale Conversion
- Histogram Equalization
- Edge Detection (GPU support)
- Blur
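To make the first two operations concrete, here is a minimal pure-Python sketch of grayscale conversion and histogram equalization on flat lists of 8-bit pixels. It assumes ITU-R BT.601 luma weights (0.299, 0.587, 0.114) and the textbook CDF-based equalization formula. The library's C++ implementation may use different weights, rounding, or channel handling for RGB input, and rgb_to_gray and equalize_gray are hypothetical helpers, not part of the fast_image_processing API.

```python
# Sketch only: assumed textbook formulas, not fast-img-proc's implementation.


def rgb_to_gray(pixels):
    """Convert (r, g, b) tuples to 8-bit gray values using BT.601 weights (assumed).

    Weights are scaled by 1000 and +500 rounds to nearest, so integer
    arithmetic maps pure white exactly to 255.
    """
    return [(299 * r + 587 * g + 114 * b + 500) // 1000 for r, g, b in pixels]


def equalize_gray(gray, levels=256):
    """Textbook CDF-based histogram equalization of 8-bit gray values (hypothetical helper)."""
    if not gray:
        return []
    # 1. Histogram: number of pixels at each intensity
    hist = [0] * levels
    for v in gray:
        hist[v] += 1
    # 2. Cumulative distribution function (running sum of the histogram)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    # 3. Remap so the occupied intensities spread over [0, levels - 1]
    cdf_min = next(c for c in cdf if c > 0)
    total = len(gray)
    if total == cdf_min:  # single-intensity image: nothing to stretch
        return list(gray)
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min))) for c in cdf]
    return [lut[v] for v in gray]


if __name__ == "__main__":
    print(rgb_to_gray([(255, 255, 255), (255, 0, 0), (0, 0, 0)]))  # [255, 76, 0]
    print(equalize_gray([50, 50, 100, 200]))  # [0, 0, 128, 255]
```

Equalization stretches the three occupied levels (50, 100, 200) across the full 0 to 255 range.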
The supported image formats are:
- PNG
- JPG
Supported platforms are:
- Linux
- macOS (coming soon!)
Prerequisites:
- C++ (required): compiler that supports C++20
- CMake 3.18+ (required):
  - Linux: sudo apt-get -y install cmake
  - macOS: brew install cmake
- Python 3.7+ (required):
  - Linux: sudo apt install python3-dev
  - macOS: brew install python
  - [github link]
- TBB (required):
  - Linux: sudo apt-get install libtbb-dev
  - macOS: brew install tbb
  - [github link]
- nanobind (required): included as a git submodule in src/external/nanobind [github link]
- CUDA Toolkit & Driver (optional): NVIDIA Installation guide
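Before configuring, it can help to check some of the prerequisites from Python. The sketch below checks the Python version, the CMake version (by parsing the output of cmake --version), and whether nvcc is on PATH. It does not check for TBB or a C++20 compiler, and check_prerequisites and parse_cmake_version are hypothetical helper names.

```python
import re
import shutil
import subprocess
import sys


def parse_cmake_version(text):
    """Extract (major, minor) from `cmake --version` output, or None if absent."""
    match = re.search(r"cmake version (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_prerequisites():
    """Report which of the listed prerequisites appear to be satisfied (partial check)."""
    results = {"Python 3.7+": sys.version_info >= (3, 7)}
    cmake = shutil.which("cmake")
    version = None
    if cmake is not None:
        out = subprocess.run([cmake, "--version"], capture_output=True, text=True).stdout
        version = parse_cmake_version(out)
    results["CMake 3.18+"] = version is not None and version >= (3, 18)
    # nvcc is only needed for the optional CUDA build (-DUSE_CUDA=ON)
    results["nvcc (optional)"] = shutil.which("nvcc") is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```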
git clone --recurse-submodules git@github.com:pavan1011/fast-img-proc
cd fast-img-proc && mkdir build && cd build
cmake -S ../ -B .
By default, the configure command above translates to:
cmake -DCMAKE_BUILD_TYPE=Release -S ../ -B .
Building inside a Python virtual environment (Linux):
# Install the Python venv module (if not installed)
sudo apt install python3-venv
# Create build directory
mkdir build && cd build
# Create python virtual environment
python3 -m venv /path/to/venv
source /path/to/venv/bin/activate
# virtual environment activated
# To deactivate python venv
deactivate
Building with CUDA (requires the CUDA compiler to be installed):
cmake -S ../ -B . -DUSE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12/bin/nvcc
NOTE: The /usr/local/cuda-12 paths above might be different on your machine. Update them to match your CUDA installation.
cmake --build .
Following the above steps generates fast_image_processing.cpython-<python-version>-<arch>-<platform>.so in /path/to/fast-img-proc/build. To import it from Python, add the build directory to PYTHONPATH:
export PYTHONPATH=$PYTHONPATH:/path/to/build/directory
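The <python-version>-<arch>-<platform> part of the file name is CPython's extension-module suffix, which the interpreter reports as the EXT_SUFFIX config variable in the sysconfig module. A minimal sketch for checking that the module in the build directory was built for the Python you are running; expected_module_filename and find_built_modules are hypothetical helper names, and the build path is a placeholder.

```python
import sysconfig
from pathlib import Path


def expected_module_filename(module="fast_image_processing"):
    """File name CPython expects for an extension module built for this interpreter."""
    # EXT_SUFFIX looks like ".cpython-311-x86_64-linux-gnu.so" on Linux
    return module + sysconfig.get_config_var("EXT_SUFFIX")


def find_built_modules(build_dir, module="fast_image_processing"):
    """Names of candidate CPython extension files for `module` in build_dir."""
    path = Path(build_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.glob(f"{module}.cpython-*.so"))


if __name__ == "__main__":
    build_dir = "/path/to/build/directory"  # placeholder: use your build directory
    expected = expected_module_filename()
    found = find_built_modules(build_dir)
    print("expected:", expected)
    print("found:   ", found)
    if found and expected not in found:
        # A .so built for another interpreter will not import here
        print("Built for a different Python; reconfigure with -DPYTHON_EXECUTABLE")
```

If the only match was built for another interpreter, reconfiguring with -DPYTHON_EXECUTABLE pointing at the intended interpreter is one way to address it.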
CMake options:
- -DCMAKE_BUILD_TYPE: sets the build type:
  - Debug: shows debug, info, warn, and error logs
  - Release: warn and error logs only
- -DUSE_CUDA: optionally enables GPU acceleration for supported image processing algorithms
- -DBUILD_DOCUMENTATION: optionally enables local generation of detailed documentation using Doxygen
- -DCMAKE_CUDA_COMPILER: path to the CUDA compiler. Required if -DUSE_CUDA is set to ON. Usually at /usr/local/cuda-<version>/bin/nvcc
- -DCUDA_TOOLKIT_ROOT_DIR: path to the CUDA toolkit. Required if -DUSE_CUDA is set to ON. Usually at /usr/local/cuda-<version>
- -DPYTHON_EXECUTABLE: provides a hint to the CMake build system to help it find a specific version of Python (for virtual environments and non-default Python installations).
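The options above can also be combined programmatically. This sketch assembles a configure command from them and enforces the rule that both CUDA paths are required when -DUSE_CUDA=ON. cmake_configure_args is a hypothetical helper; -DBUILD_TESTS is taken from the testing instructions in this README.

```python
import shlex
import sys


def cmake_configure_args(build_type="Release", use_cuda=False, cuda_root=None,
                         nvcc=None, python_executable=None, build_tests=False,
                         build_docs=False, source_dir="../", build_dir="."):
    """Assemble a `cmake` configure command (hypothetical helper)."""
    args = ["cmake", f"-DCMAKE_BUILD_TYPE={build_type}"]
    if use_cuda:
        # Both CUDA paths are required when -DUSE_CUDA=ON
        if not cuda_root or not nvcc:
            raise ValueError("-DUSE_CUDA=ON needs CUDA_TOOLKIT_ROOT_DIR and CMAKE_CUDA_COMPILER")
        args += ["-DUSE_CUDA=ON",
                 f"-DCUDA_TOOLKIT_ROOT_DIR={cuda_root}",
                 f"-DCMAKE_CUDA_COMPILER={nvcc}"]
    if python_executable:
        args.append(f"-DPYTHON_EXECUTABLE={python_executable}")
    if build_tests:
        args.append("-DBUILD_TESTS=ON")
    if build_docs:
        args.append("-DBUILD_DOCUMENTATION=ON")
    return args + ["-S", source_dir, "-B", build_dir]


if __name__ == "__main__":
    cmd = cmake_configure_args(
        use_cuda=True,
        cuda_root="/usr/local/cuda-12",  # adjust to your CUDA installation
        nvcc="/usr/local/cuda-12/bin/nvcc",
        python_executable=sys.executable,  # interpreter running this script (e.g. a venv)
    )
    print(" ".join(shlex.quote(a) for a in cmd))
```

Called with no arguments it returns the default command, cmake -DCMAKE_BUILD_TYPE=Release -S ../ -B . Passing sys.executable as python_executable points CMake at the currently active interpreter, which is convenient inside a virtual environment.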
More examples, along with performance profiling, are available in fast-img-proc/scripts.
Below is the basic usage:
import sys
# If you didn't update the PYTHONPATH to point to the build directory,
# append it to sys.path so Python can find the generated .so
sys.path.append('/path/to/your/build/directory')
# fast-img-proc is exposed as fast_image_processing using nanobind
import fast_image_processing as fip
def main():
    # Load an RGB image (PNG or JPG supported)
    # using stbi_load from the stb library
    input_image = fip.Image("input.png")
    # Check if a GPU with CUDA is available
    print(f"GPU Available: {fip.is_gpu_available()}")
    print(f"Active Hardware: {fip.get_active_hardware()}")
    # Examples using automatic hardware selection (default)
    # Convert to grayscale
    auto_grayscale = fip.grayscale(input_image)
    # Save resulting grayscale image
    auto_grayscale.save("grayscale_auto.png")
    # Apply histogram equalization
    auto_equalize_histogram = fip.equalize_histogram(input_image)
    auto_equalize_histogram.save("equalize_histogram_auto.png")
    # Apply Gaussian blur
    auto_blur = fip.blur(input_image)
    auto_blur.save("blur_auto.png")
    # Apply Sobel edge detection
    # Derivative on x-axis, smoothing on y-axis, kernel_size = 5x5
    auto_edge_detect_1_0_5 = fip.edge_detect(input_image, 1, 0, 5)
    auto_edge_detect_1_0_5.save("auto_edge_detect_1_0_5.png")
    # Examples using CPU
    # Convert to grayscale on CPU
    cpu_grayscale = fip.grayscale(input_image, fip.Hardware.CPU)
    cpu_grayscale.save("grayscale_cpu.png")
    # Equalize histogram of an RGB image on CPU
    cpu_hist_equalized_rgb = fip.equalize_histogram(input_image, fip.Hardware.CPU)
    cpu_hist_equalized_rgb.save("hist_equalized_rgb_cpu.png")
    # Equalize histogram of a grayscale image on CPU
    cpu_hist_equalized_gray = fip.equalize_histogram(cpu_grayscale, fip.Hardware.CPU)
    cpu_hist_equalized_gray.save("hist_equalized_gray_cpu.png")
    # Edge detection on CPU
    # Derivative on x-axis, smoothing on y-axis, kernel_size = 5x5
    cpu_edge_1_0_5 = fip.edge_detect(input_image, 1, 0, 5, fip.Hardware.CPU)
    cpu_edge_1_0_5.save("edge_1_0_5_cpu.png")
    # Smoothing on x-axis, derivative on y-axis, kernel_size = 5x5
    cpu_edge_0_1_5 = fip.edge_detect(input_image, 0, 1, 5, fip.Hardware.CPU)
    cpu_edge_0_1_5.save("edge_0_1_5_cpu.png")
    # Derivatives on both x-axis and y-axis, kernel_size = 5x5
    cpu_edge_1_1_5 = fip.edge_detect(input_image, 1, 1, 5, fip.Hardware.CPU)
    cpu_edge_1_1_5.save("edge_1_1_5_cpu.png")
    try:
        # Edge detection on GPU
        # Derivative on x-axis, smoothing on y-axis, kernel_size = 5x5
        gpu_edge_1_0_5 = fip.edge_detect(input_image, 1, 0, 5, fip.Hardware.GPU)
        gpu_edge_1_0_5.save("edge_1_0_5_gpu.png")
        # Smoothing on x-axis, derivative on y-axis, kernel_size = 5x5
        gpu_edge_0_1_5 = fip.edge_detect(input_image, 0, 1, 5, fip.Hardware.GPU)
        gpu_edge_0_1_5.save("edge_0_1_5_gpu.png")
        # Derivatives on both x-axis and y-axis, kernel_size = 5x5
        gpu_edge_1_1_5 = fip.edge_detect(input_image, 1, 1, 5, fip.Hardware.GPU)
        gpu_edge_1_1_5.save("edge_1_1_5_gpu.png")
    except RuntimeError as ex:
        print(f"GPU processing failed: {ex}")
if __name__ == "__main__":
    main()
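The dx, dy and kernel_size arguments to fip.edge_detect follow the usual Sobel parameterization. The sketch below builds the corresponding 2D kernel as an outer product of a 1D binomial smoothing vector and a 1D finite-difference vector, assuming the same construction OpenCV uses for its Sobel kernels. fast-img-proc's kernels, normalization, and border handling may differ, and sobel_kernel is a hypothetical helper.

```python
def convolve_1d(a, b):
    """Full 1D convolution of two integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def kernel_1d(order, ksize):
    """1D Sobel factor: smoothing (order 0) or derivative (order 1 or 2)."""
    if ksize % 2 == 0 or ksize < 3 or not 0 <= order <= 2:
        raise ValueError("ksize must be odd and >= 3; order must be 0, 1 or 2")
    k = [1]
    # Binomial smoothing: repeated convolution with [1, 1]
    for _ in range(ksize - 1 - order):
        k = convolve_1d(k, [1, 1])
    # Each derivative order adds a finite difference [-1, 1]
    for _ in range(order):
        k = convolve_1d(k, [-1, 1])
    return k


def sobel_kernel(dx, dy, ksize):
    """2D kernel[row][col] = ky[row] * kx[col] (x along columns, y along rows)."""
    kx, ky = kernel_1d(dx, ksize), kernel_1d(dy, ksize)
    return [[ky[r] * kx[c] for c in range(ksize)] for r in range(ksize)]


if __name__ == "__main__":
    # dx=1, dy=0, ksize=3: derivative along x, smoothing along y
    for row in sobel_kernel(1, 0, 3):
        print(row)
```

For kernel_size = 5 the two factors are [1, 4, 6, 4, 1] for smoothing and [-1, -2, 0, 2, 1] for the first derivative, so larger kernels smooth more along the non-derivative axis.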
Tests are not built by default. To build and run them locally, follow the instructions below.
Linux:
sudo apt-get install libgtest-dev
macOS:
brew install googletest
Install pytest, then configure and build with tests enabled:
python3 -m pip install pytest
cmake -DBUILD_TESTS=ON .. <other CMake options>
cmake --build . --target cpp_tests
# Run all tests
ctest
# Run with verbose output
ctest -V
ctest -R python # Run only Python tests
Detailed documentation of the source files, including class and member definitions, function signatures, and other implementation details, can be generated locally from this project's sources.
Linux:
sudo apt-get install doxygen graphviz
macOS:
brew install doxygen graphviz
Then configure with documentation enabled and build the docs target:
cd fast-img-proc && mkdir build_docs && cd build_docs
cmake -S ../ -B . <your-build-flags> -DBUILD_DOCUMENTATION=ON
cmake --build . --target docs
This generates the documentation, which can be viewed by opening path/to/build_docs/docs/html/index.html.
Example configuration with all options enabled:
cmake -DCMAKE_BUILD_TYPE=<build-type> -DUSE_CUDA=ON -DPYTHON_EXECUTABLE=<path-to-python> -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12/bin/nvcc -DBUILD_TESTS=ON -DBUILD_DOCUMENTATION=ON -S ../ -B .
cmake --build .
cmake --build . --target docs
Currently, only Sobel edge detection can run on the GPU.
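Because only Sobel edge detection runs on the GPU, a caller may want to try the GPU first and fall back to the CPU. A minimal sketch that uses only the calls appearing in the usage example (is_gpu_available, edge_detect, Hardware.GPU and Hardware.CPU) and the RuntimeError that example catches. edge_detect_prefer_gpu is a hypothetical helper, and the module is passed in as a parameter so the logic can be exercised with a stand-in object.

```python
from types import SimpleNamespace


def edge_detect_prefer_gpu(fip, image, dx, dy, ksize):
    """Run Sobel edge detection on the GPU if possible, otherwise on the CPU.

    Returns (result, hardware_used). Hypothetical helper, not part of the library.
    """
    if fip.is_gpu_available():
        try:
            return fip.edge_detect(image, dx, dy, ksize, fip.Hardware.GPU), fip.Hardware.GPU
        except RuntimeError as ex:
            print(f"GPU processing failed, falling back to CPU: {ex}")
    return fip.edge_detect(image, dx, dy, ksize, fip.Hardware.CPU), fip.Hardware.CPU


if __name__ == "__main__":
    # Stand-in for the real module that simulates a failing GPU path
    def fake_edge_detect(image, dx, dy, ksize, hardware):
        if hardware == "GPU":
            raise RuntimeError("CUDA error (simulated)")
        return f"edges({image}, {dx}, {dy}, {ksize})"

    fake_fip = SimpleNamespace(
        Hardware=SimpleNamespace(CPU="CPU", GPU="GPU"),
        is_gpu_available=lambda: True,
        edge_detect=fake_edge_detect,
    )
    print(edge_detect_prefer_gpu(fake_fip, "img", 1, 0, 5))
```

With the built module, this would be called as edge_detect_prefer_gpu(fip, fip.Image("input.png"), 1, 0, 5).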
The benchmarking script can be run locally from fast-img-proc/scripts/benchmark_edge_detect.py as follows:
cd fast-img-proc
mkdir benchmark_results
python3 ./scripts/benchmark_edge_detect.py <path/to/input_images> benchmark_results
This runs Sobel edge detection, stores the resulting images in benchmark_results, and writes benchmark_results.csv to the current directory.
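This README does not say how benchmark_edge_detect.py times each run. A common approach, sketched here under that caveat, is to discard warm-up calls (which can include one-time costs such as CUDA context creation) and take the median of several timed calls; median_runtime is a hypothetical helper.

```python
import statistics
import time


def median_runtime(fn, *args, repeats=5, warmup=1):
    """Median wall-clock time of fn(*args) in seconds (hypothetical helper).

    Warm-up calls are run first and discarded so one-time setup costs
    do not skew the measurement.
    """
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


if __name__ == "__main__":
    # Placeholder workload; with the built module one could time e.g.
    # median_runtime(fip.edge_detect, img, 1, 0, 5, fip.Hardware.CPU)
    print(f"{median_runtime(sorted, list(range(100_000, 0, -1))):.6f} s")
```

A per-image speedup would then be the CPU median divided by the GPU median for the same call.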
Benchmarking on images of different dimensions (w x h) suggests a 3-5x runtime improvement on the GPU compared with the CPU for larger images (1+ MB).
Mean Speedup on GPU vs CPU by Kernel Size:
| kernel_size | Speedup factor |
|---|---|
| 3 | 3.813 |
| 5 | 3.969 |
| 7 | 4.869 |
Mean Speedup by Image Size:
| Image Dims (w x h) | File Size | Speedup factor |
|---|---|---|
| 960 x 640 | 38 KB | 1.091 |
| 2048 x 2048 | 3.6 MB | 3.340 |
| 2400 x 2400 | 6 MB | 4.060 |
| 3600 x 3600 | 49 KB (gray) | 2.562 |
| 6200 x 6200 | 26.7 MB | 4.264 |
| 9393 x 4270 | 27.3 MB | 4.868 |
| 11472 x 6429 | 93 MB | 4.527 |
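Each speedup factor is a CPU runtime divided by a GPU runtime, averaged per group. A sketch of that aggregation with the csv module follows. The column names (kernel_size, cpu_time, gpu_time) are assumptions and may not match the real benchmark_results.csv, and averaging per-run ratios (rather than taking the ratio of mean runtimes) is also an assumption about how the tables were produced.

```python
import csv
import io
from collections import defaultdict


def mean_speedup_by(rows, key):
    """Mean of cpu_time / gpu_time per distinct value of `key` (assumed column names)."""
    groups = defaultdict(list)
    for row in rows:
        speedup = float(row["cpu_time"]) / float(row["gpu_time"])
        groups[row[key]].append(speedup)
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Hypothetical CSV layout with made-up timings, for illustration only
SAMPLE = """image,kernel_size,cpu_time,gpu_time
a.png,3,0.40,0.10
b.png,3,0.30,0.15
a.png,5,0.50,0.10
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    for kernel_size, speedup in sorted(mean_speedup_by(rows, "kernel_size").items()):
        print(f"kernel_size={kernel_size}: {speedup:.3f}x")
```

Grouping by an image-dimensions column instead of kernel_size would produce the second table's layout.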
The stb library from https://github.com/nothings/stb (MIT and Public Domain licenses) was used to populate fast-img-proc/external/stb.
- stb_image.h: used to load images and represent them as buffers for further processing.
- stb_image_write.h: used to save images after processing.
The nanobind library from https://github.com/wjakob/nanobind (BSD-3-Clause license) was used to generate Pythonic bindings for the fast-img-proc C++ library.
The detailed documentation is generated locally using Doxygen: https://www.doxygen.nl/index.html.
The graphviz library from https://github.com/graphp/graphviz (MIT license) was used to generate dependency diagrams from this project's source files.
The doxygen-awesome-css library from https://github.com/jothepro/doxygen-awesome-css (MIT license) was used for custom styling of this documentation, namely:
- fast-img-proc/docs/doxygen-awesome-sidebar-only.css
- fast-img-proc/docs/doxygen-awesome.css
My special thanks to the authors and contributors of all the above libraries.