Merged
3 changes: 1 addition & 2 deletions .github/workflows/ci.yml
@@ -14,7 +14,7 @@ jobs:
strategy:
matrix:
operating-system: [ubuntu-latest]
python-version: ["3.10"]
python-version: ["3.12"]
fail-fast: false
steps:
- name: Checkout
@@ -30,6 +30,5 @@ jobs:
- name: Install dev requirements
run: |
pip install -r requirements-dev.txt
pip install -r requirements.txt
- name: Run checks
run: pre-commit run --files $(find imread_benchmark -type f)
2 changes: 2 additions & 0 deletions .gitignore
@@ -108,3 +108,5 @@ venv.bak/

.idea/
.ruff_cache/

venvs/
4 changes: 0 additions & 4 deletions .pre-commit-config.yaml
@@ -59,10 +59,6 @@ repos:
hooks:
- id: codespell
additional_dependencies: ["tomli"]
- repo: https://github.com/igorshubovych/markdownlint-cli
rev: v0.43.0
hooks:
- id: markdownlint
- repo: https://github.com/tox-dev/pyproject-fmt
rev: "v2.5.0"
hooks:
189 changes: 118 additions & 71 deletions README.md
@@ -1,101 +1,148 @@
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

# Image Loading Benchmark: From JPG to RGB Numpy Arrays

![Benchmark-2024-06-05](images/2024-06-05.png)

This benchmark evaluates the efficiency of different libraries in loading JPG images and converting them into RGB numpy arrays, essential for neural network training data preparation. Inspired by the [Albumentations library](https://github.com/albumentations-team/albumentations/).
# Image Loading Benchmark
## Overview

This benchmark evaluates the efficiency of different libraries in loading JPG images
and converting them into RGB numpy arrays, essential for neural network training
data preparation. The study compares traditional image processing libraries (Pillow, OpenCV),
machine learning frameworks (TensorFlow, PyTorch), and specialized decoders (jpeg4py, kornia-rs)
across different computing architectures.

<table>
<tr>
<td><img src="images/performance_darwin.png" alt="Darwin Performance" width="400"/></td>
<td><img src="images/performance_linux.png" alt="Linux Performance" width="400"/></td>
</tr>
<tr>
<td align="center">Performance on Apple Silicon (M4 Max)</td>
<td align="center">Performance on Linux (AMD Threadripper)</td>
</tr>
</table>

## Important Note on Image Conversion

In the benchmark, it's crucial to standardize image formats for a fair comparison, despite different default formats used by OpenCV (BGR), torchvision, and TensorFlow (tensors). A conversion step to RGB numpy arrays is included for consistency. Note that in typical use cases, torchvision and TensorFlow do not require this conversion. Preliminary analysis shows that this extra step does not significantly impact the comparative performance of the libraries, ensuring that the benchmark accurately reflects realistic end-to-end image loading and preprocessing times.
In the benchmark, it's crucial to standardize image formats for a fair comparison.
Different libraries use different default formats: OpenCV (BGR), torchvision and
TensorFlow (tensors). A conversion step to RGB numpy arrays is included for
consistency. Note that in typical use cases, torchvision and TensorFlow do not
require this conversion.
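
As a rough illustration of the conversion step described above, the sketch below reverses the channel axis of a BGR array (the layout OpenCV returns) using plain NumPy. The helper name `bgr_to_rgb` is hypothetical and is not taken from the benchmark code; the benchmark itself may perform the conversion differently.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Return an RGB copy of an H x W x 3 array stored in BGR channel order."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    # Reversing the last axis swaps the B and R channels. ascontiguousarray
    # materializes the reversed view so callers get an ordinary C-ordered array.
    return np.ascontiguousarray(image[:, :, ::-1])


# A single pure-blue pixel in BGR order becomes (0, 0, 255) in RGB order.
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb_pixel = bgr_to_rgb(bgr_pixel)
```

The copy is part of the measured cost in a benchmark like this one, which is why the note above points out that torchvision and TensorFlow users normally skip it.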

## Installation and Setup

Before running the benchmark, ensure your system is equipped with the necessary dependencies. Start by installing `libturbojpeg`:
Before running the benchmark, ensure your system is equipped with the necessary
dependencies:

### System Requirements

```bash
# On Ubuntu/Debian
sudo apt-get install libturbojpeg
```

Next, install all required Python libraries listed in `requirements.txt`:
### Python Setup

The benchmark uses separate virtual environments for each library to avoid
dependency conflicts. You'll need:

```bash
pip install -r requirements.txt
# Install uv for faster package installation
pip install uv
```

Note: If you want to update package versions in `requirements.txt`
## Running the Benchmark

The benchmark script creates separate virtual environments for each library and
runs tests independently:

```bash
pip install pip-tools
```
# Make the script executable
chmod +x run_benchmarks.sh

```bash
pip-compile requirements.in
```
this will create new `requirements.txt` file
# Show help and options
./run_benchmarks.sh --help

```bash
pip install -r requirements.txt
# Run benchmark with default settings (2000 images, 5 runs)
./run_benchmarks.sh /path/to/images

# Run with custom settings
./run_benchmarks.sh /path/to/images 1000 3
```
to install latest versions

## Running the Benchmark
The script will:

To understand the benchmark's configuration options and run it according to your setup, use the following commands:
1. Create separate virtual environments for each library
2. Install required dependencies using `uv`
3. Run benchmarks independently
4. Save results to OS-specific directories
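
The first two steps can be sketched in Python as follows. This is not the content of `run_benchmarks.sh`, which is not shown here; the function name `plan_library_setup`, the `venvs/` root, and the choice of one package per environment are assumptions made for illustration. The sketch only builds the `uv` command lines and does not execute them.

```python
from pathlib import Path


def plan_library_setup(library: str, venv_root: Path) -> list[list[str]]:
    """Build (but do not run) the commands that prepare one library's environment."""
    venv_dir = venv_root / library
    # POSIX layout is assumed; on Windows the interpreter lives under Scripts/.
    python_bin = venv_dir / "bin" / "python"
    return [
        # Create an isolated environment for this library only.
        ["uv", "venv", str(venv_dir)],
        # Install the library into that environment rather than the active one.
        ["uv", "pip", "install", "--python", str(python_bin), library],
    ]


commands = plan_library_setup("pillow", Path("venvs"))
for command in commands:
    print(" ".join(command))
```

Keeping each library in its own environment means that conflicting pins (for example, different NumPy requirements) cannot affect one another's results.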

```bash
python imread_benchmark/benchmark.py -h

usage: benchmark.py [-h] [-d DIR] [-n N] [-r N] [--show-std] [-m] [-p] [-s] [-o OUTPUT_PATH]

Image reading libraries performance benchmark

options:
-h, --help show this help message and exit
-d DIR, --data-dir DIR
path to a directory with images
-n N, --num_images N number of images for benchmarking (default: 2000)
-r N, --num_runs N number of runs for each benchmark (default: 5)
--show-std show standard deviation for benchmark runs
-m, --markdown print benchmarking results as a markdown table
-p, --print-package-versions
print versions of packages
-s, --shuffle Shuffle the list of images.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path to save resulting dataframe.
```
### Results Structure

Results are saved in JSON format under:

```bash
python imread_benchmark/benchmark.py \
--data-dir <path to image folder> \
--num_images <num_images> \
--num_runs <number of runs> \
--show-std \
--print-package-versions
```
```text
output/
├── linux/ # When run on Linux
│ ├── opencv_results.json
│ ├── pil_results.json
│ └── ...
└── darwin/ # When run on macOS
├── opencv_results.json
├── pil_results.json
└── ...
```
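
A minimal sketch of reading these files back, assuming the directory name matches `platform.system().lower()` and that each file holds a single JSON document. The function name `load_results` and the placeholder payload in the demo are hypothetical; the actual schema of the result files is not described here.

```python
from __future__ import annotations

import json
import platform
import tempfile
from pathlib import Path


def load_results(output_root: Path, os_name: str | None = None) -> dict[str, object]:
    """Load every `*_results.json` file for one operating system."""
    # platform.system() returns e.g. "Linux" or "Darwin"; lowercased, it matches
    # the directory names shown in the tree above (an assumption of this sketch).
    os_dir = output_root / (os_name or platform.system().lower())
    results: dict[str, object] = {}
    for path in sorted(os_dir.glob("*_results.json")):
        library = path.name.removesuffix("_results.json")
        with path.open(encoding="utf-8") as handle:
            results[library] = json.load(handle)
    return results


# Demo against a throwaway directory with a placeholder payload.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "linux"
    demo_dir.mkdir()
    (demo_dir / "opencv_results.json").write_text('{"placeholder": true}', encoding="utf-8")
    loaded = load_results(Path(tmp), os_name="linux")
```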

Extra options:
`--print-package-versions` - to print the versions of the benchmarked libraries
`--shuffle` - to shuffle images on every run
`--show-std` - to show standard deviation for measurements
## Libraries Being Benchmarked

Each library uses different underlying JPEG decoders and implementation approaches:

### Direct libjpeg-turbo Users (Fastest)
- jpeg4py (Linux only) - Direct libjpeg-turbo binding
- kornia-rs - Modern Rust-based implementation
- OpenCV (opencv-python-headless)
- torchvision

### Standard libjpeg Users
- PIL (Pillow)
- Pillow-SIMD (Linux only)
- scikit-image
- imageio

### Machine Learning Framework Components
- tensorflow
- torchvision
- kornia-rs


## Performance Considerations

Several factors influence real-world performance beyond raw decoding speed:

### Memory Usage
- Memory utilization varies significantly across libraries
- Some implementations (like kornia-rs) have specific memory allocation optimizations
- Consider available system resources when scaling to batch processing

### System Integration
- All benchmarks performed on NVMe SSDs to minimize I/O variance
- Single-threaded performance reported
- Multi-threading capabilities vary between libraries
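
For reference, a single-threaded throughput figure with its run-to-run spread can be computed along these lines. This is a sketch, not the benchmark's own timing code: `time_runs` and `summarize` are hypothetical names, and `load` stands in for whichever library call is being measured.

```python
import statistics
import time
from typing import Callable, Sequence


def time_runs(load: Callable[[str], object], paths: Sequence[str], num_runs: int = 5) -> list[float]:
    """Time `num_runs` sequential passes over `paths`; return images/sec per run."""
    rates = []
    for _ in range(num_runs):
        start = time.perf_counter()
        for path in paths:
            load(path)  # one image at a time, no worker threads
        elapsed = time.perf_counter() - start
        rates.append(len(paths) / elapsed)
    return rates


def summarize(rates: Sequence[float]) -> tuple[float, float]:
    """Return the mean and sample standard deviation (needs at least two runs)."""
    return statistics.mean(rates), statistics.stdev(rates)


# Two made-up runs at 100 and 50 images/sec, used only to show the arithmetic.
mean_rate, std_rate = summarize([100.0, 50.0])
```

Reporting the spread alongside the mean helps distinguish real differences between libraries from noise between runs.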

### Image Characteristics
- Results based on typical ImageNet JPEG images (~500x400 pixels)
- Performance scaling with image size varies between implementations
- Compression ratio and JPEG encoding parameters can influence decoding speed

## Hardware and Software Specifications
## Recommendations

**CPU**: AMD Ryzen Threadripper 3970X 32-Core Processor
### High-Performance Applications
- Use kornia-rs or OpenCV for consistent cross-platform performance
- On Linux, consider jpeg4py for maximum performance
- Consider memory usage if processing many images simultaneously

## Results
### Cross-Platform Development
- kornia-rs provides the most consistent performance
- OpenCV and torchvision offer a good balance of features and speed
- Test with representative image sizes and batching patterns

| | Library | Version | Performance (images/sec) |
|---:|:-----------------------|:----------|:---------------------------|
| 0 | scikit-image | 0.23.2 | 538.48 ± 6.86 |
| 1 | imageio | 2.34.1 | 538.58 ± 6.84 |
| 2 | opencv-python-headless | 4.10.0.82 | 631.46 ± 0.43 |
| 3 | pillow | 10.3.0 | 589.56 ± 8.79 |
| 4 | jpeg4py | 0.1.4 | 700.60 ± 0.88 |
| 5 | torchvision | 0.18.1 | 658.68 ± 0.78 |
| 6 | tensorflow | 2.16.1 | 704.43 ± 1.10 |
| 7 | kornia-rs | 0.1.1 | 682.95 ± 1.21 |
### Feature-Rich Applications
- When needing extensive image processing features, OpenCV remains a strong choice
- Consider dependency size and installation complexity
- Evaluate the full image processing pipeline, not just JPEG decoding
Binary file removed images/2024-02-26.png
Binary file removed images/2024-03-11.png
Binary file removed images/2024-06-05.png
Binary file added images/performance_darwin.png
Binary file added images/performance_linux.png