ternaus · Jan 22, 2025 · Jan 21, 2025
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -14,7 +14,7 @@ jobs:
     strategy:
       matrix:
         operating-system: [ubuntu-latest]
-        python-version: ["3.10"]
+        python-version: ["3.12"]
       fail-fast: false
     steps:
     - name: Checkout
@@ -30,6 +30,5 @@ jobs:
     - name: Install dev requirements
       run: |
         pip install -r requirements-dev.txt
-        pip install -r requirements.txt
     - name: Run checks
       run: pre-commit run --files $(find imread_benchmark -type f)
diff --git a/.gitignore b/.gitignore
@@ -108,3 +108,5 @@ venv.bak/
 
 .idea/
 .ruff_cache/
+
+venvs/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -59,10 +59,6 @@ repos:
     hooks:
       - id: codespell
         additional_dependencies: ["tomli"]
-  - repo: https://github.com/igorshubovych/markdownlint-cli
-    rev: v0.43.0
-    hooks:
-      - id: markdownlint
   - repo: https://github.com/tox-dev/pyproject-fmt
     rev: "v2.5.0"
     hooks:

diff --git a/README.md b/README.md
@@ -1,101 +1,148 @@
-[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
-[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
-
-# Image Loading Benchmark: From JPG to RGB Numpy Arrays
-
-![Benchmark-2024-06-05](images/2024-06-05.png)
-
-This benchmark evaluates the efficiency of different libraries in loading JPG images and converting them into RGB numpy arrays, essential for neural network training data preparation. Inspired by the [Albumentations library](https://github.com/albumentations-team/albumentations/).
+# Image Loading Benchmark
+## Overview
+
+This benchmark evaluates the efficiency of different libraries in loading JPG images
+and converting them into RGB numpy arrays, essential for neural network training
+data preparation. The study compares traditional image processing libraries (Pillow, OpenCV),
+machine learning frameworks (TensorFlow, PyTorch), and specialized decoders (jpeg4py, kornia-rs)
+across different computing architectures.
+
+<table>
+  <tr>
+    <td><img src="images/performance_darwin.png" alt="Darwin Performance" width="400"/></td>
+    <td><img src="images/performance_linux.png" alt="Linux Performance" width="400"/></td>
+  </tr>
+  <tr>
+    <td align="center">Performance on Apple Silicon (M4 Max)</td>
+    <td align="center">Performance on Linux (AMD Threadripper)</td>
+  </tr>
+</table>
 
 ## Important Note on Image Conversion
 
-In the benchmark, it's crucial to standardize image formats for a fair comparison, despite different default formats used by OpenCV (BGR), torchvision, and TensorFlow (tensors). A conversion step to RGB numpy arrays is included for consistency. Note that in typical use cases, torchvision and TensorFlow do not require this conversion. Preliminary analysis shows that this extra step does not significantly impact the comparative performance of the libraries, ensuring that the benchmark accurately reflects realistic end-to-end image loading and preprocessing times.
+To compare libraries fairly, the benchmark standardizes the output format.
+Default formats differ across libraries: OpenCV returns BGR arrays, while
+torchvision and TensorFlow return tensors. Every loader therefore includes a
+conversion step to an RGB numpy array. Note that in typical use cases,
+torchvision and TensorFlow do not require this conversion.
 
 ## Installation and Setup
 
-Before running the benchmark, ensure your system is equipped with the necessary dependencies. Start by installing `libturbojpeg`:
+Before running the benchmark, ensure your system is equipped with the necessary
+dependencies:
+
+### System Requirements
 
 ```bash
+# On Ubuntu/Debian
 sudo apt-get install libturbojpeg
 ```
 
-Next, install all required Python libraries listed in `requirements.txt`:
+### Python Setup
+
+The benchmark uses separate virtual environments for each library to avoid
+dependency conflicts. You'll need:
 
 ```bash
-sudo apt install requirements.txt
+# Install uv for faster package installation
+pip install uv
 ```
 
-Note: If you want to update package versions in `requirements.txt`
+## Running the Benchmark
+
+The benchmark script creates separate virtual environments for each library and
+runs tests independently:
 
 ```bash
-pip install pip-tools
-```
+# Make the script executable
+chmod +x run_benchmarks.sh
 
-```bash
-pip-compile requirements.in
-```
-this will create new `requirements.txt` file
+# Show help and options
+./run_benchmarks.sh --help
 
-```bash
-pip install -r requirements.txt
+# Run benchmark with default settings (2000 images, 5 runs)
+./run_benchmarks.sh /path/to/images
+
+# Run with custom settings
+./run_benchmarks.sh /path/to/images 1000 3
 ```
-to install latest versions
 
-## Running the Benchmark
+The script will:
 
-To understand the benchmark's configuration options and run it according to your setup, use the following commands:
+1. Create separate virtual environments for each library
+2. Install required dependencies using `uv`
+3. Run benchmarks independently
+4. Save results to OS-specific directories
 
-```bash
-python imread_benchmark/benchmark.py -h
-
-usage: benchmark.py [-h] [-d DIR] [-n N] [-r N] [--show-std] [-m] [-p] [-s] [-o OUTPUT_PATH]
-
-Image reading libraries performance benchmark
-
-options:
-  -h, --help            show this help message and exit
-  -d DIR, --data-dir DIR
-                        path to a directory with images
-  -n N, --num_images N  number of images for benchmarking (default: 2000)
-  -r N, --num_runs N    number of runs for each benchmark (default: 5)
-  --show-std            show standard deviation for benchmark runs
-  -m, --markdown        print benchmarking results as a markdown table
-  -p, --print-package-versions
-                        print versions of packages
-  -s, --shuffle         Shuffle the list of images.
-  -o OUTPUT_PATH, --output_path OUTPUT_PATH
-                        Path to save resulting dataframe.
-```
+### Results Structure
 
+Results are saved in JSON format under:
 
-```bash
-python imread_benchmark/benchmark.py \
-    --data-dir <path to image folder> \
-    --num_images <num_images> \
-    --num_runs <number of runs> \
-    --show-std \
-    --print-package-versions \
-    --print-package-versions
+```text
+output/
+├── linux/          # When run on Linux
+│   ├── opencv_results.json
+│   ├── pil_results.json
+│   └── ...
+└── darwin/         # When run on macOS
+    ├── opencv_results.json
+    ├── pil_results.json
+    └── ...
 ```
 
-Extra options:
-`--print-package-versions` - to print benchmarked libraries versions
-`--print-package-versions` - to shuffle images on every run
-`--show-std` - to show standard deviation for measurements
+## Libraries Being Benchmarked
+
+Each library uses different underlying JPEG decoders and implementation approaches:
+
+### Direct libjpeg-turbo Users (Fastest)
+- jpeg4py (Linux only) - Direct libjpeg-turbo binding
+- kornia-rs - Modern Rust-based implementation
+- OpenCV (opencv-python-headless)
+- torchvision
+
+### Standard libjpeg Users
+- PIL (Pillow)
+- Pillow-SIMD (Linux only)
+- scikit-image
+- imageio
+
+### Machine Learning Framework Components
+- tensorflow
+- torchvision
+- kornia-rs
+
+
+## Performance Considerations
+
+Several factors influence real-world performance beyond raw decoding speed:
+
+### Memory Usage
+- Memory utilization varies significantly across libraries
+- Some implementations (like kornia-rs) have specific memory allocation optimizations
+- Consider available system resources when scaling to batch processing
+
+### System Integration
+- All benchmarks were performed on NVMe SSDs to minimize I/O variance
+- Only single-threaded performance is reported
+- Multi-threading capabilities vary between libraries
+
+### Image Characteristics
+- Results based on typical ImageNet JPEG images (~500x400 pixels)
+- Performance scaling with image size varies between implementations
+- Compression ratio and JPEG encoding parameters can influence decoding speed
 
-## Hardware and Software Specifications
+## Recommendations
 
-**CPU**: AMD Ryzen Threadripper 3970X 32-Core Processor
+### High-Performance Applications
+- Use kornia-rs or OpenCV for consistent cross-platform performance
+- On Linux, consider jpeg4py for maximum performance
+- Consider memory usage if processing many images simultaneously
 
-## Results
+### Cross-Platform Development
+- kornia-rs provides the most consistent performance
+- OpenCV and torchvision offer a good balance of features and speed
+- Test with representative image sizes and batching patterns
 
-|    | Library                | Version   | Performance (images/sec)   |
-|---:|:-----------------------|:----------|:---------------------------|
-|  0 | scikit-image           | 0.23.2    | 538.48 ± 6.86              |
-|  1 | imageio                | 2.34.1    | 538.58 ± 6.84              |
-|  2 | opencv-python-headless | 4.10.0.82 | 631.46 ± 0.43              |
-|  3 | pillow                 | 10.3.0    | 589.56 ± 8.79              |
-|  4 | jpeg4py                | 0.1.4     | 700.60 ± 0.88              |
-|  5 | torchvision            | 0.18.1    | 658.68 ± 0.78              |
-|  6 | tensorflow             | 2.16.1    | 704.43 ± 1.10              |
-|  7 | kornia-rs              | 0.1.1     | 682.95 ± 1.21              |
+### Feature-Rich Applications
+- When you need extensive image processing features, OpenCV remains a strong choice
+- Consider dependency size and installation complexity
+- Evaluate the full image processing pipeline, not just JPEG decoding
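The results layout described in the README (`output/<os>/<library>_results.json`) can be read back with a few lines of Python. This is a minimal sketch and not code from the repository: the helper names `results_dir` and `load_results` are hypothetical, it assumes the OS folder name matches `platform.system().lower()` (which gives `linux` and `darwin` on those systems), and it assumes nothing about the JSON schema beyond each file being valid JSON.

```python
import json
import platform
from pathlib import Path


def results_dir(base="output"):
    """Return the OS-specific results directory, e.g. output/linux or output/darwin.

    Assumption: the folder name equals platform.system().lower().
    """
    return Path(base) / platform.system().lower()


def load_results(directory):
    """Map library name -> parsed JSON for every *_results.json file in directory."""
    suffix = "_results.json"
    results = {}
    for path in sorted(Path(directory).glob(f"*{suffix}")):
        # "opencv_results.json" -> "opencv"
        library = path.name[: -len(suffix)]
        with path.open(encoding="utf-8") as fh:
            results[library] = json.load(fh)
    return results


if __name__ == "__main__":
    directory = results_dir()
    if directory.is_dir():
        for library, payload in load_results(directory).items():
            # Only report the top-level JSON type, since the schema is not documented here.
            print(library, type(payload).__name__)
    else:
        print(f"No results found in {directory}")
```

Keying the dictionary by the file-name prefix keeps the sketch independent of whatever fields the result files actually contain.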
diff --git a/images/2024-02-26.png b/images/2024-02-26.png
diff --git a/images/2024-03-11.png b/images/2024-03-11.png
diff --git a/images/2024-06-05.png b/images/2024-06-05.png
diff --git a/images/performance_darwin.png b/images/performance_darwin.png
diff --git a/images/performance_linux.png b/images/performance_linux.png
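The README quotes defaults of 2000 images and 5 runs, and the results table removed in this diff reported throughput as images/sec with a ± spread. As an illustration of how such a figure could be computed, here is a minimal sketch. It is not the repository's benchmark code: `measure_throughput` and `load_fn` are hypothetical names, and the stand-in loader in the demo does no real JPEG decoding.

```python
import statistics
import time


def measure_throughput(load_fn, paths, num_runs=5):
    """Return (mean, std) of images/sec over num_runs full passes through paths."""
    if not paths:
        raise ValueError("paths must not be empty")
    rates = []
    for _ in range(num_runs):
        start = time.perf_counter()
        for path in paths:
            load_fn(path)
        elapsed = time.perf_counter() - start
        rates.append(len(paths) / elapsed)
    # Sample standard deviation needs at least two runs.
    std = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return statistics.mean(rates), std


if __name__ == "__main__":
    def fake_loader(path):
        # Stand-in for a real decoder; replace with an actual image-loading call.
        return bytes(len(path))

    mean, std = measure_throughput(fake_loader, ["a.jpg", "b.jpg"] * 1000)
    print(f"{mean:.2f} ± {std:.2f} images/sec")
```

Timing a whole pass rather than each image keeps timer overhead out of the measurement, and repeating the pass gives the spread across runs.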