High-performance image fingerprinting library for Rust with multi-algorithm perceptual hashing, exact matching, and semantic embeddings.
imgfprint provides multiple complementary approaches to image identification and similarity detection:
| Method | Use Case | Speed | Precision |
|---|---|---|---|
| BLAKE3 | Exact deduplication | ~0.2ms | 100% exact |
| AHash | Fast similarity | ~0.3ms | Average-based, simplest |
| PHash | Perceptual similarity | ~1.5ms | DCT-based, resilient to compression |
| DHash | Structural similarity | ~0.5ms | Gradient-based, good for crops |
| Multi | Combined accuracy | ~1.8ms | Weighted AHash+PHash+DHash (10/60/30) |
| Semantic | Content understanding | Local or API | Captures visual meaning |
Perfect for:
- Duplicate image detection
- Similarity search
- Content moderation
- Image deduplication at scale
- Content-based image retrieval (CBIR)
- Multi-Algorithm Support - AHash (average) + PHash (DCT-based) + DHash (gradient-based) with weighted combination
- Deterministic Output - Same input always produces same fingerprint
- BLAKE3 Exact Hash - Byte-identical detection (6-8x faster than SHA256)
- Block-Level Hashing - 4x4 grid for crop resistance
- EXIF Orientation - Automatically corrects JPEG orientation from camera metadata
- Semantic Embeddings - CLIP-style vector representations via external providers or local ONNX models
- Embedding Model ID - Tag embeddings with model identifiers to prevent comparing incompatible models
- SIMD Acceleration - AVX2/NEON optimized resizing
- Parallel Processing - Multi-core batch operations
- Zero-Copy APIs - Minimal allocations in hot paths
- Serde Support - JSON/binary serialization
- Security Hardened - OOM protection (8192px max), no panics on malformed input
- Multiple Formats - PNG, JPEG, GIF, WebP, BMP
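The model-ID tagging mentioned in the feature list can be illustrated with a small standalone sketch. The `TaggedEmbedding` type, the `CompareError` enum, and the policy of rejecting only when both tags are present and differ are assumptions made for illustration; they are not taken from the imgfprint API.

```rust
/// Hypothetical embedding type (not the crate's): a vector plus an optional model tag.
#[derive(Debug, Clone, PartialEq)]
pub struct TaggedEmbedding {
    pub model_id: Option<String>,
    pub vector: Vec<f32>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum CompareError {
    ModelMismatch,
    DimensionMismatch,
    ZeroVector,
}

/// Cosine similarity that refuses to compare embeddings from different models.
pub fn cosine_similarity(
    a: &TaggedEmbedding,
    b: &TaggedEmbedding,
) -> Result<f32, CompareError> {
    // Assumed policy: reject only when both sides carry a tag and the tags differ.
    if let (Some(x), Some(y)) = (&a.model_id, &b.model_id) {
        if x != y {
            return Err(CompareError::ModelMismatch);
        }
    }
    if a.vector.len() != b.vector.len() {
        return Err(CompareError::DimensionMismatch);
    }
    let dot: f32 = a.vector.iter().zip(&b.vector).map(|(x, y)| x * y).sum();
    let norm_a = a.vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return Err(CompareError::ZeroVector);
    }
    Ok(dot / (norm_a * norm_b))
}
```

Returning an error instead of a score keeps vectors from two different models from producing a number that looks valid but means nothing.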
[dependencies]
imgfprint = "0.4.1"

| Feature | Default | Description |
|---|---|---|
| `serde` | Yes | Serialization support (JSON, binary) |
| `parallel` | Yes | Parallel batch processing with rayon |
| `local-embedding` | No | Local ONNX model inference for semantic embeddings |
| `tracing` | No | Observability hooks for production debugging |
Minimal build (default features disabled, so neither serde nor parallel processing):
[dependencies]
imgfprint = { version = "0.4.1", default-features = false }

With local embeddings (requires ONNX model):
[dependencies]
imgfprint = { version = "0.4.1", features = ["local-embedding"] }

use imgfprint::ImageFingerprinter;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let img1 = std::fs::read("photo1.jpg")?;
    let img2 = std::fs::read("photo2.jpg")?;
    // Compute all hashes (AHash + PHash + DHash) for best accuracy
    let fp1 = ImageFingerprinter::fingerprint(&img1)?;
    let fp2 = ImageFingerprinter::fingerprint(&img2)?;
    let sim = fp1.compare(&fp2);
    println!("Similarity: {:.2}", sim.score);
    println!("Exact match: {}", sim.exact_match);
    if sim.score > 0.8 {
        println!("Images are perceptually similar");
    }
    Ok(())
}

use imgfprint::{ImageFingerprinter, HashAlgorithm};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let img = std::fs::read("photo.jpg")?;
    // Use specific algorithm for better speed
    let fp = ImageFingerprinter::fingerprint_with(&img, HashAlgorithm::DHash)?;
    Ok(())
}

For complete API reference and usage examples, see USAGE.md.
Contains AHash, PHash, and DHash for enhanced accuracy:
MultiHashFingerprint
├── exact: [u8; 32] // BLAKE3 of original bytes
├── ahash: ImageFingerprint // AHash results
│ ├── global_hash: u64
│ └── block_hashes: [u64; 16]
├── phash: ImageFingerprint // PHash results
│ ├── global_hash: u64
│ └── block_hashes: [u64; 16]
└── dhash: ImageFingerprint // DHash results
    ├── global_hash: u64
    └── block_hashes: [u64; 16]
ImageFingerprint
├── exact: [u8; 32] // BLAKE3 of original bytes
├── global_hash: u64 // Algorithm-specific hash (center 32x32)
└── block_hashes: [u64; 16] // Block-level hashes (4x4 grid, 64x64 each)
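As a rough illustration of where the 16 block hashes come from, the sketch below cuts a 256x256 grayscale buffer into a 4x4 grid of 64x64 blocks. The function names, the row-major buffer layout, and the row-major block order are assumptions for this example; the crate's internal representation may differ.

```rust
const SIDE: usize = 256; // edge of the normalized image
const GRID: usize = 4; // 4x4 grid of blocks
const BLOCK: usize = SIDE / GRID; // 64 pixels per block edge

/// Copy block (bx, by) out of a row-major 256x256 grayscale buffer.
/// Returns None when the buffer size or block index is invalid.
pub fn extract_block(gray: &[u8], bx: usize, by: usize) -> Option<Vec<u8>> {
    if gray.len() != SIDE * SIDE || bx >= GRID || by >= GRID {
        return None;
    }
    let mut block = Vec::with_capacity(BLOCK * BLOCK);
    for row in 0..BLOCK {
        // Start of this block's row inside the full image.
        let start = (by * BLOCK + row) * SIDE + bx * BLOCK;
        block.extend_from_slice(&gray[start..start + BLOCK]);
    }
    Some(block)
}

/// Split the image into 16 blocks, ordered left to right, top to bottom (assumed order).
pub fn split_blocks(gray: &[u8]) -> Option<Vec<Vec<u8>>> {
    let mut blocks = Vec::with_capacity(GRID * GRID);
    for by in 0..GRID {
        for bx in 0..GRID {
            blocks.push(extract_block(gray, bx, by)?);
        }
    }
    Some(blocks)
}
```

Each block would then be hashed with the same algorithm as the global hash. This is why a crop that removes part of the image still leaves many block hashes unchanged.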
- Decode - Parse any supported format (PNG, JPEG, GIF, WebP, BMP) into RGB with EXIF orientation correction for JPEG
- Normalize - Resize to 256x256 using SIMD-accelerated Lanczos3 filter
- Convert - RGB to Grayscale (luminance)
- Parallel Hash Computation - All three algorithms computed simultaneously:
  - AHash: Average-based, resample to 8x8, compare to mean
  - PHash: DCT-based, center 32x32 + 4x4 blocks
  - DHash: Gradient-based, resample to 9x8
- Exact Hash - BLAKE3 of original bytes
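The AHash and DHash steps above can be sketched on already-resized thumbnails. This is a minimal illustration, not the crate's code: the bit order, the strict `>` comparisons, and the BT.601 luminance weights are assumptions, and resizing is left out.

```rust
/// Grayscale conversion with integer BT.601 weights (assumed; the README only says "luminance").
pub fn luma(r: u8, g: u8, b: u8) -> u8 {
    ((299 * r as u32 + 587 * g as u32 + 114 * b as u32) / 1000) as u8
}

/// AHash sketch: one bit per pixel of a row-major 8x8 thumbnail,
/// set when the pixel is brighter than the (integer) mean.
pub fn ahash(thumb: &[u8; 64]) -> u64 {
    let sum: u32 = thumb.iter().map(|&p| p as u32).sum();
    let mean = sum / 64;
    let mut hash = 0u64;
    for (i, &p) in thumb.iter().enumerate() {
        if p as u32 > mean {
            hash |= 1u64 << i; // bit i corresponds to pixel i (assumed order)
        }
    }
    hash
}

/// DHash sketch: a 9-wide, 8-tall thumbnail gives 8 horizontal comparisons per row,
/// 64 bits in total. A bit is set when a pixel is brighter than its right neighbor.
pub fn dhash(thumb: &[u8; 72]) -> u64 {
    let mut hash = 0u64;
    for row in 0..8 {
        for col in 0..8 {
            let left = thumb[row * 9 + col];
            let right = thumb[row * 9 + col + 1];
            if left > right {
                hash |= 1u64 << (row * 8 + col);
            }
        }
    }
    hash
}
```

Both functions produce a 64-bit value, which is why DHash needs the extra ninth column: 9 pixels per row give exactly 8 neighbor comparisons.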
When using MultiHashFingerprint, the similarity score is a weighted combination of the three algorithms, each of which includes block-level similarity. The defaults below ship as MultiHashConfig::default() and are reproduced by compare():
- 10% AHash similarity (average hash, fastest, simplest)
- 60% PHash similarity (DCT-based, robust to compression)
- 30% DHash similarity (gradient-based, good for structural changes)
Within each algorithm, similarity is computed as:
- 40% global hash similarity (overall structure)
- 60% block-level similarity (crop resistance via 4x4 grid)
The five weights above, plus block_distance_threshold (default 32 of 64), are
tunable via MultiHashConfig without forking.
Weighting block-level similarity above the global hash gives better crop resistance than global-only comparison.
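The default weighting described above can be written out as a short sketch. The `Fp` struct and function names are hypothetical stand-ins. The sketch assumes Hamming-based similarity on 64-bit hashes and a plain mean over the 16 blocks. It ignores block_distance_threshold, whose exact semantics are not spelled out here, so its scores may differ from what compare() returns.

```rust
/// Hypothetical stand-in for one algorithm's fingerprint.
pub struct Fp {
    pub global_hash: u64,
    pub block_hashes: [u64; 16],
}

/// Similarity of two 64-bit hashes: 1.0 when identical, 0.0 when all bits differ.
pub fn hash_similarity(a: u64, b: u64) -> f64 {
    1.0 - (a ^ b).count_ones() as f64 / 64.0
}

/// Per-algorithm score: 40% global hash, 60% mean block similarity (assumed aggregation).
pub fn algorithm_similarity(a: &Fp, b: &Fp) -> f64 {
    let global = hash_similarity(a.global_hash, b.global_hash);
    let block_sum: f64 = a
        .block_hashes
        .iter()
        .zip(b.block_hashes.iter())
        .map(|(x, y)| hash_similarity(*x, *y))
        .sum();
    0.4 * global + 0.6 * (block_sum / 16.0)
}

/// Combined score with the default 10/60/30 weights for AHash/PHash/DHash.
pub fn multi_similarity(ahash: f64, phash: f64, dhash: f64) -> f64 {
    0.10 * ahash + 0.60 * phash + 0.30 * dhash
}
```

With these weights, two images whose global hashes disagree completely but whose blocks all match still score 0.6 within a single algorithm.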
Pass a MultiHashConfig to compare_with_config to shift the trade-off. This is useful when an integrator (UCFP, downstream pipelines) wants per-deployment scoring without forking:
use imgfprint::{ImageFingerprinter, MultiHashConfig};
let bytes_a = std::fs::read("a.jpg")?;
let bytes_b = std::fs::read("b.jpg")?;
let fp_a = ImageFingerprinter::fingerprint(&bytes_a)?;
let fp_b = ImageFingerprinter::fingerprint(&bytes_b)?;
// PHash-only scoring: useful when AHash/DHash aren't trusted on this corpus.
// Setting an algorithm weight to 0.0 removes it from the score.
let cfg = MultiHashConfig {
    ahash_weight: 0.0,
    phash_weight: 1.0,
    dhash_weight: 0.0,
    ..MultiHashConfig::default()
};
let sim = fp_a.compare_with_config(&fp_b, &cfg);
# Ok::<_, Box<dyn std::error::Error>>(())

PreprocessConfig exposes the input-size and dimension caps that were previously hardcoded. The same config gates both the pre-read file-size check and the decode-time guards, so tightened limits aren't silently bypassed via the path API:
use imgfprint::{ImageFingerprinter, PreprocessConfig};
// Tight ingest path: 1 MiB max, 2048 max edge, default 32 min edge.
let cfg = PreprocessConfig {
    max_input_bytes: 1 * 1024 * 1024,
    max_dimension: 2048,
    ..PreprocessConfig::default()
};
let fp = ImageFingerprinter::fingerprint_path_with_preprocess("untrusted.jpg", &cfg)?;
# Ok::<_, Box<dyn std::error::Error>>(())

Benchmarked on Intel i5 11th gen (16 GB RAM, 4 cores, 8 threads):
| Operation | Time | Throughput |
|---|---|---|
| `fingerprint()` | 1.35ms | ~740 images/sec |
| `compare()` | 385ns | ~2.6M comparisons/sec |
| `batch()` (10 images) | 6.16ms | ~1,620 images/sec (parallel) |
| `semantic_similarity()` | ~500ns | ~2M comparisons/sec |
Run benchmarks:
cargo bench

- Maximum image dimension: 8192x8192 (OOM protection)
- Dimension check before full decode
- Pre-allocated buffers in context API
- Zero-copy where possible
- OOM Protection: Maximum image size 8192x8192 pixels (configurable)
- Deterministic Output: Same input always produces same output
- No Panics: All error conditions return `Result`
- Fast Hashing: BLAKE3 computation (6-8x faster than SHA256)
- Input Validation: Comprehensive format and size validation
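To show what a dimension check before full decode can look like, here is a standalone sketch for PNG input only. It reads width and height from the IHDR chunk, which the PNG format places directly after the 8-byte signature. The function names and error type are hypothetical, and this is not how imgfprint itself is implemented.

```rust
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum GuardError {
    NotPng,
    TooLarge { width: u32, height: u32 },
}

/// Read (width, height) from a PNG header without decoding any pixel data.
/// Layout: signature (8) + chunk length (4) + "IHDR" (4) + width (4) + height (4).
pub fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    if bytes.len() < 24 || bytes[..8] != PNG_SIGNATURE || bytes[12..16] != *b"IHDR" {
        return None;
    }
    // PNG stores both fields as big-endian 32-bit integers.
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}

/// Reject images whose declared edge exceeds `max_dimension` (8192 in the text above).
pub fn check_png_dimensions(bytes: &[u8], max_dimension: u32) -> Result<(u32, u32), GuardError> {
    let (width, height) = png_dimensions(bytes).ok_or(GuardError::NotPng)?;
    if width > max_dimension || height > max_dimension {
        return Err(GuardError::TooLarge { width, height });
    }
    Ok((width, height))
}
```

Because only the first 24 bytes are inspected, an oversized image is rejected before any pixel buffer is allocated.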
| Feature | imgfprint-rs | imagehash | img_hash |
|---|---|---|---|
| BLAKE3 exact | Yes | No | No |
| AHash | Yes | Yes | Yes |
| PHash | Yes | Yes | Yes |
| DHash | Yes | Yes | Yes |
| Multi-algorithm | Yes | No | No |
| Block hashes | Yes | No | No |
| Semantic embeddings | Yes | No | No |
| Local ONNX inference | Yes | No | No |
| Parallel batch | Yes | No | No |
| SIMD acceleration | Yes | No | No |
| Context API | Yes | No | No |
See the examples/ directory for complete working examples:
- `batch_process.rs` - Process millions of images efficiently
- `compare_images.rs` - Compare two images and show similarity
- `find_duplicates.rs` - Find duplicate images in a directory
- `serialize.rs` - Serialize/deserialize fingerprints to JSON and binary
- `similarity_search.rs` - Perceptual similarity search in a directory
- `semantic_search.rs` - Content-based image search with CLIP embeddings (requires the `local-embedding` feature)
Run an example:
# Compare two images
cargo run --example compare_images -- images/photo1.jpg images/photo2.jpg
# Semantic search with local CLIP model (requires local-embedding feature)
cargo run --example semantic_search --features local-embedding -- model.onnx query.jpg ./images 0.85

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Run tests: `cargo test`
- Run clippy: `cargo clippy --all-targets -- -D warnings`
- Run benchmarks: `cargo bench`
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
# Clone
git clone https://github.com/themankindproject/imgfprint-rs
cd imgfprint-rs
# Run tests
cargo test
# Run with all features
cargo test --all-features
# Generate documentation
cargo doc --no-deps --open

MIT License - See LICENSE file for details.