MatchPerf: Benchmarking 1:1 Inner Hash Joins

Read the full write-up and benchmark analysis: A Cache-Conscious Hash Join for 19× Faster Binary Descriptor Matching

Benchmarks 1:1 inner hash joins between two multisets A and B. Unlike a standard hash join, which emits the Cartesian product of the matching rows when a key is duplicated, this join drops every key that is duplicated in either multiset. A pair is emitted only if its key occurs exactly once in A and exactly once in B.
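To make the join semantics concrete, here is a minimal sketch in the spirit of an unordered_map baseline. It is not the repository's code: the names Key, Match and one_to_one_join are hypothetical, and 64-bit integer keys are an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names, chosen for illustration only.
using Key = std::uint64_t;                          // assumed key type
using Match = std::pair<std::size_t, std::size_t>;  // (index in A, index in B)

// Returns one (i, j) pair for every key that occurs exactly once in `a`
// and exactly once in `b`. Pairs are ordered by their index in `a`.
std::vector<Match> one_to_one_join(const std::vector<Key>& a,
                                   const std::vector<Key>& b) {
    struct Slot {
        std::size_t a_index = 0;
        std::size_t b_index = 0;
        unsigned a_count = 0;  // saturates at 2: "more than once" is all we need
        unsigned b_count = 0;
    };

    std::unordered_map<Key, Slot> table;
    table.reserve(a.size());

    // Build phase: insert every key of A and track how often it occurs.
    for (std::size_t i = 0; i < a.size(); ++i) {
        Slot& slot = table[a[i]];
        slot.a_index = i;
        if (slot.a_count < 2) ++slot.a_count;
    }

    // Probe phase: keys of B that are absent from A can never match.
    for (std::size_t j = 0; j < b.size(); ++j) {
        auto it = table.find(b[j]);
        if (it == table.end()) continue;
        it->second.b_index = j;
        if (it->second.b_count < 2) ++it->second.b_count;
    }

    // Emit phase: walk A in order and keep keys with cardinality 1 on both sides.
    std::vector<Match> matches;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const Slot& slot = table.find(a[i])->second;
        if (slot.a_count == 1 && slot.b_count == 1) {
            matches.emplace_back(slot.a_index, slot.b_index);
        }
    }
    return matches;
}
```

For A = {1, 2, 2, 3, 5} and B = {3, 1, 1, 2, 5, 7}, this sketch emits the index pairs (3, 0) and (4, 4): key 1 is duplicated in B, key 2 is duplicated in A, and key 7 is missing from A, so only keys 3 and 5 qualify.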

This problem appears, for example, in the hot loop of feature matching and pose estimation in computer vision pipelines, where the goal is to find unique correspondences between two sets of (binary) features or embeddings. It can also be solved with approximate nearest neighbor (ANN) search, but the exclusive match criterion is interesting because it both finds a matching and rejects weak matches by excluding duplicates, all in O(N).

In such feature matching contexts, the hash table is retained only to match one image's features against those of as few as one other image. As a result, table construction time is as important as lookup efficiency.

The benchmarks in this repo compare optimized implementations against std::unordered_map baselines as well as against more specialized hash map implementations from Google's Abseil (absl) and Martin Ankerl's ankerl.

Note: All of the optimized implementations rely only on cache effects and instruction-level parallelism (ILP). In particular, no explicit SIMD or multi-threading is used.

Build, test, benchmark

Unzip the two dataset files: gunzip {statesSrcLarge,statesTarLarge}.txt.gz.

# Build
mkdir build
cd build
cmake ..
make 

# Test
ctest

# Benchmark
./run_and_plot_benchmark

Benchmarking other implementations

To add and compare new implementations:

1. Add the implementation as a new .cpp and .hpp file in src.
2. Include and register the new implementation in registry.cpp/.hpp.
3. Add the new .cpp to CMakeLists.txt and rebuild.
Tests and benchmarks will automatically include the new implementation.
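As a rough illustration of why a single registration step is enough for tests and benchmarks to pick up a new implementation, the sketch below shows one common registry pattern. Everything in it is an assumption: the actual interface is whatever registry.hpp declares, and the names JoinFn, Implementation, registry and register_implementation are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types; the real ones are defined by the repository.
using Key = std::uint64_t;
using Match = std::pair<std::size_t, std::size_t>;
using JoinFn = std::function<std::vector<Match>(const std::vector<Key>&,
                                                const std::vector<Key>&)>;

// One entry per implementation: a display name plus the join function.
struct Implementation {
    std::string name;
    JoinFn join;
};

// Central list that a test runner or benchmark driver could iterate over.
inline std::vector<Implementation>& registry() {
    static std::vector<Implementation> entries;
    return entries;
}

// Appends a new implementation to the central list.
inline void register_implementation(std::string name, JoinFn join) {
    registry().push_back({std::move(name), std::move(join)});
}
```

With a design like this, a test or benchmark driver only loops over registry(), so it needs no changes when an entry is added.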

