This repository contains implementations and experimental code for efficient Random-Access Ranked Retrieval (RAR) and Random-Access Similarity Search (RAS) algorithms.
Requires Python 3.9+.
# 1. Install dependencies
pip install -r requirements.txt
# 2. Install the project as an editable package so `ranges`, `methods`,
# and `knn` import from anywhere (recommended)
pip install -e .Optional extras:
pip install -e ".[profiling]" # psutil, pympler, miniball — memory-usage experiments
pip install -e ".[test]" # pytest — test suiteContains implementations of RAR algorithms:
EpsRange- Epsilon-based range approachEpsHier- Epsilon-based hierarchical methodKthLevel- Level-based k-th element algorithm- Baselines:
TA(Threshold Algorithm) andFagin
Contains implementations of range search algorithms:
Hierarchical Sampling- Novel hierarchical sampling approach- Baseline algorithms:
Partition Tree,KD-tree,R-tree
Contains the implementations of the RAS problem on Euclidean distance and cosine similarity.
EpsHier- Our algorithm for random-access similarity searchFaiss- Common ANN indexes as baselines for random-access similarity search
Code to reproduce results for Random-Access Ranked Retrieval experiments.
Code to reproduce results for Random-Access Similarity Search experiments.
RAS experiments use standard ANN-Benchmarks datasets
(e.g. glove-100-angular, sift-128-euclidean, fashion-mnist-784-euclidean).
They are downloaded automatically on first use into knn/datasets/data/
(git-ignored) by knn/datasets/data_loader.py. Some RAR experiments use the
US Flights dataset, which is fetched via kagglehub.
The experiment files use # %%
cell markers, so they can be run interactively (VS Code Jupyter / Jupyter) or
as plain scripts. Run them from the repository root, for example:
# Random-Access Ranked Retrieval
python experiments/ith_problem.py # k-th element problem
python experiments/srs_problem.py # simple range search
python experiments/us_flights.py # US Flights dataset
python experiments/memory_usage.py # memory profiling (needs [profiling] extra)
# Random-Access Similarity Search
python knn/experiments/time_recall_benchmark.py # time vs. recall
python knn/experiments/space_time_benchmark.py # space vs. timepytest knn/testsIf you use this code, please cite our paper (see CITATION.cff).
Released under the MIT License.