TSDG is an index graph for graph-based approximate nearest neighbor search, which is fast to build, efficient to search, and GPU friendly.
$ cd tsdg
$ mkdir build && cd build
$ cmake ..
$ make -jThe demos for building index and searching are under the directory examples/
Firstly, we need to prepare a SC-graph.
The SC-graph need to be built in advance, we constructed SC-graph by modified NN-Descent in nndescent.hpp
You can use our example code to build the SC-graph as follows:
$ build/examples/scg_build DATA_PATH K SAMPLE_NUM ITERATION_NUM OUTPUT_PATHDATA_PATHis the path of the base data infvecsformat.Kis the number of edges of the SC-Graph.SAMPLE_NUMcontrols the number of samples in each NN-Descent iteration.ITERATION_NUMcontrols the number of NN-Descent iterations.OUTPUT_PATHis the path of the generated SC-graph.
An example:
$ build/examples/scg_build \
./data/sift1m/sift1m_base.fvecs \
200 16 10\
./data/sift1m/index/sift1m_scg.ivecsThe following are the parameters used for building the SC-graphs on the CPU in our paper.
| Dataset | K |
SAMPLE_NUM |
ITERATION_NUM |
|---|---|---|---|
| SIFT1M | 200 | 16 | 10 |
| DEEP1M | 200 | 16 | 10 |
| T2I1M | 400 | 16 | 10 |
| SPACEV1M | 200 | 20 | 12 |
| GIST1M | 400 | 16 | 12 |
| Turing1M | 200 | 20 | 15 |
Note: The above parameter settings consider the balance of building efficiency and search efficiency, appropriately increasing SAMPLE_NUM and ITERATION_NUM may improve the search efficiency.
The above datasets can be downloaded from http://corpus-texmex.irisa.fr/ and https://big-ann-benchmarks.com/
Secondly, we will convert the SC-graph to the TSDG index.
You can use our example code to achieve this conversion as follows:
$ build/examples/tsdg_build DATA_PATH SCG_PATH RELAXED_FACTOR OCCL_THRESHOLD MAX_EDGES OUTPUT_PATHDATA_PATHis the path of the base data infvecsformat.SCG_PATHis the path of the pre-built SC-graph inivecsformat.RELAXED_FACTORused for first-stage graph diversification, controls the average degree of index graph.OCCL_THRESHOLDused for second-stage graph diversification, controls the average degree of index graph.MAX_EDGESused for limiting the degree of index graph.OUTPUT_PATHis the path of the generated TSDG index.
An example:
$ build/examples/tsdg_build \
./data/sift1m/sift1m_base.fvecs \
./data/sift1m/sift1m_scg.ivecs \
1.2 4 60 \
./data/sift1m/index/sift1m.tsdgThe parameters RELAXED_FACTOR in the source code corresponds to the parameter
The parameters OCCL_THRESHOLD in the source code corresponds to the parameter
The following are the parameters used for building and searching experiments on the CPU.
| Dataset | RELAXED_FACTOR |
OCCL_THRESHOLD |
MAX_EDGES |
|---|---|---|---|
| SIFT1M | 1.2 | 4 | 60 |
| DEEP1M | 1.2 | 4 | 60 |
| T2I1M | 1.2 | 4 | 60 |
| SPACEV1M | 1.2 | 4 | 80 |
| GIST1M | 1.1 | 8 | 80 |
| Turing1M | 1.2 | 4 | 80 |
Note: The above parameter settings consider the efficiency of graph construction and the consistency of parameters. The search performance is better when the parameter RELAXED_FACTOR of T2I1M and SPACEV1M is 1.3, but it will increase the time of graph construction.
The index graph building time (s) is shown below (including the SC-graph and k-NN graph building time):
| Dataset | TSDG⁺ |
NSG |
SSG |
DPG |
HNSW |
|---|---|---|---|---|---|
| SIFT1M | 307.5 | 492.8 | 359.8 | 462.8 | 364.0 |
| DEEP1M | 282.1 | 449.5 | 320.4 | 400.9 | 326.1 |
| T2I1M | 595.1 | 1472.4 | 913.9 | 1888.9 | 937.2 |
| SPACEV1M | 539.9 | 657.9 | 676.8 | 659.5 | 615.4 |
| GIST1M | 1363.7 | 2794.6 | 1728.0 | 6138.7 | 1595.9 |
| Turing1M | 726.5 | 1267.8 | 1052.9 | 827.6 | 792.7 |
You can use our demo code to search on graph as follows:
build/examples/tsdg_search INDEX_PATH QUERY_PATH GROUND_TRUTH_PATHAn example:
$ build/examples/tsdg_search \
./data/sift1m/index/sift1m.tsdg \
./data/sift1m/sift1m_query.fvecs \
./data/sift1m/sift1m_gt.ivecsThe performance of the search on the graph is shown below:
Due to licensing issues, this library is not able to provide the code on the GPU, the GPU related code will be open source in the future.
