**Contributor:** Thanks for the PR. We are busy evaluating TurboQuant.
Summary
This PR adds initial TurboQuant (see #4990) support to Faiss and integrates it into the main codepaths needed for local evaluation.
Changes in this PR:
- `IndexTurboQuantMSE` and the underlying `TurboQuantizer` implementation

Preliminary Benchmarks
These are preliminary local results from a macOS CPU-only run.
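As a sanity check on the reported code size: assuming the `100x4` codec spec passed to `bench_quantizer.py` means 100 sub-codes of 4 bits each (this reading of the argument is an assumption, not stated in the PR), the expected size per vector works out to the 50 B/vector reported in the log:

```python
# Assumed interpretation of the "100x4" codec spec: 100 sub-codes
# of 4 bits each per vector. This naming is an assumption, not
# something the PR defines.
n_codes = 100       # sub-codes per vector (assumed)
bits_per_code = 4   # bits per sub-code (assumed)

code_size_bytes = n_codes * bits_per_code // 8
print(code_size_bytes)  # → 50, matching "code_size: 50 B/vector"
```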
Command:
```
python bench_quantizer.py glove 100x4 turboquant pq rq

eval on glove 100x4
maxtrain=100000
No training set: training on database
===== PQ
training time: 0.594 s
encode time: 0.317
reconstruction error: 0.010
recall@1: 0.7036
recons_err_compat 0.100
code_size: 50 B/vector
===== RQ
training time: 208.554 s
max_beam_size=1   encode time: 2.977    reconstruction error: 0.027  recall@1: 0.6034  recons_err_compat 0.162  code_size: 50 B/vector
max_beam_size=2   encode time: 5.774    reconstruction error: 0.023  recall@1: 0.6280  recons_err_compat 0.151  code_size: 50 B/vector
max_beam_size=4   encode time: 12.271   reconstruction error: 0.020  recall@1: 0.6512  recons_err_compat 0.140  code_size: 50 B/vector
max_beam_size=8   encode time: 25.617   reconstruction error: 0.018  recall@1: 0.6751  recons_err_compat 0.131  code_size: 50 B/vector
max_beam_size=16  encode time: 50.257   reconstruction error: 0.016  recall@1: 0.6890  recons_err_compat 0.123  code_size: 50 B/vector
max_beam_size=32  encode time: 105.272  reconstruction error: 0.014  recall@1: 0.7067  recons_err_compat 0.116  code_size: 50 B/vector
===== TurboQuant
training time: 0.002 s
encode time: 0.080
reconstruction error: 0.009
recall@1: 0.7189
recons_err_compat 0.095
code_size: 50 B/vector
```

Initial takeaway
On this macOS CPU-only benchmark, TurboQuant shows:
- higher `recall@1` than PQ at the same code size (0.7189 vs 0.7036)
- higher `recall@1` than the best RQ setting tested here (max_beam_size=32: 0.7067), at a fraction of the encode time

TODO
- add CUDA support (will be in another PR if this is accepted)
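To make the takeaway concrete, here is a small script that restates the headline numbers copied from the benchmark log above (encode time in seconds and `recall@1` per method) and derives the relative differences; the numbers are from this PR's log, the comparison itself is just arithmetic:

```python
# Numbers copied from the benchmark log above:
# method -> (encode time in seconds, recall@1).
results = {
    "PQ":         (0.317,   0.7036),
    "RQ beam=32": (105.272, 0.7067),  # best RQ setting tested
    "TurboQuant": (0.080,   0.7189),
}

tq_time, tq_recall = results["TurboQuant"]
for name, (t, r) in results.items():
    if name == "TurboQuant":
        continue
    # Encode-time ratio and absolute recall@1 difference vs TurboQuant.
    print(f"vs {name}: {t / tq_time:.1f}x encode time, "
          f"recall@1 {tq_recall - r:+.4f}")
```

On these figures TurboQuant encodes roughly 4x faster than PQ and three orders of magnitude faster than RQ at beam size 32, while edging both on `recall@1`.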