Add TurboQuant (CPU) #5049

Open
Mistobaan wants to merge 4 commits into facebookresearch:main from Mistobaan:turboquant

Contributor

Mistobaan commented Apr 7, 2026

Summary

This PR adds initial TurboQuant (see #4990) support to Faiss and integrates it into the main codepaths needed for local evaluation.

Changes in this PR:

  • add IndexTurboQuantMSE and the underlying TurboQuantizer implementation
  • add cloning and serialization support for TurboQuant indexes
  • expose TurboQuant through the Python SWIG bindings
  • extend quantizer benchmarking to cover TurboQuant and additional datasets
  • add unit tests for TurboQuant reconstruction behavior

Preliminary Benchmarks

These are preliminary local results from a macOS CPU-only run.

Command:

python bench_quantizer.py glove 100x4 turboquant pq rq

Output:

eval on glove 100x4 maxtrain=100000
No training set: training on database
===== PQ
        training time: 0.594 s
        encode time: 0.317 reconstruction error: 0.010 recall@1: 0.7036 recons_err_compat 0.100 code_size: 50 B/vector
===== RQ
        training time: 208.554 s
max_beam_size=1
        encode time: 2.977 reconstruction error: 0.027 recall@1: 0.6034 recons_err_compat 0.162 code_size: 50 B/vector
max_beam_size=2
        encode time: 5.774 reconstruction error: 0.023 recall@1: 0.6280 recons_err_compat 0.151 code_size: 50 B/vector
max_beam_size=4
        encode time: 12.271 reconstruction error: 0.020 recall@1: 0.6512 recons_err_compat 0.140 code_size: 50 B/vector
max_beam_size=8
        encode time: 25.617 reconstruction error: 0.018 recall@1: 0.6751 recons_err_compat 0.131 code_size: 50 B/vector
max_beam_size=16
        encode time: 50.257 reconstruction error: 0.016 recall@1: 0.6890 recons_err_compat 0.123 code_size: 50 B/vector
max_beam_size=32
        encode time: 105.272 reconstruction error: 0.014 recall@1: 0.7067 recons_err_compat 0.116 code_size: 50 B/vector
===== TurboQuant
        training time: 0.002 s
        encode time: 0.080 reconstruction error: 0.009 recall@1: 0.7189 recons_err_compat 0.095 code_size: 50 B/vector
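
The reported `code_size: 50 B/vector` matches the `100x4` argument if that argument is read as 100 codes of 4 bits each. That reading is an assumption made here for illustration; the helper name `code_size_bytes` is hypothetical and is not part of Faiss or of this PR. A minimal sketch of the arithmetic:

```python
def code_size_bytes(num_codes: int, bits_per_code: int) -> int:
    """Bytes needed to pack num_codes codes of bits_per_code bits each.

    Hypothetical helper for illustration only (not a Faiss function).
    Rounds up to a whole number of bytes.
    """
    total_bits = num_codes * bits_per_code
    return (total_bits + 7) // 8


# Assumed reading of the "100x4" benchmark argument: 100 codes x 4 bits.
size = code_size_bytes(100, 4)
print(f"{size} B/vector")
```

Under that assumption, all three quantizers in the run above are compared at the same 400-bit budget per vector.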

Initial takeaway

On this macOS CPU-only benchmark, TurboQuant shows:

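  • better recall@1 than PQ at the same code size (0.7189 vs. 0.7036 at 50 B/vector)
  • slightly lower reconstruction error than PQ (0.009 vs. 0.010)
  • higher recall@1 than the best RQ setting tested here (0.7189 vs. 0.7067 at max_beam_size=32)
  • much lower training and encoding cost than RQ (training time: 0.002 s vs. 208.554 s; encode time: 0.080 vs. 2.977 to 105.272, depending on beam size)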
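
For readers unfamiliar with the two metrics compared above, here is a toy sketch of how they might be computed. It assumes that reconstruction error is the mean squared L2 distance between each database vector and its decoded version, and that recall@1 is the fraction of queries whose nearest decoded vector is the true nearest neighbor. The exact definitions used by `bench_quantizer.py` may differ. The grid-rounding "quantizer" is a stand-in chosen for simplicity, not TurboQuant, and all function names are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def round_to_grid(vector, step=0.5):
    """Stand-in quantizer (NOT TurboQuant): snap each coordinate to a grid."""
    return [round(x / step) * step for x in vector]


def mean_reconstruction_error(database, decoded):
    """Mean squared L2 distance between original and decoded vectors."""
    total = sum(squared_l2(x, y) for x, y in zip(database, decoded))
    return total / len(database)


def nearest(query, vectors):
    """Index of the closest vector; ties go to the lowest index."""
    return min(range(len(vectors)), key=lambda i: squared_l2(query, vectors[i]))


def recall_at_1(queries, decoded, ground_truth):
    """Fraction of queries whose nearest decoded vector is the true neighbor."""
    hits = sum(1 for q, gt in zip(queries, ground_truth) if nearest(q, decoded) == gt)
    return hits / len(queries)


database = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6], [0.3, 0.3]]
queries = [[0.0, 0.8], [0.9, 0.1], [0.45, 0.7], [0.3, 0.35]]

decoded = [round_to_grid(v) for v in database]
# Ground truth comes from exact search over the unquantized database.
ground_truth = [nearest(q, database) for q in queries]

error = mean_reconstruction_error(database, decoded)
recall = recall_at_1(queries, decoded, ground_truth)
print(f"reconstruction error: {error:.3f} recall@1: {recall:.4f}")
```

In this toy data the last two database vectors decode to the same grid point, so the final query retrieves the wrong one and recall@1 drops to 3 out of 4. The same effect explains why lower reconstruction error tends to accompany higher recall in the results above.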

TODO

  • add QJL support (only MSE at the moment)
  • reproduce the paper benchmarks with the same datasets (needs a more powerful machine, but the script is ready)
  • add CUDA support (will be in another PR if this is accepted)

@meta-cla meta-cla bot added the CLA Signed label Apr 7, 2026
@Mistobaan Mistobaan marked this pull request as draft April 7, 2026 08:32
@Mistobaan Mistobaan changed the title from "Add TurboQuant index and benchmark support" to "Add TurboQuant (WIP)" Apr 7, 2026
@Mistobaan Mistobaan marked this pull request as ready for review April 8, 2026 07:13
@Mistobaan Mistobaan changed the title from "Add TurboQuant (WIP)" to "Add TurboQuant (CPU)" Apr 8, 2026
Contributor

mdouze commented Apr 9, 2026

Thanks for the PR. We are busy evaluating turboquant.
Would you mind implementing it as a ScalarQuantizer type (QT_xxx) instead of creating a new Quantizer class?
