Implement Panorama into IndexIVFPQ #4970

AlSchlo wants to merge 45 commits into facebookresearch:main
@mnorris11 Yes please, it should be ready :)
At a high level, could you explain the level-based data layout of the PQ Panorama storage?
There is already a block-oriented format used in FastScan indices that stores data per column; see `faiss/impl/fast_scan/fast_scan.h` (line 20 and line 105 at e6f5c0c).
The difference here seems to be that Panorama does not have a fixed block size.
In any case, it would be better to avoid integrating Panorama adaptations into the `InvertedLists` object. Please inherit from it (like `BlockInvertedLists` does).
Thanks @mdouze. The storage layout is as follows:
Agreed, this seems to be duplicated; we will figure out a way to use the existing class. It should not be too difficult.
This PR implements Panorama on IVFPQ, achieving up to 18× speedups at high recall.
We observe that the speedup is roughly 2× larger than the pruning ratio. This is due to two main factors: (1) vertical LUT lookups, which are faster because they avoid horizontal additions across SIMD lanes, and (2) improved LUT locality. Specifically, we keep a single level of the LUT resident in cache while processing an entire batch, analogous to loop tiling in matrix computations.
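The level-tiled access pattern described above can be sketched in scalar code (this is an illustration of the loop structure, not the actual SIMD kernel; function names and the `level_end` parameter are invented for the example):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// "Vertical" LUT accumulation over a batch of points. Codes are stored
// column-major: codes[m * batch + i] is the code of point i for
// subquantizer m, so the inner loop streams one LUT row across the whole
// batch while that row stays resident in cache (like loop tiling).
std::vector<float> accumulate_vertical(
        const std::vector<uint8_t>& codes, // M * batch, column-major
        const std::vector<float>& lut,     // M * ksub
        size_t ksub,
        size_t batch,
        const std::vector<size_t>& level_end) { // cumulative #subq per level
    std::vector<float> acc(batch, 0.0f);
    size_t m0 = 0;
    for (size_t end : level_end) { // outer loop over levels
        for (size_t m = m0; m < end; m++) {
            const float* lut_m = &lut[m * ksub];
            const uint8_t* col = &codes[m * batch];
            for (size_t i = 0; i < batch; i++) { // vertical: all points
                acc[i] += lut_m[col[i]];
            }
        }
        m0 = end;
        // (in Panorama, partial sums would be checked against bounds
        // at each level boundary to prune points early)
    }
    return acc;
}

// Reference "horizontal" order, one point at a time, for checking.
std::vector<float> accumulate_horizontal(
        const std::vector<uint8_t>& codes,
        const std::vector<float>& lut,
        size_t M,
        size_t ksub,
        size_t batch) {
    std::vector<float> acc(batch, 0.0f);
    for (size_t i = 0; i < batch; i++)
        for (size_t m = 0; m < M; m++)
            acc[i] += lut[m * ksub + codes[m * batch + i]];
    return acc;
}
```

Both orders compute the same distances; the vertical one maps each inner loop onto full SIMD lanes without horizontal reductions.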
Unlike flat indexes, we rely on vertical kernels to keep SIMD lanes fully utilized. This requires a vertical data layout within each batch. We provide kernels for both AVX-512 and AVX2. When available, we enable the `-mbmi2` flag to leverage the PEXT instruction, which compresses the current batch prior to filtering.

Because PCA degrades PQ performance, we instead redistribute excess energy across dimensions using localized random projections. Additionally, we apply a random projection within each level to better equalize energy across dimensions.
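For readers unfamiliar with PEXT: it extracts the bits of a word selected by a mask and packs them contiguously, which is what lets a survivor bitmask compact a batch in one instruction. The scalar sketch below (my illustration, not the PR's kernel, which would use `_pext_u64` under BMI2) shows the semantics and the compaction effect:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar emulation of the BMI2 PEXT instruction: bits of x at the set
// positions of mask are packed into the low bits of the result.
uint64_t pext_scalar(uint64_t x, uint64_t mask) {
    uint64_t res = 0;
    int k = 0;
    for (int b = 0; b < 64; b++) {
        if ((mask >> b) & 1) {
            res |= ((x >> b) & 1ULL) << k;
            k++;
        }
    }
    return res;
}

// Effect of batch compaction before filtering: points whose partial
// distance already exceeds the current threshold are dropped, so later
// levels only touch the survivors (packed contiguously).
std::vector<uint32_t> compact_survivors(
        const std::vector<float>& partial_dist,
        float threshold) {
    std::vector<uint32_t> alive;
    for (uint32_t i = 0; i < partial_dist.size(); i++)
        if (partial_dist[i] < threshold) // cannot be pruned yet
            alive.push_back(i);
    return alive;
}
```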
The additional memory overhead consists of `nlevels + 1` floats per point. This is acceptable at 4× compression, but becomes more noticeable at higher compression rates. Scalar quantization of these coefficients appears to be a reasonable approach to reduce this metadata footprint by approximately 4×.

Cosine similarity (and other metrics) is deferred to a future PR.
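The scalar-quantization idea mentioned above could look roughly like this: store the `nlevels + 1` floats as one byte each plus a per-point min/scale pair, which approaches a 4× reduction for large `nlevels`. This is a sketch of the general technique under assumed names, not the PR's implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// 8-bit scalar quantization of the per-point level coefficients:
// nlevels + 1 floats -> nlevels + 1 bytes plus one (min, scale) pair.
struct QuantizedNorms {
    float vmin, scale;      // shared by all coefficients of this point
    std::vector<uint8_t> q; // one byte per coefficient
};

QuantizedNorms quantize(const std::vector<float>& norms) {
    float lo = norms[0], hi = norms[0];
    for (float v : norms) {
        lo = std::min(lo, v);
        hi = std::max(hi, v);
    }
    QuantizedNorms out;
    out.vmin = lo;
    out.scale = (hi > lo) ? (hi - lo) / 255.0f : 1.0f;
    for (float v : norms)
        out.q.push_back((uint8_t)std::lround((v - lo) / out.scale));
    return out;
}

// Reconstruct coefficient i; the error is bounded by scale / 2.
float dequantize(const QuantizedNorms& qn, size_t i) {
    return qn.vmin + qn.scale * qn.q[i];
}
```

Since the coefficients only drive pruning bounds, a half-step quantization error can be absorbed by slightly loosening the bound rather than hurting recall.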
Many thanks to @aknayar for the help on this PR.
cc: @alexanderguzhva @mdouze @mnorris11