Description
When benchmarking zvec v0.2.1 with SQ8 quantization on the OpenAI Performance1536D50K dataset using VectorDBBench, I observed a significant recall drop to 0.7377, which is much lower than expected.
Command used:
vectordbbench zvec --path Performance1536D50K --db-label 16c64g-v0.1 \
--case-type Performance1536D50K --num-concurrency 1 \
--quantize-type int8 --m 15 --ef-search 180
After investigating, I found that the root cause is in src/core/quantizer/record_quantizer.h. Currently, the metadata fields sum (extras[2]) and squared_sum (extras[3]) are computed from the pre-rounded float values, in order to preserve precision. However, on datasets with asymmetric value ranges, such as OpenAI embeddings where |x_min| ≈ 0.64 >> x_max ≈ 0.21 and most values are therefore compressed into the right half of the quantization interval [−127, 127], computing these metadata fields from the rounded int8 values instead yields significantly better recall.
Changing the computation to use the rounded values, which is consistent with what is actually stored, improves recall from 0.7377 to 0.9588 on the same benchmark, with no regression observed on the other datasets tested (Cohere 768D, BIOASQ 1024D).
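To make the mismatch concrete, here is a minimal NumPy sketch of the issue. This is not the actual zvec code; the mapping formula and all variable names are illustrative, assuming a linear scalar quantizer onto [−127, 127]. It constructs an asymmetric value range like the one described above and compares the metadata computed from pre-rounded floats against the metadata computed from the stored int8 codes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.05, size=1536)   # embedding-like values clustered near zero
x[0], x[1] = -0.64, 0.21               # force an asymmetric range: |x_min| >> x_max

x_min, x_max = x.min(), x.max()
scale = (x_max - x_min) / 254.0        # linear map of [x_min, x_max] onto [-127, 127]

pre_round = (x - x_min) / scale - 127.0      # float positions on the int8 grid
codes = np.round(pre_round).astype(np.int8)  # what is actually stored

# Metadata computed the current way (pre-rounded floats) vs. from the stored codes:
sum_pre,  sq_pre  = pre_round.sum(), (pre_round ** 2).sum()
sum_post, sq_post = codes.sum(dtype=np.float64), (codes.astype(np.float64) ** 2).sum()

# Because zero maps to roughly +64 here, nearly all codes land in the right
# half of [-127, 127], and the pre-rounded metadata is no longer consistent
# with the codes it is supposed to describe.
print(f"mean code: {codes.astype(np.float64).mean():.1f}")
print(f"sum gap: {abs(sum_pre - sum_post):.4f}, squared-sum gap: {abs(sq_pre - sq_post):.2f}")
```

At query time the distance estimate combines the stored codes with these metadata fields, so any gap between sum_pre/sq_pre and the code-derived values injects a per-vector bias into the estimated distances, which is one plausible way the recall degradation arises.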
A detailed mathematical analysis and experimental validation are included below. The document was collaboratively authored by me and Claude Code, and is written in Chinese.
sq8_recall_analysis.md
Steps to Reproduce
1. pip install zvec==v0.2.1
2. vectordbbench zvec --path Performance1536D50K --db-label 16c64g-v0.1 \
--case-type Performance1536D50K --num-concurrency 1 \
--quantize-type int8 --m 15 --ef-search 180
Logs / Stack Trace
Operating System
Ubuntu 22.04
Build & Runtime Environment
Python 3.11
Additional Context
git status shows no uncommitted submodule changes; built with CMAKE_BUILD_TYPE=Debug and COVERAGE=ON.