Description
When benchmarking zvec v0.2.1 with SQ8 quantization on the OpenAI Performance1536D50K dataset using VectorDBBench, I observed a significant recall drop to 0.7377, which is much lower than expected.
Command used:
vectordbbench zvec --path Performance1536D50K --db-label 16c64g-v0.1 \
--case-type Performance1536D50K --num-concurrency 1 \
--quantize-type int8 --m 15 --ef-search 180
After investigating, I found that the root cause is in src/core/quantizer/record_quantizer.h. Currently, the metadata fields sum (extras[2]) and squared_sum (extras[3]) are computed from the pre-rounded float values, in order to preserve precision. However, on datasets with asymmetric value ranges, such as OpenAI embeddings where |x_min| ≈ 0.64 >> x_max ≈ 0.21 and most values are therefore compressed into the right half of the quantization interval [−127, 127], computing these metadata fields from the rounded int8 values instead yields significantly better recall.
Changing the computation to use the rounded values, which is consistent with what is actually stored, improves recall from 0.7377 to 0.9588 on the same benchmark, with no regression observed on the other datasets tested (Cohere 768D, BIOASQ 1024D).
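To make the mismatch concrete, here is a minimal NumPy sketch of the issue. This is not the actual zvec code; the mapping formula and all variable names are illustrative, assuming a linear scalar quantizer onto [−127, 127]. It constructs an asymmetric value range like the one described above and compares the metadata computed from pre-rounded floats against the metadata computed from the stored int8 codes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.05, size=1536)   # embedding-like values clustered near zero
x[0], x[1] = -0.64, 0.21               # force an asymmetric range: |x_min| >> x_max

x_min, x_max = x.min(), x.max()
scale = (x_max - x_min) / 254.0        # linear map of [x_min, x_max] onto [-127, 127]

pre_round = (x - x_min) / scale - 127.0      # float positions on the int8 grid
codes = np.round(pre_round).astype(np.int8)  # what is actually stored

# Metadata computed the current way (pre-rounded floats) vs. from the stored codes:
sum_pre,  sq_pre  = pre_round.sum(), (pre_round ** 2).sum()
sum_post, sq_post = codes.sum(dtype=np.float64), (codes.astype(np.float64) ** 2).sum()

# Because zero maps to roughly +64 here, nearly all codes land in the right
# half of [-127, 127], and the pre-rounded metadata is no longer consistent
# with the codes it is supposed to describe.
print(f"mean code: {codes.astype(np.float64).mean():.1f}")
print(f"sum gap: {abs(sum_pre - sum_post):.4f}, squared-sum gap: {abs(sq_pre - sq_post):.2f}")
```

At query time the distance estimate combines the stored codes with these metadata fields, so any gap between sum_pre/sq_pre and the code-derived values injects a per-vector bias into the estimated distances, which is one plausible way the recall degradation arises.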
A detailed mathematical analysis and experimental validation are included below. The document was collaboratively authored by me and Claude Code, and is written in Chinese.
sq8_recall_analysis.md
Steps to Reproduce
1. pip install zvec==v0.2.1
2. vectordbbench zvec --path Performance1536D50K --db-label 16c64g-v0.1 \
--case-type Performance1536D50K --num-concurrency 1 \
--quantize-type int8 --m 15 --ef-search 180
Logs / Stack Trace
Operating System
Ubuntu 22.04
Build & Runtime Environment
Python 3.11
Additional Context
git status shows no uncommitted submodule changes; built with CMAKE_BUILD_TYPE=Debug and COVERAGE=ON.