Native INT8 (byte vector) HNSW build + search API #665

@lvca

Hi! Filing this from the ArcadeDB project, which uses JVector as the HNSW backend for its LSM_VECTOR index.

Background

We're adding pre-quantized int8 ingest to ArcadeDB (ArcadeData/arcadedb#4132) so callers using providers that emit int8 directly (Cohere embed-english-v3.0, OpenAI text-embedding-3-large reduced precision, Sentence Transformers with int8 quantization) can skip a precision-losing client-side int8 → float32 round-trip.

We dug into JVector 4.0.0-rc.8 to wire the path through and found the HNSW graph API operates on VectorFloat<?> end-to-end:

  • RandomAccessVectorValues.getVector(int ordinal) returns VectorFloat<?>.
  • GraphIndexBuilder constructors take RandomAccessVectorValues (float-only).
  • VectorSimilarityFunction.compare(VectorFloat<?>, VectorFloat<?>) is the abstract method signature, so similarity is float-only as well.

So a caller with int8 input must dequantize to float32 for graph build and for every query, even when the application semantics are int8-throughout. ByteSequence<?> exists in the type system but is used only for PQ/BQ codes (sidecar against the float-vector graph), not as a primary HNSW vector type.
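
For illustration, the dequantize step that callers are forced into today might look roughly like the sketch below. The class name `Int8Dequantizer` and the single linear `scale` parameter are assumptions made for this example; they are not part of JVector or ArcadeDB, and a real provider may use a different calibration scheme. The resulting float[] would still have to be wrapped in a VectorFloat<?> before JVector can use it.

```java
// Hypothetical helper, not part of JVector or ArcadeDB.
final class Int8Dequantizer {
    private Int8Dequantizer() {}

    /**
     * Widens each signed int8 component to float32 and applies a linear scale.
     * Assumes symmetric quantization (zero point = 0); other schemes need an offset.
     */
    static float[] dequantize(byte[] quantized, float scale) {
        float[] out = new float[quantized.length]; // 4x the memory of the input
        for (int i = 0; i < quantized.length; i++) {
            out[i] = quantized[i] * scale; // Java bytes are signed: -128..127
        }
        return out;
    }
}
```

Every build-time vector and every query vector pays for this allocation and loop, which is the overhead the rest of this issue is about.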

Ask

A native byte (int8) vector path:

  • A RandomAccessByteVectorValues (or a generalised RandomAccessVectorValues<T>).
  • VectorSimilarityFunction overloads / variants for byte[] (cosine + dot product on byte vectors with per-block min/max calibration; euclidean on bytes is also straightforward).
  • GraphIndexBuilder constructor(s) that accept the byte-vector RAVV + byte similarity function.
  • Search-side equivalent in GraphSearcher.
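
To make the ask concrete, here is a minimal sketch of what the byte-side pieces could look like. Everything in it is hypothetical: `RandomAccessByteVectorValues` is the name suggested above, while `ArrayByteVectorValues` and `ByteSimilarity` are invented for this example and do not exist in JVector. The kernels are scalar reference versions; the per-block min/max calibration and the SIMD paths are deliberately left out.

```java
// Hypothetical sketch: none of these types exist in JVector today.

/** Byte-vector counterpart of a random-access vector source. */
interface RandomAccessByteVectorValues {
    int size();
    int dimension();
    byte[] getVector(int ordinal);
}

/** Simple in-memory implementation backed by a byte[][]. */
final class ArrayByteVectorValues implements RandomAccessByteVectorValues {
    private final byte[][] vectors;
    private final int dimension;

    ArrayByteVectorValues(byte[][] vectors, int dimension) {
        this.vectors = vectors;
        this.dimension = dimension;
    }

    @Override public int size() { return vectors.length; }
    @Override public int dimension() { return dimension; }
    @Override public byte[] getVector(int ordinal) { return vectors[ordinal]; }
}

/** Scalar reference kernels on signed int8 vectors, with no float32 transient. */
final class ByteSimilarity {
    private ByteSimilarity() {}

    /** Integer dot product; int accumulation is safe for a few thousand components. */
    static int dotProduct(byte[] a, byte[] b) {
        checkSameLength(a, b);
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Squared Euclidean distance computed entirely in integers. */
    static int squareDistance(byte[] a, byte[] b) {
        checkSameLength(a, b);
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            int d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Cosine similarity; returns 0 when either vector is all zeros. */
    static float cosine(byte[] a, byte[] b) {
        int dot = dotProduct(a, b);
        int normA = dotProduct(a, a);
        int normB = dotProduct(b, b);
        if (normA == 0 || normB == 0) {
            return 0f;
        }
        return (float) (dot / Math.sqrt((double) normA * normB));
    }

    private static void checkSameLength(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
    }
}
```

A generalised RandomAccessVectorValues<T> would avoid the parallel interface, at the cost of touching every existing float call site; which of the two fits JVector better is a question for the maintainers.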

Why it matters

Modern embedding providers increasingly offer int8/binary outputs for use at scale - Cohere binary embeddings are 1/32 the size of float32, Cohere int8 is 1/4. Forcing dequantize-on-build/search means:

  • Build cost: O(N * dim * 4) bytes of transient float32 even though the source is bytes.
  • Search cost: every query vector has to be dequantized before comparison - JVector's SIMD intrinsics for the byte-similarity case never get exercised.
  • Storage cost: applications keep bytes in their primary store but JVector wants floats, so RAM and on-disk size grow to 4× what the application needs.
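
The size ratios above follow directly from the bytes per component. As a rough worked example, with a purely illustrative corpus of 1,000,000 vectors at 1024 dimensions (both numbers are assumptions, not ArcadeDB measurements), and with `VectorFootprint` being a hypothetical helper written for this issue:

```java
// Hypothetical helper for back-of-the-envelope sizing; raw vector payload only,
// ignoring graph edges, headers, and alignment.
final class VectorFootprint {
    private VectorFootprint() {}

    /** float32 storage: 4 bytes per component. */
    static long float32Bytes(long count, int dim) {
        return count * dim * (long) Float.BYTES;
    }

    /** int8 storage: 1 byte per component. */
    static long int8Bytes(long count, int dim) {
        return count * dim;
    }

    /** Binary storage: 1 bit per component, rounded up to whole bytes per vector. */
    static long binaryBytes(long count, int dim) {
        return count * ((dim + 7) / 8);
    }
}
```

For that example corpus, float32 comes to 4,096,000,000 bytes, int8 to 1,024,000,000 bytes, and binary to 128,000,000 bytes, which matches the 1/4 and 1/32 ratios quoted above.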

Lucene 9.x added VectorEncoding.BYTE for similar reasons; we'd love the same in JVector to close the precision/size loop.

ArcadeDB context

For reference, our comparison matrix against Qdrant and Milvus 2.5 (docs/arcadedb-vs-leading-vector-dbms.md) flags pre-quantized ingest as a P2 gap. We can ship an MVP that dequantizes int8 → float32 server-side (covered in ArcadeData/arcadedb#4132), which closes the API ergonomics gap, but the full "end-to-end int8 with no float32 transient" win requires JVector-side support.

Happy to contribute if there's a design direction the maintainers are considering. Otherwise, this is a tracking request.

Thanks!
