
feat(ruvllm): TurboQuant KV cache & vector compression #297

Merged
ruvnet merged 4 commits into main from claude/turboquant-kv-cache-P3oo2 on Mar 25, 2026

Conversation


ruvnet (Owner) commented on Mar 25, 2026

Summary

  • Implement TurboQuant (ICLR 2026) data-oblivious KV cache and embedding compression for ruvLLM
  • Two-stage pipeline: PolarQuant (Hadamard rotation + scalar quantization) + QJL residual correction (1-bit)
  • Add TurboQuantKvCache three-tier cache (FP16 hot + TurboQuant ~3.5-bit cold) with auto-migration
  • Add TurboQuantEmbeddingStore for RuVector-compatible compressed vector search
  • Research document mapping TurboQuant to ruvLLM architecture with PiQ3 comparison

Key metrics

  • ~6× memory reduction on cold KV cache tier
  • 2.5/3.0/3.5/4.0 bit configurations with geometry-preserving compression
  • No training, no codebooks, no dataset-specific tuning
  • 13 passing tests covering roundtrip, compression ratios, inner product preservation, batch ops, KV cache, eviction, and embedding search
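The two-stage pipeline named above (Hadamard rotation + scalar quantization, then a 1-bit residual correction) can be sketched in a few lines. This is an illustrative toy, not the crate's `TurboQuantCompressor` API; `fwht` and `quantize` are hypothetical names standing in for the PolarQuant and QJL stages.

```rust
// Stage 1 (PolarQuant-style): rotate with a fast Walsh-Hadamard transform,
// then uniform-quantize. Stage 2 (QJL-style): keep 1-bit residual signs.
// Illustrative sketch only; not ruvLLM's actual implementation.

fn fwht(v: &mut [f32]) {
    // In-place fast Walsh-Hadamard transform; length must be a power of two.
    let n = v.len();
    let mut h = 1;
    while h < n {
        for i in (0..n).step_by(h * 2) {
            for j in i..i + h {
                let (x, y) = (v[j], v[j + h]);
                v[j] = x + y;
                v[j + h] = x - y;
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt(); // orthonormal scaling
    for x in v.iter_mut() {
        *x *= scale;
    }
}

fn quantize(v: &[f32], bits: u32) -> (Vec<u8>, f32, Vec<bool>) {
    // Uniform scalar quantization to 2^bits levels, plus 1-bit residual signs.
    let levels = (1u32 << bits) as f32 - 1.0;
    let max = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max > 0.0 { max } else { 1.0 };
    let mut codes = Vec::with_capacity(v.len());
    let mut signs = Vec::with_capacity(v.len());
    for &x in v {
        let q = (((x / scale) * 0.5 + 0.5) * levels).round().clamp(0.0, levels);
        codes.push(q as u8);
        let recon = ((q / levels) - 0.5) * 2.0 * scale;
        signs.push(x - recon >= 0.0); // QJL-style 1-bit residual sign
    }
    (codes, scale, signs)
}

fn main() {
    let mut v = vec![0.9f32, -0.4, 0.1, 0.7, -0.8, 0.3, -0.2, 0.5];
    fwht(&mut v); // data-oblivious rotation spreads energy across coordinates
    let (codes, scale, signs) = quantize(&v, 3); // 3-bit codes + 1-bit signs
    assert_eq!(codes.len(), 8);
    println!("codes = {:?}, scale = {}, signs = {:?}", codes, scale, signs);
}
```

Because both stages are data-oblivious (no codebooks, no calibration set), compression is a pure function of the input vector, which matches the "no training, no dataset-specific tuning" claim above.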

Files changed

  • crates/ruvllm/src/quantize/turbo_quant.rs (new): core TurboQuant compressor, KV cache tier, embedding store
  • crates/ruvllm/src/quantize/mod.rs (updated): module declaration + public exports
  • crates/ruvllm/src/kv_cache.rs (updated): CacheTier::TurboQuant, TurboQuantKvCache integration
  • docs/research/quantization-edge/08-turboquant-kv-cache-compression.md (new): research document
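The kv_cache.rs integration listed above amounts to a hot/cold split with automatic migration. A minimal runnable sketch of that flow, with hypothetical names (`TieredCache`) and simple 8-bit quantization standing in for the real ~3.5-bit TurboQuant pipeline:

```rust
// Illustrative two-tier cache with auto-migration; not the crate's actual
// TurboQuantKvCache API. Recent tokens stay in full precision, older ones
// are compressed once the hot tier exceeds its budget.
use std::collections::VecDeque;

struct TieredCache {
    hot: VecDeque<Vec<f32>>, // stand-in for the FP16 hot tier
    cold: Vec<Vec<u8>>,      // stand-in for ~3.5-bit TurboQuant blocks
    hot_budget: usize,
}

impl TieredCache {
    fn new(hot_budget: usize) -> Self {
        Self { hot: VecDeque::new(), cold: Vec::new(), hot_budget }
    }

    fn append(&mut self, kv: Vec<f32>) {
        self.hot.push_back(kv);
        // Auto-migrate the oldest entries once the hot tier is over budget.
        while self.hot.len() > self.hot_budget {
            let old = self.hot.pop_front().unwrap();
            self.cold.push(Self::compress(&old));
        }
    }

    fn compress(kv: &[f32]) -> Vec<u8> {
        // Placeholder for the TurboQuant pipeline: plain 8-bit scalar
        // quantization, just so the migration flow runs end to end.
        let max = kv.iter().fold(1e-6f32, |m, x| m.max(x.abs()));
        kv.iter().map(|x| ((x / max * 0.5 + 0.5) * 255.0) as u8).collect()
    }
}

fn main() {
    let mut cache = TieredCache::new(2);
    for t in 0..5 {
        cache.append(vec![t as f32; 4]);
    }
    assert_eq!(cache.hot.len(), 2);  // newest tokens stay full precision
    assert_eq!(cache.cold.len(), 3); // older tokens migrated and compressed
}
```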

Test plan

  • cargo build -p ruvllm --features quantize succeeds
  • cargo test -p ruvllm --features quantize -- turbo_quant — 13/13 tests pass
  • Verify compression ratio > 4× on real KV cache workloads
  • Benchmark attention speedup with TurboQuant cold tier vs Q4

https://claude.ai/code/session_011ogX2uc7Zf8d8aQ3UAbNcd

claude and others added 3 commits March 25, 2026 12:13
Implement data-oblivious KV cache and embedding compression based on
TurboQuant (ICLR 2026). Two-stage pipeline: PolarQuant (Hadamard
rotation + scalar quantization) + QJL residual correction (1-bit),
achieving ~3.5 bits per value with geometry-preserving compression.

New modules:
- turbo_quant.rs: Core TurboQuantCompressor with compress/decompress,
  TurboQuantCacheTier for KV cache, TurboQuantEmbeddingStore for
  RuVector integration, asymmetric inner product for attention
- TurboQuantKvCache: Three-tier cache (FP16 hot + TurboQuant cold)
  integrated into kv_cache.rs with auto-migration

Key features:
- 2.5/3.0/3.5/4.0 bit configurations with QJL residual toggle
- ~6x memory reduction on cold tier, preserves inner product geometry
- Bitstream packing handles non-byte-aligned bit widths
- Embedding store with batch build, search, and nearest-neighbor
- 13 passing tests covering roundtrip, compression, inner products,
  batch ops, KV cache tier, eviction, and embedding search
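Non-byte-aligned widths like the 2.5/3.0/3.5-bit configurations require a bitstream packer rather than per-byte storage. A minimal LSB-first packer/unpacker sketch (hypothetical names, not the crate's implementation), showing eight 3-bit codes fitting in 3 bytes instead of 8:

```rust
// Illustrative LSB-first bit packer for sub-byte code widths.
struct BitWriter {
    buf: Vec<u8>,
    acc: u64,
    nbits: u32,
}

impl BitWriter {
    fn new() -> Self {
        Self { buf: Vec::new(), acc: 0, nbits: 0 }
    }

    fn push(&mut self, value: u64, width: u32) {
        // Append `width` low bits of `value`, least-significant bit first.
        self.acc |= value << self.nbits;
        self.nbits += width;
        while self.nbits >= 8 {
            self.buf.push((self.acc & 0xff) as u8);
            self.acc >>= 8;
            self.nbits -= 8;
        }
    }

    fn finish(mut self) -> Vec<u8> {
        if self.nbits > 0 {
            self.buf.push((self.acc & 0xff) as u8); // flush the partial byte
        }
        self.buf
    }
}

fn unpack(bytes: &[u8], width: u32, count: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(count);
    let mask = (1u64 << width) - 1;
    let (mut acc, mut nbits, mut idx) = (0u64, 0u32, 0usize);
    for _ in 0..count {
        while nbits < width {
            acc |= (bytes[idx] as u64) << nbits;
            idx += 1;
            nbits += 8;
        }
        out.push(acc & mask);
        acc >>= width;
        nbits -= width;
    }
    out
}

fn main() {
    let codes = [5u64, 2, 7, 1, 6, 3, 0, 4]; // eight 3-bit values
    let mut w = BitWriter::new();
    for &c in &codes {
        w.push(c, 3);
    }
    let packed = w.finish();
    assert_eq!(packed.len(), 3); // 24 bits -> 3 bytes, vs 8 unpacked bytes
    assert_eq!(unpack(&packed, 3, 8), codes);
}
```

Fractional widths such as 3.5 bits/value would arise from mixing widths across components (e.g. 3-bit codes plus a 1-bit residual on half the values); the same packer handles them since each `push` takes its own width.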

https://claude.ai/code/session_011ogX2uc7Zf8d8aQ3UAbNcd
Comprehensive research document covering TurboQuant (ICLR 2026) and its
mapping to ruvLLM. Covers algorithm details, performance results,
integration architecture, PiQ3 comparison, risks/mitigations, and
implementation summary.

https://claude.ai/code/session_011ogX2uc7Zf8d8aQ3UAbNcd
Resolve Code Quality CI failure by applying cargo fmt.

Co-Authored-By: claude-flow <ruv@ruv.net>
…benchmarks

- Add rotated-domain inner product (skip inverse Hadamard via orthogonal
  invariance: <Hq,Hk> = <q,k>), ~2x faster for attention computation
- Add batch-optimized variant that rotates query once across all keys
- Add Criterion benchmark suite: compression, decompression, inner product,
  KV cache ops, embedding store, dimension scaling, memory efficiency
- 5 new tests verifying optimized methods match original results
- All 18 TurboQuant tests passing
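The rotated-domain shortcut in this commit rests on orthogonal invariance: for an orthonormal Hadamard rotation H, <Hq, Hk> = <q, k>, so attention scores can be computed directly between the rotated query and rotated keys, skipping the inverse transform per key. A runnable check of that identity (illustrative sketch, not the crate's API):

```rust
// Verify <Hq, Hk> = <q, k> for an orthonormal Walsh-Hadamard rotation.
// Illustrative only; function names are not ruvLLM's actual API.

fn fwht(v: &mut [f32]) {
    // In-place orthonormal fast Walsh-Hadamard transform (power-of-two length).
    let n = v.len();
    let mut h = 1;
    while h < n {
        for i in (0..n).step_by(h * 2) {
            for j in i..i + h {
                let (x, y) = (v[j], v[j + h]);
                v[j] = x + y;
                v[j + h] = x - y;
            }
        }
        h *= 2;
    }
    let s = 1.0 / (n as f32).sqrt();
    for x in v.iter_mut() {
        *x *= s;
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let q = [0.3f32, -0.7, 0.2, 0.9];
    let k = [0.5f32, 0.1, -0.4, 0.6];
    let (mut rq, mut rk) = (q, k);
    fwht(&mut rq);
    fwht(&mut rk);
    // Same score in either domain, so decompressed (still-rotated) keys can
    // be scored against a once-rotated query with no inverse Hadamard.
    assert!((dot(&q, &k) - dot(&rq, &rk)).abs() < 1e-5);
}
```

This is also why the batch variant pays off: the query is rotated once, then scored against every rotated key, amortizing the transform across the whole cache.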

Co-Authored-By: claude-flow <ruv@ruv.net>