Simulated Quantum Computing for VLM Compression
Quantum-inspired discretization · Entanglement-driven multiplexing · Variational recovery
RiverONE treats VLM compression as a simulated quantum computing problem. A 4B-parameter multimodal model (8.9 GB) is compressed to 3.2 GB (2.8×) through three quantum-inspired stages, without running on quantum hardware. Each stage maps to a core quantum computing primitive: state discretization, entanglement sharing, and variational optimization. Additionally, VQC ParamGen explores quantum circuit-based weight synthesis for neural network parameters.
"Every weight lives in a 16-qubit Hilbert space."
Classical neural network weights exist in a continuous vector space ℝᵈ. AQLM discretizes this space into a finite quantum state basis, analogous to how a quantum system collapses into one of 2ⁿ measurement outcomes.
| Quantum Concept | AQLM Implementation |
|---|---|
| Qubit register (16 qubits) | Codebook of size 2¹⁶ = 65,536 basis vectors |
| State vector | Each weight group of 16 values encoded as one codebook index |
| Measurement | Nearest-neighbor lookup: each group collapses to the closest codebook entry |
| Superposition | Additive quantization reconstructs weights as linear combinations of basis states |
| State space | 252 matrices × thousands of groups = millions of "quantum measurements" |
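The "measurement" row of the table can be illustrated with a minimal nearest-neighbor encoder. This is a sketch under simplifying assumptions, not the AQLM implementation: it uses a single random codebook of 256 entries (the text uses 2¹⁶ learned entries), plain squared distance, and hypothetical helper names `encode` and `decode`.

```python
import random

GROUP = 16           # weights per group (in_group_size in the text)
CODEBOOK_SIZE = 256  # toy size for speed; the text uses 2**16 = 65,536


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(weights, codebook):
    """Replace each consecutive group of GROUP weights by its nearest codebook index."""
    codes = []
    for start in range(0, len(weights), GROUP):
        group = weights[start:start + GROUP]
        best = min(range(len(codebook)), key=lambda k: sq_dist(group, codebook[k]))
        codes.append(best)
    return codes


def decode(codes, codebook):
    """Rebuild an approximate weight vector by concatenating codebook entries."""
    restored = []
    for k in codes:
        restored.extend(codebook[k])
    return restored


rng = random.Random(0)
# Assumption: a random codebook stands in for the learned one.
codebook = [[rng.gauss(0, 1) for _ in range(GROUP)] for _ in range(CODEBOOK_SIZE)]
weights = [rng.gauss(0, 1) for _ in range(GROUP * 8)]

codes = encode(weights, codebook)
restored = decode(codes, codebook)
print(len(weights), "weights ->", len(codes), "indices")
```

Re-encoding `restored` returns the same indices, which is the sense in which a group "collapses" onto a basis vector: once a group equals a codebook entry, further lookups leave it unchanged.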
The 1×16 scheme (out_group_size=1, in_group_size=16) means each group of 16 weights is represented by a single 16-bit index: exactly one measurement outcome from a 16-qubit system. The LLM's 7.9 GB of linear projections compress to 0.98 GB, an 8× reduction, the same ratio as classical bits to qubits in certain encodings.
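A back-of-envelope storage estimate suggests where a figure near 8× could come from. The sketch below assumes the 7.9 GB is held in 16-bit floats and that each of the 252 matrices carries its own 16-bit codebook of 2¹⁶ × 16 values. Both are assumptions made for illustration, not confirmed details of RiverONE, and per-channel scales are ignored.

```python
FP16_BYTES = 2
llm_linear_bytes = 7.9e9   # size of the linear projections, from the text
group_size = 16            # weights per code (1x16 scheme)
index_bytes = 2            # one 16-bit index per group
num_matrices = 252         # from the table
codebook_entries = 2 ** 16

num_weights = llm_linear_bytes / FP16_BYTES        # assumes fp16 source weights
code_bytes = num_weights / group_size * index_bytes
# Assumption: one fp16 codebook per matrix.
codebook_bytes = num_matrices * codebook_entries * group_size * FP16_BYTES

total = code_bytes + codebook_bytes
print(f"codes:     {code_bytes / 1e9:.2f} GB")
print(f"codebooks: {codebook_bytes / 1e9:.2f} GB")
print(f"total:     {total / 1e9:.2f} GB, ratio {llm_linear_bytes / total:.1f}x")
```

Under these assumptions the indices alone would give a 16× reduction, and codebook storage roughly doubles the footprint, which lands the estimate close to the reported ratio.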
Key insight: The codebook is a learned quantum basis. K-Means initialization finds the natural clustering of weight patterns (the "energy eigenstates"), and Adam optimization refines them, exactly analogous to finding the optimal measurement basis for a quantum system.
"Two layers, one state: entanglement without the hardware."
In quantum mechanics, entangled particles share a single quantum state regardless of distance. MiniViT applies this principle to vision transformers: adjacent blocks (23 and 24) are forced to share the same weight state, creating an entanglement-like coupling.
| Quantum Concept | MiniViT Implementation |
|---|---|
| Entanglement | Blocks 23 and 24 share MSA + MLP weights (one state, two observers) |
| Unitary transform | F1, F2 (16×16 matrices) act as learned unitary rotations between the shared state and each block |
| Weak measurement | Depthwise convolution (dwconv) applies a minimal perturbation to break symmetry |
| Decoherence protection | LayerNorm and TransformNorm preserve independent phase information per block |
The result: ~14M parameters replaced by ~12K, a compression ratio of >1000× for the coupled blocks. The transform matrices (F1, F2) function as unitary gates rotating the shared representation into each block's local "measurement basis." Distillation from the original ViT acts as a form of quantum state tomography, reconstructing the optimal transform from the teacher's output distribution.
Key insight: This is entanglement simulation. Two computational paths share one weight state, with minimal unitary corrections preserving their distinct behaviors. No quantum hardware required.
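The sharing idea can be sketched in a few lines: two blocks read the same attention maps (produced by shared weights) and differ only through a small per-block 16×16 head-mixing matrix. This is a toy illustration, not the MiniViT or RiverONE code. The names `mix_heads` and `param_counts`, the toy attention values, and the 1M-parameter block size are all hypothetical, and the transforms here are plain matrices with no unitarity constraint.

```python
HEADS = 16  # matches the 16x16 transform size in the table


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mix_heads(attn, f):
    """Mix attention maps across heads: out[h] = sum_g f[h][g] * attn[g]."""
    heads, rows, cols = len(attn), len(attn[0]), len(attn[0][0])
    return [
        [[sum(f[h][g] * attn[g][i][j] for g in range(heads)) for j in range(cols)]
         for i in range(rows)]
        for h in range(heads)
    ]


def param_counts(shared_params, num_blocks, heads=HEADS, transforms_per_block=2):
    """Parameters for independent blocks vs. one shared block plus small transforms."""
    independent = shared_params * num_blocks
    multiplexed = shared_params + num_blocks * transforms_per_block * heads * heads
    return independent, multiplexed


# Toy attention maps from the shared weights: [heads][tokens][tokens].
tokens = 3
shared_attn = [[[float(h + i + j) for j in range(tokens)] for i in range(tokens)]
               for h in range(HEADS)]

f_block23 = identity(HEADS)
f_block24 = identity(HEADS)
f_block24[0][1] = 0.5  # small block-specific deviation from the shared state

out23 = mix_heads(shared_attn, f_block23)
out24 = mix_heads(shared_attn, f_block24)
print(param_counts(1_000_000, 2))  # hypothetical block size
```

With an identity transform a block reproduces the shared maps exactly. A single off-diagonal entry is enough to make the second block's first head differ while every other head stays tied to the shared state.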
"The VQE loop, but for neural network weights."
PV-Tuning mirrors the Variational Quantum Eigensolver (VQE), one of the best-known hybrid quantum-classical algorithms. VQE alternates between a quantum measurement step and a classical parameter update. PV-Tuning does the same for compressed model weights.
| Quantum Concept | PV-Tuning Implementation |
|---|---|
| Variational ansatz | Codebooks + scales = the parameterized quantum circuit |
| Expectation value | Cross-entropy loss on QcalEval SFT (2,166 samples) |
| P-step (classical optimizer) | AdamW updates continuous codebooks/scales, like updating rotation angles in a variational circuit |
| V-step (quantum measurement) | Top-τ subspace beam search updates discrete codes, like measuring qubits in the computational basis |
| Pauli grouping | Subspace selection: only the top-τ highest-gradient code groups are updated per step |
| Convergence guarantee | Theorem 3.1: φ(xₖ₊₁) ≤ φ(xₖ), monotonic improvement, same as the variational principle |
The subspace trick is the quantum magic: instead of updating all 1.5M+ code assignments simultaneously (exponentially expensive, like full state tomography), PV-Tuning selects only the top-τ most "uncertain" groups (~0.1% per step). This is equivalent to measuring only the qubits with the largest gradient: a partial measurement that avoids disturbing the converged subspace.
Key insight: The P/V loop provably converges because each step is a projection onto a smaller feasible set, exactly the same mathematical structure as the quantum variational principle, where each measurement collapses the state toward the ground energy.
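The alternating structure and its monotone loss can be seen on a toy problem. The sketch below quantizes a short list of scalars with a three-entry codebook. It is not the PV-Tuning implementation: the P-step uses a closed-form mean instead of AdamW, the V-step uses exhaustive nearest search instead of beam search, and the "top-τ" groups are chosen by squared error (which, for this loss, orders points the same way as gradient magnitude). All function names are hypothetical.

```python
def loss(x, codes, codebook):
    """Squared reconstruction error of x under the current codes and codebook."""
    return sum((xi - codebook[c]) ** 2 for xi, c in zip(x, codes))


def p_step(x, codes, codebook):
    """Continuous update: move each codebook value to the mean of its assigned points."""
    new = list(codebook)
    for k in range(len(codebook)):
        members = [xi for xi, c in zip(x, codes) if c == k]
        if members:
            new[k] = sum(members) / len(members)
    return new


def v_step(x, codes, codebook, tau):
    """Discrete update: re-assign only the tau points with the largest current error."""
    errors = [(xi - codebook[c]) ** 2 for xi, c in zip(x, codes)]
    worst = sorted(range(len(x)), key=lambda i: errors[i], reverse=True)[:tau]
    new = list(codes)
    for i in worst:
        new[i] = min(range(len(codebook)), key=lambda k: (x[i] - codebook[k]) ** 2)
    return new


x = [0.1, 0.2, 0.9, 1.1, 2.0, 2.2, 3.9, 4.1]
codebook = [0.0, 1.0, 5.0]
codes = [0] * len(x)          # deliberately poor starting assignment

history = [loss(x, codes, codebook)]
for _ in range(6):
    codebook = p_step(x, codes, codebook)
    history.append(loss(x, codes, codebook))
    codes = v_step(x, codes, codebook, tau=2)
    history.append(loss(x, codes, codebook))

print([round(h, 3) for h in history])
```

Neither step can increase the loss: the mean minimizes squared error for fixed codes, and a re-assigned point only ever moves to a codebook entry at least as close as its current one. The recorded `history` is therefore non-increasing, which is the toy analogue of φ(xₖ₊₁) ≤ φ(xₖ).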
"Neural network weights as quantum measurement outcomes."
VQC ParamGen inverts the compression paradigm: instead of compressing existing weights, it generates MLP weight matrices directly from a Variational Quantum Circuit (VQC). Random classical features are amplitude-encoded into a quantum state, processed through variational layers with ring entanglement, then measured. The measurement outcomes drive a HyperNetwork that synthesizes the target weight matrix via low-rank factorization.
| Quantum Concept | VQC ParamGen Implementation |
|---|---|
| Amplitude encoding | Random features mapped to the 2ⁿ amplitudes of an n-wire quantum state |
| Variational ansatz | RX, RY, RZ rotation gates + CNOT ring entanglement across layers |
| Depolarizing noise | Probabilistic X/Y/Z gate injection simulates NISQ-era hardware |
| PauliZ measurement | Expectation values form a classical feature vector for the HyperNetwork |
| Quantum-classical bridge | HyperNetwork (MLP) maps n-wire measurements → low-rank factors U, V |
| Matrix reconstruction | W = U @ V^T, with rank r ≪ min(out_dim, in_dim) avoiding ~5M direct outputs |
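The first four rows of the table can be simulated classically for a handful of wires. The sketch below is a minimal statevector simulation in plain Python, not the ParamGen code. It assumes RY rotations only (so amplitudes stay real; RX and RZ are omitted), leaves out depolarizing noise and the HyperNetwork, and treats wire 0 as the most significant bit. Function names are hypothetical.

```python
import math


def amplitude_encode(features):
    """Normalize a length-2**n feature vector so it can serve as state amplitudes."""
    norm = math.sqrt(sum(f * f for f in features))
    return [f / norm for f in features]


def apply_ry(state, wire, theta, n):
    """Apply RY(theta) to one wire of an n-wire real statevector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << (n - 1 - wire)
    out = list(state)
    for i in range(len(state)):
        if not i & bit:                       # visit each (|0>, |1>) pair once
            a, b = state[i], state[i | bit]
            out[i] = c * a - s * b
            out[i | bit] = s * a + c * b
    return out


def apply_cnot(state, control, target, n):
    """Flip the target bit of every basis state whose control bit is set."""
    cbit, tbit = 1 << (n - 1 - control), 1 << (n - 1 - target)
    out = list(state)
    for i in range(len(state)):
        if i & cbit:
            out[i] = state[i ^ tbit]
    return out


def expval_z(state, wire, n):
    """Expectation value of PauliZ on one wire."""
    bit = 1 << (n - 1 - wire)
    return sum((-1 if i & bit else 1) * a * a for i, a in enumerate(state))


def vqc_features(features, thetas, n):
    """Encode, apply layers of RY rotations plus a CNOT ring, then measure every wire."""
    state = amplitude_encode(features)
    for layer in thetas:                      # one list of n angles per layer
        for w in range(n):
            state = apply_ry(state, w, layer[w], n)
        for w in range(n):                    # ring entanglement (needs n >= 2)
            state = apply_cnot(state, w, (w + 1) % n, n)
    return [expval_z(state, w, n) for w in range(n)]


n = 3
basis = [1.0] + [0.0] * (2 ** n - 1)          # |000>
z_idle = vqc_features(basis, [[0.0, 0.0, 0.0]], n)
z_flip = vqc_features(basis, [[math.pi, 0.0, 0.0]], n)
print(z_idle, [round(z, 6) for z in z_flip])
```

The resulting list of n expectation values, each in [-1, 1], is the classical feature vector that a HyperNetwork would consume in the pipeline described above.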
The low-rank decomposition is the key enabler: a weight matrix of shape [4304, 1152] (~5M parameters) is generated from only r × (out_dim + in_dim) HyperNetwork outputs. With rank=64, this reduces the generation head to ~350K parameters, a >14× compression of the generation pathway itself.
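The figures in this paragraph follow directly from the shapes. The short check below reproduces them and includes a tiny pure-Python version of W = U @ V^T. The helper name `reconstruct` is hypothetical and the 2×3 example is only there to show the index convention.

```python
out_dim, in_dim, rank = 4304, 1152, 64

full_outputs = out_dim * in_dim               # generating W directly
low_rank_outputs = rank * (out_dim + in_dim)  # generating U and V instead
ratio = full_outputs / low_rank_outputs

print(full_outputs, low_rank_outputs, round(ratio, 1))


def reconstruct(u, v):
    """W = U @ V^T for U of shape [out_dim][r] and V of shape [in_dim][r]."""
    return [[sum(a * b for a, b in zip(u_row, v_row)) for v_row in v] for u_row in u]


# Tiny example: out_dim=2, in_dim=3, rank=2.
w_small = reconstruct([[1, 2], [3, 4]], [[1, 0], [0, 1], [1, 1]])
print(w_small)
```

The counts come out to 4,958,208 direct outputs versus 349,184 low-rank outputs, a ratio of about 14.2.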
VQCGeneratedLinear provides a drop-in nn.Linear replacement with internally learned context vectors and VQC-driven weight synthesis. This enables seamless integration into existing architectures, for example replacing the second FFN layer in a Transformer encoder with VQC-generated weights, which allows end-to-end quantum-classical hybrid training.
Key insight: This is quantum-classical parameter synthesis. The VQC acts as a structured random projection with trainable unitary rotations, and the HyperNetwork learns to map these quantum features to the weight manifold. Unlike post-hoc compression, this approach generates weights that are born in a quantum-informed subspace.
Traditional compression pipelines view quantization as an engineering tradeoff: sacrifice precision for size. The quantum perspective reveals a deeper structure:
| Classical View | Quantum View |
|---|---|
| Weights are real numbers | Weights are quantum states in a discrete Hilbert space |
| Quantization is approximation error | Quantization is measurement in a learned basis |
| Weight sharing is parameter reuse | Weight sharing is entanglement between layers |
| Fine-tuning is gradient descent | Fine-tuning is variational optimization with discrete measurements |
| Quality loss is inevitable | Quality is recoverable through the variational principle |
This perspective isn't just philosophical: it predicts that PV-Tuning should converge (Theorem 3.1), that 1×16 is the natural "qubit encoding" for this architecture, and that entanglement-style sharing should preserve information better than independent compression.
    git clone https://github.com/THeWakeSystems/RiverONE.git
    cd RiverONE
    pip install -r requirements.txt

Discretize all 36 LLM layers into the 16-qubit codebook space:

    cd quantize
    pip install -r requirements.txt
    python quantize.py  # ~2-3.5 hours on A100

Entangle adjacent vision transformer blocks via weight multiplexing:

    cd compress
    python apply_minivit.py    # Entangle blocks 23 and 24
    python distill_minivit.py  # State tomography (distillation)
    python verify_minivit.py   # Verify entanglement integrity

Run the VQE-like P/V loop for accuracy recovery:

    cd finetune
    pip install -r requirements.txt
    bash run_pv_tuning.sh

Run the VQC-based MLP weight parameter generation demo:

    cd paramgen
    pip install -r requirements.txt
    python run_demo.py         # Pure VQC weight generation demo
    python transformer_vqc.py  # VQC-Transformer end-to-end training

Model definitions in paramgen/models.py | Utilities in paramgen/utils.py | Transformer example in paramgen/transformer_vqc.py

    RiverONE/
    ├── engine/            AQLM quantization core (quantum state engine)
    │   └── src/           aq, kmeans, beam_search, modelutils, ...
    ├── quantize/          State discretization configs (25 scripts)
    ├── compress/          Entanglement multiplexing: apply, distill, verify
    ├── finetune/          Variational recovery: P/V optimization loop
    ├── paramgen/          VQC parameter generation: models, utils, demo, transformer
    ├── tools/             Utilities: dequantize, analyze, swap, eval runs
    ├── docs/              Full documentation + technical paper
    ├── weights/           Model weight outputs (gitignored)
    ├── logs/              Archived run summaries
    └── requirements.txt   Consolidated Python dependencies

| Component | Version |
|---|---|
| Python | 3.10+ |
| PyTorch | ≥2.1.0 (CUDA 12.1+) |
| Transformers | ≥4.38.0 |
| AQLM (PyPI) | ≥1.1.0 |
| GPU | NVIDIA ≥24 GB VRAM (A100 recommended) |
| OS | Linux (Ubuntu 20.04/22.04 tested) |
| Paper | Venue | Link |
|---|---|---|
| AQLM: Extreme Compression of LLMs via Additive Quantization | ICML 2024 | arXiv 2401.06118 |
| MiniViT: Compressing Vision Transformers with Weight Multiplexing | CVPR 2022 | arXiv 2204.07154 |
| PV-Tuning: Beyond Straight-Through Estimation | NeurIPS 2024 | arXiv 2405.14852 |
| RiverONE: Generating Knowledge-Intensive VLM by Simulated Quantum Machines | Submit to WAIC 2026 | |
RiverONE-QC-4B-v1 · InternVL3.5-4B + Ising Vision Encoder · Built at THeWake Systems

