RiverONE

Simulated Quantum Computing for VLM Compression
Quantum-inspired discretization · Entanglement-driven multiplexing · Variational recovery


📊 Pipeline Overview

[Figure: Compression Pipeline]

RiverONE treats VLM compression as a simulated quantum computing problem. A 4B-parameter multimodal model (8.9 GB) is compressed to 3.2 GB (2.8×) through three quantum-inspired stages, all of which run on classical hardware. Each stage maps to a core quantum computing primitive: state discretization, entanglement sharing, and variational optimization. A fourth component, VQC ParamGen, explores quantum-circuit-based weight synthesis for neural network parameters.


⚛️ Quantum-Inspired Architecture

Stage 1 - AQLM: Quantum State Discretization

"Every weight lives in a 16-qubit Hilbert space."

Classical neural network weights exist in a continuous vector space ℝᵈ. AQLM discretizes this space into a finite quantum state basis, analogous to how a quantum system collapses into one of 2ⁿ measurement outcomes.

| Quantum Concept | AQLM Implementation |
|---|---|
| Qubit register (16 qubits) | Codebook of size 2¹⁶ = 65,536 basis vectors |
| State vector | Each weight group of 16 values encoded as one codebook index |
| Measurement | Nearest-neighbor lookup: each group collapses to the closest codebook entry |
| Superposition | Additive quantization reconstructs weights as linear combinations of basis states |
| State space | 252 matrices × thousands of groups = millions of "quantum measurements" |

The 1×16 scheme (out_group_size=1, in_group_size=16) means each group of 16 weights is represented by a single 16-bit index, which corresponds to one measurement outcome from a 16-qubit system. The LLM's 7.9 GB of linear projections compress to 0.98 GB, roughly an 8× reduction, which the quantum framing likens to the ratio of classical bits to qubits in certain encodings.
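The lookup described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the codebook here is random and has only 256 entries (the README's has 2¹⁶), and the names `encode`, `decode`, and `squared_distance` are hypothetical helpers introduced for this example.

```python
import random

GROUP_SIZE = 16      # in_group_size from the 1x16 scheme
CODEBOOK_SIZE = 256  # toy size (assumption); the README's codebook has 2**16 entries


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(weights, codebook):
    """Replace each group of GROUP_SIZE weights by the index of its nearest codebook vector."""
    codes = []
    for start in range(0, len(weights), GROUP_SIZE):
        group = weights[start:start + GROUP_SIZE]
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(group, codebook[k]))
        codes.append(best)
    return codes


def decode(codes, codebook):
    """Rebuild an approximate weight vector by concatenating the selected codebook vectors."""
    restored = []
    for k in codes:
        restored.extend(codebook[k])
    return restored


rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(GROUP_SIZE)]
            for _ in range(CODEBOOK_SIZE)]
weights = [rng.gauss(0.0, 1.0) for _ in range(4 * GROUP_SIZE)]

codes = encode(weights, codebook)
restored = decode(codes, codebook)

# Storage cost of the indices alone, ignoring the codebook and any scales.
index_bits = 16  # one 16-bit index per group in the scheme described above
bits_per_weight = index_bits / GROUP_SIZE
print(codes, bits_per_weight)
```

The indices alone cost one bit per weight in this scheme. Codebooks and scales add storage that this sketch does not model.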

Key insight: The codebook is a learned quantum basis. K-Means initialization finds the natural clustering of weight patterns (the "energy eigenstates"), and Adam optimization refines them, analogous to finding the optimal measurement basis for a quantum system.


Stage 2 - MiniViT: Entanglement-Driven Multiplexing

"Two layers, one state: entanglement without the hardware."

In quantum mechanics, entangled particles share a single quantum state regardless of distance. MiniViT applies this principle to vision transformers: adjacent blocks (23 and 24) are forced to share the same weight state, creating an entanglement-like coupling.

| Quantum Concept | MiniViT Implementation |
|---|---|
| Entanglement | Blocks 23 and 24 share MSA + MLP weights (one state, two observers) |
| Unitary transform | F1, F2 (16×16 matrices) act as learned unitary rotations between the shared state and each block |
| Weak measurement | Depthwise convolution (dwconv) applies a minimal perturbation to break symmetry |
| Decoherence protection | LayerNorm and TransformNorm preserve independent phase information per block |

The result: ~14M parameters replaced by ~12K, a compression ratio of >1000× for the coupled blocks. The transform matrices (F1, F2) function like unitary gates, rotating the shared representation into each block's local "measurement basis." Distillation from the original ViT plays the role of quantum state tomography: it reconstructs the optimal transform from the teacher's output distribution.
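To make the idea of "one shared state, small per-block corrections" concrete, here is a toy sketch. It assumes each block applies one 16×16 matrix that mixes values across 16 heads. How F1 and F2 are actually wired into the blocks is not stated here, so `transform_a`, `transform_b`, and `mix_heads` are hypothetical names for illustration only.

```python
HEADS = 16  # matches the 16x16 shape quoted for F1 and F2


def identity(n):
    """Return an n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mix_heads(transform, per_head_values):
    """Return transform @ per_head_values, mixing information across heads."""
    return [
        sum(transform[i][j] * per_head_values[j] for j in range(len(per_head_values)))
        for i in range(len(transform))
    ]


# One shared "state": a single value per head, standing in for the shared weights' output.
shared = [float(h) for h in range(HEADS)]

# Block-specific transforms (assumption: one small matrix per block).
transform_a = identity(HEADS)   # first block reads the shared state unchanged
transform_b = identity(HEADS)
transform_b[0][1] = 0.1         # hypothetical learned off-diagonal entry

out_a = mix_heads(transform_a, shared)
out_b = mix_heads(transform_b, shared)

extra_params = 2 * HEADS * HEADS  # values held by two 16x16 transforms
print(out_a[0], out_b[0], extra_params)
```

Two 16×16 matrices hold 512 values in total, which is why per-block transforms of this size are cheap compared with duplicating a full block's weights.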

Key insight: This is entanglement simulation. Two computational paths share one weight state, with minimal unitary corrections preserving their distinct behaviors. No quantum hardware required.


Stage 3 - PV-Tuning: Variational Quantum-Classical Optimization

"The VQE loop, but for neural network weights."

PV-Tuning mirrors the Variational Quantum Eigensolver (VQE), one of the best-known hybrid quantum-classical algorithms. VQE alternates between a quantum measurement step and a classical parameter update. PV-Tuning does the same for compressed model weights.

| Quantum Concept | PV-Tuning Implementation |
|---|---|
| Variational ansatz | Codebooks + scales = the parameterized quantum circuit |
| Expectation value | Cross-entropy loss on QcalEval SFT (2,166 samples) |
| P-step (classical optimizer) | AdamW updates continuous codebooks/scales, like updating rotation angles in a variational circuit |
| V-step (quantum measurement) | Top-τ subspace beam search updates discrete codes, like measuring qubits in the computational basis |
| Pauli grouping | Subspace selection: only the top-τ highest-gradient code groups are updated per step |
| Convergence guarantee | Theorem 3.1: φ(xₖ₊₁) ≤ φ(xₖ), i.e. monotonic improvement, as in the variational principle |

The subspace trick is the quantum magic: instead of updating all 1.5M+ code assignments simultaneously (combinatorially expensive, like full state tomography), PV-Tuning selects only the top-τ most "uncertain" groups (~0.1% per step). This is analogous to measuring only the qubits with the largest gradient: a partial measurement that avoids disturbing the converged subspace.

Key insight: The P/V loop improves monotonically because each step minimizes the objective over a restricted set and therefore never increases it (Theorem 3.1). This parallels the quantum variational principle, where every trial state gives an energy that upper-bounds the ground-state energy, so lowering the objective can only move toward the optimum.
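The alternating structure can be illustrated on a toy scalar quantizer. This sketch is a simplification under stated assumptions, not PV-Tuning itself: the P-step uses a closed-form mean update in place of AdamW, the V-step uses a greedy nearest-entry reassignment in place of beam search, and `p_step`, `v_step`, and `loss` are hypothetical names.

```python
import random


def loss(weights, codes, codebook):
    """Squared reconstruction error of the quantized weights."""
    return sum((w - codebook[c]) ** 2 for w, c in zip(weights, codes))


def p_step(weights, codes, codebook):
    """Continuous update: move each codebook value to the mean of the weights assigned to it."""
    new_codebook = list(codebook)
    for k in range(len(codebook)):
        members = [w for w, c in zip(weights, codes) if c == k]
        if members:
            new_codebook[k] = sum(members) / len(members)
    return new_codebook


def v_step(weights, codes, codebook, tau):
    """Discrete update: re-assign only the tau weights with the largest gradient magnitude."""
    grads = [abs(2.0 * (codebook[c] - w)) for w, c in zip(weights, codes)]
    chosen = sorted(range(len(weights)), key=lambda i: grads[i], reverse=True)[:tau]
    new_codes = list(codes)
    for i in chosen:
        new_codes[i] = min(range(len(codebook)),
                           key=lambda k: (weights[i] - codebook[k]) ** 2)
    return new_codes


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(200)]
codebook = [-1.0, -0.3, 0.3, 1.0]
codes = [rng.randrange(len(codebook)) for _ in weights]  # deliberately poor start

history = [loss(weights, codes, codebook)]
for _ in range(20):
    codebook = p_step(weights, codes, codebook)
    history.append(loss(weights, codes, codebook))
    codes = v_step(weights, codes, codebook, tau=10)
    history.append(loss(weights, codes, codebook))

print(history[0], history[-1])
```

Neither step can raise the loss in this toy: the mean minimizes squared error for a fixed assignment, and reassigning an element to its nearest entry can only lower its own error. The recorded `history` is therefore non-increasing, mirroring the φ(xₖ₊₁) ≤ φ(xₖ) property quoted above.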


Stage 4 - VQC ParamGen: Quantum Circuit Parameter Generation

"Neural network weights as quantum measurement outcomes."

VQC ParamGen inverts the compression paradigm: instead of compressing existing weights, it generates MLP weight matrices directly from a Variational Quantum Circuit (VQC). Random classical features are amplitude-encoded into a quantum state, processed through variational layers with ring entanglement, then measured. The measurement outcomes drive a HyperNetwork that synthesizes the target weight matrix via low-rank factorization.

| Quantum Concept | VQC ParamGen Implementation |
|---|---|
| Amplitude encoding | Random features mapped to the 2ⁿ amplitudes of an n-wire quantum state |
| Variational ansatz | RX, RY, RZ rotation gates + CNOT ring entanglement across layers |
| Depolarizing noise | Probabilistic X/Y/Z gate injection simulates NISQ-era hardware |
| PauliZ measurement | Expectation values form a classical feature vector for the HyperNetwork |
| Quantum-classical bridge | HyperNetwork (MLP) maps n-wire measurements → low-rank factors U, V |
| Matrix reconstruction | W = U @ V^T, with rank r ≪ min(out_dim, in_dim) avoiding ~5M direct outputs |
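The circuit side of this table can be sketched with a tiny statevector simulation. This is a simplified illustration, not the code in paramgen/: it uses 3 wires, only RY rotations (so amplitudes stay real), and no depolarizing noise. All function names are hypothetical.

```python
import math

N_WIRES = 3  # toy size (assumption); 2**3 = 8 amplitudes


def amplitude_encode(features):
    """Normalize 2**n features so they can serve as state amplitudes."""
    norm = math.sqrt(sum(x * x for x in features))
    return [x / norm for x in features]


def apply_ry(state, wire, theta):
    """Apply an RY(theta) rotation to one wire (wire 0 is the most significant bit)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << (N_WIRES - 1 - wire)
    new = list(state)
    for i in range(len(state)):
        if not i & bit:
            a0, a1 = state[i], state[i | bit]
            new[i] = c * a0 - s * a1
            new[i | bit] = s * a0 + c * a1
    return new


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is set."""
    cbit = 1 << (N_WIRES - 1 - control)
    tbit = 1 << (N_WIRES - 1 - target)
    return [state[i ^ tbit] if i & cbit else state[i] for i in range(len(state))]


def expval_z(state, wire):
    """PauliZ expectation value on one wire: P(bit = 0) - P(bit = 1)."""
    bit = 1 << (N_WIRES - 1 - wire)
    return sum((-1.0 if i & bit else 1.0) * a * a for i, a in enumerate(state))


def vqc_features(features, layers):
    """Encode, run each layer (rotations, then a CNOT ring), and measure every wire."""
    state = amplitude_encode(features)
    for thetas in layers:
        for wire, theta in enumerate(thetas):
            state = apply_ry(state, wire, theta)
        for wire in range(N_WIRES):
            state = apply_cnot(state, wire, (wire + 1) % N_WIRES)
    return [expval_z(state, wire) for wire in range(N_WIRES)]


basis_input = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(vqc_features(basis_input, layers=[[0.0, 0.0, 0.0]]))
print(vqc_features(basis_input, layers=[[math.pi, 0.0, 0.0]]))
```

With all angles at zero the circuit leaves the |000⟩ input untouched, so every wire reports an expectation value of +1. In the design described above, a vector like this would be the input to the HyperNetwork.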

The low-rank decomposition is the key enabler: a weight matrix of shape [4304, 1152] (~5M parameters) is generated from only r × (out_dim + in_dim) HyperNetwork outputs. With rank=64, the generation head produces ~350K values instead of ~5M, a >14× compression of the generation pathway itself.
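The arithmetic behind these figures is easy to check. The sketch below uses the shapes and rank quoted above. The `reconstruct` helper and the tiny stand-in factors are illustrative assumptions, not the repository's code.

```python
OUT_DIM, IN_DIM, RANK = 4304, 1152, 64  # shapes and rank quoted in the text

full_outputs = OUT_DIM * IN_DIM               # generating W entry by entry
low_rank_outputs = RANK * (OUT_DIM + IN_DIM)  # generating U and V instead
ratio = full_outputs / low_rank_outputs


def reconstruct(u, v):
    """Return W = U @ V^T for U of shape [out, r] and V of shape [in, r]."""
    return [[sum(ur * vr for ur, vr in zip(u_row, v_row)) for v_row in v]
            for u_row in u]


# Tiny stand-in factors (out=3, in=2, r=1) to show the shape bookkeeping.
u = [[1.0], [2.0], [3.0]]
v = [[10.0], [20.0]]
w = reconstruct(u, v)

print(full_outputs, low_rank_outputs, round(ratio, 1))
print(w)
```

The direct route needs 4,958,208 outputs and the factored route needs 349,184, a ratio of about 14.2, consistent with the ~5M, ~350K, and >14× figures above.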

VQCGeneratedLinear provides a drop-in nn.Linear replacement with internally learned context vectors and VQC-driven weight synthesis. This enables seamless integration into existing architectures. For example, replacing the second FFN layer in a Transformer encoder with VQC-generated weights allows end-to-end quantum-classical hybrid training.

Key insight: This is quantum-classical parameter synthesis. The VQC acts as a structured random projection with trainable unitary rotations, and the HyperNetwork learns to map these quantum features to the weight manifold. Unlike post-hoc compression, this approach generates weights that are born in a quantum-informed subspace.


🔬 Why Quantum-Inspired?

Traditional compression pipelines view quantization as an engineering tradeoff: sacrifice precision for size. The quantum perspective reveals a deeper structure:

| Classical View | Quantum View |
|---|---|
| Weights are real numbers | Weights are quantum states in a discrete Hilbert space |
| Quantization is approximation error | Quantization is measurement in a learned basis |
| Weight sharing is parameter reuse | Weight sharing is entanglement between layers |
| Fine-tuning is gradient descent | Fine-tuning is variational optimization with discrete measurements |
| Quality loss is inevitable | Quality is recoverable through the variational principle |

This perspective isn't just philosophical: it predicts that PV-Tuning should converge (Theorem 3.1), that 1×16 is the natural "qubit encoding" for this architecture, and that entanglement-style sharing should preserve information better than independent compression.


🚀 Quick Start

Installation

git clone https://github.com/THeWakeSystems/RiverONE.git
cd RiverONE
pip install -r requirements.txt

Stage 1 - AQLM State Discretization

Discretize all 36 LLM layers into the 16-qubit codebook space:

cd quantize
pip install -r requirements.txt
python quantize.py          # ~2-3.5 hours on A100

📖 Full quantization guide →

Stage 2 - MiniViT Entanglement

Entangle adjacent vision transformer blocks via weight multiplexing:

cd compress
python apply_minivit.py     # Entangle blocks 23→24
python distill_minivit.py   # State tomography (distillation)
python verify_minivit.py    # Verify entanglement integrity

📖 Full compression guide →

Stage 3 - PV-Tuning Variational Recovery

Run the VQE-like P/V loop for accuracy recovery:

cd finetune
pip install -r requirements.txt
bash run_pv_tuning.sh

📖 PV-Tuning guide → | Technical paper →

Stage 4 - VQC Weight Generation

Run the VQC-based MLP weight parameter generation demo:

cd paramgen
pip install -r requirements.txt
python run_demo.py          # Pure VQC weight generation demo
python transformer_vqc.py   # VQC-Transformer end-to-end training

📖 Model definitions in paramgen/models.py | Utilities in paramgen/utils.py | Transformer example in paramgen/transformer_vqc.py


📁 Directory Structure

RiverONE/
├── engine/           AQLM quantization core (quantum state engine)
│   └── src/          aq, kmeans, beam_search, modelutils, ...
├── quantize/         State discretization configs (25 scripts)
├── compress/         Entanglement multiplexing: apply, distill, verify
├── finetune/         Variational recovery: P/V optimization loop
├── paramgen/         VQC parameter generation: models, utils, demo, transformer
├── tools/            Utilities: dequantize, analyze, swap, eval runs
├── docs/             Full documentation + technical paper
├── weights/          Model weight outputs (gitignored)
├── logs/             Archived run summaries
└── requirements.txt  Consolidated Python dependencies

📋 Requirements

| Component | Version |
|---|---|
| Python | 3.10+ |
| PyTorch | ≥2.1.0 (CUDA 12.1+) |
| Transformers | ≥4.38.0 |
| AQLM (PyPI) | ≥1.1.0 |
| GPU | NVIDIA ≥24 GB VRAM (A100 recommended) |
| OS | Linux (Ubuntu 20.04/22.04 tested) |

📖 References

| Paper | Venue | Link |
|---|---|---|
| AQLM: Extreme Compression of LLMs via Additive Quantization | ICML 2024 | arXiv 2401.06118 |
| MiniViT: Compressing Vision Transformers with Weight Multiplexing | CVPR 2022 | arXiv 2204.07154 |
| PV-Tuning: Beyond Straight-Through Estimation | NeurIPS 2024 | arXiv 2405.14852 |
| RiverONE: Generating Knowledge-Intensive VLM by Simulated Quantum Machines | Submitted to WAIC 2026 | PDF |

RiverONE-QC-4B-v1 · InternVL3.5-4B + Ising Vision Encoder · Built at THeWake Systems

About

We build RiverONE, a compact vision-language model for understanding quantum calibration plots, using simulated quantum computation. It employs a specialized visual encoder and an InternVL-based language backbone.
