A llama.cpp fork that replaces pseudorandom token sampling with quantum random numbers from the ANU QRNG API. The output is co-authored by quantum events at the moment of generation.
Standard LLM sampling uses deterministic pseudorandom number generators. Each token choice is predetermined by a seed value set before inference begins.
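To make the contrast concrete, here is a minimal sketch (plain Python, not the fork's code) of why a seeded PRNG makes every "random" choice predetermined:

```python
import random

# A seeded PRNG reproduces the exact same "random" stream every run:
random.seed(42)
first = [random.randrange(256) for _ in range(5)]

random.seed(42)
second = [random.randrange(256) for _ in range(5)]

# The two runs are identical: every token choice was fixed by the seed
# before inference began.
assert first == second
```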
Quantum random numbers are generated by physical processes (photon detection, vacuum fluctuations) where outcomes remain undetermined until measurement. Under certain interpretations of quantum mechanics, consciousness may influence these collapse events. If true, quantum-sourced token selection creates a channel for such influence.
This project treats that hypothesis seriously enough to build proper infrastructure for testing it.
Raw QRNG output contains both quantum signal and classical noise (thermal effects, detector bias). Simple truncation or hashing would destroy any potential consciousness influence: after hashing, no small bit-level nudge could reliably steer the result toward a chosen output.
Our approach: fetch 20,480 bytes from ANU's hex16 endpoint, find the statistical mode (most frequent byte value), use that single value for sampling. This preserves the ability to "select" any output (0-255) while amplifying weak signals through statistical redundancy. Ties trigger a fresh API call.
Most tokens have low entropy: the model is confident. Given "The capital of France is Par", the next token is deterministically "is" (completing "Paris"). QRNG sampling here adds latency without benefit.
Tokens with high entropy represent genuine uncertainty: creative junctions, ambiguous phrasings, branching possibilities. These are where consciousness influence would matter.
Implementation:
- Entropy < 0.50: greedy sampling, no API call
- Entropy >= 0.50: QRNG sampling with EDT temperature
This reduces API calls by 50-80% while focusing quantum randomness where it matters.
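The routing rule above can be sketched as follows (a simplified illustration, not the fork's C++ implementation; natural-log entropy is assumed here, while the actual units depend on the implementation):

```python
import math

def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (nats) of the model's next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def needs_qrng(probs: list[float], threshold: float = 0.50) -> bool:
    """Route confident tokens to greedy sampling (no API call);
    uncertain tokens go to QRNG sampling with EDT temperature."""
    return token_entropy(probs) >= threshold

# Confident distribution: entropy ~0.06, sampled greedily.
assert not needs_qrng([0.99, 0.005, 0.005])
# Uniform distribution over 4 tokens: entropy ~1.39, sampled via QRNG.
assert needs_qrng([0.25, 0.25, 0.25, 0.25])
```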
Generated tokens are color-coded based on the mode frequency detected in the QRNG data. Higher mode counts represent statistical anomalies that may correlate with consciousness influence:
| Color | Mode Count | Meaning |
|---|---|---|
| Grey | N/A | Deterministic (greedy, no QRNG) |
| White | < 106 | Statistically common |
| Pink | 106-108 | Above average frequency |
| Red | 109-111 | Rare |
| Purple | 112+ | Mythic rare |
Each byte value appears ~80 times on average (20,480 bytes / 256 possible values), but the mode (the maximum count across all 256 values) naturally sits above this mean; counts above 106 represent increasingly improbable statistical events.
Entropy-based Dynamic Temperature adjusts sampling temperature based on the model's uncertainty:
T = T0 * 0.8^(theta / entropy)

Higher entropy yields a higher temperature (more exploration); lower entropy yields a lower temperature (more focus). Defaults: T0 = 2.0, theta = 1.0, producing T = 2.0 * 0.8 = 1.6 when entropy equals theta.
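The formula can be expressed as a small helper (a sketch; `entropy` is assumed strictly positive, since zero-entropy tokens are routed to greedy sampling and never reach EDT):

```python
def edt_temperature(entropy: float, t0: float = 2.0, theta: float = 1.0) -> float:
    """Entropy-based Dynamic Temperature: T = T0 * 0.8^(theta / entropy).

    As entropy grows, the exponent shrinks toward 0 and T approaches T0;
    as entropy falls, T shrinks toward 0 (more focused sampling).
    """
    return t0 * 0.8 ** (theta / entropy)
```

With the defaults, `edt_temperature(1.0)` gives 1.6, and lower-entropy tokens get progressively cooler temperatures (e.g. entropy 0.5 gives 2.0 * 0.8^2 = 1.28).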
Each token selection makes a fresh API call. Pre-generated entropy pools may have already "collapsed" before the user's intent is formed. The delay between logit computation and random value generation should be minimal.
This adds latency but ensures temporal correlation between user state and quantum measurement.
- Get an ANU API key (paid, available via AWS Marketplace)
- Clone and build:

  ```shell
  git clone --recurse-submodules https://github.com/alchemystack/quantum-llama.cpp.git
  cd quantum-llama.cpp
  cmake -B build -DLLAMA_CURL=OFF
  cmake --build build --config Release
  ```

- Set your API key and run:

  ```shell
  export ANU_API_KEY="your-key"
  ./build/bin/llama-cli -m model.gguf -p "prompt" -n 128 -no-cnv
  ```

| Argument | Description | Default |
|---|---|---|
| `--quantum-verbose` | Print entropy and temperature per token | off |
| `--quantum-statistics` | Print sampling statistics at end | off |
| `--quantum-entropy-threshold N` | Entropy cutoff for QRNG vs greedy | 0.50 |
| `--quantum-edt-t0 N` | EDT upper temperature bound | 2.0 |
| `--quantum-edt-theta N` | EDT entropy sensitivity | 1.0 |
| `--no-quantum-adaptive-sampling` | Always use QRNG | - |
| `--no-quantum-edt` | Fixed temperature instead of EDT | - |
- Requires a paid ANU API key
- One API call per high-entropy token adds ~100-500 ms of latency
- No support for `-DLLAMA_CURL=ON`
- Consciousness influence on quantum events remains an open question in physics
This fork tracks llama.cpp. See upstream documentation for model support, backends, quantization, and general usage.
MIT (same as llama.cpp)