A llama.cpp fork that replaces pseudorandom token sampling with quantum random numbers from the ANU QRNG API or Qbert QRNG API (by Cipherstone). The output is co-authored by quantum events at the moment of generation.
Standard LLM sampling uses deterministic pseudorandom number generators. Each token choice is predetermined by a seed value set before inference begins.
Quantum random numbers are generated by physical processes (photon detection, vacuum fluctuations) where outcomes remain undetermined until measurement. Under certain interpretations of quantum mechanics, consciousness may influence these collapse events. If true, quantum-sourced token selection creates a channel for such influence.
This project treats that hypothesis seriously enough to build proper infrastructure for testing it.
Raw QRNG output contains both quantum signal and classical noise (thermal effects, detector bias). Whitening the stream by simple truncation or hashing would destroy any potential consciousness influence: once the bits are scrambled, no physically achievable bias on the raw measurements can steer the final output.
Our approach: fetch 20,480 bytes from the QRNG provider's hex16 endpoint, compute the sample mean, convert to a z-score against the known population distribution (μ=127.5, σ_m=0.51433), and map through the standard normal CDF to produce a uniform float in [0, 1). This leverages the Central Limit Theorem: even a sub-0.2% per-sample bias produces a detectable shift in the aggregate mean, which the z-score → CDF pipeline converts into a meaningful change in token selection probability.
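Using the constants quoted above (20,480-byte batches, μ = 127.5, σ_m = 0.51433), the mean → z-score → CDF pipeline can be sketched in Python. The function name and the clamp at the top of the range are illustrative, not the fork's actual code:

```python
import math

# Population parameters from the text: uniform bytes with mean 127.5,
# standard error of the mean 0.51433 for a 20,480-sample batch.
POP_MEAN = 127.5
SIGMA_M = 0.51433

def qrng_to_uniform(sample_bytes):
    """Map a batch of QRNG bytes to a uniform float in [0, 1).

    Sketch of the sample-mean -> z-score -> normal-CDF pipeline.
    """
    sample_mean = sum(sample_bytes) / len(sample_bytes)
    z = (sample_mean - POP_MEAN) / SIGMA_M
    # Standard normal CDF expressed via the error function.
    u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Clamp so the result stays strictly inside [0, 1).
    return min(max(u, 0.0), math.nextafter(1.0, 0.0))

# An unbiased batch (mean exactly 127.5) maps to u = 0.5.
print(qrng_to_uniform([127, 128] * 10240))  # -> 0.5
```

Because the z-score aggregates all 20,480 samples, a tiny per-sample bias shifts the batch mean by several standard errors, which the CDF converts into a large swing in u.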
Token selection uses a probability-ordered descending CDF: tokens are sorted from most probable to least probable, so higher values of the uniform float select increasingly surprising tokens. This gives the consciousness influence lever a coherent direction.
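A minimal sketch of this selection rule (the token names and dict-based interface are hypothetical):

```python
def select_token(probs, u):
    """Pick a token from a probability-ordered descending CDF.

    `probs` maps token -> probability. Tokens are sorted from most to
    least probable and the uniform draw `u` walks the cumulative
    distribution, so larger u selects less probable ("more surprising")
    tokens. Illustrative sketch, not the fork's actual code.
    """
    ordered = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    cdf = 0.0
    for token, p in ordered:
        cdf += p
        if u < cdf:
            return token
    return ordered[-1][0]  # guard against floating-point shortfall

probs = {"Paris": 0.7, "Lyon": 0.2, "Nice": 0.1}
print(select_token(probs, 0.10))  # -> "Paris"
print(select_token(probs, 0.95))  # -> "Nice"
```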
Most tokens have low entropy: the model is confident. "The capital of France is Par..." continues deterministically with "is", completing "Paris". QRNG sampling here adds latency without benefit.
Tokens with high entropy represent genuine uncertainty: creative junctions, ambiguous phrasings, branching possibilities. These are where consciousness influence would matter.
Implementation:
- Entropy < 0.50: greedy sampling, no API call
- Entropy >= 0.50: QRNG sampling with EDT temperature
This reduces API calls by 50-80% while focusing quantum randomness where it matters.
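The gate above can be sketched as follows. The text does not say whether the 0.50 threshold is measured in bits or nats, so this sketch assumes natural-log (nats) entropy:

```python
import math

ENTROPY_THRESHOLD = 0.50  # default from the text; units assumed to be nats

def shannon_entropy(probs):
    """Entropy of a token distribution (0 when the model is fully confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_sampler(probs):
    """Greedy below the threshold (no API call), QRNG above it. Sketch only."""
    if shannon_entropy(probs) < ENTROPY_THRESHOLD:
        return "greedy"
    return "qrng"

print(choose_sampler([0.99, 0.01]))          # confident -> "greedy"
print(choose_sampler([0.4, 0.3, 0.2, 0.1]))  # uncertain -> "qrng"
```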
Generated tokens are color-coded based on the z-score magnitude from the QRNG data. The z-score measures how far the sample mean deviates from the expected population mean in units of standard error. Larger deviations represent increasingly improbable statistical events that may correlate with consciousness influence. Bluer colors indicate a shift toward high-probability tokens; redder colors indicate a shift toward low-probability tokens:
| Color | Z-Score Range | Meaning |
|---|---|---|
| Grey | N/A | Deterministic (greedy, no QRNG) |
| White | \|z\| < 1 | Near expected mean |
| Light Blue | -2 < z ≤ -1 | Mild high-probability shift |
| Blue (vivid) | z ≤ -2 | Strong high-probability shift |
| Pink | 1 ≤ z < 2 | Mild low-probability shift |
| Red | z ≥ 2 | Strong low-probability shift |
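A sketch of the color mapping; which color claims the exact boundary values ±1 and ±2 is a convention choice here, not taken from the fork's source:

```python
def z_color(z):
    """Map a token's z-score to its display color.

    z is None for greedy tokens, which make no QRNG draw.
    Boundary handling at exactly 1 and 2 is illustrative.
    """
    if z is None:
        return "grey"        # deterministic: no QRNG data to score
    if z <= -2:
        return "blue"        # strong shift toward high-probability tokens
    if z <= -1:
        return "light blue"
    if z < 1:
        return "white"       # near the expected mean
    if z < 2:
        return "pink"
    return "red"             # strong shift toward low-probability tokens

print(z_color(0.3))   # -> "white"
print(z_color(-2.5))  # -> "blue"
print(z_color(1.4))   # -> "pink"
```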
Entropy-based Dynamic Temperature adjusts sampling temperature based on the model's uncertainty:
T = T0 * 0.8^(theta/entropy)
Higher entropy yields higher temperature (more exploration). Lower entropy yields lower temperature (more focus). Defaults: T0=2.0, theta=1.0, producing T=1.6 at maximum entropy.
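Worked out with the defaults, assuming entropy is normalized so its maximum is 1.0 (the assumption under which the quoted T = 1.6 figure comes out):

```python
def edt_temperature(entropy, t0=2.0, theta=1.0):
    """Entropy-based Dynamic Temperature: T = T0 * 0.8^(theta / entropy).

    As entropy rises, the exponent shrinks and T climbs toward T0;
    with entropy capped at 1.0 (an assumption here), the maximum is
    T = 2.0 * 0.8 = 1.6.
    """
    return t0 * 0.8 ** (theta / entropy)

print(edt_temperature(1.0))               # -> 1.6 (maximum normalized entropy)
print(round(edt_temperature(0.5), 4))     # -> 1.28 (more focused)
```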
Each token selection makes a fresh API call. Pre-generated entropy pools may have already "collapsed" before the user's intent is formed. The delay between logit computation and random value generation should be minimal.
This adds latency but ensures temporal correlation between user state and quantum measurement.
Two QRNG providers are supported: ANU (default) and Qbert (invite-only, by Cipherstone). Select with --qrng-api.
- Get an API key:
  - ANU: available via the AWS Marketplace
  - Qbert: invite-only; contact the Cipherstone administrator to request access.
- Clone and build:

```shell
git clone --recurse-submodules https://github.com/alchemystack/quantum-llama.cpp.git
cd quantum-llama.cpp
cmake -B build -DLLAMA_CURL=OFF
cmake --build build --config Release
```

- Set your API key and run:

Windows:

```shell
:: ANU (default)
set ANU_API_KEY=your-key
./build/bin/llama-cli -m model.gguf -p "prompt"

:: Qbert
set QBERT_API_KEY=your-key
./build/bin/llama-cli -m model.gguf -p "prompt" --qrng-api qbert
```

Linux/Mac:
```shell
# ANU (default)
export ANU_API_KEY="your-key"
./build/bin/llama-cli -m model.gguf -p "prompt"

# Qbert
export QBERT_API_KEY="your-key"
./build/bin/llama-cli -m model.gguf -p "prompt" --qrng-api qbert
```

| Argument | Description | Default |
|---|---|---|
| `--qrng-api {anu,qbert}` | Select QRNG API provider | `anu` |
| `--quantum-verbose` | Print entropy and temperature per token | off |
| `--quantum-statistics` | Print sampling statistics at end | off |
| `--quantum-entropy-threshold N` | Entropy cutoff for QRNG vs greedy | 0.50 |
| `--quantum-edt-t0 N` | EDT upper temperature bound | 2.0 |
| `--quantum-edt-theta N` | EDT entropy sensitivity | 1.0 |
| `--no-quantum-adaptive-sampling` | Always use QRNG | - |
| `--no-quantum-edt` | Fixed temperature instead of EDT | - |
- Requires an API key (ANU or Qbert)
- One API call per high-entropy token adds ~100-500ms latency
- No support for `-DLLAMA_CURL=ON`
- Consciousness influence on quantum events remains an open question in physics
This fork tracks llama.cpp. See upstream documentation for model support, backends, quantization, and general usage.
MIT (same as llama.cpp)