A llama.cpp fork that replaces pseudorandom token sampling with quantum random numbers from the ANU QRNG API or Qbert QRNG API (by Cipherstone). The output is co-authored by quantum events at the moment of generation.
Standard LLM sampling uses deterministic pseudorandom number generators. Each token choice is predetermined by a seed value set before inference begins.
Quantum random numbers are generated by physical processes (photon detection, vacuum fluctuations) where outcomes remain undetermined until measurement. Under certain interpretations of quantum mechanics, consciousness may influence these collapse events. If true, quantum-sourced token selection creates a channel for such influence.
This project treats that hypothesis seriously enough to build proper infrastructure for testing it.
Raw QRNG output contains both quantum signal and classical noise (thermal effects, detector bias). Whitening the stream by simple truncation or hashing would destroy any potential consciousness influence: once the bits are scrambled, no physically achievable bias on the raw measurements can steer the final output.
Our approach: fetch 20,480 bytes from the QRNG provider's hex16 endpoint, compute the sample mean, convert to a z-score against the known population distribution (μ=127.5, σ_m=0.51433), and map through the standard normal CDF to produce a uniform float in [0, 1). This leverages the Central Limit Theorem: even a sub-0.2% per-sample bias produces a detectable shift in the aggregate mean, which the z-score → CDF pipeline converts into a meaningful change in token selection probability.
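Using the constants quoted above (20,480-byte batches, μ = 127.5, σ_m = 0.51433), the mean → z-score → CDF pipeline can be sketched in Python. The function name and the clamp at the top of the range are illustrative, not the fork's actual code:

```python
import math

# Population parameters from the text: uniform bytes with mean 127.5,
# standard error of the mean 0.51433 for a 20,480-sample batch.
POP_MEAN = 127.5
SIGMA_M = 0.51433

def qrng_to_uniform(sample_bytes):
    """Map a batch of QRNG bytes to a uniform float in [0, 1).

    Sketch of the sample-mean -> z-score -> normal-CDF pipeline.
    """
    sample_mean = sum(sample_bytes) / len(sample_bytes)
    z = (sample_mean - POP_MEAN) / SIGMA_M
    # Standard normal CDF expressed via the error function.
    u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Clamp so the result stays strictly inside [0, 1).
    return min(max(u, 0.0), math.nextafter(1.0, 0.0))

# An unbiased batch (mean exactly 127.5) maps to u = 0.5.
print(qrng_to_uniform([127, 128] * 10240))  # -> 0.5
```

Because the z-score aggregates all 20,480 samples, a tiny per-sample bias shifts the batch mean by several standard errors, which the CDF converts into a large swing in u.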
Token selection uses a probability-ordered descending CDF: tokens are sorted from most probable to least probable, so higher values of the uniform float select increasingly surprising tokens. This gives the consciousness influence lever a coherent direction.
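A minimal sketch of this selection rule (the token names and dict-based interface are hypothetical):

```python
def select_token(probs, u):
    """Pick a token from a probability-ordered descending CDF.

    `probs` maps token -> probability. Tokens are sorted from most to
    least probable and the uniform draw `u` walks the cumulative
    distribution, so larger u selects less probable ("more surprising")
    tokens. Illustrative sketch, not the fork's actual code.
    """
    ordered = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    cdf = 0.0
    for token, p in ordered:
        cdf += p
        if u < cdf:
            return token
    return ordered[-1][0]  # guard against floating-point shortfall

probs = {"Paris": 0.7, "Lyon": 0.2, "Nice": 0.1}
print(select_token(probs, 0.10))  # -> "Paris"
print(select_token(probs, 0.95))  # -> "Nice"
```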
Most tokens have low entropy: the model is confident. "The capital of France is Par..." continues deterministically with "is", completing "Paris". QRNG sampling here adds latency without benefit.
Tokens with high entropy represent genuine uncertainty: creative junctions, ambiguous phrasings, branching possibilities. These are where consciousness influence would matter.
Implementation:
- Entropy < 0.50: greedy sampling, no API call
- Entropy >= 0.50: QRNG sampling with EDT temperature
This reduces API calls by 50-80% while focusing quantum randomness where it matters.
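The gate above can be sketched as follows. The text does not say whether the 0.50 threshold is measured in bits or nats, so this sketch assumes natural-log (nats) entropy:

```python
import math

ENTROPY_THRESHOLD = 0.50  # default from the text; units assumed to be nats

def shannon_entropy(probs):
    """Entropy of a token distribution (0 when the model is fully confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_sampler(probs):
    """Greedy below the threshold (no API call), QRNG above it. Sketch only."""
    if shannon_entropy(probs) < ENTROPY_THRESHOLD:
        return "greedy"
    return "qrng"

print(choose_sampler([0.99, 0.01]))          # confident -> "greedy"
print(choose_sampler([0.4, 0.3, 0.2, 0.1]))  # uncertain -> "qrng"
```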
Generated tokens are color-coded based on the z-score magnitude from the QRNG data. The z-score measures how far the sample mean deviates from the expected population mean in units of standard error. Larger deviations represent increasingly improbable statistical events that may correlate with consciousness influence. Bluer colors indicate a shift toward high-probability tokens; redder colors indicate a shift toward low-probability tokens:
| Color | Z-Score Range | Meaning |
|---|---|---|
| Grey | N/A | Deterministic (greedy, no QRNG) |
| White | \|z\| < 1 | Near expected mean |
| Light Blue | -2 < z ≤ -1 | Mild high-probability shift |
| Blue (vivid) | z ≤ -2 | Strong high-probability shift |
| Pink | 1 ≤ z < 2 | Mild low-probability shift |
| Red | z ≥ 2 | Strong low-probability shift |
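A sketch of the color mapping; which color claims the exact boundary values ±1 and ±2 is a convention choice here, not taken from the fork's source:

```python
def z_color(z):
    """Map a token's z-score to its display color.

    z is None for greedy tokens, which make no QRNG draw.
    Boundary handling at exactly 1 and 2 is illustrative.
    """
    if z is None:
        return "grey"        # deterministic: no QRNG data to score
    if z <= -2:
        return "blue"        # strong shift toward high-probability tokens
    if z <= -1:
        return "light blue"
    if z < 1:
        return "white"       # near the expected mean
    if z < 2:
        return "pink"
    return "red"             # strong shift toward low-probability tokens

print(z_color(0.3))   # -> "white"
print(z_color(-2.5))  # -> "blue"
print(z_color(1.4))   # -> "pink"
```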
Entropy-based Dynamic Temperature adjusts sampling temperature based on the model's uncertainty:
T = T0 * 0.8^(theta/entropy)
Higher entropy yields higher temperature (more exploration). Lower entropy yields lower temperature (more focus). Defaults: T0=2.0, theta=1.0, producing T=1.6 at maximum entropy.
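Worked out with the defaults, assuming entropy is normalized so its maximum is 1.0 (the assumption under which the quoted T = 1.6 figure comes out):

```python
def edt_temperature(entropy, t0=2.0, theta=1.0):
    """Entropy-based Dynamic Temperature: T = T0 * 0.8^(theta / entropy).

    As entropy rises, the exponent shrinks and T climbs toward T0;
    with entropy capped at 1.0 (an assumption here), the maximum is
    T = 2.0 * 0.8 = 1.6.
    """
    return t0 * 0.8 ** (theta / entropy)

print(edt_temperature(1.0))               # -> 1.6 (maximum normalized entropy)
print(round(edt_temperature(0.5), 4))     # -> 1.28 (more focused)
```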
Each token selection makes a fresh API call. Pre-generated entropy pools may have already "collapsed" before the user's intent is formed. The delay between logit computation and random value generation should be minimal.
This adds latency but ensures temporal correlation between user state and quantum measurement.
Two QRNG providers are supported: ANU (default) and Qbert (invite-only, by Cipherstone). Select with --qrng-api.
- Get an API key:
  - ANU: available via the AWS Marketplace
  - Qbert: invite-only; contact the Cipherstone administrator to request access.
- Clone and build:

```shell
git clone --recurse-submodules https://github.com/alchemystack/quantum-llama.cpp.git
cd quantum-llama.cpp
cmake -B build -DLLAMA_CURL=OFF
cmake --build build --config Release
```

- Set your API key and run:

Windows:

```shell
:: ANU (default)
set ANU_API_KEY=your-key
./build/bin/llama-cli -m model.gguf -p "prompt"

:: Qbert
set QBERT_API_KEY=your-key
./build/bin/llama-cli -m model.gguf -p "prompt" --qrng-api qbert
```

Linux/Mac:
```shell
# ANU (default)
export ANU_API_KEY="your-key"
./build/bin/llama-cli -m model.gguf -p "prompt"

# Qbert
export QBERT_API_KEY="your-key"
./build/bin/llama-cli -m model.gguf -p "prompt" --qrng-api qbert
```

| Argument | Description | Default |
|---|---|---|
| `--qrng-api {anu,qbert}` | Select QRNG API provider | `anu` |
| `--quantum-verbose` | Print entropy and temperature per token | off |
| `--quantum-statistics` | Print sampling statistics at end | off |
| `--quantum-entropy-threshold N` | Entropy cutoff for QRNG vs greedy | 0.50 |
| `--quantum-edt-t0 N` | EDT upper temperature bound | 2.0 |
| `--quantum-edt-theta N` | EDT entropy sensitivity | 1.0 |
| `--no-quantum-adaptive-sampling` | Always use QRNG | - |
| `--no-quantum-edt` | Fixed temperature instead of EDT | - |
- Requires an API key (ANU or Qbert)
- One API call per high-entropy token adds ~100-500ms latency
- No support for `-DLLAMA_CURL=ON`
- Consciousness influence on quantum events remains an open question in physics
This fork tracks llama.cpp. See upstream documentation for model support, backends, quantization, and general usage.
MIT (same as llama.cpp)