A llama.cpp fork that replaces pseudorandom token sampling with quantum random numbers from the ANU QRNG API. The output is co-authored by quantum events at the moment of generation.
Standard LLM sampling uses deterministic pseudorandom number generators. Each token choice is predetermined by a seed value set before inference begins.
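To make the contrast concrete, here is a minimal sketch (plain Python, not the fork's code) of why a seeded PRNG makes every "random" choice predetermined:

```python
import random

# A seeded PRNG reproduces the exact same "random" stream every run:
random.seed(42)
first = [random.randrange(256) for _ in range(5)]

random.seed(42)
second = [random.randrange(256) for _ in range(5)]

# The two runs are identical: every token choice was fixed by the seed
# before inference began.
assert first == second
```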
Quantum random numbers are generated by physical processes (photon detection, vacuum fluctuations) where outcomes remain undetermined until measurement. Under certain interpretations of quantum mechanics, consciousness may influence these collapse events. If true, quantum-sourced token selection creates a channel for such influence.
This project treats that hypothesis seriously enough to build proper infrastructure for testing it.
Raw QRNG output contains both quantum signal and classical noise (thermal effects, detector bias). Simple truncation or hashing would destroy any potential consciousness influence: after hashing, no small bit-level nudge could reliably steer the result toward a chosen output.
Our approach: fetch 20,480 bytes from ANU's hex16 endpoint, find the statistical mode (most frequent byte value), use that single value for sampling. This preserves the ability to "select" any output (0-255) while amplifying weak signals through statistical redundancy. Ties trigger a fresh API call.
Most tokens have low entropy: the model is confident. Given "The capital of France is Par", the next token is deterministically "is" (completing "Paris"). QRNG sampling here adds latency without benefit.
Tokens with high entropy represent genuine uncertainty: creative junctions, ambiguous phrasings, branching possibilities. These are where consciousness influence would matter.
Implementation:
- Entropy < 0.50: greedy sampling, no API call
- Entropy >= 0.50: QRNG sampling with EDT temperature
This reduces API calls by 50-80% while focusing quantum randomness where it matters.
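The routing rule above can be sketched as follows (a simplified illustration, not the fork's C++ implementation; natural-log entropy is assumed here, while the actual units depend on the implementation):

```python
import math

def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (nats) of the model's next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def needs_qrng(probs: list[float], threshold: float = 0.50) -> bool:
    """Route confident tokens to greedy sampling (no API call);
    uncertain tokens go to QRNG sampling with EDT temperature."""
    return token_entropy(probs) >= threshold

# Confident distribution: entropy ~0.06, sampled greedily.
assert not needs_qrng([0.99, 0.005, 0.005])
# Uniform distribution over 4 tokens: entropy ~1.39, sampled via QRNG.
assert needs_qrng([0.25, 0.25, 0.25, 0.25])
```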
Generated tokens are color-coded based on the mode frequency detected in the QRNG data. Higher mode counts represent statistical anomalies that may correlate with consciousness influence:
| Color | Mode Count | Meaning |
|---|---|---|
| Grey | N/A | Deterministic (greedy, no QRNG) |
| White | < 106 | Statistically common |
| Pink | 106-108 | Above average frequency |
| Red | 109-111 | Rare |
| Purple | 112+ | Mythic rare |
Each byte value appears ~80 times on average (20,480 bytes / 256 possible values), but the mode (the maximum count across all 256 values) naturally sits above this mean; counts above 106 represent increasingly improbable statistical events.
Entropy-based Dynamic Temperature adjusts sampling temperature based on the model's uncertainty:
T = T0 * 0.8^(theta / entropy)

Higher entropy yields a higher temperature (more exploration); lower entropy yields a lower temperature (more focus). Defaults: T0 = 2.0, theta = 1.0, producing T = 2.0 * 0.8 = 1.6 when entropy equals theta.
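The formula can be expressed as a small helper (a sketch; `entropy` is assumed strictly positive, since zero-entropy tokens are routed to greedy sampling and never reach EDT):

```python
def edt_temperature(entropy: float, t0: float = 2.0, theta: float = 1.0) -> float:
    """Entropy-based Dynamic Temperature: T = T0 * 0.8^(theta / entropy).

    As entropy grows, the exponent shrinks toward 0 and T approaches T0;
    as entropy falls, T shrinks toward 0 (more focused sampling).
    """
    return t0 * 0.8 ** (theta / entropy)
```

With the defaults, `edt_temperature(1.0)` gives 1.6, and lower-entropy tokens get progressively cooler temperatures (e.g. entropy 0.5 gives 2.0 * 0.8^2 = 1.28).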
Each token selection makes a fresh API call. Pre-generated entropy pools may have already "collapsed" before the user's intent is formed. The delay between logit computation and random value generation should be minimal.
This adds latency but ensures temporal correlation between user state and quantum measurement.
- Get an ANU API key (paid, available via AWS Marketplace)
- Clone and build:

  ```shell
  git clone --recurse-submodules https://github.com/alchemystack/quantum-llama.cpp.git
  cd quantum-llama.cpp
  cmake -B build -DLLAMA_CURL=OFF
  cmake --build build --config Release
  ```

- Set your API key and run:

  ```shell
  export ANU_API_KEY="your-key"
  ./build/bin/llama-cli -m model.gguf -p "prompt" -n 128 -no-cnv
  ```

| Argument | Description | Default |
|---|---|---|
| `--quantum-verbose` | Print entropy and temperature per token | off |
| `--quantum-statistics` | Print sampling statistics at end | off |
| `--quantum-entropy-threshold N` | Entropy cutoff for QRNG vs greedy | 0.50 |
| `--quantum-edt-t0 N` | EDT upper temperature bound | 2.0 |
| `--quantum-edt-theta N` | EDT entropy sensitivity | 1.0 |
| `--no-quantum-adaptive-sampling` | Always use QRNG | - |
| `--no-quantum-edt` | Fixed temperature instead of EDT | - |
- Requires a paid ANU API key
- One API call per high-entropy token adds ~100-500 ms of latency
- No support for `-DLLAMA_CURL=ON`
- Consciousness influence on quantum events remains an open question in physics
This fork tracks llama.cpp. See upstream documentation for model support, backends, quantization, and general usage.
MIT (same as llama.cpp)