alchemystack/quantum-llama.cpp
quantum-llama.cpp

A llama.cpp fork that replaces pseudorandom token sampling with quantum random numbers from the ANU QRNG API or Qbert QRNG API (by Cipherstone). The output is co-authored by quantum events at the moment of generation.

Why Quantum Randomness?

Standard LLM sampling uses deterministic pseudorandom number generators. Each token choice is predetermined by a seed value set before inference begins.

Quantum random numbers are generated by physical processes (photon detection, vacuum fluctuations) where outcomes remain undetermined until measurement. Under certain interpretations of quantum mechanics, consciousness may influence these collapse events. If true, quantum-sourced token selection creates a channel for such influence.

This project treats that hypothesis seriously enough to build proper infrastructure for testing it.

Technical Approach

Z-Score Signal Amplification

Raw QRNG output contains both quantum signal and classical noise (thermal effects, detector bias). Simple truncation or hashing would destroy any potential consciousness influence: a hash scrambles its input arbitrarily, so even a consistent bias in the raw bits could not steer the final output in a coherent direction.

Our approach: fetch 20,480 bytes from the QRNG provider's hex16 endpoint, compute the sample mean, convert to a z-score against the known population distribution (μ=127.5, σ_m=0.51433), and map through the standard normal CDF to produce a uniform float in [0, 1). This leverages the Central Limit Theorem: even a sub-0.2% per-sample bias produces a detectable shift in the aggregate mean, which the z-score → CDF pipeline converts into a meaningful change in token selection probability.
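The mapping step above can be sketched as follows. This is an illustrative Python sketch, not the fork's C++ implementation; only the constants mirror the text (μ=127.5, σ_m=0.51433, 20,480 bytes), and the input is assumed to be the already-fetched byte sample rather than a live API response.

```python
import math

MU = 127.5          # population mean of uniform bytes 0..255
SIGMA_M = 0.51433   # standard error of the mean for 20,480 samples

def qrng_uniform(sample_bytes: bytes) -> float:
    """Map a 20,480-byte QRNG sample to a uniform float in [0, 1)."""
    mean = sum(sample_bytes) / len(sample_bytes)
    z = (mean - MU) / SIGMA_M                       # z-score of the sample mean
    # Standard normal CDF via erf; an unbiased sample lands near 0.5.
    u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return min(u, math.nextafter(1.0, 0.0))         # clamp into [0, 1)

# A perfectly balanced sample maps to exactly 0.5:
print(qrng_uniform(bytes([127, 128] * 10240)))      # -> 0.5
```

Note how the CLT does the amplification: a mean shifted by even one part in 500 (e.g. 130 instead of 127.5) yields |z| ≈ 4.9 and pushes u to an extreme of the unit interval.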

Token selection uses a probability-ordered descending CDF: tokens are sorted from most probable to least probable, so higher values of the uniform float select increasingly surprising tokens. This gives the consciousness influence lever a coherent direction.
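A minimal sketch of that descending-CDF walk, assuming the uniform float u comes from the pipeline above; the token strings and probabilities are invented for illustration:

```python
def select_token(probs: dict[str, float], u: float) -> str:
    """Walk tokens from most to least probable; larger u picks rarer tokens."""
    ordered = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    cumulative = 0.0
    for token, p in ordered:
        cumulative += p
        if u < cumulative:
            return token
    return ordered[-1][0]  # guard against floating-point shortfall

probs = {"the": 0.6, "a": 0.3, "quantum": 0.1}
print(select_token(probs, 0.2))   # -> "the"      (low u: likely token)
print(select_token(probs, 0.95))  # -> "quantum"  (high u: surprising token)
```

Because the tokens are sorted most-probable-first, u near 0 always lands on the model's top choice and u near 1 on its long tail, which is what gives the hypothesized influence lever a single, consistent direction.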

Adaptive Entropy-Based Sampling

Most tokens have low entropy: the model is confident. "The capital of France is Par..." all but deterministically continues with "is," completing "Paris." QRNG sampling here adds latency without benefit.

Tokens with high entropy represent genuine uncertainty: creative junctions, ambiguous phrasings, branching possibilities. These are where consciousness influence would matter.

Implementation:

  • Entropy < 0.50: greedy sampling, no API call
  • Entropy >= 0.50: QRNG sampling with EDT temperature

This reduces API calls by 50-80% while focusing quantum randomness where it matters.
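The gate above can be sketched as follows. Shannon entropy in nats is an assumption here; the fork may use a different base or normalization, and the 0.50 threshold simply mirrors the default listed above.

```python
import math

THRESHOLD = 0.50  # default --quantum-entropy-threshold

def entropy(probs: list[float]) -> float:
    """Shannon entropy (nats, by assumption) of a token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def needs_qrng(probs: list[float]) -> bool:
    """True when the model is uncertain enough to justify an API call."""
    return entropy(probs) >= THRESHOLD

print(needs_qrng([0.99, 0.01]))          # confident  -> False (greedy)
print(needs_qrng([0.4, 0.3, 0.2, 0.1]))  # uncertain  -> True  (QRNG)
```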

Token Color-Coding

Generated tokens are color-coded based on the z-score magnitude from the QRNG data. The z-score measures how far the sample mean deviates from the expected population mean in units of standard error. Larger deviations represent increasingly improbable statistical events that may correlate with consciousness influence. Bluer colors indicate a shift toward high-probability tokens; redder colors indicate a shift toward low-probability tokens:

Color         Z-Score Range   Meaning
Grey          N/A             Deterministic (greedy, no QRNG)
White         |z| ≤ 1         Near expected mean
Light Blue    z ∈ [-2, -1)    Mild high-probability shift
Blue (vivid)  z < -2          Strong high-probability shift
Pink          z ∈ (1, 2]      Mild low-probability shift
Red           z > 2           Strong low-probability shift

EDT Temperature Scaling

Entropy-based Dynamic Temperature adjusts sampling temperature based on the model's uncertainty:

T = T0 * 0.8^(theta/entropy)

Higher entropy yields higher temperature (more exploration). Lower entropy yields lower temperature (more focus). Defaults: T0=2.0, theta=1.0, producing T=1.6 at maximum entropy.
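A sketch of the formula with the stated defaults. Entropy is assumed normalized so its maximum is 1 (which is what makes T=1.6 the maximum-entropy value); the zero-entropy branch is an assumption for numerical safety, since the formula itself diverges as entropy approaches 0.

```python
def edt_temperature(entropy: float, t0: float = 2.0, theta: float = 1.0) -> float:
    """Entropy-based Dynamic Temperature: T = T0 * 0.8^(theta / entropy)."""
    if entropy <= 0.0:
        return 0.0  # fully confident: effectively greedy (assumption)
    return t0 * 0.8 ** (theta / entropy)

print(edt_temperature(1.0))  # -> 1.6 at maximum (normalized) entropy
print(edt_temperature(0.5))  # 2.0 * 0.8^2 = 1.28
```

As entropy falls, the exponent theta/entropy grows and 0.8^(theta/entropy) shrinks toward 0, so confident tokens are sampled ever more sharply.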

Fresh Entropy Requirement

Each token selection makes a fresh API call. Pre-generated entropy pools may have already "collapsed" before the user's intent is formed. The delay between logit computation and random value generation should be minimal.

This adds latency but ensures temporal correlation between user state and quantum measurement.

Quick Start

Two QRNG providers are supported: ANU (default) and Qbert (invite-only, by Cipherstone). Select with --qrng-api.

  1. Get an API key:

    • ANU: Available via AWS Marketplace
    • Qbert: Invite-only. Contact the Cipherstone administrator to request access.
  2. Clone and build:

git clone --recurse-submodules https://github.com/alchemystack/quantum-llama.cpp.git
cd quantum-llama.cpp
cmake -B build -DLLAMA_CURL=OFF
cmake --build build --config Release
  3. Set your API key and run:

Windows

# ANU (default)
set ANU_API_KEY="your-key"
./build/bin/llama-cli -m model.gguf -p "prompt"

# Qbert
set QBERT_API_KEY="your-key"
./build/bin/llama-cli -m model.gguf -p "prompt" --qrng-api qbert

Linux/Mac

# ANU (default)
export ANU_API_KEY="your-key"
./build/bin/llama-cli -m model.gguf -p "prompt"

# Qbert
export QBERT_API_KEY="your-key"
./build/bin/llama-cli -m model.gguf -p "prompt" --qrng-api qbert

CLI Arguments

Argument                         Description                               Default
--qrng-api {anu,qbert}           Select QRNG API provider                  anu
--quantum-verbose                Print entropy and temperature per token   off
--quantum-statistics             Print sampling statistics at end          off
--quantum-entropy-threshold N    Entropy cutoff for QRNG vs greedy         0.50
--quantum-edt-t0 N               EDT upper temperature bound               2.0
--quantum-edt-theta N            EDT entropy sensitivity                   1.0
--no-quantum-adaptive-sampling   Always use QRNG                           -
--no-quantum-edt                 Fixed temperature instead of EDT          -

Limitations

  • Requires an API key (ANU or Qbert)
  • One API call per high-entropy token adds ~100-500ms latency
  • No support for -DLLAMA_CURL=ON
  • Consciousness influence on quantum events remains an open question in physics

Upstream

This fork tracks llama.cpp. See upstream documentation for model support, backends, quantization, and general usage.

License

MIT (same as llama.cpp)
