orphiceye/quantum-llama.cpp

quantum-llama.cpp

A llama.cpp fork that replaces pseudorandom token sampling with quantum random numbers from the ANU QRNG API. The output is co-authored by quantum events at the moment of generation.

Why Quantum Randomness?

Standard LLM sampling uses deterministic pseudorandom number generators. Each token choice is predetermined by a seed value set before inference begins.

Quantum random numbers are generated by physical processes (photon detection, vacuum fluctuations) where outcomes remain undetermined until measurement. Under certain interpretations of quantum mechanics, consciousness may influence these collapse events. If true, quantum-sourced token selection creates a channel for such influence.

This project treats that hypothesis seriously enough to build proper infrastructure for testing it.

Technical Approach

Mode-Based Signal Extraction

Raw QRNG output contains both quantum signal and classical noise (thermal effects, detector bias). Simple truncation or hashing would destroy any potential consciousness influence: after such a transform, steering the result toward a given output would require precise control of individual bits rather than a weak statistical bias.

Our approach: fetch 20,480 bytes from ANU's hex16 endpoint, find the statistical mode (most frequent byte value), use that single value for sampling. This preserves the ability to "select" any output (0-255) while amplifying weak signals through statistical redundancy. Ties trigger a fresh API call.

Adaptive Entropy-Based Sampling

Most tokens have low entropy: the model is confident. After "The capital of France is Par", the next token is almost certainly "is", completing "Paris". QRNG sampling here adds latency without benefit.

Tokens with high entropy represent genuine uncertainty: creative junctions, ambiguous phrasings, branching possibilities. These are where consciousness influence would matter.

Implementation:

  • Entropy < 0.50: greedy sampling, no API call
  • Entropy >= 0.50: QRNG sampling with EDT temperature

This reduces API calls by 50-80% while focusing quantum randomness where it matters.

Token Color-Coding

Generated tokens are color-coded based on the mode frequency detected in the QRNG data. Higher mode counts represent statistical anomalies that may correlate with consciousness influence:

| Color  | Mode Count | Meaning                         |
| ------ | ---------- | ------------------------------- |
| Grey   | N/A        | Deterministic (greedy, no QRNG) |
| White  | < 106      | Statistically common            |
| Pink   | 106-108    | Above average frequency         |
| Red    | 109-111    | Rare                            |
| Purple | 112+       | Mythic rare                     |

Each byte value appears ~80 times on average (20,480 bytes / 256 possible values), but the mode is the maximum over all 256 counts, which extreme-value statistics place near 105 for a uniform source. Mode counts above 106 therefore represent increasingly improbable statistical events.

EDT Temperature Scaling

Entropy-based Dynamic Temperature adjusts sampling temperature based on the model's uncertainty:

T = T0 * 0.8^(theta/entropy)

Higher entropy yields higher temperature (more exploration); lower entropy yields lower temperature (more focus). Defaults: T0 = 2.0, theta = 1.0, producing T = 2.0 * 0.8 = 1.6 at maximum normalized entropy (1.0).

Fresh Entropy Requirement

Each token selection makes a fresh API call. Pre-generated entropy pools may have already "collapsed" before the user's intent is formed. The delay between logit computation and random value generation should be minimal.

This adds latency but ensures temporal correlation between user state and quantum measurement.

Quick Start

  1. Get an ANU API key (paid, available via AWS Marketplace)

  2. Clone and build:

```shell
git clone --recurse-submodules https://github.com/alchemystack/quantum-llama.cpp.git
cd quantum-llama.cpp
cmake -B build -DLLAMA_CURL=OFF
cmake --build build --config Release
```

  3. Set your API key and run:

```shell
export ANU_API_KEY="your-key"
./build/bin/llama-cli -m model.gguf -p "prompt" -n 128 -no-cnv
```

CLI Arguments

| Argument                          | Description                             | Default |
| --------------------------------- | --------------------------------------- | ------- |
| `--quantum-verbose`               | Print entropy and temperature per token | off     |
| `--quantum-statistics`            | Print sampling statistics at end        | off     |
| `--quantum-entropy-threshold N`   | Entropy cutoff for QRNG vs greedy       | 0.50    |
| `--quantum-edt-t0 N`              | EDT upper temperature bound             | 2.0     |
| `--quantum-edt-theta N`           | EDT entropy sensitivity                 | 1.0     |
| `--no-quantum-adaptive-sampling`  | Always use QRNG                         | -       |
| `--no-quantum-edt`                | Fixed temperature instead of EDT        | -       |

Limitations

  • Requires paid ANU API key
  • One API call per high-entropy token adds ~100-500ms latency
  • No support for -DLLAMA_CURL=ON
  • Consciousness influence on quantum events remains an open question in physics

Upstream

This fork tracks llama.cpp. See upstream documentation for model support, backends, quantization, and general usage.

License

MIT (same as llama.cpp)
