---
language: en
license: mit
tags:
datasets:
metrics:
model-index:
---
Author: Zorko · zorko.xyz
PentaNet extends extreme quantization beyond BitNet's ternary {-1, 0, +1} to pentanary {-2, -1, 0, +1, +2}, achieving a 6.4% perplexity improvement at 124M parameters on WikiText-103 while preserving zero-multiplier arithmetic. Scaling experiments show that this advantage does not transfer to larger models (345M+): the pentanary space requires more sophisticated scaling than absmean + STE.
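A minimal sketch of the absmean pentanary quantizer with a straight-through estimator (STE), assuming `scale_factor` multiplies the absmean statistic; the function name is illustrative, and pentanet_layer.py holds the actual PentaLinear implementation:

```python
import torch

def quantize_pentanary(w: torch.Tensor, scale_factor: float = 1.0) -> torch.Tensor:
    """Absmean quantization to the pentanary grid {-2, -1, 0, +1, +2}.

    The resulting states need no true multiplies at inference:
    0 skips, +/-1 is a sign flip, +/-2 is a shift (plus sign flip).
    """
    scale = w.abs().mean().clamp(min=1e-5) * scale_factor
    u = w / scale
    # Straight-through estimator: round/clip in the forward pass,
    # identity gradient in the backward pass.
    u_q = u + (u.round().clamp(-2, 2) - u).detach()
    return u_q * scale
```

Under this reading, the sf=0.8 run in the scaling table below shrinks the quantization step, pushing more weights into the outer ±2 states, which matches its higher reported outer-state usage.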
Main result (124M parameters, WikiText-103, mean over 3 seeds):

| Model | Mean PPL | Std | Seeds |
|---|---|---|---|
| PentaNet {-2..+2} | 180.32 | ±2.09 | 42, 1337, 2026 |
| BitNet {-1..+1} | 192.63 | ±3.52 | 42, 1337, 2026 |
- 124M parameter GPT-2-style transformer
- WikiText-103 (~100M tokens)
- Trained on a single RTX 5080 (16 GB)
- No collapse: ±2 buckets maintain ~11% occupancy through all 10k iterations
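Outer-state usage and bucket occupancy can be measured directly from the integer codes; a hypothetical helper, not part of the repo's scripts:

```python
import torch

def state_occupancy(w_int: torch.Tensor) -> dict[int, float]:
    """Fraction of weights in each pentanary state {-2, ..., +2}.

    `w_int` holds the integer codes after round/clip, e.g.
    (w / scale).round().clamp(-2, 2) from the quantizer sketch above.
    """
    return {s: (w_int == s).float().mean().item() for s in range(-2, 3)}
```

Outer-state usage as reported in the tables is then occupancy at -2 plus occupancy at +2.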
At larger scale (345M), the ranking inverts:

| Model | PPL | Note |
|---|---|---|
| BitNet {-1..+1} | 273.0 | 67% outer state usage |
| PentaNet {-2..+2} | 320.1 | 22% outer state usage |
| PentaNet sf=0.8 | 618 | 34% outer state usage (short_wide 12×1536) |
See INVESTIGATION.md for the full scaling analysis and scale_factor ablation.
(Prompt: "The history of the internet began with")
```
⏳ Generating with BitNet (Ternary {-1, 0, 1}) ...
🤖 BITNET S42: The history of the internet began with the <unk> to be a way , <unk> , which was the first recent of the <unk> , and the city and the <unk> . The French army was the first to be the first @-@ scale
⏳ Generating with PentaNet (Pentanary {-2, -1, 0, 1, 2}) ...
🤖 PENTANET S42: The history of the internet began with the original level of the other . The term of the original world was to the public court of the United States in July 2013 in February 15 , 2015 , as well as the team of $ 2 @,@ 000 . In the same year , the
```
Notice how BitNet struggles with vocabulary collapse (`<unk>`) and repetitive stuttering, while PentaNet generates fluent, grammatically coherent Wikipedia-style sentences (though factually unreliable, as expected at this model size).
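The samples can be reproduced with a loop along these lines; the `model(idx) -> logits` interface, sampling temperature, and helper name are assumptions rather than the repo's actual generation code:

```python
import torch

@torch.no_grad()
def sample(model, tokenizer, prompt: str, max_new_tokens: int = 50,
           temperature: float = 0.8, seed: int = 42) -> str:
    """Temperature sampling from a trained checkpoint (assumed interface)."""
    torch.manual_seed(seed)  # "S42" in the logs above suggests seed 42
    idx = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(idx)[:, -1, :] / temperature  # assumes model returns logits
        probs = torch.softmax(logits, dim=-1)
        idx = torch.cat([idx, torch.multinomial(probs, num_samples=1)], dim=1)
    return tokenizer.decode(idx[0])
```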
Repository layout:

```
├── README.md
├── PentaNet_NeurIPS_Draft.md        # Full technical report (markdown)
├── train_pentagpt.py                # Core training script (PentaNet + BitNet)
├── pentanet_layer.py                # PentaLinear layer implementation
├── prepare_data.py                  # WikiText-103 data preparation
├── run_benchmark.py                 # 3-seed benchmark orchestrator
├── paper/
│   ├── PentaNet_Technical_Report.pdf
│   └── figures/
├── scripts/                         # Visualization & utilities
│   ├── compile_pdf.py
│   ├── export_figures.py
│   ├── generate_dashboard.py
│   └── pentanet_analysis.py
└── models/                          # JSON logs + model checkpoints
    ├── pentanet_large_s{42,1337,2026}_results.json
    └── bitnet_large_s{42,1337,2026}_results.json
```
```bash
# 1. Setup
python -m venv .venv-gpu && source .venv-gpu/bin/activate
pip install torch transformers datasets

# 2. Prepare data
python prepare_data.py

# 3. Run full benchmark (3 seeds × 2 architectures, ~2h15 on RTX 5080)
python run_benchmark.py

# 4. Visualize results
python scripts/generate_dashboard.py  # Interactive HTML dashboard
python scripts/export_figures.py      # Publication-quality PNG/PDF
python scripts/compile_pdf.py         # Compile full paper PDF
```

Pre-trained checkpoints are available on HuggingFace:
The scaling investigation shows absmean + STE is the bottleneck. Three concrete directions for PentaNet v2:
- Learnable Scale per Layer — Replace the fixed absmean with a per-layer parameter optimized jointly with the weights, letting each layer find its own optimal quantization grid (see the sketch after this list).
- Distillation from FP32 — Stop training from scratch. Use a pre-trained FP32 model as teacher so pentanary weights inherit meaningful structure instead of discovering it from random init.
- Soft Quantization — Replace the hard Round+Clip STE with continuous relaxations (Gumbel-Softmax or temperature-scaled) so weights slide smoothly toward ±2 instead of being thrown there abruptly.
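A sketch of the first direction, reusing the STE form from the quantizer above; `LearnableScalePentaLinear` and `log_scale` are hypothetical names, not part of the current codebase:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableScalePentaLinear(nn.Module):
    """Hypothetical v2 layer: per-layer learnable quantization scale.

    Only the round/clip residual is detached, so the rescale stays in
    the autograd graph and `log_scale` is optimized jointly with the
    weights, letting each layer widen or narrow its pentanary grid.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.log_scale = nn.Parameter(torch.zeros(()))  # multiplier on absmean, init 1.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.weight.abs().mean().clamp(min=1e-5) * self.log_scale.exp()
        u = self.weight / scale
        u_q = u + (u.round().clamp(-2, 2) - u).detach()  # STE on round/clip only
        return F.linear(x, u_q * scale)
```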
Citation:

```bibtex
@techreport{zorko2026pentanet,
  title  = {PentaNet: Native Pentanary Quantization for Large Language Models},
  author = {Zorko},
  year   = {2026},
  url    = {https://zorko.xyz}
}
```

License: MIT