int3

Here are 3 public repositories matching this topic...

Training-free INT3 KV cache quantization: 5.09× compression, ~10 lines of Python, <5% WikiText-2 ΔPPL on 8 of 8 open-weight Transformers (GPT-J 2021 → Gemma-4 2026). No calibration, no codebook, no rotation, no adapter. +2.4% decode overhead with torch.compile (no custom CUDA).

  • Updated May 21, 2026
  • Python
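
The listing does not show the repository's code, so the following is only a generic sketch of what 3-bit quantization of a group of cache values can look like. It is not the repository's method. It assumes plain asymmetric min/max quantization per group, and the helper names (`quantize_int3`, `dequantize_int3`, `pack_int3`, `unpack_int3`) are hypothetical. It uses pure Python lists rather than tensors to stay self-contained.

```python
from typing import List, Tuple

LEVELS = 8  # 3 bits give 2**3 quantization levels (codes 0..7)


def quantize_int3(values: List[float]) -> Tuple[List[int], float, float]:
    """Map one group of floats to 3-bit codes plus a (scale, offset) pair.

    Assumption: simple min/max (asymmetric) quantization, chosen for
    illustration only.
    """
    lo, hi = min(values), max(values)
    # A constant group would give a zero scale, so fall back to 1.0.
    scale = (hi - lo) / (LEVELS - 1) or 1.0
    codes = [min(LEVELS - 1, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_int3(codes: List[int], scale: float, lo: float) -> List[float]:
    """Reconstruct approximate floats from 3-bit codes."""
    return [lo + c * scale for c in codes]


def pack_int3(codes: List[int]) -> bytes:
    """Pack 3-bit codes densely, so 8 codes occupy 3 bytes."""
    bits = 0
    for i, c in enumerate(codes):
        bits |= (c & 0b111) << (3 * i)
    return bits.to_bytes((3 * len(codes) + 7) // 8, "little")


def unpack_int3(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_int3 for a known number of codes."""
    bits = int.from_bytes(packed, "little")
    return [(bits >> (3 * i)) & 0b111 for i in range(count)]


if __name__ == "__main__":
    group = [-1.0, -0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0]
    codes, scale, lo = quantize_int3(group)
    packed = pack_int3(codes)
    restored = dequantize_int3(unpack_int3(packed, len(group)), scale, lo)
    print(len(packed), "bytes for", len(group), "values")
    print(max(abs(a - b) for a, b in zip(group, restored)), "max abs error")
```

Assuming a 16-bit baseline, 3-bit codes alone would give 16/3 ≈ 5.33× compression. In a scheme like this sketch, the per-group scale and offset add overhead, so the effective ratio depends on the group size.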