# int3
Here are 3 public repositories matching this topic...
A demo of a Structured/Vectored Exception Handler hook using INT3 opcode to trigger the exception.
Updated Nov 10, 2024 - C++
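As a rough illustration of the idea behind an INT3-based hook, the sketch below simulates the byte patching step on a plain `bytearray`: the byte at the hook address is saved, overwritten with the one-byte x86 breakpoint opcode `0xCC`, and restored later. This is not code from the repository above. The class name `Int3Hook` and the sample bytes are hypothetical, and no exception handler is actually registered. In a real hook the patched bytes would live in executable memory and a Structured or Vectored Exception Handler would receive the breakpoint exception.

```python
INT3 = 0xCC  # single-byte x86 breakpoint opcode


class Int3Hook:
    """Simulated breakpoint patch on a byte buffer (hypothetical helper)."""

    def __init__(self, code: bytearray, offset: int):
        self.code = code
        self.offset = offset
        self.saved = None  # original byte, kept while the hook is installed

    def install(self) -> None:
        # Save the original byte once, then overwrite it with INT3.
        if self.saved is None:
            self.saved = self.code[self.offset]
            self.code[self.offset] = INT3

    def remove(self) -> None:
        # Put the original byte back so the code is unchanged again.
        if self.saved is not None:
            self.code[self.offset] = self.saved
            self.saved = None


# Sample bytes: push rbp; mov rbp, rsp; nop; ret
code = bytearray(b"\x55\x48\x89\xe5\x90\xc3")
original = bytes(code)

hook = Int3Hook(code, 0)
hook.install()
patched = bytes(code)
hook.remove()
restored = bytes(code)
```

Keeping the saved byte inside the hook object makes `install` and `remove` safe to call more than once, which matters because restoring the wrong byte would corrupt the hooked instruction.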
Training-free INT3 KV cache quantization: 5.09× compression, ~10 lines of Python, <5% WikiText-2 ΔPPL on 8 of 8 open-weight Transformers (GPT-J 2021 → Gemma-4 2026). No calibration, no codebook, no rotation, no adapter. +2.4% decode overhead with torch.compile (no custom CUDA).
reproducible-research transformers pytorch open-science quantization memory-optimization kv-cache int4 calibration-free llm-inference int3 norm-separation per-channel-quantization activation-outliers
Updated May 21, 2026 - Python
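To make the terms in the description above more concrete, here is a minimal sketch of 3-bit quantization with the vector norm stored separately. It is not the repository's method: the function names, the per-vector (rather than per-channel) grouping, and the min/max uniform grid are all assumptions chosen for illustration. Each vector is split into its L2 norm and a unit direction, and the direction is mapped to integer codes in the range 0 to 7. For scale, if the uncompressed cache is 16-bit (also an assumption), a 5.09× ratio would correspond to roughly 16 / 5.09 ≈ 3.14 stored bits per element.

```python
import math


def quantize_int3(vec):
    """Return (norm, lo, step, codes) with every code in 0..7 (3 bits)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return 0.0, 0.0, 0.0, [0] * len(vec)
    unit = [x / norm for x in vec]          # direction with the norm removed
    lo, hi = min(unit), max(unit)
    step = (hi - lo) / 7                    # 8 levels give 7 intervals
    if step == 0.0:
        return norm, lo, 0.0, [0] * len(vec)
    codes = [min(7, max(0, round((u - lo) / step))) for u in unit]
    return norm, lo, step, codes


def dequantize_int3(norm, lo, step, codes):
    """Rebuild an approximation of the original vector from its codes."""
    return [norm * (lo + c * step) for c in codes]


vec = [0.5, -1.25, 2.0, 0.0, -0.75, 1.5, 3.0, -2.0]
norm, lo, step, codes = quantize_int3(vec)
recon = dequantize_int3(norm, lo, step, codes)
max_err = max(abs(a - b) for a, b in zip(vec, recon))
```

With a uniform grid, each reconstructed element differs from the original by at most half a step times the norm, so `max_err` is bounded by `norm * step / 2`.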