int3

Here are 3 public repositories matching this topic...

intel / neural-speed

An innovative library for efficient LLM inference via low-bit quantization

Updated Aug 30, 2024
C++

BenteVE / SEH-VEH-hook-INT3-opcode

A demo of a Structured/Vectored Exception Handler hook using INT3 opcode to trigger the exception.

hooking vectored-exception-handling structured-exception-handling exception-filter int3

Updated Nov 10, 2024
C++

metaSATOKEN / nsn

Training-free INT3 KV cache quantization: 5.09× compression, ~10 lines of Python, <5% WikiText-2 ΔPPL on 8 of 8 open-weight Transformers (GPT-J 2021 → Gemma-4 2026). No calibration, no codebook, no rotation, no adapter. +2.4% decode overhead with torch.compile (no custom CUDA).

reproducible-research transformers pytorch open-science quantization memory-optimization kv-cache int4 calibration-free llm-inference int3 norm-separation per-channel-quantization activation-outliers

Updated May 21, 2026
Python
