Model Quantization

To reduce the cost of running the fact-checking workflow, we quantized the Latxa 70B and 8B models to 4-bit with AWQ, using llm-compressor (https://github.com/vllm-project/llm-compressor).

The quantized models are available in this Hugging Face collection: https://huggingface.co/collections/Iker/latxa-4bit. They are compatible with Hugging Face transformers and vLLM.
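As an illustration of the vLLM compatibility mentioned above, here is a minimal sketch of loading a quantized checkpoint and generating text. The helper name `generate_with_vllm` and its parameters are hypothetical, and the repository ID is deliberately left as an argument: pick one from the collection linked above. The sketch assumes vLLM is installed and that a GPU with enough memory is available; vLLM normally reads the quantization settings from the checkpoint's own config, so no extra flag is passed here.

```python
def generate_with_vllm(model_id: str, prompt: str, max_tokens: int = 64) -> str:
    """Generate a completion for `prompt` with a checkpoint served through vLLM.

    `model_id` is a Hugging Face repo ID taken from the collection
    (not hard-coded here, since the exact repo names are an assumption).
    """
    # Imported lazily so that defining this helper does not require vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id)  # downloads and loads the checkpoint
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)  # greedy decoding
    outputs = llm.generate([prompt], params)
    # One prompt in, one request out; take its first completion.
    return outputs[0].outputs[0].text

# Example call (the repo ID must be filled in by the reader):
# print(generate_with_vllm("<repo-id-from-the-collection>", "Kaixo!"))
```

Each call to this helper reloads the model, which is fine for a quick check but wasteful in a real workflow; there you would create the `LLM` object once and reuse it.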

Details on how to reproduce the quantization are available in each model card.

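We found that quantization substantially reduces VRAM requirements while maintaining model performance. In the comparison below, the 4-bit Latxa 8B matches or slightly exceeds the original model on accuracy, recall, and F1, although its undetermined rate is higher.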
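For a rough sense of the VRAM savings, the memory taken by the weights alone can be estimated as parameters × bits per weight / 8. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes nominal parameter counts of 70 and 8 billion, a 16-bit baseline (an assumption, the original precision is not stated here), and it ignores quantization scales and zero points, the KV cache, and activations. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9

# Nominal parameter counts (assumption: "70B" and "8B" taken at face value).
models = {"Latxa 70B": 70e9, "Latxa 8B": 8e9}

for name, n in models.items():
    full = weight_memory_gb(n, 16)   # assumed 16-bit baseline
    quant = weight_memory_gb(n, 4)   # 4-bit weights
    print(f"{name}: ~{full:.0f} GB at 16-bit, ~{quant:.0f} GB at 4-bit")
```

Under these assumptions the weights shrink by a factor of four. Real usage is somewhat higher than the 4-bit estimate because of the quantization metadata and runtime buffers.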

| Model | Accuracy | Precision | Recall | F1 Score | Undetermined Rate | Cost ($) |
| --- | --- | --- | --- | --- | --- | --- |
| Latxa 8B 4-bit | 75.98% | 80.77% | 67.20% | 73.36 | 15.33% | 1.50 |
| Latxa 8B | 74.72% | 80.95% | 63.91% | 71.43 | 10.33% | 1.50 |
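
As a sanity check on the table, the F1 column can be recomputed from the precision and recall columns, since F1 is their harmonic mean. The short sketch below does this with the reported values; the function name `f1_from_percent` is hypothetical. It also shows that the F1 scores are on the same 0 to 100 scale as the other metrics, even though they are printed without a percent sign.

```python
def f1_from_percent(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)

# (precision %, recall %) as reported in the table above.
rows = {
    "Latxa 8B 4-bit": (80.77, 67.20),
    "Latxa 8B": (80.95, 63.91),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_from_percent(p, r):.2f}")
# Latxa 8B 4-bit: F1 = 73.36
# Latxa 8B: F1 = 71.43
```

Both recomputed values agree with the reported F1 scores to two decimal places.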
