GitHub - yhinsson/yhinsson.github.io: 🚀 Optimize inference memory to run 70B language models on a 4GB GPU, and process 405B Llama3.1 with just 8GB VRAM.

🚀 Optimize inference memory to run 70B language models on a 4GB GPU, and process 405B Llama3.1 with just 8GB VRAM.

github.com/yhinsson/airllm

Files (2 commits):
syntonic
index.md