Optimize inference memory to run 70B language models on a 4GB GPU, and process 405B Llama3.1 with just 8GB VRAM.
yhinsson/yhinsson.github.io