
🚀 Optimize inference memory to run 70B language models on a 4GB GPU, and process 405B Llama3.1 with just 8GB VRAM.


yhinsson/yhinsson.github.io
