This repo estimates the VRAM (GPU memory) required to train a large language model (LLM). Supports:
- ZeRO stages.
- Providing a HuggingFace hub repository id (example: meta-llama/Meta-Llama-3-8B).

Planned:
- LoRA/QLoRA support for finetuning purposes.
- A UI with gradio/streamlit, including sliders.
- Reversing the problem: given a set of GPUs, how large a model can be trained.
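The estimator itself is not reproduced here, but the core memory arithmetic behind ZeRO-aware estimates can be sketched as follows. This is a simplification (activation and fragmentation memory are ignored), assumes mixed-precision Adam training, and the function name is illustrative, not the repo's actual API:

```python
def model_states_gb(num_params: float, num_gpus: int = 1, zero_stage: int = 0) -> float:
    """Rough per-GPU memory (GiB) for model states with mixed-precision Adam.

    Per parameter: 2 bytes fp16 weights + 2 bytes fp16 gradients
    + 12 bytes fp32 optimizer states (master weights, momentum, variance).
    ZeRO shards optimizer states (stage 1+), gradients (stage 2+),
    and parameters (stage 3) across GPUs.
    """
    params_b, grads_b, optim_b = 2.0, 2.0, 12.0
    if zero_stage >= 1:
        optim_b /= num_gpus
    if zero_stage >= 2:
        grads_b /= num_gpus
    if zero_stage >= 3:
        params_b /= num_gpus
    return num_params * (params_b + grads_b + optim_b) / 1024**3
```

Under these assumptions a 1.1B-parameter model needs roughly 16 GiB for model states alone on a single GPU; activations account for the rest of the measured footprint.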
If you only supply your own numbers and no HuggingFace config is needed, the prerequisites and installation can be skipped.

Prerequisites:
- transformers (only necessary for automatically parsing models from the HuggingFace hub)
```
python -m venv venv
source venv/bin/activate
pip install transformers
```

Example usage:

```
python vram_estimator_old.py --micro_batch_size 1 --num_gpus 1 --repo_id TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3
```

When a `repo_id` is given, the matching argument parser values are overwritten.
The VRAM usage of some models has been confirmed experimentally; see the table below:
| Repo id/Model name | Micro batch size | Number of GPUs | ZeRO stage | Gradient checkpointing | Estimated VRAM (per GPU) | Actual VRAM (per GPU) |
|---|---|---|---|---|---|---|
| TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3 | 1 | 1 | 0 | False | 34.3 GB | 33.5 GB |