Predict distributed LLM training time before you run. This tool estimates the wall-clock time for training large language models across multiple GPUs using 3D parallelism (pipeline, tensor, and data parallelism), helping you plan capacity and compare parallelization strategies without expensive trial runs.
```bash
pip install estimate-train-time  # Coming soon to PyPI
```

Note: The PyPI package is coming soon. For now, install directly from the repository:

```bash
git clone https://github.com/DebarghaG/estimate-train-time.git
cd estimate-train-time
pip install -e .
```

```bash
# List available example configurations
estimate-train-time list-examples
# Run prediction with a bundled example (Llama 7B on A100s)
estimate-train-time predict --example llemma_7b_4_2_2_P
```

Output:

```
Estimated time cost of current training config: 9480819.17 us
= 9480.82 ms
= 9.4808 s
```
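The reported figure is the predicted cost of one training batch in microseconds (consistent with the `one_batch_predict` API shown below). A minimal sketch of extrapolating it to a full run; the step count here is a hypothetical value, not something the tool reports:

```python
# Extrapolating total wall-clock time from the per-batch estimate above.
# The step count is hypothetical; substitute your own training schedule.
per_batch_us = 9_480_819.17   # per-batch estimate reported by the tool, in microseconds
total_steps = 100_000         # hypothetical number of optimizer steps

total_seconds = per_batch_us * total_steps / 1e6
print(f"Projected training time: {total_seconds / 3600:.1f} hours "
      f"({total_seconds / 86400:.2f} days)")
```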
- 3D Parallelism Support: Pipeline, tensor (model), and data parallelism (see the sketch after this list for how the three degrees combine)
- Pre-trained Regressors: Bundled models for NVIDIA A100 and GH200 GPUs
- No GPU Required: Predictions run on CPU using trained regressors
- Extensible: Add your own GPU profiles and cluster configurations
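When defining a parallelization layout, the pipeline, tensor, and data-parallel degrees must multiply to the total GPU count. The helper below is a purely illustrative sanity check and is not part of this package's API:

```python
# Illustrative only: relationship between 3D-parallelism degrees and GPU count.
def check_3d_layout(num_gpus: int, pp: int, tp: int, dp: int) -> None:
    """Raise if pipeline * tensor * data parallel degrees != total GPUs."""
    product = pp * tp * dp
    if product != num_gpus:
        raise ValueError(f"PP*TP*DP = {product}, but the cluster has {num_gpus} GPUs")

# Example: 16 GPUs as a 4-stage pipeline with 2-way tensor and 2-way data parallelism.
check_3d_layout(num_gpus=16, pp=4, tp=2, dp=2)
```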
- Getting Started - Installation and first prediction
- Core Concepts - Understanding distributed training estimation
- Configuration Reference - Config file parameters
- CLI Reference - Command-line options
- Python API - Programmatic usage
- Examples - Usage examples and custom configurations
- Advanced - Kernel sampling and extending the tool
```python
from estimate_train_time import one_batch_predict
# Predict training time from a config file
time_us = one_batch_predict("path/to/config.yml")
print(f"One batch takes {time_us / 1e6:.2f} seconds")- Python 3.8+
- pandas, numpy, scikit-learn, xgboost, pyyaml, ijson, joblib
For GPU sampling (optional): torch, flash-attn, deepspeed
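Building on the `one_batch_predict` API above, here is a sketch of comparing candidate parallelization strategies programmatically; the config paths are placeholders for files you would write yourself:

```python
from estimate_train_time import one_batch_predict

# Placeholder config files, each describing a different parallelization strategy.
candidate_configs = [
    "configs/pp4_tp2_dp2.yml",
    "configs/pp2_tp4_dp2.yml",
    "configs/pp2_tp2_dp4.yml",
]

# Rank candidates by predicted per-batch time (lower is better).
ranked = sorted((one_batch_predict(path), path) for path in candidate_configs)
for time_us, path in ranked:
    print(f"{path}: {time_us / 1e6:.2f} s per batch")
```

Because predictions run on CPU, a sweep like this is cheap compared with launching trial training runs.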
This work is supported by the National Science Foundation (NSF)-funded AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE), award OAC 2112606.
If you use this tool in your research, please cite our paper, accepted to HiPC 2025 (proceedings forthcoming):
```bibtex
@article{zhang2025efficient,
title={Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM},
author={Zhang, Biyao and Zheng, Mingkai and Ganguly, Debargha and Zhang, Xuecen and Singh, Vikash and Chaudhary, Vipin and Zhang, Zhao},
journal={arXiv preprint arXiv:2509.22832},
year={2025}
}
```

MIT License - see LICENSE for details.