LLM Math Calculator (LMC)

English | 中文

AI Infrastructure Resource Planning Tool — Estimate computing resources required for LLM training and inference

LMC is a "scientific calculator" for LLM training, encapsulating complex resource estimation formulas into a minimal interface. It helps you quickly answer:

How many GPUs are needed to train a 70B model?
How long will training take with 64 A100s?
Is there enough memory? What parallelism strategy should be used?

Features

Core Computation Engine: Mathematical models based on the Megatron-LM papers, covering FLOPs, memory, and training time estimation
Parallelism Strategy Recommendations: Automatic TP/PP/DP/ZeRO configuration suggestions
Hardware Database: Pre-configured parameters for mainstream GPUs including A100, H100, H800, Ascend 910B
Model Presets: Support for Llama-3, Mixtral, GPT-3, Qwen, DeepSeek and more
Multiple Interfaces: CLI + Python API + Web UI

Installation

# Basic installation
pip install lmc

# With Web UI (quotes keep shells such as zsh from expanding the brackets)
pip install "lmc[web]"

# Development environment
pip install "lmc[dev]"

Install from source:

git clone https://github.com/iannil/llm-math-calculator.git
cd llm-math-calculator
pip install -e ".[dev,web]"

Quick Start

CLI

# Basic estimation
lmc train --gpu A100-80G-SXM --params 70B --tokens 400B --num-gpus 64

# Specify target training days, auto-calculate GPU count
lmc train --gpu A100-80G-SXM --params 70B --tokens 400B --days 30

# Use model preset
lmc train --preset Llama-3-70B --tokens 400B --gpu H100-80G-SXM --num-gpus 128

# Interactive mode
lmc train -i

# List hardware
lmc hardware list

# List model presets
lmc model list

# CI/CD feasibility check
lmc check --params 70B --gpu A100-80G-SXM --num-gpus 64

Output example:

╭──────────────────────────────────────────────────────────────╮
│ LLM Math Calculator Report                                   │
╰──────────────────────────────────────────────────────────────╯
                       [Input]
  Model       70.0B Params
  Data        400.0B Tokens
  Hardware    A100-80G-SXM (FP16 Peak: 312.0 TFLOPS)
  GPUs        64

       [Resources Required]
  Total FLOPs      168.00 ZFLOPs
  Training Time    6.5 months
  GPU Hours        299,145
  GPU Days         12,464.4

        [Memory Breakdown (per GPU)]
  Parameters              4.4 GB
  Gradients               2.2 GB
  Optimizer States        13.1 GB
  Activations             9.2 GB
  Buffer/Frag (~8%)       2.7 GB
  --------------------    ----------
  Total Usage             35.9 GB / 80.0 GB
  Status                  (Safe)

         [Recommended Strategy]
  Framework          Megatron-DeepSpeed
  Parallelism        TP=4, PP=8, DP=2
  ZeRO Stage         Stage-2
  FlashAttn          Enabled (Required)
  Grad Checkpoint    Recommended
  MFU                50%
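The "Resources Required" figures in the report above can be checked by hand. The following is a minimal sketch that applies `6 × P × D` and the training-time formula directly; it does not call LMC, and the constant names are illustrative rather than part of the package.

```python
# Hand calculation of the "Resources Required" section of the sample report.
# All names here are illustrative; none of them come from the lmc package.
PARAMS = 70e9          # P: model parameters
TOKENS = 400e9         # D: training tokens
NUM_GPUS = 64
PEAK_FLOPS = 312e12    # A100-80G-SXM FP16 peak, in FLOP/s
MFU = 0.50             # model FLOPs utilization shown in the report

total_flops = 6 * PARAMS * TOKENS                      # 6 x P x D
seconds = total_flops / (NUM_GPUS * PEAK_FLOPS * MFU)  # wall-clock training time
gpu_hours = seconds / 3600 * NUM_GPUS
gpu_days = gpu_hours / 24

print(f"Total FLOPs: {total_flops / 1e21:.2f} ZFLOPs")
print(f"Wall-clock:  {seconds / 86400:.1f} days")
print(f"GPU hours:   {gpu_hours:,.0f}")
print(f"GPU days:    {gpu_days:,.1f}")
```

The wall-clock result is roughly 195 days, which the report rounds to 6.5 months.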

Python API

from lmc import (
    estimate_resources,
    TrainingConfig,
    ZeROStage,
    calc_total_flops,
    calc_memory_model_states,
    get_hardware,
    get_preset,
)

# Full resource estimation
config = TrainingConfig(
    params_billion=70,
    tokens_billion=400,
    gpu_name="A100-80G-SXM",
    num_gpus=64,
    zero_stage=ZeROStage.ZERO_2,
    use_recomputation=True,
)
result = estimate_resources(config)

print(f"Training time: {result.training_time_formatted}")
print(f"GPU Hours: {result.gpu_hours:,.0f}")
print(f"Memory/GPU: {result.memory.per_gpu:.1f} GB")
print(f"Recommended strategy: TP={result.parallelism.tensor_parallel}, PP={result.parallelism.pipeline_parallel}")

# Individual calculations
flops = calc_total_flops(params=70e9, tokens=400e9)
memory = calc_memory_model_states(params=70e9, zero_stage=ZeROStage.ZERO_2, num_gpus=8)

# Using presets
preset = get_preset("Llama-3-70B")
print(f"Hidden size: {preset.hidden_size}, Layers: {preset.num_layers}")

Web UI

# Launch Web interface
streamlit run src/lmc/web.py
# Or
lmc-web

Core Formulas

Based on the Megatron-LM papers and common industry practice:

Metric	Formula	Description
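Training FLOPs	`6 × P × D`	P = parameter count, D = training tokens
Memory (ZeRO-0)	`16 Bytes × P`	FP16 parameters (2 B) + FP16 gradients (2 B) + FP32 Adam optimizer states (12 B)
Activations	`s × b × h × L × (34 + 5as/h)`	In bytes, without recomputation; s = sequence length, b = micro-batch size, h = hidden size, L = layers, a = attention heads
KV Cache	`2 × b × s × h × L × 2`	For inference, in bytes; the first 2 covers K and V, the last 2 is bytes per FP16 value
Training Time	`FLOPs / (GPUs × TFLOPS × MFU)`	TFLOPS = per-GPU peak throughput, MFU = Model FLOPs Utilization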
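As a worked example of the two memory formulas, the sketch below evaluates the activation estimate in the form `s × b × h × L × (34 + 5as/h)` and the KV cache estimate for GPT-3 175B (h = 12288, L = 96, a = 96, s = 2048) with a batch of 1. The function names are illustrative and are not LMC's API; the figures assume FP16, no recomputation, no tensor parallelism, and standard multi-head attention.

```python
def activation_bytes(s: int, b: int, h: int, L: int, a: int) -> float:
    """Activation memory in bytes: s*b*h*L*(34 + 5*a*s/h), no recomputation."""
    return s * b * h * L * (34 + 5 * a * s / h)


def kv_cache_bytes(s: int, b: int, h: int, L: int, bytes_per_value: int = 2) -> int:
    """KV cache in bytes: 2 (K and V) * b * s * h * L * bytes per value."""
    return 2 * b * s * h * L * bytes_per_value


# GPT-3 175B shape, micro-batch of 1
shape = dict(s=2048, b=1, h=12288, L=96)
act = activation_bytes(a=96, **shape)
kv = kv_cache_bytes(**shape)

print(f"Activations: {act / 2**30:.1f} GiB")
print(f"KV cache:    {kv / 2**30:.1f} GiB")
```

The attention term `5as/h` grows with sequence length, which is why long-context training tends to rely on FlashAttention or recomputation to stay within memory.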

ZeRO Optimization

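Stage	Partitioned Content	Model-State Memory Reduction
ZeRO-0	None	Baseline (16 B/param)
ZeRO-1	Optimizer states	Up to ~4x (as N grows)
ZeRO-2	Optimizer states + gradients	Up to ~8x (as N grows)
ZeRO-3	Optimizer states + gradients + parameters	~Nx (N = data-parallel GPU count)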
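To see where these reduction factors come from, the sketch below splits the 16 bytes per parameter into 2 B of FP16 weights, 2 B of FP16 gradients, and 12 B of FP32 Adam state, then divides each partitioned part by the data-parallel degree. This follows the accounting in the ZeRO paper; the function name is hypothetical and is not the `calc_memory_model_states` function exported by LMC.

```python
def model_state_bytes_per_param(stage: int, dp_degree: int) -> float:
    """Bytes of model state held per parameter on each GPU for a ZeRO stage."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params, grads, optim = 2.0, 2.0, 12.0  # FP16 weights, FP16 grads, FP32 Adam
    if stage >= 1:
        optim /= dp_degree   # ZeRO-1 partitions optimizer states
    if stage >= 2:
        grads /= dp_degree   # ZeRO-2 also partitions gradients
    if stage >= 3:
        params /= dp_degree  # ZeRO-3 also partitions parameters
    return params + grads + optim


for stage in range(4):
    per_param = model_state_bytes_per_param(stage, dp_degree=64)
    print(f"ZeRO-{stage}: {per_param:.2f} B/param, "
          f"{16 / per_param:.1f}x smaller than baseline")
```

With 64 data-parallel ranks the reductions are a little under 4x and 8x for stages 1 and 2, and exactly 64x for stage 3; the 4x and 8x figures are the limits as the rank count grows.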

Supported Hardware

Hardware	Memory	FP16 TFLOPS	Interconnect Bandwidth	Typical MFU
A100-40G-SXM	40 GB	312	NVLink 600 GB/s	50%
A100-80G-SXM	80 GB	312	NVLink 600 GB/s	50%
A100-80G-PCIe	80 GB	312	-	45%
H100-80G-SXM	80 GB	989	NVLink 900 GB/s	55%
H100-80G-PCIe	80 GB	756	-	50%
H800-80G-SXM	80 GB	989	NVLink 400 GB/s	50%
Ascend 910B	64 GB	320	HCCS 392 GB/s	45%
L40S	48 GB	362	-	45%
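The peak throughput and typical MFU columns are enough to turn a deadline into a GPU count. The sketch below inverts the training-time formula for a 70B model trained on 400B tokens in 30 days. The `HARDWARE` dictionary and `gpus_for_deadline` are illustrative names with values copied from the table above; the calculation ignores memory limits and parallelism constraints, so the result of `lmc train --days` may differ.

```python
import math

# (peak FP16 TFLOPS, typical MFU), copied from the table above
HARDWARE = {
    "A100-80G-SXM": (312, 0.50),
    "H100-80G-SXM": (989, 0.55),
    "H800-80G-SXM": (989, 0.50),
    "Ascend 910B": (320, 0.45),
}


def gpus_for_deadline(params: float, tokens: float, days: float, gpu: str) -> int:
    """Smallest GPU count that finishes 6*P*D FLOPs within the given days."""
    peak_tflops, mfu = HARDWARE[gpu]
    sustained_flops = peak_tflops * 1e12 * mfu  # achieved FLOP/s per GPU
    total_flops = 6 * params * tokens
    return math.ceil(total_flops / (sustained_flops * days * 86400))


for name in HARDWARE:
    count = gpus_for_deadline(70e9, 400e9, days=30, gpu=name)
    print(f"{name:14s} {count} GPUs")
```

In practice the count would usually be rounded up to a multiple of the node size (for example 8 GPUs per node) and checked against the per-GPU memory budget.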

Supported Model Presets

Model	Parameters	Architecture	MoE (experts × active per token)
Llama-3-8B	8B	Llama	-
Llama-3-70B	70B	Llama	-
Llama-3.1-405B	405B	Llama	-
Mixtral-8x7B	46.7B (12.9B active)	Mixtral	8×2
Mixtral-8x22B	141B (39B active)	Mixtral	8×2
GPT-3-175B	175B	GPT	-
Qwen2-72B	72B	Qwen	-
DeepSeek-V2-236B	236B (21B active)	DeepSeek	160×6
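For MoE presets the two parameter counts play different roles. A common approximation, assumed here and not confirmed as LMC's exact method, is that training compute scales with the active parameters while model-state memory scales with the total parameters. The sketch below applies that to Mixtral-8x7B with a hypothetical 400B-token budget; `moe_estimate` is an illustrative name.

```python
BYTES_PER_PARAM = 16  # ZeRO-0 model states: weights + gradients + Adam states


def moe_estimate(total_params: float, active_params: float, tokens: float) -> dict:
    """Rough MoE estimate: compute from active params, memory from total params."""
    return {
        "train_flops": 6 * active_params * tokens,            # assumption: active only
        "model_state_bytes": BYTES_PER_PARAM * total_params,  # every expert is stored
    }


# Mixtral-8x7B: 46.7B total, 12.9B active per token; 400B tokens is hypothetical
est = moe_estimate(total_params=46.7e9, active_params=12.9e9, tokens=400e9)
print(f"Training FLOPs: {est['train_flops'] / 1e21:.2f} ZFLOPs")
print(f"Model states:   {est['model_state_bytes'] / 1e9:.1f} GB before partitioning")
```

Under this assumption the model trains with the compute of a roughly 13B dense model but needs the memory of a roughly 47B one.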

Project Structure

llm-math-calculator/
├── pyproject.toml          # Project configuration
├── data/
│   ├── hardware.json       # Hardware database
│   └── presets.json        # Model presets
├── src/lmc/
│   ├── __init__.py         # API exports
│   ├── cli.py              # CLI commands
│   ├── engine.py           # Computation engine
│   ├── loader.py           # Data loader
│   ├── models.py           # Data models
│   ├── optimizer.py        # Strategy recommender
│   └── web.py              # Web UI
└── tests/
    └── test_engine.py      # Unit tests

Development

# Clone repository
git clone https://github.com/iannil/llm-math-calculator.git
cd llm-math-calculator

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install development dependencies
pip install -e ".[dev,web]"

# Run tests
pytest tests/ -v

# Test a single command
lmc train --gpu A100-80G-SXM --params 7B --tokens 100B

Contributing

Contributions are welcome! Especially:

Hardware Data: Add new GPUs (e.g., B200, MI300X) to data/hardware.json
Model Presets: Add new models to data/presets.json
Formula Improvements: Enhance estimation accuracy
Documentation: Multi-language support

Before submitting a PR, please ensure:

All tests pass (pytest tests/)
Code follows project style
Related documentation is updated

Roadmap

Completed (v1.0)

Planned

Add more GPU support (B200, MI300X, etc.)
Add inference resource estimation
PyPI package publishing
Interactive Web UI enhancements

References

License

MIT License - See LICENSE file for details
