Complete reference for the estimate-train-time command-line interface.
```
estimate-train-time [--version] [--help] <command> [<args>]
```

Display the package version and exit.

```
estimate-train-time --version
```

Output:

```
estimate-train-time 0.1.0
```
Display help information.

```
estimate-train-time --help
```

Output:

```
usage: estimate-train-time [-h] [--version] {predict,list-examples,show-example} ...

Distributed training time estimator for Large Language Models

positional arguments:
  {predict,list-examples,show-example}
                        Available commands
    predict             Run time estimation prediction
    list-examples       List available example configurations
    show-example        Show an example configuration

optional arguments:
  -h, --help            show this help message and exit
  --version             Show version and exit
```
Run training time prediction with a configuration file.

```
estimate-train-time predict [--config PATH | --example NAME]
```

| Option | Short | Type | Description |
|---|---|---|---|
| `--config` | `-c` | PATH | Path to a YAML configuration file |
| `--example` | `-e` | NAME | Name of a bundled example configuration |

Note: You must specify either `--config` or `--example`, but not both.
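This either/or constraint is the standard mutually-exclusive-group pattern from Python's `argparse`. A minimal sketch of how such a parser can be built — an illustration of the pattern, not the tool's actual source:

```python
import argparse

# Hypothetical reconstruction of the predict option parsing;
# the real implementation may differ.
parser = argparse.ArgumentParser(prog="estimate-train-time predict")
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--config", "-c", metavar="PATH",
                   help="Path to a YAML configuration file")
group.add_argument("--example", "-e", metavar="NAME",
                   help="Name of a bundled example configuration")

# One of the two flags parses cleanly:
args = parser.parse_args(["--example", "llemma_7b_4_2_2_P"])
print(args.example)  # llemma_7b_4_2_2_P
```

Passing both flags at once (or neither) makes `argparse` print a usage error and exit with a nonzero status, matching the note above.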
Using a custom config file:

```
estimate-train-time predict --config /path/to/my_config.yml
estimate-train-time predict -c ./configs/llama_70b.yml
```

Using a bundled example:

```
estimate-train-time predict --example llemma_7b_4_2_2_P
estimate-train-time predict -e llemma_7b_4_2_2_V
```

The command outputs:
- Path to the configuration being used
- Loading messages for each regressor model
- Per-operator predictions with input shapes
- Final time estimate in microseconds, milliseconds, and seconds

```
Running prediction with config: /path/to/config.yml
----------------------------------------
Loading /path/to/regressors/NVIDIAA100-SXM4-80GB_embedding_fp16_fwd.json
Function:embedding_fp16_fwd Input:[2, 4, 4096, 4096] PredictorInput:[16384, 400, 4096] Prediction:123.45
...
Estimated time cost of current training config: 9480819.17 us
= 9480.82 ms
= 9.4808 s
```
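If you need the final estimate programmatically, the summary line above follows a fixed pattern. A small parser sketch, assuming that output format stays stable (the `parse_estimate_us` helper is our own, not part of the package):

```python
import re

def parse_estimate_us(output: str) -> float:
    """Extract the microsecond estimate from predict's final summary line."""
    m = re.search(
        r"Estimated time cost of current training config:\s*([\d.]+)\s*us",
        output,
    )
    if m is None:
        raise ValueError("no estimate line found in output")
    return float(m.group(1))

sample = (
    "Estimated time cost of current training config: 9480819.17 us\n"
    "= 9480.82 ms\n"
    "= 9.4808 s"
)
print(parse_estimate_us(sample))  # 9480819.17
```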
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (config not found, invalid config, prediction failed) |
List all available bundled example configurations.

```
estimate-train-time list-examples
```

Output:

```
Available example configurations:
----------------------------------------
llemma_7b_4_2_2_P
llemma_7b_4_2_2_V

Use 'estimate-train-time show-example <name>' to view a configuration.
Use 'estimate-train-time predict --example <name>' to run prediction.
```
Display the contents of a bundled example configuration.

```
estimate-train-time show-example <name>
```

| Argument | Type | Description |
|---|---|---|
| `name` | NAME | Name of the example configuration (without `.yml` extension) |

```
estimate-train-time show-example llemma_7b_4_2_2_P
```

Output:

```
# Configuration: llemma_7b_4_2_2_P
# Path: /path/to/examples/llemma_7b_4_2_2_P.yml
----------------------------------------
{
  "gpu_name": "NVIDIAA100-SXM4-80GB",
  "operator_data_folder": "./regressors/Perlmutter/operator",
  "nccl_data_folder": "./regressors/Perlmutter/nccl",
  # pp, mp, dp, b, h, l, dim, steps_per_update, gpus_per_node
  "training_config": [4, 2, 2, 4, 32, 4096, 4096, 8, 4],
  ...
}
```
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Example not found |
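The positional `training_config` list in the example above maps to the fields named in its inline comment. A small sketch that labels the values, assuming the field order shown there (`pp, mp, dp, b, h, l, dim, steps_per_update, gpus_per_node`):

```python
# Field names taken from the comment in the example config above;
# the order is an assumption based on that comment.
FIELDS = ["pp", "mp", "dp", "b", "h", "l", "dim",
          "steps_per_update", "gpus_per_node"]

training_config = [4, 2, 2, 4, 32, 4096, 4096, 8, 4]
named = dict(zip(FIELDS, training_config))
print(named["pp"], named["mp"], named["dp"])  # 4 2 2
```

Labeling the values this way makes it easy to check a hand-edited config before running `predict` on it.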
Test with bundled examples:

```
# See what's available
estimate-train-time list-examples

# Run prediction
estimate-train-time predict -e llemma_7b_4_2_2_P
```

Use your own config file:
```
# Create config from template
estimate-train-time show-example llemma_7b_4_2_2_P > my_config.yml

# Edit my_config.yml to match your setup
# ...

# Run prediction
estimate-train-time predict -c my_config.yml
```

Run multiple predictions to compare:
```
# Compare Perlmutter vs Vista
estimate-train-time predict -e llemma_7b_4_2_2_P
estimate-train-time predict -e llemma_7b_4_2_2_V

# Compare different parallelism strategies
estimate-train-time predict -c configs/pp4_mp2_dp2.yml
estimate-train-time predict -c configs/pp2_mp4_dp2.yml
estimate-train-time predict -c configs/pp2_mp2_dp4.yml
```

Use in shell scripts:
```bash
#!/bin/bash
# compare_configs.sh
for config in configs/*.yml; do
    echo "Testing: $config"
    estimate-train-time predict -c "$config" 2>/dev/null | tail -3
    echo "---"
done
```

For more control, use the Python API instead:
```python
from estimate_train_time import one_batch_predict

configs = ['config_a.yml', 'config_b.yml', 'config_c.yml']
for config in configs:
    time_us = one_batch_predict(config)
    print(f"{config}: {time_us/1e6:.2f}s per step")
```

The CLI does not currently use environment variables. All configuration is done through command-line arguments and YAML config files.
- Getting Started - First steps with the tool
- Configuration Reference - Config file format
- Python API - Programmatic usage