Automated benchmark that discovers optimal resource configuration for a given system size by running short production trials across a sweep of resource configs.
Motivation
Different system sizes have different optimal resource configurations. A 5,000-atom system doesn't benefit from 64 cores the way a 500,000-atom system does. Currently the pipeline uses hardcoded settings (num_cores: 8, num_gpus: 1). A short benchmark sweep identifies the throughput sweet spot before committing to a full campaign.
Depends loosely on: #7, #8 (Cluster autodiscovery — knows available core counts and GPU types)
Scope
mdfactory/performance/benchmark.py
BenchmarkConfig dataclass: sweep parameters (core counts, GPU replicas, duration)
run_benchmark_sweep(system_path, config) -> BenchmarkResult
- GROMACS md.log parser: extract ns/day from performance section
- Result selection: best ns/day or best ns/day per core-hour
- Short benchmark MDP template (100 ps, minimal output)
- Result persistence as JSON sidecar per campaign
Sweep dimensions
- CPU path: vary cores (1, 2, 4, 8, 16, 32, 64) with OMP_NUM_THREADS
- GPU path: vary concurrent replicas (1, 2, 4, 8, 16) sharing one GPU
Reuse across HT batch
If all systems in a campaign have similar atom counts (within ±20%), the benchmark runs once on a representative system. The discovered settings then apply to all production runs in the batch.
Acceptance criteria
- Parser correctly extracts ns/day from GROMACS md.log samples
- Sweep generates correct srun commands for each configuration
- Result JSON contains all trials with ns/day and selects optimum
- Works without cluster (dry-run mode that generates scripts without submitting)
Automated benchmark that discovers optimal resource configuration for a given system size by running short production trials across a sweep of resource configs.
Motivation
Different system sizes have different optimal resource configurations. A 5,000-atom system doesn't benefit from 64 cores the way a 500,000-atom system does. Currently the pipeline uses hardcoded settings (
num_cores: 8,num_gpus: 1). A short benchmark sweep identifies the throughput sweet spot before committing to a full campaign.Depends loosely on: #7, #8 (Cluster autodiscovery — knows available core counts and GPU types)
Scope
mdfactory/performance/benchmark.pyBenchmarkConfigdataclass: sweep parameters (core counts, GPU replicas, duration)run_benchmark_sweep(system_path, config) -> BenchmarkResultSweep dimensions
Reuse across HT batch
If all systems in a campaign have similar atom counts (within ±20%), the benchmark runs once on a representative system. The discovered settings then apply to all production runs in the batch.
Acceptance criteria