Summary
Any DLIO workload that uses the PyTorch DataLoader with reader.read_threads > 0 (e.g. retinanet_b200) aborts at the start of epoch 1 with:
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort)
orte_ess_init failed
--> Returned value No permission (-17) instead of ORTE_SUCCESS
[host--...:NNNN] Local abort before MPI_INIT completed completed successfully ...
The abort is reported once per spawned DataLoader worker on each main MPI rank (e.g. 8 workers × 8 ranks = 64 aborts), then the run dies.
Reproduction
Cluster: 2 nodes, OpenMPI via mpirun under SLURM. Fresh uv sync install of mlcommons/storage main (3.0.3) and bundled dlio_benchmark.
mlpstorage training run \
--model=retinanet --exec-type=mpi -g b200 -na 8 -cm 8 \
--num-client-hosts=2 --hosts host1 host2 \
--data-dir=/data/mlps_uv/retinanet \
--results-dir=/data/mlperf_storage_results \
--open --allow-invalid-params --allow-run-as-root --oversubscribe
Root cause
Since dlio commit 39449df, DLIO's PyTorch DataLoader defaults to multiprocessing_context spawn. Spawned worker subprocesses start from a fresh interpreter and re-import the parent's modules, including dlio_benchmark/utils/statscounter.py, which executes from mpi4py import MPI at module top level. With mpi4py.rc.initialize == True (the default), that import implicitly invokes MPI_Init_thread() in the child. The child was not launched by mpirun and has no PMIX/ORTE environment, so the init aborts.
Interestingly, the bug is already half-acknowledged in dlio_benchmark/utils/utility.py:176-178:
# MPI cannot be initialized automatically, or read_thread spawn/forkserver
# child processes will abort trying to open a non-existant PMI_fd file.
import mpi4py
…but the mpi4py.rc.initialize/finalize flags that actually disable auto-init were never set.
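To see which modules can trigger the implicit init when a spawn worker re-imports them, a static scan for module-level mpi4py.MPI imports may help. This is a minimal sketch using only the standard library; top_level_mpi_imports and scan_package are hypothetical helper names, not part of dlio_benchmark, and the scan only looks at direct module-level statements (imports nested under if/try blocks or inside functions are not reported).

```python
import ast
from pathlib import Path


def top_level_mpi_imports(source: str) -> list[int]:
    """Return line numbers of module-level statements that import mpi4py.MPI.

    Hypothetical helper for illustration. Only statements directly in the
    module body are inspected, since those run when a worker re-imports
    the module.
    """
    hits = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ImportFrom) and node.level == 0:
            # from mpi4py import MPI
            if node.module == "mpi4py" and any(a.name == "MPI" for a in node.names):
                hits.append(node.lineno)
            # from mpi4py.MPI import <name>
            elif node.module == "mpi4py.MPI":
                hits.append(node.lineno)
        elif isinstance(node, ast.Import):
            # import mpi4py.MPI
            if any(a.name == "mpi4py.MPI" for a in node.names):
                hits.append(node.lineno)
    return hits


def scan_package(root: str) -> dict[str, list[int]]:
    """Map each .py file under root to its module-level mpi4py.MPI import lines."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = top_level_mpi_imports(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```

Running scan_package on the installed dlio_benchmark directory would list candidate files such as the statscounter.py module mentioned above; the exact output depends on the checked-out revision.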
Proposed fix
In dlio_benchmark/utils/utility.py, set the rc flags before the first from mpi4py import MPI anywhere in the package:
import mpi4py
mpi4py.rc.initialize = False
mpi4py.rc.finalize = False
The main MPI ranks still get initialized via the existing explicit path in DLIOMPI.initialize():
from mpi4py import MPI
if not MPI.Is_initialized():
    MPI.Init()
…and finalized via DLIOMPI.finalize() (MPI.Finalize() at line 351). The atexit auto-finalize is redundant for the main ranks and actively harmful in spawn workers.
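The ordering constraint (rc flags first, MPI import second, explicit init only in the main ranks) can be sketched as two small helpers. This is an illustration under assumptions, not the DLIO implementation: disable_mpi4py_auto_init and init_mpi_in_main_rank are hypothetical names, and the sketch relies on mpi4py reading its rc flags when mpi4py.MPI is first imported.

```python
import sys


def disable_mpi4py_auto_init() -> bool:
    """Turn off mpi4py's implicit init/finalize for this process.

    Hypothetical helper. Returns False when mpi4py.MPI has already been
    imported, because the rc flags are read at that first import and
    setting them afterwards has no effect on it.
    """
    too_late = "mpi4py.MPI" in sys.modules
    import mpi4py  # importing the package alone does not initialize MPI

    mpi4py.rc.initialize = False
    mpi4py.rc.finalize = False
    return not too_late


def init_mpi_in_main_rank():
    """Explicitly initialize MPI; intended only for processes started by mpirun."""
    from mpi4py import MPI

    if not MPI.Is_initialized():
        MPI.Init()
    return MPI
```

With this split, a spawn worker that re-imports the package only runs disable_mpi4py_auto_init() and never touches MPI_Init_thread(), while the main ranks call init_mpi_in_main_rank() once during startup.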
PR for dlio_benchmark proposed on a fork: wolfgang-desalvador/DLIO_local_changes branch wolfgang/fix-MPI-initialization-fork (single commit, +2 lines).
Environment
- mlpstorage 3.0.3 (mlcommons/storage@44eee09)
- dlio_benchmark (bundled via uv sync)
- OpenMPI, SLURM, 2 × 192-core nodes
- Python 3.12