Codecov Report ❌ Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main    #2216    +/-  ##
=========================================
- Coverage   85.06%   77.30%   -7.76%
=========================================
  Files          46       46
  Lines        8521     8526      +5
  Branches     1988     1989      +1
=========================================
- Hits         7248     6591    -657
- Misses        805     1415    +610
- Partials      468      520     +52
```
Test successful on BSC MN5! 🎉
For posterity, if you run cwltool on an HPC using multiple nodes, but they are not sharing the same temp directory, you'll get something similar to:

```
slurmstepd: error: couldn't chdir to `/scratch/tmp/37435333/l0q96kcm': No such file or directory: going to /tmp instead
slurmstepd: error: couldn't chdir to `/scratch/tmp/37435333/l0q96kcm': No such file or directory: going to /tmp instead
WARNING: skipping mount of /scratch/tmp/37435333/1db6cool: stat /scratch/tmp/37435333/1db6cool: no such file or directory
WARNING: skipping mount of /scratch/tmp/37435333/e9hzplsn/a.out: stat /scratch/tmp/37435333/e9hzplsn/a.out: no such file or directory
WARNING: skipping mount of /scratch/tmp/37435333/1db6cool: stat /scratch/tmp/37435333/1db6cool: no such file or directory
WARNING: skipping mount of /scratch/tmp/37435333/e9hzplsn/a.out: stat /scratch/tmp/37435333/e9hzplsn/a.out: no such file or directory
FATAL: container creation failed: mount /scratch/tmp/37435333/1db6cool->/dev/shm error: while mounting /scratch/tmp/37435333/1db6cool: mount source /scratch/tmp/37435333/1db6cool doesn't exist
FATAL: container creation failed: mount /scratch/tmp/37435333/1db6cool->/dev/shm error: while mounting /scratch/tmp/37435333/1db6cool: mount source /scratch/tmp/37435333/1db6cool doesn't exist
```

And if you run without network access:

```
Abort(1017768207) on node 2: Fatal error in internal_Init:
Other MPI error, error stack:
internal_Init(70)....................:
MPI_Init(argc=0x7ffd98ef954c, argv=0x7ffd98ef9540) failed
MPII_Init_thread(282)................:
MPIR_init_comm_world(34).............:
MPIR_Comm_commit(794)................:
MPIR_Comm_commit_internal(579).......:
MPID_Comm_commit_pre_hook(151).......:
MPIDI_world_pre_init(669)............:
MPIDI_OFI_init_world(805)............:
MPIDI_OFI_addr_exchange_root_ctx(143):
MPIDU_bc_allgather(112)..............:
MPIR_Allgatherv_intra_brucks(80).....:
MPIC_Sendrecv(259)...................:
MPID_Isend(60).......................:
MPIDI_isend(32)......................:
MPIDI_NM_mpi_isend(780)..............:
MPIDI_OFI_send_fallback(483).........:
OFI call tsendv failed (default nic=lo:
No such file or directory)
Abort(1017768207) on node 0: Fatal error in internal_Init:
Other MPI error, error stack:
internal_Init(70)....................:
MPI_Init(argc=0x7ffe4702f11c, argv=0x7ffe4702f110) failed
[... same error stack as node 2 ...]
OFI call tsendv failed (default nic=lo:
No such file or directory)
```

Using a temporary directory pointing to the parallel file system (GPFS in the case of MN5) and adding the …
Test successful on CESGA FT3! 🎉

I used a folder from the Lustre parallel filesystem, …
```python
if not mpi_req or not is_req:
    runtime.append("--cleanenv")
```
Thank you for this PR! Let's always use `--cleanenv`; to cope with the environment variables that get set, I recommend creating a shell script to be executed instead of the main command line.

The environment variables to set can be derived either from parsing `env_pass` and `env_pass_regex` from the `--mpi-config-file`, OR by passing in a list of the existing environment variable names and only passing those newly set by the MPI runner.
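The first option could be sketched roughly like this (a minimal illustration only; `select_passthrough_env` is a hypothetical helper, not cwltool's actual implementation):

```python
import re
from typing import Dict, List, Mapping


def select_passthrough_env(
    environ: Mapping[str, str],
    env_pass: List[str],
    env_pass_regex: List[str],
) -> Dict[str, str]:
    """Keep only variables named in env_pass or fully matching env_pass_regex."""
    patterns = [re.compile(p) for p in env_pass_regex]
    return {
        name: value
        for name, value in environ.items()
        if name in env_pass or any(p.fullmatch(name) for p in patterns)
    }


# Example: with --cleanenv, only explicitly selected variables would survive.
host_env = {"PMI_RANK": "0", "PMI_SIZE": "1", "HOME": "/home/user", "PATH": "/usr/bin"}
print(select_passthrough_env(host_env, ["HOME"], [r"PMI_.*"]))
# → {'PMI_RANK': '0', 'PMI_SIZE': '1', 'HOME': '/home/user'}
```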
```python
else:
    self.append_volume(
        runtime,
        runtime_context.create_tmpdir(),
        "/dev/shm",
        writable=True,
    )
```
Neat trick; is this universal for all known MPI implementations? Or do some not use the shared memory device for communication between MPI processes?
Related to #2208 (not sure if this is enough to close that issue)
@mr-c, I started some tests yesterday and completed them today after work. Once I managed to run the workflow on my machine, reproducing the error we have on HPCs, debugging it showed that the changes to be made were simple.
I managed to keep the `--contain`, but in order to do that I had to include `/dev/shm`. I couldn't find any files on `/tmp` or in the temporary directory created by cwltool. And actually, cwltool worked without having to remove the temporary directory created by cwltool (i.e. I was wrong: the process manager wasn't using the temp dir to sync with processes, but apparently shared memory).

I kept the `--cleanenv` for when the job isn't using the cwltool `MPIRequirement`. When that requirement is a requirement (not a hint) and is used in the job, then I do not set `--cleanenv`. I tried keeping it, and even tested the MPI config file using the variables from the MPICH documentation.

(The MPI config file was needed on my laptop as I have both OpenMPI and MPICH, so I need to override the runner because `mpirun` defaults to OpenMPI on my machine, and the container I'm using has MPICH. I also confirmed with the debugger that the MPI config file values were added.)
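For reference, a minimal `--mpi-config-file` for that setup might look like the following (values are illustrative; field names follow cwltool's `MpiConfig`):

```yaml
# Hypothetical example of an --mpi-config-file; values are illustrative.
runner: mpirun.mpich   # override the default "mpirun" (OpenMPI on this laptop)
nproc_flag: "-n"
env_pass:
  - HOME
env_pass_regex:
  - "SLURM_.*"
```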
The issue with the environment variables, @mr-c, is that when cwltool runs, I don't have any environment variables yet. When we create the CWL job, the runtime config, and even when we pass through the env vars from `mpi.py`, there are no MPI variables loaded yet.

The Edinburgh paper discusses how to use some variables, but I believe those are only the variables already available, for instance, when you have a Slurm allocation and you have `SLURM_JOB_ID`, `SLURM_MEM_PER_CPU`, etc. When the CWL job is executed, already with the runtime (command + args) created, and Python then launches `mpirun.mpich`, that's when the MPICH (or Hydra) program will create the environment variables.

I tried setting these variables in the command line of singularity without success. This is what I did:
Got the list of variables that appeared in the diff. There were 8 extra variables:

```
GFORTRAN_UNBUFFERED_PRECONNECTED=y
HYDI_CONTROL_FD=7
MPIR_CVAR_CH3_INTERFACE_HOSTNAME=ranma
MPI_LOCALNRANKS=1
MPI_LOCALRANKID=0
PMI_FD=9
PMI_RANK=0
PMI_SIZE=1
```

I added them to the Singularity command with just their names to see if it would pass the host env vars to the container (as it wouldn't make sense trying to define random file descriptors, ranks, etc.).
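The "diff" step described above can be automated with a small sketch (a hypothetical helper; the actual comparison was presumably done by hand with `env`):

```python
import subprocess
from typing import List, Set


def env_var_names(prefix: List[str]) -> Set[str]:
    """Run `env` under the given launcher prefix and return the variable names seen."""
    out = subprocess.run(
        prefix + ["env"], capture_output=True, text=True, check=True
    ).stdout
    return {line.split("=", 1)[0] for line in out.splitlines() if "=" in line}


# The variables the MPI launcher injects are those present under mpirun.mpich
# but absent from a plain shell (PMI_RANK, PMI_FD, MPI_LOCALRANKID, ...):
# injected = env_var_names(["mpirun.mpich", "-n", "1"]) - env_var_names([])
```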
The resulting command has the `--cleanenv` and all the environment variables from the diff:

```
INFO [job run] /tmp/hib1sp8z$ mpirun.mpich \
    -n \
    2 \
    singularity \
    --quiet \
    run \
    --ipc \
    --contain \
    --cleanenv \
    --env \
    PMI_RANK \
    --env \
    PMI_FD \
    --env \
    PMI_SIZE \
    --env \
    MPI_LOCALNRANKS \
    --env \
    MPI_LOCALRANKID \
    --env \
    MPIR_CVAR_CH3_INTERFACE_HOSTNAME \
    --env \
    HYDI_CONTROL_FD \
    --env \
    GFORTRAN_UNBUFFERED_PRECONNECTED \
    --mount=type=bind,source=/tmp/g7lo1zku,target=/dev/shm \
    --no-eval \
    --userns \
    --home \
    /tmp/hib1sp8z:/PCIKJh \
    --mount=type=bind,source=/tmp/ub89clkk,target=/tmp \
    --mount=type=bind,source=/tmp/h8j2gy1j/a.out,target=/var/lib/cwl/stgfb9c3552-0c10-424c-82ba-284742e69287/a.out,readonly \
    --pwd \
    /PCIKJh \
    --net \
    --network \
    none \
    /home/kinow/Development/python/workspace/cwl-mpi/examples/mpich-sr/cwltool/mfisherman_mpich:4.3.2.sif \
    /var/lib/cwl/stgfb9c3552-0c10-424c-82ba-284742e69287/a.out \
    0 \
    1 > /tmp/hib1sp8z/sr.out 2> /tmp/hib1sp8z/sr.err
WARNING [job run] exited with status: 1
WARNING [job run] completed permanentFail
WARNING [step run] completed permanentFail
```

That failed again. And running a smaller test, it looks like only `key=value` is valid.
But since I don't know the file descriptor, rank, etc., I can't see a way to define `PMI_RANK`, `PMI_SIZE`, `PMI_FD`, and the other variables for MPICH.
@mr-c, I did not write tests as I wanted to check with you first whether this is going in the right direction. I'll run a couple of tests on MN5 and CESGA FT3. My main concern is `--contain` and InfiniBand. Even without `--cleanenv`, I think `--contain` may force the container to use Ethernet instead of InfiniBand.