Open
Description
Some hacks need to be cleaned up. The current wheel, torch==2.10.0+rocm710, ships libraries linked against cray-mpich/9.0.1, which causes segfaults. As a workaround, we replace those paths with paths to cray-mpich/9.1.0.
cleanup1 #17 (comment)
cleanup2 #17 (comment)
The hacks rewrite the shared library references to libmpi_gnu_112.so.12 in .venvs/scaffoldvenv-tuo/lib/python3.11/site-packages/torch/lib/ so that they point to libmpi_gnu.so.12, which is the library name used by cray-mpich 9.1. The jobscript can then LD_PRELOAD /opt/cray/pe/mpich/9.1.0/ofi/gnu/11.2/lib/libmpi_gnu.so.12 to pick up the correct libmpi.
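For reference, here is a rough sketch of how the relinking step could be scripted. It is not the code from the linked comments. It assumes the `patchelf` tool is available and uses its `--replace-needed` option. The helper names `references_soname` and `patchelf_commands` are made up for illustration. The script only builds the commands and does not run them, so the list can be reviewed first.

```python
from pathlib import Path

# Library names taken from the issue text above.
OLD_SONAME = "libmpi_gnu_112.so.12"  # name the torch wheel's libraries reference
NEW_SONAME = "libmpi_gnu.so.12"      # name used by cray-mpich 9.1


def references_soname(lib: Path, soname: str = OLD_SONAME) -> bool:
    """Crude check (assumption): the file contains the soname as a
    NUL-terminated string, as it would in an ELF dynamic string table."""
    return soname.encode() + b"\0" in lib.read_bytes()


def patchelf_commands(lib_dir: Path) -> list[list[str]]:
    """Build, but do not run, one patchelf command per library in lib_dir
    that still references the old libmpi name."""
    cmds = []
    for lib in sorted(lib_dir.glob("*.so*")):
        if lib.is_file() and references_soname(lib):
            cmds.append(
                ["patchelf", "--replace-needed", OLD_SONAME, NEW_SONAME, str(lib)]
            )
    return cmds
```

Calling `patchelf_commands(Path(".venvs/scaffoldvenv-tuo/lib/python3.11/site-packages/torch/lib"))` would return the commands to review. Each one could then be passed to `subprocess.run(cmd, check=True)`. The LD_PRELOAD in the jobscript is still needed afterwards so that the 9.1.0 libmpi is the one actually loaded.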