MPI_Send and MPI_Recv are not sending data concurrently from replica_sender to compute_receiver and from replica_sender to replica_receiver. As a result, when compute_sender fails, err_handler is not invoked, and some other rank performs the MPI_Comm_shrink, which automatically leaves replica_sender out.
This isn't how it should work.