[Darshan-users] COMM Split Error
Hassan Asghar
haxxanasghar at gmail.com
Wed Jun 16 03:25:35 CDT 2021
I am getting the following error when running MADbench2 under MPICH with the Darshan runtime library preloaded via LD_PRELOAD. Please help.
[haxxanasghar at gpuserver2 ~]$ mpiexec -n 16 -f machinefile -env
LD_PRELOAD=/home/haxxanasghar/darshan/darshan-runtime/lib/libdarshan.so
./MADbench2 640 80 4 8 8 4 4
MADbench 2.0 IO-mode
no_pe = 16 no_pix = 640 no_bin = 80 no_gang = 4 sblocksize = 8
fblocksize = 8 r_mod = 4 w_mod = 4
IOMETHOD = POSIX IOMODE = SYNC FILETYPE = UNIQUE REMAP = CUSTOM
Fatal error in PMPI_Comm_split: A process has failed, error stack:
PMPI_Comm_split(474)......: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=2,
new_comm=0x6073e0) failed
PMPI_Comm_split(456)......:
MPIR_Comm_split_impl(143).:
MPIR_Allgather_impl(807)..:
MPIR_Allgather(766).......:
MPIR_Allgather_intra(181).:
dequeue_and_set_error(888): Communication error with rank 3
MPIR_Allgather_intra(181).:
dequeue_and_set_error(888): Communication error with rank 6
Fatal error in PMPI_Comm_split: A process has failed, error stack:
PMPI_Comm_split(474)......: MPI_Comm_split(MPI_COMM_WORLD, color=3, key=1,
new_comm=0x6073e0) failed
PMPI_Comm_split(456)......:
MPIR_Comm_split_impl(143).:
MPIR_Allgather_impl(807)..:
MPIR_Allgather(766).......:
MPIR_Allgather_intra(181).:
dequeue_and_set_error(888): Communication error with rank 12
MPIR_Allgather_intra(181).:
dequeue_and_set_error(888): Communication error with rank 15
MPIR_Allgather_intra(181).:
dequeue_and_set_error(888): Communication error with rank 9
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[mpiexec at gpuserver2] HYDU_sock_read (./utils/sock/sock.c:243): read error
(Bad file descriptor)
[mpiexec at gpuserver2] control_cb (./pm/pmiserv/pmiserv_cb.c:201): unable to
read command from proxy
[mpiexec at gpuserver2] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at gpuserver2] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec at gpuserver2] main (./ui/mpich/mpiexec.c:331): process manager error
waiting for completion
[haxxanasghar at gpuserver2 ~]$