[mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
Jayesh Krishna
jayesh at mcs.anl.gov
Mon Feb 28 07:34:35 CST 2011
Hi,
Try the latest stable release of MPICH2 (http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads).
It looks like you are explicitly using the sock channel (with the "-channel sock" option of mpiexec) to run your MPI job. Is there any reason why you want to use the sock channel instead of the default nemesis channel? (If you don't use the "-channel" option, mpiexec should pick the nemesis channel.) The sock channel is old, and we recommend that all users use the nemesis channel instead.
Regards,
Jayesh
----- Original Message -----
From: "Koh Voon Li" <kohvoonli at gmail.com>
To: mpich-discuss at mcs.anl.gov
Sent: Sunday, February 27, 2011 8:47:39 PM
Subject: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
Hi, I am running two PCs, both with Windows 7 Home Premium, for parallel calculation using MPICH2 version mpich2-1.2.1p1. A 3D FDS calculation runs for a while and then fails with a number of MPI error messages, as shown below.
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC70738, rbuf=000000003FC706F8, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
MPIR_Bcast(1031)..........................:
MPIR_Bcast_binomial(157)..................:
MPIC_Recv(83).............................:
MPIC_Wait(513)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
Fatal error in MPI_Allreduce: Other MPI error, error stack:
MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC707B8, rbuf=000000003FC70778, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
MPIR_Allreduce(289).......................:
MPIC_Sendrecv(164)........................:
MPIC_Wait(513)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
I tried a ping test on each PC and it failed. It seems that I got no response from the network adapter.
After I disabled the network adapter and re-enabled it, everything seemed to be normal again.
Both PCs are connected with a crossover cable.
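Since the ping failure suggests the link itself drops during a long run, one way to narrow this down might be to log link reachability alongside the job. The following is only a minimal sketch in Python, not part of MPICH2 or FDS: the function names `tcp_reachable` and `watch_link` are made up for illustration, and the host and port passed to them are assumptions (any TCP port on which the other PC is known to be listening).

```python
import socket
import time


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def watch_link(host, port, interval=5.0, checks=3, probe=tcp_reachable):
    """Probe the peer `checks` times, `interval` seconds apart.

    Returns a list of (unix_timestamp, reachable) records so that a drop
    can be lined up with the time at which the MPI job failed.
    """
    records = []
    for i in range(checks):
        records.append((time.time(), probe(host, port)))
        if i + 1 < checks:
            time.sleep(interval)
    return records
```

Calling, for example, `watch_link(peer_address, peer_port, interval=10, checks=360)` on one PC while the calculation runs would give roughly an hour of timestamps. If the probes start failing shortly before the MPI_Allreduce error appears, that would point at the adapter or cable rather than at MPICH2 itself.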
Thanks.
Regards,
Koh