[mpich-discuss] mpich2-1.2.1p1 runs for a while and failed

Koh Voon Li kohvoonli at gmail.com
Tue Mar 1 06:39:25 CST 2011


Hi,

Thanks for the reply. Is there an older version that uses the Nemesis channel?
Thanks.

Regards,
Koh

On Mon, Feb 28, 2011 at 9:34 PM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:

> Hi,
> Try the latest stable release of MPICH2
> (http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads).
> It looks like you are explicitly using the sock channel (with the "-channel
> sock" option of mpiexec) to run your MPI job. Is there any reason why you
> want to use the sock channel instead of the default Nemesis channel? (If you
> don't use the "-channel" option, mpiexec should pick the Nemesis channel.)
> The sock channel is old, and we recommend that all users use the Nemesis
> channel instead.
>
> Regards,
> Jayesh
> ----- Original Message -----
> From: "Koh Voon Li" <kohvoonli at gmail.com>
> To: mpich-discuss at mcs.anl.gov
> Sent: Sunday, February 27, 2011 8:47:39 PM
> Subject: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
>
>
>
> Hi, I am running a parallel calculation on two PCs, both with Windows 7 Home
> Premium, using MPICH2 version mpich2-1.2.1p1. The 3D FDS calculation runs
> for a while and then fails with a number of MPI error messages, shown below.
>
>
>
> Fatal error in MPI_Allreduce: Other MPI error, error stack:
> MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC70738, rbuf=000000003FC706F8, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
> MPIR_Bcast(1031)..........................:
> MPIR_Bcast_binomial(157)..................:
> MPIC_Recv(83).............................:
> MPIC_Wait(513)............................:
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
> Fatal error in MPI_Allreduce: Other MPI error, error stack:
> MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC707B8, rbuf=000000003FC70778, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
> MPIR_Allreduce(289).......................:
> MPIC_Sendrecv(164)........................:
> MPIC_Wait(513)............................:
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
>
>
> I tried a ping test on each PC and it failed; it seems I got no response
> from the network adapter.
> I disabled the network adapter and re-enabled it, and then everything
> seemed normal again.
> Both PCs are connected using a crossover cable.
> Thanks.
> Regards,
> Koh
>
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
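The manual ping test described in the original message can be scripted so that a dropped link is noticed, with a timestamp, while the job runs rather than only when MPI reports the timeout. This is a minimal Python sketch, assuming the peer PC accepts TCP connections on some port you are allowed to probe; the host, port, and function names are hypothetical, and nothing here is part of MPICH. It probes TCP instead of sending ICMP pings because raw ICMP sockets usually need elevated privileges.

```python
import socket
import time
from datetime import datetime
from typing import Optional


def peer_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable networks and timeouts all land here.
        return False


def watch_peer(host: str, port: int, interval: float = 5.0,
               checks: Optional[int] = None) -> int:
    """Probe the peer repeatedly, printing one timestamped status line per probe.

    Runs forever when `checks` is None; otherwise stops after `checks` probes.
    Returns the number of failed probes.
    """
    failures = 0
    done = 0
    while checks is None or done < checks:
        ok = peer_reachable(host, port)
        if not ok:
            failures += 1
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {host}:{port} {'up' if ok else 'DOWN'}", flush=True)
        done += 1
        if checks is None or done < checks:
            time.sleep(interval)
    return failures
```

For example, with a hypothetical peer address and port, watch_peer("192.168.0.2", 12345, interval=10.0) prints one status line every ten seconds; a DOWN line appearing shortly before the MPI error would support the idea that the network adapter, not MPICH, dropped the link.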

