[mpich-discuss] mpich2-1.2.1p1 runs for a while and failed

Koh Voon Li kohvoonli at gmail.com
Thu Mar 3 21:40:39 CST 2011


Hi,
Is there any way to know whether I am running with the Nemesis channel instead of
the sock channel?
I am launching my job via a config file, which looks something like this:

channel
nemesis
exe \\FDS2-PC\Project\Paradigm\V20\fds5_mpi_win_64.exe Paradigmv4-20.fds
dir \\FDS2-PC\Project\Paradigm\V20\
hosts
FDS2-PC 6
WIN7-PC 6
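
(I was thinking of checking it like this, though I am not sure the assumption
is right: if the Windows build ships each channel as its own DLL, e.g.
mpich2nemesis.dll for Nemesis, then while the job is running something like

    tasklist /m mpich2nemesis.dll

should list the fds5_mpi processes only if they have actually loaded the
Nemesis channel DLL. Would that be a valid way to verify it?)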

Thanks.

Regards,
Koh

On Wed, Mar 2, 2011 at 12:06 AM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:

> Hi,
>  With MPICH2 1.2.1p1 you should be able to select the Nemesis channel with
> the "-channel" option of mpiexec (mpiexec -n 2 -channel nemesis
> mympipgm.exe).
>
> Regards,
> Jayesh
>
> ----- Original Message -----
> From: "Koh Voon Li" <kohvoonli at gmail.com>
> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
> Cc: mpich-discuss at mcs.anl.gov
> Sent: Tuesday, March 1, 2011 6:39:25 AM
> Subject: Re: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
>
> Hi,
>
>
> Thanks for the reply. Do any of the older versions use the Nemesis channel?
> Thanks.
>
>
> Regards,
> Koh
>
>
> On Mon, Feb 28, 2011 at 9:34 PM, Jayesh Krishna < jayesh at mcs.anl.gov >
> wrote:
>
>
> Hi,
> Try the latest stable release of MPICH2 (
> http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads).
> It looks like you are explicitly using the sock channel (with the "-channel
> sock" option of mpiexec) for running your MPI job. Is there any reason why
> you want to use the sock channel instead of the default Nemesis channel? (If
> you don't use the "-channel" option, mpiexec should pick the Nemesis channel.)
> The sock channel is old, and we recommend that all users use the Nemesis
> channel instead.
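>
> For example, leaving out the "-channel" option gives you the default Nemesis
> channel, while adding it forces a specific one (mympipgm.exe is just a
> placeholder for your executable):
>
> mpiexec -n 2 mympipgm.exe                  (default, Nemesis)
> mpiexec -n 2 -channel sock mympipgm.exe    (explicitly the old sock channel)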
>
> Regards,
> Jayesh
>
>
>
> ----- Original Message -----
> From: "Koh Voon Li" < kohvoonli at gmail.com >
> To: mpich-discuss at mcs.anl.gov
> Sent: Sunday, February 27, 2011 8:47:39 PM
> Subject: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
>
>
>
> Hi, I am running two PCs, both with Windows 7 Home Premium, for a parallel
> calculation using MPICH2 version mpich2-1.2.1p1. It is a 3D FDS calculation
> that runs for a while and then fails with a number of MPI error messages, as
> below.
>
>
>
> Fatal error in MPI_Allreduce: Other MPI error, error stack:
> MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC70738,
>   rbuf=000000003FC706F8, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
> MPIR_Bcast(1031)..........................:
> MPIR_Bcast_binomial(157)..................:
> MPIC_Recv(83).............................:
> MPIC_Wait(513)............................:
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an
>   event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
> Fatal error in MPI_Allreduce: Other MPI error, error stack:
> MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC707B8,
>   rbuf=000000003FC70778, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
> MPIR_Allreduce(289).......................:
> MPIC_Sendrecv(164)........................:
> MPIC_Wait(513)............................:
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an
>   event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
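>
> (The "semaphore timeout" text looks like a plain Windows networking error -
> errno 121 is ERROR_SEM_TIMEOUT - so I suspect the connection between the two
> machines is dropping rather than MPICH2 itself misbehaving. I was planning to
> check the Windows System event log around the time of the failure, e.g.
>
> wevtutil qe System /c:50 /rd:true /f:text
>
> to see whether the network adapter logs any errors when the job dies.)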
>
>
> I tried a ping test from each PC and it failed; it seems I got no
> response from the network adapter.
> I disabled the network adapter and re-enabled it, and then everything seemed
> to be normal again.
> Both PCs are connected by a crossover cable.
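> (If it helps, I can leave a continuous ping running between the two machines
> during the next run, e.g. "ping -t WIN7-PC" from FDS2-PC, to see whether the
> link drops at the same moment the job fails.)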
> Thanks.
> Regards,
> Koh
>
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
>