[mpich-discuss] mpich2-1.2.1p1 runs for a while and failed

Koh Voon Li kohvoonli at gmail.com
Mon Mar 7 22:03:45 CST 2011


Hi,
I tried running it as you instructed, but the processes do not seem to be doing
any work. Here is the output from the command prompt.

D:\>mpiexec -n 12 -channel nemesis -machinefile mf.txt -map Z:\\FDS2-PC\Project\Paradigm -wdir Z:\ZoneAB fds5_mpi_win_64.exe V1.fds
Process   1 of  11 is running on FDS2-PC
Process   5 of  11 is running on FDS2-PC
Process   4 of  11 is running on FDS2-PC
Process   0 of  11 is running on FDS2-PC
Process   2 of  11 is running on FDS2-PC
Process   3 of  11 is running on FDS2-PC
Process  10 of  11 is running on WIN7-PC
Process   9 of  11 is running on WIN7-PC
Process  11 of  11 is running on WIN7-PC
Process   7 of  11 is running on WIN7-PC
Process   8 of  11 is running on WIN7-PC
Process   6 of  11 is running on WIN7-PC

In the Task Manager, it shows 6 running processes with 100% CPU usage, but their
physical memory usage is very low (4 MB), whereas one process usually takes up at
least 1 GB of memory. It seems that although the 6 processes are running on my
CPU, they do not output anything.

When I tried a non-nemesis channel, it worked.
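
For comparison, the sock-channel equivalent of the command above differs only in
the -channel argument (same paths and case file as above), something like:

D:\>mpiexec -n 12 -channel sock -machinefile mf.txt -map Z:\\FDS2-PC\Project\Paradigm -wdir Z:\ZoneAB fds5_mpi_win_64.exe V1.fds
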
Thank you.

Regards,
Koh

On Sat, Mar 5, 2011 at 12:29 AM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:

> Hi,
>  Is there a reason why you want to use a config file (We do not regularly
> test the various config file options)? You could just run your job as,
>
>    mpiexec -n 12 -channel nemesis -machinefile mf.txt -map J:\\FDS2-PC\Project\Paradigm -wdir J:\V20 fds5_mpi_win_64.exe Paradigmv4-20.fds
>
>  where machine file, mf.txt, contains
>
>    # Machine file for FDS5
>    # Hosts and max procs to run on each host listed below
>     FDS2-PC:6
>    WIN7-PC:6
>     # End of the machine file
>
>  Let us know if it works for you.
>
> Regards,
> Jayesh
> ----- Original Message -----
> From: "Koh Voon Li" <kohvoonli at gmail.com>
> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
> Cc: mpich-discuss at mcs.anl.gov
> Sent: Thursday, March 3, 2011 9:40:39 PM
> Subject: Re: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
>
> Hi,
> Is there any way to know whether I am running with the nemesis channel
> instead of the sock channel?
> I am launching my job via a config file, which looks something like this:
>
>
>
> channel
> nemesis
> exe \\FDS2-PC\Project\Paradigm\V20\fds5_mpi_win_64.exe Paradigmv4-20.fds
> dir \\FDS2-PC\Project\Paradigm\V20\
> hosts
> FDS2-PC 6
> WIN7-PC 6
>
>
> Thanks.
>
>
> Regards,
> Koh
>
>
> On Wed, Mar 2, 2011 at 12:06 AM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:
>
>
> Hi,
> With MPICH2 1.2.1p1 you should be able to use the Nemesis channel using the
> "-channel" option of mpiexec (mpiexec -n 2 -channel nemesis mympipgm.exe).
>
>
> Regards,
> Jayesh
>
> ----- Original Message -----
> From: "Koh Voon Li" <kohvoonli at gmail.com>
>
> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
> Cc: mpich-discuss at mcs.anl.gov
> Sent: Tuesday, March 1, 2011 6:39:25 AM
> Subject: Re: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
>
> Hi,
>
>
> Thanks for the reply. Are there any older versions that use the Nemesis channel?
> Thanks.
>
>
> Regards,
> Koh
>
>
> On Mon, Feb 28, 2011 at 9:34 PM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:
>
>
> Hi,
> Try the latest stable release of MPICH2
> (http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads).
> It looks like you are explicitly using the sock channel (with the "-channel
> sock" option of mpiexec) for running your MPI job. Is there any reason why
> you want to use the sock channel instead of the default nemesis channel? (If
> you don't use the "-channel" option, mpiexec should pick the nemesis channel.)
> The sock channel is old, and we recommend that all users use the Nemesis
> channel instead.
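>
> For instance, these two invocations should end up on the same (nemesis)
> channel: the first picks it by default, the second names it explicitly
> (mympipgm.exe is just a placeholder program name).
>
>    mpiexec -n 2 mympipgm.exe
>    mpiexec -n 2 -channel nemesis mympipgm.exe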
>
> Regards,
> Jayesh
>
> ----- Original Message -----
> From: "Koh Voon Li" <kohvoonli at gmail.com>
> To: mpich-discuss at mcs.anl.gov
> Sent: Sunday, February 27, 2011 8:47:39 PM
> Subject: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
>
> Hi, I am running two PCs, both with Windows 7 Home Premium edition, for parallel
> calculation using MPICH2 version mpich2-1.2.1p1. The 3D FDS calculation runs for
> a while and then fails with a number of MPI error messages, as below.
>
> Fatal error in MPI_Allreduce: Other MPI error, error stack:
> MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC70738, rbuf=000000003FC706F8, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
> MPIR_Bcast(1031)..........................:
> MPIR_Bcast_binomial(157)..................:
> MPIC_Recv(83).............................:
> MPIC_Wait(513)............................:
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
> Fatal error in MPI_Allreduce: Other MPI error, error stack:
> MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC707B8, rbuf=000000003FC70778, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
> MPIR_Allreduce(289).......................:
> MPIC_Sendrecv(164)........................:
> MPIC_Wait(513)............................:
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
>
>
> I tried a ping test on each PC and it failed; it seems I got no response
> from the network adapter.
> After I disabled and re-enabled the network adapter, everything seemed to be
> normal again.
> Both PCs are connected with a crossover cable.
> Thanks.
> Regards,
> Koh
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>