[mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
Koh Voon Li
kohvoonli at gmail.com
Mon Mar 7 22:03:45 CST 2011
Hi,
I tried to run it as you instructed, but the processes do not seem to make any
progress. Here is the output from the command prompt.
D:\>mpiexec -n 12 -channel nemesis -machinefile mf.txt -map Z:\\FDS2-PC\Project\Paradigm -wdir Z:\ZoneAB fds5_mpi_win_64.exe V1.fds
Process 1 of 11 is running on FDS2-PC
Process 5 of 11 is running on FDS2-PC
Process 4 of 11 is running on FDS2-PC
Process 0 of 11 is running on FDS2-PC
Process 2 of 11 is running on FDS2-PC
Process 3 of 11 is running on FDS2-PC
Process 10 of 11 is running on WIN7-PC
Process 9 of 11 is running on WIN7-PC
Process 11 of 11 is running on WIN7-PC
Process 7 of 11 is running on WIN7-PC
Process 8 of 11 is running on WIN7-PC
Process 6 of 11 is running on WIN7-PC
Task Manager shows 6 running processes at 100% CPU usage, but physical memory
usage is low (4 MB), whereas one process usually takes at least 1 GB of memory.
So although the 6 processes are running on my CPU, they do not produce any
output.
When I try a non-nemesis channel, it works.
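
To narrow down whether the nemesis channel itself works between the two hosts, independent of FDS, one option is a minimal standalone MPI test along these lines. This is only a sketch, not something from the original thread: the file name test_allreduce.c, the build setup, and the use of C ints with MPI_LXOR are illustrative assumptions; the program simply mirrors the call pattern in the error stacks quoted below (MPI_Allreduce, count=10, logical XOR reduction).

/* test_allreduce.c -- minimal standalone test (illustrative sketch).
 * Assumes a C compiler configured against the MPICH2 headers and
 * libraries; the file name and build are not from the original thread. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int in[10], out[10];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes ten 0/1 flags; the reduction uses logical
     * XOR, the same operation shown in the error stack (count=10,
     * MPI_LXOR), but with C ints instead of Fortran MPI_LOGICAL. */
    for (i = 0; i < 10; i++)
        in[i] = (rank + i) % 2;

    MPI_Allreduce(in, out, 10, MPI_INT, MPI_LXOR, MPI_COMM_WORLD);

    printf("Process %d of %d completed MPI_Allreduce, out[0]=%d\n",
           rank, size, out[0]);

    MPI_Finalize();
    return 0;
}

Launched the same way as the FDS job, for example mpiexec -n 12 -channel nemesis -machinefile mf.txt test_allreduce.exe, every rank should print a line and exit. If this small test also hangs or produces no output over the nemesis channel, the problem is likely in the channel or network setup rather than in FDS itself.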
Thank you.
Regards,
Koh
On Sat, Mar 5, 2011 at 12:29 AM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:
> Hi,
> Is there a reason why you want to use a config file? (We do not regularly
> test the various config file options.) You could just run your job as:
>
> mpiexec -n 12 -channel nemesis -machinefile mf.txt -map
> J:\\FDS2-PC\Project\Paradigm -wdir J:\V20 fds5_mpi_win_64.exe
> Paradigmv4-20.fds
>
> where the machine file, mf.txt, contains:
>
> # Machine file for FDS5
> # Hosts and max procs to run on each host listed below
> FDS2-PC:6
> WIN7-PC:6
> # End of the machine file
>
> Let us know if it works for you.
>
> Regards,
> Jayesh
> ----- Original Message -----
> From: "Koh Voon Li" <kohvoonli at gmail.com>
> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
> Cc: mpich-discuss at mcs.anl.gov
> Sent: Thursday, March 3, 2011 9:40:39 PM
> Subject: Re: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
>
> Hi,
> Is there any way to know whether I am running with the nemesis channel
> instead of the sock channel?
> I am launching my job via a config file, which looks something like this:
>
> channel
> nemesis
> exe \\FDS2-PC\Project\Paradigm\V20\fds5_mpi_win_64.exe Paradigmv4-20.fds
> dir \\FDS2-PC\Project\Paradigm\V20\
> hosts
> FDS2-PC 6
> WIN7-PC 6
>
>
> Thanks.
>
>
> Regards,
> Koh
>
>
> On Wed, Mar 2, 2011 at 12:06 AM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:
>
>
> Hi,
> With MPICH2 1.2.1p1 you should be able to use the Nemesis channel using the
> "-channel" option of mpiexec (mpiexec -n 2 -channel nemesis mympipgm.exe).
>
>
> Regards,
> Jayesh
>
> ----- Original Message -----
> From: "Koh Voon Li" < kohvoonli at gmail.com >
>
>
>
> To: "Jayesh Krishna" < jayesh at mcs.anl.gov >
> Cc: mpich-discuss at mcs.anl.gov
> Sent: Tuesday, March 1, 2011 6:39:25 AM
> Subject: Re: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
>
> Hi,
>
>
> Thanks for the reply. Are there any older versions that use the Nemesis channel?
> Thanks.
>
>
> Regards,
> Koh
>
>
> On Mon, Feb 28, 2011 at 9:34 PM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:
>
>
> Hi,
> Try the latest stable release of MPICH2 (
> http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads).
> It looks like you are explicitly using the sock channel (with the "-channel
> sock" option of mpiexec) for running your MPI job. Is there any reason why
> you want to use the sock channel instead of the default nemesis channel (if
> you don't use the "-channel" option, mpiexec should pick the nemesis
> channel)? The sock channel is old, and we recommend that all users use the
> Nemesis channel instead.
>
> Regards,
> Jayesh
>
> ----- Original Message -----
> From: "Koh Voon Li" < kohvoonli at gmail.com >
> To: mpich-discuss at mcs.anl.gov
> Sent: Sunday, February 27, 2011 8:47:39 PM
> Subject: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed
>
> Hi, I am running a parallel 3D FDS calculation on 2 PCs, both with Windows 7
> Home Premium, using MPICH2 version mpich2-1.2.1p1. It runs for a while and
> then fails with a number of MPI error messages, shown below.
>
> Fatal error in MPI_Allreduce: Other MPI error, error stack:
> MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC70738, rbuf=000000003FC706F8, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
> MPIR_Bcast(1031)..........................:
> MPIR_Bcast_binomial(157)..................:
> MPIC_Recv(83).............................:
> MPIC_Wait(513)............................:
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
> Fatal error in MPI_Allreduce: Other MPI error, error stack:
> MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC707B8, rbuf=000000003FC70778, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed
> MPIR_Allreduce(289).......................:
> MPIC_Sendrecv(164)........................:
> MPIC_Wait(513)............................:
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)
>
>
> I tried a ping test on each PC and it failed; it seems I got no response
> from the network adapter.
> After I disabled and re-enabled the network adapter, everything seemed to be
> normal again.
> Both PC are connected by using a crossover cable.
> Thanks.
> Regards,
> Koh
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss