Hi,<div>I ran the command as you instructed, but the processes do not seem to be making any progress. Here is the output from the command prompt.</div><div><div><br></div><div>D:\>mpiexec -n 12 -channel nemesis -machinefile mf.txt -map Z:\\FDS2-PC\Project\</div>
<div>Paradigm -wdir Z:\ZoneAB fds5_mpi_win_64.exe V1.fds</div><div>Process 1 of 11 is running on FDS2-PC</div><div>Process 5 of 11 is running on FDS2-PC</div><div>Process 4 of 11 is running on FDS2-PC</div><div>Process 0 of 11 is running on FDS2-PC</div>
<div>Process 2 of 11 is running on FDS2-PC</div><div>Process 3 of 11 is running on FDS2-PC</div><div>Process 10 of 11 is running on WIN7-PC</div><div>Process 9 of 11 is running on WIN7-PC</div><div>Process 11 of 11 is running on WIN7-PC</div>
<div>Process 7 of 11 is running on WIN7-PC</div><div>Process 8 of 11 is running on WIN7-PC</div><div>Process 6 of 11 is running on WIN7-PC</div></div><div><br></div><div>Task Manager shows 6 running processes at 100% CPU usage, but their physical memory usage is low (about 4 MB), whereas one process usually takes at least 1 GB. So although the 6 processes are running on my CPU, they produce no output.</div>
<div><br></div><div>When I try a non-nemesis channel, it works.</div><div>Thank you.</div><div><br></div><div>Regards,</div><div>Koh</div><div><br><div class="gmail_quote">On Sat, Mar 5, 2011 at 12:29 AM, Jayesh Krishna <span dir="ltr"><<a href="mailto:jayesh@mcs.anl.gov">jayesh@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi,<br>
Is there a reason you want to use a config file? (We do not regularly test the various config-file options.) You could just run your job as:<br>
<br>
mpiexec -n 12 -channel nemesis -machinefile mf.txt -map J:\\FDS2-PC\Project\Paradigm -wdir J:\V20 fds5_mpi_win_64.exe Paradigmv4-20.fds<br>
<br>
where machine file, mf.txt, contains<br>
<br>
# Machine file for FDS5<br>
# Hosts and max procs to run on each host listed below<br>
<div class="im"> FDS2-PC:6<br>
WIN7-PC:6<br>
</div> # End of the machine file<br>
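[Editor's note, not from the thread: the machine-file format above is one "host:maxprocs" entry per line, with "#" starting a comment. A small hypothetical parser, not part of MPICH2, illustrates the rules and confirms the counts add up to the -n 12 used on the command line:]<br>

```python
# Hypothetical helper (not part of MPICH2): parse the machine-file
# format shown above. Each non-comment line is "host:maxprocs";
# here a bare hostname is assumed to mean one process.
def parse_machinefile(text):
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        host, sep, procs = line.partition(":")
        hosts.append((host, int(procs) if sep else 1))
    return hosts

mf = """\
# Machine file for FDS5
# Hosts and max procs to run on each host listed below
FDS2-PC:6
WIN7-PC:6
# End of the machine file
"""
print(parse_machinefile(mf))
print(sum(n for _, n in parse_machinefile(mf)))  # should match the -n value
```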
<br>
Let us know if it works for you.<br>
<div class="im"><br>
Regards,<br>
Jayesh<br>
----- Original Message -----<br>
From: "Koh Voon Li" <<a href="mailto:kohvoonli@gmail.com">kohvoonli@gmail.com</a>><br>
To: "Jayesh Krishna" <<a href="mailto:jayesh@mcs.anl.gov">jayesh@mcs.anl.gov</a>><br>
Cc: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
</div><div><div></div><div class="h5">Sent: Thursday, March 3, 2011 9:40:39 PM<br>
Subject: Re: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed<br>
<br>
Hi,<br>
Is there any way to know whether I am running with the nemesis channel instead of the sock channel?<br>
I am launching my job via a config file that looks like this:<br>
<br>
<br>
<br>
channel<br>
nemesis<br>
exe \\FDS2-PC\Project\Paradigm\V20\fds5_mpi_win_64.exe Paradigmv4-20.fds<br>
dir \\FDS2-PC\Project\Paradigm\V20\<br>
hosts<br>
FDS2-PC 6<br>
WIN7-PC 6<br>
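[Editor's note, not from the thread: since the config-file route is less tested, the host list above can be translated mechanically into the machine-file plus command-line form recommended elsewhere in this thread. A hypothetical sketch, with paths taken from the config above:]<br>

```python
# Hypothetical sketch: derive the equivalent mpiexec invocation from
# the "hosts" section of the config file above.
hosts = [("FDS2-PC", 6), ("WIN7-PC", 6)]  # from the config file

total = sum(n for _, n in hosts)                       # value for -n
machinefile = "\n".join(f"{h}:{n}" for h, n in hosts)  # contents of mf.txt
cmd = (f"mpiexec -n {total} -channel nemesis -machinefile mf.txt "
       r"-wdir \\FDS2-PC\Project\Paradigm\V20 "
       "fds5_mpi_win_64.exe Paradigmv4-20.fds")
print(machinefile)
print(cmd)
```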
<br>
<br>
Thanks.<br>
<br>
<br>
Regards,<br>
Koh<br>
<br>
<br>
On Wed, Mar 2, 2011 at 12:06 AM, Jayesh Krishna < <a href="mailto:jayesh@mcs.anl.gov">jayesh@mcs.anl.gov</a> > wrote:<br>
<br>
<br>
Hi,<br>
With MPICH2 1.2.1p1 you should be able to use the Nemesis channel using the "-channel" option of mpiexec (mpiexec -n 2 -channel nemesis mympipgm.exe).<br>
<br>
<br>
Regards,<br>
Jayesh<br>
<br>
----- Original Message -----<br>
From: "Koh Voon Li" < <a href="mailto:kohvoonli@gmail.com">kohvoonli@gmail.com</a> ><br>
<br>
<br>
<br>
To: "Jayesh Krishna" < <a href="mailto:jayesh@mcs.anl.gov">jayesh@mcs.anl.gov</a> ><br>
Cc: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
Sent: Tuesday, March 1, 2011 6:39:25 AM<br>
Subject: Re: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed<br>
<br>
Hi,<br>
<br>
<br>
Thanks for the reply. Is there an older version that uses the Nemesis channel?<br>
Thanks.<br>
<br>
<br>
Regards,<br>
Koh<br>
<br>
<br>
On Mon, Feb 28, 2011 at 9:34 PM, Jayesh Krishna < <a href="mailto:jayesh@mcs.anl.gov">jayesh@mcs.anl.gov</a> > wrote:<br>
<br>
<br>
Hi,<br>
Try the latest stable release of MPICH2 ( <a href="http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads" target="_blank">http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads</a> ).<br>
It looks like you are explicitly using the sock channel (with the "-channel sock" option of mpiexec) for your MPI job. Is there a reason you want the sock channel instead of the default nemesis channel? (If you don't pass the "-channel" option, mpiexec should pick the nemesis channel.) The sock channel is old, and we recommend that all users use the Nemesis channel instead.<br>
<br>
Regards,<br>
Jayesh<br>
<br>
<br>
<br>
----- Original Message -----<br>
From: "Koh Voon Li" < <a href="mailto:kohvoonli@gmail.com">kohvoonli@gmail.com</a> ><br>
To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
Sent: Sunday, February 27, 2011 8:47:39 PM<br>
Subject: [mpich-discuss] mpich2-1.2.1p1 runs for a while and failed<br>
<br>
<br>
<br>
Hi, I am running two PCs, both with Windows 7 Home Premium, for a parallel calculation using MPICH2 version mpich2-1.2.1p1. A 3D FDS calculation runs for a while and then fails with a number of MPI error messages, shown below.<br>
<br>
<br>
<br>
Fatal error in MPI_Allreduce: Other MPI error, error stack:<br>
MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC70738, rbuf=000000003FC706F8, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed<br>
MPIR_Bcast(1031)..........................:<br>
MPIR_Bcast_binomial(157)..................:<br>
MPIC_Recv(83).............................:<br>
MPIC_Wait(513)............................:<br>
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()<br>
MPIDI_CH3I_Progress_handle_sock_event(420):<br>
MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)<br>
Fatal error in MPI_Allreduce: Other MPI error, error stack:<br>
MPI_Allreduce(773)........................: MPI_Allreduce(sbuf=000000003FC707B8, rbuf=000000003FC70778, count=10, MPI_LOGICAL, MPI_LXOR, MPI_COMM_WORLD) failed<br>
MPIR_Allreduce(289).......................:<br>
MPIC_Sendrecv(164)........................:<br>
MPIC_Wait(513)............................:<br>
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()<br>
MPIDI_CH3I_Progress_handle_sock_event(420):<br>
MPIDU_Sock_wait(2606).....................: The semaphore timeout period has expired. (errno 121)<br>
<br>
<br>
I tried a ping test from each PC and it failed; I got no response through the network adapter.<br>
I disabled the network adapter and re-enabled it, and then everything seemed normal again.<br>
Both PCs are connected with a crossover cable.<br>
Thanks.<br>
Regards,<br>
Koh<br>
<br>
<br>
<br>
_______________________________________________<br>
mpich-discuss mailing list<br>
<a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
<br>
<br>
</div></div></blockquote></div><br></div>