Thanks, Gus and Rajeev, for your feedback on my MPICH deadlock error post earlier in February. Our model code now runs successfully with the latest stable MPICH2 release (1.4.1p1) after modifying the MPI calls as you suggested.

The code uses a master/slave MPI implementation and was originally set up for a cluster of single-core nodes controlled by a head node. It also works well on a single node with multiple cores.

I am now testing this code on multiple nodes, each with multiple cores. The nodes are set up for passwordless login via ssh. The run sometimes hangs and takes longer to complete: a run on a single node with 12 cores is faster than the corresponding run on 2 nodes with 12 cores each (24 total).

I am trying to resolve this issue and am wondering if you have any feedback on the following:
1. Does a master/slave implementation need any specific settings to work across a multiple-node, multiple-core setup?
2. Is there any way with MPICH2 to explicitly designate one core as the master and exclude it from computations?
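For reference, the two-node runs are launched with a machinefile, roughly as in this sketch (hostnames, core counts, and the executable name are placeholders):

    machinefile:
        node01:12
        node02:12

    mpiexec -f machinefile -n 24 ./model.exe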
I would greatly appreciate any other pointers/suggestions about this issue.

Thanks,
Sarika

---------- Forwarded message ----------
From: Gustavo Correa <gus@ldeo.columbia.edu>
Date: Tue, Feb 14, 2012 at 10:04 AM
Subject: Re: [mpich-discuss] MPICH deadlock error
To: mpich-discuss@mcs.anl.gov

On Feb 13, 2012, at 8:05 PM, Sarika K wrote:
> Thanks, Gus. I appreciate your feedback.
>
> I looked through the MPI communication and driver code (written back in 2001); it does include calls to MPI_Bcast (a sample is attached below). I am not sure why MPI_Send/MPI_Recv is used in some places and MPI_Bcast in others. I only started learning the details of the MPI calls after encountering this deadlock error. Are there particular cases where the MPI_Send/MPI_Recv pattern is preferred over MPI_Bcast, or vice versa?
>
> Best regards,
> Sarika
Hi Sarika

When one process sends the *same* data to all other processes in a communicator,
the situation begs for MPI_Bcast.
Point-to-point communication (send/recv) is preferred when the data exchanged across
different pairs of processes is different.
MPI_Bcast is but one of the MPI collective operations.
There are several others with different goals [e.g. MPI_Scatter[v] and MPI_Gather[v],
when parts of an array are distributed/collected by a master process].
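
For example, a rough, untested sketch of the two patterns (the program name, array names, and sizes are made up; n is assumed to divide evenly among the processes):

      program bcast_vs_scatter
      implicit none
      include 'mpif.h'
      integer, parameter :: n = 120
      real :: whole(n), piece(n)
      integer :: nprocs, myid, Ierr

      call MPI_INIT(Ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, Ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, Ierr)
      if (myid .eq. 0) whole = 1.0

c     every rank receives the *same* n values from rank 0
      call MPI_BCAST(whole, n, MPI_REAL, 0, MPI_COMM_WORLD, Ierr)

c     each rank receives its *own* n/nprocs slice of rank 0's array
      call MPI_SCATTER(whole, n/nprocs, MPI_REAL,
     &                 piece, n/nprocs, MPI_REAL,
     &                 0, MPI_COMM_WORLD, Ierr)

      call MPI_FINALIZE(Ierr)
      end program bcast_vs_scatter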
Check these MPI tutorials:
http://www.citutor.org/browse.php
https://computing.llnl.gov/tutorials/mpi/
http://www.mcs.anl.gov/research/projects/mpi/tutorial/
http://www.mcs.anl.gov/research/projects/mpi/tutorial/gropp/talk.html
http://www.cs.usfca.edu/~peter/ppmpi/

I hope this helps,
Gus Correa
>
> (MPI_Bcast sample code)
>       Nbuf = ix+iy+iz+is+4
>       if (Master) then
>         k=0
>         buf(k+1:k+ix) = dx(1:ix)     ; k=k+ix
>         buf(k+1:k+iy) = dy(1:iy)     ; k=k+iy
>         buf(k+1:k+iz) = sigmaz(1:iz) ; k=k+iz
>         buf(k+1) = dht               ; k=k+1
>         buf(k+1) = baseh             ; k=k+1
>         buf(k+1:k+is) = rmw(1:is)    ; k=k+is
>         buf(k+1) = dt                ; k=k+1
>         buf(k+1) = ut                ; k=k+1
>         if (k .ne. Nbuf) then
>           print*, 'Error in real_distrib. Nbuf=', Nbuf,
>      &            ' needed=', k
>           stop
>         endif
>       endif
> c
>       call MPI_BCAST(buf(1), Nbuf, MPI_REAL, 0, MPI_COMM_WORLD, Ierr)
> c
>       if (Worker) then
>         k=0
>         dx(1:ix)     = buf(1:ix)     ; k=ix
>         dy(1:iy)     = buf(k+1:k+iy) ; k=k+iy
>         sigmaz(1:iz) = buf(k+1:k+iz) ; k=k+iz
>         dht   = buf(k+1)             ; k=k+1
>         baseh = buf(k+1)             ; k=k+1
>         rmw(1:is) = buf(k+1:k+is)    ; k=k+is
>         dt = buf(k+1)                ; k=k+1
>         ut = buf(k+1)                ; k=k+1
>       endif
>
> On Mon, Feb 13, 2012 at 4:05 PM, Gustavo Correa <gus@ldeo.columbia.edu> wrote:
> Hi Sarika
>
> I think you may also need an MPI_Wait or MPI_Waitall after the nonblocking call (MPI_Isend or MPI_Irecv).
>
> However, your code seems to broadcast the same 'buf' from Master to all Workers, right?
> Have you tried to use MPI_Bcast, instead of MPI_Send & MPI_Recv?
> MPI_Bcast is a collective call and is likely to perform better.
> Something like this:
>
>    if (Master) then
>       ...load buf with data
>    endif
>
>    call MPI_Bcast(buf, ...)   ! every process calls it
>
>    if (Worker) then
>       ...unload data from buf
>    endif
>
> I hope this helps,
> Gus Correa
>
> On Feb 13, 2012, at 5:03 PM, Sarika K wrote:
>
> > Thanks, Rajeev, for the quick feedback. I really appreciate it. I have used MPI code but never written or modified it. I am assuming that I need to use the nonblocking routine MPI_Isend within the if (Master) part of the sample code. Is that right?
> >
> > Best regards,
> > Sarika
> >
> >
> > On Mon, Feb 13, 2012 at 1:45 PM, Rajeev Thakur <thakur@mcs.anl.gov> wrote:
> > This will happen if the master is also sending to itself, and calls MPI_Send (to itself) before MPI_Recv (from itself). You need to either use a nonblocking send or post a nonblocking receive before the blocking send.
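> >
> > For instance, a rough, untested sketch of the second option (the receive buffer, tag, and request names are only illustrative):
> >
> >       integer :: req, rstat(MPI_STATUS_SIZE)
> > c     post the receive from the local process before the blocking send,
> > c     so the send to self always has a matching receive regardless of
> > c     the internal buffering threshold
> >       if (Master) then
> >          call MPI_IRECV(rbuf, Nbuf, MPI_INTEGER, 0, mytag,
> >      &                  MPI_COMM_WORLD, req, Ierr)
> >          call MPI_SEND(buf, Nbuf, MPI_INTEGER, 0, mytag,
> >      &                 MPI_COMM_WORLD, Ierr)
> >          call MPI_WAIT(req, rstat, Ierr)
> >       endif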
> >
> > Rajeev
> >
> >
> > On Feb 13, 2012, at 3:28 PM, Sarika K wrote:
> >
> > > Dear MPICH-discuss group:
> > >
> > > My work involves Fortran code that uses MPICH for parallelization, but I have very limited experience with the details of the MPICH implementation. (I have always treated the MPICH part of the code as a black box.)
> > >
> > > I am now working on porting the code across different machine configurations. The modeling code works fine on some machines/servers, but it also generates random MPI deadlock errors when running the simulations on other machines/servers.
> > >
> > > The error message is below.
> > > "Fatal error in MPI_Send: Other MPI error, error stack:
> > > MPI_Send(174): MPI_Send(buf=0x7f4d9b375010, count=1, dtype=USER<vector>, dest=1, tag=10001, MPI_COMM_WORLD) failed
> > > MPID_Send(53): DEADLOCK: attempting to send a message to the local process without a prior matching receive"
> > >
> > > I searched this list and other resources for this error and strongly believe that there is a bug in the model's MPI code which remains dormant in some environments and works fine in others because of the dependence on the internal buffering threshold.
> > >
> > > I am not sure if this is sufficient information, but attached below is a sample subroutine (there are many inside the code) which generates the deadlock error.
> > >
> > > I would really appreciate any help/pointers from the group to fix this error in our code.
> > >
> > > Thanks in advance for your time and assistance,
> > > Sarika
> > >
> > > c-----------------------------------------------------------------------------
> > >       subroutine int_distrib1(iend)
> > > c-----------------------------------------------------------------------------
> > > c     Master distributes another bunch of integers to Workers
> > > c-----------------------------------------------------------------------------
> > > c
> > >       use ParallelDataMap
> > >       use CommDataTypes
> > >       implicit none
> > >       include 'mpif.h'
> > > c
> > >       include 'aqmax.param'
> > >       include 'aqindx.cmm'
> > > c
> > >       integer :: iend
> > >       integer, parameter :: Nbuf=35
> > >       integer :: i, j, k, buf(Nbuf), Ierr, status(MPI_STATUS_SIZE)
> > > c
> > >       if (Master) then
> > > !       arguments
> > >         buf(1) = iend
> > > !       /aqspid/ in aqindx.cmm stuff
> > >         buf(2) = iair
> > >         buf(3) = ih2o
> > >         buf(4) = io2
> > >         buf(5) = ico
> > >         buf(6) = ino2
> > >         buf(7) = iho2
> > >         buf(8) = iso2
> > >         buf(9) = io3
> > >         buf(10)= ich4
> > >         buf(11)= ico2
> > >         buf(12)= ih2
> > >         buf(13)= in2
> > >         buf(14)= itrace
> > >         k=15
> > >         buf(k:k+9) = ispg_idx(1:10); k=k+10
> > >         buf(k:k+9) = ispl_idx(1:10); k=k+10
> > >
> > >         do i=1,Nworkers
> > >           call MPI_SEND(buf, Nbuf, MPI_INTEGER,
> > >      &                  i, i, MPI_COMM_WORLD, Ierr)
> > >         enddo
> > >         print*, ''
> > >         print*, 'done sending int_distrib1'
> > >         print*, ''
> > >       endif  ! (Master)
> > > c
> > > c
> > >       if (Worker) then
> > >         call MPI_RECV(buf, Nbuf, MPI_INTEGER, 0, MyId,
> > >      &                MPI_COMM_WORLD, status, ierr)
> > >         iend = buf(1)
> > > !       /aqspid/ in aqindx.cmm stuff
> > >         iair  = buf(2)
> > >         ih2o  = buf(3)
> > >         io2   = buf(4)
> > >         ico   = buf(5)
> > >         ino2  = buf(6)
> > >         iho2  = buf(7)
> > >         iso2  = buf(8)
> > >         io3   = buf(9)
> > >         ich4  = buf(10)
> > >         ico2  = buf(11)
> > >         ih2   = buf(12)
> > >         in2   = buf(13)
> > >         itrace= buf(14)
> > >         k=15
> > >         ispg_idx(1:10) = buf(k:k+9); k=k+10
> > >         ispl_idx(1:10) = buf(k:k+9); k=k+10
> > >         print*, ''
> > >         print*, 'done receiving int_distrib1'
> > >         print*, ''
> > >       endif  ! (Worker)
> > > c
> > >       end subroutine int_distrib1
_______________________________________________
mpich-discuss mailing list     mpich-discuss@mcs.anl.gov
To manage subscription options or unsubscribe:
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss