Thanks, Gus! I appreciate your feedback.<br><br>I looked through the MPI communication and driver code (written way back in 2001), and it does include calls to MPI_Bcast (a sample is attached below). I am not sure why MPI_Send/MPI_Recv is used in some places and MPI_Bcast in others. I started learning the details of MPI calls only after encountering this deadlock error. Are there particular cases where an MPI_Send/MPI_Recv setup is preferred over MPI_Bcast, or vice versa?<br>
<br>Best regards,<br>Sarika <br><br>(MPI_Bcast sample code)<br> Nbuf = ix+iy+iz+is+4<br> if (Master) then <br> k=0<br> buf(k+1:k+ix)=dx(1:ix) ; k=k+ix<br> buf(k+1:k+iy)=dy(1:iy) ; k=k+iy<br> buf(k+1:k+iz)=sigmaz(1:iz) ; k=k+iz<br>
buf(k+1)=dht ; k=k+1<br> buf(k+1)=baseh ; k=k+1 <br> buf(k+1:k+is)=rmw(1:is) ; k=k+is<br> buf(k+1)=dt ; k=k+1<br> buf(k+1)=ut ; k=k+1<br> if (k.ne.Nbuf) then<br> print*, 'Error in real_distrib. Nbuf=',Nbuf,<br>
& ' needed=',k<br> stop <br> endif <br> endif <br>c<br> call MPI_BCAST(buf(1),Nbuf,MPI_REAL,0,MPI_COMM_WORLD,Ierr)<br>c <br> if (Worker) then<br>
k=0<br> dx(1:ix)=buf(1:ix) ; k=ix<br> dy(1:iy)=buf(k+1:k+iy) ; k=k+iy<br> sigmaz(1:iz)=buf(k+1:k+iz) ; k=k+iz<br> dht=buf(k+1) ; k=k+1<br> baseh=buf(k+1) ; k=k+1 <br> rmw(1:is)=buf(k+1:k+is); k=k+is<br>
dt=buf(k+1); k=k+1<br> ut=buf(k+1); k=k+1<br> endif <br><br><div class="gmail_quote">On Mon, Feb 13, 2012 at 4:05 PM, Gustavo Correa <span dir="ltr"><<a href="mailto:gus@ldeo.columbia.edu">gus@ldeo.columbia.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">Hi Sarika<br></div>
<br>
I think you will also need an MPI_Wait or MPI_Waitall to complete any nonblocking call (MPI_Isend or MPI_Irecv); the blocking MPI_Recv itself does not need one.<br>
<br>
However, your code seems to broadcast the same 'buf' from Master to all Workers, right?<br>
Have you tried to use MPI_Bcast, instead of MPI_Send & MPI_Recv?<br>
It is a collective call, and is likely to perform better.<br>
Something like this:<br>
<br>
if (Master) then<br>
...load buf with data<br>
endif<br>
<br>
call MPI_Bcast(buf, ...) ! every process calls it<br>
<br>
if (Worker) then<br>
... unload data from buf<br>
endif<br>
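<br>
A slightly fuller sketch of the same pattern, as a standalone program. It is only an illustration under some assumptions: it is written in free-form Fortran with the 'use mpi' module instead of the fixed-form include 'mpif.h', rank 0 plays the Master, and the program name, nbuf=3 and the values loaded into buf are made up for the example.<br>
<pre>
program bcast_sketch
  use mpi
  implicit none
  integer, parameter :: nbuf = 3          ! hypothetical buffer length
  integer :: buf(nbuf), ierr, myid
  integer :: iend, iair, ih2o             ! stand-ins for the packed values

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)

  ! Only the root (rank 0, assumed to be the Master) loads the buffer
  if (myid == 0) then
     iend = 100; iair = 1; ih2o = 2       ! illustrative values only
     buf = (/ iend, iair, ih2o /)
  endif

  ! Every rank in the communicator, root included, makes the same call
  call MPI_Bcast(buf, nbuf, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

  ! The other ranks unload the data from the buffer
  if (myid /= 0) then
     iend = buf(1); iair = buf(2); ih2o = buf(3)
  endif

  print *, 'rank', myid, ':', iend, iair, ih2o
  call MPI_Finalize(ierr)
end program bcast_sketch
</pre>
Because no rank ever sends to itself here, the self-send problem does not come up, but every process in MPI_COMM_WORLD has to reach the MPI_Bcast call.<br>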
<br>
I hope this helps,<br>
Gus Correa<br>
<div class="HOEnZb"><div class="h5"></div></div></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="HOEnZb"><div class="h5">
On Feb 13, 2012, at 5:03 PM, Sarika K wrote:<br>
<br>
> Thanks, Rajeev, for the quick feedback. I really appreciate it. I have used but never written or modified MPI code. I am assuming that I need to use the nonblocking routine MPI_Isend within the if (Master) part of the sample code. Is that right?<br>
><br>
> Best regards,<br>
> Sarika<br>
><br>
><br>
> On Mon, Feb 13, 2012 at 1:45 PM, Rajeev Thakur <<a href="mailto:thakur@mcs.anl.gov">thakur@mcs.anl.gov</a>> wrote:<br>
> This will happen if the master is also sending to itself, and calls MPI_Send(to itself) before MPI_Recv(from itself). You need to either use a nonblocking send or post a nonblocking receive before the blocking send.<br>
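> A minimal sketch of the second option (post a nonblocking receive, then do the blocking send to the same rank). It assumes free-form Fortran and the 'use mpi' module rather than include 'mpif.h'; the program name, the buffer names and the length n are hypothetical, and the tag 10001 is simply borrowed from the error message in the original post.<br>
<pre>
program self_send_sketch
  use mpi
  implicit none
  integer, parameter :: n = 4             ! hypothetical message length
  integer :: sbuf(n), rbuf(n), req, ierr, myid, i
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  sbuf = (/ (myid + i, i = 1, n) /)

  ! Post the receive first; the call returns immediately
  call MPI_Irecv(rbuf, n, MPI_INTEGER, myid, 10001, MPI_COMM_WORLD, req, ierr)

  ! The blocking send to the local rank now has a matching receive
  call MPI_Send(sbuf, n, MPI_INTEGER, myid, 10001, MPI_COMM_WORLD, ierr)

  ! Complete the nonblocking receive before reading rbuf
  call MPI_Wait(req, status, ierr)

  print *, 'rank', myid, 'received', rbuf
  call MPI_Finalize(ierr)
end program self_send_sketch
</pre>
> Separate send and receive buffers are used because the same array must not be passed to both calls while the receive is pending.<br>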
><br>
> Rajeev<br>
><br>
><br>
> On Feb 13, 2012, at 3:28 PM, Sarika K wrote:<br>
><br>
> > Dear MPICH-discuss group:<br>
> ><br>
> > My work involves Fortran code that uses MPICH for parallelization, but I have very limited experience with the details of the MPICH implementation. (I have always treated the MPICH part of the code as a black box.)<br>
> ><br>
> > I am now working on porting the code across different machine configurations. My modeling code works fine on some machines/servers, but it generates seemingly random MPI deadlock errors when running the simulations on others.<br>
> ><br>
> > The error message is below.<br>
> > "Fatal error in MPI_Send: Other MPI error, error stack:<br>
> > MPI_Send(174): MPI_Send(buf=0x7f4d9b375010, count=1, dtype=USER<vector>, dest=1, tag=10001, MPI_COMM_WORLD) failed<br>
> > MPID_Send(53): DEADLOCK: attempting to send a message to the local process without a prior matching receive"<br>
> ><br>
> > I searched this list and other resources for this error and strongly believe that there is a bug in the model's MPI code, which remains dormant in some environments and only works there because of its dependence on the internal buffering threshold.<br>
> ><br>
> > I am not sure if this is sufficient information, but attached below is a sample subroutine (one of many in the code) that generates the deadlock error.<br>
> ><br>
> > I would really appreciate any help/pointers from the group to fix this error in our code.<br>
> ><br>
> > Thanks in advance for your time and assistance,<br>
> > Sarika<br>
> ><br>
> > c-----------------------------------------------------------------------------------------------------------------------------<br>
> > subroutine int_distrib1(iend)<br>
> > c-----------------------<br>
> > c Master distributes another bunch of integers to Workers<br>
> > c-----------------------------------------------------------------------------------------------------------------------------<br>
> > c<br>
> > use ParallelDataMap<br>
> > use CommDataTypes<br>
> > implicit none<br>
> > include 'mpif.h'<br>
> > c<br>
> > include 'aqmax.param'<br>
> > include 'aqindx.cmm'<br>
> > c<br>
> > integer :: iend<br>
> > integer, parameter :: Nbuf=35<br>
> > integer :: i, j, k, buf(Nbuf), Ierr, status(MPI_STATUS_SIZE)<br>
> > c<br>
> > if (Master) then<br>
> > ! arguments<br>
> > buf(1) = iend<br>
> > ! /aqspid/ in aqindx.cmm stuff<br>
> > buf(2) = iair<br>
> > buf(3) = ih2o<br>
> > buf(4) = io2<br>
> > buf(5) = ico<br>
> > buf(6) = ino2<br>
> > buf(7) = iho2<br>
> > buf(8) = iso2<br>
> > buf(9) = io3<br>
> > buf(10)= ich4<br>
> > buf(11)= ico2<br>
> > buf(12)= ih2<br>
> > buf(13)= in2<br>
> > buf(14)= itrace<br>
> > k=15<br>
> > buf(k:k+9) = ispg_idx(1:10); k=k+10<br>
> > buf(k:k+9) = ispl_idx(1:10); k=k+10<br>
> ><br>
> > do i=1,Nworkers<br>
> > call MPI_SEND(buf, Nbuf, MPI_INTEGER,<br>
> > & i, i, MPI_COMM_WORLD, Ierr)<br>
> ><br>
> > enddo<br>
> > print*, ''<br>
> > print*, 'done sending int_distrib1'<br>
> > print*, ''<br>
> > endif ! (Master)<br>
> > c<br>
> > c<br>
> > if (Worker) then<br>
> > call MPI_RECV(buf, Nbuf, MPI_INTEGER, 0, MyId,<br>
> > & MPI_COMM_WORLD, status, ierr)<br>
> > iend = buf(1)<br>
> > ! /aqspid/ in aqindx.cmm stuff<br>
> > iair = buf(2)<br>
> > ih2o = buf(3)<br>
> > io2 = buf(4)<br>
> > ico = buf(5)<br>
> > ino2 = buf(6)<br>
> > iho2 = buf(7)<br>
> > iso2 = buf(8)<br>
> > io3 = buf(9)<br>
> > ich4 = buf(10)<br>
> > ico2 = buf(11)<br>
> > ih2 = buf(12)<br>
> > in2 = buf(13)<br>
> > itrace= buf(14)<br>
> > k=15<br>
> > ispg_idx(1:10) = buf(k:k+9); k=k+10<br>
> > ispl_idx(1:10) = buf(k:k+9); k=k+10<br>
> > print*, ''<br>
> > print*, 'done receiving int_distrib1'<br>
> > print*, ''<br>
> > endif ! (Worker)<br>
> > c<br>
> > end subroutine int_distrib1<br>
> ><br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > mpich-discuss mailing list <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
> > To manage subscription options or unsubscribe:<br>
> > <a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>
><br>
</div></div></blockquote></div><br>