Thanks, Gus! I appreciate your feedback.<br><br>I looked through the MPI communication and driver code (written way back in 2001), and it does include calls to MPI_Bcast (a sample is attached below). I am not sure why MPI_Send/MPI_Recv is used in some places and MPI_Bcast in others. I only started learning the details of MPI calls after encountering this deadlock error. Are there particular cases where the MPI_Send/MPI_Recv setup is preferred over MPI_Bcast, or vice versa?<br>

<br>Best regards,<br>Sarika <br><br>(MPI_Bcast sample code)<br>      Nbuf = ix+iy+iz+is+4<br>      if (Master) then <br>        k=0<br>    buf(k+1:k+ix)=dx(1:ix) ; k=k+ix<br>    buf(k+1:k+iy)=dy(1:iy) ; k=k+iy<br>    buf(k+1:k+iz)=sigmaz(1:iz) ; k=k+iz<br>

    buf(k+1)=dht ; k=k+1<br>    buf(k+1)=baseh ; k=k+1    <br>    buf(k+1:k+is)=rmw(1:is) ; k=k+is<br>    buf(k+1)=dt ; k=k+1<br>    buf(k+1)=ut ; k=k+1<br>    if (k.ne.Nbuf) then<br>      print*, &#39;Error in real_distrib. Nbuf=&#39;,Nbuf,<br>

     &amp;             &#39;   needed=&#39;,k<br>          stop         <br>    endif    <br>      endif     <br>c<br>      call MPI_BCAST(buf(1),Nbuf,MPI_REAL,0,MPI_COMM_WORLD,Ierr)<br>c      <br>      if (Worker) then<br>

        k=0<br>    dx(1:ix)=buf(1:ix) ; k=ix<br>    dy(1:iy)=buf(k+1:k+iy) ; k=k+iy<br>    sigmaz(1:iz)=buf(k+1:k+iz) ; k=k+iz<br>    dht=buf(k+1) ; k=k+1<br>    baseh=buf(k+1) ; k=k+1    <br>    rmw(1:is)=buf(k+1:k+is); k=k+is<br>

    dt=buf(k+1); k=k+1<br>    ut=buf(k+1); k=k+1<br>      endif     <br><br><div class="gmail_quote">On Mon, Feb 13, 2012 at 4:05 PM, Gustavo Correa <span dir="ltr">&lt;<a href="mailto:gus@ldeo.columbia.edu">gus@ldeo.columbia.edu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">Hi Sarika<br></div>

<br>

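I think that if you switch to nonblocking calls (MPI_Isend or MPI_Irecv), you will also need an MPI_Wait or MPI_Waitall to complete them; a blocking MPI_Recv needs no wait.<br>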

<br>

However, your code seems to broadcast  the same &#39;buf&#39; from Master to all Workers, right?<br>

Have you tried to use MPI_Bcast, instead of MPI_Send &amp; MPI_Recv?<br>

It is a collective call, and is likely to perform better.<br>

Something like this:<br>

<br>

if (Master) then<br>

 ...load buf with data<br>

endif<br>

<br>

call MPI_Bcast(buf, ...)  ! every process calls it<br>

<br>

if (Worker) then<br>

 ... unload data from buf<br>

endif<br>

<br>

I hope this helps,<br>

Gus Correa<br>

<div class="HOEnZb"><div class="h5"></div></div></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="HOEnZb"><div class="h5">

On Feb 13, 2012, at 5:03 PM, Sarika K wrote:<br>

<br>

&gt; Thanks, Rajeev, for the quick feedback. I really appreciate it. I have used but never written or modified MPI code. I am assuming that I need to use the nonblocking routine MPI_Isend within the if (Master) part of the sample code. Is that right?<br>

&gt;<br>

&gt; Best regards,<br>

&gt; Sarika<br>

&gt;<br>

&gt;<br>

&gt; On Mon, Feb 13, 2012 at 1:45 PM, Rajeev Thakur &lt;<a href="mailto:thakur@mcs.anl.gov">thakur@mcs.anl.gov</a>&gt; wrote:<br>

&gt; This will happen if the master is also sending to itself, and calls MPI_Send(to itself) before MPI_Recv(from itself). You need to either use a nonblocking send or post a nonblocking receive before the blocking send.<br>


&gt;<br>

&gt; Rajeev<br>

&gt;<br>

&gt;<br>

&gt; On Feb 13, 2012, at 3:28 PM, Sarika K wrote:<br>

&gt;<br>

&gt; &gt; Dear MPICH-discuss group:<br>

&gt; &gt;<br>

&gt; &gt; My work involves Fortran code that uses MPICH for parallelization, but I have very limited experience with the details of the MPICH implementation. (I have always treated the MPICH part of the code as a black box.)<br>


&gt; &gt;<br>

&gt; &gt; I am now working on porting the code across different machine configurations. My modeling code works fine on some machines/servers. But it also generates random MPI deadlock errors when running the simulations across other machines/servers.<br>


&gt; &gt;<br>

&gt; &gt; The error message is below.<br>

&gt; &gt; &quot;Fatal error in MPI_Send: Other MPI error, error stack:<br>

&gt; &gt; MPI_Send(174): MPI_Send(buf=0x7f4d9b375010, count=1, dtype=USER&lt;vector&gt;, dest=1, tag=10001, MPI_COMM_WORLD) failed<br>

&gt; &gt; MPID_Send(53): DEADLOCK: attempting to send a message to the local process without a prior matching receive&quot;<br>

&gt; &gt;<br>

&gt; &gt; I searched this list and other resources for this error and strongly believe there is a bug in the MPI code of the model that stays dormant in some environments, where the code works fine only because of the internal buffering threshold.<br>


&gt; &gt;<br>

&gt; &gt; I am not sure if this is sufficient information, but attached below is a sample subroutine (there are many like it in the code) that generates the deadlock error.<br>

&gt; &gt;<br>

&gt; &gt; I would really appreciate any help/pointers from the group to fix this error in our code.<br>

&gt; &gt;<br>

&gt; &gt; Thanks in advance for your time and assistance,<br>

&gt; &gt; Sarika<br>

&gt; &gt;<br>

&gt; &gt; c-----------------------------------------------------------------------------------------------------------------------------<br>

&gt; &gt;       subroutine int_distrib1(iend)<br>

&gt; &gt; c-----------------------<br>

&gt; &gt; c  Master distributes another bunch of integers to Workers<br>

&gt; &gt; c-----------------------------------------------------------------------------------------------------------------------------<br>

&gt; &gt; c<br>

&gt; &gt;       use ParallelDataMap<br>

&gt; &gt;       use CommDataTypes<br>

&gt; &gt;       implicit none<br>

&gt; &gt;       include &#39;mpif.h&#39;<br>

&gt; &gt; c<br>

&gt; &gt;       include &#39;aqmax.param&#39;<br>

&gt; &gt;       include &#39;aqindx.cmm&#39;<br>

&gt; &gt; c<br>

&gt; &gt;       integer :: iend<br>

&gt; &gt;       integer, parameter ::  Nbuf=35<br>

&gt; &gt;       integer ::  i, j, k, buf(Nbuf), Ierr, status(MPI_STATUS_SIZE)<br>

&gt; &gt; c<br>

&gt; &gt;       if (Master) then<br>

&gt; &gt; ! arguments<br>

&gt; &gt;     buf(1) = iend<br>

&gt; &gt; !  /aqspid/ in aqindx.cmm stuff<br>

&gt; &gt;     buf(2) = iair<br>

&gt; &gt;     buf(3) = ih2o<br>

&gt; &gt;     buf(4) = io2<br>

&gt; &gt;     buf(5) = ico<br>

&gt; &gt;     buf(6) = ino2<br>

&gt; &gt;     buf(7) = iho2<br>

&gt; &gt;     buf(8) = iso2<br>

&gt; &gt;     buf(9) = io3<br>

&gt; &gt;     buf(10)= ich4<br>

&gt; &gt;     buf(11)= ico2<br>

&gt; &gt;     buf(12)= ih2<br>

&gt; &gt;     buf(13)= in2<br>

&gt; &gt;     buf(14)= itrace<br>

&gt; &gt;     k=15<br>

&gt; &gt;     buf(k:k+9) = ispg_idx(1:10); k=k+10<br>

&gt; &gt;     buf(k:k+9) = ispl_idx(1:10); k=k+10<br>

&gt; &gt;<br>

&gt; &gt;     do i=1,Nworkers<br>

&gt; &gt;       call MPI_SEND(buf, Nbuf, MPI_INTEGER,<br>

&gt; &gt;      &amp;         i, i,  MPI_COMM_WORLD, Ierr)<br>

&gt; &gt;<br>

&gt; &gt;     enddo<br>

&gt; &gt;     print*, &#39;&#39;<br>

&gt; &gt;     print*, &#39;done sending int_distrib1&#39;<br>

&gt; &gt;     print*, &#39;&#39;<br>

&gt; &gt;       endif   !   (Master)<br>

&gt; &gt; c<br>

&gt; &gt; c<br>

&gt; &gt;       if (Worker) then<br>

&gt; &gt;         call MPI_RECV(buf, Nbuf, MPI_INTEGER, 0, MyId,<br>

&gt; &gt;      &amp;                 MPI_COMM_WORLD, status, ierr)<br>

&gt; &gt;     iend  = buf(1)<br>

&gt; &gt; ! /aqspid/ in aqindx.cmm stuff<br>

&gt; &gt;     iair  = buf(2)<br>

&gt; &gt;     ih2o  = buf(3)<br>

&gt; &gt;     io2   = buf(4)<br>

&gt; &gt;     ico   = buf(5)<br>

&gt; &gt;     ino2  = buf(6)<br>

&gt; &gt;     iho2  = buf(7)<br>

&gt; &gt;     iso2  = buf(8)<br>

&gt; &gt;     io3   = buf(9)<br>

&gt; &gt;     ich4  = buf(10)<br>

&gt; &gt;     ico2  = buf(11)<br>

&gt; &gt;     ih2   = buf(12)<br>

&gt; &gt;     in2   = buf(13)<br>

&gt; &gt;     itrace= buf(14)<br>

&gt; &gt;     k=15<br>

&gt; &gt;     ispg_idx(1:10) = buf(k:k+9); k=k+10<br>

&gt; &gt;     ispl_idx(1:10) = buf(k:k+9); k=k+10<br>

&gt; &gt;     print*, &#39;&#39;<br>

&gt; &gt;     print*, &#39;done receiving int_distrib1&#39;<br>

&gt; &gt;     print*, &#39;&#39;<br>

&gt; &gt;       endif  !    (Worker)<br>

&gt; &gt; c<br>

&gt; &gt;       end  subroutine int_distrib1<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt; _______________________________________________<br>

&gt; &gt; mpich-discuss mailing list     <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>

&gt; &gt; To manage subscription options or unsubscribe:<br>

&gt; &gt; <a href="https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss</a><br>

&gt;<br>


</div></div></blockquote></div><br>