[mpich-discuss] MPICH deadlock error

Gustavo Correa gus at ldeo.columbia.edu
Tue Feb 14 12:04:52 CST 2012


On Feb 13, 2012, at 8:05 PM, Sarika K wrote:

> Thanks, Gus! I appreciate your feedback.
> 
> I looked through the MPI communication and driver code (written way back in 2001); it does include calls to MPI_Bcast (a sample is attached below). I am not sure why MPI_Send/MPI_Recv is used in some places and MPI_Bcast in others. I only started to learn the details of MPI calls after encountering this deadlock error. Are there particular cases where an MPI_Send/MPI_Recv setup is preferred over MPI_Bcast, or vice versa?
> 
> Best regards,
> Sarika 

Hi Sarika

When one process sends the *same* data to all other processes in a communicator,
the situation begs for MPI_Bcast.
Point-to-point communication (send/recv) is preferred when the data exchanged across
different pairs of processes is different.
MPI_Bcast is just one of the MPI collective operations.
There are several others with different purposes [e.g. MPI_Scatter[v] and MPI_Gather[v],
for when parts of an array are distributed/collected by a master process].
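For illustration only, here is a generic sketch contrasting the two (it is not code
from your model; the names nloc, same, gdata, and ldata are made up): MPI_Bcast gives
every rank an identical copy, while MPI_Scatter hands each rank its own slice.

      program bcast_vs_scatter
      implicit none
      include 'mpif.h'
      integer, parameter :: nloc = 4
      integer :: myid, nprocs, ierr
      real :: same(nloc)            ! same data on every rank
      real, allocatable :: gdata(:) ! whole array, filled on root
      real :: ldata(nloc)           ! this rank's own slice
c
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      allocate(gdata(nloc*nprocs))
      if (myid .eq. 0) then
        same  = 1.0
        gdata = 2.0
      endif
c     every rank ends up with an identical copy of 'same'
      call MPI_BCAST(same, nloc, MPI_REAL, 0, MPI_COMM_WORLD, ierr)
c     every rank ends up with a different nloc-sized piece of 'gdata'
      call MPI_SCATTER(gdata, nloc, MPI_REAL, ldata, nloc, MPI_REAL,
     &                 0, MPI_COMM_WORLD, ierr)
      call MPI_FINALIZE(ierr)
      end program bcast_vs_scatter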

Check these MPI tutorials:
http://www.citutor.org/browse.php
https://computing.llnl.gov/tutorials/mpi/
http://www.mcs.anl.gov/research/projects/mpi/tutorial/
http://www.mcs.anl.gov/research/projects/mpi/tutorial/gropp/talk.html
http://www.cs.usfca.edu/~peter/ppmpi/

I hope this helps,
Gus Correa

> 
> (MPI_Bcast sample code)
>       Nbuf = ix+iy+iz+is+4
>       if (Master) then
>         k=0
>         buf(k+1:k+ix)=dx(1:ix) ; k=k+ix
>         buf(k+1:k+iy)=dy(1:iy) ; k=k+iy
>         buf(k+1:k+iz)=sigmaz(1:iz) ; k=k+iz
>         buf(k+1)=dht ; k=k+1
>         buf(k+1)=baseh ; k=k+1
>         buf(k+1:k+is)=rmw(1:is) ; k=k+is
>         buf(k+1)=dt ; k=k+1
>         buf(k+1)=ut ; k=k+1
>         if (k.ne.Nbuf) then
>           print*, 'Error in real_distrib. Nbuf=',Nbuf,
>      &            '   needed=',k
>           stop
>         endif
>       endif
> c
>       call MPI_BCAST(buf(1),Nbuf,MPI_REAL,0,MPI_COMM_WORLD,Ierr)
> c
>       if (Worker) then
>         k=0
>         dx(1:ix)=buf(1:ix) ; k=ix
>         dy(1:iy)=buf(k+1:k+iy) ; k=k+iy
>         sigmaz(1:iz)=buf(k+1:k+iz) ; k=k+iz
>         dht=buf(k+1) ; k=k+1
>         baseh=buf(k+1) ; k=k+1
>         rmw(1:is)=buf(k+1:k+is) ; k=k+is
>         dt=buf(k+1) ; k=k+1
>         ut=buf(k+1) ; k=k+1
>       endif
> 
> On Mon, Feb 13, 2012 at 4:05 PM, Gustavo Correa <gus at ldeo.columbia.edu> wrote:
> Hi Sarika
> 
> I think you may also need an MPI_Wait or MPI_Waitall to complete the nonblocking MPI_Isend.
> 
> However, your code seems to broadcast the same 'buf' from Master to all Workers, right?
> Have you tried using MPI_Bcast instead of MPI_Send & MPI_Recv?
> It is a collective call and is likely to perform better.
> Something like this:
> 
> if (Master) then
>  ...load buf with data
> endif
> 
> call MPI_Bcast(buf, ...)  ! every process calls it
> 
> if (Worker) then
>  ... unload data from buf
> endif
> 
> I hope this helps,
> Gus Correa
>  
> On Feb 13, 2012, at 5:03 PM, Sarika K wrote:
> 
> > Thanks, Rajeev, for the quick feedback. I really appreciate it. I have used, but never written or modified, MPI code. I am assuming that I need to use the nonblocking routine MPI_Isend within the if (Master) part of the sample code. Is that right?
> >
> > Best regards,
> > Sarika
> >
> >
> > On Mon, Feb 13, 2012 at 1:45 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> > This will happen if the master is also sending to itself, and calls MPI_Send(to itself) before MPI_Recv(from itself). You need to either use a nonblocking send or post a nonblocking receive before the blocking send.
> >
> > Rajeev
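For illustration only, here is a minimal sketch of one of the fixes described above,
posting a nonblocking receive before the blocking send (it is not code from the model;
buf, rbuf, and Nbuf are placeholders, and it assumes the master is rank 0 and includes
itself in the destination loop). The master posts an MPI_Irecv for its own message
before the blocking sends and completes it with MPI_Wait afterwards; the alternative
is to replace the blocking MPI_Send with MPI_Isend followed by MPI_Waitall.

      program selfsend_fix
      implicit none
      include 'mpif.h'
      integer, parameter :: Nbuf = 10
      integer :: myid, nprocs, i, req, ierr
      integer :: status(MPI_STATUS_SIZE)
      real    :: buf(Nbuf), rbuf(Nbuf)
c
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
c
      if (myid .eq. 0) then
        buf = 3.14
c       post the receive for the master's own message first, so the
c       blocking MPI_SEND to rank 0 below has a matching receive
        call MPI_IRECV(rbuf, Nbuf, MPI_REAL, 0, 0,
     &                 MPI_COMM_WORLD, req, ierr)
        do i = 0, nprocs-1          ! note: includes rank 0 itself
          call MPI_SEND(buf, Nbuf, MPI_REAL, i, 0,
     &                  MPI_COMM_WORLD, ierr)
        enddo
        call MPI_WAIT(req, status, ierr)
      else
        call MPI_RECV(rbuf, Nbuf, MPI_REAL, 0, 0,
     &                MPI_COMM_WORLD, status, ierr)
      endif
c
      call MPI_FINALIZE(ierr)
      end program selfsend_fix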
> >
> >
> > On Feb 13, 2012, at 3:28 PM, Sarika K wrote:
> >
> > > Dear MPICH-discuss group:
> > >
> > > My work involves working with Fortran code that uses MPICH for parallelization, but I have very limited experience with the details of the MPICH implementation. (I have always treated the MPICH part of the code as a black box.)
> > >
> > > I am now working on porting the code across different machine configurations. The modeling code works fine on some machines/servers, but it generates seemingly random MPI deadlock errors when running the simulations on other machines/servers.
> > >
> > > The error message is below.
> > > "Fatal error in MPI_Send: Other MPI error, error stack:
> > > MPI_Send(174): MPI_Send(buf=0x7f4d9b375010, count=1, dtype=USER<vector>, dest=1, tag=10001, MPI_COMM_WORLD) failed
> > > MPID_Send(53): DEADLOCK: attempting to send a message to the local process without a prior matching receive"
> > >
> > > I searched this list and other resources for this error and strongly believe that there is a bug in the model's MPI communication code which remains dormant in some environments, where it happens to work because of the dependence on MPI's internal buffering threshold.
> > >
> > > I am not sure if this is sufficient information, but attached below is a sample subroutine (one of many in the code) that generates the deadlock error.
> > >
> > > I would really appreciate any help/pointers from the group to fix this error in our code.
> > >
> > > Thanks in advance for your time and assistance,
> > > Sarika
> > >
> > > c-----------------------------------------------------------------------------------------------------------------------------
> > >       subroutine int_distrib1(iend)
> > > c-----------------------
> > > c  Master distributes another bunch of integers to Workers
> > > c-----------------------------------------------------------------------------------------------------------------------------
> > > c
> > >       use ParallelDataMap
> > >       use CommDataTypes
> > >       implicit none
> > >       include 'mpif.h'
> > > c
> > >       include 'aqmax.param'
> > >       include 'aqindx.cmm'
> > > c
> > >       integer :: iend
> > >       integer, parameter ::  Nbuf=35
> > >       integer ::  i, j, k, buf(Nbuf), Ierr, status(MPI_STATUS_SIZE)
> > > c
> > >       if (Master) then
> > > ! arguments
> > >         buf(1) = iend
> > > !  /aqspid/ in aqindx.cmm stuff
> > >         buf(2) = iair
> > >         buf(3) = ih2o
> > >         buf(4) = io2
> > >         buf(5) = ico
> > >         buf(6) = ino2
> > >         buf(7) = iho2
> > >         buf(8) = iso2
> > >         buf(9) = io3
> > >         buf(10)= ich4
> > >         buf(11)= ico2
> > >         buf(12)= ih2
> > >         buf(13)= in2
> > >         buf(14)= itrace
> > >         k=15
> > >         buf(k:k+9) = ispg_idx(1:10); k=k+10
> > >         buf(k:k+9) = ispl_idx(1:10); k=k+10
> > >
> > >         do i=1,Nworkers
> > >           call MPI_SEND(buf, Nbuf, MPI_INTEGER,
> > >      &                  i, i, MPI_COMM_WORLD, Ierr)
> > >         enddo
> > >         print*, ''
> > >         print*, 'done sending int_distrib1'
> > >         print*, ''
> > >       endif   !   (Master)
> > > c
> > > c
> > >       if (Worker) then
> > >         call MPI_RECV(buf, Nbuf, MPI_INTEGER, 0, MyId,
> > >      &                MPI_COMM_WORLD, status, ierr)
> > >         iend  = buf(1)
> > > ! /aqspid/ in aqindx.cmm stuff
> > >         iair  = buf(2)
> > >         ih2o  = buf(3)
> > >         io2   = buf(4)
> > >         ico   = buf(5)
> > >         ino2  = buf(6)
> > >         iho2  = buf(7)
> > >         iso2  = buf(8)
> > >         io3   = buf(9)
> > >         ich4  = buf(10)
> > >         ico2  = buf(11)
> > >         ih2   = buf(12)
> > >         in2   = buf(13)
> > >         itrace= buf(14)
> > >         k=15
> > >         ispg_idx(1:10) = buf(k:k+9); k=k+10
> > >         ispl_idx(1:10) = buf(k:k+9); k=k+10
> > >         print*, ''
> > >         print*, 'done receiving int_distrib1'
> > >         print*, ''
> > >       endif  !    (Worker)
> > > c
> > >       end  subroutine int_distrib1
> > >
> > >
> > >