[mpich-discuss] MPICH deadlock error

Rajeev Thakur thakur at mcs.anl.gov
Mon Feb 13 15:45:27 CST 2012


This will happen if the master is also sending to itself and calls MPI_Send (to itself) before MPI_Recv (from itself). You need to either use a nonblocking send or post a nonblocking receive before the blocking send.
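For the send-to-itself case, a minimal sketch of that pattern (the names rbuf, itag, ireq, and ierr below are placeholders, not taken from your subroutine): the master posts a nonblocking receive for its own copy before the loop of blocking sends and completes it afterwards, so the send to rank 0 no longer depends on MPICH's internal buffering.

      integer :: ireq, ierr, status(MPI_STATUS_SIZE)

      if (Master) then
! post the receive for the master's own message first
         call MPI_IRECV(rbuf, Nbuf, MPI_INTEGER, 0, itag,
     &                  MPI_COMM_WORLD, ireq, ierr)
         do i = 0, Nworkers      ! note: includes rank 0, the master
            call MPI_SEND(buf, Nbuf, MPI_INTEGER, i, itag,
     &                    MPI_COMM_WORLD, ierr)
         enddo
         call MPI_WAIT(ireq, status, ierr)
      endif

Equivalently, the send to self can be made nonblocking with MPI_ISEND and completed with MPI_WAIT after the loop; either way the self-message does not rely on the eager/buffering threshold.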

Rajeev


On Feb 13, 2012, at 3:28 PM, Sarika K wrote:

> Dear MPICH-discuss group:
> 
> My work involves Fortran code that uses MPICH for parallelization, but I have very limited experience with the details of the MPICH implementation. (I have always treated the MPI part of the code as a black box.)
> 
> I am now porting the code across different machine configurations. The modeling code works fine on some machines/servers, but it generates seemingly random MPI deadlock errors when the simulations run on other machines/servers.
> 
> The error message is below.
> "Fatal error in MPI_Send: Other MPI error, error stack:
> MPI_Send(174): MPI_Send(buf=0x7f4d9b375010, count=1, dtype=USER<vector>, dest=1, tag=10001, MPI_COMM_WORLD) failed
> MPID_Send(53): DEADLOCK: attempting to send a message to the local process without a prior matching receive"
> 
> I searched this list and other resources for this error and strongly suspect there is a bug in the model's MPI code that stays dormant in some environments and only appears to work because of its dependence on MPICH's internal buffering threshold.
> 
> I am not sure whether this is sufficient information, but attached below is a sample subroutine (one of many in the code) that generates the deadlock error.
> 
> I would really appreciate any help/pointers from the group to fix this error in our code.
> 
> Thanks in advance for your time and assistance,
> Sarika
> 
> c-----------------------------------------------------------------------------------------------------------------------------
>       subroutine int_distrib1(iend) 
> c-----------------------
> c  Master distributes another bunch of integers to Workers
> c-----------------------------------------------------------------------------------------------------------------------------
> c      
>       use ParallelDataMap
>       use CommDataTypes
>       implicit none
>       include 'mpif.h'
> c      
>       include 'aqmax.param'
>       include 'aqindx.cmm'
> c      
>       integer :: iend
>       integer, parameter ::  Nbuf=35
>       integer ::  i, j, k, buf(Nbuf), Ierr, status(MPI_STATUS_SIZE)    
> c
>       if (Master) then
> ! arguments
>         buf(1) = iend
> !  /aqspid/ in aqindx.cmm stuff
>         buf(2) = iair
>         buf(3) = ih2o
>         buf(4) = io2
>         buf(5) = ico
>         buf(6) = ino2
>         buf(7) = iho2
>         buf(8) = iso2
>         buf(9) = io3
>         buf(10)= ich4
>         buf(11)= ico2
>         buf(12)= ih2
>         buf(13)= in2
>         buf(14)= itrace
>         k=15
>         buf(k:k+9) = ispg_idx(1:10); k=k+10
>         buf(k:k+9) = ispl_idx(1:10); k=k+10
> 
>         do i=1,Nworkers
>           call MPI_SEND(buf, Nbuf, MPI_INTEGER,
>      &                  i, i, MPI_COMM_WORLD, Ierr)
>         enddo
> 
>         print*, ''
>         print*, 'done sending int_distrib1'
>         print*, ''
>       endif   !   (Master)
> c
> c      
>       if (Worker) then
>         call MPI_RECV(buf, Nbuf, MPI_INTEGER, 0, MyId, 
>      &                 MPI_COMM_WORLD, status, ierr)
>         iend  = buf(1)
> ! /aqspid/ in aqindx.cmm stuff
>         iair  = buf(2)
>         ih2o  = buf(3)
>         io2   = buf(4)
>         ico   = buf(5)
>         ino2  = buf(6)
>         iho2  = buf(7)
>         iso2  = buf(8)
>         io3   = buf(9)
>         ich4  = buf(10)
>         ico2  = buf(11)
>         ih2   = buf(12)
>         in2   = buf(13)
>         itrace= buf(14)
>         k=15
>         ispg_idx(1:10) = buf(k:k+9); k=k+10
>         ispl_idx(1:10) = buf(k:k+9); k=k+10
>         print*, ''
>         print*, 'done receiving int_distrib1'
>         print*, ''
>       endif  !    (Worker)
> c       
>       end  subroutine int_distrib1
> 
> 
> 
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


