[mpich-discuss] MPICH deadlock error
Sarika K
sarikauniv at gmail.com
Mon Feb 13 16:03:05 CST 2012
Thanks, Rajeev, for the quick feedback. I really appreciate it. I have used
MPI code before but never written or modified it myself. I am assuming that I
need to use the nonblocking routine MPI_Isend within the if (Master) part of
the sample code. Is that right?
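
To make sure I understand, here is a rough sketch of the change I have in mind
for the master part of int_distrib1. It is only a sketch under my own
assumptions: the reqs, stats, and rbuf arrays are names I made up, sizing the
request array by Nworkers is my guess, and I receive into a separate rbuf so
the pending sends from buf are left alone. Please correct me if I have it wrong:

      ! Declared with the other locals in int_distrib1:
      integer :: reqs(Nworkers), stats(MPI_STATUS_SIZE, Nworkers)
      integer :: rbuf(Nbuf)
c
      if (Master) then
         ! ... fill buf(1:Nbuf) exactly as before ...
         do i = 1, Nworkers
            ! Nonblocking send: returns immediately, so it cannot
            ! deadlock even if one destination is the master itself.
            call MPI_ISEND(buf, Nbuf, MPI_INTEGER, i, i,
     &                     MPI_COMM_WORLD, reqs(i), Ierr)
         enddo
      endif
c
      if (Worker) then
         ! Receive into a separate buffer so the pending nonblocking
         ! sends from buf are not disturbed before MPI_WAITALL.
         call MPI_RECV(rbuf, Nbuf, MPI_INTEGER, 0, MyId,
     &                 MPI_COMM_WORLD, status, ierr)
         ! ... unpack rbuf exactly as before (iend = rbuf(1), etc.) ...
      endif
c
      if (Master) then
         ! Complete the sends only after any local receive is posted.
         call MPI_WAITALL(Nworkers, reqs, stats, Ierr)
      endif

Or would it be cleaner to follow your other suggestion and post an MPI_Irecv
on the worker side before the blocking MPI_Send? Does it matter where the
MPI_WAITALL goes?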
Best regards,
Sarika
On Mon, Feb 13, 2012 at 1:45 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> This will happen if the master is also sending to itself, and calls
> MPI_Send(to itself) before MPI_Recv(from itself). You need to either use a
> nonblocking send or post a nonblocking receive before the blocking send.
>
> Rajeev
>
>
> On Feb 13, 2012, at 3:28 PM, Sarika K wrote:
>
> > Dear MPICH-discuss group:
> >
> > My work involves a Fortran code that uses MPICH for parallelization, but
> > I have very limited experience with the details of the MPI implementation.
> > (I have always treated the MPI part of the code as a black box.)
> >
> > I am now working on porting the code across different machine
> > configurations. The modeling code works fine on some machines/servers, but
> > it generates seemingly random MPI deadlock errors when running the
> > simulations on other machines/servers.
> >
> > The error message is below.
> > "Fatal error in MPI_Send: Other MPI error, error stack:
> > MPI_Send(174): MPI_Send(buf=0x7f4d9b375010, count=1, dtype=USER<vector>,
> dest=1, tag=10001, MPI_COMM_WORLD) failed
> > MPID_Send(53): DEADLOCK: attempting to send a message to the local
> process without a prior matching receive"
> >
> > I searched this list and other resources for this error and strongly
> > believe that there is a bug in the model's MPI code which stays dormant in
> > some environments and surfaces in others because the behavior depends on
> > the MPI implementation's internal buffering threshold.
> >
> > I am not sure if this is sufficient information, but attached below is a
> > sample subroutine (one of many in the code) that generates the deadlock
> > error.
> >
> > I would really appreciate any help/pointers from the group to fix this
> > error in our code.
> >
> > Thanks in advance for your time and assistance,
> > Sarika
> >
> >
> > c-----------------------------------------------------------------------------------------------------------------------------
> > subroutine int_distrib1(iend)
> > c-----------------------
> > c Master distributes another bunch of integers to Workers
> > c-----------------------------------------------------------------------------------------------------------------------------
> > c
> > use ParallelDataMap
> > use CommDataTypes
> > implicit none
> > include 'mpif.h'
> > c
> > include 'aqmax.param'
> > include 'aqindx.cmm'
> > c
> > integer :: iend
> > integer, parameter :: Nbuf=35
> > integer :: i, j, k, buf(Nbuf), Ierr, status(MPI_STATUS_SIZE)
> > c
> > if (Master) then
> > ! arguments
> > buf(1) = iend
> > ! /aqspid/ in aqindx.cmm stuff
> > buf(2) = iair
> > buf(3) = ih2o
> > buf(4) = io2
> > buf(5) = ico
> > buf(6) = ino2
> > buf(7) = iho2
> > buf(8) = iso2
> > buf(9) = io3
> > buf(10)= ich4
> > buf(11)= ico2
> > buf(12)= ih2
> > buf(13)= in2
> > buf(14)= itrace
> > k=15
> > buf(k:k+9) = ispg_idx(1:10); k=k+10
> > buf(k:k+9) = ispl_idx(1:10); k=k+10
> >
> > do i=1,Nworkers
> > call MPI_SEND(buf, Nbuf, MPI_INTEGER,
> > & i, i, MPI_COMM_WORLD, Ierr)
> >
> > enddo
> > print*, ''
> > print*, 'done sending int_distrib1'
> > print*, ''
> > endif ! (Master)
> > c
> > c
> > if (Worker) then
> > call MPI_RECV(buf, Nbuf, MPI_INTEGER, 0, MyId,
> > & MPI_COMM_WORLD, status, ierr)
> > iend = buf(1)
> > ! /aqspid/ in aqindx.cmm stuff
> > iair = buf(2)
> > ih2o = buf(3)
> > io2 = buf(4)
> > ico = buf(5)
> > ino2 = buf(6)
> > iho2 = buf(7)
> > iso2 = buf(8)
> > io3 = buf(9)
> > ich4 = buf(10)
> > ico2 = buf(11)
> > ih2 = buf(12)
> > in2 = buf(13)
> > itrace= buf(14)
> > k=15
> > ispg_idx(1:10) = buf(k:k+9); k=k+10
> > ispl_idx(1:10) = buf(k:k+9); k=k+10
> > print*, ''
> > print*, 'done receiving int_distrib1'
> > print*, ''
> > endif ! (Worker)
> > c
> > end subroutine int_distrib1
> >
> >
> >
>
> _______________________________________________
> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>