[mpich-discuss] MPICH deadlock error
Sarika K
sarikauniv at gmail.com
Mon Feb 13 15:28:03 CST 2012
Dear MPICH-discuss group:
My work involves a Fortran code that uses MPICH for parallelization, but I
have very limited experience with the details of the MPI implementation
(I have always treated the MPICH part of the code as a black box).
I am now porting the code across different machine configurations. The
modeling code works fine on some machines/servers, but it generates
seemingly random MPI deadlock errors when running the simulations on
other machines/servers.
The error message is below.
"Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(174): MPI_Send(buf=0x7f4d9b375010, count=1, dtype=USER<vector>,
dest=1, tag=10001, MPI_COMM_WORLD) failed
MPID_Send(53): DEADLOCK: attempting to send a message to the local process
without a prior matching receive"
I searched this list and other resources for this error and now strongly
suspect that there is a bug in our model's MPI code which stays dormant in
some environments, where it only appears to work because of the MPI
implementation's internal buffering threshold.
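If I understand correctly, a blocking MPI_Send whose destination is the
sending rank itself can only complete if a matching receive has already been
posted, or if the message is small enough for MPICH to buffer internally,
and that threshold is what differs between our machines. Below is a small
stand-alone sketch (my own toy test, not taken from our model) of the
pattern I mean; as far as I understand, removing the MPI_IRECV would
reproduce the same DEADLOCK abort once the message no longer fits the
internal buffer:

      program self_send_test
c     Stand-alone sketch (not from our model): a blocking MPI_SEND
c     whose destination is the sending rank itself.  Pre-posting the
c     matching receive (MPI_IRECV) lets it complete; without it the
c     send relies entirely on MPICH's internal buffering.
      implicit none
      include 'mpif.h'
      integer, parameter :: N = 200000
      integer :: sbuf(N), rbuf(N), myid, ierr, req
      integer :: status(MPI_STATUS_SIZE)
c
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      sbuf = myid
c     post the matching receive first, then do the blocking self-send
      call MPI_IRECV(rbuf, N, MPI_INTEGER, myid, 99,
     &               MPI_COMM_WORLD, req, ierr)
      call MPI_SEND(sbuf, N, MPI_INTEGER, myid, 99,
     &              MPI_COMM_WORLD, ierr)
      call MPI_WAIT(req, status, ierr)
      print*, 'rank ', myid, ': self send/recv completed'
      call MPI_FINALIZE(ierr)
      end program self_send_test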
I am not sure if this is sufficient information, but attached below is a
sample subroutine (there are many like it inside the code) that generates
the deadlock error, followed by a rough sketch of an alternative I have
been considering. I would really appreciate any help/pointers from the
group on how to fix this error in our code.
Thanks in advance for your time and assistance,
Sarika
c-----------------------------------------------------------------------------------------------------------------------------
subroutine int_distrib1(iend)
c-----------------------
c Master distributes another bunch of integers to Workers
c-----------------------------------------------------------------------------------------------------------------------------
c
use ParallelDataMap
use CommDataTypes
implicit none
include 'mpif.h'
c
include 'aqmax.param'
include 'aqindx.cmm'
c
integer :: iend
integer, parameter :: Nbuf=35
integer :: i, j, k, buf(Nbuf), Ierr, status(MPI_STATUS_SIZE)
c
if (Master) then
! arguments
buf(1) = iend
! /aqspid/ in aqindx.cmm stuff
buf(2) = iair
buf(3) = ih2o
buf(4) = io2
buf(5) = ico
buf(6) = ino2
buf(7) = iho2
buf(8) = iso2
buf(9) = io3
buf(10)= ich4
buf(11)= ico2
buf(12)= ih2
buf(13)= in2
buf(14)= itrace
k=15
buf(k:k+9) = ispg_idx(1:10); k=k+10
buf(k:k+9) = ispl_idx(1:10); k=k+10
do i=1,Nworkers
call MPI_SEND(buf, Nbuf, MPI_INTEGER,
& i, i, MPI_COMM_WORLD, Ierr)
enddo
print*, ''
print*, 'done sending int_distrib1'
print*, ''
endif ! (Master)
c
c
if (Worker) then
call MPI_RECV(buf, Nbuf, MPI_INTEGER, 0, MyId,
& MPI_COMM_WORLD, status, ierr)
iend = buf(1)
! /aqspid/ in aqindx.cmm stuff
iair = buf(2)
ih2o = buf(3)
io2 = buf(4)
ico = buf(5)
ino2 = buf(6)
iho2 = buf(7)
iso2 = buf(8)
io3 = buf(9)
ich4 = buf(10)
ico2 = buf(11)
ih2 = buf(12)
in2 = buf(13)
itrace= buf(14)
k=15
ispg_idx(1:10) = buf(k:k+9); k=k+10
ispl_idx(1:10) = buf(k:k+9); k=k+10
print*, ''
print*, 'done receiving int_distrib1'
print*, ''
endif ! (Worker)
c
end subroutine int_distrib1
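One change I have been considering, in case it is relevant, is replacing the
loop of point-to-point sends with a collective, so that no rank ever issues
a blocking send to itself. A rough, untested sketch of what I mean (assuming
Master is rank 0 and that all ranks call the routine) is:

c-----------------------------------------------------------------------------------------------------------------------------
      subroutine int_distrib1_bcast(iend)
c-----------------------
c Rough, untested sketch: same data exchange as int_distrib1, but done
c with a collective MPI_BCAST instead of point-to-point sends
c-----------------------------------------------------------------------------------------------------------------------------
c
      use ParallelDataMap
      use CommDataTypes
      implicit none
      include 'mpif.h'
c
      include 'aqmax.param'
      include 'aqindx.cmm'
c
      integer :: iend
      integer, parameter :: Nbuf=35
      integer :: k, buf(Nbuf), Ierr
c
      if (Master) then
        ! pack the buffer exactly as in int_distrib1
        buf(1) = iend
        buf(2) = iair;  buf(3) = ih2o;  buf(4) = io2;  buf(5) = ico
        buf(6) = ino2;  buf(7) = iho2;  buf(8) = iso2; buf(9) = io3
        buf(10)= ich4;  buf(11)= ico2;  buf(12)= ih2;  buf(13)= in2
        buf(14)= itrace
        k=15
        buf(k:k+9) = ispg_idx(1:10); k=k+10
        buf(k:k+9) = ispl_idx(1:10); k=k+10
      endif ! (Master)
c
c     every rank (Master and Workers) must make this call; root is 0
      call MPI_BCAST(buf, Nbuf, MPI_INTEGER, 0, MPI_COMM_WORLD, Ierr)
c
      if (Worker) then
        ! unpack the buffer exactly as in int_distrib1
        iend = buf(1)
        iair = buf(2);   ih2o = buf(3);   io2 = buf(4);  ico = buf(5)
        ino2 = buf(6);   iho2 = buf(7);   iso2 = buf(8); io3 = buf(9)
        ich4 = buf(10);  ico2 = buf(11);  ih2 = buf(12); in2 = buf(13)
        itrace = buf(14)
        k=15
        ispg_idx(1:10) = buf(k:k+9); k=k+10
        ispl_idx(1:10) = buf(k:k+9); k=k+10
      endif ! (Worker)
c
      end subroutine int_distrib1_bcast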