[mpich-discuss] Error in WAIT_MPI in a subroutine

Anthony Chan chan at mcs.anl.gov
Wed Sep 14 11:05:06 CDT 2011



----- Original Message -----

> so you don't need to set it before using it.
> Is there any line that I should remove?

>> BTW, should the msids (msid, msid1, msid2, ..., msid7) be given a default value before they are used?

I interpreted your question as asking whether the msids need to be set before
they are used in the non-blocking pt2pt calls.  Did I misinterpret your question?
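
If that is what you meant: no, they do not need any initial value.  The
request handle is purely an output argument of MPI_IRECV and only has to be
passed, unchanged, to the matching MPI_WAIT.  A minimal sketch (the variable
names here are illustrative, not taken from your code):

      include 'mpif.h'
      integer msid, ierr, ival
      integer istatus(MPI_STATUS_SIZE)
c msid is undefined at this point; MPI_IRECV fills it in as an output
      call MPI_IRECV(ival,1,MPI_INTEGER,0,1,MPI_COMM_WORLD,msid,ierr)
c ... the peer rank posts the matching MPI_SEND ...
c MPI_WAIT only needs the value that MPI_IRECV returned in msid
      call MPI_WAIT(msid,istatus,ierr)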

I only took a quick look at your MPI code; I don't see anything obviously
wrong, but I could have missed it.  However, I notice you use a separate
MPI_Wait for each of your non-blocking pt2pt calls.  Naively, I would use
MPI_Waitall on some of these requests, but that is just an optimization,
not a correctness issue.
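
As a sketch of what I mean (not a drop-in replacement for your exchanges,
since the second receive in each of them depends on the count delivered by
the first): the names kl, kr, mint and lgrp below are the ones from your
subroutine, while nrecv_l, nrecv_r, msids and istatuses are made up for
the example.

      include 'mpif.h'
      integer nrecv_l, nrecv_r, ierr
      integer msids(2)
      integer istatuses(MPI_STATUS_SIZE,2)
c post two independent non-blocking receives, one per neighbour;
c kl, kr, mint and lgrp are as defined in your Particle_Passing routine
      call MPI_IRECV(nrecv_l,1,mint,kl,1,lgrp,msids(1),ierr)
      call MPI_IRECV(nrecv_r,1,mint,kr,2,lgrp,msids(2),ierr)
c ... post the matching MPI_SENDs to kr and kl ...
c one MPI_WAITALL completes both requests instead of two MPI_WAITs
      call MPI_WAITALL(2,msids,istatuses,ierr)

Again, this only tidies things up; it is not what is causing the
truncation error.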

A.Chan 
 
> 
> Best Regards
> Sincerely
> Amin
> 
> 
> On Wed, Sep 14, 2011 at 12:43 AM, Anthony Chan <chan at mcs.anl.gov>
> wrote:
> 
> >
> > > [hx001:18110] *** An error occurred in MPI_Wait
> > > [hx001:18110] *** on communicator MPI_COMM_WORLD
> > > [hx001:18110] *** MPI_ERR_TRUNCATE: message truncated
> > > [hx001:18110] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > > [hx001:18111] *** An error occurred in MPI_Wait
> > > [hx001:18111] *** on communicator MPI_COMM_WORLD
> > > [hx001:18111] *** MPI_ERR_TRUNCATE: message truncated
> > > [hx001:18111] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > > mpirun noticed that job rank 0 with PID 18109 on node hx001 exited
> > > on signal 15 (Terminated).
> > > 1 additional process aborted (not shown)
> >
> > Are you using Open MPI or MPICH2?
> > MPI_Request is set by MPI, so you don't need to set it before using
> > it.
> >
> > A.Chan
> >
> > ----- Original Message -----
> > > Hi
> > >
> > > I'm running a program including the following subroutine:
> > > c---------------------SUBROUTINE Particle_Passing Begin Here --------------------------
> > > SUBROUTINE Particle_Passing(ions,lecs,xi,yi,zi,xe,ye,ze
> > > & ,ui,vi,wi,ue,ve,we,mh,Max_p,buf_size,CRye,CRze,CRue
> > > & ,CP_send,CP_recv,CRxi,CRyi,CRzi,CRui,CRvi,CRwi,CRxe
> > > & ,CRve,CRwe,CP_Rx,CP_Ry,CP_Rz,CP_Ru,CP_Rv,CP_Rw,CP_Sx
> > > & ,CLxi,CLyi,CLzi,CLui,CLvi,CLwi,CLxe,CLye,CLze,CLue
> > > & ,CLve,CLwe,CP_Sy,CP_Sz,CP_Su,CP_Sv,CP_Sw,LABEL_p
> > > & ,ionsR,lecsR,ionsL,lecsL,mpass,kstrt,Nproc,ierr)
> > >
> > > Integer ions,lecs,mh,mpass
> > > REAL xe,ye,ze,ue,ve,we
> > > REAL xi,yi,zi,ui,vi,wi
> > > DIMENSION xe(mh),ye(mh),ze(mh),ue(mh),ve(mh),we(mh)
> > > DIMENSION xi(mh),yi(mh),zi(mh),ui(mh),vi(mh),wi(mh)
> > >
> > > Integer MAX_p
> > > C Real MAX_p
> > >
> > >
> > >
> > > integer::ionsR,lecsR,ionsL,lecsL,buf_size
> > > integer::LABEL_p,CP_send,CP_recv
> > > REAL,DIMENSION(mpass)::CRxi,CRyi,CRzi,CRui,CRvi,CRwi
> > > REAL,DIMENSION(mpass)::CRxe,CRye,CRze,CRue,CRve,CRwe
> > > REAL,DIMENSION(mpass)::CLxi,CLyi,CLzi,CLui,CLvi,CLwi
> > > REAL,DIMENSION(mpass)::CLxe,CLye,CLze,CLue,CLve,CLwe
> > > REAL,DIMENSION(buf_size)::CP_Sx,CP_Sy,CP_Sz
> > > REAL,DIMENSION(buf_size)::CP_Su,CP_Sv,CP_Sw
> > > REAL,DIMENSION(buf_size)::CP_Rx,CP_Ry,CP_Rz
> > > REAL,DIMENSION(buf_size)::CP_Ru,CP_Rv,CP_Rw
> > >
> > > REAL,DIMENSION(6,buf_size)::CP_S6,CP_R6
> > >
> > > integer kstrt, Nproc
> > > integer ierr
> > >
> > > c common block for parallel processing
> > > integer nprocc, lgrp, lstat, mreal, mint, mcplx, mdouble, lworld
> > > c lstat = length of status array
> > > parameter(lstat=10)
> > > c lgrp = current communicator
> > > c mreal = default datatype for reals
> > > common /PPARMS/ nprocc, lgrp, mreal, mint, mcplx, mdouble, lworld
> > > c local data
> > > integer j, ks, moff, kl, kr
> > > integer istatus, msid,msid1,msid2,msid3,msid4
> > > integer msid5,msid6,msid7
> > > dimension istatus(lstat)
> > >
> > > integer nypmx
> > > C integer status1(10)
> > > integer, DIMENSION(1):: mypm, iwork1
> > > C dimension mypm(1),iwork1(1)
> > >
> > > mypm(1)=ionsR
> > >
> > > call PPIMAX(mypm,iwork1,1)
> > > nypmx=mypm(1)
> > >
> > > Max_p=nypmx/buf_size+1
> > > C Max_p=MAXVAL(ionsR)/buf_size+1 !change
> > > LABEL_p=0
> > > do 200 n_p=1,Max_p
> > > CP_send=0
> > > do i=1,buf_size
> > > n=LABEL_p+i
> > > CP_Sx(i)=CRXi(n)
> > > CP_Sy(i)=CRyi(n)
> > > CP_Sz(i)=CRzi(n)
> > >
> > > CP_Su(i)=CRui(n)
> > > CP_Sv(i)=CRvi(n)
> > > CP_Sw(i)=CRwi(n)
> > > end do
> > > CP_send=buf_size
> > > if(n.gt.ionsR)then
> > > CP_send=ionsR - LABEL_p
> > > end if
> > > LABEL_p=n
> > >
> > > do i=1,buf_size
> > > CP_S6(1,i)=CP_Sx(i)
> > > CP_S6(2,i)=CP_Sy(i)
> > > CP_S6(3,i)=CP_Sz(i)
> > >
> > > CP_S6(4,i)=CP_Su(i)
> > > CP_S6(5,i)=CP_Sv(i)
> > > CP_S6(6,i)=CP_Sw(i)
> > > end do
> > >
> > > C CP_Rx=CSHIFT(CP_Sx,-1,2) !change
> > > C CP_Rx=CSHIFT(CP_Sx,-1,2) !change
> > > c CP_Ry=CSHIFT(CP_Sy,-1,2) !change
> > > C CP_Rz=CSHIFT(CP_Sz,-1,2) !change
> > > c CP_Ru=CSHIFT(CP_Su,-1,2) !change
> > > C CP_Rv=CSHIFT(CP_Sv,-1,2) !change
> > > C CP_Rw=CSHIFT(CP_Sw,-1,2) !change
> > >
> > > C CP_recv=CSHIFT(CP_send,-1,1) !change
> > > C CP_recv=buf_size
> > >
> > > ks = kstrt - 1
> > > moff = FZ_strd*mz+2 !my*mz + 2
> > > c copy to guard cells
> > > kr = ks + 1
> > > if (kr.ge.Nproc) kr = kr - Nproc
> > > kl = ks - 1
> > > if (kl.lt.0) kl = kl + Nproc
> > >
> > >
> > >
> > >
> > > call MPI_IRECV(CP_recv,1,mint,kl,1,lgrp,msid
> > > & ,ierr)
> > > call MPI_SEND(CP_send,1,mint,kr,1,lgrp,ierr)
> > > call MPI_WAIT(msid,istatus,ierr)
> > > print*,kstrt,lecsR,kr,CP_recv,CP_send,11111113
> > > call MPI_IRECV(CP_R6,CP_recv,mreal,kl,2,lgrp,msid1
> > > & ,ierr)
> > > call MPI_SEND(CP_S6,CP_recv,mreal,kr,2,lgrp,ierr)
> > > call MPI_WAIT(msid1,istatus,ierr)
> > >
> > > if (CP_recv.gt.0) then
> > > do n=1,CP_recv
> > > ions=ions+1
> > > xi(ions)=CP_R6(1,n)
> > > yi(ions)=CP_R6(2,n)
> > > zi(ions)=CP_R6(3,n)
> > >
> > > ui(ions)=CP_R6(4,n)
> > > vi(ions)=CP_R6(5,n)
> > > wi(ions)=CP_R6(6,n)
> > > end do
> > > End if
> > > 110 continue
> > > C.......................
> > > 200 continue
> > >
> > > C------------------------------
> > >
> > > mypm(1)=lecsR
> > > call PPIMAX(mypm,iwork1,1)
> > > nypmx=mypm(1)
> > >
> > >
> > > Max_p=nypmx/buf_size+1
> > > C Max_p=MAXVAL(lecsR)/buf_size+1 !change
> > >
> > > LABEL_p=0
> > > do 400 n_p=1,Max_p
> > > CP_send=0
> > > do i=1,buf_size
> > > n=LABEL_p+i
> > > CP_Sx(i)=CRXe(n)
> > > CP_Sy(i)=CRye(n)
> > > CP_Sz(i)=CRze(n)
> > >
> > > CP_Su(i)=CRue(n)
> > > CP_Sv(i)=CRve(n)
> > > CP_Sw(i)=CRwe(n)
> > > end do
> > > CP_send=buf_size
> > > if(n.gt.lecsR)then
> > > CP_send=lecsR - LABEL_p
> > > end if
> > >
> > > LABEL_p=n
> > >
> > > C CP_Rx=CSHIFT(CP_Sx,-1,2) !change
> > > C CP_Rx=CSHIFT(CP_Sx,-1,2) !change
> > > C CP_Ry=CSHIFT(CP_Sy,-1,2) !change
> > > C CP_Rz=CSHIFT(CP_Sz,-1,2) !change
> > > C CP_Ru=CSHIFT(CP_Su,-1,2) !change
> > > C CP_Rv=CSHIFT(CP_Sv,-1,2) !change
> > > C CP_Rw=CSHIFT(CP_Sw,-1,2) !change
> > >
> > > C CP_recv=CSHIFT(CP_send,-1,1) !change
> > >
> > > do i=1,buf_size
> > > CP_S6(1,i)=CP_Sx(i)
> > > CP_S6(2,i)=CP_Sy(i)
> > > CP_S6(3,i)=CP_Sz(i)
> > >
> > > CP_S6(4,i)=CP_Su(i)
> > > CP_S6(5,i)=CP_Sv(i)
> > > CP_S6(6,i)=CP_Sw(i)
> > > end do
> > >
> > >
> > > call MPI_IRECV(CP_recv,1,mint,kl,3,lgrp,msid2
> > > & ,ierr)
> > > call MPI_SEND(CP_send,1,mint,kr,3,lgrp,ierr)
> > > call MPI_WAIT(msid2,istatus,ierr)
> > >
> > > call MPI_IRECV(CP_R6,CP_recv,mreal,kl,4,lgrp,msid3
> > > & ,ierr)
> > > call MPI_SEND(CP_S6,CP_recv,mreal,kr,4,lgrp,ierr)
> > > call MPI_WAIT(msid3,istatus,ierr)
> > >
> > > if (CP_recv.gt.0) then
> > > do n=1,CP_recv
> > > lecs=lecs+1
> > > xe(lecs)=CP_R6(1,n)
> > > ye(lecs)=CP_R6(2,n)
> > > ze(lecs)=CP_R6(3,n)
> > >
> > > ue(lecs)=CP_R6(4,n)
> > > ve(lecs)=CP_R6(5,n)
> > > we(lecs)=CP_R6(6,n)
> > > end do
> > > End if
> > > 310 continue
> > > C.......................
> > > 400 continue
> > > C------------------
> > > mypm(1)=ionsL
> > > call PPIMAX(mypm,iwork1,1)
> > > nypmx=mypm(1)
> > > Max_p=nypmx/buf_size+1
> > > C Max_p=MAXVAL(ionsL)/buf_size+1 !change
> > > LABEL_p=0
> > > do 600 n_p=1,Max_p
> > > CP_send=0
> > > do i=1,buf_size
> > > n=LABEL_p+i
> > > CP_Sx(i)=CLXi(n)
> > > CP_Sy(i)=CLyi(n)
> > > CP_Sz(i)=CLzi(n)
> > >
> > > CP_Su(i)=CLui(n)
> > > CP_Sv(i)=CLvi(n)
> > > CP_Sw(i)=CLwi(n)
> > > end do
> > > CP_send=buf_size
> > > if(n.gt.ionsL)then
> > > CP_send=ionsL - LABEL_p
> > > end if
> > > LABEL_p=n
> > >
> > > do i=1,buf_size
> > > CP_S6(1,i)=CP_Sx(i)
> > > CP_S6(2,i)=CP_Sy(i)
> > > CP_S6(3,i)=CP_Sz(i)
> > >
> > > CP_S6(4,i)=CP_Su(i)
> > > CP_S6(5,i)=CP_Sv(i)
> > > CP_S6(6,i)=CP_Sw(i)
> > > end do
> > >
> > >
> > > C CP_Rx=CSHIFT(CP_Sx,+1,2) !change
> > > C CP_Rx=CSHIFT(CP_Sx,+1,2) !change
> > > C CP_Ry=CSHIFT(CP_Sy,+1,2) !change
> > > C CP_Rz=CSHIFT(CP_Sz,+1,2) !change
> > > C CP_Ru=CSHIFT(CP_Su,+1,2) !change
> > > C CP_Rv=CSHIFT(CP_Sv,+1,2) !change
> > > C CP_Rw=CSHIFT(CP_Sw,+1,2) !change
> > >
> > > C CP_recv=CSHIFT(CP_send,+1,1) !change
> > >
> > > call MPI_IRECV(CP_recv,1,mint,kr,5,lgrp,msid4
> > > & ,ierr)
> > > call MPI_SEND(CP_send,1,mint,kl,5,lgrp,ierr)
> > > call MPI_WAIT(msid4,istatus,ierr)
> > >
> > > call MPI_IRECV(CP_R6,CP_recv,mreal,kr,6,lgrp,msid5
> > > & ,ierr)
> > > call MPI_SEND(CP_S6,CP_recv,mreal,kl,6,lgrp,ierr)
> > > call MPI_WAIT(msid5,istatus,ierr)
> > >
> > > if (CP_recv.gt.0) then
> > > do n=1,CP_recv
> > > ions=ions+1
> > > xi(ions)=CP_R6(1,n)
> > > yi(ions)=CP_R6(2,n)
> > > zi(ions)=CP_R6(3,n)
> > >
> > > ui(ions)=CP_R6(4,n)
> > > vi(ions)=CP_R6(5,n)
> > > wi(ions)=CP_R6(6,n)
> > > end do
> > > End if
> > > 510 continue
> > > C.......................
> > > 600 continue
> > > C--------------------
> > > mypm(1)=lecsL
> > > call PPIMAX(mypm,iwork1,1)
> > > nypmx=mypm(1)
> > > Max_p=nypmx/buf_size+1
> > > C Max_p=MAXVAL(lecsL)/buf_size+1 !change
> > > LABEL_p=0
> > > do 800 n_p=1,Max_p
> > > CP_send=0
> > > do i=1,buf_size
> > > n=LABEL_p+i
> > > CP_Sx(i)=CLXe(n)
> > > CP_Sy(i)=CLye(n)
> > > CP_Sz(i)=CLze(n)
> > >
> > > CP_Su(i)=CLue(n)
> > > CP_Sv(i)=CLve(n)
> > > CP_Sw(i)=CLwe(n)
> > > end do
> > > CP_send=buf_size
> > > if(n.gt.lecsL)then
> > > CP_send=lecsL - LABEL_p
> > > end if
> > > LABEL_p=n
> > >
> > > C CP_Rx=CSHIFT(CP_Sx,+1,2) !change
> > > C CP_Rx=CSHIFT(CP_Sx,+1,2) !change
> > > C CP_Ry=CSHIFT(CP_Sy,+1,2) !change
> > > C CP_Rz=CSHIFT(CP_Sz,+1,2) !change
> > > C CP_Ru=CSHIFT(CP_Su,+1,2) !change
> > > C CP_Rv=CSHIFT(CP_Sv,+1,2) !change
> > > C CP_Rw=CSHIFT(CP_Sw,+1,2) !change
> > >
> > > C CP_recv=CSHIFT(CP_send,-1,1) !change
> > >
> > > do i=1,buf_size
> > > CP_S6(1,i)=CP_Sx(i)
> > > CP_S6(2,i)=CP_Sy(i)
> > > CP_S6(3,i)=CP_Sz(i)
> > >
> > > CP_S6(4,i)=CP_Su(i)
> > > CP_S6(5,i)=CP_Sv(i)
> > > CP_S6(6,i)=CP_Sw(i)
> > > end do
> > >
> > > call MPI_IRECV(CP_recv,1,mint,kr,7,lgrp,msid6
> > > & ,ierr)
> > > call MPI_SEND(CP_send,1,mint,kl,7,lgrp,ierr)
> > > call MPI_WAIT(msid6,istatus,ierr)
> > >
> > > call MPI_IRECV(CP_R6,CP_recv,mreal,kr,8,lgrp,msid7
> > > & ,ierr)
> > > call MPI_SEND(CP_S6,CP_recv,mreal,kl,8,lgrp,ierr)
> > > call MPI_WAIT(msid7,istatus,ierr)
> > >
> > > C print*,mreal,mint,lgrp,CP_send,msid,1111114
> > > C print*,kstrt,kl,kr,CP_recv,CP_send,11111113
> > >
> > > if (CP_recv.gt.0) then
> > > do n=1,CP_recv
> > > lecs=lecs+1
> > > xe(lecs)=CP_R6(1,n)
> > > ye(lecs)=CP_R6(2,n)
> > > ze(lecs)=CP_R6(3,n)
> > >
> > > ue(lecs)=CP_R6(4,n)
> > > ve(lecs)=CP_R6(5,n)
> > > we(lecs)=CP_R6(6,n)
> > > end do
> > > End if
> > > 710 continue
> > > C.......................
> > > 800 continue
> > > End Subroutine
> > > c---------------------SUBROUTINE Particle_Passing End Here --------------------------
> > > And I'm receiving the following error:
> > >
> > > [hx001:18110] *** An error occurred in MPI_Wait
> > > [hx001:18110] *** on communicator MPI_COMM_WORLD
> > > [hx001:18110] *** MPI_ERR_TRUNCATE: message truncated
> > > [hx001:18110] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > > [hx001:18111] *** An error occurred in MPI_Wait
> > > [hx001:18111] *** on communicator MPI_COMM_WORLD
> > > [hx001:18111] *** MPI_ERR_TRUNCATE: message truncated
> > > [hx001:18111] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > > mpirun noticed that job rank 0 with PID 18109 on node hx001 exited
> > > on signal 15 (Terminated).
> > > 1 additional process aborted (not shown)
> > >
> > > Is it also possible that the cause is the following routine:
> > > call PPIMAX(mypm,iwork1,1)
> > >
> > > subroutine PPIMAX(if,ig,nxp)
> > > c this subroutine finds parallel maximum for each element of a
> > > vector
> > > c that is, if(j,k) = maximum as a function of k of if(j,k)
> > > c at the end, all processors contain the same maximum.
> > > c if = input and output integer data
> > > c ig = scratch integer array
> > > c nxp = number of data values in vector
> > > implicit none
> > > integer if, ig
> > > integer nxp
> > > dimension if(nxp), ig(nxp)
> > > c common block for parallel processing
> > > integer nproc, lgrp, lstat, mreal, mint, mcplx, mdouble, lworld
> > > integer msum, mmax
> > > parameter(lstat=10)
> > > c lgrp = current communicator
> > > c mint = default datatype for integers
> > > common /PPARMS/ nproc, lgrp, mreal, mint, mcplx, mdouble, lworld
> > > c mmax = MPI_MAX
> > > common /PPARMSX/ msum, mmax
> > > c local data
> > > integer j, ierr
> > > c find maximum
> > > call MPI_ALLREDUCE(if,ig,nxp,mint,mmax,lgrp,ierr)
> > > c copy output from scratch array
> > > do 10 j = 1, nxp
> > > if(j) = ig(j)
> > > 10 continue
> > > return
> > > end
> > > c-----------------------------------------------------------------------
> > >
> > > BTW, should the msids (msid, msid1, msid2, ..., msid7) be given a
> > > default value before they are used?
> > >
> > > Best Regards
> > > Sincerely
> > > Amin
> > >
> > > _______________________________________________
> > > mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> > > To manage subscription options or unsubscribe:
> > > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> > _______________________________________________
> > mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> > To manage subscription options or unsubscribe:
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >

