[mpich-discuss] Error in WAIT_MPI in a subroutine

Anthony Chan chan at mcs.anl.gov
Tue Sep 13 10:43:32 CDT 2011


> [hx001:18110] *** An error occurred in MPI_Wait
> [hx001:18110] *** on communicator MPI_COMM_WORLD
> [hx001:18110] *** MPI_ERR_TRUNCATE: message truncated
> [hx001:18110] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [hx001:18111] *** An error occurred in MPI_Wait
> [hx001:18111] *** on communicator MPI_COMM_WORLD
> [hx001:18111] *** MPI_ERR_TRUNCATE: message truncated
> [hx001:18111] *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpirun noticed that job rank 0 with PID 18109 on node hx001 exited on signal 15 (Terminated).
> 1 additional process aborted (not shown)

Are you using OpenMPI or MPICH2?
MPI_Request is set by MPI, so you don't need to set it before using it.

A.Chan

----- Original Message -----
> Hi
> 
> I`m running a program including th following subroutine:
> c---------------------SUBROUTINE Particle_Passing Begin Here
> --------------------------
> SUBROUTINE Particle_Passing(ions,lecs,xi,yi,zi,xe,ye,ze
> & ,ui,vi,wi,ue,ve,we,mh,Max_p,buf_size,CRye,CRze,CRue
> & ,CP_send,CP_recv,CRxi,CRyi,CRzi,CRui,CRvi,CRwi,CRxe
> & ,CRve,CRwe,CP_Rx,CP_Ry,CP_Rz,CP_Ru,CP_Rv,CP_Rw,CP_Sx
> & ,CLxi,CLyi,CLzi,CLui,CLvi,CLwi,CLxe,CLye,CLze,CLue
> & ,CLve,CLwe,CP_Sy,CP_Sz,CP_Su,CP_Sv,CP_Sw,LABEL_p
> & ,ionsR,lecsR,ionsL,lecsL,mpass,kstrt,Nproc,ierr)
> 
> Integer ions,lecs,mh,mpass
> REAL xe,ye,ze,ue,ve,we
> REAL xi,yi,zi,ui,vi,wi
> DIMENSION xe(mh),ye(mh),ze(mh),ue(mh),ve(mh),we(mh)
> DIMENSION xi(mh),yi(mh),zi(mh),ui(mh),vi(mh),wi(mh)
> 
> Integer MAX_p
> C Real MAX_p
> 
> 
> 
> integer::ionsR,lecsR,ionsL,lecsL,buf_size
> integer::LABEL_p,CP_send,CP_recv
> REAL,DIMENSION(mpass)::CRxi,CRyi,CRzi,CRui,CRvi,CRwi
> REAL,DIMENSION(mpass)::CRxe,CRye,CRze,CRue,CRve,CRwe
> REAL,DIMENSION(mpass)::CLxi,CLyi,CLzi,CLui,CLvi,CLwi
> REAL,DIMENSION(mpass)::CLxe,CLye,CLze,CLue,CLve,CLwe
> REAL,DIMENSION(buf_size)::CP_Sx,CP_Sy,CP_Sz
> REAL,DIMENSION(buf_size)::CP_Su,CP_Sv,CP_Sw
> REAL,DIMENSION(buf_size)::CP_Rx,CP_Ry,CP_Rz
> REAL,DIMENSION(buf_size)::CP_Ru,CP_Rv,CP_Rw
> 
> REAL,DIMENSION(6,buf_size)::CP_S6,CP_R6
> 
> integer kstrt, Nproc
> integer ierr
> 
> c common block for parallel processing
> integer nprocc, lgrp, lstat, mreal, mint, mcplx, mdouble, lworld
> c lstat = length of status array
> parameter(lstat=10)
> c lgrp = current communicator
> c mreal = default datatype for reals
> common /PPARMS/ nprocc, lgrp, mreal, mint, mcplx, mdouble, lworld
> c local data
> integer j, ks, moff, kl, kr
> integer istatus, msid,msid1,msid2,msid3,msid4
> integer msid5,msid6,msid7
> dimension istatus(lstat)
> 
> integer nypmx
> C integer status1(10)
> integer, DIMENSION(1):: mypm, iwork1
> C dimension mypm(1),iwork1(1)
> 
> mypm(1)=ionsR
> 
> call PPIMAX(mypm,iwork1,1)
> nypmx=mypm(1)
> 
> Max_p=nypmx/buf_size+1
> C Max_p=MAXVAL(ionsR)/buf_size+1 !change
> LABEL_p=0
> do 200 n_p=1,Max_p
> CP_send=0
> do i=1,buf_size
> n=LABEL_p+i
> CP_Sx(i)=CRXi(n)
> CP_Sy(i)=CRyi(n)
> CP_Sz(i)=CRzi(n)
> 
> CP_Su(i)=CRui(n)
> CP_Sv(i)=CRvi(n)
> CP_Sw(i)=CRwi(n)
> end do
> CP_send=buf_size
> if(n.gt.ionsR)then
> CP_send=ionsR - LABEL_p
> end if
> LABEL_p=n
> 
> do i=1,buf_size
> CP_S6(1,i)=CP_Sx(i)
> CP_S6(2,i)=CP_Sy(i)
> CP_S6(3,i)=CP_Sz(i)
> 
> CP_S6(4,i)=CP_Su(i)
> CP_S6(5,i)=CP_Sv(i)
> CP_S6(6,i)=CP_Sw(i)
> end do
> 
> C CP_Rx=CSHIFT(CP_Sx,-1,2) !change
> C CP_Rx=CSHIFT(CP_Sx,-1,2) !change
> c CP_Ry=CSHIFT(CP_Sy,-1,2) !change
> C CP_Rz=CSHIFT(CP_Sz,-1,2) !change
> c CP_Ru=CSHIFT(CP_Su,-1,2) !change
> C CP_Rv=CSHIFT(CP_Sv,-1,2) !change
> C CP_Rw=CSHIFT(CP_Sw,-1,2) !change
> 
> C CP_recv=CSHIFT(CP_send,-1,1) !change
> C CP_recv=buf_size
> 
> ks = kstrt - 1
> moff = FZ_strd*mz+2 !my*mz + 2
> c copy to guard cells
> kr = ks + 1
> if (kr.ge.Nproc) kr = kr - Nproc
> kl = ks - 1
> if (kl.lt.0) kl = kl + Nproc
> 
> 
> 
> 
> call MPI_IRECV(CP_recv,1,mint,kl,1,lgrp,msid
> & ,ierr)
> call MPI_SEND(CP_send,1,mint,kr,1,lgrp,ierr)
> call MPI_WAIT(msid,istatus,ierr)
> print*,kstrt,lecsR,kr,CP_recv,CP_send,11111113
> call MPI_IRECV(CP_R6,CP_recv,mreal,kl,2,lgrp,msid1
> & ,ierr)
> call MPI_SEND(CP_S6,CP_recv,mreal,kr,2,lgrp,ierr)
> call MPI_WAIT(msid1,istatus,ierr)
> 
> if (CP_recv.gt.0) then
> do n=1,CP_recv
> ions=ions+1
> xi(ions)=CP_R6(1,n)
> yi(ions)=CP_R6(2,n)
> zi(ions)=CP_R6(3,n)
> 
> ui(ions)=CP_R6(4,n)
> vi(ions)=CP_R6(5,n)
> wi(ions)=CP_R6(6,n)
> end do
> End if
> 110 continue
> C.......................
> 200 continue
> 
> C------------------------------
> 
> mypm(1)=lecsR
> call PPIMAX(mypm,iwork1,1)
> nypmx=mypm(1)
> 
> 
> Max_p=nypmx/buf_size+1
> C Max_p=MAXVAL(lecsR)/buf_size+1 !change
> 
> LABEL_p=0
> do 400 n_p=1,Max_p
> CP_send=0
> do i=1,buf_size
> n=LABEL_p+i
> CP_Sx(i)=CRXe(n)
> CP_Sy(i)=CRye(n)
> CP_Sz(i)=CRze(n)
> 
> CP_Su(i)=CRue(n)
> CP_Sv(i)=CRve(n)
> CP_Sw(i)=CRwe(n)
> end do
> CP_send=buf_size
> if(n.gt.lecsR)then
> CP_send=lecsR - LABEL_p
> end if
> 
> LABEL_p=n
> 
> C CP_Rx=CSHIFT(CP_Sx,-1,2) !change
> C CP_Rx=CSHIFT(CP_Sx,-1,2) !change
> C CP_Ry=CSHIFT(CP_Sy,-1,2) !change
> C CP_Rz=CSHIFT(CP_Sz,-1,2) !change
> C CP_Ru=CSHIFT(CP_Su,-1,2) !change
> C CP_Rv=CSHIFT(CP_Sv,-1,2) !change
> C CP_Rw=CSHIFT(CP_Sw,-1,2) !change
> 
> C CP_recv=CSHIFT(CP_send,-1,1) !change
> 
> do i=1,buf_size
> CP_S6(1,i)=CP_Sx(i)
> CP_S6(2,i)=CP_Sy(i)
> CP_S6(3,i)=CP_Sz(i)
> 
> CP_S6(4,i)=CP_Su(i)
> CP_S6(5,i)=CP_Sv(i)
> CP_S6(6,i)=CP_Sw(i)
> end do
> 
> 
> call MPI_IRECV(CP_recv,1,mint,kl,3,lgrp,msid2
> & ,ierr)
> call MPI_SEND(CP_send,1,mint,kr,3,lgrp,ierr)
> call MPI_WAIT(msid2,istatus,ierr)
> 
> call MPI_IRECV(CP_R6,CP_recv,mreal,kl,4,lgrp,msid3
> & ,ierr)
> call MPI_SEND(CP_S6,CP_recv,mreal,kr,4,lgrp,ierr)
> call MPI_WAIT(msid3,istatus,ierr)
> 
> if (CP_recv.gt.0) then
> do n=1,CP_recv
> lecs=lecs+1
> xe(lecs)=CP_R6(1,n)
> ye(lecs)=CP_R6(2,n)
> ze(lecs)=CP_R6(3,n)
> 
> ue(lecs)=CP_R6(4,n)
> ve(lecs)=CP_R6(5,n)
> we(lecs)=CP_R6(6,n)
> end do
> End if
> 310 continue
> C.......................
> 400 continue
> C------------------
> mypm(1)=ionsL
> call PPIMAX(mypm,iwork1,1)
> nypmx=mypm(1)
> Max_p=nypmx/buf_size+1
> C Max_p=MAXVAL(ionsL)/buf_size+1 !change
> LABEL_p=0
> do 600 n_p=1,Max_p
> CP_send=0
> do i=1,buf_size
> n=LABEL_p+i
> CP_Sx(i)=CLXi(n)
> CP_Sy(i)=CLyi(n)
> CP_Sz(i)=CLzi(n)
> 
> CP_Su(i)=CLui(n)
> CP_Sv(i)=CLvi(n)
> CP_Sw(i)=CLwi(n)
> end do
> CP_send=buf_size
> if(n.gt.ionsL)then
> CP_send=ionsL - LABEL_p
> end if
> LABEL_p=n
> 
> do i=1,buf_size
> CP_S6(1,i)=CP_Sx(i)
> CP_S6(2,i)=CP_Sy(i)
> CP_S6(3,i)=CP_Sz(i)
> 
> CP_S6(4,i)=CP_Su(i)
> CP_S6(5,i)=CP_Sv(i)
> CP_S6(6,i)=CP_Sw(i)
> end do
> 
> 
> C CP_Rx=CSHIFT(CP_Sx,+1,2) !change
> C CP_Rx=CSHIFT(CP_Sx,+1,2) !change
> C CP_Ry=CSHIFT(CP_Sy,+1,2) !change
> C CP_Rz=CSHIFT(CP_Sz,+1,2) !change
> C CP_Ru=CSHIFT(CP_Su,+1,2) !change
> C CP_Rv=CSHIFT(CP_Sv,+1,2) !change
> C CP_Rw=CSHIFT(CP_Sw,+1,2) !change
> 
> C CP_recv=CSHIFT(CP_send,+1,1) !change
> 
> call MPI_IRECV(CP_recv,1,mint,kr,5,lgrp,msid4
> & ,ierr)
> call MPI_SEND(CP_send,1,mint,kl,5,lgrp,ierr)
> call MPI_WAIT(msid4,istatus,ierr)
> 
> call MPI_IRECV(CP_R6,CP_recv,mreal,kr,6,lgrp,msid5
> & ,ierr)
> call MPI_SEND(CP_S6,CP_recv,mreal,kl,6,lgrp,ierr)
> call MPI_WAIT(msid5,istatus,ierr)
> 
> if (CP_recv.gt.0) then
> do n=1,CP_recv
> ions=ions+1
> xi(ions)=CP_R6(1,n)
> yi(ions)=CP_R6(2,n)
> zi(ions)=CP_R6(3,n)
> 
> ui(ions)=CP_R6(4,n)
> vi(ions)=CP_R6(5,n)
> wi(ions)=CP_R6(6,n)
> end do
> End if
> 510 continue
> C.......................
> 600 continue
> C--------------------
> mypm(1)=lecsL
> call PPIMAX(mypm,iwork1,1)
> nypmx=mypm(1)
> Max_p=nypmx/buf_size+1
> C Max_p=MAXVAL(lecsL)/buf_size+1 !change
> LABEL_p=0
> do 800 n_p=1,Max_p
> CP_send=0
> do i=1,buf_size
> n=LABEL_p+i
> CP_Sx(i)=CLXe(n)
> CP_Sy(i)=CLye(n)
> CP_Sz(i)=CLze(n)
> 
> CP_Su(i)=CLue(n)
> CP_Sv(i)=CLve(n)
> CP_Sw(i)=CLwe(n)
> end do
> CP_send=buf_size
> if(n.gt.lecsL)then
> CP_send=lecsL - LABEL_p
> end if
> LABEL_p=n
> 
> C CP_Rx=CSHIFT(CP_Sx,+1,2) !change
> C CP_Rx=CSHIFT(CP_Sx,+1,2) !change
> C CP_Ry=CSHIFT(CP_Sy,+1,2) !change
> C CP_Rz=CSHIFT(CP_Sz,+1,2) !change
> C CP_Ru=CSHIFT(CP_Su,+1,2) !change
> C CP_Rv=CSHIFT(CP_Sv,+1,2) !change
> C CP_Rw=CSHIFT(CP_Sw,+1,2) !change
> 
> C CP_recv=CSHIFT(CP_send,-1,1) !change
> 
> do i=1,buf_size
> CP_S6(1,i)=CP_Sx(i)
> CP_S6(2,i)=CP_Sy(i)
> CP_S6(3,i)=CP_Sz(i)
> 
> CP_S6(4,i)=CP_Su(i)
> CP_S6(5,i)=CP_Sv(i)
> CP_S6(6,i)=CP_Sw(i)
> end do
> 
> call MPI_IRECV(CP_recv,1,mint,kr,7,lgrp,msid6
> & ,ierr)
> call MPI_SEND(CP_send,1,mint,kl,7,lgrp,ierr)
> call MPI_WAIT(msid6,istatus,ierr)
> 
> call MPI_IRECV(CP_R6,CP_recv,mreal,kr,8,lgrp,msid7
> & ,ierr)
> call MPI_SEND(CP_S6,CP_recv,mreal,kl,8,lgrp,ierr)
> call MPI_WAIT(msid7,istatus,ierr)
> 
> C print*,mreal,mint,lgrp,CP_send,msid,1111114
> C print*,kstrt,kl,kr,CP_recv,CP_send,11111113
> 
> if (CP_recv.gt.0) then
> do n=1,CP_recv
> lecs=lecs+1
> xe(lecs)=CP_R6(1,n)
> ye(lecs)=CP_R6(2,n)
> ze(lecs)=CP_R6(3,n)
> 
> ue(lecs)=CP_R6(4,n)
> ve(lecs)=CP_R6(5,n)
> we(lecs)=CP_R6(6,n)
> end do
> End if
> 710 continue
> C.......................
> 800 continue
> End Subroutine
> c---------------------SUBROUTINE Particle_Passing End Here
> --------------------------
> And I`m receiving the following error:
> 
> [hx001:18110] *** An error occurred in MPI_Wait
> [hx001:18110] *** on communicator MPI_COMM_WORLD
> [hx001:18110] *** MPI_ERR_TRUNCATE: message truncated
> [hx001:18110] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [hx001:18111] *** An error occurred in MPI_Wait
> [hx001:18111] *** on communicator MPI_COMM_WORLD
> [hx001:18111] *** MPI_ERR_TRUNCATE: message truncated
> [hx001:18111] *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpirun noticed that job rank 0 with PID 18109 on node hx001 exited on
> signal
> 15 (Terminated).
> 1 additional process aborted (not shown)
> 
> is it possible also to be cause of the following routine:
> call PPIMAX(mypm,iwork1,1)
> 
> subroutine PPIMAX(if,ig,nxp)
> c this subroutine finds parallel maximum for each element of a vector
> c that is, if(j,k) = maximum as a function of k of if(j,k)
> c at the end, all processors contain the same maximum.
> c if = input and output integer data
> c ig = scratch integer array
> c nxp = number of data values in vector
> implicit none
> integer if, ig
> integer nxp
> dimension if(nxp), ig(nxp)
> c common block for parallel processing
> integer nproc, lgrp, lstat, mreal, mint, mcplx, mdouble, lworld
> integer msum, mmax
> parameter(lstat=10)
> c lgrp = current communicator
> c mint = default datatype for integers
> common /PPARMS/ nproc, lgrp, mreal, mint, mcplx, mdouble, lworld
> c mmax = MPI_MAX
> common /PPARMSX/ msum, mmax
> c local data
> integer j, ierr
> c find maximum
> call MPI_ALLREDUCE(if,ig,nxp,mint,mmax,lgrp,ierr)
> c copy output from scratch array
> do 10 j = 1, nxp
> if(j) = ig(j)
> 10 continue
> return
> end
> c-----------------------------------------------------------------------
> 
> BTW, should I have a pre-default of msids (msid,msid1,msid2,...msid7)?
> 
> Best Regards
> Sincerely
> Amin
> 
> _______________________________________________
> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


More information about the mpich-discuss mailing list