[mpich-discuss] Problem sometimes when running on winxp on >=2 processes and MPE_IBCAST

Jayesh Krishna jayesh at mcs.anl.gov
Thu May 8 12:27:59 CDT 2008


 Hi,
  I did not find the exact example source code in the RS/6000 MPI
programming guide (the closest example is a template that differs slightly
from the one you sent us).
  After removing the MPI_Barrier() from the do loop (while keeping the
MPI_Bcast() in the loop) I do not get any errors when running your program
on 1 to 4 procs; a minimal sketch of the pattern I tested is shown below.
Please send us your code and an explanation of its logic for further
analysis.
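
  This is only a sketch, not a drop-in replacement: it assumes the
jjsta/jjlen arrays are computed with para_range exactly as in the program
you posted.

    do k = 0, nprocs - 1
       ! MPI_Bcast is itself collective: every process enters the call
       ! for each root k, so no MPI_Barrier is needed inside the loop.
       call MPI_BCAST(u_tmp(is,jjsta(k)), jjlen(k), MPI_INTEGER, k, MPI_COMM_WORLD, ierr)
    end do
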
  Note also that the block distribution code you use (para_range, which
computes each rank's index range for a block distribution) is not valid
when there is less work than the number of procs (i.e., no distribution is
needed). Your job distributes the j indices 3 to 6, i.e., only 4 columns
of work, so the range finder fails if you run your code with more than 4
procs: para_range then returns an empty range whose start index lies past
the end of u_tmp, so jjsta(k) becomes an out-of-bounds index. The
standalone sketch below demonstrates this.
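
  To see the failure mode in isolation, here is a standalone sketch that
calls your para_range as posted (nprocs = 5 is just an assumed value for
demonstration):

    program range_check
    implicit none
    integer :: k, ista, iend
    integer, parameter :: nprocs = 5   ! more procs than the 4 columns of work
    do k = 0, nprocs - 1
       call para_range(3, 6, nprocs, k, ista, iend)
       print '(a,i2,a,i2,a,i2)', 'rank ', k, ': ista =', ista, '  iend =', iend
       ! For k = 4: ista = 7 > iend = 6, an empty range whose start index
       ! is past the upper bound (6) of u_tmp in your program.
       if (ista > iend) print *, '   -> empty range; jjsta(k) would be invalid'
    end do
    contains
    subroutine para_range(n1, n2, nprocs, irank, ista, iend)
    integer n1, n2, nprocs, irank, ista, iend, iwork1, iwork2
    iwork1 = (n2 - n1 + 1) / nprocs
    iwork2 = mod(n2 - n1 + 1, nprocs)
    ista = irank * iwork1 + n1 + min(irank, iwork2)
    iend = ista + iwork1 - 1
    if (iwork2 > irank) iend = iend + 1
    end subroutine para_range
    end program range_check

  A simple guard is to skip any rank whose range is empty (jend < jsta),
or to require nprocs <= je - js + 1 before distributing.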

Regards,
Jayesh
-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Ben Tay
Sent: Wednesday, May 07, 2008 7:45 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Problem sometimes when running on winxp on
>=2 processes and MPE_IBCAST

Hi,

I've removed the MPI_Barrier. Bounds checking is also enabled.
However, the same error still happens: randomly when the number of
processes is 2, and always when it is 4. I did not encounter this error
when I ran it on my school's servers.

I also compiled it with MPICH (rather than MPICH2). Interestingly, there
is no problem at all, so I guess this problem is specific to MPICH2.

Thank you very much.

Jayesh Krishna wrote:
>
>  Hi,
>   Please find my observations below,
>
> 1) As Anthony pointed out, you don't have to call MPI_Barrier() in a 
> loop over all processes (see the usage of MPI collectives).
> 2) When running the program with more than 4 procs, some array 
> accesses are out of bounds (try re-compiling your program with runtime 
> checking for "Array and String bounds" --> if you are using VS, see 
> "Configuration Properties" --> Fortran --> Runtime --> * for 
> setting the runtime checking)
>
> Regards,
> Jayesh
>
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Anthony Chan
> Sent: Wednesday, May 07, 2008 11:13 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Problem sometimes when running on winxp 
> on >=2 processes and MPE_IBCAST
>
>
> This may not be related to the error that you saw, but you shouldn't 
> call MPI_Barrier and MPI_Bcast inside a do loop over processes.
>
> A.Chan
> ----- "Ben Tay" <zonexo at gmail.com> wrote:
>
> > Hi Rajeev,
> >
> > I've attached the code. Thank you very much.
> >
> > Regards.
> >
> > Rajeev Thakur wrote:
> > > Can you send us the code?
> > >
> > > MPE_IBCAST is not a part of the MPI standard. There is no
> > > equivalent for it in MPICH2. You could spawn a thread that calls
> > > MPI_Bcast, though (after following all the caveats of MPI and
> > > threads as defined in the standard).
> > >
> > > Rajeev
> > >
> > >  
> > >> -----Original Message-----
> > >> From: owner-mpich-discuss at mcs.anl.gov 
> > >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Ben Tay
> > >> Sent: Wednesday, May 07, 2008 10:25 AM
> > >> To: mpich-discuss at mcs.anl.gov
> > >> Subject: [mpich-discuss] Problem sometimes when running on winxp 
> > >> on
> > >> >=2 processes and MPE_IBCAST
> > >>
> > >> Hi,
> > >>
> > >> I tried to run an MPI code copied from an example in the RS/6000
> > >> book. It is supposed to broadcast and synchronize all values.
> > >> When I ran it on my school's Linux servers, there was no problem.
> > >> However, if I run it on my own WinXP machine on >=2 processes,
> > >> sometimes it works; other times I get the error:
> > >>
> > >> [01:3216].....ERROR:result command received but the wait_list is
> > >> empty.
> > >> [01:3216]...ERROR:unable to handle the command: "cmd=result src=1
> > >> dest=1 tag=7 cmd_tag=3 cmd_orig=dbget ctx_key=1 value="port=1518
> > >> description=gotchama-16e5ed ifname=192.168.1.105 " result=DBS_SUCCESS "
> > >> [01:3216].ERROR:error closing the unknown context socket: generic
> > >> socket failure, error stack:
> > >> MPIDU_Sock_wait(2603): The I/O operation has been aborted because
> > >> of either a thread exit or an application request. (errno 995)
> > >> [01:3216]..ERROR:sock_op_close returned while unknown context is in
> > >> state: SMPD_IDLE
> > >>
> > >> Or
> > >>
> > >> [01:3308].....ERROR:result command received but the wait_list is
> > >> empty.
> > >> [01:3308]...ERROR:unable to handle the command: "cmd=result src=1
> > >> dest=1 tag=15 cmd_tag=5 cmd_orig=barrier ctx_key=0 result=DBS_SUCCESS "
> > >> [01:3308]..ERROR:sock_op_close returned while unknown context is in
> > >> state: SMPD_IDLE
> > >>
> > >> There is no problem if I run on 1 process. If it's >=4 processes,
> > >> the error happens all the time. Moreover, it's a rather simple
> > >> code, so there shouldn't be anything wrong with it. Why is this so?
> > >>
> > >> Btw, the RS/6000 book also mentions a routine called MPE_IBCAST,
> > >> which is a non-blocking version of MPI_BCAST. Is there a similar
> > >> routine in MPICH2?
> > >>
> > >> Thank you very much
> > >>
> > >> Regards.
> >
> > program mpi_test2
> > !     test to show updating for i,j double loop (partial continuous
> > !     data) for specific req data only, ie update u(2:6,2:6) values
> > !     instead of all u values, also for struct data
> > !     FVM use
> > implicit none
> > include "mpif.h"
> > integer, parameter :: size_x=8, size_y=8
> > integer :: i,j,k,ierr,rank,nprocs,u(size_x,size_y)
> > integer :: jsta,jend,jsta2,jend1,inext,iprev,isend1,irecv1,isend2
> > integer :: irecv2,is,ie,js,je
> > integer, allocatable :: jjsta(:), jjlen(:), jjreq(:), u_tmp(:,:)
> > integer istatus(MPI_STATUS_SIZE)
> >
> > call MPI_Init(ierr)
> > call MPI_Comm_rank(MPI_COMM_WORLD,rank,ierr)
> > call MPI_Comm_size(MPI_COMM_WORLD,nprocs,ierr)
> > allocate (jjsta(0:nprocs-1), jjlen(0:nprocs-1), jjreq(0:nprocs-1))
> > is=3; ie=6; js=3; je=6
> > allocate (u_tmp(is:ie,js:je))
> >
> > ! record each rank's block of columns and its element count
> > do k = 0, nprocs - 1
> >    call para_range(js, je, nprocs, k, jsta, jend)
> >    jjsta(k) = jsta
> >    jjlen(k) = (ie-is+1) * (jend - jsta + 1)
> > end do
> >
> > ! fill this rank's own block of u and copy it into u_tmp
> > call para_range(js, je, nprocs, rank, jsta, jend)
> > do j = jsta, jend
> >    do i = is, ie
> >       u(i,j) = (j-1)*size_x + i
> >    end do
> > end do
> > do j = jsta, jend
> >    do i = is, ie
> >       u_tmp(i,j) = u(i,j)
> >    end do
> > end do
> >
> > do k = 0, nprocs - 1
> >    call MPI_Barrier(MPI_COMM_WORLD,ierr)
> >    if (k==rank) then
> >       print *, rank
> >       write (*,'(8i5)') u
> >    end if
> > end do
> >
> > ! broadcast each rank's block to all other ranks
> > do k = 0, nprocs - 1
> >    call MPI_BCAST(u_tmp(is,jjsta(k)), jjlen(k), MPI_Integer, k, MPI_COMM_WORLD, ierr)
> > end do
> >
> > deallocate (jjsta, jjlen, jjreq)
> > u(is:ie,js:je) = u_tmp(is:ie,js:je)
> >
> > do k = 0, nprocs - 1
> >    call MPI_Barrier(MPI_COMM_WORLD,ierr)
> >    if (k==rank) then
> >       print *, rank
> >       write (*,'(8i5)') u
> >    end if
> > end do
> >
> > call MPI_Finalize(ierr)
> >
> > contains
> >
> > subroutine para_range(n1, n2, nprocs, irank, ista, iend)
> > !     block distribution
> > integer n1     ! the lowest value of the iteration variable (IN)
> > integer n2     ! the highest value of the iteration variable (IN)
> > integer nprocs ! the number of processes (IN)
> > integer irank  ! the rank whose iteration range is wanted (IN)
> > integer ista   ! the lowest iteration that process irank executes (OUT)
> > integer iend   ! the highest iteration that process irank executes (OUT)
> > integer iwork1, iwork2
> >
> > iwork1 = (n2 - n1 + 1) / nprocs
> > iwork2 = mod(n2 - n1 + 1, nprocs)
> > ista = irank * iwork1 + n1 + min(irank, iwork2)
> > iend = ista + iwork1 - 1
> > if (iwork2 > irank) iend = iend + 1
> >
> > end subroutine para_range
> >
> > end program mpi_test2
>
>



