[mpich-discuss] Problem sometimes when running on winxp on >=2 processes and MPE_IBCAST
Jayesh Krishna
jayesh at mcs.anl.gov
Mon May 12 09:11:19 CDT 2008
Hi,
Can you try running the example program, cpi.exe (MPICH2\examples),
provided with MPICH2 with 2/4/8 procs?
# Please let us know the details of how you are running your MPI job
(mpiexec command, job script, etc.).
# Please provide the details of the machine you are running your MPI job
on (processor, number of cores, etc.).
# Do you get the same error when running non-MPI programs (try "mpiexec -n
8 hostname" on your system)?
I would also recommend uninstalling and reinstalling MPICH2 on your system
and then repeating the tests above.
Regards,
Jayesh
-----Original Message-----
From: Ben Tay [mailto:zonexo at gmail.com]
Sent: Sunday, May 11, 2008 9:16 AM
To: Jayesh Krishna
Subject: Re: [mpich-discuss] Problem sometimes when running on winxp on
>=2 processes and MPE_IBCAST
Hi,
I forgot that I had modified the original template. Sorry about that. Anyway,
I tested again using the fpi example. On 4 processes, sometimes it works and
sometimes it doesn't, giving the error message:
[01:3384].....ERROR:result command received but the wait_list is empty.
[01:3384]...ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=5 cmd_tag=1 cmd_orig=dbput ctx_key=1 result=DBS_SUCCESS "
[01:3384]..ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
On 8 processes:
[01:1252].....ERROR:result command received but the wait_list is empty.
[01:1252]...ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=9 cmd_tag=1 cmd_orig=dbput ctx_key=3 result=DBS_SUCCESS "
[01:3576].....ERROR:result command received but the wait_list is empty.
[01:3576]...ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=11 cmd_tag=1 cmd_orig=dbput ctx_key=1 result=DBS_SUCCESS "
[01:1252]..ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
[01:3576]..ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
There seems to be something wrong with my MPICH2 installation.
Thank you.
Regards.
Jayesh Krishna wrote:
>
> Hi,
> I did not see the exact example source code in the RS 6000 MPI
> programming guide (the closest example was a template code which
> differs slightly from the one that you sent us).
> After removing the MPI_Barrier() from the loop (keeping the
> MPI_Bcast() in the loop), I do not get any errors when running
> your program for 0 < n <= 4 procs. Send us your code and the logic of
> the code for further analysis.
> Meanwhile, the block distribution code (para_range, which finds the
> range for block distribution) that you use in your source code is not
> valid if you have less work to do than the number of procs, i.e. when
> there is no need for distribution. In your job the array jjsta covers
> indices 3 to 6, which is 4 elements to work with, so the distribution
> range finder code will fail if you run your code with > 4 procs.
>
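To make the point about para_range concrete, here is a minimal standalone sketch (no MPI needed) that uses the same range arithmetic for js=3, je=6. The program name range_check, the choice of 8 processes, and the printed flag are illustrative additions, not part of the original program.

```fortran
program range_check
  ! Standalone sketch (no MPI): prints the block ranges that the
  ! para_range arithmetic assigns for js=3, je=6, as in mpi_test2.
  ! The program name and the printed flag are illustrative additions.
  implicit none
  integer, parameter :: js = 3, je = 6
  integer :: nprocs, k, jsta, jend

  nprocs = 8   ! assumed process count; any value > 4 shows the problem
  do k = 0, nprocs - 1
    call para_range(js, je, nprocs, k, jsta, jend)
    if (jsta > je) then
      ! No work left for this rank: the start index is past je
      print '(a,i2,a,i2,a,i2,a)', 'rank ', k, ': jsta=', jsta, &
            ' jend=', jend, '  <-- empty range, jsta outside js:je'
    else
      print '(a,i2,a,i2,a,i2)', 'rank ', k, ': jsta=', jsta, ' jend=', jend
    end if
  end do

contains

  subroutine para_range(n1, n2, nprocs, irank, ista, iend)
    ! Block distribution, same arithmetic as in the program under discussion
    integer, intent(in)  :: n1, n2, nprocs, irank
    integer, intent(out) :: ista, iend
    integer :: iwork1, iwork2

    iwork1 = (n2 - n1 + 1) / nprocs
    iwork2 = mod(n2 - n1 + 1, nprocs)
    ista = irank * iwork1 + n1 + min(irank, iwork2)
    iend = ista + iwork1 - 1
    if (iwork2 > irank) iend = iend + 1
  end subroutine para_range

end program range_check
```

With 8 processes, ranks 0 to 3 each get one index (3, 4, 5 and 6), while ranks 4 to 7 get jsta=7 and jend=6. In mpi_test2 that makes jjlen(k) zero for those ranks, but the broadcast still references u_tmp(is,jjsta(k)), i.e. u_tmp(3,7), which lies outside u_tmp(3:6,3:6). One possible guard, offered as a suggestion rather than a verified fix for the errors reported here, is to skip the broadcast for any k where jjlen(k) is 0.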
> Regards,
> Jayesh
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Ben Tay
> Sent: Wednesday, May 07, 2008 7:45 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Problem sometimes when running on winxp
> on >=2 processes and MPE_IBCAST
>
> Hi,
>
> I've removed the MPI_Barrier. Bounds checking is also enabled.
> However, the same error still happens, randomly when processes = 2 and
> always when processes = 4. I did not encounter this error when I ran it
> on my school's servers.
>
> I also just compiled it using MPICH. Interestingly, there is no problem
> at all. So I guess this problem is due to MPICH2.
>
> Thank you very much.
>
> Jayesh Krishna wrote:
> >
> > Hi,
> > Please find my observations below,
> >
> > 1) As Anthony pointed out you don't have to call MPI_Barrier() in a
> > loop for all processes (see usage of MPI collectives).
> > 2) When running the program with more than 4 procs, some array
> > accesses are out of bounds (Try re-compiling your program with Run
> > time checking for "Array and String bounds" --> If you are using VS
> > check out "Configuration Properties" --> Fortran --> Runtime --> *
> > for setting the runtime checking)
> >
> > Regards,
> > Jayesh
> >
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Anthony Chan
> > Sent: Wednesday, May 07, 2008 11:13 AM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: Re: [mpich-discuss] Problem sometimes when running on winxp
> > on >=2 processes and MPE_IBCAST
> >
> >
> > May not be related to the error that you saw. You shouldn't call
> > MPI_Barrier and MPI_Bcast with a do loop over processes.
> >
> > A.Chan
> > ----- "Ben Tay" <zonexo at gmail.com> wrote:
> >
> > > Hi Rajeev,
> > >
> > > I've attached the code. Thank you very much.
> > >
> > > Regards.
> > >
> > > Rajeev Thakur wrote:
> > > > Can you send us the code?
> > > >
> > > > > MPE_IBCAST is not a part of the MPI standard. There is no
> > > > > equivalent for it in MPICH2. You could spawn a thread that calls
> > > > > MPI_Bcast though (after following all the caveats of MPI and
> > > > > threads as defined in the standard).
> > > >
> > > > Rajeev
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: owner-mpich-discuss at mcs.anl.gov
> > > >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Ben Tay
> > > >> Sent: Wednesday, May 07, 2008 10:25 AM
> > > >> To: mpich-discuss at mcs.anl.gov
> > > >> Subject: [mpich-discuss] Problem sometimes when running on
> > > >> winxp on
> > > >> >=2 processes and MPE_IBCAST
> > > >>
> > > >> Hi,
> > > >>
> > > > >> I tried to run an MPI code which is copied from an example in
> > > > >> the RS 6000 book. It is supposed to broadcast and synchronize
> > > > >> all values.
> > > > >> When I ran it on my school's linux servers, there was no problem.
> > > > >> However, if I run it on my own winxp, on >=2 processes,
> > > > >> sometimes it works, other times I get the error:
> > > >>
> > > > >> [01:3216].....ERROR:result command received but the wait_list is empty.
> > > > >> [01:3216]...ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=7 cmd_tag=3 cmd_orig=dbget ctx_key=1 value="port=1518 description=gotchama-16e5ed ifname=192.168.1.105 " result=DBS_SUCCESS "
> > > > >> [01:3216].ERROR:error closing the unknown context socket: generic socket failure , error stack:
> > > > >> MPIDU_Sock_wait(2603): The I/O operation has been aborted because of either a thread exit or an application request. (errno 995)
> > > > >> [01:3216]..ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
> > > >>
> > > >> Or
> > > >>
> > > > >> [01:3308].....ERROR:result command received but the wait_list is empty.
> > > > >> [01:3308]...ERROR:unable to handle the command: "cmd=result src=1 dest=1 tag=15 cmd_tag=5 cmd_orig=barrier ctx_key=0 result=DBS_SUCCESS "
> > > > >> [01:3308]..ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
> > > >>
> > > >> There is no problem if I run on 1 process. If it's >=4, then
> > > >> the error happens all the time. Moreover, it's a rather simple
> > > >> code and so there shouldn't be anything wrong with it.
> > > >> Why is this so?
> > > >>
> > > > >> Btw, the RS 6000 book also mentions a routine called MPE_IBCAST,
> > > > >> which is a non-blocking version of MPI_BCAST. Is there a
> > > > >> similar routine in MPICH2?
> > > >>
> > > >> Thank you very much
> > > >>
> > > >> Regards.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > > >
> > >
> > >
> > > > program mpi_test2
> > > >
> > > > ! test to show updating for i,j double loop (partial continuous data)
> > > > ! for specific req data only
> > > > ! ie update u(2:6,2:6) values instead of all u values, also for struct data
> > > > ! FVM use
> > > >
> > > > implicit none
> > > >
> > > > include "mpif.h"
> > > >
> > > > integer, parameter :: size_x=8,size_y=8
> > > > integer :: i,j,k,ierr,rank,nprocs,u(size_x,size_y)
> > > > integer :: jsta,jend,jsta2,jend1,inext,iprev,isend1,irecv1,isend2
> > > > integer :: irecv2,is,ie,js,je
> > > > integer, allocatable :: jjsta(:), jjlen(:),jjreq(:),u_tmp(:,:)
> > > > INTEGER istatus(MPI_STATUS_SIZE)
> > > >
> > > > call MPI_Init(ierr)
> > > > call MPI_Comm_rank(MPI_COMM_WORLD,rank,ierr)
> > > > call MPI_Comm_size(MPI_COMM_WORLD,nprocs,ierr)
> > > >
> > > > allocate (jjsta(0:nprocs-1),jjlen(0:nprocs-1),jjreq(0:nprocs-1))
> > > >
> > > > is=3; ie=6; js=3; je=6
> > > >
> > > > allocate (u_tmp(is:ie,js:je))
> > > >
> > > > do k = 0, nprocs - 1
> > > >    call para_range(js,je, nprocs, k, jsta, jend)
> > > >    jjsta(k) = jsta
> > > >    jjlen(k) = (ie-is+1) * (jend - jsta + 1)
> > > > end do
> > > >
> > > > call para_range(js, je, nprocs, rank , jsta, jend)
> > > >
> > > > do j=jsta,jend
> > > >    do i=is,ie
> > > >       u(i,j)=(j-1)*size_x+i
> > > >    end do
> > > > end do
> > > >
> > > > do j=jsta,jend
> > > >    do i=is,ie
> > > >       u_tmp(i,j)=u(i,j)
> > > >    end do
> > > > end do
> > > >
> > > > do k=0,nprocs-1
> > > >    call MPI_Barrier(MPI_COMM_WORLD,ierr)
> > > >    if (k==rank) then
> > > >       print *, rank
> > > >       write (*,'(8i5)') u
> > > >    end if
> > > > end do
> > > >
> > > > do k = 0, nprocs - 1
> > > >    call MPI_BCAST(u_tmp(is,jjsta(k)), jjlen(k), MPI_Integer,k, &
> > > >                   MPI_COMM_WORLD, ierr)
> > > > end do
> > > >
> > > > deallocate (jjsta, jjlen, jjreq)
> > > >
> > > > u(is:ie,js:je)=u_tmp(is:ie,js:je)
> > > >
> > > > do k=0,nprocs-1
> > > >    call MPI_Barrier(MPI_COMM_WORLD,ierr)
> > > >    if (k==rank) then
> > > >       print *, rank
> > > >       write (*,'(8i5)') u
> > > >    end if
> > > > end do
> > > >
> > > > call MPI_Finalize(ierr)
> > > >
> > > > contains
> > > >
> > > > subroutine para_range(n1, n2, nprocs, irank, ista, iend)
> > > > ! block distribution
> > > >
> > > > integer n1      !The lowest value of the iteration variable (IN)
> > > > integer n2      !The highest value of the iteration variable (IN)
> > > > integer nprocs  !The number of processes (IN)
> > > > integer irank   !The rank for which you want to know the range of iterations (IN)
> > > > integer ista    !The lowest value of the iteration variable that process irank executes (OUT)
> > > > integer iend    !The highest value of the iteration variable that process irank executes (OUT)
> > > > integer iwork1,iwork2
> > > >
> > > > iwork1 = (n2 - n1 + 1) / nprocs
> > > > iwork2 = mod(n2 - n1 + 1, nprocs)
> > > > ista = irank * iwork1 + n1 + min(irank, iwork2)
> > > > iend = ista + iwork1 - 1
> > > > if (iwork2 > irank) iend = iend + 1
> > > >
> > > > end subroutine para_range
> > > >
> > > > end program mpi_test2
> >
> >
>
>