[mpich-discuss] MPICH2 internal errors on Win 7 x64

Wei Huang huangwei at ucar.edu
Tue May 10 16:19:54 CDT 2011


I am trying to install mpich2-1.3.2p1 on my cluster.

I configured with:

 ./configure \
        --prefix=/neem2/huangwei/apps/mpich2-1.3.2p1 \
        --with-pm=hydra \
        --without-hydra-bindlib


When I run the examples, I get a problem with 2 processes, as shown below:


mpiexec -n 1 cpi

Process 0 of 1 is on neem
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000265


mpiexec -n 2 cpi

Process 0 of 2 is on neem
Process 1 of 2 is on neem
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000212
[mpiexec at neem] ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)

Does anyone know what is wrong here?

Thanks,

Wei


huangwei at ucar.edu
VETS/CISL
National Center for Atmospheric Research
P.O. Box 3000 (1850 Table Mesa Dr.)
Boulder, CO 80307-3000 USA
(303) 497-8924





On May 9, 2011, at 4:27 PM, Joe Vallino wrote:

> Thanks Rajeev; I'm still just learning MPI, so the MPI_IN_PLACE error was caused by my own naivete.
> 
> The noncompliant code was:
> 
>   CALL MPI_GATHER(lc_convex(mygid+1), 1, MPI_INTEGER, lc_convex(mygid+1), &
>                  1, MPI_INTEGER, 0, myComm, ierr)
> 
> which I replaced with
> 
>   if (mygid == 0) then
>      CALL MPI_GATHER(MPI_IN_PLACE, 1, MPI_INTEGER, lc_convex(mygid+1), &
>                      1, MPI_INTEGER, 0, myComm, ierr)
>   else    
>      CALL MPI_GATHER(lc_convex(mygid+1), 1, MPI_INTEGER, MPI_IN_PLACE, &
>                      1, MPI_INTEGER, 0, myComm, ierr)
>   end if
> 
> since this is the only place where gathering occurs.  I assume this is the only way to fix the noncompliant code, but it is certainly not as "pretty".  There were a few other areas requiring related fixes, but all is working now.
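> For what it's worth, one way to avoid the rank branch while staying compliant is to gather into a scratch array and copy back on the root.  This is a rough sketch only (tmp_convex and nprocs are hypothetical names for a temporary buffer and the size of myComm):
> 
>   INTEGER :: tmp_convex(nprocs)   ! hypothetical scratch buffer, one entry per rank
> 
>   ! every rank, the root included, sends its own entry, so no buffers are aliased
>   CALL MPI_GATHER(lc_convex(mygid+1), 1, MPI_INTEGER, tmp_convex, &
>                   1, MPI_INTEGER, 0, myComm, ierr)
>   if (mygid == 0) lc_convex(1:nprocs) = tmp_convex   ! copy back on the root only
> 
> Whether that is any "prettier" is a matter of taste.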
> 
> Thanks for all your help!!
> -joe
> 
> From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> To: mpich-discuss at mcs.anl.gov
> Sent: Monday, May 9, 2011 1:52:40 PM
> Subject: Re: [mpich-discuss] MPICH2 internal errors on Win 7 x64
> 
> It probably means that the data that the root sends to itself in MPI_Gather may not already be in the right location in recvbuf.
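> In other words, with MPI_IN_PLACE the root contributes nothing through sendbuf, so its own element must already be stored in its slot of recvbuf before the call.  A minimal sketch (recvbuf and myval are hypothetical names; one INTEGER per rank, root rank 0):
> 
>   if (mygid == 0) then
>      recvbuf(mygid+1) = myval              ! root's value must be in its own slot first
>      CALL MPI_GATHER(MPI_IN_PLACE, 1, MPI_INTEGER, recvbuf, &
>                      1, MPI_INTEGER, 0, myComm, ierr)
>   else
>      CALL MPI_GATHER(myval, 1, MPI_INTEGER, recvbuf, &
>                      1, MPI_INTEGER, 0, myComm, ierr)   ! recv arguments are ignored on non-roots
>   end if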
> 
> Rajeev
> 
> 
> On May 9, 2011, at 12:41 PM, Joe Vallino wrote:
> 
> > Rajeev, et al.
> > 
> > The use of MPI_IN_PLACE did allow MPICH2 to run without errors (thanks).  Interestingly, the test problem now generates a different (and incorrect) answer from what it should.  Intel MPI also produces the same incorrect answer when using MPI_IN_PLACE, but produces the correct result when violating the MPI 2.2 standard regarding identical sbuf and rbuf.
> > 
> > Do any ideas come to mind for this situation? Is there anything else magical about MPI_IN_PLACE?
> > 
> > cheers
> > -joe
> > 
> > From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> > To: mpich-discuss at mcs.anl.gov
> > Sent: Monday, May 9, 2011 10:58:33 AM
> > Subject: Re: [mpich-discuss] MPICH2 internal errors on Win 7 x64
> > 
> > The error check was added in a recent version of MPICH2.
> > 
> > Rajeev
> >  
> > On May 9, 2011, at 9:50 AM, Joe Vallino wrote:
> > 
> > > Thanks Rajeev.  I'll take a look at that, but I wonder why the code runs fine on Intel MPI, which is based on MPICH2.
> > > 
> > > cheers,
> > > -joe
> > > 
> > > From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> > > To: mpich-discuss at mcs.anl.gov
> > > Sent: Monday, May 9, 2011 9:55:11 AM
> > > Subject: Re: [mpich-discuss] MPICH2 internal errors on Win 7 x64
> > > 
> > > The code is passing the same buffer as sendbuf and recvbuf to MPI_Gather, which is not allowed. You need to use MPI_IN_PLACE as described in the MPI standard (see MPI 2.2 for easy reference).
> > > 
> > > Rajeev
> > > 
> > > 
> > > On May 8, 2011, at 6:46 PM, Joe Vallino wrote:
> > > 
> > > > Hi,
> > > > 
> > > > I've installed MPICH2 (1.3.2p1, Windows EM64T binaries) on a Windows 7 x64 machine (2 sockets, 4 cores each).  MPICH2 works fine for simple tests, but when I attempt to run a more complex use of MPI, I get various internal MPI errors, such as:
> > > > 
> > > > Fatal error in PMPI_Gather: Invalid buffer pointer, error stack:
> > > > PMPI_Gather(863): MPI_Gather(sbuf=0000000000BC8040, scount=1, MPI_INTEGER, rbuf=0000000000BC8040, rcount=1, MPI_INTEGER, root=0, comm=0x84000004) failed
> > > > PMPI_Gather(806): Buffers must not be aliased
> > > > 
> > > > job aborted:
> > > > rank: node: exit code[: error message]
> > > > 0: ECO37: 1: process 0 exited without calling finalize
> > > > 1: ECO37: 123
> > > > 
> > > > The errors occur regardless of whether I use the x32 or x64 builds.
> > > > 
> > > > The code I'm trying to run is pVTDIRECT (see TOMS package 897 on netlib.org), and the above errors are produced by running the simple test routine that comes with the package.  Since the package can be easily compiled and run, this should allow others to confirm the problem, if anyone is feeling so motivated :)
> > > > 
> > > > To confirm that the problem is with the MPICH2 build, I installed a commercial MPI implementation (csWMPI II), which works fine with the TOMS package, so this would indicate the problem is with MPICH2.
> > > > 
> > > > Since the TOMS package uses Fortran 95, and I'm using the latest Intel ifort compiler with VS2008, I tried to build MPICH2 from the 1.3.2p1 source, but after banging my head on that for a day without success, I decided to see if anyone has any suggestions here (or if anyone can confirm the problem with the TOMS package under the Windows MPICH2 release).
> > > > 
> > > > - Can anyone point me to a Win x64 build that used newer versions of Intel Fortran (v11 or v12) and/or a more recent release of the Windows SDK, which seem to be the main wild cards in the build process?
> > > > 
> > > > - I will continue to try to build MPICH2 for Windows, but I suspect I will not succeed given my *cough* skills.
> > > > 
> > > > Thanks!
> > > > -joe
