[mpich-discuss] icc error with mpich2

Dave Goodell goodell at mcs.anl.gov
Wed Aug 25 14:52:53 CDT 2010


This error usually happens when a collective call (such as MPI_Gather) is used incorrectly.  For most collectives, if the same buffer is passed as both the send and the receive buffer, MPI_IN_PLACE should be used instead.
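
As a rough illustration (not taken from your code), this is the usual MPI_IN_PLACE pattern for MPI_Gather; the names rank, myval, and gathered are just placeholders:

  #include <mpi.h>

  /* Gather one int per rank into gathered[] on rank 0.  The root's own
     value is already stored in gathered[0], so it passes MPI_IN_PLACE as
     the send buffer instead of handing the library overlapping pointers. */
  void gather_example(int rank, int myval, int *gathered)
  {
      if (rank == 0) {
          gathered[0] = myval;
          MPI_Gather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                     gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);
      } else {
          MPI_Gather(&myval, 1, MPI_INT,
                     NULL, 0, MPI_DATATYPE_NULL, 0, MPI_COMM_WORLD);
      }
  }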

Can you try the following MPICH2 tarball instead of 1.2.1p1?

http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/1.2.1/mpich2-1.2.1-r7076.tar.gz

It should help you narrow down which collective call is causing the problem by giving you a better error message.

Also, it looks like you have passed "--with-atomic-primitives=no" to your configure step.  I'm guessing this is because our atomic assembly instructions package, OpenPA, failed to configure successfully on your ia64 machine.  If this is the case, you are probably better off configuring with "--with-device=ch3:sock" instead in order to get better performance.  The atomic primitive emulation is quite slow and very rarely tested.
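
For example (adjust the prefix and compilers to match your setup), a configure line along these lines should do it:

  ./configure --prefix=/hpc/u2/sislam/mpich2-1.2.1p1 --with-device=ch3:sock CC=icc F77=ifort

followed by the usual "make" and "make install".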

-Dave

On Aug 24, 2010, at 5:59 PM CDT, Siraj wrote:

> Hi,
> Thanks for the reply. I have changed the include directory path from 
> /hpc/home/sislam/mpich2-1.2.1p1/include
> to
> /hpc/home/sislam/mpich2-1.2.1p1/src/include
> 
> and the problem is solved, but why are there two include directories in the mpich2-1.2.1p1 folder? 
> 
> Actually, this machine has another, older MPICH installed, but I have disabled it and installed the new MPICH2 using the command 
> 
> ./configure --prefix=/hpc/u2/sislam/mpich2-1.2.1p1 --with-atomic-primitives=no CC=icc F77=ifort
> 
> and then ran make and make install. 
> 
> Now my code compiles, but when I run it with "mpirun -np 8 cam" I get the following error after some time: 
> 
> -------------------------------------
> Filename specifier for history file            1  = %c.cam2.h%t.%y-%m.nc
>  Filename specifier for history file            7  = %c.cam2.i.%y-%m-%d-%s.nc
>  Accumulation precision history file            1 =           8
>  Packing density history file            1 =           2
>  Number of time samples per file (MFILT) for history file            1  is
>            1
>  Accumulation precision history file            7 =           8
>  Packing density history file            7 =           1
>  Number of time samples per file (MFILT) for history file            7  is
>            1
> Assertion failed in file helper_fns.c at line 335: 0
> memcpy argument memory ranges overlap, dst_=0x60000000062baf34 src_=0x60000000062baf34 len_=4
> 
> internal ABORT - process 1
> Assertion failed in file helper_fns.c at line 335: 0
> Assertion failed in file helper_fns.c at line 335: 0
> memcpy argument memory ranges overlap, dst_=0x600000000623e1a0 src_=0x600000000623e1a0 len_=4
> 
> internal ABORT - process 4
> Assertion failed in file helper_fns.c at line 335: 0
> memcpy argument memory ranges overlap, dst_=0x600000000623c088 src_=0x600000000623c088 len_=4
> 
> internal ABORT - process 2
> Assertion failed in file helper_fns.c at line 335: 0
> memcpy argument memory ranges overlap, dst_=0x600000000623c0d8 src_=0x600000000623c0d8 len_=4
> 
> internal ABORT - process 6
> Assertion failed in file helper_fns.c at line 335: 0
> memcpy argument memory ranges overlap, dst_=0x600000000626c814 src_=0x600000000626c814 len_=4
> 
> internal ABORT - process 5
> memcpy argument memory ranges overlap, dst_=0x6000000006237d50 src_=0x6000000006237d50 len_=4
> 
> internal ABORT - process 0
> rank 4 in job 2  pg-hpc-altix-01_48099   caused collective abort of all ranks
>   exit status of rank 4: return code 1
> Assertion failed in file helper_fns.c at line 335: 0
> memcpy argument memory ranges overlap, dst_=0x600000000623c08c src_=0x600000000623c08c len_=4
> 
> internal ABORT - process 3
> Assertion failed in file helper_fns.c at line 335: 0
> memcpy argument memory ranges overlap, dst_=0x600000000623906c src_=0x600000000623906c len_=4
> 
> internal ABORT - process 7
> rank 3 in job 2  pg-hpc-altix-01_48099   caused collective abort of all ranks
>   exit status of rank 3: return code 1
> rank 1 in job 2  pg-hpc-altix-01_48099   caused collective abort of all ranks
>   exit status of rank 1: return code 1
> rank 0 in job 2  pg-hpc-altix-01_48099   caused collective abort of all ranks
>   exit status of rank 0: return code 1
> (seq_mct_drv) : Initialize lnd component
> CAM run failed
> 
> Siraj
> 
> ------------------------------------
> 
> 
> Message: 1
> Date: Mon, 23 Aug 2010 19:30:42 -0600 (GMT-06:00)
> From: chan at mcs.anl.gov
> Subject: Re: [mpich-discuss] icc error with mpich2
> To: mpich-discuss at mcs.anl.gov
> 
> 
> ----- "Siraj" <sirajkhan78 at gmail.com> wrote:
> 
> 
> > /usr/include/mpi.h(30): error: invalid redeclaration of type name
> > "MPI_Request" (declared at line 264 of
> > "/hpc/home/sislam/mpich2-1.2.1p1/include/mpi.h")
> >   typedef unsigned int          MPI_Request;
> 
> It looks like you have two mpi.h header files, one in /usr/include
> and one in /hpc/home/sislam/mpich2-1.2.1p1/include.  You may have
> more than one MPI implementation on your machine.
> 
> > Note: I have compiled mpich2 using the following command:
> > ./configure --prefix=/hpc/u2/sislam/mpich2-1.2.1p1 --with-atomic-primitives=no CC=icc F77=ifort
> 
> Did you do "make" and "make install" after configure?
> Your MPICH2 install directory, /hpc/u2/sislam/mpich2-1.2.1p1,
> is different from the two directories above?!
> 
> Can you try using the MPI wrappers, like mpicc and mpif90,
> from /hpc/u2/sislam/mpich2-1.2.1p1/bin to compile and link
> your ccsm code?  In general, one should use the MPI wrappers provided
> by the MPI implementation instead of calling the native compiler directly.
> The MPI wrappers provided by MPICH2 are very careful about the include
> and link path ordering, so they should avoid the problem that you see here.
> If not, let us know...
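> 
> For example, you can check what a wrapper actually invokes (compiler,
> include paths, and link flags) with its -show option:
> 
>   /hpc/u2/sislam/mpich2-1.2.1p1/bin/mpicc -show
>   /hpc/u2/sislam/mpich2-1.2.1p1/bin/mpif90 -show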
> 
> 
> A.Chan
> 
> 
> 
> Siraj
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


