[mpich-discuss] MPICH with PGI compilers (Assertion failed in helper_fns.c)

Dave Goodell goodell at mcs.anl.gov
Wed Mar 24 09:59:54 CDT 2010


This has come up before:

https://lists.mcs.anl.gov/mailman/htdig/mpich-discuss/2010-March/006658.html

and

https://trac.mcs.anl.gov/projects/mpich2/ticket/1006

In the first case there is probably a real bug, but we never received
enough information from the poster to reproduce it ourselves. The
good news for that one is that the workaround of CPPFLAGS="-DNDEBUG"
worked for him. The second case was a combination of user error and
poor error messages/behavior.
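
For context, here is a minimal sketch of the kind of check that produces
this sort of assertion. The names ranges_overlap and checked_memcpy are
hypothetical, and this is not the actual MPIU_Memcpy implementation. It
only illustrates the idea: an assert-style guard rejects a copy whose
source and destination byte ranges intersect, and such a guard is
typically compiled out when NDEBUG is defined, which would explain why
the CPPFLAGS workaround hides the abort without removing the overlap
itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an MPICH2 function): returns 1 if the byte
 * ranges [dst, dst+len) and [src, src+len) share at least one byte. */
static int ranges_overlap(const void *dst, const void *src, size_t len)
{
    uintptr_t d = (uintptr_t) dst;
    uintptr_t s = (uintptr_t) src;

    if (len == 0)
        return 0;               /* an empty copy cannot overlap */

    /* The ranges intersect when the start addresses are closer than len. */
    return (d < s) ? (s - d < len) : (d - s < len);
}

/* Hypothetical checked copy: aborts on overlap in debug builds.
 * With -DNDEBUG the assert expands to nothing and memcpy runs unchecked. */
static void *checked_memcpy(void *dst, const void *src, size_t len)
{
    assert(!ranges_overlap(dst, src, len));
    return memcpy(dst, src, len);
}
```

With identical pointers and a nonzero length, as in the reported
dst_=0x633440 src_=0x633440 len_=16, ranges_overlap returns 1 and the
assert fires. Calling memcpy on overlapping ranges is undefined behavior
in C, so the check flags a real hazard even when the copy would happen
to be harmless in practice.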

Do you have a small test case that I can use to reproduce the problem?

The change you mentioned (r6344) would help us track this problem  
down, but it unfortunately isn't present in mpich2-1.2.1p1.

-Dave

On Mar 24, 2010, at 12:59 AM, Sarat Sreepathi wrote:

> Hello:
>
> We recently installed new PGI compilers (10.2-1 64-bit) and built
> MPICH2 (1.2.1p1) with the new compilers on our Opteron cluster.
> Initial tests revealed no problems.
>
> But when I built and ran a parallel program today, it crashed with
> an assertion failure in the MPICH2 source (src/mpi/coll/helper_fns.c):
> "memcpy argument memory ranges overlap".
> The full error output and configuration details are included below.
>
> I found a recent changeset that's related to this: http://trac.mcs.anl.gov/projects/mpich2/changeset/6344#file2
> Your help is greatly appreciated in resolving the issue.
>
> Thanks,
> Sarat.
>
> $> mpirun -n 1 ./epanetmsx input0.txt output0.txt
> Assertion failed in file helper_fns.c at line 337: 0
> memcpy argument memory ranges overlap, dst_=0x633440 src_=0x633440  
> len_=16
>
> internal ABORT - process 0
> rank 0 in job 33695  master_4268   caused collective abort of all  
> ranks
>   exit status of rank 0: return code 1
>
> Excerpt from src/mpi/coll/helper_fns.c
> 333     if (sendtype_iscontig && recvtype_iscontig)
> 334     {
> 335         MPIU_Memcpy(((char *) recvbuf + recvtype_true_lb),
> 336                ((char *) sendbuf + sendtype_true_lb),
> 337                copy_sz);
> 338     }
>
> The cluster is running an older OS: SuSE 10.
> $ uname -a
> Linux master 2.6.14-ck5-suse10-osmp #55 SMP Tue Jan 3 13:19:36 EST  
> 2006 x86_64 x86_64 x86_64 GNU/Linux
>
> $ mpich2version
> MPICH2 Version:        1.2.1p1
> MPICH2 Release date:    Unknown, built on Sun Mar  7 21:16:25 EST 2010
> MPICH2 Device:        ch3:nemesis
> MPICH2 configure:     --prefix=/usr/local/mpich2-1.2.1 --enable-f77  
> --enable-f90 --enable-cxx
> MPICH2 CC:     pgcc  -O2
> MPICH2 CXX:     pgCC  -O2
> MPICH2 F77:     pgf77
> MPICH2 F90:     pgf90
>
> -- 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Sarat Sreepathi
> Doctoral Student
> Dept. of Computer Science
> North Carolina State University
> sarat_s at ncsu.edu ~ (919)645-7775
> http://www.sarats.com
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


