[mpich-discuss] MPICH with PGI compilers (Assertion failed in helper_fns.c)
Dave Goodell
goodell at mcs.anl.gov
Wed Mar 24 09:59:54 CDT 2010
This has come up before:
https://lists.mcs.anl.gov/mailman/htdig/mpich-discuss/2010-March/006658.html
and
https://trac.mcs.anl.gov/projects/mpich2/ticket/1006
In the first case there is probably a real bug, but we never received
enough information from the poster to reproduce it for ourselves. The
good news for that one is that the workaround of CPPFLAGS="-DNDEBUG"
did work for him. The second case was a combination of user error and
poor error messages/behavior.
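
For readers following along, here is a minimal sketch of the kind of
check that produces the "memory ranges overlap" message, and of why
defining NDEBUG makes it disappear. It uses the standard C assert()
as a stand-in; ranges_overlap() and checked_memcpy() are hypothetical
names for illustration and are not the MPICH2 source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not MPICH2 code): returns 1 if the byte ranges
 * [dst, dst+len) and [src, src+len) share at least one byte. */
static int ranges_overlap(const void *dst, const void *src, size_t len)
{
    uintptr_t d = (uintptr_t) dst;
    uintptr_t s = (uintptr_t) src;

    if (len == 0)
        return 0;               /* empty ranges cannot overlap */
    /* Two equal-length ranges overlap when their start addresses are
     * closer together than len bytes. */
    return (d < s) ? (s - d < len) : (d - s < len);
}

/* Hypothetical checked copy: asserts that the ranges are disjoint
 * before calling memcpy. The assert() is compiled out entirely when
 * NDEBUG is defined, e.g. by building with CPPFLAGS="-DNDEBUG". */
static void *checked_memcpy(void *dst, const void *src, size_t len)
{
    assert(!ranges_overlap(dst, src, len));
    return memcpy(dst, src, len);
}
```

With the values from the report (dst_ and src_ both 0x633440, len_=16)
the two ranges are identical, so a check like this fires. Defining
NDEBUG only removes the check; calling memcpy on overlapping ranges
is still undefined behavior in C, so the aliasing itself is worth
tracking down.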
Do you have a small test case that I can use to reproduce the problem?
The change you mentioned (r6344) would help us track this problem
down, but it unfortunately isn't present in mpich2-1.2.1p1.
-Dave
On Mar 24, 2010, at 12:59 AM, Sarat Sreepathi wrote:
> Hello:
>
> We recently installed new PGI compilers (10.2-1, 64-bit) and built
> MPICH (1.2.1p1) with the new compilers on our Opteron cluster.
> Initial tests revealed no problems.
>
> But when I built and ran a parallel program today, it crashed with
> an assertion failure in the MPI source. (src/mpi/coll/helper_fns.c):
> memcpy argument memory ranges overlap
> The detailed error and configuration details are enclosed below.
>
> I found a recent changeset that's related to this: http://trac.mcs.anl.gov/projects/mpich2/changeset/6344#file2
> Your help is greatly appreciated in resolving the issue.
>
> Thanks,
> Sarat.
>
> $> mpirun -n 1 ./epanetmsx input0.txt output0.txt
> Assertion failed in file helper_fns.c at line 337: 0
> memcpy argument memory ranges overlap, dst_=0x633440 src_=0x633440 len_=16
>
> internal ABORT - process 0
> rank 0 in job 33695 master_4268 caused collective abort of all ranks
> exit status of rank 0: return code 1
>
> Excerpt from src/mpi/coll/helper_fns.c:
> 333     if (sendtype_iscontig && recvtype_iscontig)
> 334     {
> 335         MPIU_Memcpy(((char *) recvbuf + recvtype_true_lb),
> 336                     ((char *) sendbuf + sendtype_true_lb),
> 337                     copy_sz);
> 338     }
>
> The cluster is running an older OS: SuSE 10.
> $ uname -a
> Linux master 2.6.14-ck5-suse10-osmp #55 SMP Tue Jan 3 13:19:36 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
>
> $ mpich2version
> MPICH2 Version: 1.2.1p1
> MPICH2 Release date: Unknown, built on Sun Mar 7 21:16:25 EST 2010
> MPICH2 Device: ch3:nemesis
> MPICH2 configure: --prefix=/usr/local/mpich2-1.2.1 --enable-f77 --enable-f90 --enable-cxx
> MPICH2 CC: pgcc -O2
> MPICH2 CXX: pgCC -O2
> MPICH2 F77: pgf77
> MPICH2 F90: pgf90
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Sarat Sreepathi
> Doctoral Student
> Dept. of Computer Science
> North Carolina State University
> sarat_s at ncsu.edu ~ (919)645-7775
> http://www.sarats.com
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>