[mpich-discuss] MPICH with PGI compilers (Assertion failed in helper_fns.c)

Sarat Sreepathi sarat_s at ncsu.edu
Wed Mar 24 14:42:56 CDT 2010


Dave,

Thanks for your suggestions. I tracked the problem down to an
MPI_Scatterv call where the source and destination buffers overlap.
I guess we never noticed this issue because this check may not have
been enforced until recently.

I modified the call to use MPI_IN_PLACE as the receive buffer on the root process, and it worked.

Posting to the list for the benefit of others who might encounter this.
Earlier:
    MPI_Scatterv(sources, recvCount, displs, MPI_INT,
                 sources, localTrials, MPI_INT, 0, MPI_COMM_WORLD);

Updated:
    MPI_Scatterv(sources, recvCount, displs, MPI_INT,
                 (rank) ? sources : MPI_IN_PLACE, localTrials, MPI_INT,
                 0, MPI_COMM_WORLD);
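Why the original call trips the check: at the root, MPI_Scatterv copies the root's own block, which starts displs[root] elements into sendbuf, into recvbuf. With sources passed as both buffers (and, presumably, displs[0] == 0), that copy has identical source and destination, which matches the log quoted further down (dst_=src_=0x633440, len_=16). The C sketch below models that condition for a contiguous MPI_INT buffer. ranges_overlap and scatterv_root_copy_overlaps are hypothetical names used only for illustration, not MPICH functions, and the real check in MPICH2 may be written differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not MPICH's code): returns 1 if the byte ranges
 * [dst, dst+len) and [src, src+len) overlap, i.e. the situation reported
 * as "memcpy argument memory ranges overlap". */
static int ranges_overlap(const void *dst, const void *src, size_t len)
{
    uintptr_t d = (uintptr_t) dst;
    uintptr_t s = (uintptr_t) src;

    if (len == 0)
        return 0;               /* an empty copy cannot overlap anything */

    /* Two half-open ranges of equal length overlap exactly when their
     * start addresses are less than len bytes apart. */
    return (d > s) ? (d - s < len) : (s - d < len);
}

/* Hypothetical model of the root's local copy in MPI_Scatterv, assuming
 * contiguous int data: the root's block starts displs[root] elements into
 * sendbuf and is sendcounts[root] ints long. Returns 1 if copying it into
 * recvbuf would overlap. */
static int scatterv_root_copy_overlaps(const int *sendbuf, const int *displs,
                                       const int *sendcounts, int root,
                                       const int *recvbuf)
{
    const int *block = sendbuf + displs[root];
    size_t len = (size_t) sendcounts[root] * sizeof(int);

    return ranges_overlap(recvbuf, block, len);
}
```

With MPI_IN_PLACE as recvbuf at the root, the MPI standard specifies that recvcount and recvtype are ignored and the root's own segment is not moved, so this local copy never happens. One consequence: the root's data stays at sources + displs[0], which is the start of the array only if displs[0] is 0.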

Thanks,
Sarat.

On 03/24/2010 10:59 AM, Dave Goodell wrote:
> This has come up before:
>
> https://lists.mcs.anl.gov/mailman/htdig/mpich-discuss/2010-March/006658.html
>
>
> and
>
> https://trac.mcs.anl.gov/projects/mpich2/ticket/1006
>
> In the first case there is probably a real bug, but we never received
> enough information from the poster to reproduce it ourselves. The
> good news for that one is that the CPPFLAGS="-DNDEBUG" workaround
> did work for him. The second case was a combination of user error and
> poor error messages/behavior.
>
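A note on the CPPFLAGS="-DNDEBUG" workaround: defining NDEBUG is the standard C switch that turns assert() into a no-op, and it is presumably what disables MPICH2's internal assertion here (an assumption based on the report above that the flag worked). The small sketch below only demonstrates the standard mechanism; assertions_enabled, checked and run_guarded_step are hypothetical names. Compiling the check out does not make the copy safe, though: memcpy on overlapping ranges is undefined behavior in C, so correcting the call itself is the better fix.

```c
#include <assert.h>

/* Returns 1 if this translation unit was compiled with assertions active,
 * 0 if NDEBUG was defined (e.g. via CPPFLAGS="-DNDEBUG"). */
static int assertions_enabled(void)
{
#ifdef NDEBUG
    return 0;
#else
    return 1;
#endif
}

/* Counts how many times an assert condition is actually evaluated. */
static int checks_run = 0;

static int checked(int cond)
{
    checks_run++;
    return cond;
}

static void run_guarded_step(void)
{
    /* Under -DNDEBUG this whole assert expands to ((void) 0), so
     * checked() is never called and checks_run stays at 0. */
    assert(checked(1));
}
```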
> Do you have a small test case that I can use to reproduce the problem?
>
> The change you mentioned (r6344) would help us track this problem
> down, but it unfortunately isn't present in mpich2-1.2.1p1.
>
> -Dave
>
> On Mar 24, 2010, at 12:59 AM, Sarat Sreepathi wrote:
>
>> Hello:
>>
>> We recently installed new PGI compilers (10.2-1, 64-bit) and built
>> MPICH2 (1.2.1p1) with them on our Opteron cluster. Initial
>> tests revealed no problems.
>>
>> But when I built and ran a parallel program today, it crashed with an
>> assertion failure in the MPI source (src/mpi/coll/helper_fns.c):
>> "memcpy argument memory ranges overlap".
>> The full error output and configuration details are included below.
>>
>> I found a recent changeset that's related to this:
>> http://trac.mcs.anl.gov/projects/mpich2/changeset/6344#file2
>> Any help in resolving the issue would be greatly appreciated.
>>
>> Thanks,
>> Sarat.
>>
>> $> mpirun -n 1 ./epanetmsx input0.txt output0.txt
>> Assertion failed in file helper_fns.c at line 337: 0
>> memcpy argument memory ranges overlap, dst_=0x633440 src_=0x633440
>> len_=16
>>
>> internal ABORT - process 0
>> rank 0 in job 33695  master_4268   caused collective abort of all ranks
>>   exit status of rank 0: return code 1
>>
>> Excerpt from src/mpi/coll/helper_fns.c
>> 333     if (sendtype_iscontig && recvtype_iscontig)
>> 334     {
>> 335         MPIU_Memcpy(((char *) recvbuf + recvtype_true_lb),
>> 336                ((char *) sendbuf + sendtype_true_lb),
>> 337                copy_sz);
>> 338     }
>>
>> The cluster is running an older OS: SuSE 10.
>> $ uname -a
>> Linux master 2.6.14-ck5-suse10-osmp #55 SMP Tue Jan 3 13:19:36 EST
>> 2006 x86_64 x86_64 x86_64 GNU/Linux
>>
>> $ mpich2version
>> MPICH2 Version:        1.2.1p1
>> MPICH2 Release date:    Unknown, built on Sun Mar  7 21:16:25 EST 2010
>> MPICH2 Device:        ch3:nemesis
>> MPICH2 configure:     --prefix=/usr/local/mpich2-1.2.1 --enable-f77
>> --enable-f90 --enable-cxx
>> MPICH2 CC:     pgcc  -O2
>> MPICH2 CXX:     pgCC  -O2
>> MPICH2 F77:     pgf77
>> MPICH2 F90:     pgf90
>>
>> -- 
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Sarat Sreepathi
>> Doctoral Student
>> Dept. of Computer Science
>> North Carolina State University
>> sarat_s at ncsu.edu ~ (919)645-7775
>> http://www.sarats.com
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>



