[mpich-discuss] Assertion failed in file helper_fns.c

Dave Goodell goodell at mcs.anl.gov
Mon Aug 23 11:03:44 CDT 2010


FWIW, if you want to use an improved version of 1.2.1p1 that has this additional error checking, you can try this tarball: 

http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/1.2.1/mpich2-1.2.1-r7074.tar.gz

It will at least tell you which collective is being called incorrectly.

-Dave

On Aug 12, 2010, at 9:42 PM CDT, Dave Goodell wrote:

> If you download and install our most recent release, you should get a better error message: http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.3b1/mpich2-1.3b1.tar.gz
> 
> In particular, we don't even know what collective operation is failing for you.
> 
> Unfortunately, I don't have time to write up an example right now.  Search for MPI_IN_PLACE in chapter 5 of the MPI Standard: http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
> 
> -Dave
> 
> On Aug 12, 2010, at 6:53 PM CDT, Rui Mei wrote:
> 
>> Thanks for your reply, Dave. I was using version 1.2.1p1. I do not understand the fix. Could you give me more hints on this?
>> 
>> Rui
>> 
>> On Thu, Aug 12, 2010 at 4:53 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
>> What version of mpich2 are you using?
>> 
>> In some older versions of mpich2, you could get this sort of message by passing the same buffer as both the send and receive buffers to many collective operations.  Very old versions often would not complain at all.
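>> 
>> For instance, a call of this shape (a hypothetical fragment; the variable name is illustrative, not taken from your code) is exactly what trips the overlap assertion, because the send and receive pointers are identical:
>> 
>>     int data = rank;
>>     /* Incorrect: the same buffer is passed as both sendbuf and recvbuf */
>>     MPI_Allreduce(&data, &data, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);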
>> 
>> In newer versions of mpich2 you will almost always get a much more helpful error message instead.
>> 
>> In all cases, the fix is typically to either use the MPI_IN_PLACE option appropriately for the given collective, or to use separate buffers for sending and receiving.
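>> 
>> For example (a minimal, complete sketch, assuming the failing collective is an MPI_Allreduce on a single int; names are illustrative):
>> 
>>     #include <mpi.h>
>>     #include <stdio.h>
>> 
>>     int main(int argc, char **argv)
>>     {
>>         int rank, value, sum;
>>         MPI_Init(&argc, &argv);
>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> 
>>         value = rank;
>>         /* Option 1: pass MPI_IN_PLACE as the send buffer at every rank;
>>          * "value" then serves as both input and output. */
>>         MPI_Allreduce(MPI_IN_PLACE, &value, 1, MPI_INT, MPI_SUM,
>>                       MPI_COMM_WORLD);
>> 
>>         value = rank;
>>         /* Option 2: use distinct, non-overlapping send and receive buffers. */
>>         MPI_Allreduce(&value, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>> 
>>         printf("rank %d: in-place=%d, separate=%d\n", rank, value, sum);
>>         MPI_Finalize();
>>         return 0;
>>     }
>> 
>> Both calls compute the same reduction; the only difference is how the buffers are supplied.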
>> 
>> -Dave
>> 
>> On Aug 12, 2010, at 3:44 PM CDT, Rui Mei wrote:
>> 
>>> Hi, all,
>>> 
>>> I was trying to run my research model with MPICH and got the following error message. I tried to locate the file helper_fns.c and found that its name appears in libmpich.a under the MPI lib folder, but that is the only clue I have found; I do not know what is wrong. I hope to get some advice here. Thank you very much.
>>> 
>>> 
>>> Rui
>>> 
>>> "
>>> Assertion failed in file helper_fns.c at line 337: 0
>>> memcpy argument memory ranges overlap, dst_=0x600000000570def0 src_=0x600000000570def0 len_=4
>>> 
>>> internal ABORT - process 0
>>> Assertion failed in file helper_fns.c at line 337: 0
>>> memcpy argument memory ranges overlap, dst_=0x6000000005712088 src_=0x6000000005712088 len_=4
>>> 
>>> internal ABORT - process 2
>>> Assertion failed in file helper_fns.c at line 337: 0
>>> memcpy argument memory ranges overlap, dst_=0x6000000005770fa4 src_=0x6000000005770fa4 len_=4
>>> 
>>> internal ABORT - process 1
>>> Assertion failed in file helper_fns.c at line 337: 0
>>> memcpy argument memory ranges overlap, dst_=0x600000000571209c src_=0x600000000571209c len_=4
>>> 
>>> internal ABORT - process 3
>>> Assertion failed in file helper_fns.c at line 337: 0
>>> memcpy argument memory ranges overlap, dst_=0x60000000057427e4 src_=0x60000000057427e4 len_=4
>>> 
>>> internal ABORT - process 5
>>> Assertion failed in file helper_fns.c at line 337: 0
>>> memcpy argument memory ranges overlap, dst_=0x6000000005714180 src_=0x6000000005714180 len_=4
>>> 
>>> internal ABORT - process 4
>>> Assertion failed in file helper_fns.c at line 337: 0
>>> memcpy argument memory ranges overlap, dst_=0x600000000570f07c src_=0x600000000570f07c len_=4
>>> 
>>> internal ABORT - process 7
>>> Assertion failed in file helper_fns.c at line 337: 0
>>> memcpy argument memory ranges overlap, dst_=0x60000000057120a8 src_=0x60000000057120a8 len_=4
>>> 
>>> internal ABORT - process 6
>>> rank 7 in job 1 linuxAltix_6629 caused collective abort of all ranks
>>> exit status of rank 7: return code 1
>>> rank 4 in job 1 linuxAltix_6629 caused collective abort of all ranks
>>> exit status of rank 4: return code 1
>>> rank 3 in job 1 linuxAltix_6629 caused collective abort of all ranks
>>> exit status of rank 3: return code 1
>>> rank 2 in job 1 linuxAltix_6629 caused collective abort of all ranks
>>> exit status of rank 2: killed by signal 9
>>> (seq_mct_drv) : Initialize lnd component
>>> "
>>> 
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


