[mpich-discuss] Assertion failure in ch3_progress

Dorian Krause ddkrause at uni-bonn.de
Fri Jan 30 03:45:29 CST 2009


Hi Darius,

thanks for your answer.

I will try to break it down to a simple example. What I can
say right now is that

a) The problem depends on the communication volume (the larger
the communication volume, the fewer timesteps pass before the problem occurs).

b) It only occurs when the procs are on different machines.


It would be helpful if there were a way to make sure that MPICH2
behaves the same way on shared-memory and distributed-memory
machines (e.g. by not using IPC). Is there such a way? I suspect
that the behaviour differs between the two cases because of point b).

Thanks.

Dorian


Darius Buntinas wrote:
> This means that the header of the received packet has been corrupted.
> It looks like it might be an internal bug.  Can you send us a short
> program that demonstrates this?
>
> Thanks,
> -d
>
> On 01/27/2009 07:25 AM, Dorian Krause wrote:
>
>> Hi List,
>>
>> I'm running an application with mpich2-1.1a2 (Intel compiler) which uses
>> one-sided communication to put data from a contiguous buffer on the origin
>> side into a strided (derived datatype) buffer on the target side. The
>> program runs fine with (let's say) 4 procs on a single machine but fails
>> with
>>
>> Assertion failed in file ch3_progress.c at line 473: pkt->type >= 0 &&
>> pkt->type < MPIDI_NEM_PKT_END
>> internal ABORT - process 3
>>
>>
>> when submitted to the cluster (I suppose it does not use nemesis in the
>> first case?). In ch3_progress.c I can read that "invalid pkt data will
>> result in unpredictable behavior".
>>
>> Can you tell me what that means? What is "pkt data", and to which input
>> from the application does the pkt instance correspond?
>>
>>
>> Thanks,
>> Dorian
>>     
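
To make the access pattern in the quoted message concrete, here is a
minimal sketch in plain C of the element mapping that a put from a
contiguous origin buffer into a strided target buffer is expected to
produce. It makes no MPI calls. It assumes a layout of the kind that
MPI_Type_vector describes (blocks of blocklen elements whose starts are
stride elements apart); the datatype actually used in the application is
not shown in this thread and may differ. The names strided_index and
put_strided are hypothetical helpers, not part of MPICH2 or of the
application.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (assumption, not an MPICH2 function):
 * returns the index in the strided target buffer at which element i
 * of the contiguous origin buffer lands, for blocks of `blocklen`
 * elements whose starts are `stride` elements apart. */
static size_t strided_index(size_t i, size_t blocklen, size_t stride)
{
    assert(blocklen > 0 && stride >= blocklen);
    return (i / blocklen) * stride + (i % blocklen);
}

/* Hypothetical helper: copies n contiguous origin elements into the
 * strided target layout, i.e. a local reference for what the target
 * buffer should contain after the put has completed. The caller must
 * make sure `target` is large enough for the last index written. */
static void put_strided(const double *origin, size_t n,
                        double *target, size_t blocklen, size_t stride)
{
    for (size_t i = 0; i < n; ++i)
        target[strided_index(i, blocklen, stride)] = origin[i];
}
```

A reference like this could be used in a small test program to compare
the target window contents after the one-sided put against the expected
layout, both on a single machine and across machines.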


