[mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.

Rajeev Thakur thakur at mcs.anl.gov
Fri May 14 17:17:12 CDT 2010


At least the bug in Send/Recv of large messages has been fixed today, so
the data should be sent correctly. The latest source in svn or the
nightly snapshots from today will have the fix. 

MPI_Get_count still returns a negative value because the count field in
status is defined as an int. We will need to change that to a 64-bit value.
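
A minimal sketch of the arithmetic behind the negative count, assuming the
status keeps the received size in bytes in a 32-bit signed field (an
assumption about the implementation, not something stated in this thread).
The helper name wrap_to_int32 is hypothetical, and 4191535190 is only one
byte total that wraps to the count in the quoted output below; the actual
message size is not given here.

```c
#include <stdint.h>

/* Hypothetical helper: the value a 32-bit signed field ends up holding
   when a 64-bit byte total is stored in it (two's-complement wrap).
   Written without out-of-range signed conversions so it is portable. */
static int32_t wrap_to_int32(int64_t nbytes)
{
    uint32_t low = (uint32_t)nbytes;        /* keep only the low 32 bits */
    if (low >= 0x80000000u)                 /* sign bit set: value is negative */
        return (int32_t)((int64_t)low - 4294967296LL);
    return (int32_t)low;
}

/* Example: wrap_to_int32(4191535190LL) yields -103432106, while the same
   total held in an int64_t stays 4191535190. */
```

Any total of 2^31 bytes or more wraps this way, which is why widening the
field to 64 bits removes the problem for message sizes of this order.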

Rajeev
 
> MPICH2 does not actually send the message, as you can see by running
> the attached code.
> 
>   # MPICH2 1.2.1, incorrect cols[1]
>   [1] receiving...
>   [0] sending...
>   [1] count -103432106, cols[0] 1
> 
> 
> How much memory does crush have (you need about 7GB to do this without
> swapping)?  In particular, most of the time it took Open MPI to send the
> message (with your source) was actually just spent faulting the
> send/recv buffers.  The attached code faults the buffers first, and the
> subsequent send/recv takes less than 2 seconds.
> 
> Actually, it's clear that MPICH2 never touches either buffer because it
> returns immediately regardless of whether they have been faulted first.
> 
> Jed
> 
> 
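
The attachment mentioned in the quoted message is not reproduced here. As a
rough illustration of what "faulting the buffers first" can look like, here is
a sketch that writes one byte per page so the operating system maps every page
before the timed send/recv. The function name prefault and the page_size
parameter are hypothetical; a real program would typically query the page size
from the system rather than pass a constant.

```c
#include <stddef.h>

/* Hypothetical helper: touch one byte in every page of buf so that each
   page is mapped before the buffer is used for communication.
   It overwrites the first byte of each page with 0, so call it before
   filling the buffer with data. Returns the number of pages touched. */
static size_t prefault(volatile unsigned char *buf, size_t len, size_t page_size)
{
    size_t touched = 0;
    if (page_size == 0)
        return 0;                           /* avoid an endless loop */
    for (size_t off = 0; off < len; off += page_size) {
        buf[off] = 0;                       /* the write triggers the page fault */
        touched++;
    }
    return touched;
}
```

Calling this on both the send and the receive buffer before starting a timer
separates page-fault cost from transfer cost, which is the distinction the
quoted message draws.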


