[mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.

Jed Brown jed at 59A2.org
Sat Feb 6 09:56:01 CST 2010


On Fri, 5 Feb 2010 14:28:40 -0600, Barry Smith <bsmith at mcs.anl.gov> wrote:
> To cheer you up, when I run with openMPI it runs forever sucking down  
> 100% CPU trying to send the messages :-)

On my test box (x86 with 8GB memory), Open MPI (1.4.1) does complete
after several seconds, but still prints the wrong count.

MPICH2 does not actually send the message, as you can see by running the
attached code.

  # Open MPI 1.4.1, correct cols[0]
  [0] sending...
  [1] receiving...
  count -103432106, cols[0] 0

  # MPICH2 1.2.1, incorrect cols[0]
  [1] receiving...
  [0] sending...
  [1] count -103432106, cols[0] 1


How much memory does crush have?  (You need about 7GB to do this without
swapping.)  In particular, most of the time Open MPI took to send the
message (with your source) was spent faulting in the send/recv buffers.
The attached code faults the buffers first, and the subsequent send/recv
then takes less than 2 seconds.

Actually, it's clear that MPICH2 never touches either buffer because it
returns immediately regardless of whether they have been faulted first.

Jed

-------------- next part --------------
A non-text attachment was scrubbed...
Name: b.c
Type: text/x-csrc
Size: 928 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20100206/335f71ff/attachment.c>

