[mpich-discuss] Isend Irecv error

Kenneth Leiter kenneth.leiter at gmail.com
Fri Jun 15 09:35:05 CDT 2012


Hello,

I am stumped by a problem: my code fails when I use a large number of
processors.  I have produced a standalone code that demonstrates the
error.  I don't see the error with the other MPI implementations
available to me (Intel MPI and Open MPI).  I am using mpich-1.4.1p1.

The test code sends a buffer to, and receives a buffer from, every
other task.  I realize that I should write this as a collective
operation (like Bcast), but in my real code I only communicate with a
few neighbor tasks and must use point-to-point operations.  This test
code demonstrates the same problem I see in my real code.
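
In outline, the exchange in the test code looks roughly like this (a
sketch only; the buffer size, datatype, and tag below are placeholders,
not necessarily the values used in the attached file):

    // Sketch of mpichTest.cxx: every rank exchanges a buffer with every
    // other rank via nonblocking point-to-point calls, then waits on
    // all of the requests at once.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int bufSize = 10000;   // placeholder message size
        std::vector<double> sendBuf(bufSize, rank);
        std::vector<std::vector<double> > recvBuf(size,
            std::vector<double>(bufSize));

        std::vector<MPI_Request> requests;
        requests.reserve(2 * (size - 1));

        // Post a receive from every other rank, then the matching sends.
        for (int i = 0; i < size; ++i) {
            if (i == rank) continue;
            MPI_Request req;
            MPI_Irecv(&recvBuf[i][0], bufSize, MPI_DOUBLE, i, 0,
                      MPI_COMM_WORLD, &req);
            requests.push_back(req);
        }
        for (int i = 0; i < size; ++i) {
            if (i == rank) continue;
            MPI_Request req;
            MPI_Isend(&sendBuf[0], bufSize, MPI_DOUBLE, i, 0,
                      MPI_COMM_WORLD, &req);
            requests.push_back(req);
        }

        // Wait for all sends and receives to complete.
        std::vector<MPI_Status> statuses(requests.size());
        MPI_Waitall(static_cast<int>(requests.size()),
                    &requests[0], &statuses[0]);

        if (rank == 0)
            std::printf("exchange finished on %d ranks\n", size);

        MPI_Finalize();
        return 0;
    }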

On my machine, everything works fine up to 128 processors (the machine
has 24 cores per node), but fails at 256 processors.  With the other
MPI implementations I can get to 1500 processors with no problem.  I
have seen the same behavior on two different machines.

I get an error in MPI_Waitall:

Fatal error in PMPI_Waitall: See the MPI_ERROR field in MPI_Status for
the error code

When I examine the MPI_Status I get:

Task ID | Error code
--------+-----------
    230 |          0
    231 |          0
    232 |          0
    233 |          0
    234 |          0
    235 |          0
    236 |  604005647
    237 |         18
    238 |         18
    239 |         18
    240 |         18
    241 |         18
    242 |         18
    243 |         18
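
For completeness, this is roughly how I pull out those per-request
error codes (a sketch; it assumes the requests/statuses vectors from
the outline above, and that the communicator's error handler is set to
MPI_ERRORS_RETURN so MPI_Waitall returns instead of aborting):

    // Let MPI_Waitall return an error code instead of aborting, then
    // dump the MPI_ERROR field of each status.
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int err = MPI_Waitall(static_cast<int>(requests.size()),
                          &requests[0], &statuses[0]);
    if (err != MPI_SUCCESS) {
        for (std::size_t i = 0; i < statuses.size(); ++i)
            std::printf("rank %d, request %d: error code %d\n",
                        rank, static_cast<int>(i),
                        statuses[i].MPI_ERROR);
    }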

I have attached the test code to this message.

Thanks,
Ken Leiter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpichTest.cxx
Type: application/octet-stream
Size: 1984 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120615/b6eca271/attachment.obj>
