[mpich-discuss] Isend Irecv error

Darius Buntinas buntinas at mcs.anl.gov
Fri Jun 15 14:36:29 CDT 2012


I spoke too soon.  The connection retry code is already in 1.4.1p1, so this might be something else.  Let's try upping the retries to something ridiculous.  In the file
    src/mpid/ch3/channels/nemesis/nemesis/netmod/tcp/tcp_impl.h
on line 29, change the value of MPIDI_NEM_TCP_MAX_CONNECT_RETRIES to something like 10000.
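As a sketch (assuming the limit is a plain #define there; the surrounding code in your copy of tcp_impl.h may differ), the edited line would look like:

    /* tcp_impl.h, around line 29: raise the TCP connect retry
     * limit to a ridiculously large value for testing */
    #define MPIDI_NEM_TCP_MAX_CONNECT_RETRIES 10000

Then do: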

    make clean
    make
    make install

and then recompile your program, run it again, and see whether anything changes.

-d



On Jun 15, 2012, at 9:35 AM, Kenneth Leiter wrote:

> Hello,
> 
> I am stumped by a problem in which my code fails when I use
> a large number of processors.  I have produced a standalone code to
> demonstrate the error.  I don't see the error with the other MPI
> implementations available to me (Intel MPI and Open MPI).  I
> am using mpich-1.4.1p1.
> 
> The test code sends and receives a buffer from all other tasks.  I
> realize that I should write this as a collective operation (like
> Bcast), but in my real code I only communicate to a few neighbor tasks
> and must use point-to-point operations.  This test code demonstrates
> the same problem I see in my real code.
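> 
> A minimal sketch of that pattern (just a sketch, not the actual
> attached mpichTest.cxx; the buffer size and datatype are assumptions):
> 
>     #include <mpi.h>
>     #include <stdio.h>
>     #include <stdlib.h>
> 
>     int main(int argc, char **argv)
>     {
>         int rank, size, i, nreq = 0, err;
>         int count = 1024;   /* assumed buffer size */
> 
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         MPI_Comm_size(MPI_COMM_WORLD, &size);
> 
>         /* return errors instead of aborting, so the MPI_ERROR
>          * field of each status can be examined afterward */
>         MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
> 
>         double *sbuf = malloc(count * sizeof(double));
>         double *rbuf = malloc((size_t)size * count * sizeof(double));
>         MPI_Request *req = malloc(2 * size * sizeof(MPI_Request));
>         MPI_Status *st = malloc(2 * size * sizeof(MPI_Status));
> 
>         for (i = 0; i < count; i++)
>             sbuf[i] = (double)rank;
> 
>         /* post a receive from and a send to every other rank */
>         for (i = 0; i < size; i++) {
>             if (i == rank)
>                 continue;
>             MPI_Irecv(rbuf + (size_t)i * count, count, MPI_DOUBLE,
>                       i, 0, MPI_COMM_WORLD, &req[nreq++]);
>             MPI_Isend(sbuf, count, MPI_DOUBLE,
>                       i, 0, MPI_COMM_WORLD, &req[nreq++]);
>         }
> 
>         /* wait for everything; on failure, report each
>          * request's individual error code */
>         err = MPI_Waitall(nreq, req, st);
>         if (err != MPI_SUCCESS)
>             for (i = 0; i < nreq; i++)
>                 printf("rank %d request %d: error %d\n",
>                        rank, i, st[i].MPI_ERROR);
> 
>         free(sbuf); free(rbuf); free(req); free(st);
>         MPI_Finalize();
>         return 0;
>     }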
> 
> On my machine, everything works fine up to 128 processors (I have 24
> cores per node on the machine), but fails at 256 processors.  Using
> other mpi implementations I can get to 1500 processors with no
> problem.  I have seen the same behavior on two different machines.
> 
> I get an error in MPI_Waitall:
> 
> Fatal error in PMPI_Waitall: See the MPI_ERROR field in MPI_Status for
> the error code
> 
> When I examine the MPI_Status I get:
> 
> Task ID | Error code
> 
> 230     | 0
> 231     | 0
> 232     | 0
> 233     | 0
> 234     | 0
> 235     | 0
> 236     | 604005647
> 237     | 18
> 238     | 18
> 239     | 18
> 240     | 18
> 241     | 18
> 242     | 18
> 243     | 18
> 
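> A nonzero code can be translated into text with MPI_Error_string;
> extending the sketch above (again an assumption, not the attached
> code):
> 
>     /* decode the error in status i from the MPI_Waitall above */
>     char msg[MPI_MAX_ERROR_STRING];
>     int len;
>     if (st[i].MPI_ERROR != MPI_SUCCESS) {
>         MPI_Error_string(st[i].MPI_ERROR, msg, &len);
>         printf("rank %d request %d: %s\n", rank, i, msg);
>     }
> 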
> I have attached the test code to this message.
> 
> Thanks,
> Ken Leiter
> <mpichTest.cxx>


