[mpich-discuss] Error in MPI_PROBE

Rajeev Thakur thakur at mcs.anl.gov
Fri May 4 11:17:13 CDT 2012


Make sure you are compiling the program with the mpicc from MPICH2 1.4.1, not some other mpicc by mistake. Give the full path if necessary. Likewise for mpiexec.
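
One way to check this, as a minimal sketch: the Python script below is not part of the original message, and the function names resolve_mpi_tools and same_install are made up for illustration. It reports which mpicc and mpiexec come first on PATH and whether they sit in the same directory. Sharing a directory is only a heuristic for belonging to the same MPI installation.

```python
import os
import shutil


def resolve_mpi_tools(names=("mpicc", "mpiexec"), path=None):
    """Map each tool name to the full path found first on PATH (or None)."""
    return {name: shutil.which(name, path=path) for name in names}


def same_install(resolved):
    """True only if every tool was found and all live in one directory.

    Assumption: tools from one MPI installation share a bin directory.
    """
    paths = list(resolved.values())
    if not paths or any(p is None for p in paths):
        return False
    dirs = {os.path.dirname(os.path.realpath(p)) for p in paths}
    return len(dirs) == 1


if __name__ == "__main__":
    found = resolve_mpi_tools()
    for name, full_path in found.items():
        print(f"{name}: {full_path or 'not found on PATH'}")
    if not same_install(found):
        print("warning: mpicc and mpiexec do not resolve to the same directory")
```

If the reported paths are not the ones from the MPICH2 installation you expect, invoke both tools by their full paths instead.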

Rajeev

On May 4, 2012, at 9:39 AM, Luiz Carlos da Costa Junior wrote:

> Hi all,
> 
> Recently, I implemented a very naive protocol to check for error messages before actually receiving the message. To accomplish this, I used the MPI_PROBE function.
> I have tested and run my program successfully several times, but I got the following error:
> 	• Fatal error in MPI_Probe: Invalid communicator, error stack:
> 	• MPI_Probe(113): MPI_Probe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0x8185c80) failed
> 	• MPI_Probe(85).: Invalid communicator
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xffc98664, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..................: MPI_Send(buf=0xfff71364, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPIDI_EagerContigShortSend(262): failure occurred while attempting to send an eager message
> 	• MPIDI_CH3_iStartMsg(36)........: Communication error with rank 0
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xfffc8964, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xffdb4c64, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xff9c2f64, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xfffe1d64, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xffd5e464, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xff9ae864, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xff8df4e4, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xffc94fe4, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xff9caae4, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused
> 	• Fatal error in MPI_Send: Other MPI error, error stack:
> 	• MPI_Send(173)..............: MPI_Send(buf=0xfffe7be4, count=1, MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
> 	• MPID_nem_tcp_connpoll(1826): Communication error with rank 0: Connection refused 
> a) Line 1 says that the communicator is invalid, but line 2 shows that MPI_PROBE recognized the communicator as MPI_COMM_WORLD. Under what conditions can this happen?
> b) What do the numbers in parentheses right after the MPI function names in the error messages mean (MPI_Probe(113), for example)?
> 
> I re-ran the case and it worked perfectly.
> 
> I am using MPICH2 Version: 1.4.1p1.
> 
> Thanks in advance,
> Luiz
> 


