[mpich-discuss] Error in MPI_PROBE

Luiz Carlos da Costa Junior lcjunior at ufrj.br
Fri May 4 09:39:36 CDT 2012


Hi all,

Recently, I implemented a very naive protocol to check for error messages
before actually receiving a message. To accomplish this, I used the
MPI_PROBE function.
I have tested and run my program successfully several times, but on one
run I got the following error:


   1. Fatal error in MPI_Probe: Invalid communicator, error stack:
   2. MPI_Probe(113): MPI_Probe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG,
   MPI_COMM_WORLD, status=0x8185c80) failed
   3. MPI_Probe(85).: Invalid communicator
   4. Fatal error in MPI_Send: Other MPI error, error stack:
   5. MPI_Send(173)..............: MPI_Send(buf=0xffc98664, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   6. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   7. Fatal error in MPI_Send: Other MPI error, error stack:
   8. MPI_Send(173)..................: MPI_Send(buf=0xfff71364, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   9. MPIDI_EagerContigShortSend(262): failure occurred while attempting to
   send an eager message
   10. MPIDI_CH3_iStartMsg(36)........: Communication error with rank 0
   11. Fatal error in MPI_Send: Other MPI error, error stack:
   12. MPI_Send(173)..............: MPI_Send(buf=0xfffc8964, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   13. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   14. Fatal error in MPI_Send: Other MPI error, error stack:
   15. MPI_Send(173)..............: MPI_Send(buf=0xffdb4c64, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   16. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   17. Fatal error in MPI_Send: Other MPI error, error stack:
   18. MPI_Send(173)..............: MPI_Send(buf=0xff9c2f64, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   19. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   20. Fatal error in MPI_Send: Other MPI error, error stack:
   21. MPI_Send(173)..............: MPI_Send(buf=0xfffe1d64, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   22. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   23. Fatal error in MPI_Send: Other MPI error, error stack:
   24. MPI_Send(173)..............: MPI_Send(buf=0xffd5e464, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   25. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   26. Fatal error in MPI_Send: Other MPI error, error stack:
   27. MPI_Send(173)..............: MPI_Send(buf=0xff9ae864, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   28. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   29. Fatal error in MPI_Send: Other MPI error, error stack:
   30. MPI_Send(173)..............: MPI_Send(buf=0xff8df4e4, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   31. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   32. Fatal error in MPI_Send: Other MPI error, error stack:
   33. MPI_Send(173)..............: MPI_Send(buf=0xffc94fe4, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   34. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   35. Fatal error in MPI_Send: Other MPI error, error stack:
   36. MPI_Send(173)..............: MPI_Send(buf=0xff9caae4, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   37. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused
   38. Fatal error in MPI_Send: Other MPI error, error stack:
   39. MPI_Send(173)..............: MPI_Send(buf=0xfffe7be4, count=1,
   MPI_INTEGER, dest=0, tag=97, MPI_COMM_WORLD) failed
   40. MPID_nem_tcp_connpoll(1826): Communication error with rank 0:
   Connection refused

a) Line 1 says that the communicator is invalid, but line 2 shows that
the MPI_PROBE function recognized the communicator as MPI_COMM_WORLD.
Under what conditions can this happen?
b) What do the numbers in parentheses right after the MPI function names
in the error messages mean (MPI_Probe(113), for example)?

I re-ran the case and it worked perfectly.

I am using MPICH2 Version: 1.4.1p1.

Thanks in advance,
Luiz