[mpich-discuss] MPI_Init problem

Rod Cook rod at cookies.demon.co.uk
Fri May 8 11:02:52 CDT 2009


This problem has been resolved. This is an old message that has been stuck somewhere in the ether for the past few weeks.

Rod Cook.
  ----- Original Message ----- 
  From: Rod Cook 
  To: mpich-discuss at mcs.anl.gov 
  Sent: Monday, April 20, 2009 3:16 PM
  Subject: [mpich-discuss] MPI_Init problem


  I have a farming code in which an MPI program spawns a number of jobs; these jobs then run a sequence of other MPI applications, which fail in MPI_Init. I am using MPICH2 v1.08 under Windows XP. The MPI program (FARMER) spawns two executables (JOB) using MPI_Comm_spawn. These executables (JOB) then run (using CreateProcess) another MPI application (APP). mpiexec is used to start all the MPI executables.

  APP crashes (times out) in its call to MPI_Init with the following message:

  MPIDU_Sock_post_connect failed.
  Fatal error in MPI_Init: Other MPI error, error stack:
  MPIR_Init_thread(294): Initialization failed
  MPID_Init(83)........: channel initialization failed
  MPID_Init(334).......: PMI_Init returned -1[0] PMI_ConnectToHost failed: unable to post a connect to Yellowtail yellowtail :1061, error: Unknown error class, error stack:
  MPIDU_Sock_post_connect(1228): unable to connect to Yellowtail yellowtail on port 1061, exhausted all endpoints (errno -1)
  MPIDU_Sock_post_connect(1275): unable to connect to                 on port 1061, No connection could be made because the target machine actively refused it. (errno 10061)
  MPIDU_Sock_post_connect(1275): unable to connect to yellowtail on port 1061, No connection could be made because the target machine actively refused it. (errno 10061)
  MPIDU_Sock_post_connect(1275): unable to connect to Yellowtail on port 1061, No connection could be made because the target machine actively refused it. (errno 10061)
  uPMI_ConnectToHost returning PMI_FAIL
  [0] PMI_Init failed.


  I think the crash is caused by MPI_Init in APP trying to synchronise with MPI_Comm_spawn in FARMER. There is no need for this from the farming code's point of view, because APP is independent of both FARMER and JOB. The farming works OK if I replace APP with a version that doesn't use MPI.
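  A minimal sketch of one possible workaround, in C. It rests on an assumption that this thread does not confirm (the thread never says how the problem was resolved): that APP inherits from JOB the PMI_* environment variables the MPICH2 process manager sets for spawned processes, and that this is what makes APP's MPI_Init try to reach a PMI service on port 1061. Under that assumption, JOB would launch APP with a copy of its environment that omits those variables. The helper names filter_env and pack_env_block are hypothetical, and the "PMI_" prefix is an assumed naming pattern.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Returns a newly allocated, NULL-terminated copy of
   `env` (an array of "NAME=value" strings) without the entries whose text
   starts with `prefix` (assumed here to be "PMI_"). The strings themselves
   are shared with `env`, so the caller frees only the returned array. */
char **filter_env(char *const *env, const char *prefix) {
    size_t n = 0, kept = 0, plen = strlen(prefix);
    while (env[n] != NULL)
        n++;
    char **out = malloc((n + 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (strncmp(env[i], prefix, plen) != 0)
            out[kept++] = env[i];
    }
    out[kept] = NULL;
    return out;
}

/* Hypothetical helper. Packs a NULL-terminated "NAME=value" array into one
   buffer of NUL-terminated strings followed by an extra NUL, the layout
   that CreateProcess expects for an ANSI lpEnvironment block.
   The caller frees the returned buffer. */
char *pack_env_block(char *const *env) {
    size_t total = 1; /* room for the final terminating NUL */
    for (size_t i = 0; env[i] != NULL; i++)
        total += strlen(env[i]) + 1;
    char *block = malloc(total);
    if (block == NULL)
        return NULL;
    char *p = block;
    for (size_t i = 0; env[i] != NULL; i++) {
        size_t len = strlen(env[i]) + 1; /* include each string's NUL */
        memcpy(p, env[i], len);
        p += len;
    }
    *p = '\0';
    return block;
}
```

  In JOB, the filtered array (built, for example, from the C runtime's _environ) would be packed with pack_env_block and the result passed as the lpEnvironment argument of CreateProcess. Windows treats environment variable names case-insensitively, whereas this sketch matches the prefix case-sensitively.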


  Thanks for your help,

  Rod Cook

