[mpich-discuss] MPI_Init problem

Jayesh Krishna jayesh at mcs.anl.gov
Fri May 8 11:25:22 CDT 2009


OK. Let us know if you need any help.
 
Regards,
Jayesh

  _____  

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rod Cook
Sent: Friday, May 08, 2009 11:03 AM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Init problem


This problem has been resolved. This is an old message that has been stuck
somewhere in the ether for the past few weeks.
 
Rod Cook.

----- Original Message ----- 
From: Rod Cook <mailto:rod at cookies.demon.co.uk>
To: mpich-discuss at mcs.anl.gov 
Sent: Monday, April 20, 2009 3:16 PM
Subject: [mpich-discuss] MPI_Init problem

I have a farming code in which an MPI program spawns a number of jobs. These
jobs then run a sequence of other MPI applications, which fail in MPI_Init.
I am using MPICH2 v1.0.8 under Windows XP. The MPI program (FARMER) spawns
two executables (JOB) using MPI_Comm_spawn. Each JOB then runs another MPI
application (APP) using CreateProcess. mpiexec is used to start all the MPI
executables.
 
APP crashes (times out) in its call to MPI_Init with the following
message:
 
MPIDU_Sock_post_connect failed.
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294): Initialization failed
MPID_Init(83)........: channel initialization failed
MPID_Init(334).......: PMI_Init returned -1[0] PMI_ConnectToHost failed: unable to post a connect to Yellowtail yellowtail :1061, error: Unknown error class, error stack:
MPIDU_Sock_post_connect(1228): unable to connect to Yellowtail yellowtail on port 1061, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1275): unable to connect to                 on port 1061, No connection could be made because the target machine actively refused it. (errno 10061)
MPIDU_Sock_post_connect(1275): unable to connect to yellowtail on port 1061, No connection could be made because the target machine actively refused it. (errno 10061)
MPIDU_Sock_post_connect(1275): unable to connect to Yellowtail on port 1061, No connection could be made because the target machine actively refused it. (errno 10061)
uPMI_ConnectToHost returning PMI_FAIL
[0] PMI_Init failed.
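
For reading the error stack: errno 10061 is the Winsock code WSAECONNREFUSED, meaning nothing was accepting connections on the target host and port. A minimal lookup sketch in C follows. The function name winsock_error_name is hypothetical and the table covers only a few hand-picked connection codes, not the full Winsock list.

```c
#include <stdio.h>

/* Hypothetical helper: map a few connection-related Winsock error codes
   to their symbolic names. Deliberately incomplete. */
static const char *winsock_error_name(int code)
{
    switch (code) {
    case 10060: return "WSAETIMEDOUT";     /* connection attempt timed out */
    case 10061: return "WSAECONNREFUSED";  /* target actively refused it   */
    case 10065: return "WSAEHOSTUNREACH";  /* no route to host             */
    default:    return "(not in this table)";
    }
}

/* Print one line describing the code, e.g. for annotating a log. */
static void print_winsock_error(int code)
{
    printf("errno %d = %s\n", code, winsock_error_name(code));
}
```

Calling print_winsock_error(10061) labels the code from the stack above as a refused connection rather than a timeout.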

 
I think the crash is caused by MPI_Init in APP trying to synchronise with
MPI_Comm_spawn in FARMER. There is no need for this from the farming code's
point of view, because APP is independent of both FARMER and JOB. The
farming works OK if I replace APP with a version that doesn't use MPI.
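
This thread does not record how the problem was eventually resolved. One thing worth checking in a setup like this, offered purely as an assumption, is the environment that JOB hands to its child. MPICH process managers pass connection details to the processes they start through environment variables whose names begin with PMI_, and a child created with CreateProcess inherits the parent's environment unless a different one is supplied. The sketch below, in portable C, filters such entries out of a NULL-terminated environment array. The function names are hypothetical, and converting the result into the block that CreateProcess expects is left out.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if an entry of the form "NAME=value" has a name starting with
   "PMI_", else 0. The comparison is case-sensitive; Windows variable names
   are case-insensitive, so a real check may need to fold case. */
static int is_pmi_entry(const char *entry)
{
    return strncmp(entry, "PMI_", 4) == 0;
}

/* Copy pointers from the NULL-terminated array env into out, skipping
   PMI_ entries. out must have room for as many pointers as env holds,
   including the terminating NULL. Returns the number of entries kept. */
static size_t strip_pmi_entries(char *const env[], const char *out[])
{
    size_t kept = 0;
    for (size_t i = 0; env[i] != NULL; i++) {
        if (!is_pmi_entry(env[i])) {
            out[kept++] = env[i];
        }
    }
    out[kept] = NULL;
    return kept;
}
```

If APP starts correctly when launched with the filtered environment, that would point to inherited process-manager settings rather than to MPI_Comm_spawn itself.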
 
 
Thanks for your help,
 
Rod Cook
 
