<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.6000.16825" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>I have a farming code in which an MPI program
spawns a number of jobs; these jobs then run a sequence of other MPI
applications, which fail in MPI_Init. I am using MPICH2 v1.08 under Windows XP.
The MPI program (FARMER) spawns two executables (JOB) using MPI_Comm_spawn.
Each JOB then runs another MPI application (APP) using CreateProcess.
mpiexec is used to start all the MPI executables.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>APP crashes (times out) in its call
to MPI_Init with the following message:</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>MPIDU_Sock_post_connect failed.<BR>
Fatal error in MPI_Init: Other MPI error, error stack:<BR>
MPIR_Init_thread(294): Initialization failed<BR>
MPID_Init(83)........: channel initialization failed<BR>
MPID_Init(334).......: PMI_Init returned -1<BR>
[0] PMI_ConnectToHost failed: unable to post a connect to Yellowtail
yellowtail :1061, error: Unknown error class, error stack:<BR>
MPIDU_Sock_post_connect(1228): unable to connect to Yellowtail yellowtail
on port 1061, exhausted all endpoints (errno -1)<BR>
MPIDU_Sock_post_connect(1275): unable to connect to on port 1061, No
connection could be made because the target machine actively refused it.
(errno 10061)<BR>
MPIDU_Sock_post_connect(1275): unable to connect to yellowtail on port 1061,
No connection could be made because the target machine actively refused it.
(errno 10061)<BR>
MPIDU_Sock_post_connect(1275): unable to connect to Yellowtail on port 1061,
No connection could be made because the target machine actively refused it.
(errno 10061)<BR>
uPMI_ConnectToHost returning PMI_FAIL<BR>
[0] PMI_Init failed.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I think the crash is caused by MPI_Init in
APP trying to synchronise with MPI_Comm_spawn in FARMER. From the farming
code's point of view there is no need for this, because
APP is independent of both FARMER and JOB. The farming works OK if I
replace APP with a version that doesn't use MPI.</FONT></DIV>
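<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>One way to test this idea would be to start APP
with an environment that no longer carries whatever settings JOB received from
the process manager. The Python sketch below is only an illustration and rests
on assumptions that are not confirmed anywhere in this post: that those
settings are passed in environment variables whose names start with PMI_, and
that APP inherits them through CreateProcess. The helper names
strip_pmi_variables, build_app_command and launch_app are hypothetical.</FONT></DIV>
<PRE>
```python
import os
import subprocess

# ASSUMPTION: the process manager hands its connection details to JOB in
# environment variables whose names start with "PMI_". This prefix is a
# guess and should be checked against the real environment of JOB.
PMI_PREFIX = "PMI_"


def strip_pmi_variables(env, prefix=PMI_PREFIX):
    """Return a copy of env without variables whose name starts with prefix.

    Windows treats variable names case-insensitively, so the comparison
    is done in upper case.
    """
    return {
        name: value
        for name, value in env.items()
        if not name.upper().startswith(prefix.upper())
    }


def build_app_command(app_path, nprocs=1):
    """Build the command line JOB would use to start APP through mpiexec."""
    return ["mpiexec", "-n", str(nprocs), app_path]


def launch_app(app_path, nprocs=1):
    """Start APP with an environment that has the PMI_ variables removed."""
    clean_env = strip_pmi_variables(dict(os.environ))
    return subprocess.call(build_app_command(app_path, nprocs), env=clean_env)


if __name__ == "__main__":
    # Demonstration with a made-up environment; nothing is launched here.
    inherited = {"PATH": r"C:\Windows", "PMI_RANK": "0", "pmi_kvs": "abc"}
    print(sorted(strip_pmi_variables(inherited)))
    print(build_app_command("APP.exe", 2))
```
</PRE>
<DIV><FONT face=Arial size=2>If APP starts normally with the filtered
environment, that would point to inherited settings rather than a real need to
synchronise with FARMER. In the real JOB the same filtering would apply to the
lpEnvironment argument of CreateProcess.</FONT></DIV>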
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Thanks for your help,</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Rod Cook</FONT></DIV>
<DIV> </DIV></BODY></HTML>