<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=WINDOWS-1255">
<TITLE>How do I get the communicator of the spawned group in the spawnee?</TITLE>
<META content="MSHTML 6.00.2715.400" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=062395504-05072005>Hello
Rajeev,</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=062395504-05072005>It's interesting that if I send signal -9 to any one
of the children after calling disconnect on the comworld of the parent, all the
children die gracefully and the parent remains alive, so I can restart the
children. I tried this many times and didn't see any problems, so it seems the
infrastructure to handle this kind of thing is already in place. Is MPICH an
open source project? I mean, if I changed the code to be "fault tolerant" in
this situation, would you consider adding the changes to the code base? Is
there a way to allow this kind of behavior within the bounds of the "official"
2.0 standard? The two behaviors I "need" are:</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=062395504-05072005></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=062395504-05072005>1)
Ability to kill and restart all the children without affecting the parent (this
is in case a child goes into a near infinite loop on an
algorithm).</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=062395504-05072005>2)
That if one child dies all the children will die without affecting the
parent.</SPAN></FONT></DIV>
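<DIV><FONT face=Arial color=#0000ff size=2>The two requirements above can be
sketched at the operating-system level, outside MPI. This is only a minimal
illustration in Python using the standard subprocess module. It is not MPICH's
mechanism and not something the MPI standard provides. The names spawn_group,
kill_group and supervise_once, and the sleeping WORKER_CODE, are hypothetical
stand-ins for the real user code.</FONT></DIV>
<PRE>
```python
import subprocess
import sys

# Hypothetical stand-in for user code: each worker simply sleeps.
WORKER_CODE = "import time; time.sleep(60)"


def spawn_group(n):
    """Start n independent worker processes and return their handles."""
    return [subprocess.Popen([sys.executable, "-c", WORKER_CODE])
            for _ in range(n)]


def kill_group(group):
    """Requirement 1: kill every worker; the parent is not affected."""
    for proc in group:
        if proc.poll() is None:  # still running
            proc.kill()          # SIGKILL on POSIX, TerminateProcess on Windows
    for proc in group:
        proc.wait()              # reap the process so no zombie is left


def supervise_once(group, n):
    """Requirement 2: if any worker has died, kill the rest and respawn.

    Returns (group, restarted).
    """
    if any(proc.poll() is not None for proc in group):
        kill_group(group)
        return spawn_group(n), True
    return group, False


if __name__ == "__main__":
    group = spawn_group(3)
    group[0].kill()   # simulate "kill -9" on one child
    group[0].wait()
    group, restarted = supervise_once(group, 3)
    print("restarted:", restarted)
    print("workers alive:", sum(p.poll() is None for p in group))
    kill_group(group)  # clean up; the parent exits normally
```
</PRE>
<DIV><FONT face=Arial color=#0000ff size=2>In this sketch the parent would call
supervise_once periodically. Because the workers are ordinary processes rather
than members of an MPI job shared with the parent, the death of one worker
cannot take the parent down, but the workers also lose MPI communication with
the parent unless that is set up separately.</FONT></DIV>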
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=062395504-05072005></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=062395504-05072005>Since
our application runs user code that is not under our control, these are
"essential" features for us. Unfortunately, Windows compatibility is another
"essential" feature, so we are somewhat limited in our choice of MPI
implementations.
</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=062395504-05072005></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=062395504-05072005>Regards,</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=062395504-05072005>David</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Rajeev Thakur
[mailto:thakur@mcs.anl.gov]<BR><B>Sent:</B> Monday, July 04, 2005 6:58
PM<BR><B>To:</B> David Minor<BR><B>Subject:</B> RE: [MPICH] How do I get the
communicator of the spawned group in the spawnee?<BR><BR></FONT></DIV>
<DIV dir=ltr align=left><SPAN class=804452515-04072005><FONT face=Arial
color=#0000ff size=2>David,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=804452515-04072005><FONT face=Arial
color=#0000ff size=2> The
communicator passed to MPI_Abort must be a valid communicator on the process
calling MPI_Abort. Therefore, you cannot abort only the child. However, a
child could die on its own</FONT></SPAN><SPAN class=804452515-04072005><FONT
face=Arial color=#0000ff size=2>, and one would like this case to be
handled gracefully, without taking down the parent. This is up to the
implementation to handle. A "fault tolerant" implementation will try to do
this. MPICH2 doesn't support it yet, but we hope to do it sometime in the
future.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=804452515-04072005><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=804452515-04072005><FONT face=Arial
color=#0000ff size=2>Rajeev</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=804452515-04072005><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> David Minor
[mailto:david-m@orbotech.com] <BR><B>Sent:</B> Monday, July 04, 2005 12:11
AM<BR><B>To:</B> 'Rajeev Thakur'<BR><B>Subject:</B> RE: [MPICH] How do I get
the communicator of the spawned group in the spawnee?<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV><SPAN class=285140705-04072005><FONT face=Arial color=#0000ff
size=2>Hello Rajeev,</FONT></SPAN></DIV>
<DIV><SPAN class=285140705-04072005><FONT face=Arial color=#0000ff
    size=2>Using the intercommunicator I can communicate with the spawned
    processes, but I cannot call an abort on them without aborting the parent. I
    would like the spawned processes to be able to crash, or be aborted, without
    crashing the parent process, which could then spawn them again. I thought
    that if the parent process could get a communicator containing only the
    spawned processes, I'd be able to do this.</FONT></SPAN></DIV>
<DIV><SPAN class=285140705-04072005><FONT face=Arial color=#0000ff
size=2>David</FONT></SPAN></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Rajeev Thakur
[mailto:thakur@mcs.anl.gov]<BR><B>Sent:</B> Sunday, July 03, 2005 6:48
PM<BR><B>To:</B> David Minor; mpich-discuss@mcs.anl.gov<BR><B>Subject:</B>
RE: [MPICH] How do I get the communicator of the spawned group in the
spawnee?<BR><BR></FONT></DIV>
<DIV dir=ltr align=left><SPAN class=505214415-03072005><FONT face=Arial
color=#0000ff size=2>The intercommunicator returned by MPI_Comm_spawn is
the one you are looking for.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=505214415-03072005><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=505214415-03072005><FONT face=Arial
      color=#0000ff size=2>MPI_Comm_get_parent on the spawned processes returns
      an intercommunicator that has the spawned processes in one group and the
      parent processes in the other group. MPI_Comm_spawn on the parent
      (spawning) processes returns the same intercommunicator, which can be used
      for communication with the spawned processes.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=505214415-03072005><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=505214415-03072005><FONT face=Arial
color=#0000ff size=2>Rajeev</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=505214415-03072005><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>David
Minor<BR><B>Sent:</B> Sunday, July 03, 2005 9:15 AM<BR><B>To:</B>
mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> [MPICH] How do I get the
communicator of the spawned group in the spawnee?<BR></FONT><BR></DIV>
<DIV></DIV>
<P><FONT face=Arial size=2>Hello List,</FONT> </P>
        <P><FONT face=Arial size=2>intercomm.Get_parent() called from the spawned
        processes gives me the communicator of the spawnee, but how do I get
        the communicator of the spawned processes from the spawnee?
        intercomm.Get_remote_group() gives me the group, but how do I get the
        communicator?</FONT></P>
<P><FONT face=Arial size=2>Thanks,</FONT> <BR><FONT face=Arial
size=2>David Minor</FONT>
</P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>