<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Problem with -machinefile</TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16414" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=100105517-20042007><FONT face=Arial
color=#0000ff size=2>Are you using the latest release, 1.0.5p4? It includes a fix
for a problem with the -machinefile option, so it might
help.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=100105517-20042007><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=100105517-20042007><FONT face=Arial
color=#0000ff size=2>Rajeev</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=100105517-20042007><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>Blankenship,
David<BR><B>Sent:</B> Friday, April 20, 2007 11:27 AM<BR><B>To:</B>
mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> [MPICH] Problem with
-machinefile<BR></FONT><BR></DIV>
<P><FONT face=Arial size=2>I am having a problem running mpiexec with the
-machinefile option (Red Hat Enterprise Linux 4, 64-bit).</FONT> </P>
<P><FONT face=Arial size=2>When I use the -machinefile option, my application
hangs (deadlocks) when it tries to communicate: the master is sending and the
workers are receiving, but nothing happens. Any thoughts?</FONT></P>
<P><FONT face=Arial size=2>I start my MPD ring as follows:</FONT> </P>
<P><FONT face="Courier New" size=2>> mpdboot -n 3 -f mpd.hosts</FONT>
<BR><FONT face="Courier New" size=2>> cat mpd.hosts</FONT> <BR><FONT
face="Courier New" size=2>pad-lnx52:2</FONT> <BR><FONT face="Courier New"
size=2>noclue:2</FONT> <BR><FONT face="Courier New" size=2>question:4</FONT>
</P>
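<P><FONT face=Arial size=2>Each line of mpd.hosts names a host and, after the
colon, a process count for that host. As a rough illustration of how such a
file maps onto ranks, the hypothetical helpers parse_machinefile and place_ranks
below expand it host by host in file order. That fill order is an assumption,
not mpiexec's actual code, but it reproduces the placement visible in the
hanging trace further down: ranks 0 and 1 on pad-lnx52, 2 and 3 on noclue, and
4 on question.</FONT></P>
<PRE>
```python
def parse_machinefile(text):
    """Return (host, count) pairs from 'host' or 'host:count' lines (hypothetical helper)."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        spec = line.split()[0]  # any tokens after the host spec are ignored here
        host, _, count = spec.partition(":")
        entries.append((host, int(count) if count else 1))
    return entries


def place_ranks(entries, nprocs):
    """Assign ranks 0..nprocs-1 by filling each host's slots in file order.

    Assumption: if nprocs exceeds the total slot count, placement wraps
    around to the first host again.
    """
    slots = [host for host, count in entries for _ in range(count)]
    if not slots:
        raise ValueError("machinefile lists no hosts")
    return [slots[rank % len(slots)] for rank in range(nprocs)]


MPD_HOSTS = """\
pad-lnx52:2
noclue:2
question:4
"""

placement = place_ranks(parse_machinefile(MPD_HOSTS), 5)
for rank, host in enumerate(placement):
    print(f"{rank}: {host}")
```
</PRE>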
<P><FONT face=Arial size=2>I can then run my application successfully with
either of the following command lines: the first places processes with the
-host option, and the second lets the MPD ring choose the hosts.</FONT></P>
<P><FONT face="Courier New" size=2>> mpiexec -l -n 1 -host pad-lnx52
lithorun dev/LithoWare/Samples/FEM1D.xml Output.xml : -n 2 -host noclue
lithorun : -n 2 -host question lithorun</FONT></P>
<P><FONT face="Courier New" size=2>> mpiexec -l -n 5 lithorun
dev/LithoWare/Samples/FEM1D.xml Output.xml</FONT> </P>
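<P><FONT face=Arial size=2>The first command pins each group of ranks to a host
with -n/-host segments separated by colons. If it helps to compare the two
launch methods, the hypothetical helpers below (host_segments and mpiexec_argv
are illustrative names, not MPICH tools) turn a per-rank host list into that
same colon-separated form, so the placement that -machinefile produced in the
hanging run could be requested with -host instead. The sketch passes the program
arguments to every segment, as the second command effectively does; whether the
workers actually need them depends on lithorun.</FONT></P>
<PRE>
```python
import shlex


def host_segments(placement):
    """Collapse a per-rank host list into (host, count) runs of consecutive ranks."""
    runs = []
    for host in placement:
        if runs and runs[-1][0] == host:
            runs[-1][1] += 1
        else:
            runs.append([host, 1])
    return [(host, count) for host, count in runs]


def mpiexec_argv(placement, program, args, label=True):
    """Build an MPMD mpiexec argument list that pins each run of ranks with -host."""
    argv = ["mpiexec"]
    if label:
        argv.append("-l")  # rank-labelled output, as in the commands above
    for i, (host, count) in enumerate(host_segments(placement)):
        if i:
            argv.append(":")  # a colon separates MPMD segments
        argv += ["-n", str(count), "-host", host, program, *args]
    return argv


placement = ["pad-lnx52", "pad-lnx52", "noclue", "noclue", "question"]
argv = mpiexec_argv(placement, "lithorun",
                    ["dev/LithoWare/Samples/FEM1D.xml", "Output.xml"])
print(shlex.join(argv))
```
</PRE>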
<P><FONT face=Arial size=2>But when I use the -machinefile option, my
application hangs. The master is sending and all of the workers are receiving,
but no communication actually seems to take place.</FONT></P>
<P><FONT face="Courier New" size=2>> mpiexec -machinefile mpd.hosts -l -n 5
lithorun dev/LithoWare/Samples/FEM1D.xml Output.xml</FONT> </P>
<P><FONT face=Arial size=2>Here is a trace of the run when it hangs. You can
see that the workers have started and are waiting for a work packet in an
MPI::COMM_WORLD.Probe call. The master has divided up the work and is
attempting to send the first packet with an MPI::COMM_WORLD.Send call. After
that, nothing else happens. This occurs only when I use the -machinefile
option.</FONT></P>
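<P><FONT face=Arial size=2>For readers less familiar with the probe-then-receive
pattern described above, the following is a rough in-process model of it. It
uses Python threads and a lock-protected queue in place of MPI (an assumption
made so the sketch runs without an MPI installation), and Mailbox and worker are
hypothetical names. Each worker blocks in probe() until a message is queued,
learns its size, and only then receives it. The master sends the three packets
from the working trace, treating the number in work(N) as a size in bytes (also
an assumption), and then an empty packet, which the sketch treats as a stop
signal because the working trace ends with every worker reporting Received
work(0). In the hanging run, the real workers stay in the equivalent of probe()
because the master's first Send never completes.</FONT></P>
<PRE>
```python
import threading
from collections import deque


class Mailbox:
    """In-process stand-in for one rank's incoming messages (threads, not MPI)."""

    def __init__(self):
        self._messages = deque()
        self._cond = threading.Condition()

    def send(self, source, payload):
        """Queue a message and wake any thread blocked in probe()."""
        with self._cond:
            self._messages.append((source, payload))
            self._cond.notify_all()

    def probe(self):
        """Block until a message is queued; return (source, size) and leave it queued."""
        with self._cond:
            self._cond.wait_for(lambda: bool(self._messages))
            source, payload = self._messages[0]
            return source, len(payload)

    def recv(self):
        """Remove and return the oldest queued message (the one probe() reported)."""
        with self._cond:
            return self._messages.popleft()


def worker(box, received):
    """Probe for the next packet, then receive it; an empty packet means stop.

    The empty-packet stop rule is an assumption based on the "Received work(0)"
    lines that end the working trace.
    """
    while True:
        _, size = box.probe()  # a real worker would size its receive buffer here
        box.recv()
        received.append(size)
        if size == 0:
            return


# Ranks 1-4 are workers; the main thread plays the master (rank 0).
boxes = {rank: Mailbox() for rank in (1, 2, 3, 4)}
received = {rank: [] for rank in boxes}
threads = [threading.Thread(target=worker, args=(boxes[r], received[r])) for r in boxes]
for t in threads:
    t.start()

# Destinations and sizes taken from the working trace, then one stop packet each.
for rank, size in ((4, 1625), (3, 1619), (2, 1622)):
    boxes[rank].send(0, bytes(size))
for rank in boxes:
    boxes[rank].send(0, b"")
for t in threads:
    t.join()

print(received)
```
</PRE>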
<P><FONT face="Courier New" size=2>> mpiexec -machinefile mpd.hosts -l -n 5
lithorun dev/LithoWare/Samples/FEM1D.xml Output.xml</FONT> <BR><FONT
face="Courier New" size=2>3: Worker on noclue</FONT> <BR><FONT
face="Courier New" size=2>3: Waiting for work...</FONT> <BR><FONT
face="Courier New" size=2>2: Worker on noclue</FONT> <BR><FONT
face="Courier New" size=2>2: Waiting for work...</FONT> <BR><FONT
face="Courier New" size=2>1: Worker on pad-lnx52.kla-tencor.com</FONT>
<BR><FONT face="Courier New" size=2>1: Waiting for work...</FONT> <BR><FONT
face="Courier New" size=2>0: Master on pad-lnx52.kla-tencor.com</FONT>
<BR><FONT face="Courier New" size=2>0: Loading
dev/LithoWare/Samples/FEM1D.xml</FONT> <BR><FONT face="Courier New" size=2>0:
Found Factorial(FEM1D)</FONT> <BR><FONT face="Courier New" size=2>4: Worker on
question.kla-tencor.com</FONT> <BR><FONT face="Courier New" size=2>4: Waiting
for work...</FONT> <BR><FONT face="Courier New" size=2>0: Loading
Sample.plt</FONT> <BR><FONT face="Courier New" size=2>0: Distributing
Factorial(FEM1D) with 45 experiments over 4 processes with 3 work
packets</FONT> <BR><FONT face="Courier New" size=2>0: Sending
work(1625)</FONT> </P><BR>
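<P><FONT face=Arial size=2>With five ranks interleaving their output, a long -l
trace is easier to read once it is reduced to each rank's last line plus any
send that never completed. The sketch below is a hypothetical helper, not part
of MPICH; it relies on the "rank: " prefix that -l adds and on the application's
own Sending work(N) and Sent work(N) messages, so it only fits logs like these.
Run on the hanging trace above, it reports rank 0 still at Sending work(1625)
with no matching Sent line, and ranks 1 through 4 all stopped at their Waiting
for work... message.</FONT></P>
<PRE>
```python
import re

# 'mpiexec -l' prefixes each output line with the rank that printed it.
LABELED = re.compile(r"^(\d+): (.*)$")


def labeled_lines(lines):
    """Yield (rank, message) for every line that carries a rank label."""
    for line in lines:
        m = LABELED.match(line.strip())
        if m:
            yield int(m.group(1)), m.group(2)


def last_state(lines):
    """Map each rank to the last message it printed (later lines overwrite earlier)."""
    return dict(labeled_lines(lines))


def unfinished_sends(lines):
    """List 'work(N)' tags whose 'Sending work(N)' has no later 'Sent work(N)'."""
    pending = []
    for _, msg in labeled_lines(lines):
        if msg.startswith("Sending work("):
            pending.append(msg[len("Sending "):])
        elif msg.startswith("Sent work(") and msg[len("Sent "):] in pending:
            pending.remove(msg[len("Sent "):])
    return pending


HANG_TRACE = """\
3: Worker on noclue
3: Waiting for work...
2: Worker on noclue
2: Waiting for work...
1: Worker on pad-lnx52.kla-tencor.com
1: Waiting for work...
0: Master on pad-lnx52.kla-tencor.com
0: Loading dev/LithoWare/Samples/FEM1D.xml
0: Found Factorial(FEM1D)
4: Worker on question.kla-tencor.com
4: Waiting for work...
0: Loading Sample.plt
0: Distributing Factorial(FEM1D) with 45 experiments over 4 processes with 3 work packets
0: Sending work(1625)
""".splitlines()

for rank, msg in sorted(last_state(HANG_TRACE).items()):
    print(f"rank {rank}: {msg}")
print("sends never completed:", unfinished_sends(HANG_TRACE))
```
</PRE>
<P><FONT face=Arial size=2>Applied to the working trace, the same check finds no
unfinished send, since each Sending work(N) line there is followed by a
matching Sent work(N) line.</FONT></P>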
<P><FONT face=Arial size=2>For reference, here is a trace of a run that
works:</FONT> </P>
<P><FONT face="Courier New" size=2>> mpiexec -l -n 5 lithorun
dev/LithoWare/Samples/FEM1D.xml Output.xml</FONT> <BR><FONT face="Courier New"
size=2>0: Master on pad-lnx52.kla-tencor.com</FONT> <BR><FONT
face="Courier New" size=2>0: Loading dev/LithoWare/Samples/FEM1D.xml</FONT>
<BR><FONT face="Courier New" size=2>0: Found Factorial(FEM1D)</FONT> <BR><FONT
face="Courier New" size=2>0: Loading Sample.plt</FONT> <BR><FONT
face="Courier New" size=2>0: Distributing Factorial(FEM1D) with 45 experiments
over 4 processes with 3 work packets</FONT> <BR><FONT face="Courier New"
size=2>0: Sending work(1625)</FONT> <BR><FONT face="Courier New" size=2>1:
Worker on noclue</FONT> <BR><FONT face="Courier New" size=2>1: Waiting for
work...</FONT> <BR><FONT face="Courier New" size=2>2: Worker on noclue</FONT>
<BR><FONT face="Courier New" size=2>2: Waiting for work...</FONT> <BR><FONT
face="Courier New" size=2>3: Worker on question.kla-tencor.com</FONT>
<BR><FONT face="Courier New" size=2>0: Sent work(1625)</FONT> <BR><FONT
face="Courier New" size=2>0: Sending work(1619)</FONT> <BR><FONT
face="Courier New" size=2>3: Waiting for work...</FONT> <BR><FONT
face="Courier New" size=2>4: Worker on question.kla-tencor.com</FONT>
<BR><FONT face="Courier New" size=2>4: Waiting for work...</FONT> <BR><FONT
face="Courier New" size=2>4: Received work(1625)</FONT> <BR><FONT
face="Courier New" size=2>4: Found Factorial(FEM1D)</FONT> <BR><FONT
face="Courier New" size=2>4: Loading Sample.plt</FONT> <BR><FONT
face="Courier New" size=2>0: Sent work(1619)</FONT> <BR><FONT
face="Courier New" size=2>0: Sending work(1622)</FONT> <BR><FONT
face="Courier New" size=2>4: Running Factorial(FEM1D) with 15
experiments</FONT> <BR><FONT face="Courier New" size=2>3: Received
work(1619)</FONT> <BR><FONT face="Courier New" size=2>3: Found
Factorial(FEM1D)</FONT> <BR><FONT face="Courier New" size=2>3: Loading
Sample.plt</FONT> <BR><FONT face="Courier New" size=2>3: Running
Factorial(FEM1D) with 15 experiments</FONT> <BR><FONT face="Courier New"
size=2>0: Sent work(1622)</FONT> <BR><FONT face="Courier New" size=2>0:
Waiting for results...</FONT> <BR><FONT face="Courier New" size=2>2: Received
work(1622)</FONT> <BR><FONT face="Courier New" size=2>2: Found
Factorial(FEM1D)</FONT> <BR><FONT face="Courier New" size=2>2: Loading
Sample.plt</FONT> <BR><FONT face="Courier New" size=2>2: Running
Factorial(FEM1D) with 15 experiments</FONT> <BR><FONT face="Courier New"
size=2>0: Received results(1672)</FONT> <BR><FONT face="Courier New" size=2>0:
Waiting for results...</FONT> <BR><FONT face="Courier New" size=2>4:
Factorial(FEM1D) complete (0.04175)</FONT> <BR><FONT face="Courier New"
size=2>4: Sending results(1672)</FONT> <BR><FONT face="Courier New" size=2>4:
Waiting for work...</FONT> <BR><FONT face="Courier New" size=2>3:
Factorial(FEM1D) complete (0.0420239)</FONT> <BR><FONT face="Courier New"
size=2>0: Received results(1652)</FONT> <BR><FONT face="Courier New" size=2>0:
Waiting for results...</FONT> <BR><FONT face="Courier New" size=2>3: Sending
results(1652)</FONT> <BR><FONT face="Courier New" size=2>3: Waiting for
work...</FONT> <BR><FONT face="Courier New" size=2>2: Factorial(FEM1D)
complete (0.0852771)</FONT> <BR><FONT face="Courier New" size=2>0: Received
results(1400)</FONT> <BR><FONT face="Courier New" size=2>0: Factorial(FEM1D)
complete (0.136751)</FONT> <BR><FONT face="Courier New" size=2>2: Sending
results(1400)</FONT> <BR><FONT face="Courier New" size=2>2: Waiting for
work...</FONT> <BR><FONT face="Courier New" size=2>1: Received work(0)</FONT>
<BR><FONT face="Courier New" size=2>4: Received work(0)</FONT> <BR><FONT
face="Courier New" size=2>3: Received work(0)</FONT> <BR><FONT
face="Courier New" size=2>2: Received work(0)</FONT>
</P></BLOCKQUOTE></BODY></HTML>