<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>-nolocal switch not working</TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16481" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=248032421-12072007><FONT face=Arial
color=#0000ff size=2>This might be a bug in MPICH-1. Can you use MPICH2
instead?</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=248032421-12072007><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=248032421-12072007><FONT face=Arial
color=#0000ff size=2>Rajeev</FONT></SPAN></DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of
</B>Milo<BR><B>Sent:</B> Thursday, July 12, 2007 10:43 AM<BR><B>To:</B>
mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> [MPICH] -nolocal switch not
working<BR></FONT><BR></DIV>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Hi guys, I'm having a problem
with the -nolocal switch. I want my cluster head node not to do any
number-crunching, but just to be used as an execution node. If I use the
-nolocal switch, the job runs as only 1 process, no matter how many I specify
with -np. Some details:</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>If I have the head node (SIB) in
my machines file, it gets assigned process 0, and then mpirun starts cycling
through the machines file line by line and allocates another 2 processes to
SIB on top of process 0:</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>>></FONT></SPAN><SPAN
lang=en-us><FONT face=Calibri>SIB:/mpich/examples sharcnet$ mpirun -np 6
-machinefile machines cpi</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Process 0 on
sib</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Process 3 on
node2</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Process 2 on
node1</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Process 5 on
node1</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Process 1 on
sib</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Process 4 on
sib</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>pi is approximately
3.1416009869231249, Error is 0.0000083333333318</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>wall clock time =
0.003049</FONT></SPAN><SPAN lang=en-us></SPAN></P>
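<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>The placement above is
consistent with rank 0 being pinned to the launching host and the remaining
ranks being dealt out round-robin over the machines file. A minimal Python
sketch of that reading follows. It assumes the machines file lists sib, node1,
node2 in that order (the file itself is not shown here), and the function name
assign_ranks is hypothetical, not part of MPICH:</FONT></SPAN></P>
<PRE>
```python
def assign_ranks(launch_host, machines, np):
    """Return {rank: host} under an ASSUMED round-robin placement.

    Assumption (not confirmed MPICH behaviour): rank 0 stays on the host
    that ran mpirun, and ranks 1..np-1 cycle through the machines file.
    """
    placement = {0: launch_host}
    for rank in range(1, np):
        placement[rank] = machines[(rank - 1) % len(machines)]
    return placement


# Assumed machines file contents, in order; the real file is not shown.
machines = ["sib", "node1", "node2"]
for rank, host in sorted(assign_ranks("sib", machines, 6).items()):
    print("Process", rank, "on", host)
```
</PRE>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Under those assumptions, ranks 1
and 4 land on sib in addition to rank 0, ranks 2 and 5 on node1, and rank 3 on
node2, which matches the cpi output above.</FONT></SPAN></P>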
<P dir=ltr><SPAN lang=en-us></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>If I leave SIB out of the
machines file, it doesn't get assigned the 2 additional processes, but it still
gets process 0, which isn't just a dissemination process: it does real
number-crunching as part of the job (which I don't want). If I use the -nolocal
switch, I get the following output:</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us> <FONT
face=Calibri>>></FONT></SPAN><SPAN lang=en-us><FONT face=Calibri> mpirun
-nolocal -np 4 -machinefile machines cpi</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Process 0 on
node1</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>pi is approximately
3.1416009869231254, Error is 0.0000083333333323</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>wall clock time =
0.000119</FONT></SPAN><SPAN lang=en-us></SPAN></P>
<P dir=ltr><SPAN lang=en-us></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>I tried running it with the -t
switch (test only), and under that condition the output suggests it SHOULD
work fine:</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us> <FONT
face=Calibri>>></FONT></SPAN><SPAN lang=en-us> <FONT
face=Calibri>mpirun -t -nolocal -np 4 -machinefile machines
cpi</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Procgroup
file:</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>node1 0
/mpich/examples/cpi</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>node2 1
/mpich/examples/cpi</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>node1 1
/mpich/examples/cpi</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>node2 1
/mpich/examples/cpi</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>ssh node1
"/mpich/examples/cpi" -p4pg "/mpich/examples/PI14147" -p4wd
"/mpich/examples"</FONT></SPAN><SPAN lang=en-us></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Yet from the second console
clip, you can see it clearly doesn</FONT></SPAN><SPAN lang=en-us><FONT
face=Calibri>’</FONT></SPAN><SPAN lang=en-us><FONT face=Calibri>t
work.</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>Any ideas? I've done a lot of
searching and can't find an answer. I am running a Mac cluster with Intel
chips and OS X 10.4, MPICH version 1.2.7p1. I found a mailing list thread from
2004 describing the exact same problem on SPARCs and SUSE
(</FONT></SPAN><A
href="http://www.beowulf.org/archive/2004-December/011510.html"><SPAN
lang=en-us><U><FONT face=Calibri
color=#0000ff>http://www.beowulf.org/archive/2004-December/011510.html</FONT></U></SPAN></A><SPAN
lang=en-us><FONT face=Calibri>), with no solution.</FONT></SPAN></P>
<P dir=ltr><SPAN lang=en-us><FONT face=Calibri>-Milo</FONT></SPAN><SPAN
lang=en-us></SPAN></P></BLOCKQUOTE></BODY></HTML>