[MPICH] -nolocal switch not working

Milo stinger at rogers.com
Thu Jul 12 10:43:03 CDT 2007


Hi guys, I'm having a problem with the -nolocal switch. I don't want my cluster
headnode to do any number-crunching; it should just be used as an execution
node. If I use the -nolocal switch, the job runs as only 1 process, no
matter how many I specify with -np. Some details:

If I have the headnode (SIB) in my machines file, it gets assigned process
zero, and then mpirun starts cycling through the machines file line by line
and allocates another 2 processes to SIB on top of process 0:
>>SIB:/mpich/examples sharcnet$ mpirun -np 6 -machinefile machines cpi
Process 0 on sib
Process 3 on node2
Process 2 on node1
Process 5 on node1
Process 1 on sib
Process 4 on sib
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.003049
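
The placement above is consistent with a simple rule: rank 0 goes to the host that runs mpirun, and the remaining ranks are dealt out round-robin from the machines file. Below is a minimal Python sketch of that rule, not mpirun's actual code. The function name assign_hosts is hypothetical, and the machines file contents (sib, node1, node2, in that order) are an assumption inferred from the output.

```python
from itertools import cycle


def assign_hosts(local_host, machines, num_procs):
    """Model of the observed placement: rank 0 on the launching host,
    remaining ranks cycling through the machines file in order."""
    if num_procs < 1:
        return []
    placement = [local_host]      # rank 0 always lands on the local host
    hosts = cycle(machines)       # wrap around at the end of the file
    for _ in range(num_procs - 1):
        placement.append(next(hosts))
    return placement


# Assumed machines file contents and order (not shown in the post).
machines = ["sib", "node1", "node2"]
placement = assign_hosts("sib", machines, 6)
for rank, host in enumerate(placement):
    print(f"Process {rank} on {host}")
```

With these assumptions the sketch puts ranks 0, 1 and 4 on sib, ranks 2 and 5 on node1, and rank 3 on node2, which matches the hosts reported by cpi (the order in which the lines are printed by the real job differs).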

If I leave SIB out of the machines file, it doesn't get assigned the 2
additional processes, but it still gets process 0, which isn't just a
dissemination process; it does real number-crunching as part of the job
(which I don't want). If I use the -nolocal switch, I get the following
output:
>> mpirun -nolocal -np 4 -machinefile machines cpi
Process 0 on node1
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
wall clock time = 0.000119

I tried running it with the -t switch to test only, and under that
condition, it seems to show me it SHOULD work fine:
>> mpirun -t -nolocal -np 4 -machinefile machines cpi
Procgroup file:
node1 0 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
node1 1 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
ssh node1 "/mpich/examples/cpi"  -p4pg "/mpich/examples/PI14147" -p4wd
"/mpich/examples"
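
To check that the procgroup file really does describe 4 processes, the counts can be added up per host. Below is a minimal Python sketch that does this. It assumes the usual ch_p4 procgroup convention, in which the number on the first line is the count of additional processes on the master host beyond the master itself, and the number on each later line is the count of processes to start on that host. The function name expected_processes is hypothetical.

```python
# Procgroup file as printed by "mpirun -t" above.
PROCGROUP = """\
node1 0 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
node1 1 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
"""


def expected_processes(procgroup_text):
    """Return a dict of host -> number of processes the file describes."""
    counts = {}
    first_entry = True
    for line in procgroup_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        host, count = fields[0], int(fields[1])
        if first_entry:
            # Assumption: the first entry is the master, which always
            # contributes one process plus "count" additional ones.
            count += 1
            first_entry = False
        counts[host] = counts.get(host, 0) + count
    return counts


per_host = expected_processes(PROCGROUP)
total = sum(per_host.values())
print(per_host, total)
```

Under that assumption the file describes 2 processes on node1 and 2 on node2, 4 in total, which agrees with -np 4 and makes the single "Process 0 on node1" line in the real run the unexpected part.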

Yet from the second console clip, you can see it clearly doesn't work.
Any ideas? I've done a lot of searching and can't find an answer. I am
running a Mac cluster with Intel chips and OS X 10.4, MPICH version 1.2.7p1.
I found a mailing list thread from 2004 with the exact same problem on
SPARCs and SUSE (http://www.beowulf.org/archive/2004-December/011510.html),
with no solution.

-Milo