[MPICH] -nolocal switch not working
Rajeev Thakur
thakur at mcs.anl.gov
Thu Jul 12 16:24:17 CDT 2007
This might be a bug in MPICH-1. Can you use MPICH2 instead?
Rajeev
_____
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Milo
Sent: Thursday, July 12, 2007 10:43 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] -nolocal switch not working
Hi guys, I'm having a problem with the -nolocal switch. I want my cluster
headnode not to do any number-crunching, but just to be used as an execution
node. If I use the -nolocal switch, the job runs with only 1 process, no
matter how many I specify with -np. Some details:
If I have the headnode (SIB) in my machines file, it gets assigned process
zero, and then mpirun starts cycling through the machines file line by line
and allocates another 2 processes to SIB ON TOP of process 0:
>>SIB:/mpich/examples sharcnet$ mpirun -np 6 -machinefile machines cpi
Process 0 on sib
Process 3 on node2
Process 2 on node1
Process 5 on node1
Process 1 on sib
Process 4 on sib
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.003049
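The placement seen in this run can be reproduced with a small model. This is only a sketch of the behaviour observed above, not mpirun's actual code: it assumes the machines file lists sib, node1 and node2 in that order (the file itself is not shown), and the function name assign_ranks is made up for illustration.

```python
from itertools import cycle


def assign_ranks(local_host, machines, np):
    """Model of the observed placement: rank 0 goes to the host where
    mpirun was started, the remaining ranks cycle through the machines
    file line by line. Hypothetical helper, not part of MPICH."""
    placement = [local_host]
    hosts = cycle(machines)
    while len(placement) < np:
        placement.append(next(hosts))
    return placement[:np]


# Assumed contents of the machines file (not shown in the original post).
machines = ["sib", "node1", "node2"]

for rank, host in enumerate(assign_ranks("sib", machines, 6)):
    print(f"Process {rank} on {host}")
```

Under that assumption the model puts ranks 0, 1 and 4 on sib, ranks 2 and 5 on node1, and rank 3 on node2, which is the same assignment as in the output above (the console lines just arrive in a different order).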
If I leave SIB out of the machines file, it doesn't get assigned the 2
additional processes, but it still gets process 0, which isn't just a
dissemination process; it does real number-crunching as part of the job
(which I don't want). If I use the -nolocal switch, I get the following
output:
>> mpirun -nolocal -np 4 -machinefile machines cpi
Process 0 on node1
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
wall clock time = 0.000119
I tried running it with the -t switch, which only tests and does not actually
run the job, and under that condition it seems to show that it SHOULD work fine:
>> mpirun -t -nolocal -np 4 -machinefile machines cpi
Procgroup file:
node1 0 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
node1 1 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
ssh node1 "/mpich/examples/cpi" -p4pg "/mpich/examples/PI14147" -p4wd
"/mpich/examples"
Yet from the second console clip, you can see it clearly doesn't work.
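To make the mismatch concrete, the procgroup file printed by -t can be tallied by hand or with a short script. The sketch below rests on an assumption about the format that the post does not spell out: each line is read as "host count program", and the count on the first line is taken to mean extra processes in addition to the one master process started on that host. The function names are made up for illustration.

```python
# Procgroup lines copied from the -t output above.
PROCGROUP = """\
node1 0 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
node1 1 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
"""


def parse_procgroup(text):
    """Split each non-blank line into (host, count, program)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        host, count, program = fields[:3]
        entries.append((host, int(count), program))
    return entries


def processes_per_host(entries):
    """Count processes per host. Assumption: the first entry's host also
    runs the master process on top of its listed count."""
    counts = {}
    for index, (host, count, _program) in enumerate(entries):
        master = 1 if index == 0 else 0
        counts[host] = counts.get(host, 0) + count + master
    return counts


per_host = processes_per_host(parse_procgroup(PROCGROUP))
print(per_host, "total:", sum(per_host.values()))
```

Read that way, the file describes 2 processes on node1 and 2 on node2, 4 in total, which agrees with -np 4 and makes the single "Process 0 on node1" line in the real run all the more puzzling.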
Any ideas? I've done a lot of searching and can't find an answer. I am
running a Mac cluster with Intel chips and OS X 10.4, MPICH version 1.2.7p1.
I found a mailing list thread from 2004 with the exact same problem on
SPARCs and SUSE
(http://www.beowulf.org/archive/2004-December/011510.html), with no solution.
-Milo