[MPICH] -nolocal switch not working

Rajeev Thakur thakur at mcs.anl.gov
Thu Jul 12 16:24:17 CDT 2007


This might be a bug in MPICH-1. Can you use MPICH2 instead?
 
Rajeev


  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Milo
Sent: Thursday, July 12, 2007 10:43 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] -nolocal switch not working



Hi guys, I'm having a problem with the -nolocal switch. I want my cluster
headnode not to do any number-crunching, but just to be used as an execution
node. If I use the -nolocal switch, the job runs as only 1 process, no
matter how many I specify with -np. Some details:

If I have the headnode (SIB) in my machines file, it gets assigned process
zero, and then mpirun starts cycling through the machines file line by line
and allocates another 2 processes to SIB ON TOP of process 0:

>>SIB:/mpich/examples sharcnet$ mpirun -np 6 -machinefile machines cpi
Process 0 on sib
Process 3 on node2
Process 2 on node1
Process 5 on node1
Process 1 on sib
Process 4 on sib
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.003049
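
The placement this output suggests can be sketched in a few lines of Python. This is only a model inferred from the output above, not MPICH's actual code: it assumes the machines file lists sib, node1, node2 in that order (the file itself is not shown here), and the function name place_ranks is made up for illustration.

```python
from itertools import cycle


def place_ranks(np, machines, local_host=None):
    """Return one host name per rank (hypothetical model, not MPICH source).

    If local_host is given, rank 0 is pinned to it and the remaining ranks
    cycle through the machines list from the top. Otherwise every rank,
    including rank 0, is taken from the cycle.
    """
    if not machines:
        raise ValueError("machines list must not be empty")
    hosts = cycle(machines)
    placement = [] if local_host is None else [local_host]
    while len(placement) < np:
        placement.append(next(hosts))
    return placement[:np]


# Assumed machines file order: sib, node1, node2 (not shown in the post).
for rank, host in enumerate(place_ranks(6, ["sib", "node1", "node2"], local_host="sib")):
    print(f"Process {rank} on {host}")
```

Under those assumptions the model gives sib ranks 0, 1 and 4, node1 ranks 2 and 5, and node2 rank 3, which is the same assignment as in the run above.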



If I leave SIB out of the machines file, it doesn't get assigned the 2
additional processes, but it still gets process 0, which isn't just a
dissemination process; it does real number-crunching as part of the job
(which I don't want). If I use the -nolocal switch, I get the following
output:

        >> mpirun -nolocal -np 4 -machinefile machines cpi
Process 0 on node1
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
wall clock time = 0.000119



I tried running it with the -t switch to test only, and under that
condition, it seems to show me it SHOULD work fine:

        >> mpirun  -t -nolocal -np 4 -machinefile machines cpi
Procgroup file:
node1 0 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
node1 1 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
ssh node1 "/mpich/examples/cpi"  -p4pg "/mpich/examples/PI14147" -p4wd "/mpich/examples"

Yet from the second console clip, you can see it clearly doesn't work.
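
As a sanity check on the -t output, here is a minimal Python sketch that adds up the processes the procgroup listing describes. It rests on my reading of the p4 procgroup format, which should be treated as an assumption: each line is "host count program", and the count on the first line is the number of processes on that host in addition to the master started there. The helper name processes_per_host is made up for illustration.

```python
from collections import Counter

# Procgroup lines copied from the "mpirun -t" output above.
PROCGROUP = """\
node1 0 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
node1 1 /mpich/examples/cpi
node2 1 /mpich/examples/cpi
"""


def processes_per_host(procgroup_text):
    """Count processes per host (assumed format: host, count, program)."""
    counts = Counter()
    first = True
    for line in procgroup_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        host, extra = fields[0], int(fields[1])
        if first:
            counts[host] += 1  # assumption: the master runs on the first host
            first = False
        counts[host] += extra
    return counts


counts = processes_per_host(PROCGROUP)
print(dict(counts), "total:", sum(counts.values()))
```

Read this way, the listing describes 2 processes on node1 and 2 on node2, 4 in total, so the procgroup file matches -np 4 even though the real run only reports Process 0.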

Any ideas? I've done a lot of searching and can't find an answer. I am
running a Mac cluster with Intel chips and OS X 10.4, MPICH version 1.2.7p1.
I found a mailing list thread from 2004 with the exact same problem on
SPARCs and SUSE
(<http://www.beowulf.org/archive/2004-December/011510.html>), with no
solution.

-Milo
