[MPICH] -nolocal switch not working
Reuti
reuti at staff.uni-marburg.de
Thu Jul 12 16:35:32 CDT 2007
On 12.07.2007 at 23:24, Rajeev Thakur wrote:
> This might be a bug in MPICH-1. Can you use MPICH2 instead?
>
> Rajeev
>
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Milo
> Sent: Thursday, July 12, 2007 10:43 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] -nolocal switch not working
>
> Hi guys, I’m having a problem with the -nolocal switch. I want my
> cluster headnode not to do any number-crunching, but just to be used
> as an execution node. If I use the -nolocal switch, the job runs
> with only 1 process, no matter how many I specify with -np. Some
> details:
>
> If I have the headnode (SIB) in my machines file, it gets assigned
> process zero, and then mpirun starts cycling through the machines
> file line by line and allocates another 2 processes to SIB ON TOP
> of process 0:
>
> >>SIB:/mpich/examples sharcnet$ mpirun -np 6 -machinefile machines cpi
>
> Process 0 on sib
Is "sib" also the output of the command:
hostname
or do you get the FQDN there? - Reuti
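
The hostname question can be checked with a few lines of Python. This is only a minimal sketch: it assumes that a mismatch between the short name and the FQDN is what matters, which nothing in this thread confirms. The helper names and the example name "sib.example.org" are hypothetical.

```python
import socket


def short_name(name: str) -> str:
    """Return the part before the first dot, lower-cased."""
    return name.split(".", 1)[0].lower()


def is_fqdn(name: str) -> bool:
    """Treat any name containing a dot as fully qualified (a simplification)."""
    return "." in name


def same_host(a: str, b: str) -> bool:
    """Compare two names by their short form only."""
    return short_name(a) == short_name(b)


def describe_local() -> str:
    """Report what the OS returns as the hostname and as the FQDN."""
    host = socket.gethostname()
    fqdn = socket.getfqdn()
    return f"hostname={host!r} fqdn={fqdn!r} hostname_is_fqdn={is_fqdn(host)}"


if __name__ == "__main__":
    print(describe_local())
```

If the hostname comes back fully qualified while the machines file lists short names (or the other way round), that difference is worth mentioning when reporting the problem.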
> Process 3 on node2
>
> Process 2 on node1
>
> Process 5 on node1
>
> Process 1 on sib
>
> Process 4 on sib
>
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
>
> wall clock time = 0.003049
>
>
> If I leave SIB out of the machines file, it doesn’t get assigned the
> 2 additional processes, but it still gets process 0, which isn’t just a
> dissemination process; it does real number-crunching as part of the
> job (which I don’t want). If I use the -nolocal switch, I get the
> following output:
>
> >> mpirun -nolocal -np 4 -machinefile machines cpi
>
> Process 0 on node1
>
> pi is approximately 3.1416009869231254, Error is 0.0000083333333323
>
> wall clock time = 0.000119
>
>
> I tried running it with the -t switch to test only, and under that
> condition, the output suggests it SHOULD work fine:
>
> >> mpirun -t -nolocal -np 4 -machinefile machines cpi
>
> Procgroup file:
>
> node1 0 /mpich/examples/cpi
>
> node2 1 /mpich/examples/cpi
>
> node1 1 /mpich/examples/cpi
>
> node2 1 /mpich/examples/cpi
>
> ssh node1 "/mpich/examples/cpi" -p4pg "/mpich/examples/PI14147" -p4wd "/mpich/examples"
>
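
The placement pattern suggested by the first run and by the procgroup listing above can be modelled in a few lines of Python. This is a sketch of the observed behaviour, not MPICH's actual implementation: the function name is hypothetical, and the machines file contents (sib, node1, node2 for the first run; node1, node2 for the -nolocal test) are inferred from the output rather than shown in the post.

```python
from itertools import cycle


def place_ranks(np_count, machines, local_host, nolocal=False):
    """Model where each rank would land.

    Without nolocal, rank 0 goes to the local host and the remaining
    ranks cycle through the machines list from its first line.
    With nolocal, every rank (including rank 0) cycles through the list.
    """
    if not machines:
        raise ValueError("machines list must not be empty")
    hosts = cycle(machines)
    placement = []
    for rank in range(np_count):
        if rank == 0 and not nolocal:
            placement.append(local_host)
        else:
            placement.append(next(hosts))
    return placement


if __name__ == "__main__":
    # Assumed machines file for the first run: sib, node1, node2
    print(place_ranks(6, ["sib", "node1", "node2"], "sib"))
    # Assumed machines file for the -t -nolocal test: node1, node2
    print(place_ranks(4, ["node1", "node2"], "sib", nolocal=True))
```

Under these assumptions the first call reproduces the rank-to-host mapping printed by the -np 6 run, and the second reproduces the host order in the procgroup listing. The actual -nolocal run starts only the first of those four processes.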
> Yet from the second console clip, you can see it clearly doesn’t work.
>
> Any idea? I’ve done a lot of searching, and can’t find an answer.
> I am running a Mac cluster with Intel chips and OS X 10.4, MPICH
> version 1.2.7p1. I found a mailing list thread from 2004 with the
> exact same problem on SPARCs and SUSE
> (http://www.beowulf.org/archive/2004-December/011510.html), with no solution.
>
> -Milo