[MPICH] -nolocal switch not working

Reuti reuti at staff.uni-marburg.de
Thu Jul 12 16:35:32 CDT 2007


On 12.07.2007 at 23:24, Rajeev Thakur wrote:

> This might be a bug in MPICH-1. Can you use MPICH2 instead?
>
> Rajeev
>
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Milo
> Sent: Thursday, July 12, 2007 10:43 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] -nolocal switch not working
>
> Hi guys, I’m having a problem with the -nolocal switch. I want my
> cluster headnode not to do any number-crunching, but just to be used
> as an execution node. If I use the -nolocal switch, the job runs
> with only 1 process, no matter how many I specify with -np. Some
> details:
>
> If I have the headnode (SIB) in my machines file, it gets assigned
> process zero, and then mpirun starts cycling through the machines
> file line by line, and allocates another 2 processes to SIB ON TOP
> of process 0:
>
> >>SIB:/mpich/examples sharcnet$ mpirun -np 6 -machinefile machines cpi
>
> Process 0 on sib
Is sib also the output of the command:

hostname

or do you get the FQDN there? - Reuti
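
To make that comparison concrete, here is a minimal sketch that checks whether a machines-file entry names the local host under either its short name or its FQDN. The helper names host_aliases and matches_local are hypothetical, the "host:ncpus" entry form is an assumption about the machines file, and nothing here claims to reproduce how mpirun itself compares names.

```python
import socket


def host_aliases():
    """Return lowercase names this machine may appear under (short name and FQDN)."""
    name = socket.gethostname().lower()
    fqdn = socket.getfqdn().lower()
    return {name, name.split(".")[0], fqdn}


def matches_local(entry, aliases=None):
    """Return True if a machines-file entry appears to name this host.

    Assumption: an entry is "host" or "host:ncpus". The comparison ignores
    case and also accepts a match on the first label of a dotted name.
    """
    if aliases is None:
        aliases = host_aliases()
    name = entry.split(":")[0].strip().lower()
    return name in aliases or name.split(".")[0] in aliases


if __name__ == "__main__":
    print("hostname:", socket.gethostname())
    print("FQDN:    ", socket.getfqdn())
    for entry in ("SIB", "node1", "node2"):
        print(entry, "->", matches_local(entry))
```

If the two printed names differ, it may be worth trying the machines file with each form.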

> Process 3 on node2
>
> Process 2 on node1
>
> Process 5 on node1
>
> Process 1 on sib
>
> Process 4 on sib
>
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
>
> wall clock time = 0.003049
>
>
> If I leave SIB out of the machines file, it doesn’t get assigned the
> 2 additional processes, but still gets process 0, which isn’t just a
> dissemination process; it does real number-crunching as part of the
> job (which I don’t want). If I use the -nolocal switch, I get the
> following output:
>
>         >> mpirun -nolocal -np 4 -machinefile machines cpi
>
> Process 0 on node1
>
> pi is approximately 3.1416009869231254, Error is 0.0000083333333323
>
> wall clock time = 0.000119
>
>
> I tried running it with the -t switch to test only, and under that
> condition, it seems to show me it SHOULD work fine:
>
>         >> mpirun  -t -nolocal -np 4 -machinefile machines cpi
>
> Procgroup file:
>
> node1 0 /mpich/examples/cpi
>
> node2 1 /mpich/examples/cpi
>
> node1 1 /mpich/examples/cpi
>
> node2 1 /mpich/examples/cpi
>
> ssh node1 "/mpich/examples/cpi" -p4pg "/mpich/examples/PI14147"
> -p4wd "/mpich/examples"
>
> Yet from the second console clip, you can see it clearly doesn’t work.
>
> Any ideas? I’ve done a lot of searching and can’t find an answer.
> I am running a Mac cluster with Intel chips and OS X 10.4, MPICH
> version 1.2.7p1. I found a mailing list thread from 2004 with the
> exact same problem on SPARCs and SUSE
> (http://www.beowulf.org/archive/2004-December/011510.html), with no
> solution.
>
> -Milo
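
The rank placement Milo describes can be modelled in a few lines. This is only a sketch of the observed behaviour, not MPICH's implementation: the function name assign_ranks is hypothetical, and the machines file is assumed to list sib, node1, node2 in that order, which is consistent with the output shown but is not stated in the message.

```python
from itertools import cycle


def assign_ranks(local_host, machines, np, nolocal=False):
    """Model of the placement seen in the thread (not MPICH source code).

    Without nolocal, rank 0 goes to the host running mpirun and the
    remaining ranks cycle through the machines list. With nolocal, every
    rank, including rank 0, is taken from the machines list.
    """
    if not machines:
        raise ValueError("machines list must not be empty")
    hosts = cycle(machines)
    placement = [] if nolocal else [local_host]
    while len(placement) < np:
        placement.append(next(hosts))
    return placement[:np]


if __name__ == "__main__":
    # Assumed machines file order: sib, node1, node2
    for rank, host in enumerate(assign_ranks("sib", ["sib", "node1", "node2"], 6)):
        print(f"Process {rank} on {host}")
```

Under these assumptions the model gives ranks 0, 1 and 4 to sib, ranks 2 and 5 to node1, and rank 3 to node2, which matches the first run. With nolocal=True and the list node1, node2 it yields node1, node2, node1, node2 for four processes, the same host order as the procgroup listing printed by -t.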



