[MPICH] -nolocal in MPICH2?

Ralph Butler rbutler at mtsu.edu
Tue Jul 17 16:39:33 CDT 2007


On Tue Jul 17, at 4:20PM, Milo wrote:

> Thanks for the tip Ralph. I don't know how I missed that argument :/ I
> think I bypassed it in my skimming subconsciously because I was getting
> confused with the -1 argument to mpdboot. Anyway, I'm curious if this
> will actually work for my small cluster. What if there is wraparound in
> the mpd ring? i.e., I have 7 hosts in the ring (including the local node
> where mpdboot was executed), and start a job with "-n 40". The "-1"
> switch would definitely stop process 0 from starting on the local node,
> but what about process 6? This being the first process in the first mpd
> ring wraparound phase, would it get assigned to my local node or would
> the -1 handle these cases properly and skip the local node in this and
> subsequent wraparounds? I ask because the description explicitly says
> "1st proc".
> I read a little about the --ncpus=n option for mpdboot, which could be
> used to avoid any kind of wraparound. But that means I'd have to restart
> mpd with a different "n" every time I needed to run a different number
> of processes (e.g., "-n 36" would require --ncpus=6 for my 6 execution
> nodes, but "-n 30" would need --ncpus=5 to avoid wraparound).
>
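
The --ncpus arithmetic in the quoted message can be checked with a short
sketch. This is only an illustration of the numbers Milo gives, assuming
that processes are spread evenly over the execution hosts; the function
name is hypothetical and nothing here calls mpdboot or mpiexec.

```python
def ncpus_to_avoid_wraparound(num_procs, num_exec_hosts):
    """Smallest per-host slot count that fits num_procs in one pass of the ring.

    Assumption: every execution host is given the same number of slots.
    """
    if num_exec_hosts <= 0:
        raise ValueError("need at least one execution host")
    # Integer ceiling division, avoids floating point.
    return -(-num_procs // num_exec_hosts)


if __name__ == "__main__":
    for n in (36, 30):
        print(f"-n {n} on 6 hosts needs --ncpus={ncpus_to_avoid_wraparound(n, 6)}")
```

Under that assumption, "-n 36" gives 6 and "-n 30" gives 5, matching the
values in the message.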

Yes, wrap-around could be a problem.  As I mentioned before, I was not
totally sure what you wanted to do.  I note that Rajeev has sent a msg
indicating that the -machinefile option may be more what you need.  The
1.0.5p4 version of mpiexec has a bug in that option that has been fixed
in cvs and should be in the next release.  It may or may not affect you.
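
To make the wrap-around concern concrete, here is a minimal sketch of a
simplified placement model. It assumes strict round-robin around the
ring, one process per host per pass, with -1 modeled as starting one
position past the local host. This is not MPD's actual placement code
(real runs may order hosts differently), and the host names and function
name are hypothetical.

```python
def place_ranks(ring, num_procs, skip_local_first=False):
    """Assign ranks to hosts round-robin; ring[0] is the local host.

    skip_local_first models the -1 switch: the first process is not
    tried locally, so placement starts at ring[1]. Nothing in this
    model prevents later passes from landing on ring[0].
    """
    start = 1 if skip_local_first else 0
    return [ring[(start + rank) % len(ring)] for rank in range(num_procs)]


if __name__ == "__main__":
    # 7 hosts in the ring: the local node plus 6 others (hypothetical names).
    ring = ["local"] + [f"node{i}" for i in range(1, 7)]
    placement = place_ranks(ring, 40, skip_local_first=True)
    ranks_on_local = [r for r, host in enumerate(placement) if host == "local"]
    print("ranks placed on the local node:", ranks_on_local)
```

In this model, rank 6 is the first one to land on the local node, which
is the case asked about in the quoted message.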


> Anyway, I just thought that the use of execution nodes not being
> assigned jobs was common practice in production clusters and thus
> thought configuring a specific node to act as an execution host
> wouldn't be overly complicated (restarting my mpd ring for every job
> seems undesirable).
>
> -Milo
>
>
> -----Original Message-----
> From: Ralph Butler [mailto:rbutler at mtsu.edu]
> Sent: Tuesday, July 17, 2007 4:31 PM
> To: Milo
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] -nolocal in MPICH2?
>
> I am not totally clear on what you want to do.  Using the -h option
> to mpiexec shows this option:
>       -1    # override default of trying 1st proc locally
> Below is a demo where I did NOT use -1 on the first run and the first
> process runs locally, i.e. on bp400.
> The second run uses -1 and causes all processes to run on subsequent
> hosts in the mpd ring.  This may be sufficient for what you need.
>
> --ralph
>
> (bp400:55) % mpiexec -n 4 hostname
> bp400
> bp403
> bp416
> bp413
> (bp400:56) % mpiexec -1 -n 4 hostname
> bp403
> bp413
> bp416
> bp414
>
> On Tue Jul 17, at 2:24PM, Milo wrote:
>
>> Hi guys. After some tweaking, I got the code I needed to compile
>> (and link properly) with MPICH2 (1.0.5p4) under OSX 10.4.  All my
>> initial ring tests worked, and everything seems to be working just
>> fine. Except apparently neither mpiexec nor mpirun has the -nolocal
>> switch anymore. What do I need to do to configure the node I'm
>> launching jobs from to be strictly an execution node? I wouldn't mind
>> if this node was used as the ssh dissemination point, I just don't
>> want it to actually do any work on the job being launched.
>>
>> -Milo
>>
>



