[MPICH] -nolocal in MPICH2?
Ralph Butler
rbutler at mtsu.edu
Tue Jul 17 16:39:33 CDT 2007
On Tue Jul 17, at 4:20 PM, Milo wrote:
> Thanks for the tip Ralph. I don't know how I missed that argument :/ I think
> I bypassed it in my skimming subconsciously because I was getting confused
> with the -1 argument to mpdboot. Anyway, I'm curious whether this will
> actually work for my small cluster. What if there is wraparound in the mpd
> ring? For example, say I have 7 hosts in the ring (including the local node
> where mpdboot was executed) and start a job with "-n 40". The "-1" switch
> would definitely stop process 0 from starting on the local node, but what
> about process 6? Since it is the first process in the first wraparound of
> the mpd ring, would it get assigned to my local node, or would -1 handle
> these cases properly and skip the local node in this and subsequent
> wraparounds? I ask because the description explicitly says "1st proc".
> I read a little about the --ncpus=n option for mpdboot, which could be used
> to avoid any kind of wraparound. But that means I'd have to restart mpd with
> a different "n" every time I needed to run a different number of processes
> (e.g. "-n 36" would require --ncpus=6 for my 6 execution nodes, but "-n 30"
> would need --ncpus=5 to avoid wraparound).
>
Yes, wrap-around could be a problem. As I mentioned before, I was not totally
sure what you wanted to do. I note that Rajeev has sent a msg indicating that
the -machinefile option may be more what you need. The 1.0.5p4 version of
mpiexec has a bug in that option that has been fixed in CVS and should be in
the next release. It may or may not affect you.
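A rough model of the wraparound question, as a minimal Python sketch. It assumes, in line with Ralph's reply, that -1 only shifts where rank 0 starts and that later laps around the ring land on the local host like any other host; it also assumes simple one-rank-per-host round-robin placement around the mpd ring. The host names and the functions place_ranks and min_ncpus are hypothetical illustrations, not part of MPICH2.

```python
import math


def place_ranks(hosts, nprocs, skip_first=False):
    """Round-robin rank placement around an mpd ring (simplified model).

    Assumption: -1 (skip_first=True) only moves the *starting* host one
    step along the ring; later laps still wrap around to hosts[0].
    """
    start = 1 if skip_first else 0
    return [hosts[(start + rank) % len(hosts)] for rank in range(nprocs)]


def min_ncpus(nprocs, compute_nodes):
    """Smallest per-host --ncpus value that fits nprocs on compute_nodes
    without wrapping back around the ring."""
    return math.ceil(nprocs / compute_nodes)


if __name__ == "__main__":
    # Hypothetical ring: "local" is the host where mpdboot/mpiexec run,
    # followed by 6 compute nodes.
    ring = ["local"] + [f"node{i}" for i in range(1, 7)]
    with_skip = place_ranks(ring, 40, skip_first=True)
    print(with_skip[0], with_skip[6])           # node1 local
    print(with_skip.count("local"))             # 5
    print(min_ncpus(36, 6), min_ncpus(30, 6))   # 6 5
```

Under these assumptions, rank 6 wraps back onto the local host, and 5 of the 40 ranks end up there even with -1, which is why a per-job --ncpus value or a machinefile listing only the compute hosts comes up as the alternative.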
> Anyway, I just thought that having execution nodes that are not assigned
> jobs was common practice in production clusters, and thus that configuring
> a specific node to act as an execution host wouldn't be overly complicated
> (restarting my mpd ring for every job seems undesirable).
>
> -Milo
>
>
> -----Original Message-----
> From: Ralph Butler [mailto:rbutler at mtsu.edu]
> Sent: Tuesday, July 17, 2007 4:31 PM
> To: Milo
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] -nolocal in MPICH2?
>
> I am not totally clear on what you want to do. Using the -h option to
> mpiexec shows this option:
> -1 # override default of trying 1st proc locally
> Below is a demo where I did NOT use -1 on the first run, and the first
> process runs locally, i.e. on bp400. The second run uses -1 and causes all
> processes to run on subsequent hosts in the mpd ring. This may be
> sufficient for what you need.
>
> --ralph
>
> (bp400:55) % mpiexec -n 4 hostname
> bp400
> bp403
> bp416
> bp413
> (bp400:56) % mpiexec -1 -n 4 hostname
> bp403
> bp413
> bp416
> bp414
>
> On Tue Jul 17, at 2:24 PM, Milo wrote:
>
>> Hi guys. After some tweaking, I got the code I needed to compile
>> (and link properly) with MPICH2 (1.0.5p4) under OS X 10.4. All my
>> initial ring tests worked, and everything seems to be working just
>> fine, except that apparently neither mpiexec nor mpirun has the
>> -nolocal switch anymore. What do I need to do to configure the node
>> I'm launching jobs from to be strictly an execution node? I wouldn't
>> mind if this node was used as the ssh dissemination point; I just
>> don't want it to actually do any work on the job being launched.
>>
>> -Milo
>>
>