[mpich-discuss] modifying the round-robin

Gus Correa gus at ldeo.columbia.edu
Thu Dec 11 17:11:13 CST 2008


Hello Benjamin, list

My suggestion is to install a job queueing and scheduling system on
your cluster, which takes care of resource availability for you,
directs jobs to idle nodes when they become available, and allows
further control of who / where / when the jobs run.
I've been using Torque/PBS and the Maui job scheduler, which are
excellent and free on Linux:
http://www.clusterresources.com/pages/products.php

There are also RPMs for some Linux distributions.
If you run jobs on a routine basis, or if there are many users, it is
worth the effort, even for a small cluster.
You don't have to baby-sit the jobs.
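
To give a flavor of it, here is a minimal sketch of a Torque/PBS job
script for one of your quad-core nodes. The job name, walltime, and
program name (my_mpi_program) are placeholders, and whether mpiexec
can consume $PBS_NODEFILE directly as a machinefile depends on how
your MPICH2 launcher is set up:

#!/bin/sh
# Request 4 cores on a single node; name and limits are placeholders.
#PBS -N mpi_test
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:10:00
#PBS -j oe

# Run from the directory the job was submitted from.
cd $PBS_O_WORKDIR

# $PBS_NODEFILE lists the slots Torque granted to this job, so all
# four ranks land on whichever node the scheduler picked.
mpiexec -machinefile $PBS_NODEFILE -n 4 ./my_mpi_program

You submit it with "qsub myscript.pbs", and Torque/Maui pick an idle
node for you.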

A more complex alternative is the Sun Grid Engine (SGE):
http://gridengine.sunsource.net/

I hope this helps.

Gus Correa

---------------------------------------------------------------------
Gustavo Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Ralph Butler wrote:
> With the correct set of cmd-line options, you can accomplish lots of
> scenarios, but it may make for some long cmd lines. I have a set of
> machines named bp400-bp415 in an mpd ring. I can use the -machinefile
> option to mpiexec to map processes to hosts. For example, below is a
> demo where I want to run 4 processes, with the first 2 on bp402 and
> the next 2 on bp404.
> --ralph
>
> (bp400:56)% cat tempmf
> bp402:2
> bp404:2
> (bp400:57)% mpiexec -l -machinefile tempmf -n 4 hostname | sort
> 0: bp402
> 1: bp402
> 2: bp404
> 3: bp404
>
>
> On Thu, Dec 11, 2008, at 6:41 AM, Benjamin Svetitsky wrote:
>
>> Thanks, the -1 option indeed gets all 4 processes to run on nodeB. 
>> But then if I start another -n 4 job, it goes to nodeB as well. Is 
>> there a way to get mpd to do load balancing here (namely to send all 
>> 4 to the next node in line) without specifying the node in the 
>> mpiexec command?
>>
>> Ben
>>
>> Rajeev Thakur wrote:
>>> Since you are running from node C, MPD will place the first process
>>> on node C by default. You can turn that feature off with the "-1"
>>> option to mpiexec.
>>> Rajeev
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Benjamin 
>>>> Svetitsky
>>>> Sent: Wednesday, December 10, 2008 2:41 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [mpich-discuss] modifying the round-robin
>>>>
>>>> Greetings,
>>>>
>>>> I am running MPICH2 on a cluster of four quad-core machines under 
>>>> Linux. If I run a job such as
>>>>
>>>> mpiexec -l -n 4 hostname
>>>>
>>>> then one process runs on each node, whereas I would prefer that all 
>>>> four run on the same node. I tried modifying mpd.hosts to read:
>>>>
>>>> nodeA:4
>>>> nodeB:4
>>>> nodeC:4
>>>> nodeD:4
>>>>
>>>> but the result is not what I expected:
>>>>
>>>> nodeC% mpiexec -l -n 4 hostname
>>>> 0: nodeC
>>>> 3: nodeB
>>>> 2: nodeB
>>>> 1: nodeB
>>>>
>>>> How can I get the mpd to fill the hosts one by one reliably?
>>>>
>>>> Incidentally, the :4 option is not documented in the Installation 
>>>> Guide. I picked it up in the gutter. If it doesn't do this, what 
>>>> DOES it do?
>>>>
>>>> Thanks,
>>>> Ben
>>>> -- 
>>>> Prof. Benjamin Svetitsky Phone: +972-3-640 8870
>>>> School of Physics and Astronomy Fax: +972-3-640 7932
>>>> Tel Aviv University E-mail: bqs at julian.tau.ac.il
>>>> 69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs
>>>>
>>
>> -- 
>> Prof. Benjamin Svetitsky Phone: +972-3-640 8870
>> School of Physics and Astronomy Fax: +972-3-640 7932
>> Tel Aviv University E-mail: bqs at julian.tau.ac.il
>> 69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs
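
P.S.: until a scheduler is in place, Ralph's -machinefile demo above
already covers the original request. A one-line machinefile pins all
four ranks to a single node, with no change to mpd.hosts (nodeB below
is just an example target; any host in the mpd ring would do):

nodeC% cat onenode
nodeB:4
nodeC% mpiexec -l -machinefile onenode -n 4 hostname | sort

This should report nodeB for ranks 0-3, just as the bp402/bp404 demo
does for Ralph's machines. It does not give you automatic load
balancing across nodes, though; that is exactly what the queueing
systems above are for.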



