[mpich-discuss] modifying the round-robin

Gus Correa gus at ldeo.columbia.edu
Thu Dec 11 17:11:13 CST 2008

Hello Benjamin, list

My suggestion is to install a job queueing and scheduling system on your cluster,
which takes care of resource availability for you,
directs jobs to idle nodes when they become available,
and allows further control of who / where / when the jobs run.
I've been using Torque/PBS, and the Maui job scheduler,
which are excellent and free on Linux:

There are also RPMs for some Linux distributions.
If you run jobs on a routine basis, or if there are many users on this cluster,
it is worth the effort, even for a small cluster.
You don't have to baby-sit the jobs.
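To illustrate the Torque/PBS route, here is a minimal sketch of a job script that asks the scheduler for 4 cores on a single node. The job name, walltime and ./my_program are placeholders. The script also assumes that mpiexec on the compute nodes can start processes on whatever node Torque assigns (with MPD, that means an mpd ring already running there). Because Torque hands each job a node with free cores, a second 4-process job would normally go to another idle node instead of piling onto the first one.

```bash
#!/bin/bash
# Hypothetical Torque/PBS job script; submit it with: qsub job.pbs
#PBS -N my_mpi_job
#PBS -l nodes=1:ppn=4
#PBS -l walltime=01:00:00
#PBS -j oe

# Torque starts the job in $HOME; move to the directory qsub was run from.
cd "$PBS_O_WORKDIR"

# $PBS_NODEFILE lists the allocated cores, one hostname per line,
# so all 4 ranks are placed on the node the scheduler picked.
# ./my_program is a placeholder for your MPI executable.
mpiexec -machinefile "$PBS_NODEFILE" -n 4 ./my_program
```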

A more complex tool is the Sun Grid Engine (SGE).

I hope this helps.

Gus Correa

Gustavo Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA

Ralph Butler wrote:
> With the correct set of command-line options, you can accomplish lots of
> scenarios, but it may make for some long command lines. I have a set of
> machines named bp400-bp415 in an mpd ring. I can use the -machinefile
> option to mpiexec to map processes to hosts. For example, below is a
> demo where I want to run 4 processes, with the first 2 on bp402 and
> the next 2 on bp404.
> --ralph
> (bp400:56)% cat tempmf
> bp402:2
> bp404:2
> (bp400:57)% mpiexec -l -machinefile tempmf -n 4 hostname | sort
> 0: bp402
> 1: bp402
> 2: bp404
> 3: bp404
> On Thu, Dec 11, 2008, at 6:41 AM, Benjamin Svetitsky wrote:
>> Thanks, the -1 option indeed gets all 4 processes to run on nodeB. 
>> But then if I start another -n 4 job, it goes to nodeB as well. Is 
>> there a way to get mpd to do load balancing here (namely to send all 
>> 4 to the next node in line) without specifying the node in the 
>> mpiexec command?
>> Ben
>> Rajeev Thakur wrote:
>>> Since you are running from node C, MPD will place the first process on
>>> node C by default. You can turn that feature off with the "-1" option
>>> to mpiexec.
>>> Rajeev
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Benjamin 
>>>> Svetitsky
>>>> Sent: Wednesday, December 10, 2008 2:41 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [mpich-discuss] modifying the round-robin
>>>> Greetings,
>>>> I am running MPICH2 on a cluster of four quad-core machines under 
>>>> Linux. If I run a job such as
>>>> mpiexec -l -n 4 hostname
>>>> then one process runs on each node, whereas I would prefer that all 
>>>> four run on the same node. I tried modifying mpd.hosts to read:
>>>> nodeA:4
>>>> nodeB:4
>>>> nodeC:4
>>>> nodeD:4
>>>> but the result is not what I expected:
>>>> nodeC% mpiexec -l -n 4 hostname
>>>> 0: nodeC
>>>> 3: nodeB
>>>> 2: nodeB
>>>> 1: nodeB
>>>> How can I get the mpd to fill the hosts one by one reliably?
>>>> Incidentally, the :4 option is not documented in the Installation 
>>>> Guide. I picked it up in the gutter. If it doesn't do this, what 
>>>> DOES it do?
>>>> Thanks,
>>>> Ben
>>>> -- 
>>>> Prof. Benjamin Svetitsky Phone: +972-3-640 8870
>>>> School of Physics and Astronomy Fax: +972-3-640 7932
>>>> Tel Aviv University E-mail: bqs at julian.tau.ac.il
>>>> 69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs
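Ralph's -machinefile approach can be scripted so that you don't have to write the file by hand. Below is a minimal sketch of a hypothetical bash helper, make_machinefile. It prints a machinefile in the host:count format from Ralph's demo, packing NPROCS processes onto the listed hosts in order and filling each host's SLOTS cores before moving on to the next one. It assumes every listed host is idle and has the same number of cores; it does no load balancing of its own.

```bash
#!/usr/bin/env bash
# make_machinefile NPROCS SLOTS HOST...
# Hypothetical helper: prints "host:count" lines that fill each host's
# SLOTS cores before using the next host. It does not check which
# hosts are busy; it only fixes the order in which they are filled.
make_machinefile() {
    if (( $# < 3 )) || ! [[ $1 =~ ^[0-9]+$ && $2 =~ ^[1-9][0-9]*$ ]]; then
        echo "usage: make_machinefile NPROCS SLOTS HOST..." >&2
        return 2
    fi
    local nprocs=$1 slots=$2 host take
    shift 2
    for host in "$@"; do
        (( nprocs > 0 )) || break                # everything is placed
        take=$(( nprocs < slots ? nprocs : slots ))
        echo "${host}:${take}"
        nprocs=$(( nprocs - take ))
    done
    if (( nprocs > 0 )); then
        echo "make_machinefile: $nprocs process(es) left over, not enough hosts" >&2
        return 1
    fi
}
```

For Ben's cluster, running make_machinefile 4 4 nodeA nodeB nodeC nodeD > packed.mf writes the single line nodeA:4, and mpiexec -l -machinefile packed.mf -n 4 hostname should then start all four ranks on nodeA. This is still a static mapping: a second job needs a different host order (for example nodeB first), which is exactly the bookkeeping a queueing system takes over.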
