[MPICH] one shot jobs in mpich2?

Ralph M. Butler rbutler at mtsu.edu
Fri Jun 17 09:52:55 CDT 2005


I am sitting in an Internet cafe far from home, so this email may
fail to get through.  But I will give it a try.

You can start the mpd daemons once on the full set of machines.  Then
you can use the -machinefile option (or a set of -host options) to
mpiexec to control where the ranks for a given job are placed.  For
example, you can start a ring of mpds on machines named:
m1, m2, m3, m4, m5, m6

Then you can run a job with this machinefile:
m1
m3
m5

like this:
    mpiexec -machinefile mf -n 3 mypgm

and the procs will run on the odd-numbered machines.

Then you can put this into mf instead:
m2
m4
m6

and run the same mpiexec command again; this time the procs will
execute on the even-numbered machines.
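
Here is a rough sketch of that whole sequence as one script, in case it
helps.  The file names (all_hosts, mf_odd, mf_even) and the RUN_MPI
switch are just placeholders I made up for illustration, and the
mpdboot line assumes all six machines are listed in all_hosts; adjust
it for your site.  The script always writes the two machinefiles, but
only touches the mpds when RUN_MPI=yes is set:

```shell
#!/bin/sh
# Sketch only: one mpd ring, two jobs placed with different machinefiles.
# all_hosts, mf_odd, mf_even, and RUN_MPI are hypothetical names.

# Full list of machines that the mpd ring should cover.
printf '%s\n' m1 m2 m3 m4 m5 m6 > all_hosts

# Odd-numbered lines (m1, m3, m5) go to mf_odd, even-numbered to mf_even.
awk 'NR % 2 == 1' all_hosts > mf_odd
awk 'NR % 2 == 0' all_hosts > mf_even

# Launch only when explicitly requested, so the script is safe to dry-run.
if [ "${RUN_MPI:-no}" = "yes" ]; then
    mpdboot -n 6 -f all_hosts               # start the ring once
    mpiexec -machinefile mf_odd  -n 3 mypgm # job 1 on m1, m3, m5
    mpiexec -machinefile mf_even -n 3 mypgm # job 2 on m2, m4, m6
    mpdallexit                              # shut the ring down at the end
fi
```

The ring is booted once and only the machinefile changes between the
two mpiexec calls.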

> Date: Fri, 17 Jun 2005 09:23:31 -0400
> From: Benjamin Rutt <rutt at bmi.osu.edu>
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] one shot jobs in mpich2?
>
> Let's say I have a dynamically (well, between every job) changing
> hosts file that I want to run jobs on, on a linux cluster.  Doing this
> in mpich 1.2.6 is straightforward.  Let's say I have a variable $HF
> and $HFSIZE that holds a filename which lists the hosts to run on, and
> the number of hosts, respectively.  Then, running this in mpich1.2.6
> is as simple as:
>
>     mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 1
>
>     [...HF, HFSIZE changes...]
>
>     mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 2
>
> However, in mpich2, it seems that I would need to manage the MPD
> daemons:
>
>     mpdallexit # ensure nothing running previously
>     mpdboot -n $HFSIZE -f $HF
>     mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 1
>
>     [...HF, HFSIZE changes...]
>
>     mpdallexit # ensure nothing running previously
>     mpdboot -n $HFSIZE -f $HF
>     mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 2
>
> Is that the only way to do what I want in mpich2?  Or am I missing
> something?  Since $HF may change between runs, it seems that I must
> restart the MPD daemons every time.  Is there another startup
> mechanism other than MPD in mpich2 which is more suited to one-shot
> jobs?
>
> Finally, is my mpich2 approach even feasible, if I want to run both
> jobs simultaneously?  E.g.  with mpich1.2.6, I could just do:
>
>     mpirun -np $HFSIZE -machinefile $HF ./myprogram &
>
>     [...HF, HFSIZE changes...]
>
>     mpirun -np $HFSIZE -machinefile $HF ./myprogram &
>
> What would the equivalent look like in mpich2?  I'm thinking here that
> my 'mpdallexit' above precludes running both jobs simultaneously, but
> maybe there's a workaround?
>
> Thanks,
> --
> Benjamin
>
>
>



