[MPICH] one shot jobs in mpich2?
Ralph M. Butler
rbutler at mtsu.edu
Fri Jun 17 09:52:55 CDT 2005
I am sitting in an Internet-cafe sort of place far from home,
so this email may fail to get through. But I will give it a try.
You can start the mpd daemons on a set of machines. Then, you can
use the -machinefile option (or a set of -host options) to mpiexec
to determine where the ranks for a given job are placed. For
example, you can start a ring of mpds on machines named:
m1, m2, m3, m4, m5, m6
Then you can run a job with this machinefile:
m1
m3
m5
like this:
mpiexec -machinefile mf -n 3 mypgm
and the procs will run on the odd-numbered machines.
Then, you put this into mf:
m2
m4
m6
and run again, against the same ring of mpds, with the procs
executing on the even-numbered machines.
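
The steps above could be scripted roughly as follows. This is only a
sketch under some assumptions: the file names all_hosts, mf_odd and
mf_even, the run helper, and the DRY_RUN switch are made up for
illustration, and the mpdboot/mpdallexit invocations simply reuse the
form shown in the quoted message below. By default it only prints the
commands, so it can be tried on a machine without MPICH2 installed.

```shell
#!/bin/sh
# Sketch: boot one mpd ring, then choose hosts per job via a machinefile.
# Hypothetical names (not from the original post): all_hosts, mf_odd,
# mf_even, run(), DRY_RUN.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    # Echo the command line; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Every machine that should run an mpd, one name per line.
printf '%s\n' m1 m2 m3 m4 m5 m6 > all_hosts

# Per-job machinefiles, each a subset of the ring.
printf '%s\n' m1 m3 m5 > mf_odd
printf '%s\n' m2 m4 m6 > mf_even

# Start the ring once (same mpdboot form as in the quoted message).
run mpdboot -n 6 -f all_hosts

# Job 1 lands on the odd-numbered machines, job 2 on the even ones.
# The ring is not restarted in between.
run mpiexec -machinefile mf_odd -n 3 mypgm
run mpiexec -machinefile mf_even -n 3 mypgm

# Shut the ring down once no more jobs are expected.
run mpdallexit
```

Setting DRY_RUN=0 in the environment makes the script actually invoke
the commands instead of just printing them.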
> Date: Fri, 17 Jun 2005 09:23:31 -0400
> From: Benjamin Rutt <rutt at bmi.osu.edu>
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] one shot jobs in mpich2?
>
> Let's say I have a dynamically (well, between every job) changing
> hosts file that I want to run jobs on, on a linux cluster. Doing this
> in mpich 1.2.6 is straightforward. Let's say I have a variable $HF
> and $HFSIZE that holds a filename which lists the hosts to run on, and
> the number of hosts, respectively. Then, running this in mpich1.2.6
> is as simple as:
>
> mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 1
>
> [...HF, HFSIZE changes...]
>
> mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 2
>
> However, in mpich2, it seems that I would need to manage the MPD
> daemons:
>
> mpdallexit # ensure nothing running previously
> mpdboot -n $HFSIZE -f $HF
> mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 1
>
> [...HF, HFSIZE changes...]
>
> mpdallexit # ensure nothing running previously
> mpdboot -n $HFSIZE -f $HF
> mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 2
>
> Is that the only way to do what I want in mpich2? Or am I missing
> something? Since $HF may change between runs, it seems that I must
> restart the MPD daemons every time. Is there another startup
> mechanism other than MPD in mpich2 which is more suited to one-shot
> jobs?
>
> Finally, is my mpich2 approach even feasible, if I want to run both
> jobs simultaneously? E.g. with mpich1.2.6, I could just do:
>
> mpirun -np $HFSIZE -machinefile $HF ./myprogram &
>
> [...HF, HFSIZE changes...]
>
> mpirun -np $HFSIZE -machinefile $HF ./myprogram &
>
> What would the equivalent look like in mpich2? I'm thinking here that
> my 'mpdallexit' above precludes running both jobs simultaneously, but
> maybe there's a workaround?
>
> Thanks,
> --
> Benjamin
>
>
>