[MPICH] one shot jobs in mpich2?
Benjamin Rutt
rutt at bmi.osu.edu
Fri Jun 17 08:23:31 CDT 2005
Let's say I have a dynamically (well, between every job) changing
hosts file that I want to run jobs on, on a linux cluster. Doing this
in mpich 1.2.6 is straightforward. Let's say I have two variables, $HF
and $HFSIZE, that hold a filename listing the hosts to run on and
the number of hosts, respectively. Then running this in mpich 1.2.6
is as simple as:
mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 1
[...HF, HFSIZE changes...]
mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 2
However, in mpich2, it seems that I would need to manage the MPD
daemons:
mpdallexit # ensure nothing running previously
mpdboot -n $HFSIZE -f $HF
mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 1
[...HF, HFSIZE changes...]
mpdallexit # ensure nothing running previously
mpdboot -n $HFSIZE -f $HF
mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 2
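To make the repeated sequence above concrete, here is a minimal sketch that wraps it in a shell function. The names count_hosts, run_job and DRY_RUN are hypothetical helpers, not part of mpich2, and the sketch assumes the hosts file has one host per line (blank lines and '#' comments are skipped), so that the line count can stand in for $HFSIZE. It only uses the commands and flags already shown above.

```shell
#!/bin/sh
# Hypothetical helper: count host entries in a hosts file.
# Assumes one host per line; blank lines and '#' comment lines are skipped.
count_hosts() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wc -l | tr -d ' '
}

# Hypothetical wrapper around the restart-and-run sequence shown above.
# Usage: run_job HOSTSFILE PROGRAM [ARGS...]
# Set DRY_RUN=1 to print the commands instead of executing them.
run_job() {
    hf=$1
    shift
    n=$(count_hosts "$hf")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    else
        run=
    fi
    # Shut down any daemons left from a previous job; a failure here
    # (for example, nothing was running) is deliberately ignored.
    $run mpdallexit || true
    # Start daemons on the current set of hosts.
    $run mpdboot -n "$n" -f "$hf" || return 1
    # Launch the job on those hosts.
    $run mpiexec -machinefile "$hf" -np "$n" "$@"
}
```

With this, each job becomes a single call such as `run_job "$HF" ./myprogram`, but it still restarts the daemons every time, so it does not address the one-shot or simultaneous cases asked about below.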
Is that the only way to do what I want in mpich2? Or am I missing
something? Since $HF may change between runs, it seems that I must
restart the MPD daemons every time. Is there another startup
mechanism other than MPD in mpich2 which is more suited to one-shot
jobs?
Finally, is my mpich2 approach even feasible if I want to run both
jobs simultaneously? E.g. with mpich 1.2.6, I could just do:
mpirun -np $HFSIZE -machinefile $HF ./myprogram &
[...HF, HFSIZE changes...]
mpirun -np $HFSIZE -machinefile $HF ./myprogram &
What would the equivalent look like in mpich2? I'm thinking here that
my 'mpdallexit' above precludes running both jobs simultaneously, but
maybe there's a workaround?
Thanks,
--
Benjamin