[MPICH] one shot jobs in mpich2?

Benjamin Rutt rutt at bmi.osu.edu
Fri Jun 17 08:23:31 CDT 2005


Let's say I have a hosts file that changes dynamically (well, between
every job), and I want to run jobs against it on a Linux cluster.
Doing this in mpich 1.2.6 is straightforward.  Let's say I have
variables $HF and $HFSIZE that hold the name of a file listing the
hosts to run on and the number of hosts, respectively.  Then running a
job in mpich 1.2.6 is as simple as:

    mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 1

    [...HF, HFSIZE changes...]

    mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 2
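
For reference, $HFSIZE can be derived from $HF rather than tracked
separately.  A minimal sketch, assuming the hosts file has one host
entry per line; count_hosts is a hypothetical helper name, and skipping
blank lines and '#' comment lines is my assumption about the file
format, not something mpich requires:

```shell
# Hypothetical helper: count host entries in a machinefile.
# Assumes one entry per line; blank lines and lines whose first
# non-blank character is '#' are not counted.
count_hosts() {
    # grep -c prints the number of lines that do NOT match the pattern
    # (-v). It exits non-zero when that number is 0, so "|| true" keeps
    # the function from failing under "set -e".
    grep -c -v -E '^[[:space:]]*(#|$)' "$1" || true
}
```

It would be used as HFSIZE=$(count_hosts "$HF") just before each
mpirun line, so the count cannot drift from the file contents.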

However, in mpich2, it seems that I would need to manage the MPD
daemons:

    mpdallexit # ensure nothing running previously
    mpdboot -n $HFSIZE -f $HF
    mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 1

    [...HF, HFSIZE changes...]

    mpdallexit # ensure nothing running previously
    mpdboot -n $HFSIZE -f $HF
    mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 2
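
The sequence above could be wrapped in a shell function so that each
job brings up and tears down its own set of daemons.  This is only a
sketch of what I am doing by hand: run_one_shot and the RUN variable
are hypothetical names (RUN=echo gives a dry run that prints the
commands), and the final mpdallexit is my addition.  It does not
address running two jobs at once.

```shell
# Sketch: one-shot job using the MPD commands from this message.
# Usage: run_one_shot HOSTFILE NHOSTS PROGRAM [ARGS...]
# RUN is a hypothetical dry-run hook; set RUN=echo to print the
# commands instead of executing them.
run_one_shot() {
    hf=$1
    hfsize=$2
    shift 2

    # Ensure nothing is running from a previous job; a failure here
    # (for example, no daemons to stop) is ignored.
    ${RUN:-} mpdallexit || true

    # Start the daemons on the hosts listed in the file.
    ${RUN:-} mpdboot -n "$hfsize" -f "$hf" || return 1

    # Run the job and remember its exit status.
    status=0
    ${RUN:-} mpiexec -machinefile "$hf" -np "$hfsize" "$@" || status=$?

    # Stop the daemons again so the next job starts clean.
    ${RUN:-} mpdallexit || true

    return "$status"
}
```

Each job then becomes a single line, e.g.
run_one_shot "$HF" "$HFSIZE" ./myprogram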

Is that the only way to do what I want in mpich2?  Or am I missing
something?  Since $HF may change between runs, it seems that I must
restart the MPD daemons every time.  Is there another startup
mechanism other than MPD in mpich2 which is more suited to one-shot
jobs?

Finally, is my mpich2 approach even feasible if I want to run both
jobs simultaneously?  E.g., with mpich 1.2.6, I could just do:

    mpirun -np $HFSIZE -machinefile $HF ./myprogram &

    [...HF, HFSIZE changes...]

    mpirun -np $HFSIZE -machinefile $HF ./myprogram &

What would the equivalent look like in mpich2?  I'm thinking here that
my 'mpdallexit' above precludes running both jobs simultaneously,
since the one issued before job 2 would shut down the daemons that
job 1 is still using, but maybe there's a workaround?

Thanks,
-- 
Benjamin



