[MPICH] one shot jobs in mpich2?

Reuti reuti at staff.uni-marburg.de
Fri Jun 17 09:09:43 CDT 2005


Hi Benjamin,

you could build MPICH2 with smpd instead of mpd as the process manager 
and use it in daemonless mode. The behavior would then be similar to 
MPICH 1.2.6 with the ch_p4 device.
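
As a rough sketch of what a one-shot launch in daemonless mode could look 
like: the "-rsh -nopm" options are taken from the SGE Howto linked below 
and are an assumption here, so please check them against your own build. 
The helper name run_oneshot and the DRY_RUN switch are made up for this 
example.

```shell
#!/bin/sh
# Sketch only: one-shot launch with an smpd-built MPICH2, no persistent daemons.
# ASSUMPTION: "-rsh -nopm" selects the daemonless startup (as in the SGE Howto).
# run_oneshot and DRY_RUN are hypothetical names used only for illustration.

run_oneshot() {
    hf=$1; shift
    # Number of hosts = number of non-empty lines in the hosts file.
    np=$(grep -c . "$hf")
    # Build the command line as positional parameters to preserve quoting.
    set -- mpiexec -rsh -nopm -n "$np" -machinefile "$hf" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # only show what would be run
    else
        "$@"           # actually launch the job
    fi
}
```

Since nothing is left running between jobs, two jobs with different 
hosts files could be started side by side, e.g. run_oneshot "$HF" 
./myprogram & for each of them.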

Another possibility is to start one smpd per node for each job, so that 
shutting down the daemons belonging to one job does not interfere with 
the other job.
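
One way to keep the daemons of different jobs apart is to give every job 
its own port. The sketch below only prints the commands such a job would 
run, so you can inspect them first. The "-port", "-d 0" and "-shutdown" 
options follow the SGE Howto linked below and are assumptions to verify 
against your installation; starting the remote daemons via ssh, the port 
formula and the function names are purely illustrative.

```shell
#!/bin/sh
# Sketch only: a private set of smpd daemons per job, separated by port.
# ASSUMPTIONS: smpd accepts "-port <n>", "-d 0" and "-shutdown <host>",
# and mpiexec accepts "-port <n>" (as in the SGE Howto).
# job_port and job_commands are hypothetical helper names.

job_port() {
    # Map a job number to a port in the range 20000..24999.
    echo $(( $1 % 5000 + 20000 ))
}

job_commands() {
    jobid=$1; hf=$2; shift 2
    port=$(job_port "$jobid")
    np=$(grep -c . "$hf")
    # 1. Start one smpd per host, all listening on this job's port.
    while read -r host; do
        if [ -n "$host" ]; then echo "ssh $host smpd -port $port -d 0 &"; fi
    done < "$hf"
    # 2. Run the job against exactly these daemons.
    echo "mpiexec -port $port -n $np -machinefile $hf $*"
    # 3. Shut down only the daemons that belong to this job.
    while read -r host; do
        if [ -n "$host" ]; then echo "smpd -port $port -shutdown $host"; fi
    done < "$hf"
}
```

Calling job_commands 1 "$HF" ./myprogram and job_commands 2 "$HF2" 
./myprogram then yields two command lists on different ports, so the 
shutdown of the first job leaves the second one alone.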

Although it is intended for use with SGE, you can have a look at the 
Howto for MPICH2 integration and maybe get some ideas for your usage (or 
use SGE to handle the jobs ;-) ).

http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html

I didn't find a way to integrate the mpd method into SGE, as it creates 
many new process groups, which prevents a proper shutdown of a job when 
you issue a qdel for it.

Cheers - Reuti


Benjamin Rutt wrote:
> Let's say I have a dynamically (well, between every job) changing
> hosts file that I want to run jobs on, on a linux cluster.  Doing this
> in mpich 1.2.6 is straightforward.  Let's say I have variables $HF
> and $HFSIZE that hold a filename which lists the hosts to run on, and
> the number of hosts, respectively.  Then, running this in mpich1.2.6
> is as simple as:
> 
>     mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 1
> 
>     [...HF, HFSIZE changes...]
> 
>     mpirun -np $HFSIZE -machinefile $HF ./myprogram # job 2
> 
> However, in mpich2, it seems that I would need to manage the MPD
> daemons:
> 
>     mpdallexit # ensure nothing running previously
>     mpdboot -n $HFSIZE -f $HF
>     mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 1
> 
>     [...HF, HFSIZE changes...]
> 
>     mpdallexit # ensure nothing running previously
>     mpdboot -n $HFSIZE -f $HF
>     mpiexec -machinefile $HF -np $HFSIZE ./myprogram # job 2
> 
> Is that the only way to do what I want in mpich2?  Or am I missing
> something?  Since $HF may change between runs, it seems that I must
> restart the MPD daemons every time.  Is there another startup
> mechanism other than MPD in mpich2 which is more suited to one-shot
> jobs?
> 
> Finally, is my mpich2 approach even feasible, if I want to run both
> jobs simultaneously?  E.g.  with mpich1.2.6, I could just do:
> 
>     mpirun -np $HFSIZE -machinefile $HF ./myprogram &
> 
>     [...HF, HFSIZE changes...]
> 
>     mpirun -np $HFSIZE -machinefile $HF ./myprogram &
> 
> What would the equivalent look like in mpich2?  I'm thinking here that
> my 'mpdallexit' above precludes running both jobs simultaneously, but
> maybe there's a workaround?
> 
> Thanks,



