[MPICH] one shot jobs in mpich2?

Reuti reuti at staff.uni-marburg.de
Fri Jun 17 14:20:25 CDT 2005


Benjamin,

Quoting Benjamin Rutt <rutt at bmi.osu.edu>:

> Reuti <reuti at staff.uni-marburg.de> writes:
> 
> > Hi Benjamin,
> >
> > you could compile MPICH2 for smpd instead of mpd and use it in a
> > daemonless mode. So the behavior would be similar to MPICH 1.2.6 with
> > the ch_p4 device.
> 
> Can you give me some more information about this daemonless mode?  I'm
> able to run mpich2 jobs with smpd, I tested across up to three nodes,
> but I cannot find any information on daemonless operation.

I got the hint from the MPICH2 team when I wrote the Howto for the integration 
into SGE; it's still not in the MPICH2 documentation. If you have already 
compiled MPICH2 for smpd, you just have to use these lines to start the job 
(adjust to your environment):

export MPIEXEC_RSH=rsh
export PATH=/usr/mpich2_smpd/bin:$PATH
mpiexec -rsh -nopm -n $NSLOTS -machinefile $TMPDIR/machines ~/mpihello

Otherwise the default is "ssh -x" AFAIR, but as SGE has a private rshd dedicated 
to each job, it's safe to use the rsh client with SGE's rshd in a cluster.

> > Another possibility is to start a smpd per node for each job. So
> > shutting down the daemons belonging to one job will not interfere with
> > the other job.
> 
> I suppose this will work OK for me, given the ability to use the same
> open port across all machines (unique for that job+user, not just
> user), and given that I'd write a wrapper script to run 'smpd -s'
> via ssh/rsh to all the nodes, capture their PIDs for later killing,
> run the job using mpiexec, and later on kill the captured smpd PIDs
> across all nodes.  It's kind of sad/tragic to start a daemon for
> only one job, but hey, that is what I asked for.  :-)
> 
> Actually, how would I tell a given mpiexec to use the set of already
> running smpd's on (e.g.) port 8888 on localhost + others, rather than
> the set of running smpd's on port 9999 on localhost + others?

You have to calculate a port number, which I do in my scripts with:

port=$((JOB_ID % 5000 + 20000))

As long as fewer than 5000 jobs are submitted while a job is still around 
(JOB_ID is assigned by SGE, as you can guess), this is safe (unless other 
programs use a port in this range, of course). This port number must be used 
for starting the daemons, running the job, and shutting down the daemons 
properly (job abort is special, see below).
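
A minimal sketch of that calculation, to show how the modulo maps job ids into 
the range 20000-24999 and why two job ids exactly 5000 apart would end up on 
the same port. The function name job_port is made up for illustration, and the 
fallback to 0 outside of SGE is only an assumption for dry runs:

```shell
#!/bin/sh
# Hypothetical helper: map a job id to a port in 20000..24999,
# using the same arithmetic as the line above.
job_port() {
    echo $(( $1 % 5000 + 20000 ))
}

# JOB_ID is set by SGE inside a job; fall back to 0 for a dry run outside SGE.
port=$(job_port "${JOB_ID:-0}")
echo "smpd port for this job: $port"

# Job ids 12345 and 17345 are 5000 apart, so both map to port 22345.
job_port 12345
job_port 17345
```

So the scheme only breaks if two such jobs are alive at the same time on the 
same node.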

1. To start the daemons, pass the option "-p $port" to smpd in the
   rsh command to each node.

2. To run the job:
   mpiexec -n $NSLOTS -machinefile $TMPDIR/machines -port $port ~/mpihello

3. To shut down, loop over all used hosts:
   $MPICH2_ROOT/bin/smpd -port $port -shutdown $host
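
The three steps could be tied together roughly like this. It is only a dry-run 
sketch that prints the commands instead of executing them, and it uses just 
the options mentioned above. The function names are made up; the machines file 
format (one hostname per line, possibly repeated once per slot), rsh as the 
remote shell, and /usr/mpich2_smpd as the default for MPICH2_ROOT are 
assumptions to adjust:

```shell
#!/bin/sh
# Dry-run sketch: the functions only print the commands for the three steps.
# Assumptions (not from the mail): the machines file has one hostname per
# line, possibly repeated once per slot; rsh is the remote shell; smpd lives
# in $MPICH2_ROOT/bin on every node. Further smpd options are left out.

MPICH2_ROOT=${MPICH2_ROOT:-/usr/mpich2_smpd}

# Distinct hosts from the machines file.
unique_hosts() {
    sort -u "$1"
}

# Step 1: start one smpd per node on the job's port.
start_cmds() {    # usage: start_cmds PORT MACHINEFILE
    for host in $(unique_hosts "$2"); do
        echo "rsh $host $MPICH2_ROOT/bin/smpd -p $1"
    done
}

# Step 2: run the job against the daemons listening on that port.
run_cmd() {       # usage: run_cmd PORT MACHINEFILE NSLOTS PROGRAM
    echo "mpiexec -n $3 -machinefile $2 -port $1 $4"
}

# Step 3: shut down the smpd on every used host.
stop_cmds() {     # usage: stop_cmds PORT MACHINEFILE
    for host in $(unique_hosts "$2"); do
        echo "$MPICH2_ROOT/bin/smpd -port $1 -shutdown $host"
    done
}
```

Calling start_cmds with the job's port and $TMPDIR/machines prints one rsh 
line per distinct host; printing first makes it easy to check the commands 
before piping them to sh.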

But for a job abort it would be best to kill not only the daemon but the whole 
process group of the job (kill -9 -- -$pid); otherwise the job may survive the 
death of its parent. I also used "-d 0" to keep the smpds bound to the SGE 
daemons for the job and not let them vanish into daemon land. This way I don't 
have to kill anything by hand, as SGE will take care of removing the whole 
process group (for this job) from the node. Whether it's an intended end of the 
program or a forced one via "qdel" in the middle of the job doesn't matter.

I really suggest looking into SGE, as the prolog and epilog scripts will set up 
the whole smpd universe for the user, who only has to use the correct port 
number for his/her job. SGE will also handle the shutdown and remove the daemons.

Cheers - Reuti


> > Although it is intended for use with SGE, you can have a look at the
> > Howto for MPICH2 integration and maybe get some ideas for your usage
> > (or use SGE to handle the jobs ;-) ).
> >
> >
> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
> 
> Thanks, I will check it out.
> -- 
> Benjamin
> 




