[MPICH] MPICH2 startup w/ PBS

Ralph M. Butler rbutler at mtsu.edu
Thu Apr 6 19:42:16 CDT 2006


Hi Jeff:

I have seen bits of this conversation for a few days and decided to jump in.
There seem to be points of confusion on both sides of the conversation.  You seem 
somewhat confused by a couple of mpdboot options and we are (I am) a bit confused
about exactly what your needs are.  So, first I will describe a bit about mpdboot.
Then, I will make a few assumptions about your needs and provide a sample script
to handle them.  (I have actually run this script on my cluster faking the presence
of PBS.)

I believe that you indicated that each of your hosts is a dual-cpu system and that
PBS is therefore providing you N hosts in the PBS_NODEFILE and N*2 procs to run on
those hosts.

mpdboot's help message tells us that the --ncpus option is ONLY used on the local
machine; it says that the values for ncpus for all other hosts are specified in
the mpd.hosts file (or file specified via the -f option).

mpdboot ALWAYS starts the ring on the local host and hooks other hosts into the ring.
If you want to actually run processes on hosts other than the local one, the -1 (one)
option to mpiexec causes the first process to be started on the next host in the mpd
ring after the local host.  So, let's assume that you have a ring of 4 mpds, one of
which is on the local host.  Let's further assume that you want to run 6 processes
on the 3 hosts other than the local one.  This implies that you want 2 per node probably
because each one is a dual processor.  There are lots of ways to accomplish this
including using the -machinefile option to mpiexec.  However, for a relatively simple
solution, you can do something like the script below.

I explain the script here:

1.  Set up fake values for PBS_NODEFILE and EXE just to mimic what I think PBS may have
placed in your environment.
2.  Calculate numnodes to be the number of hosts in the pbs node file.
3.  Calculate ringsize to be one bigger than the number of nodes because we will have
one mpd running locally but will not run processes under it.
4.  Create a temporary file of host names for mpdboot.  Use the hostnames from the
pbs node file, but place a :2 after each name because each host is a dual-proc.
5.  Use mpdboot to start the mpd ring.
6.  Delete the temporary file.
7.  Compute the number of processes to be 2* the number of
hosts in the pbs node file.  You may actually get this value from somewhere else (e.g.
an env var named NP), but I assume that it will not be more than 2* the number of hosts.
8.  Use mpiexec to run the user pgm.
9.  Use mpdallexit to bring down the mpd ring.
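
For step 7, if the process count does come from an env var, a minimal sketch of
picking it up and capping it could look like the following.  The function name
choose_numprocs is made up for illustration, and the variable NP is only my guess
at what your environment provides; the cap of 2 per host reflects the dual-cpu
assumption above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2 or PBS): print the number of
# processes to run, given the number of hosts as $1.
# Uses $NP when it is set, otherwise 2 per host, and never exceeds 2 per host.
choose_numprocs() {
    max=$(( $1 * 2 ))          # dual-cpu hosts: at most 2 processes each
    want=${NP:-$max}           # fall back to the maximum when NP is unset
    if [ "$want" -gt "$max" ]; then
        want=$max
    fi
    echo "$want"
}
```

In the script it would be used as numprocs=`choose_numprocs $numnodes`.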

---- the script:

#!/bin/sh

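# Step 1: fake values standing in for what PBS would provide
PBS_NODEFILE=temphostfile
EXE=/bin/hostname

# Steps 2-3: count the hosts; the ring has one extra mpd on the local host
numnodes=`wc -l < "$PBS_NODEFILE"`
ringsize=$((numnodes+1))
# Step 4: append :2 (two cpus) to each host name
awk '{ print $1 ":2" }' < "$PBS_NODEFILE" > tempnodefile
# Steps 5-6: boot the ring, then remove the temporary file
mpdboot -f tempnodefile -n $ringsize
rm tempnodefile
# Step 7: two processes per host
numprocs=$((numnodes*2))
# Step 8: -1 starts the first process on the next host after the local one
mpiexec -1 -l -n $numprocs $EXE
# Step 9: bring down the mpd ring
mpdallexit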
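
One caveat: the script assumes the pbs node file lists each host exactly once.
If your PBS instead lists a host once per allocated cpu (I do not know which your
installation does), a sketch like the one below could build the host:ncpus lines
by counting repeats rather than hard-coding :2.  The function name make_mpd_hosts
is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a node file ($1) and print one "host:count"
# line per distinct host, in first-seen order.  The count is the number
# of times the host appears in the file.
make_mpd_hosts() {
    awk '
        NF == 0 { next }                       # skip blank lines
        !($1 in count) { order[++n] = $1 }     # remember first-seen order
        { count[$1]++ }
        END {
            for (i = 1; i <= n; i++)
                print order[i] ":" count[order[i]]
        }
    ' "$1"
}
```

You would then write its output to tempnodefile in place of the awk line, and
take numnodes from the number of lines in tempnodefile.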

> Good morning,
>
>  I hate to bother everyone early in the morning, but I'm
> looking for some advice on MPICH2 startup. I've been starting
> an mpd on each node in the cluster via,
> 
> mpdboot -n 25 -f /home/jlayton/mpd.hosts
> 
> where the file mpd.hosts contains a list of all possible hosts.
> So I'm basically starting mpd on every node. Then I run the
> code using mpiexec
> 
> mpiexec -machinefile ${PBS_NODEFILE} -n ${NP} ./${EXE}
> 
> and run mpdallexit after the code is finished to stop all of the
> mpds. Notice that I'm using PBS for queuing/scheduling.
>  This is something of a pain, because we lose nodes for
> various projects or training so I'm constantly having to go into
> the list of hosts and edit it. I also have to change the count on
> the mpdboot command.
>  Is there a better way to start up MPICH2 codes using PBS?
> 
> Thanks!
> 
> Jeff



