[mpich-discuss] MPD in the PBS environment

Anne M. Hammond hammond at txcorp.com
Wed Feb 11 23:04:53 CST 2009


We booted a ring on our cluster using mpdboot, but we are
having problems keeping the ring persistent.

So I would like each PBS job to boot its own ring for the
duration of that job.

We are following the directions in Section 5.7.1 of the MPICH2 User Guide.

These are the relevant lines from the qsub file:

sort -u $PBS_NODEFILE > mpd.hosts
mpdboot -f mpd.hosts -n $NNODES --rsh=/usr/bin/rsh
mpiexec -machinefile $PBS_NODEFILE -np $NNODES $RUNJOB -i $WORK_AREA/$PREFILE/$PREFILE.in -dim 2 -n 100000 -d 10000 > $PREFILE.log
mpdallexit

mpd.hosts:
node12
node13
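
For reference, here is a minimal sketch of the same per-job ring with a few
checks added. It is not a tested fix. It assumes that the ring size passed to
mpdboot should be the number of unique hosts in $PBS_NODEFILE and that the
process count for mpiexec is the number of lines in that file. NNODES in the
script above may be defined differently. The helper names write_mpd_hosts and
run_in_ring, and the variables NHOSTS and NPROCS, are made up for
illustration. mpdtrace is only there to record in the job log which hosts
actually joined the ring.

```shell
#!/bin/sh
# Sketch: boot a per-job MPD ring sized from the PBS node file.
# write_mpd_hosts and run_in_ring are hypothetical helper names.

# Write the unique host names from a node file to a hosts file
# and print how many unique hosts there are.
write_mpd_hosts() {
    nodefile=$1
    hostsfile=$2
    sort -u "$nodefile" > "$hostsfile"
    # Arithmetic expansion strips any padding that wc adds to the count.
    echo $(( $(wc -l < "$hostsfile") ))
}

# Boot the ring, run the given command, and tear the ring down afterwards.
# Returns 1 if mpdboot fails, otherwise the exit status of the command.
run_in_ring() {
    hostsfile=$1
    nhosts=$2
    shift 2
    mpdboot -f "$hostsfile" -n "$nhosts" --rsh=/usr/bin/rsh || return 1
    mpdtrace              # list the hosts that joined the ring
    "$@"
    status=$?
    mpdallexit
    return $status
}

# Only run inside a PBS job on a machine that has the MPD tools installed.
if [ -n "${PBS_NODEFILE:-}" ] && command -v mpdboot >/dev/null 2>&1; then
    NHOSTS=$(write_mpd_hosts "$PBS_NODEFILE" mpd.hosts)
    NPROCS=$(( $(wc -l < "$PBS_NODEFILE") ))
    run_in_ring mpd.hosts "$NHOSTS" \
        mpiexec -machinefile "$PBS_NODEFILE" -np "$NPROCS" "$RUNJOB"
fi
```

If mpdboot fails, the sketch stops before mpiexec is run, so a boot failure
shows up on its own in the job log rather than as a later "cannot connect to
local mpd" message.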

When the ring is not running, this is the error message from the
PBS job:

mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
     probable cause:  no mpd daemon on this machine
     possible cause:  unix socket /tmp/mpd2.console_root has been removed
mpiexec_node12.cl.corp.com (__init__ 1190): forked process failed; status=255

Do you have to have a persistent ring booted in order to use mpd
from PBS?  Or is my qsub script incorrect?

Thanks in advance,
Anne





