[mpich-discuss] MPD in the PBS environment
Anne M. Hammond
hammond at txcorp.com
Wed Feb 11 23:04:53 CST 2009
We booted a ring on our cluster using mpdboot, but we are having
some problems with the persistent ring. So I wanted each PBS job
to boot its own ring.
We are following the directions in section 5.7.1 of the MPICH2 User Guide.
These are the relevant lines from the qsub file:
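# one line per distinct host, for mpdboot
sort -u $PBS_NODEFILE > mpd.hosts
# boot $NNODES mpds over rsh, using the hosts in mpd.hosts
mpdboot -f mpd.hosts -n $NNODES --rsh=/usr/bin/rsh
mpiexec -machinefile $PBS_NODEFILE -np $NNODES $RUNJOB \
    -i $WORK_AREA/$PREFILE/$PREFILE.in -dim 2 -n 100000 -d 10000 > $PREFILE.log
# shut the ring down when the job finishes
mpdallexit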
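A minimal sketch, not taken from the User Guide, of wrapping the boot/run/exit steps so that the ring is checked before mpiexec runs and is shut down even if the job fails. The function name run_with_ring is hypothetical; the mpdboot flags are the ones from the qsub file, and the check assumes mpdtrace exits non-zero when it cannot reach a local mpd.

```bash
#!/bin/bash
# Hypothetical helper: boot a per-job mpd ring, run a command, always shut the ring down.
# Usage: run_with_ring HOSTS_FILE NUM_MPDS COMMAND [ARGS...]
run_with_ring() {
    local hosts_file=$1 num_mpds=$2
    shift 2
    local status=0

    # Same mpdboot invocation as in the qsub file; give up if the ring cannot be booted.
    mpdboot -f "$hosts_file" -n "$num_mpds" --rsh=/usr/bin/rsh || return 1

    # Assumption: mpdtrace fails when no local mpd is reachable, which is the
    # situation behind the "cannot connect to local mpd" error.
    if ! mpdtrace >/dev/null 2>&1; then
        echo "run_with_ring: no mpd ring reachable after mpdboot" >&2
        mpdallexit >/dev/null 2>&1
        return 1
    fi

    # Run the MPI job and remember its exit status.
    "$@" || status=$?

    # Tear the ring down whether or not the job succeeded.
    mpdallexit
    return "$status"
}

# Example (variables as in the qsub file):
# run_with_ring mpd.hosts "$NNODES" mpiexec -machinefile "$PBS_NODEFILE" \
#     -np "$NNODES" "$RUNJOB" -i "$WORK_AREA/$PREFILE/$PREFILE.in" \
#     -dim 2 -n 100000 -d 10000
```

Returning the job's own exit status keeps the PBS job's exit code meaningful after mpdallexit has run.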
Contents of mpd.hosts:
node12
node13
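PBS_NODEFILE typically lists a host once per allocated processor slot, so the sort -u output and the node file itself give two different counts. A minimal sketch of deriving both from the files, assuming NNODES is meant to be the number of distinct hosts (one mpd per host). The helper name make_mpd_hosts and the NPROCS variable are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: build the mpd.hosts file from the PBS node file and
# derive two counts:
#   NNODES - distinct hosts, i.e. one mpd per host for mpdboot -n
#   NPROCS - lines in the PBS node file, i.e. one MPI process per allocated slot
make_mpd_hosts() {
    local nodefile=$1 hostsfile=$2
    sort -u "$nodefile" > "$hostsfile"
    # $(( ... )) strips any padding that some wc implementations print.
    NNODES=$(( $(wc -l < "$hostsfile") ))
    NPROCS=$(( $(wc -l < "$nodefile") ))
}

# Example inside the qsub file:
# make_mpd_hosts "$PBS_NODEFILE" mpd.hosts
# mpdboot -f mpd.hosts -n "$NNODES" --rsh=/usr/bin/rsh
```

For the two-host mpd.hosts shown here, NNODES would be 2; NPROCS depends on how many slots per node the job requested.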
When no ring is already running, the PBS job fails with this
error message:
mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
probable cause: no mpd daemon on this machine
possible cause: unix socket /tmp/mpd2.console_root has been removed
mpiexec_node12.cl.corp.com (__init__ 1190): forked process failed; status=255
Does a persistent ring have to be booted in order to use mpd
from PBS, or is my qsub script incorrect?
Thanks in advance,
Anne