[mpich-discuss] MPD in the PBS environment
Anne M. Hammond
hammond at txcorp.com
Thu Feb 12 14:22:10 CST 2009
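This must be the problem. /etc/bashrc sets:

export MPD_USE_ROOT_MPD=1

With that variable set, mpiexec looks for root's mpd console socket
(/tmp/mpd2.console_root) instead of the per-user one
(/tmp/mpd2.console_hammond).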
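
A quick way to check this from inside a job might look like the sketch
below. The function name expected_mpd_console is hypothetical (it is not
part of MPICH2), the two socket paths are simply the ones quoted in this
thread, and treating any non-empty MPD_USE_ROOT_MPD as "use root's mpd"
is an assumption rather than documented behaviour.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2): print the console socket an MPD
# client such as mpiexec would be expected to use. The two paths are the
# ones quoted in this thread. Assumption: any non-empty MPD_USE_ROOT_MPD
# selects root's mpd.
expected_mpd_console() {
    user=${1:-$(id -un)}
    if [ -n "${MPD_USE_ROOT_MPD:-}" ]; then
        echo "/tmp/mpd2.console_root"
    else
        echo "/tmp/mpd2.console_${user}"
    fi
}

# Report the expected socket and whether it exists on this host.
console=$(expected_mpd_console)
if [ -S "$console" ]; then
    echo "found mpd console: $console"
else
    echo "no socket at $console (unset MPD_USE_ROOT_MPD, or start that mpd)"
fi
```

If the variable turns out to be set, clearing it in the job script before
mpdboot (unset MPD_USE_ROOT_MPD in sh/bash, unsetenv MPD_USE_ROOT_MPD in
csh) should point mpiexec back at the per-user ring.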
On Thu, 12 Feb 2009, Anne M. Hammond wrote:
> Thanks Anthony. Although NNODES was defined, it gave the wrong
> number of mpds to start. This has been fixed.
>
> The mpds are now launching on the nodes that PBS allocates, but the
> mpiexec process is still trying to connect to a root mpd:
>
> [hammond at boron ecrp12NoRing]$ more ecrp12NoRing.log
> mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
> probable cause: no mpd daemon on this machine
> possible cause: unix socket /tmp/mpd2.console_root has been removed
> mpiexec_node12.cl.corp.com (__init__ 1190): forked process failed; status=255
>
> The file /tmp/mpd2.console_hammond exists. Shouldn't mpiexec
> be trying to connect to that socket?
>
>
> On Thu, 12 Feb 2009, Anne M. Hammond wrote:
>
>> Yes. NNODES is set:
>>
>> setenv NNODES `wc $PBS_NODEFILE|awk '{print $1}'`
>>
>>
>> On Thu, 12 Feb 2009, Anthony Chan wrote:
>>
>> >
>> > Did you set NNODES in your PBS script ?
>> >
>> > ----- "Anne M. Hammond" <hammond at txcorp.com> wrote:
>> >
>> > > These are the relevant lines from the qsub file:
>> > >
>> > > sort -u $PBS_NODEFILE > mpd.hosts
>> > > mpdboot -f mpd.hosts -n $NNODES --rsh=/usr/bin/rsh
>> > > mpiexec -machinefile $PBS_NODEFILE -np $NNODES $RUNJOB -i \
>> > >   $WORK_AREA/$PREFILE/$PREFILE.in -dim 2 -n 100000 -d 10000 > $PREFILE.log
>> > > mpdallexit
>> > >
>> > > mpd.hosts:
>> > > node12
>> > > node13
>> > >
>> > > When the ring is not running, this is the error message from the
>> > > PBS job:
>> > >
>> > > mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
>> > > probable cause: no mpd daemon on this machine
>> > > possible cause: unix socket /tmp/mpd2.console_root has been removed
>> > > mpiexec_node12.cl.corp.com (__init__ 1190): forked process failed; status=255
>> > >
>> > > Do you have to have a persistent ring booted in order to use mpd
>> > > from PBS? Or is my qsub script incorrect?
>> > >
>> > > Thanks in advance,
>> > > Anne
--
Anne M. Hammond - Systems / Network Administration - Tech-X Corp
hammond_at_txcorp.com 720-974-1840