[mpich-discuss] mpdboot error

Dave Goodell goodell at mcs.anl.gov
Fri Apr 23 13:21:21 CDT 2010


It's good to hear that it was just a bad mpdboot line.
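
For the archives, here is roughly why the missing space produced that traceback. The shell splits the command line on whitespace, so "-n 2-f" reaches mpdboot as the two words "-n" and "2-f", and int("2-f") then fails. A minimal sketch in plain sh; count_args and is_count are illustrative helpers I made up, not part of MPICH2:

```shell
# Print how many words the shell passes to a command (illustrative helper).
count_args() {
    echo "$#"
}

count_args -n 2-f mpd.hosts     # 3 words: "-n", "2-f", "mpd.hosts"
count_args -n 2 -f mpd.hosts    # 4 words: "-n", "2", "-f", "mpd.hosts"

# Pre-flight check (illustrative): succeed only if the value is made of digits.
is_count() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

is_count "2-f" || echo "bad -n value: 2-f"
is_count "2" && echo "ok: 2"
```

A check like is_count in front of mpdboot would turn the Python traceback into a one-line message from the job script itself.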

As for your further troubles with mpd, please see this page:
http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_My_MPD_ring_won.27t_start.2C_what.27s_wrong.3F

In particular, I would recommend using hydra if you don't specifically
need MPD.
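
If you do try hydra, something along these lines might be a starting point. This is only a sketch under a few assumptions: that your install provides hydra as mpiexec.hydra (the name can differ between MPICH2 versions and builds), that $PBS_NODEFILE lists each host once per allocated slot, and that four processes are wanted as in your script. unique_hosts and hosts.hydra are names I made up for illustration:

```shell
# Reduce a PBS node file (one line per allocated slot) to one line per host.
# unique_hosts is an illustrative helper, not part of MPICH2 or PBS.
unique_hosts() {
    sort -u "$1"
}

# Only attempt the launch inside a PBS job where mpiexec.hydra is on the PATH.
if [ -n "${PBS_NODEFILE:-}" ] && command -v mpiexec.hydra >/dev/null 2>&1; then
    unique_hosts "$PBS_NODEFILE" > hosts.hydra
    # -f names the host file, -n the total number of processes.
    mpiexec.hydra -f hosts.hydra -n 4 ~/dlpoly/dl_poly_2.20_i86/execute/DLPOLY.X
fi
```

No mpdboot, mpdtrace, or mpdringtest step is needed in this variant, since hydra does not use an MPD ring.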

-Dave

On Apr 23, 2010, at 1:15 PM, Jacob Harvey wrote:

> Dave and MPICH2 users,
>
> Thanks for your timely response. I did actually have an error in my
> invocation of mpdboot. I was missing a space on my mpdboot line (doh).
> i.e., I had:
>
> mpdboot -n 2-f mpd.hosts
>
> But fixing this has led me to some other weird problems.
> I want to run on 2 nodes with 2 procs each (4 total). But my PBS script
> seems to hang on booting the mpd ring. For instance, the main part
> of my PBS script looks like this ($NODES = 2):
>
> echo 'Building the MPD ring'
> $MPI_HOME/bin/mpdboot -n $NODES -f $PBS_NODEFILE -r ssh
> echo ' '
>
> echo 'Inspecting if all MPI nodes have been activated'
> $MPI_HOME/bin/mpdtrace -l
> echo ' '
>
> echo 'Checking the connectivity'
> $MPI_HOME/bin/mpdringtest 100
> echo ' '
>
> echo 'Running my code in parallel'
> echo ' '
>
> if [[ `uname -i` == "x86_64" ]]; then
>     mpirun -np 4 ~/dlpoly/dl_poly_2.20_i86/execute/DLPOLY.X
> else
>     mpirun -np 4 ~/dlpoly/dl_poly_2.20_i386/execute/DLPOLY.X
> fi
>
> Yet all I get is the 'Building the MPD ring' line, with no further
> output (from either the PBS script or the program I am running) in my
> output file. But while the job is "running" I can ssh to the nodes
> involved in the mpd ring and see the mpd ring running (either by
> seeing the process in top, or with mpdtrace -l). What seems to force
> mpdboot to move on is specifying more hosts than are available (i.e.,
> "mpdboot -n 4 -f mpd.hosts" when only 2 nodes are used). In that case
> I get the typical "Too many hosts" error, but the mpd ring is formed
> and the job runs as expected. As I said, I've used this same script
> on another one of our clusters, where it works just fine, and I've
> gone through the troubleshooting section in the manual but did not
> find an error.
>
> Any suggestions on why the mpd ring is hanging?
>
> Jacob
>
> On Fri, Apr 23, 2010 at 1:16 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
>> Hi Jacob,
>>
>> Can you post your mpdboot invocation?  There is probably either a
>> mistake in that line or a bug in mpdboot.
>>
>> -Dave
>>
>> On Apr 23, 2010, at 11:41 AM, Jacob Harvey wrote:
>>
>>> MPICH2 users,
>>>
>>> Has anyone seen this error from mpdboot before?
>>>
>>> Traceback (most recent call last):
>>>  File "/opt/mpich2-1.2.1-i86/bin/mpdboot", line 476, in ?
>>>   mpdboot()
>>>  File "/opt/mpich2-1.2.1-i86/bin/mpdboot", line 158, in mpdboot
>>>   totalnumToStart = int(argv[argidx+1])
>>> ValueError: invalid literal for int(): 2-f
>>>
>>> It's odd because I only get this error when I try to set up an mpd ring
>>> for calculations in a PBS script. If I set up an mpd ring from the
>>> command line, the ring forms just fine. Plus, I've gone through the
>>> MPD ring debugging steps in the manual and did not turn up any errors.
>>> Another puzzling piece of information is that I use the exact same PBS
>>> script on a different cluster and the mpd ring forms just fine. So I'm
>>> confused as to what the problem is here. Any thoughts? Thanks in advance!
>>>
>>> Jacob
>>>
>>> --
>>> --
>>> Jacob Harvey
>>>
>>> Graduate Student
>>>
>>> University of Massachusetts Amherst
>>>
>>> j.harv8 at gmail.com
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>


