[mpich-discuss] mpdboot hangs, but ...

Dave Goodell goodell at mcs.anl.gov
Wed Dec 9 13:30:11 CST 2009


The regular user hang might be because you don't have a proper  
~/.mpd.conf when running as the regular user.  Can you try just  
running "mpd &" as the regular user and see if you get any error  
messages out?
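
For reference, here is a minimal sketch of that check in Python. It assumes the conventions described in the MPICH2 installation guide: the file is readable and writable only by its owner (mode 600) and contains a secretword= (or MPD_SECRETWORD=) line. The helper name check_mpd_conf is hypothetical and not part of MPICH2. When running as root, mpd is expected to read /etc/mpd.conf instead, so pass that path.

```python
import os
import stat


def check_mpd_conf(path):
    """Return a list of problems found with an mpd.conf-style file.

    An empty list means the file passed these (assumed) checks.
    """
    if not os.path.isfile(path):
        return [path + " does not exist"]

    problems = []

    # Assumption: mpd rejects the file if group or other have any access.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("%s has mode %o; expected 600" % (path, mode))

    # Collect the key names of all non-comment "key=value" lines.
    with open(path) as conf:
        keys = [
            line.split("=", 1)[0].strip()
            for line in conf
            if "=" in line and not line.lstrip().startswith("#")
        ]

    # Assumption: either spelling of the secret word key is accepted.
    if "secretword" not in keys and "MPD_SECRETWORD" not in keys:
        problems.append(path + " has no secretword= or MPD_SECRETWORD= line")

    return problems


if __name__ == "__main__":
    for problem in check_mpd_conf(os.path.expanduser("~/.mpd.conf")):
        print(problem)
```

If the script prints nothing, the file looks fine by these checks and the hang probably has another cause.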

Appendix A of this document can also be helpful for mpd problems: http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.2.1-installguide.pdf

If you CTRL-C the hung mpdboot process when running as a regular user,
what traceback do you get?

-Dave

On Dec 6, 2009, at 9:19 AM, Benjamin Svetitsky wrote:

> If I run mpdboot as a regular user then it hangs WITHOUT starting up  
> the daemons.    -Ben
>
> Rajeev Thakur wrote:
>> Are you running as root? Can you try running it as regular user  
>> first?
>> Rajeev
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov]
>>> On Behalf Of Benjamin Svetitsky
>>> Sent: Sunday, December 06, 2009 8:02 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] mpdboot hangs, but ...
>>>
>>> Hi Dave,
>>>
>>> Thanks for the quick work.  But the problem is still there.  I  
>>> downloaded the file and put it where you said; then I did:
>>>  configure
>>>  make
>>>  make install
>>> and verified that the new copy of mpd.py is in /usr/local/bin.  But
>>>  mpdboot -n 4 -f /root/mpd.hosts
>>> still doesn't exit after starting up the daemons.
>>>
>>> 		-Ben
>>>
>>> Dave Goodell wrote:
>>>> This has been fixed in the trunk.  Anyone who needs a fix in the short
>>>> term should be able to download the following copy of mpd.py and drop
>>>> it into src/pm/mpd/ in their MPICH2 source tree (and then re-install MPICH2):
>>>>
>>>> https://trac.mcs.anl.gov/projects/mpich2/export/5923/mpich2/trunk/src/pm/mpd/mpd.py
>>>>
>>>>
>>>> -Dave
>>>>
>>>> On Dec 4, 2009, at 9:28 AM, Dave Goodell wrote:
>>>>
>>>>> Hi Ben,
>>>>>
>>>>> This looks very similar to ticket #963: https://trac.mcs.anl.gov/projects/mpich2/ticket/963
>>>>>
>>>>> Please feel free to add yourself to the CC list if you would like to
>>>>> receive progress updates.  Thanks for letting us know that you
>>>>> are having trouble.
>>>>>
>>>>> Thinking about it this morning, I just had an idea of what might be
>>>>> going on. I'll spend some time on it today and see if I can reproduce
>>>>> it and work up a fix.
>>>>>
>>>>> In the meantime, you can either use the hydra process manager (built
>>>>> by default as "mpiexec.hydra") or copy the mpd.py script from 1.1.1p1
>>>>> as a workaround.
>>>>>
>>>>> -Dave
>>>>>
>>>>> On Dec 4, 2009, at 5:38 AM, Benjamin Svetitsky wrote:
>>>>>
>>>>>> Hello everybody,
>>>>>>
>>>>>> I just upgraded from mpich2-1.0.8 to mpich2-1.2.1.  When I run
>>>>>>
>>>>>>    mpdboot -n 4 -f /root/mpd.hosts
>>>>>>
>>>>>> as root, the command just hangs until I hit ^C.  Nonetheless,
>>>>>> it starts the daemons successfully and I can run MPI jobs as usual (so
>>>>>> far).  A subsequent mpdallexit kills the daemons without
>>>>>> complaint.
>>>>>>
>>>>>> Details:
>>>>>>
>>>>>> I am running four Intel quad-cores under CentOS:
>>>>>> Linux version 2.6.18-164.6.1.el5.centos.plus
>>>>>> The file /root/mpd.hosts contains:
>>>>>> --
>>>>>> nodeA
>>>>>> nodeB
>>>>>> nodeC
>>>>>> nodeD
>>>>>> --
>>>>>> and I executed mpdboot on nodeC.
>>>>>> I compiled the MPICH2 source without any config options.
>>>>>> After mpdboot hangs for several minutes and I hit ^C, it  
>>>>>> responds:
>>>>>> --
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/bin/mpdboot", line 476, in ?
>>>>>>     mpdboot()
>>>>>>   File "/usr/local/bin/mpdboot", line 347, in mpdboot
>>>>>>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>>>>>>   File "/usr/local/bin/mpdboot", line 385, in handle_mpd_output
>>>>>>     for line in fd.readlines():    # handle output from shells that echo stuff
>>>>>> KeyboardInterrupt
>>>>>> --
>>>>>> which may be irrelevant.
>>>>>>
>>>>>> Thanks,
>>>>>>    Ben
>>>>>> -- 
>>>>>> Prof. Benjamin Svetitsky         Phone:            +972-3-640 8870
>>>>>> School of Physics and Astronomy  Fax:              +972-3-640 7932
>>>>>> Tel Aviv University              E-mail:      bqs at julian.tau.ac.il
>>>>>> 69978 Tel Aviv, Israel           WWW: http://julian.tau.ac.il/~bqs
>>>>>> _______________________________________________
>>>>>> mpich-discuss mailing list
>>>>>> mpich-discuss at mcs.anl.gov
>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>
>


