[mpich-discuss] mpdboot hangs, but ...

Dave Goodell goodell at mcs.anl.gov
Mon Dec 14 09:10:39 CST 2009


Unfortunately, I haven't been able to reproduce this hang here when  
using the updated version of mpd.py that I posted earlier, which makes  
it difficult to debug and fix your problem.

I would recommend trying the hydra process manager instead.  It will  
be the default process manager in the next release and is where we are  
putting most of our PM development effort.

http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager

If you really need MPD specifically, I might be able to figure out  
what's going on from some "lsof" output on your system together with a  
tweaked mpd.py/mpdboot.py.  But we should probably open a ticket to  
track this at that point.

-Dave
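Dave asks above for "lsof" output. The sketch below is a rough illustration only and was not proposed in the thread. It gathers similar per-process file-descriptor information in Python by reading the `/proc/<pid>/fd` symlinks. The Linux `/proc` layout is an assumption, and `list_open_fds` is a hypothetical helper name. `lsof` itself remains the tool Dave asked for.

```python
import os


def list_open_fds(pid):
    """Return {fd_number: target} for the open descriptors of process `pid`.

    Assumes a Linux-style /proc filesystem, where /proc/<pid>/fd/<n> is a
    symlink naming the file, socket, or pipe behind descriptor n.
    """
    fd_dir = os.path.join("/proc", str(pid), "fd")
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor may have been closed between listdir() and
            # readlink() (e.g. the one listdir() itself used for fd_dir).
            continue
    return result
```

For example, `{fd: t for fd, t in list_open_fds(pid).items() if t.startswith("pipe:")}` keeps only the pipe descriptors. Both ends of one pipe show the same `pipe:[inode]` target. Inspecting another process generally requires running as the same user or as root.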

On Dec 10, 2009, at 9:33 AM, Benjamin Svetitsky wrote:

> Quite right.  I had a line in my ~/.mpd.conf that said
> MPD_USE_ROOT_MPD=1
>
> When I changed the 1 to 0 I was able to run mpdboot as a regular  
> user. The result was exactly the same as when I ran mpdboot as  
> root:  It started the daemons and then hung until I hit ^C,  
> whereupon the traceback was the same as noted below.  After the ^C I  
> was able to run MPI jobs.
>
> 			Ben
>
>
> Dave Goodell wrote:
>> The regular user hang might be because you don't have a proper  
>> ~/.mpd.conf when running as the regular user.  Can you try just  
>> running "mpd &" as the regular user and see if you get any error  
>> messages out?
>> Appendix A of this document also can be helpful for mpd problems: http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.2.1-installguide.pdf 
>> If you CTRL-C the hung mpdboot process when running as a regular
>> user, what traceback do you get?
>> -Dave
>> On Dec 6, 2009, at 9:19 AM, Benjamin Svetitsky wrote:
>>> If I run mpdboot as a regular user then it hangs WITHOUT starting  
>>> up the daemons.    -Ben
>>>
>>> Rajeev Thakur wrote:
>>>> Are you running as root? Can you try running it as regular user  
>>>> first?
>>>> Rajeev
>>>>> -----Original Message-----
>>>>> From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Benjamin Svetitsky
>>>>> Sent: Sunday, December 06, 2009 8:02 AM
>>>>> To: mpich-discuss at mcs.anl.gov
>>>>> Subject: Re: [mpich-discuss] mpdboot hangs, but ...
>>>>>
>>>>> Hi Dave,
>>>>>
>>>>> Thanks for the quick work.  But the problem is still there.  I  
>>>>> downloaded the file and put it where you said; then I did:
>>>>> configure
>>>>> make
>>>>> make install
>>>>> and verified that the new copy of mpd.py is in /usr/local/bin.  But
>>>>> mpdboot -n 4 -f /root/mpd.hosts
>>>>> still doesn't exit after starting up the daemons.
>>>>>
>>>>>        -Ben
>>>>>
>>>>> Dave Goodell wrote:
>>>>>> This has been fixed in the trunk.  Anyone who needs a fix in the short
>>>>>> term should be able to download the following copy of mpd.py and drop
>>>>>> it into src/pm/mpd/ in their MPICH2 source tree (and then re-install MPICH2):
>>>>>>
>>>>>> https://trac.mcs.anl.gov/projects/mpich2/export/5923/mpich2/trunk/src/pm/mpd/mpd.py
>>>>>>
>>>>>>
>>>>>> -Dave
>>>>>>
>>>>>> On Dec 4, 2009, at 9:28 AM, Dave Goodell wrote:
>>>>>>
>>>>>>> Hi Ben,
>>>>>>>
>>>>>>> This looks very similar to ticket #963: https://trac.mcs.anl.gov/projects/mpich2/ticket/963
>>>>>>>
>>>>>>> Please feel free to add yourself to the CC list if you would like to
>>>>>>> receive progress updates.  Thanks for letting us know that you
>>>>>>> are having trouble.
>>>>>>>
>>>>>>> Thinking about it this morning, I just had an idea of what might be
>>>>>>> going on. I'll spend some time on it today and see if I can reproduce
>>>>>>> it and work up a fix.
>>>>>>>
>>>>>>> In the meantime, you can either use the hydra process manager (built
>>>>>>> by default as "mpiexec.hydra") or copy the mpd.py script from 1.1.1p1
>>>>>>> as a workaround.
>>>>>>>
>>>>>>> -Dave
>>>>>>>
>>>>>>> On Dec 4, 2009, at 5:38 AM, Benjamin Svetitsky wrote:
>>>>>>>
>>>>>>>> Hello everybody,
>>>>>>>>
>>>>>>>> I just upgraded from mpich2-1.0.8 to mpich2-1.2.1.  When I run
>>>>>>>>
>>>>>>>>   mpdboot -n 4 -f /root/mpd.hosts
>>>>>>>>
>>>>>>>> as root, the command just hangs until I hit ^C.  Nonetheless, it
>>>>>>>> starts the daemons successfully and I can run MPI jobs as usual (so
>>>>>>>> far).  A subsequent mpdallexit kills the daemons without complaint.
>>>>>>>>
>>>>>>>> Details:
>>>>>>>>
>>>>>>>> I am running four Intel quad-cores under CentOS:
>>>>>>>> Linux version 2.6.18-164.6.1.el5.centos.plus
>>>>>>>> The file /root/mpd.hosts contains:
>>>>>>>> -- 
>>>>>>>> nodeA
>>>>>>>> nodeB
>>>>>>>> nodeC
>>>>>>>> nodeD
>>>>>>>> -- 
>>>>>>>> and I executed mpdboot on nodeC.
>>>>>>>> I compiled the MPICH2 source without any config options.
>>>>>>>> After mpdboot hangs for several minutes and I hit ^C, it responds:
>>>>>>>> -- 
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/usr/local/bin/mpdboot", line 476, in ?
>>>>>>>>     mpdboot()
>>>>>>>>   File "/usr/local/bin/mpdboot", line 347, in mpdboot
>>>>>>>>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>>>>>>>>   File "/usr/local/bin/mpdboot", line 385, in handle_mpd_output
>>>>>>>>     for line in fd.readlines():    # handle output from shells that echo stuff
>>>>>>>> KeyboardInterrupt
>>>>>>>> -- 
>>>>>>>> which may be irrelevant.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>   Ben
>>>>>>>> -- 
>>>>>>>> Prof. Benjamin Svetitsky         Phone:            +972-3-640 8870
>>>>>>>> School of Physics and Astronomy  Fax:              +972-3-640 7932
>>>>>>>> Tel Aviv University              E-mail:      bqs at julian.tau.ac.il
>>>>>>>> 69978 Tel Aviv, Israel           WWW: http://julian.tau.ac.il/~bqs
>>>>>>>> _______________________________________________
>>>>>>>> mpich-discuss mailing list
>>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
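The traceback in the original report shows mpdboot blocked in `fd.readlines()`. In Python, `readlines()` returns only once it reaches end-of-file, so it keeps waiting as long as any process still holds the write end of that pipe open. The thread does not establish that this is the cause of the hang, and it does not describe the fix committed to the trunk. The sketch below assumes a POSIX system with `select`. It shows how a reader can collect whatever lines are already available and give up after a timeout. `read_available_lines` is a hypothetical helper, not part of mpdboot.

```python
import os
import select


def read_available_lines(fd, timeout=1.0):
    """Return the lines readable from raw descriptor `fd` without waiting for EOF.

    Unlike file.readlines(), this stops as soon as no data arrives within
    `timeout` seconds, even if some process still holds the write end open.
    """
    chunks = []
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break  # nothing arrived in time: stop instead of blocking forever
        data = os.read(fd, 4096)
        if not data:
            break  # EOF: every writer has closed its end of the pipe
        chunks.append(data)
        timeout = 0  # after the first read, only drain what is already buffered
    return b"".join(chunks).decode(errors="replace").splitlines()
```

For example, after `r, w = os.pipe()` and `os.write(w, b"hello\nworld\n")`, `read_available_lines(r, timeout=0.1)` returns `['hello', 'world']` even though `w` is still open. At that point `os.fdopen(r).readlines()` would block.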


