[mpich-discuss] mpdboot hangs, but ...

Benjamin Svetitsky bqs at julian.tau.ac.il
Thu Dec 10 09:33:22 CST 2009


Quite right.  I had a line in my ~/.mpd.conf that said
MPD_USE_ROOT_MPD=1

When I changed the 1 to 0 I was able to run mpdboot as a regular user. 
The result was exactly the same as when I ran mpdboot as root:  It 
started the daemons and then hung until I hit ^C, whereupon the 
traceback was the same as noted below.  After the ^C I was able to run 
MPI jobs.
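
The traceback stops inside fd.readlines(). That is consistent with (though it does not prove) mpdboot waiting for end-of-file on a pipe that a still-running daemon keeps open. The following is a minimal Python 3 sketch of that general behaviour, not the mpdboot code itself (the installed mpdboot is a Python 2 script). The child command and the "mpd_port=12345" line are made-up stand-ins for a daemon's startup output.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a daemon: it prints one line, then keeps its
# stdout open for 1.5 seconds before exiting.
CHILD = (
    "import sys, time; "
    "print('mpd_port=12345'); sys.stdout.flush(); "
    "time.sleep(1.5)"
)


def time_first_line_and_rest():
    """Compare readline() and readlines() on a pipe that stays open."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    start = time.monotonic()

    # readline() returns as soon as one complete line is available.
    first = proc.stdout.readline()
    t_first = time.monotonic() - start

    # readlines() returns only at end-of-file, that is, once every writer
    # holding the pipe has closed it (here: when the child exits).
    rest = proc.stdout.readlines()
    t_eof = time.monotonic() - start

    proc.wait()
    return first.strip(), t_first, rest, t_eof
```

Calling time_first_line_and_rest() returns the first line well before readlines() comes back; readlines() returns only after the child has exited. If the writer never exits, as a daemon would not, readlines() never returns, which would look like the hang described above.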

			Ben


Dave Goodell wrote:
> The regular user hang might be because you don't have a proper 
> ~/.mpd.conf when running as the regular user.  Can you try just running 
> "mpd &" as the regular user and see if you get any error messages out?
> 
> Appendix A of this document also can be helpful for mpd problems: 
> http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.2.1-installguide.pdf 
> 
> 
> If you CTRL-C the hung mpdboot process when running as a regular user, 
> what traceback do you get?
> 
> -Dave
> 
> On Dec 6, 2009, at 9:19 AM, Benjamin Svetitsky wrote:
> 
>> If I run mpdboot as a regular user then it hangs WITHOUT starting up 
>> the daemons.    -Ben
>>
>> Rajeev Thakur wrote:
>>> Are you running as root? Can you try running it as regular user first?
>>> Rajeev
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov 
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Benjamin 
>>>> Svetitsky
>>>> Sent: Sunday, December 06, 2009 8:02 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [mpich-discuss] mpdboot hangs, but ...
>>>>
>>>> Hi Dave,
>>>>
>>>> Thanks for the quick work.  But the problem is still there.  I 
>>>> downloaded the file and put it where you said; then I did:
>>>>  configure
>>>>  make
>>>>  make install
>>>> and verified that the new copy of mpd.py is in /usr/local/bin.  But
>>>>  mpdboot -n 4 -f /root/mpd.hosts
>>>> still doesn't exit after starting up the daemons.
>>>>
>>>>         -Ben
>>>>
>>>> Dave Goodell wrote:
>>>>> This has been fixed in the trunk.  Anyone who needs a fix in the short
>>>>> term should be able to download the following copy of mpd.py and drop
>>>>> it into src/pm/mpd/ in their MPICH2 source tree (and then re-install
>>>>> MPICH2):
>>>>>
>>>>> https://trac.mcs.anl.gov/projects/mpich2/export/5923/mpich2/trunk/src/pm/mpd/mpd.py
>>>>>
>>>>>
>>>>> -Dave
>>>>>
>>>>> On Dec 4, 2009, at 9:28 AM, Dave Goodell wrote:
>>>>>
>>>>>> Hi Ben,
>>>>>>
>>>>>> This looks very similar to ticket #963: 
>>>>>> https://trac.mcs.anl.gov/projects/mpich2/ticket/963
>>>>>>
>>>>>> Please feel free to add yourself to the CC list if you would like to
>>>>>> receive progress updates.  Thanks for letting us know that you are 
>>>>>> having trouble.
>>>>>>
>>>>>> Thinking about it this morning, I just had an idea of what might be
>>>>>> going on. I'll spend some time on it today and see if I can reproduce
>>>>>> it and work up a fix.
>>>>>>
>>>>>> In the meantime, you can either use the hydra process manager (built
>>>>>> by default as "mpiexec.hydra") or copy the mpd.py script from 1.1.1p1
>>>>>> as a workaround.
>>>>>>
>>>>>> -Dave
>>>>>>
>>>>>> On Dec 4, 2009, at 5:38 AM, Benjamin Svetitsky wrote:
>>>>>>
>>>>>>> Hello everybody,
>>>>>>>
>>>>>>> I just upgraded from mpich2-1.0.8 to mpich2-1.2.1.  When I run
>>>>>>>
>>>>>>>    mpdboot -n 4 -f /root/mpd.hosts
>>>>>>>
>>>>>>> as root, the command just hangs until I hit ^C.  Nonetheless, it 
>>>>>>> starts the daemons successfully and I can run MPI jobs as usual (so
>>>>>>> far).  A subsequent mpdallexit kills the daemons without complaint.
>>>>>>>
>>>>>>> Details:
>>>>>>>
>>>>>>> I am running four Intel quad-cores under CentOS:
>>>>>>> Linux version 2.6.18-164.6.1.el5.centos.plus
>>>>>>> The file /root/mpd.hosts contains:
>>>>>>> -- 
>>>>>>> nodeA
>>>>>>> nodeB
>>>>>>> nodeC
>>>>>>> nodeD
>>>>>>> -- 
>>>>>>> and I executed mpdboot on nodeC.
>>>>>>> I compiled the MPICH2 source without any config options.
>>>>>>> After mpdboot hangs for several minutes and I hit ^C, it responds:
>>>>>>> -- 
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/local/bin/mpdboot", line 476, in ?
>>>>>>>     mpdboot()
>>>>>>>   File "/usr/local/bin/mpdboot", line 347, in mpdboot
>>>>>>>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>>>>>>>   File "/usr/local/bin/mpdboot", line 385, in handle_mpd_output
>>>>>>>     for line in fd.readlines():    # handle output from shells that echo stuff
>>>>>>> KeyboardInterrupt
>>>>>>> -- 
>>>>>>> which may be irrelevant.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>    Ben
>>>>>>> -- 
>>>>>>> Prof. Benjamin Svetitsky         Phone:            +972-3-640 8870
>>>>>>> School of Physics and Astronomy  Fax:              +972-3-640 7932
>>>>>>> Tel Aviv University              E-mail:      bqs at julian.tau.ac.il
>>>>>>> 69978 Tel Aviv, Israel           WWW: http://julian.tau.ac.il/~bqs
>>>>>>> _______________________________________________
>>>>>>> mpich-discuss mailing list
>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>
>>
> 

-- 
Prof. Benjamin Svetitsky         Phone:            +972-3-640 8870
School of Physics and Astronomy  Fax:              +972-3-640 7932
Tel Aviv University              E-mail:      bqs at julian.tau.ac.il
69978 Tel Aviv, Israel           WWW: http://julian.tau.ac.il/~bqs

