[mpich-discuss] mpdboot hangs, but ...
Dave Goodell
goodell at mcs.anl.gov
Wed Dec 9 13:30:11 CST 2009
The regular user hang might be because you don't have a proper
~/.mpd.conf when running as the regular user. Can you try just
running "mpd &" as the regular user and see if you get any error
messages out?
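
A quick way to rule out the configuration file is sketched below. It is only a sketch under assumptions, not part of MPICH2: the helper name `check_mpd_conf` is hypothetical, and the checks reflect my understanding of the installer's guide (the file must exist, must be accessible only by its owner, and must set `MPD_SECRETWORD` or the older `secretword`).

```python
import stat
from pathlib import Path


def check_mpd_conf(path):
    """Return a list of human-readable problems found with an mpd.conf file.

    Hypothetical helper: the rules below are assumptions drawn from the
    MPICH2 installer's guide, not code taken from mpd itself.
    """
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist"]

    problems = []

    # Assumption: mpd refuses a config file that group or others can access.
    mode = stat.S_IMODE(p.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{p} has mode {mode:03o}; expected owner-only access (chmod 600)")

    # Collect the keys of all non-empty, non-comment "key=value" lines.
    keys = set()
    for raw in p.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and value.strip():
            keys.add(key.strip())

    if not keys & {"MPD_SECRETWORD", "secretword"}:
        problems.append(f"{p} does not set MPD_SECRETWORD (or secretword)")

    return problems


if __name__ == "__main__":
    for problem in check_mpd_conf(Path.home() / ".mpd.conf"):
        print(problem)
```

Run as the regular user, an empty result means the file passes these basic checks, and the hang more likely has another cause.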
Appendix A of this document can also be helpful for mpd problems: http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.2.1-installguide.pdf
If you CTRL-C the hung mpdboot process when running as a regular user,
what traceback do you get?
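
For context on why the traceback is useful: the one quoted further down, from the root run, stops in `fd.readlines()`. The sketch below is not a diagnosis of this bug. It only illustrates general pipe behavior that would be consistent with a hang at that line: `readlines()` returns only at end-of-file, and a pipe reaches end-of-file only once every write end has been closed. The function name `pipe_demo` and the message text are made up for illustration.

```python
import os
import select


def pipe_demo():
    """Show that a pipe reports EOF only after its write end is closed."""
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, "rb", buffering=0)

    # Stand-in for a child process printing one line of output.
    os.write(write_fd, b"hello from child\n")

    # Data is pending, so the pipe is readable and readline() returns at once.
    ready_before, _, _ = select.select([reader], [], [], 1.0)
    first = reader.readline()

    # No data is pending and the write end is still open: select() times out.
    # A readlines() call at this point would block indefinitely.
    ready_after, _, _ = select.select([reader], [], [], 0.2)

    # Once the last write end is closed, readlines() sees EOF and returns.
    os.close(write_fd)
    rest = reader.readlines()
    reader.close()

    return bool(ready_before), first, bool(ready_after), rest


if __name__ == "__main__":
    print(pipe_demo())
```

If the regular-user traceback ends at the same line, that suggests mpdboot is waiting on output from a process that has not closed its end of the pipe; if it ends elsewhere, the two hangs probably have different causes.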
-Dave
On Dec 6, 2009, at 9:19 AM, Benjamin Svetitsky wrote:
> If I run mpdboot as a regular user then it hangs WITHOUT starting up
> the daemons. -Ben
>
> Rajeev Thakur wrote:
>> Are you running as root? Can you try running it as regular user
>> first?
>> Rajeev
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Benjamin Svetitsky
>>> Sent: Sunday, December 06, 2009 8:02 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] mpdboot hangs, but ...
>>>
>>> Hi Dave,
>>>
>>> Thanks for the quick work. But the problem is still there. I
>>> downloaded the file and put it where you said; then I did:
>>> configure
>>> make
>>> make install
>>> and verified that the new copy of mpd.py is in /usr/local/bin. But
>>> mpdboot -n 4 -f /root/mpd.hosts
>>> still doesn't exit after starting up the daemons.
>>>
>>> -Ben
>>>
>>> Dave Goodell wrote:
>>>> This has been fixed in the trunk. Anyone who needs a fix in the short
>>>> term should be able to download the following copy of mpd.py and drop
>>>> it into src/pm/mpd/ in their MPICH2 source tree (and then re-install MPICH2):
>>>>
>>>> https://trac.mcs.anl.gov/projects/mpich2/export/5923/mpich2/trunk/src/pm/mpd/mpd.py
>>>>
>>>>
>>>> -Dave
>>>>
>>>> On Dec 4, 2009, at 9:28 AM, Dave Goodell wrote:
>>>>
>>>>> Hi Ben,
>>>>>
>>>>> This looks very similar to ticket #963: https://trac.mcs.anl.gov/projects/mpich2/ticket/963
>>>>>
>>>>> Please feel free to add yourself to the CC list if you would like to
>>>>> receive progress updates. Thanks for letting us know that you
>>>>> are having trouble.
>>>>>
>>>>> Thinking about it this morning, I just had an idea of what might be
>>>>> going on. I'll spend some time on it today and see if I can reproduce
>>>>> it and work up a fix.
>>>>>
>>>>> In the meantime, you can either use the hydra process manager (built
>>>>> by default as "mpiexec.hydra") or copy the mpd.py script from 1.1.1p1
>>>>> as a workaround.
>>>>>
>>>>> -Dave
>>>>>
>>>>> On Dec 4, 2009, at 5:38 AM, Benjamin Svetitsky wrote:
>>>>>
>>>>>> Hello everybody,
>>>>>>
>>>>>> I just upgraded from mpich2-1.0.8 to mpich2-1.2.1. When I run
>>>>>>
>>>>>> mpdboot -n 4 -f /root/mpd.hosts
>>>>>>
>>>>>> as root, the command just hangs until I hit ^C. Nonetheless,
>>>>>> it starts the daemons successfully and I can run MPI jobs as usual (so
>>>>>> far). A subsequent mpdallexit kills the daemons without
>>>>>> complaint.
>>>>>>
>>>>>> Details:
>>>>>>
>>>>>> I am running four Intel quad-cores under CentOS:
>>>>>> Linux version 2.6.18-164.6.1.el5.centos.plus
>>>>>> The file /root/mpd.hosts contains:
>>>>>> --
>>>>>> nodeA
>>>>>> nodeB
>>>>>> nodeC
>>>>>> nodeD
>>>>>> --
>>>>>> and I executed mpdboot on nodeC.
>>>>>> I compiled the MPICH2 source without any config options.
>>>>>> After mpdboot hangs for several minutes and I hit ^C, it
>>>>>> responds:
>>>>>> --
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/bin/mpdboot", line 476, in ?
>>>>>>     mpdboot()
>>>>>>   File "/usr/local/bin/mpdboot", line 347, in mpdboot
>>>>>>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>>>>>>   File "/usr/local/bin/mpdboot", line 385, in handle_mpd_output
>>>>>>     for line in fd.readlines(): # handle output from shells that echo stuff
>>>>>> KeyboardInterrupt
>>>>>> --
>>>>>> which may be irrelevant.
>>>>>>
>>>>>> Thanks,
>>>>>> Ben
>>>>>> --
>>>>>> Prof. Benjamin Svetitsky Phone: +972-3-640 8870
>>>>>> School of Physics and Astronomy Fax: +972-3-640 7932
>>>>>> Tel Aviv University E-mail: bqs at julian.tau.ac.il
>>>>>> 69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs
>>>>>> _______________________________________________
>>>>>> mpich-discuss mailing list
>>>>>> mpich-discuss at mcs.anl.gov
>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss