[mpich-discuss] mpdboot hangs, but ...
Scott Atchley
atchley at myri.com
Mon Dec 14 09:17:57 CST 2009
Dave,
I encountered this problem on Friday as well. I hit ^C and the MPDs
were running. I should have attached and gotten a stacktrace.
If I encounter it again this week, I will try to get more info.
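
One way to get a stack trace out of a hung Python script without killing it is to have the script install a signal handler that prints the interrupted stack. The following is a minimal sketch in modern Python 3, not something mpdboot or mpd.py does: the helper name install_stack_dump_handler and the choice of SIGUSR1 are assumptions made for illustration, and the handler has to be installed before the hang occurs.

```python
import signal
import sys
import traceback


def install_stack_dump_handler(signum=signal.SIGUSR1, out=None):
    """Print the interrupted Python stack whenever signum is received.

    Hypothetical helper for illustration; not part of mpd or mpdboot.
    """
    def dump_stack(received_signum, frame):
        # 'frame' is the frame that was executing when the signal arrived.
        stream = out if out is not None else sys.stderr
        stream.write("Stack at signal %d:\n" % received_signum)
        traceback.print_stack(frame, file=stream)
        stream.flush()

    signal.signal(signum, dump_stack)
```

With the handler installed near the top of the script, running "kill -USR1 <pid>" from another shell prints the stack to stderr and lets the process continue, so the daemons are not disturbed the way they might be by ^C.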
Scott
On Dec 14, 2009, at 10:10 AM, Dave Goodell wrote:
> Unfortunately, I haven't been able to reproduce this hang here when
> using the updated version of mpd.py that I posted earlier, which
> makes it difficult to debug and fix your problem.
>
> I would recommend trying the hydra process manager instead. It will
> be the default process manager in the next release and is where we
> are putting most of our PM development effort.
>
> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
>
> If you really need MPD specifically, I might be able to figure out
> what's going on from some "lsof" output on your system together with
> a tweaked mpd.py/mpdboot.py. But we should probably open a ticket
> to track this at that point.
>
> -Dave
>
> On Dec 10, 2009, at 9:33 AM, Benjamin Svetitsky wrote:
>
>> Quite right. I had a line in my ~/.mpd.conf that said
>> MPD_USE_ROOT_MPD=1
>>
>> When I changed the 1 to 0 I was able to run mpdboot as a regular
>> user. The result was exactly the same as when I ran mpdboot as
>> root: It started the daemons and then hung until I hit ^C,
>> whereupon the traceback was the same as noted below. After the ^C
>> I was able to run MPI jobs.
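
To check which value a setting such as MPD_USE_ROOT_MPD actually has before running mpdboot, the file can be read as plain KEY=value lines. This is a minimal sketch only: read_mpd_conf is a hypothetical helper, not part of MPICH2, and skipping blank lines and '#' comments is an assumption about the file format.

```python
def read_mpd_conf(path):
    """Parse KEY=value lines from an mpd.conf-style file into a dict.

    Hypothetical helper. Blank lines, lines starting with '#', and lines
    without '=' are skipped (an assumption made for this sketch).
    """
    settings = {}
    with open(path) as conf:
        for raw in conf:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings
```

For example, read_mpd_conf(os.path.expanduser("~/.mpd.conf")).get("MPD_USE_ROOT_MPD") would return the string "0" after the change described above (with "import os" added for expanduser).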
>>
>> Ben
>>
>>
>> Dave Goodell wrote:
>>> The regular user hang might be because you don't have a proper
>>> ~/.mpd.conf when running as the regular user. Can you try just
>>> running "mpd &" as the regular user and see if you get any error
>>> messages out?
>>> Appendix A of this document also can be helpful for mpd problems: http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.2.1-installguide.pdf
>>> If you CTRL-C the hung mpdboot process when running as a regular
>>> user, what traceback do you get?
>>> -Dave
>>> On Dec 6, 2009, at 9:19 AM, Benjamin Svetitsky wrote:
>>>> If I run mpdboot as a regular user then it hangs WITHOUT starting
>>>> up the daemons. -Ben
>>>>
>>>> Rajeev Thakur wrote:
>>>>> Are you running as root? Can you try running it as regular user
>>>>> first?
>>>>> Rajeev
>>>>>> -----Original Message-----
>>>>>> From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov]
>>>>>> On Behalf Of Benjamin Svetitsky
>>>>>> Sent: Sunday, December 06, 2009 8:02 AM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: Re: [mpich-discuss] mpdboot hangs, but ...
>>>>>>
>>>>>> Hi Dave,
>>>>>>
>>>>>> Thanks for the quick work. But the problem is still there. I
>>>>>> downloaded the file and put it where you said; then I did:
>>>>>> configure
>>>>>> make
>>>>>> make install
>>>>>> and verified that the new copy of mpd.py is in /usr/local/bin.
>>>>>> But
>>>>>> mpdboot -n 4 -f /root/mpd.hosts
>>>>>> still doesn't exit after starting up the daemons.
>>>>>>
>>>>>> -Ben
>>>>>>
>>>>>> Dave Goodell wrote:
>>>>>>> This has been fixed in the trunk. Anyone who needs a fix in the short
>>>>>>> term should be able to download the following copy of mpd.py and drop
>>>>>>> it into src/pm/mpd/ in their MPICH2 source tree (and then re-install
>>>>>>> MPICH2):
>>>>>>>
>>>>>>> https://trac.mcs.anl.gov/projects/mpich2/export/5923/mpich2/trunk/src/pm/mpd/mpd.py
>>>>>>>
>>>>>>>
>>>>>>> -Dave
>>>>>>>
>>>>>>> On Dec 4, 2009, at 9:28 AM, Dave Goodell wrote:
>>>>>>>
>>>>>>>> Hi Ben,
>>>>>>>>
>>>>>>>> This looks very similar to ticket #963: https://trac.mcs.anl.gov/projects/mpich2/ticket/963
>>>>>>>>
>>>>>>>> Please feel free to add yourself to the CC list if you would like to
>>>>>>>> receive progress updates. Thanks for letting us know that
>>>>>>>> you are having trouble.
>>>>>>>>
>>>>>>>> Thinking about it this morning, I just had an idea of what might be
>>>>>>>> going on. I'll spend some time on it today and see if I can reproduce
>>>>>>>> it and work up a fix.
>>>>>>>>
>>>>>>>> In the meantime, you can either use the hydra process manager (built
>>>>>>>> by default as "mpiexec.hydra") or copy the mpd.py script from 1.1.1p1
>>>>>>>> as a workaround.
>>>>>>>>
>>>>>>>> -Dave
>>>>>>>>
>>>>>>>> On Dec 4, 2009, at 5:38 AM, Benjamin Svetitsky wrote:
>>>>>>>>
>>>>>>>>> Hello everybody,
>>>>>>>>>
>>>>>>>>> I just upgraded from mpich2-1.0.8 to mpich2-1.2.1. When I run
>>>>>>>>>
>>>>>>>>> mpdboot -n 4 -f /root/mpd.hosts
>>>>>>>>>
>>>>>>>>> as root, the command just hangs until I hit ^C.
>>>>>>>>> Nonetheless, it starts the daemons successfully and I can
>>>>>>>>> run MPI jobs as usual (so far). A subsequent mpdallexit kills
>>>>>>>>> the daemons without complaint.
>>>>>>>>>
>>>>>>>>> Details:
>>>>>>>>>
>>>>>>>>> I am running four Intel quad-cores under CentOS:
>>>>>>>>> Linux version 2.6.18-164.6.1.el5.centos.plus
>>>>>>>>> The file /root/mpd.hosts contains:
>>>>>>>>> --
>>>>>>>>> nodeA
>>>>>>>>> nodeB
>>>>>>>>> nodeC
>>>>>>>>> nodeD
>>>>>>>>> --
>>>>>>>>> and I executed mpdboot on nodeC.
>>>>>>>>> I compiled the MPICH2 source without any config options.
>>>>>>>>> After mpdboot hangs for several minutes and I hit ^C, it
>>>>>>>>> responds:
>>>>>>>>> --
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "/usr/local/bin/mpdboot", line 476, in ?
>>>>>>>>>     mpdboot()
>>>>>>>>>   File "/usr/local/bin/mpdboot", line 347, in mpdboot
>>>>>>>>>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>>>>>>>>>   File "/usr/local/bin/mpdboot", line 385, in handle_mpd_output
>>>>>>>>>     for line in fd.readlines():    # handle output from shells that echo stuff
>>>>>>>>> KeyboardInterrupt
>>>>>>>>> --
>>>>>>>>> which may be irrelevant.
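
The last frame in the traceback is a readlines() call on a file object, and readlines() only returns once it sees end-of-file. A plausible reading, not confirmed anywhere in this thread, is that some process still holds the write end of that pipe open, so EOF never arrives even though the daemons are up. The sketch below is not mpdboot's code: read_available_lines is a hypothetical helper, written in Python 3, that uses select with a timeout to collect whatever output is ready without waiting for EOF.

```python
import os
import select


def read_available_lines(fd, timeout=1.0):
    """Return the lines readable from file descriptor fd.

    Waits at most 'timeout' seconds for each new chunk of data, so it
    returns even if the write end of the pipe is never closed.
    Hypothetical helper for illustration only.
    """
    chunks = []
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break  # nothing more arrived in time; stop instead of blocking
        data = os.read(fd, 4096)
        if not data:
            break  # real EOF: every write end has been closed
        chunks.append(data)
    return b"".join(chunks).decode().splitlines()
```

Called on the read end of a pipe whose writer has written two lines and is still open, this returns those two lines after the timeout expires, whereas readlines() on the same pipe would keep waiting.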
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ben
>>>>>>>>> --
>>>>>>>>> Prof. Benjamin Svetitsky Phone: +972-3-640 8870
>>>>>>>>> School of Physics and Astronomy Fax: +972-3-640 7932
>>>>>>>>> Tel Aviv University E-mail: bqs at julian.tau.ac.il
>>>>>>>>> 69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs
>>>>>>>>> _______________________________________________
>>>>>>>>> mpich-discuss mailing list
>>>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss