[mpich-discuss] mpdboot hangs, but ...
Benjamin Svetitsky
bqs at julian.tau.ac.il
Thu Dec 10 09:33:22 CST 2009
Quite right. I had a line in my ~/.mpd.conf that said
MPD_USE_ROOT_MPD=1
When I changed the 1 to 0 I was able to run mpdboot as a regular user.
The result was exactly the same as when I ran mpdboot as root: It
started the daemons and then hung until I hit ^C, whereupon the
traceback was the same as noted below. After the ^C I was able to run
MPI jobs.
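
For anyone who wants to check their own setup, here is a minimal Python sketch of that check. It assumes ~/.mpd.conf holds plain KEY=VALUE lines like the one above, and that a missing MPD_USE_ROOT_MPD entry means the root mpd is not used. The helper names parse_mpd_conf and uses_root_mpd are made up for illustration and are not part of MPICH2.

```python
def parse_mpd_conf(text):
    """Collect KEY=VALUE lines from mpd.conf-style text into a dict.

    Assumption: one setting per line; lines without '=' are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def uses_root_mpd(settings):
    """Return True only if MPD_USE_ROOT_MPD is explicitly set to 1.

    Assumption: an absent key is treated the same as 0.
    """
    return settings.get("MPD_USE_ROOT_MPD", "0") == "1"
```

To use it, read the file contents (for example with open(os.path.expanduser("~/.mpd.conf")).read()) and pass the result through both helpers. If uses_root_mpd returns True, a regular user's mpdboot will presumably not start its own daemons, which matches what I saw.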
Ben
Dave Goodell wrote:
> The regular user hang might be because you don't have a proper
> ~/.mpd.conf when running as the regular user. Can you try just running
> "mpd &" as the regular user and see if you get any error messages out?
>
> Appendix A of this document also can be helpful for mpd problems:
> http://www.mcs.anl.gov/research/projects/mpich2/documentation/files/mpich2-1.2.1-installguide.pdf
>
>
> If you CTRL-C the hung mpdboot process when running as a regular user,
> what traceback do you get?
>
> -Dave
>
> On Dec 6, 2009, at 9:19 AM, Benjamin Svetitsky wrote:
>
>> If I run mpdboot as a regular user then it hangs WITHOUT starting up
>> the daemons.
>> -Ben
>>
>> Rajeev Thakur wrote:
>>> Are you running as root? Can you try running it as a regular user first?
>>> Rajeev
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Benjamin
>>>> Svetitsky
>>>> Sent: Sunday, December 06, 2009 8:02 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [mpich-discuss] mpdboot hangs, but ...
>>>>
>>>> Hi Dave,
>>>>
>>>> Thanks for the quick work. But the problem is still there. I
>>>> downloaded the file and put it where you said; then I did:
>>>>     configure
>>>>     make
>>>>     make install
>>>> and verified that the new copy of mpd.py is in /usr/local/bin. But
>>>>     mpdboot -n 4 -f /root/mpd.hosts
>>>> still doesn't exit after starting up the daemons.
>>>>
>>>> -Ben
>>>>
>>>> Dave Goodell wrote:
>>>>> This has been fixed in the trunk. Anyone who needs a fix in the short
>>>>> term should be able to download the following copy of mpd.py and drop
>>>>> it into src/pm/mpd/ in their MPICH2 source tree (and then re-install
>>>>> MPICH2):
>>>>>
>>>>> https://trac.mcs.anl.gov/projects/mpich2/export/5923/mpich2/trunk/src/pm/mpd/mpd.py
>>>>>
>>>>>
>>>>> -Dave
>>>>>
>>>>> On Dec 4, 2009, at 9:28 AM, Dave Goodell wrote:
>>>>>
>>>>>> Hi Ben,
>>>>>>
>>>>>> This looks very similar to ticket #963:
>>>>>> https://trac.mcs.anl.gov/projects/mpich2/ticket/963
>>>>>>
>>>>>> Please feel free to add yourself to the CC list if you would like to
>>>>>> receive progress updates. Thanks for letting us know that you are
>>>>>> having trouble.
>>>>>>
>>>>>> Thinking about it this morning, I just had an idea of what might be
>>>>>> going on. I'll spend some time on it today and see if I can reproduce
>>>>>> it and work up a fix.
>>>>>>
>>>>>> In the meantime, you can either use the hydra process manager (built
>>>>>> by default as "mpiexec.hydra") or copy the mpd.py script from 1.1.1p1
>>>>>> as a workaround.
>>>>>>
>>>>>> -Dave
>>>>>>
>>>>>> On Dec 4, 2009, at 5:38 AM, Benjamin Svetitsky wrote:
>>>>>>
>>>>>>> Hello everybody,
>>>>>>>
>>>>>>> I just upgraded from mpich2-1.0.8 to mpich2-1.2.1. When I run
>>>>>>>
>>>>>>> mpdboot -n 4 -f /root/mpd.hosts
>>>>>>>
>>>>>>> as root, the command just hangs until I hit ^C. Nonetheless, it
>>>>>>> starts the daemons successfully and I can run MPI jobs as usual (so
>>>>>>> far). A subsequent mpdallexit kills the daemons without complaint.
>>>>>>>
>>>>>>> Details:
>>>>>>>
>>>>>>> I am running four Intel quad-cores under CentOS:
>>>>>>> Linux version 2.6.18-164.6.1.el5.centos.plus
>>>>>>> The file /root/mpd.hosts contains:
>>>>>>> --
>>>>>>> nodeA
>>>>>>> nodeB
>>>>>>> nodeC
>>>>>>> nodeD
>>>>>>> --
>>>>>>> and I executed mpdboot on nodeC.
>>>>>>> I compiled the MPICH2 source without any config options.
>>>>>>> After mpdboot hangs for several minutes and I hit ^C, it responds:
>>>>>>> --
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/local/bin/mpdboot", line 476, in ?
>>>>>>>     mpdboot()
>>>>>>>   File "/usr/local/bin/mpdboot", line 347, in mpdboot
>>>>>>>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>>>>>>>   File "/usr/local/bin/mpdboot", line 385, in handle_mpd_output
>>>>>>>     for line in fd.readlines():    # handle output from shells that echo stuff
>>>>>>> KeyboardInterrupt
>>>>>>> --
>>>>>>> which may be irrelevant.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ben
>>>>>>> --
>>>>>>> Prof. Benjamin Svetitsky Phone: +972-3-640 8870
>>>>>>> School of Physics and Astronomy Fax: +972-3-640 7932
>>>>>>> Tel Aviv University E-mail: bqs at julian.tau.ac.il
>>>>>>> 69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs
>>>>>>> _______________________________________________
>>>>>>> mpich-discuss mailing list
>>>>>>> mpich-discuss at mcs.anl.gov
>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
--
Prof. Benjamin Svetitsky Phone: +972-3-640 8870
School of Physics and Astronomy Fax: +972-3-640 7932
Tel Aviv University E-mail: bqs at julian.tau.ac.il
69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs