[mpich-discuss] mpdboot hangs, but ...
Benjamin Svetitsky
bqs at julian.tau.ac.il
Sun Dec 6 08:02:06 CST 2009
Hi Dave,
Thanks for the quick work. But the problem is still there. I
downloaded the file and put it where you said; then I did:
    configure
    make
    make install
and verified that the new copy of mpd.py is in /usr/local/bin. But
    mpdboot -n 4 -f /root/mpd.hosts
still doesn't exit after starting up the daemons.
-Ben
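
A check like the one described above (confirming that the installed mpd.py really is the downloaded copy, not just that a file of that name exists) could be sketched as follows. This is only an illustration: the function names are hypothetical, and skipping a leading "#!" line is an assumption made in case an install step rewrites the interpreter line.

```python
import hashlib
import sys


def body_digest(path):
    """Return the SHA-256 of a script's contents, ignoring a leading '#!' line."""
    with open(path, "rb") as handle:
        lines = handle.readlines()
    # Assumption: an installer might rewrite the interpreter line, so skip it.
    if lines and lines[0].startswith(b"#!"):
        lines = lines[1:]
    return hashlib.sha256(b"".join(lines)).hexdigest()


def same_script(path_a, path_b):
    """True when the two scripts match apart from an optional '#!' line."""
    return body_digest(path_a) == body_digest(path_b)


if __name__ == "__main__":
    # Usage: python compare_scripts.py <downloaded mpd.py> <installed mpd.py>
    if len(sys.argv) == 3:
        print("identical" if same_script(sys.argv[1], sys.argv[2]) else "different")
```

If the two copies compare as different, the re-install did not pick up the replacement file; if they compare as identical, the hang has to be explained some other way.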
Dave Goodell wrote:
> This has been fixed in the trunk. Anyone who needs a fix in the short
> term should be able to download the following copy of mpd.py and drop it
> into src/pm/mpd/ in their MPICH2 source tree (and then re-install MPICH2):
>
> https://trac.mcs.anl.gov/projects/mpich2/export/5923/mpich2/trunk/src/pm/mpd/mpd.py
>
>
> -Dave
>
> On Dec 4, 2009, at 9:28 AM, Dave Goodell wrote:
>
>> Hi Ben,
>>
>> This looks very similar to ticket #963:
>> https://trac.mcs.anl.gov/projects/mpich2/ticket/963
>>
>> Please feel free to add yourself to the CC list if you would like to
>> receive progress updates. Thanks for letting us know that you are
>> having trouble.
>>
>> Thinking about it this morning, I just had an idea of what might be
>> going on. I'll spend some time on it today and see if I can reproduce
>> it and work up a fix.
>>
>> In the meantime, you can either use the hydra process manager (built
>> by default as "mpiexec.hydra") or copy the mpd.py script from 1.1.1p1
>> as a workaround.
>>
>> -Dave
>>
>> On Dec 4, 2009, at 5:38 AM, Benjamin Svetitsky wrote:
>>
>>> Hello everybody,
>>>
>>> I just upgraded from mpich2-1.0.8 to mpich2-1.2.1. When I run
>>>
>>> mpdboot -n 4 -f /root/mpd.hosts
>>>
>>> as root, the command just hangs until I hit ^C. Nonetheless, it
>>> starts the daemons successfully and I can run MPI jobs as usual (so
>>> far). A subsequent mpdallexit kills the daemons without complaint.
>>>
>>> Details:
>>>
>>> I am running four Intel quad-cores under CentOS:
>>> Linux version 2.6.18-164.6.1.el5.centos.plus
>>> The file /root/mpd.hosts contains:
>>> --
>>> nodeA
>>> nodeB
>>> nodeC
>>> nodeD
>>> --
>>> and I executed mpdboot on nodeC.
>>> I compiled the MPICH2 source without any config options.
>>> After mpdboot hangs for several minutes and I hit ^C, it responds:
>>> --
>>> Traceback (most recent call last):
>>>   File "/usr/local/bin/mpdboot", line 476, in ?
>>>     mpdboot()
>>>   File "/usr/local/bin/mpdboot", line 347, in mpdboot
>>>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>>>   File "/usr/local/bin/mpdboot", line 385, in handle_mpd_output
>>>     for line in fd.readlines(): # handle output from shells that echo stuff
>>> KeyboardInterrupt
>>> --
>>> which may be irrelevant.
>>>
>>> Thanks,
>>> Ben
>>> --
>>> Prof. Benjamin Svetitsky Phone: +972-3-640 8870
>>> School of Physics and Astronomy Fax: +972-3-640 7932
>>> Tel Aviv University E-mail: bqs at julian.tau.ac.il
>>> 69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
--
Prof. Benjamin Svetitsky Phone: +972-3-640 8870
School of Physics and Astronomy Fax: +972-3-640 7932
Tel Aviv University E-mail: bqs at julian.tau.ac.il
69978 Tel Aviv, Israel WWW: http://julian.tau.ac.il/~bqs
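
The traceback quoted above was interrupted inside fd.readlines(). In Python, readlines() on a pipe returns only at end-of-file, that is, once every process holding the write end has closed it. Whether that is what keeps mpdboot from exiting is not established in this thread, and the sketch below is not the fix that went into the trunk. It only illustrates the general behaviour and one common alternative on Unix: polling with select() and a timeout. The function name and the sample output line are hypothetical.

```python
import os
import select


def read_available_lines(fd, timeout):
    """Collect lines from `fd` until EOF or `timeout` seconds with no data."""
    buffered = b""
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            # Nothing arrived in time: stop instead of blocking forever.
            break
        chunk = os.read(fd, 4096)
        if not chunk:
            # EOF: every writer has closed its end of the pipe.
            break
        buffered += chunk
    return buffered.decode().splitlines()


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    os.write(write_end, b"daemon started\n")  # stand-in for a child's output
    # The write end is deliberately left open, as a long-lived child would
    # leave it. os.fdopen(read_end).readlines() would block at this point.
    print(read_available_lines(read_end, timeout=0.5))
    os.close(write_end)
    os.close(read_end)
```

The trade-off is that a timeout can cut off output that arrives late, so this suits cases where the reader only needs the first lines a child prints.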