[mpich-discuss] mpdboot hangs, but ...

Benjamin Svetitsky bqs at julian.tau.ac.il
Sun Dec 6 08:02:06 CST 2009


Hi Dave,

Thanks for the quick work.  But the problem is still there.  I 
downloaded the file and put it where you said; then I did:
  configure
  make
  make install
and verified that the new copy of mpd.py is in /usr/local/bin.  But
  mpdboot -n 4 -f /root/mpd.hosts
still doesn't exit after starting up the daemons.
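
One way to double-check that the copy in /usr/local/bin really is the downloaded file is to compare checksums rather than just confirming the file exists. The following is a minimal sketch, assuming a current Python 3 interpreter (not necessarily the one mpd runs under); the helper names file_sha256 and same_contents are made up for illustration. If the install step rewrites the script in any way, the digests will differ even though the right file was installed, so a mismatch is only a hint.

```python
import hashlib


def file_sha256(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files have byte-identical contents."""
    return file_sha256(path_a) == file_sha256(path_b)
```

For example, same_contents("src/pm/mpd/mpd.py", "/usr/local/bin/mpd.py"), run from the top of the MPICH2 source tree, would report whether the two copies match byte for byte.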

		-Ben

Dave Goodell wrote:
> This has been fixed in the trunk.  Anyone who needs a fix in the short 
> term should be able to download the following copy of mpd.py and drop it 
> into src/pm/mpd/ in their MPICH2 source tree (and then re-install MPICH2):
> 
> https://trac.mcs.anl.gov/projects/mpich2/export/5923/mpich2/trunk/src/pm/mpd/mpd.py 
> 
> 
> -Dave
> 
> On Dec 4, 2009, at 9:28 AM, Dave Goodell wrote:
> 
>> Hi Ben,
>>
>> This looks very similar to ticket #963: 
>> https://trac.mcs.anl.gov/projects/mpich2/ticket/963
>>
>> Please feel free to add yourself to the CC list if you would like to 
>> receive progress updates.  Thanks for letting us know that you are 
>> having trouble.
>>
>> Thinking about it this morning, I just had an idea of what might be 
>> going on. I'll spend some time on it today and see if I can reproduce 
>> it and work up a fix.
>>
>> In the meantime, you can either use the hydra process manager (built 
>> by default as "mpiexec.hydra") or copy the mpd.py script from 1.1.1p1 
>> as a workaround.
>>
>> -Dave
>>
>> On Dec 4, 2009, at 5:38 AM, Benjamin Svetitsky wrote:
>>
>>> Hello everybody,
>>>
>>> I just upgraded from mpich2-1.0.8 to mpich2-1.2.1.  When I run
>>>
>>>     mpdboot -n 4 -f /root/mpd.hosts
>>>
>>> as root, the command just hangs until I hit ^C.  Nonetheless, it 
>>> starts the daemons successfully and I can run MPI jobs as usual (so 
>>> far).  A subsequent mpdallexit kills the daemons without complaint.
>>>
>>> Details:
>>>
>>> I am running four Intel quad-cores under CentOS:
>>> Linux version 2.6.18-164.6.1.el5.centos.plus
>>> The file /root/mpd.hosts contains:
>>> -- 
>>> nodeA
>>> nodeB
>>> nodeC
>>> nodeD
>>> -- 
>>> and I executed mpdboot on nodeC.
>>> I compiled the MPICH2 source without any config options.
>>> After mpdboot hangs for several minutes and I hit ^C, it responds:
>>> -- 
>>> Traceback (most recent call last):
>>> File "/usr/local/bin/mpdboot", line 476, in ?
>>>   mpdboot()
>>> File "/usr/local/bin/mpdboot", line 347, in mpdboot
>>>   handle_mpd_output(fd,fd2idx,hostsAndInfo)
>>> File "/usr/local/bin/mpdboot", line 385, in handle_mpd_output
>>>   for line in fd.readlines():    # handle output from shells that echo stuff
>>> KeyboardInterrupt
>>> -- 
>>> which may be irrelevant.
>>>
>>> Thanks,
>>>     Ben
>>> -- 
>>> Prof. Benjamin Svetitsky         Phone:            +972-3-640 8870
>>> School of Physics and Astronomy  Fax:              +972-3-640 7932
>>> Tel Aviv University              E-mail:      bqs at julian.tau.ac.il
>>> 69978 Tel Aviv, Israel           WWW: http://julian.tau.ac.il/~bqs
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

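The traceback quoted above shows the ^C landing inside fd.readlines(). That does not by itself identify the cause, but it is consistent with a read that is waiting for end-of-file: readlines() on a pipe returns only once every holder of the write end has closed it. The sketch below illustrates that general behaviour with a plain os.pipe() and a thread. It assumes Python 3, and the function name and the line written to the pipe are invented for the demonstration; it is not taken from mpdboot or mpd.

```python
import os
import threading
import time


def readlines_returns_after(hold_open_seconds):
    """Write one line to a pipe, keep the write end open for a while,
    and report how long readlines() on the read end blocked."""
    read_fd, write_fd = os.pipe()

    def writer():
        os.write(write_fd, b"example line\n")  # made-up output, not real mpd output
        time.sleep(hold_open_seconds)          # the line is already written, fd still open
        os.close(write_fd)                     # only now does the reader see EOF

    t = threading.Thread(target=writer)
    start = time.monotonic()
    t.start()
    with os.fdopen(read_fd, "r") as reader:
        lines = reader.readlines()             # blocks until the write end is closed
    elapsed = time.monotonic() - start
    t.join()
    return lines, elapsed
```

Calling readlines_returns_after(2.0) returns the single line, but only after roughly two seconds, even though the line was written immediately. A process that keeps the write end open indefinitely would make the same call block until interrupted.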
-- 
Prof. Benjamin Svetitsky         Phone:            +972-3-640 8870
School of Physics and Astronomy  Fax:              +972-3-640 7932
Tel Aviv University              E-mail:      bqs at julian.tau.ac.il
69978 Tel Aviv, Israel           WWW: http://julian.tau.ac.il/~bqs

