[mpich-discuss] mpdboot hangs, but ...

Dave Goodell goodell at mcs.anl.gov
Tue Dec 15 15:49:26 CST 2009


That's good information, Scott.  Thanks for passing it along.  I have
my suspicions about where the bug is, but we're having some
environment issues on our end getting CentOS going, so it will be
hard to fix the problem until I can reproduce it.
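
For anyone following along: Scott's traceback (quoted below) shows
mpdboot blocked in fd.readlines() inside handle_mpd_output.  The
following is a minimal sketch, not the mpdboot source and not a
confirmed diagnosis.  It illustrates that readlines() on a pipe waits
for EOF, so it cannot return while any process still holds the write
end open.  It assumes a Unix system and a current Python 3 rather than
the Python 2.4 in the backtrace; read_first_line and the child command
are hypothetical names used only for illustration.

```python
import os
import select
import subprocess


def read_first_line(cmd, timeout=5.0):
    """Start cmd and return its first line of stdout, or None on timeout.

    Hypothetical helper. Unlike fd.readlines(), it does not wait for EOF,
    so it returns even if the child keeps its end of the pipe open.
    The timeout applies to each wait for new data, not to the whole call.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        fd = proc.stdout.fileno()
        buf = b""
        while b"\n" not in buf:
            # Wait until the pipe is readable or the timeout expires.
            ready, _, _ = select.select([fd], [], [], timeout)
            if not ready:
                return None  # no data arrived in time
            chunk = os.read(fd, 4096)
            if not chunk:
                break  # EOF before a complete line was seen
            buf += chunk
        return buf.split(b"\n", 1)[0].decode()
    finally:
        # This sketch only wants the first line, so stop the child.
        proc.kill()
        proc.wait()
        proc.stdout.close()
```

Calling read_first_line with a child that prints one line and then
sleeps returns that line promptly, whereas proc.stdout.readlines() on
the same child would block until the child exits.  Whether mpdboot's
hang has this cause is still an open question.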

I've created a ticket to track this and to reduce on-list traffic
while we discuss it.  I've added you and Benjamin to the CC list for
that ticket.  Anyone else who is interested should feel free to add
themselves to the CC list:

https://trac.mcs.anl.gov/projects/mpich2/ticket/974

-Dave

On Dec 15, 2009, at 7:52 AM, Scott Atchley wrote:

> Dave,
>
> If it matters, I am using CentOS 5.3. I have not noticed this
> behavior before when using Ubuntu 9.04.
>
> Scott
>
> On Dec 14, 2009, at 11:49 AM, Scott Atchley wrote:
>
>> Dave,
>>
>> After ^C, it reports:
>>
>> Traceback (most recent call last):
>> File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 476, in ?
>>   mpdboot()
>> File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 347, in mpdboot
>>   handle_mpd_output(fd,fd2idx,hostsAndInfo)
>> File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 385, in handle_mpd_output
>>   for line in fd.readlines():    # handle output from shells that echo stuff
>> KeyboardInterrupt
>>
>> Let me know if you need more info.
>>
>> Scott
>>
>> On Dec 14, 2009, at 11:04 AM, Dave Goodell wrote:
>>
>>> Thanks Scott.  I'll take another crack at reproducing this locally.
>>>
>>> -Dave
>>>
>>> On Dec 14, 2009, at 9:37 AM, Scott Atchley wrote:
>>>
>>>> Dave,
>>>>
>>>> I can reproduce it at will. More info below. I am using two hosts
>>>> (shower03 and shower04) with four cores each. I am launching from
>>>> shower03. My hosts file is:
>>>>
>>>> % cat hosts.mpd
>>>> shower03:4
>>>> shower04:4
>>>>
>>>> I am calling mpdboot with:
>>>>
>>>> % mpdboot -n 2 -f hosts.mpd --ncpus=4 --mpd=`which mpd` --rsh=ssh -v
>>>> running mpdallexit on shower04
>>>> LAUNCHED mpd on shower04  via
>>>> RUNNING: mpd on shower04
>>>> LAUNCHED mpd on shower03  via  shower04
>>>>
>>>> I have strace and gdb backtrace for mpdboot below. It is still  
>>>> hung. Let me know if you want backtraces from either of the mpds.
>>>>
>>>> Scott
>>>>
>>>>
>>>>
>>>> (gdb) info threads
>>>> * 1 Thread 0x2aaaaaab7f90 (LWP 23351)  0x0000003d508c5f00 in __read_nocancel () from /lib64/libc.so.6
>>>> (gdb) bt
>>>> #0  0x0000003d508c5f00 in __read_nocancel () from /lib64/libc.so.6
>>>> #1  0x0000003d5086b853 in _IO_file_xsgetn_internal () from /lib64/libc.so.6
>>>> #2  0x0000003d50861c82 in fread () from /lib64/libc.so.6
>>>> #3  0x0000003d51846507 in ?? () from /usr/lib64/libpython2.4.so.1.0
>>>> #4  0x0000003d5189497a in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>>>> #5  0x0000003d51894426 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>>>> #6  0x0000003d51894426 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>>>> #7  0x0000003d518958a5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
>>>> #8  0x0000003d518958f2 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
>>>> #9  0x0000003d518b1f29 in ?? () from /usr/lib64/libpython2.4.so.1.0
>>>> #10 0x0000003d518b33d8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
>>>> #11 0x0000003d518b980d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
>>>> #12 0x0000003d5081d994 in __libc_start_main () from /lib64/libc.so.6
>>>> #13 0x0000000000400629 in _start ()
>>>> (gdb)
>>>>
>>>>
>>>>
>>>> <bt.txt.gz>
>>>> _______________________________________________
>>>> mpich-discuss mailing list
>>>> mpich-discuss at mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


