[mpich-discuss] mpdboot hangs, but ...
Dave Goodell
goodell at mcs.anl.gov
Tue Dec 15 15:49:26 CST 2009
That's good information, Scott. Thanks for passing it along. I have
my suspicions about where the bug is, but we're having some
environment issues on our end getting CentOS going, so it will be
hard to fix the problem until I can reproduce it.
I've created a ticket to track this and reduce the on-list traffic
while we discuss it. I've added you and Benjamin to the CC list for
that ticket. Anyone else who is interested should feel free to add
themselves to the CC list:
https://trac.mcs.anl.gov/projects/mpich2/ticket/974
-Dave
On Dec 15, 2009, at 7:52 AM, Scott Atchley wrote:
> Dave,
>
> If it matters, I am using CentOS 5.3. I have not noticed this
> behavior before when using Ubuntu 9.04.
>
> Scott
>
> On Dec 14, 2009, at 11:49 AM, Scott Atchley wrote:
>
>> Dave,
>>
>> After ^C, it reports:
>>
>> Traceback (most recent call last):
>>   File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 476, in ?
>>     mpdboot()
>>   File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 347, in mpdboot
>>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>>   File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 385, in handle_mpd_output
>>     for line in fd.readlines(): # handle output from shells that echo stuff
>> KeyboardInterrupt
>>
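The traceback shows mpdboot waiting inside `fd.readlines()` in `handle_mpd_output`, and the gdb backtrace in Scott's earlier message shows the process blocked in `fread()`/`__read_nocancel()`. A general property worth keeping in mind (the thread does not confirm it as the root cause): a Python file object's `readlines()` returns only at end-of-file, so it blocks for as long as the writer keeps its end of the pipe open. The sketch below is an illustration only, not mpdboot's code: it reads whatever is currently available using `select()` with a timeout. The helper name `read_available_lines` is hypothetical, and the approach assumes a POSIX system where `select()` works on pipes.

```python
import os
import select

def read_available_lines(fd, timeout=1.0):
    """Hypothetical helper: return (lines, eof) without waiting for EOF.

    Unlike file.readlines(), which returns only once the writer closes
    its end, this waits at most `timeout` seconds for the first data and
    then drains only what is already buffered.
    """
    chunks = []
    eof = False
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break  # nothing more right now; the writer may still be alive
        data = os.read(fd, 4096)
        if not data:
            eof = True  # the writer closed its end of the pipe
            break
        chunks.append(data)
        timeout = 0  # after the first data, do not wait for more
    return b"".join(chunks).decode().splitlines(), eof

if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"mpd started\n")
    # The write end stays open, like a remote shell that never exits;
    # os.fdopen(r).readlines() would block here, this call returns.
    print(*read_available_lines(r, timeout=0.5))  # ['mpd started'] False
    os.close(w)
    print(*read_available_lines(r, timeout=0.5))  # [] True
    os.close(r)
```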
>> Let me know if you need more info.
>>
>> Scott
>>
>> On Dec 14, 2009, at 11:04 AM, Dave Goodell wrote:
>>
>>> Thanks Scott. I'll take another crack at reproducing this locally.
>>>
>>> -Dave
>>>
>>> On Dec 14, 2009, at 9:37 AM, Scott Atchley wrote:
>>>
>>>> Dave,
>>>>
>>>> I can reproduce it at will. More info below. I am using two hosts
>>>> (shower03 and shower04) with four cores each, and I am launching
>>>> from shower03. My hosts file is:
>>>>
>>>> % cat hosts.mpd
>>>> shower03:4
>>>> shower04:4
>>>>
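Each line of the hosts file above names a host and, after a colon, its CPU count. A minimal, hypothetical parser for just that `host[:ncpus]` form could look like the following; it is not mpdboot's own parsing code, and skipping blank lines and defaulting a missing count to 1 are assumptions.

```python
def parse_hosts(text):
    """Hypothetical parser: turn 'host' or 'host:ncpus' lines into tuples.

    Blank lines are skipped; a line without ':' is assumed to mean 1 CPU.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, count = line.partition(":")
        hosts.append((name, int(count) if sep else 1))
    return hosts

if __name__ == "__main__":
    hosts = parse_hosts("shower03:4\nshower04:4\n")
    # Two hosts, four cores each, eight in total.
    print(hosts, sum(n for _, n in hosts))  # [('shower03', 4), ('shower04', 4)] 8
```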
>>>> I am calling mpdboot with:
>>>>
>>>> % mpdboot -n 2 -f hosts.mpd --ncpus=4 --mpd=`which mpd` --rsh=ssh -v
>>>> running mpdallexit on shower04
>>>> LAUNCHED mpd on shower04 via
>>>> RUNNING: mpd on shower04
>>>> LAUNCHED mpd on shower03 via shower04
>>>>
>>>> I have an strace and a gdb backtrace for mpdboot below. It is still
>>>> hung. Let me know if you want backtraces from either of the mpds.
>>>>
>>>> Scott
>>>>
>>>>
>>>>
>>>> (gdb) info threads
>>>> * 1 Thread 0x2aaaaaab7f90 (LWP 23351) 0x0000003d508c5f00 in __read_nocancel () from /lib64/libc.so.6
>>>> (gdb) bt
>>>> #0 0x0000003d508c5f00 in __read_nocancel () from /lib64/libc.so.6
>>>> #1 0x0000003d5086b853 in _IO_file_xsgetn_internal () from /lib64/libc.so.6
>>>> #2 0x0000003d50861c82 in fread () from /lib64/libc.so.6
>>>> #3 0x0000003d51846507 in ?? () from /usr/lib64/libpython2.4.so.1.0
>>>> #4 0x0000003d5189497a in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>>>> #5 0x0000003d51894426 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>>>> #6 0x0000003d51894426 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>>>> #7 0x0000003d518958a5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
>>>> #8 0x0000003d518958f2 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
>>>> #9 0x0000003d518b1f29 in ?? () from /usr/lib64/libpython2.4.so.1.0
>>>> #10 0x0000003d518b33d8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
>>>> #11 0x0000003d518b980d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
>>>> #12 0x0000003d5081d994 in __libc_start_main () from /lib64/libc.so.6
>>>> #13 0x0000000000400629 in _start ()
>>>> (gdb)
>>>>
>>>>
>>>>
>>>> <bt.txt.gz>
>>>> _______________________________________________
>>>> mpich-discuss mailing list
>>>> mpich-discuss at mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>
>>
>