[mpich-discuss] mpdboot hangs, but ...
Scott Atchley
atchley at myri.com
Tue Dec 15 07:52:50 CST 2009
Dave,
If it matters, I am using CentOS 5.3. I have not noticed this behavior
before when using Ubuntu 9.04.
Scott
On Dec 14, 2009, at 11:49 AM, Scott Atchley wrote:
> Dave,
>
> After ^C, it reports:
>
> Traceback (most recent call last):
>   File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 476, in ?
>     mpdboot()
>   File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 347, in mpdboot
>     handle_mpd_output(fd,fd2idx,hostsAndInfo)
>   File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 385, in handle_mpd_output
>     for line in fd.readlines():    # handle output from shells that echo stuff
> KeyboardInterrupt
>
> Let me know if you need more info.
>
> Scott
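
For readers following the thread: the traceback shows mpdboot interrupted inside fd.readlines(), and the gdb backtrace further down shows the same process blocked in read(). On a pipe, readlines() returns only at EOF, that is, once every process holding the write end has closed it. Whether that is the actual cause here is not established in this thread. The sketch below is not mpdboot's code. It is a minimal illustration in modern Python, assuming a POSIX system, of reading from a descriptor with a timeout so that a writer that never closes its end cannot hang the reader forever. The function name read_lines_with_timeout is hypothetical.

```python
import os
import select


def read_lines_with_timeout(fd, timeout):
    """Read from file descriptor `fd` until EOF, or until `timeout` seconds
    pass with no new data. Returns (lines, hit_eof).

    Hypothetical helper for illustration only; it is not part of mpdboot.
    Uses select(), so it works on pipes and sockets on POSIX systems.
    """
    buf = b""
    hit_eof = False
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            # Nothing arrived within the timeout: give up instead of blocking.
            break
        chunk = os.read(fd, 4096)
        if not chunk:
            # An empty read means all writers have closed the pipe (EOF).
            hit_eof = True
            break
        buf += chunk
    return buf.decode(errors="replace").splitlines(), hit_eof


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"hello from a chatty shell\n")
    # The write end is still open, so a plain readlines() on `r` would
    # block at this point. The timeout lets us return what we have.
    print(read_lines_with_timeout(r, 0.2))
    os.close(w)
    # Now the writer is gone, so the read sees EOF immediately.
    print(read_lines_with_timeout(r, 0.2))
    os.close(r)
```

The second element of the returned tuple tells the caller whether the stream really ended or the wait simply timed out, which is the distinction a blocking readlines() cannot report.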
>
> On Dec 14, 2009, at 11:04 AM, Dave Goodell wrote:
>
>> Thanks Scott. I'll take another crack at reproducing this locally.
>>
>> -Dave
>>
>> On Dec 14, 2009, at 9:37 AM, Scott Atchley wrote:
>>
>>> Dave,
>>>
>>> I can reproduce it at will. More info below. I am using two hosts
>>> (shower03 and shower04) with four cores each. I am launching from
>>> shower03. My hosts file is:
>>>
>>> % cat hosts.mpd
>>> shower03:4
>>> shower04:4
>>>
>>> I am calling mpdboot with:
>>>
>>> % mpdboot -n 2 -f hosts.mpd --ncpus=4 --mpd=`which mpd` --rsh=ssh -v
>>> running mpdallexit on shower04
>>> LAUNCHED mpd on shower04 via
>>> RUNNING: mpd on shower04
>>> LAUNCHED mpd on shower03 via shower04
>>>
>>> I have strace and gdb backtrace for mpdboot below. It is still
>>> hung. Let me know if you want backtraces from either of the mpds.
>>>
>>> Scott
>>>
>>>
>>>
>>> (gdb) info threads
>>> * 1 Thread 0x2aaaaaab7f90 (LWP 23351) 0x0000003d508c5f00 in __read_nocancel () from /lib64/libc.so.6
>>> (gdb) bt
>>> #0 0x0000003d508c5f00 in __read_nocancel () from /lib64/libc.so.6
>>> #1 0x0000003d5086b853 in _IO_file_xsgetn_internal () from /lib64/libc.so.6
>>> #2 0x0000003d50861c82 in fread () from /lib64/libc.so.6
>>> #3 0x0000003d51846507 in ?? () from /usr/lib64/libpython2.4.so.1.0
>>> #4 0x0000003d5189497a in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>>> #5 0x0000003d51894426 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>>> #6 0x0000003d51894426 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>>> #7 0x0000003d518958a5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
>>> #8 0x0000003d518958f2 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
>>> #9 0x0000003d518b1f29 in ?? () from /usr/lib64/libpython2.4.so.1.0
>>> #10 0x0000003d518b33d8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
>>> #11 0x0000003d518b980d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
>>> #12 0x0000003d5081d994 in __libc_start_main () from /lib64/libc.so.6
>>> #13 0x0000000000400629 in _start ()
>>> (gdb)
>>>
>>>
>>>
>>> <bt.txt.gz>
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss