[mpich-discuss] mpdboot hangs, but ...

Scott Atchley atchley at myri.com
Mon Dec 14 10:49:44 CST 2009


Dave,

After ^C, it reports:

Traceback (most recent call last):
   File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 476, in ?
     mpdboot()
   File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 347, in mpdboot
     handle_mpd_output(fd,fd2idx,hostsAndInfo)
   File "/nfs/home/atchley/projects/mpich2-mx-1.2.1..6/build/shower/bin/mpdboot", line 385, in handle_mpd_output
     for line in fd.readlines():    # handle output from shells that echo stuff
KeyboardInterrupt
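
For context on the frame above: readlines() on a pipe returns only at EOF, that is, once every process holding the write end has closed it. So the call can block indefinitely if some process keeps the pipe open. Whether that is what happens here is not established by the traceback alone. The following is a minimal sketch of a bounded read, not the code in mpdboot. It assumes a Unix system and Python 3 (the trace above is from Python 2.4), and the name read_lines_with_timeout is made up for illustration.

```python
import os
import select


def read_lines_with_timeout(fd, timeout):
    """Read lines from the raw file descriptor `fd`.

    Stops at EOF, or when no data arrives for `timeout` seconds.
    Returns (lines, hit_eof). Hypothetical helper, not part of mpdboot.
    """
    buf = b""
    hit_eof = False
    while True:
        # Wait until fd is readable, but never longer than `timeout`.
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            # Writer is still open but silent: give up instead of hanging.
            break
        chunk = os.read(fd, 4096)
        if not chunk:
            # An empty read means every writer has closed the pipe.
            hit_eof = True
            break
        buf += chunk
    return buf.decode(errors="replace").splitlines(), hit_eof


# Demo: the write end stays open, so a plain readlines() would block here.
r, w = os.pipe()
os.write(w, b"mpd started\n")
print(read_lines_with_timeout(r, 0.2))  # (['mpd started'], False)
os.close(w)
print(read_lines_with_timeout(r, 0.2))  # ([], True)
os.close(r)
```

A False second value tells the caller the other side is still alive but quiet, which is the case that looks like a hang with a blocking read.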

Let me know if you need more info.

Scott

On Dec 14, 2009, at 11:04 AM, Dave Goodell wrote:

> Thanks Scott.  I'll take another crack at reproducing this locally.
>
> -Dave
>
> On Dec 14, 2009, at 9:37 AM, Scott Atchley wrote:
>
>> Dave,
>>
>> I can reproduce it at will. More info below. I am using two hosts
>> (shower03 and shower04) with four cores each. I am launching from  
>> shower03. My hosts file is:
>>
>> % cat hosts.mpd
>> shower03:4
>> shower04:4
>>
>> I am calling mpdboot with:
>>
>> % mpdboot -n 2 -f hosts.mpd --ncpus=4 --mpd=`which mpd` --rsh=ssh -v
>> running mpdallexit on shower04
>> LAUNCHED mpd on shower04  via
>> RUNNING: mpd on shower04
>> LAUNCHED mpd on shower03  via  shower04
>>
>> I have strace and gdb backtrace for mpdboot below. It is still  
>> hung. Let me know if you want backtraces from either of the mpds.
>>
>> Scott
>>
>>
>>
>> (gdb) info threads
>> * 1 Thread 0x2aaaaaab7f90 (LWP 23351)  0x0000003d508c5f00 in __read_nocancel () from /lib64/libc.so.6
>> (gdb) bt
>> #0  0x0000003d508c5f00 in __read_nocancel () from /lib64/libc.so.6
>> #1  0x0000003d5086b853 in _IO_file_xsgetn_internal () from /lib64/libc.so.6
>> #2  0x0000003d50861c82 in fread () from /lib64/libc.so.6
>> #3  0x0000003d51846507 in ?? () from /usr/lib64/libpython2.4.so.1.0
>> #4  0x0000003d5189497a in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>> #5  0x0000003d51894426 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>> #6  0x0000003d51894426 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
>> #7  0x0000003d518958a5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
>> #8  0x0000003d518958f2 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
>> #9  0x0000003d518b1f29 in ?? () from /usr/lib64/libpython2.4.so.1.0
>> #10 0x0000003d518b33d8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
>> #11 0x0000003d518b980d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
>> #12 0x0000003d5081d994 in __libc_start_main () from /lib64/libc.so.6
>> #13 0x0000000000400629 in _start ()
>> (gdb)
>>
>>
>>
>> <bt.txt.gz>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


