[ExM Users] debugging suggestions for non-static main-wrap segfault

Justin M Wozniak wozniak at mcs.anl.gov
Thu Jul 31 14:56:16 CDT 2014


Can you check in vanilla-g++ ?

On 07/31/2014 02:29 PM, Ketan Maheshwari wrote:
> Yes indeed, I am loading from a shared library, which is causing the
> segfault. I tested this with a single-line Tcl script as you suggested:
>
> load ./libdock_wrap.so
>
> $ tclsh8.5 test.tcl
> Segmentation fault (core dumped)
>
> I do not know why this happens or what the root cause might be. This is
> how the .so is generated:
>
> g++ -O2 -shared -o libdock_wrap.so extension.o  dock_wrap.o 
> objfiles/*.o -L /home/ketan/tcl-install/lib -ltcl8.5 -lm -lpthread 
> -Wl,-rpath -Wl,/home/ketan/tcl-install/lib
>
> Where the objfiles/*.o are the object files required by the
> application. These object files are generated with the application's
> config and make steps, except that I added the -fPIC compilation flag,
> as required for generating a shared library.
>
> Do you see anything suspicious in the above line by any chance?
>
> Thanks,
> Ketan
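
One hedged way to examine the link line quoted above is to inspect the resulting shared object before handing it to Tcl. The sketch below assumes a Linux system where `ldd` is available; the `LIB` variable and the report file name `ldd_report.txt` are illustrative choices, not part of the original build. Lines containing "undefined symbol" in the report would point at objects or libraries missing from the link step, although a clean report does not rule out other causes.

```shell
#!/bin/sh
# LIB defaults to the library name used in this thread; override as needed.
LIB="${LIB:-./libdock_wrap.so}"
REPORT="ldd_report.txt"   # hypothetical report file name

if [ -f "$LIB" ] && command -v ldd >/dev/null 2>&1; then
    # -r asks ldd to process relocations and report missing objects/functions.
    ldd -r "$LIB" > "$REPORT" 2>&1
else
    echo "cannot inspect $LIB (file or ldd missing)" > "$REPORT"
fi

# Print whatever was collected so it can be read or pasted into a reply.
cat "$REPORT"
```
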
>
>
>
>
> On Tue, Jul 29, 2014 at 4:20 PM, Tim Armstrong 
> <tim.g.armstrong at gmail.com <mailto:tim.g.armstrong at gmail.com>> wrote:
>
>     The next logical step would be to look at what is actually
>     happening when you are loading the package.  I don't know exactly
>     how the package is set up.  However, you can look in pkgIndex.tcl
>     to see what commands are run (separated by newlines) to load the
>     package.  The stack trace also told us that it happened in
>     Tcl_LoadObjCmd, so it probably happened in a load command.  E.g.
>     in the turbine pkgIndex.tcl you have this: [list load [file join
>     $dir libtclturbine.so]]
>
>     Are you loading the library from a shared library?  There appear
>     to be multiple ways to load a library.
>
>     If you extract that out into a runnable Tcl file and edit paths
>     accordingly, you'll probably have an even more minimal example, e.g.
>
>     load "./libwhatever.so"
>
>     - Tim
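
The minimal load test suggested above can be scripted roughly as follows. This is a sketch under assumptions: `LIB` and `TCLSH` are placeholder variables, and `load_test.tcl` is a hypothetical file name. The exit status is printed because a shell reports death by signal N as 128+N, so a segfault (signal 11) appears as 139, matching the exit code in the original report.

```shell
#!/bin/sh
# LIB and TCLSH are placeholders; adjust them to match your build.
LIB="${LIB:-./libdock_wrap.so}"
TCLSH="${TCLSH:-tclsh8.5}"

# Write a one-line Tcl script that does nothing except load the library.
printf 'load "%s"\n' "$LIB" > load_test.tcl

if command -v "$TCLSH" >/dev/null 2>&1; then
    "$TCLSH" load_test.tcl
    status=$?
    # Death by signal N is reported as 128+N, so SIGSEGV (11) shows up as 139.
    echo "tclsh exit status: $status"
else
    echo "tclsh not found: $TCLSH"
fi
```

If this one-line script crashes on its own, the rest of the Swift/T and turbine stack can be set aside while debugging the library.
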
>
>
>
>     On Tue, Jul 29, 2014 at 4:01 PM, Ketan Maheshwari
>     <ketan at mcs.anl.gov <mailto:ketan at mcs.anl.gov>> wrote:
>
>         I tried a minimal Tcl script and found that the segfault occurs at:
>
>         package require leaf_main 0.0
>
>
>
>         On Tue, Jul 29, 2014 at 3:39 PM, Tim Armstrong
>         <tim.g.armstrong at gmail.com <mailto:tim.g.armstrong at gmail.com>>
>         wrote:
>
>             Well, anyway, let's backtrack.  The stacktrace already
>             told us that the segfault is happening in a package
>             require statement.
>
>             I compiled apps/dock/user-code.swift and looked at the
>             code. There are two package requires:
>
>             package require turbine 0.5.0
>             package require leaf_main 0.0
>
>             They are at the top, before anything else really runs.  So
>             if the problem is in loading one of those packages,
>             whatever happens later is irrelevant.
>
>             So how about just running user-code.tcl, or even creating
>             a minimal Tcl file with those two package require lines?
>
>             You may need to set TCLLIBPATH (a space-separated list:
>             http://www.tcl.tk/man/tcl8.6/TclCmd/library.htm#M27) to
>             the directories with the turbine and leaf_main
>             pkgIndex.tcl files.
>
>             - Tim
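
A minimal sketch of the two-line package test described above, assuming a POSIX shell. The directory variables `TURBINE_PKG_DIR` and `LEAF_PKG_DIR` and the file name `pkg_test.tcl` are hypothetical; point them at the directories that actually contain each package's pkgIndex.tcl. The `puts` lines are added only to show which `package require` is reached before a crash.

```shell
#!/bin/sh
# Hypothetical directories holding each package's pkgIndex.tcl; replace with real paths.
TURBINE_PKG_DIR="${TURBINE_PKG_DIR:-/path/to/turbine/lib}"
LEAF_PKG_DIR="${LEAF_PKG_DIR:-$PWD}"

# TCLLIBPATH is a Tcl list, so entries are separated by spaces, not colons.
TCLLIBPATH="$TURBINE_PKG_DIR $LEAF_PKG_DIR"
export TCLLIBPATH

# Minimal script containing only the two package require lines from user-code.tcl.
cat > pkg_test.tcl <<'EOF'
package require turbine 0.5.0
puts "turbine loaded"
package require leaf_main 0.0
puts "leaf_main loaded"
EOF

echo "TCLLIBPATH=$TCLLIBPATH"
echo "run with: tclsh8.5 pkg_test.tcl"
```
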
>
>
>             On Tue, Jul 29, 2014 at 3:16 PM, Ketan Maheshwari
>             <ketan at mcs.anl.gov <mailto:ketan at mcs.anl.gov>> wrote:
>
>                 We are trying to narrow down the cause of the segfault
>                 by running the Tcl script outside of turbine, thus
>                 getting rid of the swift/T/tcl and turbine script
>                 layers. I suppose this is the Tcl script that gets
>                 invoked, which in turn invokes the application.
>
>
>                 On Tue, Jul 29, 2014 at 3:14 PM, Tim Armstrong
>                 <tim.g.armstrong at gmail.com
>                 <mailto:tim.g.armstrong at gmail.com>> wrote:
>
>                     proc just defines the functions. You need to call
>                     them for it to run.
>
>                     What are we trying to achieve by running this file
>                     anyway? This looks like a set of library functions
>                     rather than the entry point for a script.
>
>                     - Tim
>
>
>                     On Tue, Jul 29, 2014 at 3:08 PM, Ketan Maheshwari
>                     <ketan at mcs.anl.gov <mailto:ketan at mcs.anl.gov>> wrote:
>
>                         Here is the tcl script with puts messages:
>
>                         package provide leaf_main 0.0
>
>                         # dnl Receive USER_LEAF from environment for m4 processing
>                         set USER_LEAF dock_wrap
>                         puts hello1
>
>                         namespace eval leaf_main {
>                             puts hello2
>
>                             proc leaf_main_wrap { rc A } {
>                                 deeprule $A 1 0 \
>                                     "leaf_main::leaf_main_wrap_impl $rc $A" type \
>                                     $::turbine::WORK
>                             }
>
>                             proc leaf_main_wrap_impl { rc A } {
>
>                                 global USER_LEAF
>
>                                 set length [ adlb::container_size $A ]
>                                 set tds [ adlb::enumerate $A dict all 0 ]
>                                 set argv [ list ]
>
>                                 puts hello3
>
>                                 # Fill argv with blanks
>                                 dict for { i v } $tds {
>                                     lappend argv 0
>                                 }
>                                 # Set values at ordered list positions
>                                 dict for { i v } $tds {
>                                     lset argv $i $v
>                                 }
>                                 set rc_value [ ${USER_LEAF}_extension {*}$argv ]
>                                 turbine::store_integer $rc $rc_value
>                                 puts hello4
>                             }
>                             puts hello5
>                         }
>
>
>
>
>                         It prints:
>
>                         hello1
>                         hello2
>                         hello5
>
>                         I see that it is not entering
>                         proc leaf_main_wrap_impl, but I am not familiar
>                         enough with Tcl to understand why.
>
>
>
>                         On Tue, Jul 29, 2014 at 2:41 PM, Tim Armstrong
>                         <tim.g.armstrong at gmail.com
>                         <mailto:tim.g.armstrong at gmail.com>> wrote:
>
>                             I don't see any reason why that invocation
>                             of tclsh would silently fail to run the
>                             tcl script.  Have you attempted to confirm
>                             your hypothesis that it's not running the
>                             script, for example by modifying the
>                             script to print something at the beginning
>                             or end?
>
>
>                             On Tue, Jul 29, 2014 at 1:42 PM, Ketan
>                             Maheshwari <ketan at mcs.anl.gov
>                             <mailto:ketan at mcs.anl.gov>> wrote:
>
>                                 I expect it to run the application or
>                                 crash on segfault. Nothing happens.
>
>
>
>                                 On Tue, Jul 29, 2014 at 1:39 PM, Tim
>                                 Armstrong <tim.g.armstrong at gmail.com
>                                 <mailto:tim.g.armstrong at gmail.com>> wrote:
>
>                                     That looks right, it should run
>                                     dock_wrap.tcl fine.  And it runs
>                                     successfully to completion with no
>                                     output?  Is that what you expected
>                                     it to do?
>
>                                     Backtracking to your original
>                                     problem, if you could work out
>                                     which "package require" statement
>                                     was failing and provide some info
>                                     about that package it might help
>                                     understand the issue.
>
>                                     - Tim
>
>
>                                     On Tue, Jul 29, 2014 at 1:32 PM,
>                                     Ketan Maheshwari
>                                     <ketan at mcs.anl.gov
>                                     <mailto:ketan at mcs.anl.gov>> wrote:
>
>                                         I run tclsh as follows:
>
>                                         /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>
>                                         and
>
>                                         mpiexec -n 3 /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>
>
>                                         On Tue, Jul 29, 2014 at 1:28
>                                         PM, Tim Armstrong
>                                         <tim.g.armstrong at gmail.com
>                                         <mailto:tim.g.armstrong at gmail.com>>
>                                         wrote:
>
>                                             I forgot to reply all
>                                             earlier, re-including the
>                                             list.
>
>                                             How are you running tclsh?
>
>
>                                             On Tue, Jul 29, 2014 at
>                                             11:53 AM, Ketan Maheshwari
>                                             <ketan at mcs.anl.gov
>                                             <mailto:ketan at mcs.anl.gov>> wrote:
>
>                                                 When I try tclsh, it does not do anything.
>                                                 It just returns with exit status 0.
>
>
>                                                 On Tue, Jul 29, 2014 at 11:02 AM, Tim Armstrong
>                                                 <tim.g.armstrong at gmail.com <mailto:tim.g.armstrong at gmail.com>> wrote:
>
>                                                     You can run it directly with tclsh or mpiexec tclsh, which is
>                                                     what turbine eventually does after setting up environment
>                                                     variables, etc.
>
>                                                     - Tim
>
>
>                                                     On Tue, Jul 29, 2014 at 10:57 AM, Ketan Maheshwari
>                                                     <ketan at mcs.anl.gov <mailto:ketan at mcs.anl.gov>> wrote:
>
>                                                         Is it possible to run the dock_wrap.tcl outside of turbine,
>                                                         just as in the case of the static build?
>
>
>
>
>                                                         On Tue, Jul 29, 2014 at 10:45 AM, Wozniak, Justin M.
>                                                         <wozniak at mcs.anl.gov <mailto:wozniak at mcs.anl.gov>> wrote:
>
>
>                                                             Ok, it's in. The Swift/K SVN is apparently down so it's
>                                                             not on the web yet but see the asciidoc.
>
>                                                             On 07/29/2014 10:21 AM, Justin M Wozniak wrote:
>>
>>                                                             I thought VALGRIND was in the manual already but it
>>                                                             isn't.  I will add it now.  I will also talk about our
>>                                                             GDB feature.
>>
>>                                                             On 07/29/2014 10:17 AM, Ketan Maheshwari wrote:
>>>                                                             Thanks! It seems the turbine script already had a
>>>                                                             placeholder for Valgrind, so I tried that, and from the
>>>                                                             output it seems the Tcl libraries are causing the
>>>                                                             segfault, but I may be wrong. Attached is the Valgrind
>>>                                                             output.
>>>
>>>
>>>
>>>                                                             On Tue, Jul 29, 2014 at 10:05 AM, Tim Armstrong
>>>                                                             <tim.g.armstrong at gmail.com <mailto:tim.g.armstrong at gmail.com>> wrote:
>>>
>>>                                                                 I don't have any particular insight into the cause of
>>>                                                                 the segfault, but I can help with the debugger.
>>>
>>>                                                                 You need to point gdb at the tclsh that is being used
>>>                                                                 by turbine (which is just a shell script).  You can
>>>                                                                 locate the correct tclsh by looking at TCLSH in
>>>                                                                 scripts/turbine-config.sh in the turbine install
>>>                                                                 directory.
>>>
>>>                                                                 - Tim
>>>
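
The debugger advice above could be put into practice along these lines. This is only a sketch: `TURBINE_HOME` and `/path/to/tclsh` are placeholders, and `gdb_cmds.txt` is a hypothetical file name. It looks up how TCLSH is set in scripts/turbine-config.sh, writes a small gdb command file, and prints a gdb invocation to adapt. Environment variables that turbine normally sets up (for example the package search path) may also need to be set by hand for the script to get as far as the crash.

```shell
#!/bin/sh
# TURBINE_HOME is a placeholder for the turbine install directory.
TURBINE_HOME="${TURBINE_HOME:-/path/to/turbine}"
CONFIG="$TURBINE_HOME/scripts/turbine-config.sh"

# Show how TCLSH is set, if the config file is readable.
if [ -r "$CONFIG" ]; then
    grep 'TCLSH' "$CONFIG"
else
    echo "config not found: $CONFIG"
fi

# gdb commands: run the program, then print a backtrace once it stops.
printf 'run\nbt\n' > gdb_cmds.txt

# Replace /path/to/tclsh with the TCLSH value found above before running this.
echo "gdb -batch -x gdb_cmds.txt --args /path/to/tclsh user-code.tcl"
```

Pointing gdb at the tclsh binary rather than at the turbine wrapper avoids the "not in executable format" error, since the wrapper is a shell script and not an ELF executable.
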
>>>
>>>                                                                 On Tue, Jul 29, 2014 at 10:00 AM, Ketan Maheshwari
>>>                                                                 <ketan at mcs.anl.gov <mailto:ketan at mcs.anl.gov>> wrote:
>>>
>>>                                                                     Hi,
>>>
>>>                                                                     Trying to main-wrap the DOCK 6.6 application for
>>>                                                                     ATPESC, I get the build right (it seems), but
>>>                                                                     things fail at runtime with a segfault:
>>>
>>>                                                                     $ turbine -n 4 user-code.tcl
>>>
>>>                                                                     ===================================================================================
>>>                                                                     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>                                                                     =   EXIT CODE: 139
>>>                                                                     =   CLEANING UP REMAINING PROCESSES
>>>                                                                     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>                                                                     ===================================================================================
>>>                                                                     YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>>>                                                                     This typically refers to a problem with your application.
>>>                                                                     Please see the FAQ page for debugging suggestions
>>>
>>>                                                                     This is on an MCS machine. Any suggestions for
>>>                                                                     debugging this? I tried gdb but it gives:
>>>
>>>                                                                     "/nfs2/ketan/exm-install/turbine/bin/turbine": not in executable format: File format not recognized
>>>
>>>                                                                     With strace, I see some signs of missing files,
>>>                                                                     but I am not sure if that is the cause of the
>>>                                                                     segfault. Attached is the strace output of:
>>>
>>>                                                                     strace -o strace.out turbine -n 4 user-code.tcl
>>>
>>>                                                                     The
>>>                                                                     code
>>>                                                                     has
>>>                                                                     some
>>>                                                                     MPI
>>>                                                                     and
>>>                                                                     pthread
>>>                                                                     elements
>>>                                                                     but
>>>                                                                     does
>>>                                                                     not
>>>                                                                     use
>>>                                                                     them
>>>                                                                     as
>>>                                                                     far
>>>                                                                     as
>>>                                                                     I understand.
>>>
>>>                                                                     Thanks
>>>                                                                     for
>>>                                                                     any
>>>                                                                     suggestions.
>>>
>>>                                                                     --
>>>                                                                     Ketan
>>>
>>> _______________________________________________
>>> ExM-user mailing list
>>> ExM-user at lists.mcs.anl.gov <mailto:ExM-user at lists.mcs.anl.gov>
>>> https://lists.mcs.anl.gov/mailman/listinfo/exm-user
>>
>> -- 
>> Justin M Wozniak
>
> -- 
> Justin M Wozniak
>


-- 
Justin M Wozniak



More information about the ExM-user mailing list