[ExM Users] debugging suggestions for non-static main-wrap segfault

Ketan Maheshwari ketan at mcs.anl.gov
Thu Jul 31 14:57:02 CDT 2014


After some Googling, it looks like one possible cause is a symbol name
collision between libraries. In this case I think it is between
tcl*.so and libdock_wrap.so. I will investigate further.
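
One way to test the collision hypothesis is to compare the dynamic symbols that the two libraries define. The following is only a minimal sketch, assuming GNU binutils nm is available; the helper names exported_syms and common_syms and the libtcl8.5.so path in the usage comment are illustrative and do not come from the thread.

```shell
# List the dynamic symbols a shared object defines, one name per line.
# Assumes GNU nm; --defined-only skips symbols the library merely imports.
exported_syms() {
    nm -D --defined-only "$1" | awk '{ print $NF }' | LC_ALL=C sort -u
}

# Print the names present in both of two sorted symbol lists.
common_syms() {
    LC_ALL=C comm -12 "$1" "$2"
}

# Usage (paths are assumptions):
#   exported_syms /home/ketan/tcl-install/lib/libtcl8.5.so > tcl.syms
#   exported_syms ./libdock_wrap.so > dock.syms
#   common_syms tcl.syms dock.syms
```

Names such as _init and _fini typically show up in most shared objects and can usually be ignored; any application-level name that appears in both lists is a candidate for the collision.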


On Thu, Jul 31, 2014 at 2:45 PM, Tim Armstrong <tim.g.armstrong at gmail.com>
wrote:

>   I'm not really sure, but there's nothing unusual about that output.
> The valgrind output earlier suggested it was jumping to an invalid
> address.  It might be interesting to know what that address is, e.g. if
> it's in the shared object code, or just a random address.  I think any
> information we can get out of the debugger would be helpful.  E.g. if you
> could even get line numbers for the Tcl code in Tcl_LoadObjCmd, that might
> reveal what's going on.
>
>  As far as what load does, we have the manual page:
> http://www.tcl.tk/man/tcl8.6/TclCmd/load.htm
>
>  One of the things it does is call the _Init proc for the module.  Since
> that's doing a jump to a computed location, maybe that's one place to look.
>
>  - Tim
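
Regarding the _Init proc mentioned above: according to the load manual page, when no package name is given, Tcl guesses it from the file name (drop a leading "lib", keep the following alphabetic and underscore characters, capitalize the first letter, lowercase the rest) and then calls the resulting name with "_Init" appended. The sketch below mimics that rule in shell so the expected symbol can be looked up in the library; tcl_init_name is a hypothetical helper, not part of Tcl or turbine.

```shell
# Derive the init function name that Tcl's load command would look for
# when given only a file name (rule as described in the load manual page).
tcl_init_name() {
    base=${1##*/}       # drop any directory part
    base=${base#lib}    # drop a leading "lib"
    # Keep only the leading run of letters and underscores.
    stem=$(printf '%s' "$base" | sed 's/[^A-Za-z_].*$//')
    first=$(printf '%s' "$stem" | cut -c1 | tr 'a-z' 'A-Z')
    rest=$(printf '%s' "$stem" | cut -c2- | tr 'A-Z' 'a-z')
    printf '%s%s_Init\n' "$first" "$rest"
}

# Usage (assumes GNU nm):
#   nm -D --defined-only ./libdock_wrap.so | grep "$(tcl_init_name ./libdock_wrap.so)"
```

If the symbol is present, that function is the computed location load jumps to, so it may be a reasonable place to set a breakpoint.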
>
>
> On Thu, Jul 31, 2014 at 2:29 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
> wrote:
>
>> Yes indeed, I am loading from a shared library, and that load is causing the
>> segfault. I tested this with a single-line Tcl script, as you suggested:
>>
>>  load ./libdock_wrap.so
>>
>>  $ tclsh8.5 test.tcl
>> Segmentation fault (core dumped)
>>
>>  I do not know why this happens or what the root cause might be. This is
>> how the .so is generated:
>>
>>  g++ -O2 -shared -o libdock_wrap.so extension.o dock_wrap.o objfiles/*.o \
>>      -L /home/ketan/tcl-install/lib -ltcl8.5 -lm -lpthread \
>>      -Wl,-rpath -Wl,/home/ketan/tcl-install/lib
>>
>>  Here, objfiles/*.o are the object files required by the application.
>> They are generated with the application's own config and make, except
>> that I added the -fPIC compilation flag, as required for building a
>> shared library.
>>
>>  Do you see anything suspicious in the above line by any chance?
>>
>>  Thanks,
>> Ketan
>>
>>
>>
>>
>>  On Tue, Jul 29, 2014 at 4:20 PM, Tim Armstrong <
>> tim.g.armstrong at gmail.com> wrote:
>>
>>>   The next logical step would be to look at what is actually happening
>>> when you are loading the package.  I don't know exactly how the package is
>>> set up.  However, you can look in pkgIndex.tcl to see what commands are run
>>> (separated by newlines) to load the package.  The stack trace also told us
>>> that it happened in Tcl_LoadObjCmd, so it probably happened in a load
>>> command.  E.g. in the turbine pkgIndex.tcl you have this:
>>>
>>> [list load [file join $dir libtclturbine.so]]
>>>
>>>  Are you loading the library from a shared library?  There appear to be
>>> multiple ways to load a library.
>>>
>>>  If you extract that out into a runnable Tcl file and edit the paths
>>> accordingly, you'll probably have an even more minimal example, e.g.
>>>
>>> load "./libwhatever.so"
>>>
>>>  - Tim
>>>
>>>
>>>
>>> On Tue, Jul 29, 2014 at 4:01 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>> wrote:
>>>
>>>> I tried a minimal Tcl script and found that the segfault occurs at:
>>>>
>>>>  package require leaf_main 0.0
>>>>
>>>>
>>>>
>>>>  On Tue, Jul 29, 2014 at 3:39 PM, Tim Armstrong <
>>>> tim.g.armstrong at gmail.com> wrote:
>>>>
>>>>>     Well, anyway, let's backtrack.  The stacktrace already told us
>>>>> that the segfault is happening in a package require statement.
>>>>>
>>>>>  I compiled apps/dock/user-code.swift and looked at the code.  There
>>>>> are two package requires:
>>>>>
>>>>> package require turbine 0.5.0
>>>>> package require leaf_main 0.0
>>>>>
>>>>>  They are up the top before anything else really runs.  So if the
>>>>> problem is in loading one of those packages, whatever happens later is
>>>>> irrelevant.
>>>>>
>>>>>  So how about just running user-code.tcl, or even creating a minimal
>>>>> tcl file with those two package require lines.
>>>>>
>>>>>  You may need to set TCLLIBPATH (a space-separated list:
>>>>> http://www.tcl.tk/man/tcl8.6/TclCmd/library.htm#M27) to the
>>>>> directories with the turbine and leaf_main pkgIndex.tcl files.
>>>>>
>>>>>  - Tim
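
The minimal reproduction suggested above could be set up roughly as follows. This is a sketch under assumptions: write_minimal_tcl is a hypothetical helper, and the directories in the usage comment are placeholders for wherever the turbine and leaf_main pkgIndex.tcl files actually live.

```shell
# Write a Tcl script containing only the two package requires from user-code.tcl.
write_minimal_tcl() {
    cat > "$1" <<'EOF'
package require turbine 0.5.0
package require leaf_main 0.0
EOF
}

# Usage (TCLLIBPATH is a space-separated list; both paths are placeholders):
#   write_minimal_tcl minimal.tcl
#   TCLLIBPATH="/path/to/turbine/lib /path/to/leaf_main" tclsh8.5 minimal.tcl
```

Removing one of the two lines and re-running should show which package triggers the crash.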
>>>>>
>>>>>
>>>>> On Tue, Jul 29, 2014 at 3:16 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>> We are trying to narrow down the cause of the segfault by running the
>>>>>> Tcl script outside of turbine, thus taking the Swift/T and turbine
>>>>>> scripts out of the picture. I suppose this is the Tcl script that gets
>>>>>> invoked, which in turn invokes the application.
>>>>>>
>>>>>>
>>>>>>  On Tue, Jul 29, 2014 at 3:14 PM, Tim Armstrong <
>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>
>>>>>>>   proc just defines the functions. You need to call them for it to
>>>>>>> run.
>>>>>>>
>>>>>>>  What are we trying to achieve by running this file anyway?  This
>>>>>>> looks like a set of library functions rather than the entry point for a
>>>>>>> script.
>>>>>>>
>>>>>>>  - Tim
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 29, 2014 at 3:08 PM, Ketan Maheshwari <ketan at mcs.anl.gov
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Here is the tcl script with puts messages:
>>>>>>>>
>>>>>>>>  package provide leaf_main 0.0
>>>>>>>>
>>>>>>>>  # dnl Receive USER_LEAF from environment for m4 processing
>>>>>>>>  set USER_LEAF dock_wrap
>>>>>>>>  puts hello1
>>>>>>>>
>>>>>>>>  namespace eval leaf_main {
>>>>>>>>      puts hello2
>>>>>>>>
>>>>>>>>      proc leaf_main_wrap { rc A } {
>>>>>>>>          deeprule $A 1 0 "leaf_main::leaf_main_wrap_impl $rc $A" type $::turbine::WORK
>>>>>>>>      }
>>>>>>>>
>>>>>>>>      proc leaf_main_wrap_impl { rc A } {
>>>>>>>>
>>>>>>>>          global USER_LEAF
>>>>>>>>
>>>>>>>>          set length [ adlb::container_size $A ]
>>>>>>>>          set tds [ adlb::enumerate $A dict all 0 ]
>>>>>>>>          set argv [ list ]
>>>>>>>>
>>>>>>>>          puts hello3
>>>>>>>>
>>>>>>>>          # Fill argv with blanks
>>>>>>>>          dict for { i v } $tds {
>>>>>>>>              lappend argv 0
>>>>>>>>          }
>>>>>>>>          # Set values at ordered list positions
>>>>>>>>          dict for { i v } $tds {
>>>>>>>>              lset argv $i $v
>>>>>>>>          }
>>>>>>>>          set rc_value [ ${USER_LEAF}_extension {*}$argv ]
>>>>>>>>          turbine::store_integer $rc $rc_value
>>>>>>>>          puts hello4
>>>>>>>>      }
>>>>>>>>      puts hello5
>>>>>>>>  }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  It prints:
>>>>>>>>
>>>>>>>>  hello1
>>>>>>>> hello2
>>>>>>>> hello5
>>>>>>>>
>>>>>>>>  I see that it is not going into the proc leaf_main_wrap_impl, but I
>>>>>>>> am not familiar enough with Tcl to understand why.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  On Tue, Jul 29, 2014 at 2:41 PM, Tim Armstrong <
>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>  I don't see any reason why that invocation of tclsh would
>>>>>>>>> silently fail to run the tcl script.  Have you attempted to confirm your
>>>>>>>>> hypothesis that it's not running the script, for example by modifying the
>>>>>>>>> script to print something at the beginning or end?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 29, 2014 at 1:42 PM, Ketan Maheshwari <
>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>> I expect it to run the application or crash on segfault. Nothing
>>>>>>>>>> happens.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  On Tue, Jul 29, 2014 at 1:39 PM, Tim Armstrong <
>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>   That looks right, it should run dock_wrap.tcl fine.  And it
>>>>>>>>>>> runs successfully to completion with no output?  Is that what you expected
>>>>>>>>>>> it to do?
>>>>>>>>>>>
>>>>>>>>>>>  Backtracking to your original problem, if you could work out
>>>>>>>>>>> which "package require" statement was failing and provide some info about
>>>>>>>>>>> that package it might help understand the issue.
>>>>>>>>>>>
>>>>>>>>>>>  - Tim
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  On Tue, Jul 29, 2014 at 1:32 PM, Ketan Maheshwari <
>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>  I run tclsh as follows:
>>>>>>>>>>>>
>>>>>>>>>>>>  /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>>>>>>>
>>>>>>>>>>>>  and
>>>>>>>>>>>>
>>>>>>>>>>>>  mpiexec -n 3 /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 1:28 PM, Tim Armstrong <
>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>   I forgot to reply all earlier, re-including the list.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  How are you running tclsh?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 11:53 AM, Ketan Maheshwari <
>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I try tclsh, it does not do anything. It just returns with
>>>>>>>>>>>>>> an exit status of 0.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 11:02 AM, Tim Armstrong <
>>>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   You can run it directly with tclsh or mpiexec tclsh,
>>>>>>>>>>>>>>> which is what turbine eventually does after setting up environment
>>>>>>>>>>>>>>> variables, etc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  - Tim
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:57 AM, Ketan Maheshwari <
>>>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is it possible to run the dock_wrap.tcl outside of turbine
>>>>>>>>>>>>>>>> just as in the case of static build?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:45 AM, Wozniak, Justin M. <
>>>>>>>>>>>>>>>> wozniak at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ok, it's in.  The Swift/K SVN is apparently down so it's
>>>>>>>>>>>>>>>>> not on the web yet but see the asciidoc.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 07/29/2014 10:21 AM, Justin M Wozniak wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I thought VALGRIND was in the manual already but it
>>>>>>>>>>>>>>>>> isn't.  I will add it now.  I will also talk about our GDB feature.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 07/29/2014 10:17 AM, Ketan Maheshwari wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks! It seems the turbine script already had a placeholder for
>>>>>>>>>>>>>>>>> Valgrind, so I tried that. From the output, it seems the Tcl libraries
>>>>>>>>>>>>>>>>> are causing the segfault, but I may be wrong. Attached is the Valgrind output.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:05 AM, Tim Armstrong <
>>>>>>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  I don't have any particular insight into the cause of
>>>>>>>>>>>>>>>>>> the segfault, but I can help with the debugger.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You need to point gdb at the tclsh that is being used by
>>>>>>>>>>>>>>>>>> turbine (which is just a shell script).  You can locate the correct tclsh
>>>>>>>>>>>>>>>>>> by looking at TCLSH in scripts/turbine-config.sh in the turbine install
>>>>>>>>>>>>>>>>>> directory.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  - Tim
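
Once the real tclsh has been located as described above, a backtrace can be captured non-interactively. The following is a sketch, not the turbine GDB feature mentioned elsewhere in the thread: backtrace_tclsh is a hypothetical helper, the GDB variable is an illustrative override for the debugger command, and the tclsh path in the usage comment is taken from later messages in this thread.

```shell
# Run a program under gdb in batch mode and print a backtrace when it stops.
# "$@" is the full command line to debug, e.g. the tclsh binary and a script.
# GDB is an optional override for the debugger executable (assumption).
backtrace_tclsh() {
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@"
}

# Usage:
#   grep TCLSH scripts/turbine-config.sh
#   backtrace_tclsh /home/ketan/tcl-install/bin/tclsh8.5 user-code.tcl
```

This debugs a single process started directly by tclsh rather than the multi-process run that turbine -n 4 launches, which should be enough if the crash happens while loading a package.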
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:00 AM, Ketan Maheshwari <
>>>>>>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  I am trying to main-wrap the DOCK 6.6 application for ATPESC. The
>>>>>>>>>>>>>>>>>>> build seems to succeed, but it fails at runtime with a segfault:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  $ turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>>>>>>>>>> =   EXIT CODE: 139
>>>>>>>>>>>>>>>>>>> =   CLEANING UP REMAINING PROCESSES
>>>>>>>>>>>>>>>>>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING:
>>>>>>>>>>>>>>>>>>> Segmentation fault (signal 11)
>>>>>>>>>>>>>>>>>>> This typically refers to a problem with your application.
>>>>>>>>>>>>>>>>>>> Please see the FAQ page for debugging suggestions
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  This is on an MCS machine. Any suggestions for debugging this?
>>>>>>>>>>>>>>>>>>> I tried gdb but it gives:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   "/nfs2/ketan/exm-install/turbine/bin/turbine": not in
>>>>>>>>>>>>>>>>>>> executable format: File format not recognized
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  With strace, I see some signs of missing files but not
>>>>>>>>>>>>>>>>>>> sure if that is the cause of segfault. Attached is the strace output of:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  strace -o strace.out turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  The code has some MPI and pthread elements but does
>>>>>>>>>>>>>>>>>>> not use them as far as I understand.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  Thanks for any suggestions.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  --
>>>>>>>>>>>>>>>>>>> Ketan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>>>>>>>>>> ExM-user mailing list
>>>>>>>>>>>>>>>>>>> ExM-user at lists.mcs.anl.gov
>>>>>>>>>>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/exm-user
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   --
>>>>>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>