[ExM Users] debugging suggestions for non-static main-wrap segfault

Tim Armstrong tim.g.armstrong at gmail.com
Thu Jul 31 14:45:40 CDT 2014


I'm not really sure, but there's nothing unusual about that output.  The
valgrind output earlier suggested it was jumping to an invalid address.  It
might be interesting to know what that address is, e.g. if it's in the
shared object code, or just a random address.  I think any information we
can get out of the debugger would be helpful.  E.g. if you could even get
line numbers for the Tcl code in Tcl_LoadObjCmd, that might reveal what's
going on.
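
One way to collect that information is to run the minimal load script under
gdb in batch mode.  The sketch below is only an illustration: the helper
names (run_or_print, debug_load) and the DRY_RUN switch are made up for this
example, and it assumes gdb is installed and that test.tcl is the one-line
load script quoted further down.  It prints a backtrace, the program counter
at the time of the fault, and the address ranges of the loaded shared
objects, so the bad address can be compared against the range of
libdock_wrap.so.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run_or_print() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run a Tcl script under gdb non-interactively.
#   $1: the tclsh binary (not the turbine wrapper script)
#   $2: the Tcl script to run
debug_load() {
    run_or_print gdb --batch \
        -ex run \
        -ex bt \
        -ex 'p/x $pc' \
        -ex 'info sharedlibrary' \
        --args "$1" "$2"
}

# Example (not executed here):
#   debug_load /home/ketan/tcl-install/bin/tclsh8.5 test.tcl
```

Line numbers inside Tcl_LoadObjCmd will only show up in the backtrace if the
tclsh being debugged was built with debugging symbols.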

As for what load does, see the manual page:
http://www.tcl.tk/man/tcl8.6/TclCmd/load.htm

One of the things it does is call the _Init proc for the module.  Since
that's doing a jump to a computed location, maybe that's one place to look.
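
To check that entry point, it may help to work out which symbol load will
look for and confirm that the library exports it.  According to the load
manual page, when no package name is given Tcl takes the last path element,
strips a leading "lib", keeps the following letters and underscores, and
title-cases the result before appending _Init.  The sketch below reproduces
that rule in shell; init_symbol_for is a made-up helper name, and the nm
invocation in the comment assumes GNU binutils.

```shell
# Hypothetical helper: guess the init symbol that "load <file>" looks up
# when no package name is given, following the rule in the load man page.
init_symbol_for() {
    base=${1##*/}          # last path element
    base=${base#lib}       # strip a leading "lib"
    # keep only the leading run of letters and underscores
    prefix=$(printf '%s' "$base" | sed 's/[^A-Za-z_].*$//')
    # title-case: first character upper case, the rest lower case
    first=$(printf '%s' "$prefix" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$prefix" | cut -c2- | tr '[:upper:]' '[:lower:]')
    printf '%s%s_Init\n' "$first" "$rest"
}

init_symbol_for ./libdock_wrap.so

# To see whether the library exports that name (not executed here):
#   nm -D --defined-only ./libdock_wrap.so | grep "$(init_symbol_for ./libdock_wrap.so)"
```

For ./libdock_wrap.so this prints Dock_wrap_Init.  If the init function is
compiled as C++ without extern "C", the exported name will be mangled and
will not match; that is just one thing to rule out, not a diagnosis.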

- Tim


On Thu, Jul 31, 2014 at 2:29 PM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:

> Yes indeed, I am loading from a shared library, and that is what causes the
> segfault. I tested this with a single-line Tcl script, as you suggested:
>
> load ./libdock_wrap.so
>
> $ tclsh8.5 test.tcl
> Segmentation fault (core dumped)
>
> I do not know why this happens or what the root cause might be. This is how
> the .so is generated:
>
> g++ -O2 -shared -o libdock_wrap.so extension.o  dock_wrap.o objfiles/*.o
> -L /home/ketan/tcl-install/lib -ltcl8.5 -lm -lpthread -Wl,-rpath
> -Wl,/home/ketan/tcl-install/lib
>
> Here objfiles/*.o are the object files required by the application. They
> are generated with the application's own config and make, except that I
> added the -fPIC compilation flag, as required for building a shared
> library.
>
> Do you see anything suspicious in the above line by any chance?
>
> Thanks,
> Ketan
>
>
>
>
> On Tue, Jul 29, 2014 at 4:20 PM, Tim Armstrong <tim.g.armstrong at gmail.com>
> wrote:
>
>>   The next logical step would be to look at what is actually happening
>> when you are loading the package.  I don't know exactly how the package is
>> set up.  However, you can look in pkgIndex.tcl to see what commands are run
>> (separated by newlines) to load the package.  The stack trace also told us
>> that it happened in Tcl_LoadObjCmd, so it probably happened in a load
>> command.  E.g. in the turbine pkgIndex.tcl you have this: [list load [file
>> join $dir libtclturbine.so]]
>>
>>  Are you loading the library from a shared library?  There appear to be
>> multiple ways to load a library.
>>
>>  If you extract that out into a runnable Tcl file and edit the paths
>> accordingly, you'll probably have an even more minimal example, e.g.
>>
>> load "./libwhatever.so"
>>
>>  - Tim
>>
>>
>>
>> On Tue, Jul 29, 2014 at 4:01 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>> wrote:
>>
>>> I tried a minimal Tcl script and found that the segfault occurs at:
>>>
>>>  package require leaf_main 0.0
>>>
>>>
>>>
>>>  On Tue, Jul 29, 2014 at 3:39 PM, Tim Armstrong <
>>> tim.g.armstrong at gmail.com> wrote:
>>>
>>>>     Well, anyway, let's backtrack.  The stacktrace already told us
>>>> that the segfault is happening in a package require statement.
>>>>
>>>>  I compiled apps/dock/user-code.swift and looked at the code.  There
>>>> are two package requires:
>>>>
>>>> package require turbine 0.5.0
>>>> package require leaf_main 0.0
>>>>
>>>>  They are at the top, before anything else really runs.  So if the
>>>> problem is in loading one of those packages, whatever happens later is
>>>> irrelevant.
>>>>
>>>>  So how about just running user-code.tcl, or even creating a minimal
>>>> Tcl file with those two package require lines?
>>>>
>>>>  You may need to set TCLLIBPATH (a space-separated list:
>>>> http://www.tcl.tk/man/tcl8.6/TclCmd/library.htm#M27) to the
>>>> directories with the turbine and leaf_main pkgIndex.tcl files.
>>>>
>>>>  - Tim
>>>>
>>>>
>>>> On Tue, Jul 29, 2014 at 3:16 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>> wrote:
>>>>
>>>>> We are trying to narrow down the cause of the segfault by running the Tcl
>>>>> script outside of turbine, thus taking Swift/T and the turbine script out
>>>>> of the picture. I suppose this is the Tcl script that gets invoked, which
>>>>> in turn invokes the application.
>>>>>
>>>>>
>>>>>  On Tue, Jul 29, 2014 at 3:14 PM, Tim Armstrong <
>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>
>>>>>>   proc just defines the functions. You need to call them for it to
>>>>>> run.
>>>>>>
>>>>>>  What are we trying to achieve by running this file anyway?  This
>>>>>> looks like a set of library functions rather than the entry point for a
>>>>>> script.
>>>>>>
>>>>>>  - Tim
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 29, 2014 at 3:08 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> Here is the tcl script with puts messages:
>>>>>>>
>>>>>>>  package provide leaf_main 0.0
>>>>>>>
>>>>>>>  # dnl Receive USER_LEAF from environment for m4 processing
>>>>>>> set USER_LEAF dock_wrap
>>>>>>> puts hello1
>>>>>>>
>>>>>>>  namespace eval leaf_main {
>>>>>>> puts hello2
>>>>>>>
>>>>>>>      proc leaf_main_wrap { rc A } {
>>>>>>>          deeprule $A 1 0 "leaf_main::leaf_main_wrap_impl $rc $A" \
>>>>>>>              type $::turbine::WORK
>>>>>>>      }
>>>>>>>
>>>>>>>      proc leaf_main_wrap_impl { rc A } {
>>>>>>>
>>>>>>>          global USER_LEAF
>>>>>>>
>>>>>>>          set length [ adlb::container_size $A ]
>>>>>>>         set tds [ adlb::enumerate $A dict all 0 ]
>>>>>>>         set argv [ list ]
>>>>>>>
>>>>>>>          puts hello3
>>>>>>>
>>>>>>>          # Fill argv with blanks
>>>>>>>         dict for { i v } $tds {
>>>>>>>             lappend argv 0
>>>>>>>         }
>>>>>>>         # Set values at ordered list positions
>>>>>>>         dict for { i v } $tds {
>>>>>>>             lset argv $i $v
>>>>>>>         }
>>>>>>>         set rc_value [ ${USER_LEAF}_extension {*}$argv ]
>>>>>>>         turbine::store_integer $rc $rc_value
>>>>>>>         puts hello4
>>>>>>>     }
>>>>>>>     puts hello5
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  It prints:
>>>>>>>
>>>>>>>  hello1
>>>>>>> hello2
>>>>>>> hello5
>>>>>>>
>>>>>>>  I see that it never enters the proc leaf_main_wrap_impl, but I
>>>>>>> am not familiar enough with Tcl to understand why.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  On Tue, Jul 29, 2014 at 2:41 PM, Tim Armstrong <
>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>
>>>>>>>>  I don't see any reason why that invocation of tclsh would
>>>>>>>> silently fail to run the tcl script.  Have you attempted to confirm your
>>>>>>>> hypothesis that it's not running the script, for example by modifying the
>>>>>>>> script to print something at the beginning or end?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 29, 2014 at 1:42 PM, Ketan Maheshwari <
>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>>> I expect it to run the application or crash on segfault. Nothing
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  On Tue, Jul 29, 2014 at 1:39 PM, Tim Armstrong <
>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>   That looks right, it should run dock_wrap.tcl fine.  And it
>>>>>>>>>> runs successfully to completion with no output?  Is that what you expected
>>>>>>>>>> it to do?
>>>>>>>>>>
>>>>>>>>>>  Backtracking to your original problem, if you could work out
>>>>>>>>>> which "package require" statement was failing and provide some info about
>>>>>>>>>> that package it might help understand the issue.
>>>>>>>>>>
>>>>>>>>>>  - Tim
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  On Tue, Jul 29, 2014 at 1:32 PM, Ketan Maheshwari <
>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>
>>>>>>>>>>>  I run tclsh as follows:
>>>>>>>>>>>
>>>>>>>>>>>  /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>>>>>>
>>>>>>>>>>>  and
>>>>>>>>>>>
>>>>>>>>>>>  mpiexec -n 3 /home/ketan/tcl-install/bin/tclsh8.5 \
>>>>>>>>>>>      dock_wrap.tcl -i rigid.in
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  On Tue, Jul 29, 2014 at 1:28 PM, Tim Armstrong <
>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>   I forgot to reply all earlier, re-including the list.
>>>>>>>>>>>>
>>>>>>>>>>>>  How are you running tclsh?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 29, 2014 at 11:53 AM, Ketan Maheshwari <
>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> When I try tclsh, it does not do anything. It just returns with
>>>>>>>>>>>>> an exit status of 0.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 11:02 AM, Tim Armstrong <
>>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>   You can run it directly with tclsh or mpiexec tclsh, which
>>>>>>>>>>>>>> is what turbine eventually does after setting up environment variables, etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  - Tim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:57 AM, Ketan Maheshwari <
>>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is it possible to run dock_wrap.tcl outside of turbine,
>>>>>>>>>>>>>>> just as in the case of the static build?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:45 AM, Wozniak, Justin M. <
>>>>>>>>>>>>>>> wozniak at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ok, it's in.  The Swift/K SVN is apparently down so it's
>>>>>>>>>>>>>>>> not on the web yet but see the asciidoc.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 07/29/2014 10:21 AM, Justin M Wozniak wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I thought VALGRIND was in the manual already but it isn't.
>>>>>>>>>>>>>>>> I will add it now.  I will also talk about our GDB feature.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 07/29/2014 10:17 AM, Ketan Maheshwari wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks! It seems the turbine script already had a placeholder for
>>>>>>>>>>>>>>>> Valgrind, so I tried that. From the output, it seems the Tcl libraries
>>>>>>>>>>>>>>>> are causing the segfault, but I may be wrong. Attached is the Valgrind output.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:05 AM, Tim Armstrong <
>>>>>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  I don't have any particular insight into the cause of
>>>>>>>>>>>>>>>>> the segfault, but I can help with the debugger.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You need to point gdb at the tclsh that is being used by
>>>>>>>>>>>>>>>>> turbine (which is just a shell script).  You can locate the correct tclsh
>>>>>>>>>>>>>>>>> by looking at TCLSH in scripts/turbine-config.sh in the turbine install
>>>>>>>>>>>>>>>>> directory.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  - Tim
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:00 AM, Ketan Maheshwari <
>>>>>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  I am trying to main-wrap the DOCK 6.6 application for ATPESC. The
>>>>>>>>>>>>>>>>>> build seems to succeed, but things fail at runtime with a segfault:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  $ turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>>>>>>>>> =   EXIT CODE: 139
>>>>>>>>>>>>>>>>>> =   CLEANING UP REMAINING PROCESSES
>>>>>>>>>>>>>>>>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING:
>>>>>>>>>>>>>>>>>> Segmentation fault (signal 11)
>>>>>>>>>>>>>>>>>> This typically refers to a problem with your application.
>>>>>>>>>>>>>>>>>> Please see the FAQ page for debugging suggestions
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  This is on an MCS machine. Any suggestions for debugging this? I
>>>>>>>>>>>>>>>>>> tried gdb but it gives:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   "/nfs2/ketan/exm-install/turbine/bin/turbine": not in
>>>>>>>>>>>>>>>>>> executable format: File format not recognized
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  With strace, I see some signs of missing files, but I am not
>>>>>>>>>>>>>>>>>> sure if that is the cause of the segfault. Attached is the strace output of:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  strace -o strace.out turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  The code has some MPI and pthread elements but does not
>>>>>>>>>>>>>>>>>> use them as far as I understand.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Thanks for any suggestions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  --
>>>>>>>>>>>>>>>>>> Ketan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>>>>>>>>> ExM-user mailing list
>>>>>>>>>>>>>>>>>> ExM-user at lists.mcs.anl.gov
>>>>>>>>>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/exm-user
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   --
>>>>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>