[ExM Users] debugging suggestions for non-static main-wrap segfault
Ketan Maheshwari
ketan at mcs.anl.gov
Thu Jul 31 14:29:29 CDT 2014
Yes indeed, I am loading from a shared library, and that load is what causes
the segfault. I tested this with a one-line Tcl script, as you suggested:
load ./libdock_wrap.so
$ tclsh8.5 test.tcl
Segmentation fault (core dumped)
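One way to get more detail out of this one-line reproduction would be to run
it under gdb in batch mode, so that a backtrace is printed at the point of the
crash. The following is only a sketch: the helper names write_load_script and
backtrace_load are hypothetical, and it assumes that gdb is installed and that
the tclsh passed in is the same interpreter that crashes.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers, not part of
# Tcl, Turbine, or Swift/T.

# Write a one-line Tcl script that loads the given shared library.
# Assumes the library path contains no whitespace.
write_load_script() {
    lib=$1
    script=$2
    printf 'load %s\n' "$lib" > "$script"
}

# Run the script under gdb in batch mode: start the program, then print
# a backtrace once it stops (for example on SIGSEGV).
# Assumes gdb is installed; $1 is the tclsh to debug, $2 the script.
backtrace_load() {
    tclsh=$1
    script=$2
    gdb -batch -ex run -ex bt --args "$tclsh" "$script"
}
```

For example: write_load_script ./libdock_wrap.so test.tcl, followed by
backtrace_load /home/ketan/tcl-install/bin/tclsh8.5 test.tcl. The backtrace is
easier to read if the library and its object files are built with -g.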
I do not know why this happens or what the root cause might be. This is how
the .so is generated:
g++ -O2 -shared -o libdock_wrap.so extension.o dock_wrap.o objfiles/*.o \
    -L /home/ketan/tcl-install/lib -ltcl8.5 -lm -lpthread \
    -Wl,-rpath -Wl,/home/ketan/tcl-install/lib
Here objfiles/*.o are the object files required by the application.
They are generated with the application's own config and make steps, except
that I added the -fPIC compilation flag, as required for building a shared
library.
Do you see anything suspicious in the above line by any chance?
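A few generic checks on the resulting library might help narrow this down.
The sketch below is not specific to DOCK or Turbine: inspect_shared_lib is a
hypothetical helper name, and it assumes the GNU tools ldd and nm are
available. It reports unresolved symbols, lists exported symbols containing
_Init (Tcl's load command calls an initialization function whose name is
derived from the library name), and checks whether a main symbol is exported.

```shell
#!/bin/sh
# Sketch only: inspect_shared_lib is a hypothetical helper name.
# Assumes GNU ldd and nm are on the PATH.
inspect_shared_lib() {
    lib=$1
    if [ ! -f "$lib" ]; then
        echo "no such library: $lib" >&2
        return 1
    fi

    # ldd -r processes relocations and reports any missing symbols.
    echo "== dependencies and unresolved symbols (ldd -r) =="
    ldd -r "$lib"

    # Tcl's load command looks up an *_Init entry point in the library.
    echo "== exported symbols containing _Init =="
    nm -D --defined-only "$lib" | grep '_Init' || echo "(none found)"

    # An exported main may be worth a closer look in a main-wrapped build.
    echo "== exported main symbol, if any =="
    nm -D --defined-only "$lib" | grep ' main$' || echo "(none found)"
    return 0
}
```

For example: inspect_shared_lib ./libdock_wrap.so. None of these checks is
conclusive on its own; they only show where to look next.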
Thanks,
Ketan
On Tue, Jul 29, 2014 at 4:20 PM, Tim Armstrong <tim.g.armstrong at gmail.com>
wrote:
> The next logical step would be to look at what is actually happening
> when you are loading the package. I don't know exactly how the package is
> set up. However, you can look in pkgIndex.tcl to see what commands are run
> (separated by newlines) to load the package. The stack trace also told us
> that it happened in Tcl_LoadObjCmd, so it probably happened in a load
> command. E.g. in the turbine pkgIndex.tcl you have this: [list load [file
> join $dir libtclturbine.so]]
>
> Are you loading the library from a shared library? There appear to be
> multiple ways to load a library.
>
> If you extract that out into a runnable Tcl file and edit the paths
> accordingly, you'll probably have an even more minimal example, e.g.
>
> load "./libwhatever.so"
>
> - Tim
>
>
>
> On Tue, Jul 29, 2014 at 4:01 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
> wrote:
>
>> I tried a minimal Tcl script and found that the segfault occurs at:
>>
>> package require leaf_main 0.0
>>
>>
>>
>> On Tue, Jul 29, 2014 at 3:39 PM, Tim Armstrong <
>> tim.g.armstrong at gmail.com> wrote:
>>
>>> Well, anyway, let's backtrack. The stacktrace already told us that
>>> the segfault is happening in a package require statement.
>>>
>>> I compiled apps/dock/user-code.swift and looked at the code. There are
>>> two package requires:
>>>
>>> package require turbine 0.5.0
>>> package require leaf_main 0.0
>>>
>>> They are at the top, before anything else really runs. So if the
>>> problem is in loading one of those packages, whatever happens later is
>>> irrelevant.
>>>
>>> So how about just running user-code.tcl, or even creating a minimal Tcl
>>> file with just those two package require lines?
>>>
>>> You may need to set TCLLIBPATH (a space-separated list:
>>> http://www.tcl.tk/man/tcl8.6/TclCmd/library.htm#M27) to the directories
>>> with the turbine and leaf_main pkgIndex.tcl files.
>>>
>>> - Tim
>>>
>>>
>>> On Tue, Jul 29, 2014 at 3:16 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>> wrote:
>>>
>>>> We are trying to narrow down the cause of the segfault by running the Tcl
>>>> script outside of turbine, thus getting rid of the Swift/T Tcl and the
>>>> turbine script. I suppose this is the Tcl script that gets invoked and
>>>> that in turn invokes the application.
>>>>
>>>>
>>>> On Tue, Jul 29, 2014 at 3:14 PM, Tim Armstrong <
>>>> tim.g.armstrong at gmail.com> wrote:
>>>>
>>>>> proc just defines the functions. You need to call them for it to
>>>>> run.
>>>>>
>>>>> What are we trying to achieve by running this file anyway? This
>>>>> looks like a set of library functions rather than the entry point for a
>>>>> script.
>>>>>
>>>>> - Tim
>>>>>
>>>>>
>>>>> On Tue, Jul 29, 2014 at 3:08 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>> Here is the tcl script with puts messages:
>>>>>>
>>>>>> package provide leaf_main 0.0
>>>>>>
>>>>>> # dnl Receive USER_LEAF from environment for m4 processing
>>>>>> set USER_LEAF dock_wrap
>>>>>> puts hello1
>>>>>>
>>>>>> namespace eval leaf_main {
>>>>>>     puts hello2
>>>>>>
>>>>>>     proc leaf_main_wrap { rc A } {
>>>>>>         deeprule $A 1 0 "leaf_main::leaf_main_wrap_impl $rc $A" \
>>>>>>             type $::turbine::WORK
>>>>>>     }
>>>>>>
>>>>>>     proc leaf_main_wrap_impl { rc A } {
>>>>>>
>>>>>>         global USER_LEAF
>>>>>>
>>>>>>         set length [ adlb::container_size $A ]
>>>>>>         set tds [ adlb::enumerate $A dict all 0 ]
>>>>>>         set argv [ list ]
>>>>>>
>>>>>>         puts hello3
>>>>>>
>>>>>>         # Fill argv with blanks
>>>>>>         dict for { i v } $tds {
>>>>>>             lappend argv 0
>>>>>>         }
>>>>>>         # Set values at ordered list positions
>>>>>>         dict for { i v } $tds {
>>>>>>             lset argv $i $v
>>>>>>         }
>>>>>>         set rc_value [ ${USER_LEAF}_extension {*}$argv ]
>>>>>>         turbine::store_integer $rc $rc_value
>>>>>>         puts hello4
>>>>>>     }
>>>>>>     puts hello5
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> It prints:
>>>>>>
>>>>>> hello1
>>>>>> hello2
>>>>>> hello5
>>>>>>
>>>>>> I see that it is not entering the proc leaf_main_wrap_impl, but I am
>>>>>> not familiar enough with Tcl to understand why.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 29, 2014 at 2:41 PM, Tim Armstrong <
>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>
>>>>>>> I don't see any reason why that invocation of tclsh would silently
>>>>>>> fail to run the tcl script. Have you attempted to confirm your hypothesis
>>>>>>> that it's not running the script, for example by modifying the script to
>>>>>>> print something at the beginning or end?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 29, 2014 at 1:42 PM, Ketan Maheshwari <ketan at mcs.anl.gov
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> I expect it to run the application or crash on segfault. Nothing
>>>>>>>> happens.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 29, 2014 at 1:39 PM, Tim Armstrong <
>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> That looks right, it should run dock_wrap.tcl fine. And it
>>>>>>>>> runs successfully to completion with no output? Is that what you expected
>>>>>>>>> it to do?
>>>>>>>>>
>>>>>>>>> Backtracking to your original problem, if you could work out
>>>>>>>>> which "package require" statement was failing and provide some info about
>>>>>>>>> that package it might help understand the issue.
>>>>>>>>>
>>>>>>>>> - Tim
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 29, 2014 at 1:32 PM, Ketan Maheshwari <
>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>> I run tclsh as follows:
>>>>>>>>>>
>>>>>>>>>> /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>>>>>
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>> mpiexec -n 3 /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl
>>>>>>>>>> -i rigid.in
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 29, 2014 at 1:28 PM, Tim Armstrong <
>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I forgot to reply all earlier, re-including the list.
>>>>>>>>>>>
>>>>>>>>>>> How are you running tclsh?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 29, 2014 at 11:53 AM, Ketan Maheshwari <
>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> When I try tclsh, it does not do anything. It just returns with an
>>>>>>>>>>>> exit status of 0.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 29, 2014 at 11:02 AM, Tim Armstrong <
>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You can run it directly with tclsh or mpiexec tclsh, which
>>>>>>>>>>>>> is what turbine eventually does after setting up environment variables, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Tim
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:57 AM, Ketan Maheshwari <
>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is it possible to run the dock_wrap.tcl outside of turbine
>>>>>>>>>>>>>> just as in the case of static build?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:45 AM, Wozniak, Justin M. <
>>>>>>>>>>>>>> wozniak at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ok, it's in. The Swift/K SVN is apparently down so it's not
>>>>>>>>>>>>>>> on the web yet but see the asciidoc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 07/29/2014 10:21 AM, Justin M Wozniak wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I thought VALGRIND was in the manual already but it isn't.
>>>>>>>>>>>>>>> I will add it now. I will also talk about our GDB feature.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 07/29/2014 10:17 AM, Ketan Maheshwari wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks! It seems the turbine script already had a placeholder for
>>>>>>>>>>>>>>> Valgrind, so I tried that. From the output, it seems the Tcl libraries are
>>>>>>>>>>>>>>> causing the segfault, but I may be wrong. Attached is the Valgrind output.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:05 AM, Tim Armstrong <
>>>>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't have any particular insight into the cause of the
>>>>>>>>>>>>>>>> segfault, but I can help with the debugger.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You need to point gdb at the tclsh that is being used by
>>>>>>>>>>>>>>>> turbine (which is just a shell script). You can locate the correct tclsh
>>>>>>>>>>>>>>>> by looking at TCLSH in scripts/turbine-config.sh in the turbine install
>>>>>>>>>>>>>>>> directory.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Tim
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:00 AM, Ketan Maheshwari <
>>>>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Trying to main-wrap the DOCK 6.6 application for ATPESC, I
>>>>>>>>>>>>>>>>> seem to get the build right, but things fail at runtime with a segfault:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> $ turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>>>>>>>> = EXIT CODE: 139
>>>>>>>>>>>>>>>>> = CLEANING UP REMAINING PROCESSES
>>>>>>>>>>>>>>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING:
>>>>>>>>>>>>>>>>> Segmentation fault (signal 11)
>>>>>>>>>>>>>>>>> This typically refers to a problem with your application.
>>>>>>>>>>>>>>>>> Please see the FAQ page for debugging suggestions
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is on an MCS machine. Any suggestions for debugging this? I
>>>>>>>>>>>>>>>>> tried gdb but it gives:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "/nfs2/ketan/exm-install/turbine/bin/turbine": not in
>>>>>>>>>>>>>>>>> executable format: File format not recognized
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With strace, I see some signs of missing files but not
>>>>>>>>>>>>>>>>> sure if that is the cause of segfault. Attached is the strace output of:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> strace -o strace.out turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The code has some MPI and pthread elements but does not
>>>>>>>>>>>>>>>>> use them as far as I understand.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for any suggestions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Ketan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> ExM-user mailing list
>>>>>>>>>>>>>>>>> ExM-user at lists.mcs.anl.gov
>>>>>>>>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/exm-user
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>