[ExM Users] debugging suggestions for non-static main-wrap segfault

Ketan Maheshwari ketan at mcs.anl.gov
Thu Jul 31 14:58:27 CDT 2014


On Thu, Jul 31, 2014 at 2:56 PM, Wozniak, Justin M. <wozniak at mcs.anl.gov>
wrote:

>
> Can you check in vanilla-g++ ?
>

Done.


>
> On 07/31/2014 02:29 PM, Ketan Maheshwari wrote:
>
> Yes indeed, I am loading from a shared library, and the load itself is
> causing the segfault. I tested this with a single-line Tcl script, as you
> suggested:
>
>  load ./libdock_wrap.so
>
>  $ tclsh8.5 test.tcl
> Segmentation fault (core dumped)
>
>  I do not know why this happens or what the root cause might be. This is
> how the .so is generated:
>
>  g++ -O2 -shared -o libdock_wrap.so extension.o  dock_wrap.o objfiles/*.o
> -L /home/ketan/tcl-install/lib -ltcl8.5 -lm -lpthread -Wl,-rpath
> -Wl,/home/ketan/tcl-install/lib
>
>  Here objfiles/*.o are the object files required by the application.
> They are generated with the application's own config and make, except
> that I added the -fPIC compilation flag, as required for building a shared
> library.
>
>  Do you see anything suspicious in the above line by any chance?
>
>  Thanks,
> Ketan
>
>
>
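
A side note on the one-line load test above. As far as I understand the Tcl
load documentation, when only a file name is given Tcl derives the init
function name from it: drop the leading "lib", keep the letters and
underscores that follow, upper-case the first letter, lower-case the rest,
and append "_Init". So libdock_wrap.so would need to export Dock_wrap_Init.
The sketch below computes that name and checks the library with nm and ldd.
The helper name tcl_init_symbol is made up for illustration. A missing init
symbol would normally produce a Tcl error rather than a segfault, so this
only rules out one possibility.

```shell
#!/bin/sh
# Hypothetical helper: guess the init symbol that Tcl's "load" looks up
# when it is given only a file name (no explicit package name).
tcl_init_symbol() {
    base=$(basename "$1")
    base=${base#lib}                                   # drop leading "lib"
    name=$(printf '%s' "$base" | sed 's/[^A-Za-z_].*$//')  # letters and "_" only
    first=$(printf '%s' "$name" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$name" | cut -c2- | tr '[:upper:]' '[:lower:]')
    printf '%s%s_Init\n' "$first" "$rest"
}

LIB=./libdock_wrap.so
sym=$(tcl_init_symbol "$LIB")
echo "expected init symbol: $sym"

# Only inspect the library if it is actually present in this directory.
if [ -f "$LIB" ] && command -v nm >/dev/null 2>&1; then
    # Is the init function exported from the shared object?
    nm -D --defined-only "$LIB" | grep -w "$sym" || echo "symbol $sym not exported"
    # Report shared libraries or symbols that cannot be resolved at load time.
    ldd -r "$LIB" || true
fi
```

If the symbol is exported and ldd -r reports nothing unresolved, the crash
is more likely inside code that runs at load time than in the link line.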
>
> On Tue, Jul 29, 2014 at 4:20 PM, Tim Armstrong <tim.g.armstrong at gmail.com>
> wrote:
>
>>   The next logical step would be to look at what actually happens when
>> you load the package.  I don't know exactly how the package is set up.
>> However, you can look in pkgIndex.tcl to see what commands are run
>> (separated by newlines) to load the package.  The stack trace also told us
>> that the crash happened in Tcl_LoadObjCmd, so it probably happened in a
>> load command.  E.g. in the turbine pkgIndex.tcl you have this: [list load
>> [file join $dir libtclturbine.so]]
>>
>>  Are you loading the library from a shared library?  There appear to be
>> multiple ways to load a library.
>>
>>  If you extract that out into a runnable Tcl file and edit the paths
>> accordingly, you'll probably have an even more minimal example, e.g.
>>
>> load "./libwhatever.so"
>>
>>  - Tim
>>
>>
>>
>> On Tue, Jul 29, 2014 at 4:01 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>> wrote:
>>
>>> I tried a minimal tcl file and found that the segfault occurs at:
>>>
>>>  package require leaf_main 0.0
>>>
>>>
>>>
>>>  On Tue, Jul 29, 2014 at 3:39 PM, Tim Armstrong <
>>> tim.g.armstrong at gmail.com> wrote:
>>>
>>>>     Well, anyway, let's backtrack.  The stacktrace already told us
>>>> that the segfault is happening in a package require statement.
>>>>
>>>>  I compiled apps/dock/user-code.swift and looked at the code.  There
>>>> are two package requires:
>>>>
>>>> package require turbine 0.5.0
>>>> package require leaf_main 0.0
>>>>
>>>>  They are at the top, before anything else really runs.  So if the
>>>> problem is in loading one of those packages, whatever happens later is
>>>> irrelevant.
>>>>
>>>>  So how about just running user-code.tcl, or even creating a minimal
>>>> tcl file with those two package require lines?
>>>>
>>>>  You may need to set TCLLIBPATH (a space-separated list:
>>>> http://www.tcl.tk/man/tcl8.6/TclCmd/library.htm#M27) to the
>>>> directories with the turbine and leaf_main pkgIndex.tcl files.
>>>>
>>>>  - Tim
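
The minimal test suggested above could be scripted roughly like this. It is
a sketch only: the two directory values are placeholders for wherever the
turbine and leaf_main pkgIndex.tcl files actually live, minimal.tcl is an
arbitrary file name, and the default tclsh path is the one used elsewhere in
this thread.

```shell
#!/bin/sh
# Placeholders (assumptions): directories that hold the pkgIndex.tcl files.
TURBINE_PKG_DIR=${TURBINE_PKG_DIR:-/path/to/turbine/lib}
LEAF_PKG_DIR=${LEAF_PKG_DIR:-$PWD}
TCLSH=${TCLSH:-/home/ketan/tcl-install/bin/tclsh8.5}

# Write a Tcl file that does nothing except load the two packages,
# printing a marker after each so the failing one is obvious.
write_minimal_tcl() {
    cat > "$1" <<'EOF'
package require turbine 0.5.0
puts "turbine loaded"
package require leaf_main 0.0
puts "leaf_main loaded"
EOF
}

write_minimal_tcl minimal.tcl

# TCLLIBPATH is a space-separated Tcl list of directories.
TCLLIBPATH="$TURBINE_PKG_DIR $LEAF_PKG_DIR"
export TCLLIBPATH

if [ -x "$TCLSH" ]; then
    if "$TCLSH" minimal.tcl; then
        echo "both packages loaded"
    else
        echo "tclsh exited with status $?"
    fi
else
    echo "tclsh not found at $TCLSH; set TCLSH to run the test"
fi
```

The last marker printed before the crash shows which package require is at
fault. An exit status of 139 corresponds to signal 11, the same segfault
that mpiexec reported.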
>>>>
>>>>
>>>> On Tue, Jul 29, 2014 at 3:16 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>> wrote:
>>>>
>>>>> We are trying to narrow down the cause of the segfault by running the
>>>>> tcl outside of turbine, thus taking the swift/T/tcl and turbine script
>>>>> out of the picture. I suppose this is the tcl script that gets invoked,
>>>>> which in turn invokes the application.
>>>>>
>>>>>
>>>>>  On Tue, Jul 29, 2014 at 3:14 PM, Tim Armstrong <
>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>
>>>>>>   proc just defines the functions. You need to call them for it to
>>>>>> run.
>>>>>>
>>>>>>  What are we trying to achieve by running this file anyway?  This
>>>>>> looks like a set of library functions rather than the entry point for
>>>>>> a script.
>>>>>>
>>>>>>  - Tim
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 29, 2014 at 3:08 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> Here is the tcl script with puts messages:
>>>>>>>
>>>>>>> package provide leaf_main 0.0
>>>>>>>
>>>>>>> # dnl Receive USER_LEAF from environment for m4 processing
>>>>>>> set USER_LEAF dock_wrap
>>>>>>> puts hello1
>>>>>>>
>>>>>>> namespace eval leaf_main {
>>>>>>>     puts hello2
>>>>>>>
>>>>>>>     proc leaf_main_wrap { rc A } {
>>>>>>>         deeprule $A 1 0 "leaf_main::leaf_main_wrap_impl $rc $A" \
>>>>>>>             type $::turbine::WORK
>>>>>>>     }
>>>>>>>
>>>>>>>     proc leaf_main_wrap_impl { rc A } {
>>>>>>>
>>>>>>>         global USER_LEAF
>>>>>>>
>>>>>>>         set length [ adlb::container_size $A ]
>>>>>>>         set tds [ adlb::enumerate $A dict all 0 ]
>>>>>>>         set argv [ list ]
>>>>>>>
>>>>>>>         puts hello3
>>>>>>>
>>>>>>>         # Fill argv with blanks
>>>>>>>         dict for { i v } $tds {
>>>>>>>             lappend argv 0
>>>>>>>         }
>>>>>>>         # Set values at ordered list positions
>>>>>>>         dict for { i v } $tds {
>>>>>>>             lset argv $i $v
>>>>>>>         }
>>>>>>>         set rc_value [ ${USER_LEAF}_extension {*}$argv ]
>>>>>>>         turbine::store_integer $rc $rc_value
>>>>>>>         puts hello4
>>>>>>>     }
>>>>>>>     puts hello5
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  It prints:
>>>>>>>
>>>>>>>  hello1
>>>>>>> hello2
>>>>>>> hello5
>>>>>>>
>>>>>>>  I see that it is not going into the proc leaf_main_wrap_impl, but I
>>>>>>> am not familiar enough with Tcl to understand why.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  On Tue, Jul 29, 2014 at 2:41 PM, Tim Armstrong <
>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>
>>>>>>>>  I don't see any reason why that invocation of tclsh would
>>>>>>>> silently fail to run the tcl script.  Have you attempted to confirm your
>>>>>>>> hypothesis that it's not running the script, for example by modifying the
>>>>>>>> script to print something at the beginning or end?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 29, 2014 at 1:42 PM, Ketan Maheshwari <
>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>>> I expect it to run the application or crash on segfault. Nothing
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  On Tue, Jul 29, 2014 at 1:39 PM, Tim Armstrong <
>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>   That looks right, it should run dock_wrap.tcl fine.  And it
>>>>>>>>>> runs successfully to completion with no output?  Is that what you expected
>>>>>>>>>> it to do?
>>>>>>>>>>
>>>>>>>>>>  Backtracking to your original problem, if you could work out
>>>>>>>>>> which "package require" statement was failing and provide some info about
>>>>>>>>>> that package it might help understand the issue.
>>>>>>>>>>
>>>>>>>>>>  - Tim
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  On Tue, Jul 29, 2014 at 1:32 PM, Ketan Maheshwari <
>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>
>>>>>>>>>>>  I run tclsh as follows:
>>>>>>>>>>>
>>>>>>>>>>>  /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>>>>>>
>>>>>>>>>>>  and
>>>>>>>>>>>
>>>>>>>>>>>  mpiexec -n 3 /home/ketan/tcl-install/bin/tclsh8.5
>>>>>>>>>>> dock_wrap.tcl -i rigid.in
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  On Tue, Jul 29, 2014 at 1:28 PM, Tim Armstrong <
>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>   I forgot to reply all earlier, re-including the list.
>>>>>>>>>>>>
>>>>>>>>>>>>  How are you running tclsh?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 29, 2014 at 11:53 AM, Ketan Maheshwari <
>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> When I try tclsh, it does not do anything. It just returns with
>>>>>>>>>>>>> an exit status of 0.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 11:02 AM, Tim Armstrong <
>>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>   You can run it directly with tclsh or mpiexec tclsh, which
>>>>>>>>>>>>>> is what turbine eventually does after setting up environment variables, etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  - Tim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:57 AM, Ketan Maheshwari <
>>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is it possible to run the dock_wrap.tcl outside of turbine
>>>>>>>>>>>>>>> just as in the case of static build?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:45 AM, Wozniak, Justin M. <
>>>>>>>>>>>>>>> wozniak at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ok, it's in.  The Swift/K SVN is apparently down so it's
>>>>>>>>>>>>>>>> not on the web yet but see the asciidoc.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 07/29/2014 10:21 AM, Justin M Wozniak wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I thought VALGRIND was in the manual already but it isn't.
>>>>>>>>>>>>>>>> I will add it now.  I will also talk about our GDB feature.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 07/29/2014 10:17 AM, Ketan Maheshwari wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks! Seems turbine script already had a placeholder for
>>>>>>>>>>>>>>>> Valgrind so I tried that and from the output, it seems tcl libraries are
>>>>>>>>>>>>>>>> causing segfault but I may be wrong. Attached is the Valgrind output.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:05 AM, Tim Armstrong <
>>>>>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  I don't have any particular insight into the cause of
>>>>>>>>>>>>>>>>> the segfault, I can help with the debugger though.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You need to point gdb at the tclsh that is being used by
>>>>>>>>>>>>>>>>> turbine (which is just a shell script).  You can locate the correct tclsh
>>>>>>>>>>>>>>>>> by looking at TCLSH in scripts/turbine-config.sh in the turbine install
>>>>>>>>>>>>>>>>> directory.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  - Tim
>>>>>>>>>>>>>>>>>
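
In concrete terms, the gdb suggestion above could look like the sketch
below. It assumes that turbine-config.sh contains a plain shell assignment
of the form TCLSH=/path/to/tclsh (the exact format has not been checked),
and the default install path is taken from the gdb error message quoted in
this thread. tclsh_from_config is a made-up helper name. This covers a
single process only; running under mpiexec is left out.

```shell
#!/bin/sh
# Assumed turbine install location (from the gdb error message in this thread).
TURBINE_HOME=${TURBINE_HOME:-/nfs2/ketan/exm-install/turbine}

# Hypothetical helper: print the value of the first TCLSH= assignment in a
# shell config file, with or without a leading "export", quotes removed.
tclsh_from_config() {
    sed -n -e 's/^ *export  *TCLSH=//p' -e 's/^ *TCLSH=//p' "$1" \
        | head -n 1 | tr -d "\"'"
}

CONFIG="$TURBINE_HOME/scripts/turbine-config.sh"
if [ -f "$CONFIG" ]; then
    TCLSH=$(tclsh_from_config "$CONFIG")
    echo "turbine uses tclsh: $TCLSH"
    # Debug the real interpreter, not the turbine shell script.
    echo "try: gdb --args $TCLSH user-code.tcl"
else
    echo "no config found at $CONFIG"
fi
```

Inside gdb, "run" followed by "bt" after the crash should give the stack
trace at the point of the segfault.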
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:00 AM, Ketan Maheshwari <
>>>>>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  While trying to main-wrap the DOCK 6.6 application for ATPESC,
>>>>>>>>>>>>>>>>>> the build seems to go through, but things fail at runtime with a segfault:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  $ turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>>>>>>>>> =   EXIT CODE: 139
>>>>>>>>>>>>>>>>>> =   CLEANING UP REMAINING PROCESSES
>>>>>>>>>>>>>>>>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING:
>>>>>>>>>>>>>>>>>> Segmentation fault (signal 11)
>>>>>>>>>>>>>>>>>> This typically refers to a problem with your application.
>>>>>>>>>>>>>>>>>> Please see the FAQ page for debugging suggestions
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  This is on an MCS machine. Any suggestions for debugging this? I
>>>>>>>>>>>>>>>>>> tried gdb but it gives:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   "/nfs2/ketan/exm-install/turbine/bin/turbine": not in
>>>>>>>>>>>>>>>>>> executable format: File format not recognized
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  With strace, I see some signs of missing files but not
>>>>>>>>>>>>>>>>>> sure if that is the cause of segfault. Attached is the strace output of:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  strace -o strace.out turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  The code has some MPI and pthread elements but does not
>>>>>>>>>>>>>>>>>> use them as far as I understand.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Thanks for any suggestions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  --
>>>>>>>>>>>>>>>>>> Ketan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>>>>>>>>> ExM-user mailing list
>>>>>>>>>>>>>>>>>> ExM-user at lists.mcs.anl.gov
>>>>>>>>>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/exm-user
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   --
>>>>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
>
>
> --
> Justin M Wozniak
>
>