[ExM Users] debugging suggestions for non-static main-wrap segfault
Justin M Wozniak
wozniak at mcs.anl.gov
Thu Jul 31 14:56:16 CDT 2014
Can you check in vanilla-g++?
On 07/31/2014 02:29 PM, Ketan Maheshwari wrote:
> Yes indeed, I am loading from a shared library, which is causing the
> segfault. I tested this with a single-line Tcl script, as you suggested:
>
> load ./libdock_wrap.so
>
> $ tclsh8.5 test.tcl
> Segmentation fault (core dumped)
>
> I do not know why this should happen or what the root cause might
> be. This is how the .so is generated:
>
> g++ -O2 -shared -o libdock_wrap.so extension.o dock_wrap.o \
>     objfiles/*.o -L /home/ketan/tcl-install/lib -ltcl8.5 -lm -lpthread \
>     -Wl,-rpath -Wl,/home/ketan/tcl-install/lib
>
> Here objfiles/*.o are the object files required by the application.
> They are generated with the application's own config and make, except
> that I added the -fPIC compilation flag, as required for building a
> shared library.
>
> Do you see anything suspicious in the above line by any chance?
>
> Thanks,
> Ketan
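A rough sketch of checks one could run on the library itself. It assumes the extension relies on Tcl's default rule for guessing the init function from the file name (strip a leading "lib", keep the leading letters and underscores, capitalize the first letter, append "_Init"). tcl_init_symbol is a hypothetical helper name, not part of Tcl or Turbine. A missing init function normally produces a Tcl error rather than a segfault, so the unresolved-symbol report from ldd -r may well be the more interesting part.

```shell
# Hypothetical helper: derive the init function name that Tcl's "load"
# looks for when no package name is given (default Unix guessing rule).
# The body runs in a subshell, so its variables do not leak out.
tcl_init_symbol() (
    base=$(basename "$1")
    base=${base#lib}                       # strip a leading "lib"
    # keep only the leading run of letters and underscores
    name=$(printf '%s' "$base" | sed 's/^\([A-Za-z_]*\).*/\1/')
    first=$(printf '%s' "$name" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$name" | cut -c2- | tr '[:upper:]' '[:lower:]')
    printf '%s%s_Init\n' "$first" "$rest"
)

lib=./libdock_wrap.so
sym=$(tcl_init_symbol "$lib")
echo "expected init symbol: $sym"

if [ -f "$lib" ]; then
    # Is the init function actually exported by the library?
    nm -D --defined-only "$lib" | grep -w "$sym" || echo "$sym not exported"
    # Report symbols the dynamic linker cannot resolve.
    ldd -r "$lib"
fi
```

If both checks look clean, the crash is more likely inside code that runs while the library is being loaded, which a debugger backtrace would show.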
>
>
>
>
> On Tue, Jul 29, 2014 at 4:20 PM, Tim Armstrong
> <tim.g.armstrong at gmail.com> wrote:
>
> The next logical step would be to look at what is actually
> happening when you are loading the package. I don't know exactly
> how the package is set up. However, you can look in pkgIndex.tcl
> to see what commands are run (separated by newlines) to load the
> package. The stack trace also told us that it happened in
> Tcl_LoadObjCmd, so it probably happened in a load command. E.g.
> in the turbine pkgIndex.tcl you have this:
>
> [list load [file join $dir libtclturbine.so]]
>
> Are you loading the library from a shared library? There appear
> to be multiple ways to load a library.
>
> If you extract that out into a runnable Tcl file and edit paths
> accordingly, you'll probably have an even more minimal example, e.g.
>
> load "./libwhatever.so"
>
> - Tim
>
>
>
> On Tue, Jul 29, 2014 at 4:01 PM, Ketan Maheshwari
> <ketan at mcs.anl.gov> wrote:
>
> I tried a minimal Tcl script and found that the segfault occurs at:
>
> package require leaf_main 0.0
>
>
>
> On Tue, Jul 29, 2014 at 3:39 PM, Tim Armstrong
> <tim.g.armstrong at gmail.com> wrote:
>
> Well, anyway, let's backtrack. The stacktrace already
> told us that the segfault is happening in a package
> require statement.
>
> I compiled apps/dock/user-code.swift and looked at the
> code. There are two package requires:
>
> package require turbine 0.5.0
> package require leaf_main 0.0
>
> They are up at the top, before anything else really runs. So
> if the problem is in loading one of those packages,
> whatever happens later is irrelevant.
>
> So how about just running user-code.tcl, or even creating
> a minimal tcl file with those two package require lines?
>
> You may need to set TCLLIBPATH (a space-separated list:
> http://www.tcl.tk/man/tcl8.6/TclCmd/library.htm#M27) to
> the directories with the turbine and leaf_main
> pkgIndex.tcl files.
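A minimal sketch of that test, in case it helps. TURBINE_LIB and LEAF_DIR are placeholder variables with made-up defaults and need to point at the directories that really hold the two pkgIndex.tcl files. The progress markers go to stderr, which Tcl does not buffer, so they should still show up if the interpreter dies partway through.

```shell
# Hypothetical locations; point these at the directories that actually
# contain the turbine and leaf_main pkgIndex.tcl files.
TURBINE_LIB=${TURBINE_LIB:-/path/to/turbine/lib}
LEAF_DIR=${LEAF_DIR:-$PWD}

workdir=$(mktemp -d)
cat > "$workdir/min.tcl" <<'EOF'
# Markers on stderr show which package require was reached last.
puts stderr "loading turbine"
package require turbine 0.5.0
puts stderr "loading leaf_main"
package require leaf_main 0.0
puts stderr "both packages loaded"
EOF

# TCLLIBPATH is a space-separated list of directories.
TCLLIBPATH="$TURBINE_LIB $LEAF_DIR"
export TCLLIBPATH

if command -v tclsh8.5 >/dev/null 2>&1 && [ -d "$TURBINE_LIB" ]; then
    tclsh8.5 "$workdir/min.tcl"
    echo "tclsh exit status: $?"
else
    echo "tclsh8.5 or $TURBINE_LIB not found; script left in $workdir/min.tcl"
fi
```

The last marker printed before the crash tells you which of the two packages to look at.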
>
> - Tim
>
>
> On Tue, Jul 29, 2014 at 3:16 PM, Ketan Maheshwari
> <ketan at mcs.anl.gov> wrote:
>
> We are trying to narrow down the cause of the segfault by
> running the Tcl script outside of turbine, thus getting rid of
> the Swift/T Tcl and the turbine script. I suppose this is the
> Tcl script that gets invoked, which in turn invokes the
> application.
>
>
> On Tue, Jul 29, 2014 at 3:14 PM, Tim Armstrong
> <tim.g.armstrong at gmail.com> wrote:
>
> proc just defines the functions. You need to call
> them for it to run.
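A tiny illustration of the difference between defining and calling, written as a shell snippet that generates a throwaway Tcl file. The proc name greet is made up for the example; it only runs if some tclsh is on the PATH.

```shell
demo=$(mktemp)
cat > "$demo" <<'EOF'
# Defining the proc prints nothing ...
proc greet {} {
    puts "inside greet"
}
puts "greet is defined but has not run yet"
# ... its body only executes when the proc is called.
greet
EOF

# Run it only if some tclsh is on the PATH.
if command -v tclsh >/dev/null 2>&1; then
    tclsh "$demo"
fi
```

Dropping the final greet line leaves only the "defined" message, which matches what the hello1/hello2/hello5 output shows.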
>
> What are we trying to achieve by running this file
> anyway? This looks like a set of library functions
> rather than the entry point for a script.
>
> - Tim
>
>
> On Tue, Jul 29, 2014 at 3:08 PM, Ketan Maheshwari
> <ketan at mcs.anl.gov> wrote:
>
> Here is the tcl script with puts messages:
>
> package provide leaf_main 0.0
>
> # dnl Receive USER_LEAF from environment for m4 processing
> set USER_LEAF dock_wrap
> puts hello1
>
> namespace eval leaf_main {
>     puts hello2
>
>     proc leaf_main_wrap { rc A } {
>         deeprule $A 1 0 \
>             "leaf_main::leaf_main_wrap_impl $rc $A" type $::turbine::WORK
>     }
>
>     proc leaf_main_wrap_impl { rc A } {
>
>         global USER_LEAF
>
>         set length [ adlb::container_size $A ]
>         set tds [ adlb::enumerate $A dict all 0 ]
>         set argv [ list ]
>
>         puts hello3
>
>         # Fill argv with blanks
>         dict for { i v } $tds {
>             lappend argv 0
>         }
>         # Set values at ordered list positions
>         dict for { i v } $tds {
>             lset argv $i $v
>         }
>         set rc_value [ ${USER_LEAF}_extension {*}$argv ]
>         turbine::store_integer $rc $rc_value
>         puts hello4
>     }
>     puts hello5
> }
>
>
>
>
> It prints:
>
> hello1
> hello2
> hello5
>
> I see that it is not going into proc leaf_main_wrap_impl,
> but I am not familiar enough with Tcl to understand why.
>
>
>
> On Tue, Jul 29, 2014 at 2:41 PM, Tim Armstrong
> <tim.g.armstrong at gmail.com> wrote:
>
> I don't see any reason why that invocation of tclsh would silently
> fail to run the tcl script. Have you attempted to confirm your
> hypothesis that it's not running the script, for example by modifying
> the script to print something at the beginning or end?
>
>
> On Tue, Jul 29, 2014 at 1:42 PM, Ketan Maheshwari
> <ketan at mcs.anl.gov> wrote:
>
> I expect it to run the application or crash on segfault. Nothing
> happens.
>
>
>
> On Tue, Jul 29, 2014 at 1:39 PM, Tim Armstrong
> <tim.g.armstrong at gmail.com> wrote:
>
> That looks right, it should run dock_wrap.tcl fine. And it runs
> successfully to completion with no output? Is that what you expected
> it to do?
>
> Backtracking to your original problem, if you could work out which
> "package require" statement was failing and provide some info about
> that package it might help understand the issue.
>
> - Tim
>
>
> On Tue, Jul 29, 2014 at 1:32 PM, Ketan Maheshwari
> <ketan at mcs.anl.gov> wrote:
>
> I run tclsh as follows:
>
> /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>
> and
>
> mpiexec -n 3 /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>
>
> On Tue, Jul 29, 2014 at 1:28 PM, Tim Armstrong
> <tim.g.armstrong at gmail.com> wrote:
>
> I forgot to reply all earlier, re-including the list.
>
> How are you running tclsh?
>
>
> On Tue, Jul 29, 2014 at 11:53 AM, Ketan Maheshwari
> <ketan at mcs.anl.gov> wrote:
>
> when I try tclsh, it does not do anything. Just returns with an
> exit status 0.
>
>
> On Tue, Jul 29, 2014 at 11:02 AM, Tim Armstrong
> <tim.g.armstrong at gmail.com> wrote:
>
> You can run it directly with tclsh or mpiexec tclsh, which is what
> turbine eventually does after setting up environment variables, etc.
>
> - Tim
>
>
> On Tue, Jul 29, 2014 at 10:57 AM, Ketan Maheshwari
> <ketan at mcs.anl.gov> wrote:
>
> Is it possible to run the dock_wrap.tcl outside of turbine just as
> in the case of static build?
>
>
>
>
> On Tue, Jul 29, 2014 at 10:45 AM, Wozniak, Justin M.
> <wozniak at mcs.anl.gov> wrote:
>
>
> Ok, it's in. The Swift/K SVN is apparently down so it's not on the
> web yet but see the asciidoc.
>
> On 07/29/2014 10:21 AM, Justin M Wozniak wrote:
>>
>> I thought VALGRIND was in the manual already but it isn't. I will
>> add it now. I will also talk about our GDB feature.
>>
>> On 07/29/2014 10:17 AM, Ketan Maheshwari wrote:
>>> Thanks! Seems turbine script already had a placeholder for Valgrind
>>> so I tried that and from the output, it seems tcl libraries are
>>> causing segfault but I may be wrong. Attached is the Valgrind output.
>>>
>>>
>>>
>>> On Tue, Jul 29, 2014 at 10:05 AM, Tim Armstrong
>>> <tim.g.armstrong at gmail.com> wrote:
>>>
>>> I don't have any particular insight into the cause of the segfault,
>>> I can help with the debugger though.
>>>
>>> You need to point gdb at the tclsh that is being used by turbine
>>> (which is just a shell script). You can locate the correct tclsh by
>>> looking at TCLSH in scripts/turbine-config.sh in the turbine install
>>> directory.
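A sketch of how that lookup could be scripted. It assumes turbine-config.sh can be sourced on its own and sets TCLSH as an ordinary shell variable; turbine_tclsh is a hypothetical helper, and the default TURBINE_HOME below is a placeholder, not a known install path.

```shell
# Hypothetical helper: print the TCLSH value set by turbine-config.sh.
# Sourcing happens in a subshell so the current environment is untouched.
turbine_tclsh() (
    . "$1" >/dev/null 2>&1
    printf '%s\n' "$TCLSH"
)

# Placeholder install prefix; adjust to the real turbine install directory.
TURBINE_HOME=${TURBINE_HOME:-$HOME/exm-install/turbine}
cfg="$TURBINE_HOME/scripts/turbine-config.sh"

if [ -f "$cfg" ]; then
    tclsh_bin=$(turbine_tclsh "$cfg")
    echo "turbine uses: $tclsh_bin"
    # Then, interactively:  gdb --args "$tclsh_bin" user-code.tcl
    # and inside gdb:       run, followed by bt once it crashes
fi
```

Running tclsh directly skips whatever environment setup the turbine script performs, so some variables may need to be exported by hand first.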
>>>
>>> - Tim
>>>
>>>
>>> On Tue, Jul 29, 2014 at 10:00 AM, Ketan Maheshwari
>>> <ketan at mcs.anl.gov> wrote:
>>>
>>> Hi,
>>>
>>> Trying to main-wrap DOCK 6.6 application for ATPESC, I get the build
>>> right (seems) but things fail at runtime giving segfault:
>>>
>>> $ turbine -n 4 user-code.tcl
>>>
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   EXIT CODE: 139
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>>> This typically refers to a problem with your application.
>>> Please see the FAQ page for debugging suggestions
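For reference, exit code 139 and "signal 11" in that output are the same fact stated twice: a process killed by signal N is reported with status 128 + N. A quick way to decode such a status in a shell:

```shell
status=139                      # exit code reported in the output above
signal=$((status - 128))        # death by signal N is reported as 128 + N
echo "signal number: $signal"
# kill -l accepts an exit status and prints the matching signal name.
echo "signal name:   $(kill -l "$status")"
```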
>>>
>>> This is on MCS machine. Any suggestion to debug this? I tried gdb
>>> but it gives:
>>>
>>> "/nfs2/ketan/exm-install/turbine/bin/turbine": not in executable
>>> format: File format not recognized
>>>
>>> With strace, I see some signs of missing files but not sure if that
>>> is the cause of segfault. Attached is the strace output of:
>>>
>>> strace -o strace.out turbine -n 4 user-code.tcl
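A possible way to sift that strace log, offered as a sketch. missing_libs is a hypothetical helper that lists the distinct shared-library paths that failed with ENOENT. Many such misses are just the dynamic linker probing its search path, so a hit here is a lead rather than a diagnosis. Adding -f to strace is suggested because turbine is a shell script that starts other processes.

```shell
# Hypothetical helper: list distinct paths that failed with ENOENT and
# look like shared libraries, given an strace log file.
missing_libs() {
    grep 'ENOENT' "$1" | grep -o '"[^"]*\.so[^"]*"' | sort -u
}

# -f follows child processes as well:
#   strace -f -o strace.out turbine -n 4 user-code.tcl
if [ -f strace.out ]; then
    missing_libs strace.out
fi
```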
>>>
>>> The code has some MPI and pthread elements but does not use them as
>>> far as I understand.
>>>
>>> Thanks for any suggestions.
>>>
>>> --
>>> Ketan
>>>
>>> _______________________________________________
>>> ExM-user mailing list
>>> ExM-user at lists.mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/exm-user
>>>
>>
>>
>> --
>> Justin M Wozniak
>
>
> --
> Justin M Wozniak
>
--
Justin M Wozniak