[ExM Users] debugging suggestions for non-static main-wrap segfault

Tim Armstrong tim.g.armstrong at gmail.com
Tue Jul 29 16:20:52 CDT 2014


The next logical step would be to look at what actually happens when you
load the package.  I don't know exactly how the package is set up.
However, you can look in pkgIndex.tcl to see which commands (separated by
newlines) are run to load the package.  The stack trace also told us that
the segfault happened in Tcl_LoadObjCmd, so it probably happened in a load
command.  E.g. in the turbine pkgIndex.tcl you have this:
[list load [file join $dir libtclturbine.so]]
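
One quick way to see those load commands without reading the whole index
is to grep for them.  This is only a sketch in POSIX shell: the function
name show_load_lines is made up for illustration, and it assumes the
package directory you pass in contains a pkgIndex.tcl.

```shell
# List the lines of a package index that mention Tcl's "load" command.
# Usage: show_load_lines /path/to/package/dir
# (show_load_lines is a hypothetical helper name, not part of turbine or Tcl.)
show_load_lines() {
    index="$1/pkgIndex.tcl"
    if [ ! -f "$index" ]; then
        echo "no pkgIndex.tcl in $1" >&2
        return 1
    fi
    # -n prefixes each matching line with its line number in the index
    grep -n 'load' "$index"
}
```

For example, calling show_load_lines on the leaf_main package directory
would print each line of its pkgIndex.tcl that contains a load command,
which is the command to copy into a standalone test file.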

Are you loading the package from a shared library?  There appear to be
multiple ways to load a package.

If you extract that out into a runnable Tcl file and edit the paths
accordingly, you'll probably have an even more minimal example, e.g.

load "./libwhatever.so"
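
A minimal sketch of generating such a file from the shell.  The helper
name write_load_test and the default file name load-test.tcl are
placeholders, not anything turbine provides; the puts lines are added
only to show whether the load itself is what crashes.

```shell
# Write a tiny Tcl script that does nothing but load one shared library.
# Usage: write_load_test ./libwhatever.so [output-file]
# (write_load_test and load-test.tcl are hypothetical names.)
write_load_test() {
    lib="$1"
    out="${2:-load-test.tcl}"
    # Unquoted heredoc: $lib is expanded by the shell; the braces make
    # Tcl treat the path as one literal word.
    cat > "$out" <<EOF
# Minimal reproduction: load the shared library directly.
puts "before load"
load {$lib}
puts "after load"
EOF
}
```

Run the generated file with the same tclsh that turbine uses (the TCLSH
value in scripts/turbine-config.sh).  If "before load" prints and the
process then dies, the crash is in loading that library; at that point
the same tclsh can be started under gdb with "gdb --args" followed by
the tclsh path and the script name.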

- Tim


On Tue, Jul 29, 2014 at 4:01 PM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:

> I tried a minimal Tcl script and found that the segfault occurs at:
>
> package require leaf_main 0.0
>
>
>
> On Tue, Jul 29, 2014 at 3:39 PM, Tim Armstrong <tim.g.armstrong at gmail.com>
> wrote:
>
>>    Well, anyway, let's backtrack.  The stacktrace already told us that
>> the segfault is happening in a package require statement.
>>
>>  I compiled apps/dock/user-code.swift and looked at the code.  There are
>> two package requires:
>>
>> package require turbine 0.5.0
>> package require leaf_main 0.0
>>
>>  They are at the top, before anything else really runs.  So if the problem
>> is in loading one of those packages, whatever happens later is irrelevant.
>>
>>  So how about just running user-code.tcl, or even creating a minimal Tcl
>> file with those two package require lines?
>>
>>  You may need to set TCLLIBPATH (a space-separated list:
>> http://www.tcl.tk/man/tcl8.6/TclCmd/library.htm#M27) to the directories
>> with the turbine and leaf_main pkgIndex.tcl files.
>>
>>  - Tim
>>
>>
>> On Tue, Jul 29, 2014 at 3:16 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>> wrote:
>>
>>> We are trying to narrow down the cause of the segfault by running the Tcl
>>> script outside of turbine, thus taking Swift/T and the turbine script out
>>> of the picture. I suppose this is the Tcl script that gets invoked, which
>>> in turn invokes the application.
>>>
>>>
>>>  On Tue, Jul 29, 2014 at 3:14 PM, Tim Armstrong <
>>> tim.g.armstrong at gmail.com> wrote:
>>>
>>>>   proc just defines the functions. You need to call them for them to run.
>>>>
>>>>  What are we trying to achieve by running this file anyway?  This looks
>>>> like a set of library functions rather than the entry point for a script.
>>>>
>>>>  - Tim
>>>>
>>>>
>>>> On Tue, Jul 29, 2014 at 3:08 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>> wrote:
>>>>
>>>>> Here is the tcl script with puts messages:
>>>>>
>>>>>  package provide leaf_main 0.0
>>>>>
>>>>>  # dnl Receive USER_LEAF from environment for m4 processing
>>>>> set USER_LEAF dock_wrap
>>>>> puts hello1
>>>>>
>>>>>  namespace eval leaf_main {
>>>>> puts hello2
>>>>>
>>>>>      proc leaf_main_wrap { rc A } {
>>>>>     deeprule $A 1 0 "leaf_main::leaf_main_wrap_impl $rc $A" type $::turbine::WORK
>>>>>     }
>>>>>
>>>>>      proc leaf_main_wrap_impl { rc A } {
>>>>>
>>>>>          global USER_LEAF
>>>>>
>>>>>          set length [ adlb::container_size $A ]
>>>>>         set tds [ adlb::enumerate $A dict all 0 ]
>>>>>         set argv [ list ]
>>>>>
>>>>>          puts hello3
>>>>>
>>>>>          # Fill argv with blanks
>>>>>         dict for { i v } $tds {
>>>>>             lappend argv 0
>>>>>         }
>>>>>         # Set values at ordered list positions
>>>>>         dict for { i v } $tds {
>>>>>             lset argv $i $v
>>>>>         }
>>>>>         set rc_value [ ${USER_LEAF}_extension {*}$argv ]
>>>>>         turbine::store_integer $rc $rc_value
>>>>>         puts hello4
>>>>>     }
>>>>>     puts hello5
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  It prints:
>>>>>
>>>>>  hello1
>>>>> hello2
>>>>> hello5
>>>>>
>>>>>  I see that it is not going into the proc leaf_main_wrap_impl, but I am
>>>>> not familiar enough with Tcl to understand why.
>>>>>
>>>>>
>>>>>
>>>>>  On Tue, Jul 29, 2014 at 2:41 PM, Tim Armstrong <
>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>
>>>>>>  I don't see any reason why that invocation of tclsh would silently
>>>>>> fail to run the tcl script.  Have you attempted to confirm your hypothesis
>>>>>> that it's not running the script, for example by modifying the script to
>>>>>> print something at the beginning or end?
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 29, 2014 at 1:42 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> I expect it to run the application or crash on segfault. Nothing
>>>>>>> happens.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  On Tue, Jul 29, 2014 at 1:39 PM, Tim Armstrong <
>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>
>>>>>>>>   That looks right, it should run dock_wrap.tcl fine.  And it runs
>>>>>>>> successfully to completion with no output?  Is that what you expected it to
>>>>>>>> do?
>>>>>>>>
>>>>>>>>  Backtracking to your original problem, if you could work out
>>>>>>>> which "package require" statement was failing and provide some info about
>>>>>>>> that package it might help understand the issue.
>>>>>>>>
>>>>>>>>  - Tim
>>>>>>>>
>>>>>>>>
>>>>>>>>  On Tue, Jul 29, 2014 at 1:32 PM, Ketan Maheshwari <
>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>>>  I run tclsh as follows:
>>>>>>>>>
>>>>>>>>>  /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>>>>
>>>>>>>>>  and
>>>>>>>>>
>>>>>>>>>  mpiexec -n 3 /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  On Tue, Jul 29, 2014 at 1:28 PM, Tim Armstrong <
>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>   I forgot to reply all earlier, re-including the list.
>>>>>>>>>>
>>>>>>>>>>  How are you running tclsh?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 29, 2014 at 11:53 AM, Ketan Maheshwari <
>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>
>>>>>>>>>>> When I try tclsh, it does not do anything. It just returns with an
>>>>>>>>>>> exit status of 0.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  On Tue, Jul 29, 2014 at 11:02 AM, Tim Armstrong <
>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>   You can run it directly with tclsh or mpiexec tclsh, which
>>>>>>>>>>>> is what turbine eventually does after setting up environment variables, etc.
>>>>>>>>>>>>
>>>>>>>>>>>>  - Tim
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:57 AM, Ketan Maheshwari <
>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Is it possible to run the dock_wrap.tcl outside of turbine
>>>>>>>>>>>>> just as in the case of static build?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:45 AM, Wozniak, Justin M. <
>>>>>>>>>>>>> wozniak at mcs.anl.gov> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, it's in.  The Swift/K SVN is apparently down so it's not
>>>>>>>>>>>>>> on the web yet but see the asciidoc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 07/29/2014 10:21 AM, Justin M Wozniak wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I thought VALGRIND was in the manual already but it isn't.  I
>>>>>>>>>>>>>> will add it now.  I will also talk about our GDB feature.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 07/29/2014 10:17 AM, Ketan Maheshwari wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks! Seems turbine script already had a placeholder for
>>>>>>>>>>>>>> Valgrind so I tried that and from the output, it seems tcl libraries are
>>>>>>>>>>>>>> causing segfault but I may be wrong. Attached is the Valgrind output.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:05 AM, Tim Armstrong <
>>>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  I don't have any particular insight into the cause of the
>>>>>>>>>>>>>>> segfault, I can help with the debugger though.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You need to point gdb at the tclsh that is being used by
>>>>>>>>>>>>>>> turbine (which is just a shell script).  You can locate the correct tclsh
>>>>>>>>>>>>>>> by looking at TCLSH in scripts/turbine-config.sh in the turbine install
>>>>>>>>>>>>>>> directory.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  - Tim
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:00 AM, Ketan Maheshwari <
>>>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  I am trying to main-wrap the DOCK 6.6 application for ATPESC. The
>>>>>>>>>>>>>>>> build seems to succeed, but things fail at runtime with a segfault:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  $ turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>>>>>>> =   EXIT CODE: 139
>>>>>>>>>>>>>>>> =   CLEANING UP REMAINING PROCESSES
>>>>>>>>>>>>>>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING:
>>>>>>>>>>>>>>>> Segmentation fault (signal 11)
>>>>>>>>>>>>>>>> This typically refers to a problem with your application.
>>>>>>>>>>>>>>>> Please see the FAQ page for debugging suggestions
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  This is on an MCS machine. Any suggestions for debugging this? I
>>>>>>>>>>>>>>>> tried gdb but it gives:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   "/nfs2/ketan/exm-install/turbine/bin/turbine": not in
>>>>>>>>>>>>>>>> executable format: File format not recognized
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  With strace, I see some signs of missing files, but I am not
>>>>>>>>>>>>>>>> sure if that is the cause of the segfault. Attached is the strace output of:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  strace -o strace.out turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  The code has some MPI and pthread elements but does not
>>>>>>>>>>>>>>>> use them as far as I understand.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  Thanks for any suggestions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  --
>>>>>>>>>>>>>>>> Ketan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>>>>>>> ExM-user mailing list
>>>>>>>>>>>>>>>> ExM-user at lists.mcs.anl.gov
>>>>>>>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/exm-user
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   --
>>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>