[ExM Users] debugging suggestions for non-static main-wrap segfault
Ketan Maheshwari
ketan at mcs.anl.gov
Tue Jul 29 16:01:16 CDT 2014
I tried a minimal Tcl script and found that the segfault occurs at:
package require leaf_main 0.0
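
For anyone repeating this test, here is a rough sketch of checking each
package require in isolation, along the lines Tim suggested below. The
scratch directory and file names are made up for illustration, TCLSH
defaults to whatever tclsh is on PATH (an assumption; the tclsh that turbine
uses is the one to test), and TCLLIBPATH still has to be exported by hand to
the directories holding the pkgIndex.tcl files:

```shell
#!/bin/sh
# Sketch: load each package in its own tclsh process to see which one crashes.
# Assumption: TCLLIBPATH has already been exported as a space-separated list
# of the directories containing the turbine and leaf_main pkgIndex.tcl files.
TCLSH="${TCLSH:-tclsh}"

# Hypothetical scratch directory for the one-line test scripts.
workdir="${TMPDIR:-/tmp}/pkg-require-test"
mkdir -p "$workdir"

# One tiny script per package, using the versions seen in user-code.tcl.
printf 'package require turbine 0.5.0\nputs "turbine loaded"\n' \
    > "$workdir/require_turbine.tcl"
printf 'package require leaf_main 0.0\nputs "leaf_main loaded"\n' \
    > "$workdir/require_leaf_main.tcl"

if command -v "$TCLSH" >/dev/null 2>&1; then
    for script in "$workdir"/require_*.tcl; do
        "$TCLSH" "$script"
        status=$?
        # 139 = 128 + 11, i.e. the process was killed by SIGSEGV.
        echo "$(basename "$script"): exit status $status"
    done
else
    echo "tclsh not found: set TCLSH to the interpreter turbine uses"
fi
```

An exit status of 139 for one script and not the other points at the
package whose load is crashing; a status of 1 with a "can't find package"
message usually just means TCLLIBPATH is not set correctly.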
On Tue, Jul 29, 2014 at 3:39 PM, Tim Armstrong <tim.g.armstrong at gmail.com>
wrote:
> Well, anyway, let's backtrack. The stacktrace already told us that
> the segfault is happening in a package require statement.
>
> I compiled apps/dock/user-code.swift and looked at the code. There are
> two package requires:
>
> package require turbine 0.5.0
> package require leaf_main 0.0
>
> They are at the top, before anything else really runs. So if the problem
> is in loading one of those packages, whatever happens later is irrelevant.
>
> So how about just running user-code.tcl, or even creating a minimal Tcl
> file with only those two package require lines?
>
> You may need to set TCLLIBPATH (a space-separated list:
> http://www.tcl.tk/man/tcl8.6/TclCmd/library.htm#M27) to the directories
> with the turbine and leaf_main pkgIndex.tcl files.
>
> - Tim
>
>
> On Tue, Jul 29, 2014 at 3:16 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
> wrote:
>
>> We are trying to narrow down the cause of the segfault by running the Tcl
>> script outside of turbine, thus taking the Swift/T and turbine scripts out
>> of the picture. I suppose this is the Tcl script that gets invoked, which
>> in turn invokes the application.
>>
>>
>> On Tue, Jul 29, 2014 at 3:14 PM, Tim Armstrong <
>> tim.g.armstrong at gmail.com> wrote:
>>
>>> proc just defines the functions. You need to call them for it to run.
>>>
>>> What are we trying to achieve by running this file anyway? This looks
>>> like a set of library functions rather than the entry point for a script.
>>>
>>> - Tim
>>>
>>>
>>> On Tue, Jul 29, 2014 at 3:08 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>> wrote:
>>>
>>>> Here is the tcl script with puts messages:
>>>>
>>>> package provide leaf_main 0.0
>>>>
>>>> # dnl Receive USER_LEAF from environment for m4 processing
>>>> set USER_LEAF dock_wrap
>>>> puts hello1
>>>>
>>>> namespace eval leaf_main {
>>>> puts hello2
>>>>
>>>> proc leaf_main_wrap { rc A } {
>>>> deeprule $A 1 0 "leaf_main::leaf_main_wrap_impl $rc $A" type $::turbine::WORK
>>>> }
>>>>
>>>> proc leaf_main_wrap_impl { rc A } {
>>>>
>>>> global USER_LEAF
>>>>
>>>> set length [ adlb::container_size $A ]
>>>> set tds [ adlb::enumerate $A dict all 0 ]
>>>> set argv [ list ]
>>>>
>>>> puts hello3
>>>>
>>>> # Fill argv with blanks
>>>> dict for { i v } $tds {
>>>> lappend argv 0
>>>> }
>>>> # Set values at ordered list positions
>>>> dict for { i v } $tds {
>>>> lset argv $i $v
>>>> }
>>>> set rc_value [ ${USER_LEAF}_extension {*}$argv ]
>>>> turbine::store_integer $rc $rc_value
>>>> puts hello4
>>>> }
>>>> puts hello5
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>> It prints:
>>>>
>>>> hello1
>>>> hello2
>>>> hello5
>>>>
>>>> I see that it is not going into the proc leaf_main_wrap_impl, but I am
>>>> not familiar enough with Tcl to understand why.
>>>>
>>>>
>>>>
>>>> On Tue, Jul 29, 2014 at 2:41 PM, Tim Armstrong <
>>>> tim.g.armstrong at gmail.com> wrote:
>>>>
>>>>> I don't see any reason why that invocation of tclsh would silently
>>>>> fail to run the tcl script. Have you attempted to confirm your hypothesis
>>>>> that it's not running the script, for example by modifying the script to
>>>>> print something at the beginning or end?
>>>>>
>>>>>
>>>>> On Tue, Jul 29, 2014 at 1:42 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>> I expect it to run the application or crash on segfault. Nothing
>>>>>> happens.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 29, 2014 at 1:39 PM, Tim Armstrong <
>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>
>>>>>>> That looks right, it should run dock_wrap.tcl fine. And it runs
>>>>>>> successfully to completion with no output? Is that what you expected it to
>>>>>>> do?
>>>>>>>
>>>>>>> Backtracking to your original problem, if you could work out which
>>>>>>> "package require" statement was failing and provide some info about that
>>>>>>> package it might help understand the issue.
>>>>>>>
>>>>>>> - Tim
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 29, 2014 at 1:32 PM, Ketan Maheshwari <
>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>
>>>>>>>> I run tclsh as follows:
>>>>>>>>
>>>>>>>> /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>>>
>>>>>>>> and
>>>>>>>>
>>>>>>>> mpiexec -n 3 /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 29, 2014 at 1:28 PM, Tim Armstrong <
>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I forgot to reply all earlier, re-including the list.
>>>>>>>>>
>>>>>>>>> How are you running tclsh?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 29, 2014 at 11:53 AM, Ketan Maheshwari <
>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>> When I try tclsh, it does not do anything. It just returns with
>>>>>>>>>> an exit status of 0.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 29, 2014 at 11:02 AM, Tim Armstrong <
>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> You can run it directly with tclsh or mpiexec tclsh, which is
>>>>>>>>>>> what turbine eventually does after setting up environment variables, etc.
>>>>>>>>>>>
>>>>>>>>>>> - Tim
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:57 AM, Ketan Maheshwari <
>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Is it possible to run the dock_wrap.tcl outside of turbine just
>>>>>>>>>>>> as in the case of static build?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:45 AM, Wozniak, Justin M. <
>>>>>>>>>>>> wozniak at mcs.anl.gov> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, it's in. The Swift/K SVN is apparently down so it's not
>>>>>>>>>>>>> on the web yet but see the asciidoc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/29/2014 10:21 AM, Justin M Wozniak wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I thought VALGRIND was in the manual already but it isn't. I
>>>>>>>>>>>>> will add it now. I will also talk about our GDB feature.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/29/2014 10:17 AM, Ketan Maheshwari wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks! It seems the turbine script already had a placeholder
>>>>>>>>>>>>> for Valgrind, so I tried that. From the output, it seems the Tcl
>>>>>>>>>>>>> libraries are causing the segfault, but I may be wrong. Attached
>>>>>>>>>>>>> is the Valgrind output.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:05 AM, Tim Armstrong <
>>>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't have any particular insight into the cause of the
>>>>>>>>>>>>>> segfault, but I can help with the debugger.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You need to point gdb at the tclsh that is being used by
>>>>>>>>>>>>>> turbine (which is just a shell script). You can locate the correct tclsh
>>>>>>>>>>>>>> by looking at TCLSH in scripts/turbine-config.sh in the turbine install
>>>>>>>>>>>>>> directory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Tim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:00 AM, Ketan Maheshwari <
>>>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am trying to main-wrap the DOCK 6.6 application for ATPESC.
>>>>>>>>>>>>>>> The build seems to succeed, but things fail at runtime with a
>>>>>>>>>>>>>>> segfault:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>>>>>> = EXIT CODE: 139
>>>>>>>>>>>>>>> = CLEANING UP REMAINING PROCESSES
>>>>>>>>>>>>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING:
>>>>>>>>>>>>>>> Segmentation fault (signal 11)
>>>>>>>>>>>>>>> This typically refers to a problem with your application.
>>>>>>>>>>>>>>> Please see the FAQ page for debugging suggestions
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is on an MCS machine. Any suggestions for debugging this?
>>>>>>>>>>>>>>> I tried gdb but it gives:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "/nfs2/ketan/exm-install/turbine/bin/turbine": not in
>>>>>>>>>>>>>>> executable format: File format not recognized
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With strace, I see some signs of missing files, but I am not
>>>>>>>>>>>>>>> sure whether that is the cause of the segfault. Attached is
>>>>>>>>>>>>>>> the strace output of:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> strace -o strace.out turbine -n 4 user-code.tcl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The code has some MPI and pthread elements but does not
>>>>>>>>>>>>>>> use them as far as I understand.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for any suggestions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Ketan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> ExM-user mailing list
>>>>>>>>>>>>>>> ExM-user at lists.mcs.anl.gov
>>>>>>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/exm-user
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>>>
>>>>>>>>>>>>>