[ExM Users] debugging suggestions for non-static main-wrap segfault

Ketan Maheshwari ketan at mcs.anl.gov
Tue Jul 29 15:16:58 CDT 2014


We are trying to narrow down the cause of the segfault by running the Tcl
script outside of turbine, which takes the Swift/T layers and the turbine
launcher script out of the picture. I assumed this is the Tcl script that
gets invoked and that in turn invokes the application.


On Tue, Jul 29, 2014 at 3:14 PM, Tim Armstrong <tim.g.armstrong at gmail.com>
wrote:

>  proc only defines the functions. You need to call them for them to run.
>
>  What are we trying to achieve by running this file anyway?  This looks
> like a set of library functions rather than the entry point for a script.
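>
>  As a minimal sketch of the difference (the names below are made up for
> illustration and are not part of turbine): without the final line,
> running this file prints only "defined". The body of greet runs only
> when the last line calls it.
>
>     namespace eval demo {
>         # proc only registers the command; its body is not run here
>         proc greet { name } {
>             puts "in greet: hello $name"
>         }
>         puts "defined"
>     }
>     # Nothing above has executed the body of greet; this call does
>     demo::greet world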
>
>  - Tim
>
>
> On Tue, Jul 29, 2014 at 3:08 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
> wrote:
>
>> Here is the tcl script with puts messages:
>>
>>  package provide leaf_main 0.0
>>
>>  # dnl Receive USER_LEAF from environment for m4 processing
>> set USER_LEAF dock_wrap
>> puts hello1
>>
>>  namespace eval leaf_main {
>> puts hello2
>>
>>      proc leaf_main_wrap { rc A } {
>>         deeprule $A 1 0 "leaf_main::leaf_main_wrap_impl $rc $A" \
>>             type $::turbine::WORK
>>     }
>>
>>      proc leaf_main_wrap_impl { rc A } {
>>
>>          global USER_LEAF
>>
>>          set length [ adlb::container_size $A ]
>>         set tds [ adlb::enumerate $A dict all 0 ]
>>         set argv [ list ]
>>
>>          puts hello3
>>
>>          # Fill argv with blanks
>>         dict for { i v } $tds {
>>             lappend argv 0
>>         }
>>         # Set values at ordered list positions
>>         dict for { i v } $tds {
>>             lset argv $i $v
>>         }
>>         set rc_value [ ${USER_LEAF}_extension {*}$argv ]
>>         turbine::store_integer $rc $rc_value
>>         puts hello4
>>     }
>>     puts hello5
>> }
>>
>>
>>
>>
>>  It prints:
>>
>>  hello1
>> hello2
>> hello5
>>
>>  I see that it never enters the proc leaf_main_wrap_impl, but I am not
>> familiar enough with Tcl to understand why.
>>
>>
>>
>>  On Tue, Jul 29, 2014 at 2:41 PM, Tim Armstrong <
>> tim.g.armstrong at gmail.com> wrote:
>>
>>>  I don't see any reason why that invocation of tclsh would silently
>>> fail to run the tcl script.  Have you attempted to confirm your hypothesis
>>> that it's not running the script, for example by modifying the script to
>>> print something at the beginning or end?
>>>
>>>
>>> On Tue, Jul 29, 2014 at 1:42 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>> wrote:
>>>
>>>> I expect it to run the application or crash on segfault. Nothing
>>>> happens.
>>>>
>>>>
>>>>
>>>>  On Tue, Jul 29, 2014 at 1:39 PM, Tim Armstrong <
>>>> tim.g.armstrong at gmail.com> wrote:
>>>>
>>>>>   That looks right, it should run dock_wrap.tcl fine.  And it runs
>>>>> successfully to completion with no output?  Is that what you expected it to
>>>>> do?
>>>>>
>>>>>  Backtracking to your original problem, if you could work out which
>>>>> "package require" statement was failing and provide some info about that
>>>>> package it might help understand the issue.
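>>>>>
>>>>>  One rough way to do that, assuming the packages are visible on
>>>>> TCLLIBPATH or auto_path (the package names below are only a guess at
>>>>> what the script loads; substitute the real ones):
>>>>>
>>>>>     foreach pkg { turbine leaf_main } {
>>>>>         # Print before loading: a segfault kills tclsh outright, so
>>>>>         # the last name printed is the package that crashed
>>>>>         puts "loading $pkg ..."
>>>>>         flush stdout
>>>>>         # catch traps ordinary Tcl errors such as a missing package
>>>>>         if { [ catch { package require $pkg } result ] } {
>>>>>             puts "  failed: $result"
>>>>>         } else {
>>>>>             puts "  ok, version $result"
>>>>>         }
>>>>>     }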
>>>>>
>>>>>  - Tim
>>>>>
>>>>>
>>>>>  On Tue, Jul 29, 2014 at 1:32 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>>  I run tclsh as follows:
>>>>>>
>>>>>>  /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i rigid.in
>>>>>>
>>>>>>  and
>>>>>>
>>>>>>  mpiexec -n 3 /home/ketan/tcl-install/bin/tclsh8.5 dock_wrap.tcl -i
>>>>>> rigid.in
>>>>>>
>>>>>>
>>>>>>  On Tue, Jul 29, 2014 at 1:28 PM, Tim Armstrong <
>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>
>>>>>>>   I forgot to reply all earlier, re-including the list.
>>>>>>>
>>>>>>>  How are you running tclsh?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 29, 2014 at 11:53 AM, Ketan Maheshwari <
>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>
>>>>>>>> When I try tclsh, it does not do anything. It just returns with
>>>>>>>> exit status 0.
>>>>>>>>
>>>>>>>>
>>>>>>>>  On Tue, Jul 29, 2014 at 11:02 AM, Tim Armstrong <
>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>   You can run it directly with tclsh or mpiexec tclsh, which is
>>>>>>>>> what turbine eventually does after setting up environment variables, etc.
>>>>>>>>>
>>>>>>>>>  - Tim
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 29, 2014 at 10:57 AM, Ketan Maheshwari <
>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>> Is it possible to run dock_wrap.tcl outside of turbine, as in
>>>>>>>>>> the case of the static build?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:45 AM, Wozniak, Justin M. <
>>>>>>>>>> wozniak at mcs.anl.gov> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ok, it's in.  The Swift/K SVN is apparently down so it's not on
>>>>>>>>>>> the web yet but see the asciidoc.
>>>>>>>>>>>
>>>>>>>>>>> On 07/29/2014 10:21 AM, Justin M Wozniak wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I thought VALGRIND was in the manual already but it isn't.  I
>>>>>>>>>>> will add it now.  I will also talk about our GDB feature.
>>>>>>>>>>>
>>>>>>>>>>> On 07/29/2014 10:17 AM, Ketan Maheshwari wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks! The turbine script already had a placeholder for
>>>>>>>>>>> Valgrind, so I tried that. From the output, it seems the Tcl libraries
>>>>>>>>>>> are causing the segfault, but I may be wrong. Attached is the Valgrind output.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 29, 2014 at 10:05 AM, Tim Armstrong <
>>>>>>>>>>> tim.g.armstrong at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>  I don't have any particular insight into the cause of the
>>>>>>>>>>>> segfault, but I can help with the debugger.
>>>>>>>>>>>>
>>>>>>>>>>>> You need to point gdb at the tclsh that is being used by
>>>>>>>>>>>> turbine (which is just a shell script).  You can locate the correct tclsh
>>>>>>>>>>>> by looking at TCLSH in scripts/turbine-config.sh in the turbine install
>>>>>>>>>>>> directory.
>>>>>>>>>>>>
>>>>>>>>>>>>  - Tim
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  On Tue, Jul 29, 2014 at 10:00 AM, Ketan Maheshwari <
>>>>>>>>>>>> ketan at mcs.anl.gov> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>  Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>>  I am trying to main-wrap the DOCK 6.6 application for ATPESC.
>>>>>>>>>>>>> The build seems to succeed, but the run fails with a segfault:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  $ turbine -n 4 user-code.tcl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>>>>>>>>> =   EXIT CODE: 139
>>>>>>>>>>>>> =   CLEANING UP REMAINING PROCESSES
>>>>>>>>>>>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>>>>>>>>>>>
>>>>>>>>>>>>> ===================================================================================
>>>>>>>>>>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation
>>>>>>>>>>>>> fault (signal 11)
>>>>>>>>>>>>> This typically refers to a problem with your application.
>>>>>>>>>>>>> Please see the FAQ page for debugging suggestions
>>>>>>>>>>>>>
>>>>>>>>>>>>>  This is on an MCS machine. Any suggestions for debugging this?
>>>>>>>>>>>>> I tried gdb, but it gives:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   "/nfs2/ketan/exm-install/turbine/bin/turbine": not in
>>>>>>>>>>>>> executable format: File format not recognized
>>>>>>>>>>>>>
>>>>>>>>>>>>>  With strace, I see some signs of missing files, but I am not
>>>>>>>>>>>>> sure whether that is the cause of the segfault. Attached is the strace
>>>>>>>>>>>>> output of:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  strace -o strace.out turbine -n 4 user-code.tcl
>>>>>>>>>>>>>
>>>>>>>>>>>>>  The code has some MPI and pthread elements but does not use
>>>>>>>>>>>>> them as far as I understand.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Thanks for any suggestions.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  --
>>>>>>>>>>>>> Ketan
>>>>>>>>>>>>>
>>>>>>>>>>>>>  _______________________________________________
>>>>>>>>>>>>> ExM-user mailing list
>>>>>>>>>>>>> ExM-user at lists.mcs.anl.gov
>>>>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/exm-user
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>   --
>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Justin M Wozniak
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>