[Swift-devel] [Bug 275] New: Document the Swift Hang Checker and improve its messages

bugzilla-daemon at mcs.anl.gov
Sat Mar 26 07:03:46 CDT 2011


https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=275

           Summary: Document the Swift Hang Checker and improve its
                    messages
           Product: Swift
           Version: 0.93
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: ketan at mcs.anl.gov
        ReportedBy: wilde at mcs.anl.gov


was: [Swift-devel] Re: Workflow waiting on condition hang

I missed this when it was announced Mar 6 (email below). Sounds very useful. 

We should add a User Guide entry for this, with a few Swift deadlock examples
and show users how to use the information to identify and correct the deadlock.

How closely can we tie the hang-checker messages to the Swift source code, so
that the user can relate them to Swift functions, expressions, and ideally
source code lines?

Ketan, please add this to the list of "cookbook" entries to merge into the User
Guide, and I will file it in bugzilla.

- Mike



The current Hang Checker output is actually *very* nice and useful already:

Registered futures:
Rupture[] rups  Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site  Closed, no listeners
Variation[] vars  Closed, 72 elements, 0 listeners

Is it possible (and sensible) to add to this a dump or summary of the current
Swift threads and the function call or expression they are running?

E.g., from the output above, would one conclude that only one function is
hanging at the moment in this code:

  SgtDim sub - Open, 1 listeners

Would it be helpful to know which expression (and line of code) is waiting on
the variable "sub"? And is that possible to print?

- Mike


----- Forwarded Message -----
From: "Mihael Hategan" <hategan at mcs.anl.gov>
To: "Jonathan Monette" <jon.monette at gmail.com>
Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
Sent: Sunday, March 6, 2011 3:46:44 PM
Subject: [Swift-devel] Re: Workflow waiting on condition hang

Given that this does not seem to be a Java deadlock, I added a hang
checker to Swift. If nothing is being executed inside Karajan and no
jobs are running in any ten-second interval, it will dump future and
thread information to the log file.
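
As a rough illustration of the idea only (this is not the code from r4171),
an idle watchdog of this kind could be sketched in Python as follows. The
class name IdleWatchdog, its methods, and the injectable clock are all
hypothetical, and the sketch reduces "nothing executing and no jobs running"
to a single "last event" timestamp:

```python
import threading
import time


class IdleWatchdog:
    """Hypothetical sketch: call on_idle() when no event has been
    recorded for at least `interval` seconds."""

    def __init__(self, interval, on_idle, clock=time.monotonic):
        self.interval = interval
        self.on_idle = on_idle      # e.g. a function that dumps futures/threads
        self.clock = clock          # injectable so the logic can be tested
        self._last_event = clock()
        self._lock = threading.Lock()

    def record_event(self):
        """Call whenever something executes or a job makes progress."""
        with self._lock:
            self._last_event = self.clock()

    def check(self):
        """Return True (and report) if the idle interval has elapsed."""
        with self._lock:
            idle_for = self.clock() - self._last_event
        if idle_for >= self.interval:
            self.on_idle(idle_for)
            return True
        return False

    def run(self, stop_event, poll=1.0):
        """Poll until stop_event (a threading.Event) is set."""
        while not stop_event.wait(poll):
            self.check()
```

In this sketch the interpreter would call record_event() on every unit of
progress, and run() would live on a daemon thread. As written, the dump
repeats on every poll for as long as the run stays idle.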

This is in swift trunk r4171.

Can you give that a try and report back the details?

Mihael

On Sat, 2011-02-19 at 14:54 -0600, Jonathan Monette wrote:
> Yes.  It always seems to hang at the same place.
> 
> Attached is my montage script.  It hangs in the mFitBatch function at 
> the mConcatFit app call.  All other files have been created up to that 
> step but that app never runs.
> 
> On 2/17/11 3:39 PM, Mihael Hategan wrote:
> > On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote:
> >> Hello,
> >>       My workflow seems to be hanging.  This is trunk swift-r4107 and
> >> cog-r3051.  Attached is a compressed log file and the jstack output for
> >> my workflow.  The jstack file says it is waiting for a condition and my
> >> workflow hangs.
> > There's lots of stuff waiting because that's what they do when they
> > don't have anything else to do. So I don't see a problem there.
> >
> > There are no jobs going to the coaster service, so clearly things aren't
> > progressing.
> >
> > > So now the question is: does this happen every time you run it or just
> > > sometimes?
> >
> > Also, please send the swift script.
> >
> > Mihael
> >
> >
============

Here is an example of its current output:




----- Forwarded Message -----
From: "Allan Espinosa" <aespinosa at cs.uchicago.edu>
To: "swift-devel" <swift-devel at ci.uchicago.edu>
Sent: Friday, March 25, 2011 7:42:30 PM
Subject: [Swift-devel] hang checker fun

This has occurred 70 times already.  What I expect is for
the app with SgtDim sub to run and close the future.

2011-03-25 19:40:12,217-0500 WARN  HangChecker No events in 10s.
2011-03-25 19:40:12,217-0500 WARN  HangChecker
Registered futures:
Rupture[] rups  Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site  Closed, no listeners
Variation[] vars  Closed, 72 elements, 0 listeners
----

Waiting threads:
0-13
0-13-0-7
0-13-0-8-1-1
----
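
Until the messages point at source lines, a dump like the one above can be
scanned mechanically for the future that is blocking progress. The following
Python sketch does that. The line pattern is inferred only from this one
sample and is an assumption, not a documented log format, and the helper
names parse_futures and blocked_futures are hypothetical:

```python
import re

# Assumed shape, inferred from the sample above only:
#   <type> <name> [-] <Open|Closed>, [<n> elements,] <n|no> listeners
FUTURE_LINE = re.compile(
    r"^(?P<type>\S+)\s+(?P<name>\S+)\s+(?:-\s+)?(?P<state>Open|Closed),\s*"
    r"(?:(?P<elements>\d+) elements,\s*)?(?P<listeners>\d+|no) listeners\s*$"
)


def parse_futures(text):
    """Return one dict per line that looks like a registered future."""
    futures = []
    for line in text.splitlines():
        m = FUTURE_LINE.match(line.strip())
        if m is None:
            continue  # timestamps, headers, thread ids, separators
        listeners = m.group("listeners")
        futures.append({
            "type": m.group("type"),
            "name": m.group("name"),
            "state": m.group("state"),
            "listeners": 0 if listeners == "no" else int(listeners),
        })
    return futures


def blocked_futures(text):
    """Futures that are still open while something is waiting on them."""
    return [f for f in parse_futures(text)
            if f["state"] == "Open" and f["listeners"] > 0]


SAMPLE = """Registered futures:
Rupture[] rups  Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site  Closed, no listeners
Variation[] vars  Closed, 72 elements, 0 listeners
----
"""

if __name__ == "__main__":
    for f in blocked_futures(SAMPLE):
        print(f["type"], f["name"], "is open with",
              f["listeners"], "listener(s) waiting")
```

For the dump above, the only future this filter reports is SgtDim sub, the
variable the stalled app call was expected to close.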


-- 
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

----------------

and more:
