[Swift-devel] [Bug 275] New: Document the Swift Hang Checker and improve its messages
bugzilla-daemon at mcs.anl.gov
Sat Mar 26 07:03:46 CDT 2011
https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=275
Summary: Document the Swift Hang Checker and improve its messages
Product: Swift
Version: 0.93
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Documentation
AssignedTo: ketan at mcs.anl.gov
ReportedBy: wilde at mcs.anl.gov
was: [Swift-devel] Re: Workflow waiting on condition hang
I missed this when it was announced Mar 6 (email below). Sounds very useful.
We should add a User Guide entry for this, with a few Swift deadlock examples,
showing users how to use the hang-checker output to identify and correct the
deadlock.
How close to the Swift source code can we make the hang-checker messages, so
that the user can relate them to Swift functions, expressions, and ideally
source code lines?
Ketan, please add this to the list of "cookbook" entries to merge into the User
Guide, and I will file it in bugzilla.
- Mike
The current Hang Checker output is actually *very* nice and useful already:
Registered futures:
Rupture[] rups Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site Closed, no listeners
Variation[] vars Closed, 72 elements, 0 listeners
Is it possible (and sensible) to add to this a dump or summary of the current
Swift threads and the function call or expression they are running?
For example, from the output above, would one conclude that only one function
is hanging at the moment in this code:
SgtDim sub - Open, 1 listeners
Would knowing which expression (and line of code) is waiting on the variable
"sub" be helpful? And is it possible to print?
- Mike
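As a rough illustration of how the "Registered futures" listing above could be
read mechanically, here is a minimal Python sketch. The line format is inferred
only from the five sample lines shown, and the names parse_futures and
blocked_futures are hypothetical helpers, not part of Swift. Treating an Open
future with at least one listener as "something is waiting here" is an
assumption about how to interpret the dump.

```python
import re
from typing import List, NamedTuple


class Future(NamedTuple):
    type: str
    name: str
    state: str       # "Open" or "Closed"
    listeners: int   # "no listeners" is recorded as 0


# Assumed shape: <type> <name> [-] <Open|Closed>, <details>
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(?:-\s+)?(Open|Closed),\s*(.*)$")
LISTENERS_RE = re.compile(r"(\d+|no) listeners")


def parse_futures(dump: str) -> List[Future]:
    """Parse the lines of a 'Registered futures' dump; skip anything else."""
    futures = []
    for line in dump.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # header lines, separators, blank lines
        ftype, name, state, rest = m.groups()
        lm = LISTENERS_RE.search(rest)
        count = 0 if lm is None or lm.group(1) == "no" else int(lm.group(1))
        futures.append(Future(ftype, name, state, count))
    return futures


def blocked_futures(futures: List[Future]) -> List[Future]:
    """Futures that are still Open and have at least one listener."""
    return [f for f in futures if f.state == "Open" and f.listeners > 0]


SAMPLE = """Registered futures:
Rupture[] rups Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site Closed, no listeners
Variation[] vars Closed, 72 elements, 0 listeners
"""

if __name__ == "__main__":
    for f in blocked_futures(parse_futures(SAMPLE)):
        print(f.type, f.name, f.listeners)
```

On the sample above this singles out the "SgtDim sub" entry, which is the same
line picked out by eye in this message. It says nothing about which expression
or source line is waiting, which is the information being asked for here.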
----- Forwarded Message -----
From: "Mihael Hategan" <hategan at mcs.anl.gov>
To: "Jonathan Monette" <jon.monette at gmail.com>
Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
Sent: Sunday, March 6, 2011 3:46:44 PM
Subject: [Swift-devel] Re: Workflow waiting on condition hang
Given that this does not seem to be a Java deadlock, I added a hang
checker to Swift. If nothing is being executed inside Karajan and no
jobs are running in any ten-second interval, it will dump future and
thread information to the log file.
This is in swift trunk r4171.
Can you give that a try and report back the details?
Mihael
On Sat, 2011-02-19 at 14:54 -0600, Jonathan Monette wrote:
> Yes. It always seems to hang at the same place.
>
> Attached is my montage script. It hangs in the mFitBatch function at
> the mConcatFit app call. All other files have been created up to that
> step but that app never runs.
>
> On 2/17/11 3:39 PM, Mihael Hategan wrote:
> > On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote:
> >> Hello,
> >> My workflow seems to be hanging. This is trunk swift-r4107 and
> >> cog-r3051. Attached is a compressed log file and the jstack output for
> >> my workflow. The jstack file says it is waiting for a condition and my
> >> workflow hangs.
> > There's lots of stuff waiting because that's what they do when they
> > don't have anything else to do. So I don't see a problem there.
> >
> > There are no jobs going to the coaster service, so clearly things aren't
> > progressing.
> >
> > So now the question is: does this happen every time you run it or just
> > sometimes?
> >
> > Also, please send the swift script.
> >
> > Mihael
> >
> >
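The rule described in the March 6 message above (nothing executing and no jobs
running during a ten-second interval triggers a dump) can be pictured with a
small Python sketch. This is not the code from swift trunk r4171; the class
HangChecker below and its methods are hypothetical, and the injectable clock
exists only so the behaviour can be checked without waiting.

```python
import time
from typing import Callable


class HangChecker:
    """Illustrative idle detector; names and structure are assumptions."""

    def __init__(self, interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval      # seconds of silence before reporting
        self.clock = clock
        self.last_event = clock()     # time of the most recent activity
        self.running_jobs = 0

    def record_event(self) -> None:
        """Call whenever the engine executes something."""
        self.last_event = self.clock()

    def job_started(self) -> None:
        self.running_jobs += 1
        self.record_event()

    def job_finished(self) -> None:
        self.running_jobs -= 1
        self.record_event()

    def is_hung(self) -> bool:
        """True when no job is running and nothing happened for `interval`."""
        idle = self.clock() - self.last_event
        return self.running_jobs == 0 and idle >= self.interval


if __name__ == "__main__":
    now = [0.0]                       # fake clock for demonstration
    checker = HangChecker(interval=10.0, clock=lambda: now[0])
    now[0] = 12.0
    if checker.is_hung():
        print("No events in 10s; a real checker would dump futures/threads")
```

In a real engine a periodic timer would call is_hung() and, when it returns
true, write the registered futures and waiting threads to the log.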
============
Here is an example of its current output:
----- Forwarded Message -----
From: "Allan Espinosa" <aespinosa at cs.uchicago.edu>
To: "swift-devel" <swift-devel at ci.uchicago.edu>
Sent: Friday, March 25, 2011 7:42:30 PM
Subject: [Swift-devel] hang checker fun
This has occurred 70 times already. What I expect is for
the app with SgtDim sub to run and close the future.
2011-03-25 19:40:12,217-0500 WARN HangChecker No events in 10s.
2011-03-25 19:40:12,217-0500 WARN HangChecker
Registered futures:
Rupture[] rups Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site Closed, no listeners
Variation[] vars Closed, 72 elements, 0 listeners
----
Waiting threads:
0-13
0-13-0-7
0-13-0-8-1-1
----
--
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
----------------
and more: