[Swift-devel] Swift hang checker

Michael Wilde wilde at mcs.anl.gov
Sat Mar 26 07:02:42 CDT 2011


As I was filing the bug I realized I mistook Allan's posting of jstack output for the Hang Checker output.

The current Hang Checker output is actually *very* nice and useful already:

Registered futures:
Rupture[] rups - Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site - Closed, no listeners
Variation[] vars - Closed, 72 elements, 0 listeners

Is it possible (and sensible) to add to this a dump or summary of the current Swift threads and the function call or expression they are running?

E.g., from the output above, would one conclude that only one function is hanging at the moment in this code:

  SgtDim sub - Open, 1 listeners

Would it be helpful to know which expression (and line of code) is waiting on the variable "sub"? And is that possible to print?
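
As a rough illustration of reading the report above, a small script can pull out the futures that are still open and have listeners attached. This is only a sketch: it is written in Python for convenience, it is not part of Swift, the function names (parse_futures, blocked_on) are hypothetical, and it assumes the line format is exactly what appears in the output above (with the " - " separator treated as optional).

```python
import re

# Matches lines such as "SgtDim sub - Open, 1 listeners" or
# "Rupture[] rups  Closed, 1 elements, 0 listeners" (dash optional).
# Assumption: this format is inferred only from the sample output above.
FUTURE_LINE = re.compile(
    r"^(?P<type>\S+)\s+(?P<name>\S+)\s+(?:-\s+)?"
    r"(?P<state>Open|Closed),\s*(?P<details>.*)$"
)
LISTENERS = re.compile(r"(\d+|no) listeners")


def parse_futures(report):
    """Return one dict per future line found in a hang checker report."""
    futures = []
    for line in report.splitlines():
        match = FUTURE_LINE.match(line.strip())
        if not match:
            continue  # skips the "Registered futures:" header and other text
        found = LISTENERS.search(match.group("details"))
        if found is None or found.group(1) == "no":
            count = 0
        else:
            count = int(found.group(1))
        futures.append({
            "type": match.group("type"),
            "name": match.group("name"),
            "state": match.group("state"),
            "listeners": count,
        })
    return futures


def blocked_on(report):
    """Names of futures that are still open and have listeners waiting."""
    return [f["name"] for f in parse_futures(report)
            if f["state"] == "Open" and f["listeners"] > 0]


REPORT = """Registered futures:
Rupture[] rups  Closed, 1 elements, 0 listeners
Variation vars - Closed, no listeners
SgtDim sub - Open, 1 listeners
string site  Closed, no listeners
Variation[] vars  Closed, 72 elements, 0 listeners
"""

print(blocked_on(REPORT))
```

On the sample report this singles out "sub", which is the reading suggested above. It says nothing about which expression or source line is doing the waiting; that information is not in the current output.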

- Mike





----- Original Message -----
> was: [Swift-devel] Re: Workflow waiting on condition hang
> 
> I missed this when it was announced Mar 6 (email below). Sounds very
> useful.
> 
> We should add a User Guide entry for this, with a few Swift deadlock
> examples and show users how to use the information to identify and
> correct the deadlock.
> 
> How close to the Swift source code can we make the hang-checker
> messages, so that the user can relate it to Swift functions,
> expressions, and ideally source code lines?
> 
> Ketan, please add this to the list of "cookbook" entries to merge into
> the User Guide, and I will file it in bugzilla.
> 
> - Mike
> 
> 
> 
> ----- Forwarded Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "Jonathan Monette" <jon.monette at gmail.com>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Sunday, March 6, 2011 3:46:44 PM
> Subject: [Swift-devel] Re: Workflow waiting on condition hang
> 
> Given that this does not seem to be a Java deadlock, I added a hang
> checker to Swift. If nothing is being executed inside Karajan and no
> jobs are running in any ten-second interval, it will dump future and
> thread information to the log file.
> 
> This is in swift trunk r4171.
> 
> Can you give that a try and report back the details?
> 
> Mihael
> 
> On Sat, 2011-02-19 at 14:54 -0600, Jonathan Monette wrote:
> > Yes. It always seems to hang at the same place.
> >
> > Attached is my montage script. It hangs in the mFitBatch function at
> > the mConcatFit app call. All other files have been created up to that
> > step but that app never runs.
> >
> > On 2/17/11 3:39 PM, Mihael Hategan wrote:
> > > On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote:
> > >> Hello,
> > >> My workflow seems to be hanging. This is trunk swift-r4107 and
> > >> cog-r3051. Attached is a compressed log file and the jstack output
> > >> for my workflow. The jstack file says it is waiting for a condition
> > >> and my workflow hangs.
> > > There's lots of stuff waiting because that's what they do when they
> > > don't have anything else to do. So I don't see a problem there.
> > >
> > > There are no jobs going to the coaster service, so clearly things
> > > aren't progressing.
> > >
> > > So now the question is: does this happen every time you run it or
> > > just sometimes?
> > >
> > > Also, please send the swift script.
> > >
> > > Mihael
> > >
> > >
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



