[Swift-devel] Swift hang checker

Michael Wilde wilde at mcs.anl.gov
Sat Mar 26 06:44:47 CDT 2011


was: [Swift-devel] Re: Workflow waiting on condition hang

I missed this when it was announced on Mar 6 (email below). It sounds very useful. 

We should add a User Guide entry for this, with a few Swift deadlock examples that show users how to use this information to identify and correct a deadlock.

How closely can we tie the hang-checker messages to the Swift source code, so that users can relate them to Swift functions, expressions, and ideally source code lines?

Ketan, please add this to the list of "cookbook" entries to merge into the User Guide, and I will file it in bugzilla.

- Mike



----- Forwarded Message -----
From: "Mihael Hategan" <hategan at mcs.anl.gov>
To: "Jonathan Monette" <jon.monette at gmail.com>
Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
Sent: Sunday, March 6, 2011 3:46:44 PM
Subject: [Swift-devel] Re: Workflow waiting on condition hang

Given that this does not seem to be a Java deadlock, I added a hang
checker to Swift. If nothing is being executed inside Karajan and no
jobs are running during any ten-second interval, it will dump future and
thread information to the log file.
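The message describes the check but not its code. The following is a minimal Java sketch of that kind of watchdog, under stated assumptions: it assumes the runtime can expose a monotonically increasing count of executed steps and a gauge of currently running jobs. Every name here (`HangChecker`, `stepsExecuted`, `runningJobs`, `checkOnce`, `dumpThreads`) is hypothetical and is not Swift's or Karajan's actual API; only the JDK calls are real.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical watchdog sketch; NOT the actual Swift/Karajan hang checker. */
public class HangChecker {
    // Assumption: incremented each time the interpreter executes a step.
    final AtomicLong stepsExecuted = new AtomicLong();
    // Assumption: number of jobs currently submitted and running.
    final AtomicInteger runningJobs = new AtomicInteger();

    private long lastSeenSteps = -1;

    /** Called once per interval; returns true if no progress was seen in it. */
    synchronized boolean checkOnce() {
        long steps = stepsExecuted.get();
        boolean idle = steps == lastSeenSteps && runningJobs.get() == 0;
        lastSeenSteps = steps;
        return idle;
    }

    /** Stand-in for dumping thread information; futures are not modeled here. */
    static String dumpThreads() {
        StringBuilder sb = new StringBuilder("Possible hang; thread dump:\n");
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append(e.getKey().getName()).append(" (").append(e.getKey().getState()).append(")\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Runs checkOnce() every intervalSeconds (e.g. 10) on a daemon thread. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "hang-checker");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(() -> {
            if (checkOnce()) {
                System.err.print(dumpThreads()); // a real tool would write to its log file
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return timer;
    }

    public static void main(String[] args) {
        HangChecker hc = new HangChecker();
        hc.checkOnce();                       // first tick only records a baseline
        hc.stepsExecuted.incrementAndGet();
        System.out.println(hc.checkOnce());   // false: a step ran during the interval
        System.out.println(hc.checkOnce());   // true: no steps and no running jobs
        hc.runningJobs.set(1);
        System.out.println(hc.checkOnce());   // false: a job is still running
    }
}
```

Comparing a step counter between ticks, rather than sampling "is anything executing right now", means a short burst of work that starts and finishes between two ticks still counts as progress, so only an interval with no work at all triggers a dump.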

This is in swift trunk r4171.

Can you give that a try and report back the details?

Mihael

On Sat, 2011-02-19 at 14:54 -0600, Jonathan Monette wrote:
> Yes.  It always seems to hang at the same place.
> 
> Attached is my Montage script.  It hangs in the mFitBatch function at 
> the mConcatFit app call.  All other files have been created up to that 
> step but that app never runs.
> 
> On 2/17/11 3:39 PM, Mihael Hategan wrote:
> > On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote:
> >> Hello,
> >>       My workflow seems to be hanging.  This is trunk swift-r4107 and
> >> cog-r3051.  Attached is a compressed log file and the jstack output for
> >> my workflow.  The jstack file says it is waiting for a condition and my
> >> workflow hangs.
> > There's lots of stuff waiting because that's what they do when they
> > don't have anything else to do. So I don't see a problem there.
> >
> > There are no jobs going to the coaster service, so clearly things aren't
> > progressing.
> >
> > So now the question is: does this happen every time you run it or just
> > some times?
> >
> > Also, please send the swift script.
> >
> > Mihael
> >
> >


_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
