[Swift-devel] swift 0.93 deadlock

David Kelly davidk at ci.uchicago.edu
Thu Sep 15 12:29:03 CDT 2011


I narrowed down the problem a bit. Last night I ran jstack on the wrong Java process, which is why it didn't report a deadlock.

Papia and I are seeing the same issue.

My jstack: http://www.ci.uchicago.edu/~davidk/swat2/jstack.log
Papia's jstack: http://www.ci.uchicago.edu/~davidk/swat2/papia-jstack.log

The deadlock occurs at the same place in both traces:

org.griphyn.vdl.karajan.lib.cache.File.lock(File.java:100)
org.griphyn.vdl.karajan.lib.cache.LRUFileCache.addAndLockEntry(LRUFileCache.java:24)
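For anyone who would rather check for this from inside the JVM than attach jstack to the right PID, here is a minimal sketch using the standard java.lang.management API. It is not Swift or Karajan code: the class DeadlockProbe and its methods are hypothetical names made up for illustration, and the two-monitor lock-order inversion it induces is only a generic example, not a diagnosis of what File.lock is doing. ThreadMXBean.findDeadlockedThreads() reports the same kind of monitor cycle that jstack prints as a Java-level deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration class; not part of Swift or Karajan.
final class DeadlockProbe {

    // Number of threads the JVM currently reports as deadlocked.
    static int countDeadlockedThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when no cycle exists
        return ids == null ? 0 : ids.length;
    }

    // Starts two daemon threads that take two monitors in opposite order,
    // then waits (up to about 5 seconds) until the JVM sees the cycle.
    static void induceDeadlock() {
        final Object a = new Object();
        final Object b = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(a, b, bothHoldFirst), "demo-1");
        Thread t2 = new Thread(() -> lockInOrder(b, a, bothHoldFirst), "demo-2");
        t1.setDaemon(true); // daemon threads so the JVM can still exit
        t2.setDaemon(true);
        t1.start();
        t2.start();
        for (int i = 0; i < 100 && countDeadlockedThreads() < 2; i++) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Holds 'first', waits until the other thread holds its own first lock,
    // then blocks forever trying to take 'second'.
    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await();
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached while the other thread holds 'second'
            }
        }
    }

    public static void main(String[] args) {
        induceDeadlock();
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("no deadlock detected");
            return;
        }
        // Print each deadlocked thread with its held monitors and stack.
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            System.out.print(info);
        }
    }
}
```

A check like countDeadlockedThreads() could be called periodically from a watchdog thread, which avoids the problem of pointing jstack at the wrong process.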

Filed as bug #559

David

----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> <hategan at mcs.anl.gov>
> Sent: Thursday, September 15, 2011 11:46:59 AM
> Subject: Re: [Swift-devel] swift 0.93 deadlock
> David, it sounds like more analysis is needed here. If the SWAT runs
> are not showing a deadlock (but your runs are) then likely we have two
> different problems here.
> 
> Another case we saw in 0.93 with scripts failing to progress is due to
> the overAllocation parameter problem that Mihael fixed yesterday. The
> symptom there is that Swift starts a coaster with a time slot too
> small for the apps in the script, and no apps wind up running. I think
> that situation in general merits a separate ticket, and may have been
> discussed on swift-devel (but quite a while ago).
> 
> Can you determine if indeed Papia's SWAT runs are hanging for a reason
> other than a Java deadlock?
> 
> - Mike
> 
> 
> ----- Original Message -----
> > From: "David Kelly" <davidk at ci.uchicago.edu>
> > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > <hategan at mcs.anl.gov>
> > Sent: Thursday, September 15, 2011 8:03:09 AM
> > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > The jstack log corresponds to the most recent log file -
> > http://www.ci.uchicago.edu/~davidk/swat/cce_ua-20110914-1934-frd3thja.log.
> > jstack does not report any deadlocks, but I thought it might be
> > useful so I included it. Swift was not making any progress for about
> > 5 hours before I sent the logs. I am running the latest 0.93 branch.
> > I will try again today.
> >
> > David
> >
> > ----- Original Message -----
> > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > <hategan at mcs.anl.gov>
> > > Sent: Thursday, September 15, 2011 5:54:11 AM
> > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > David, which of the many Swift logs in that /swat dir does the
> > > jstack.log pertain to? How many of these runs deadlocked?
> > >
> > > And, did you verify that you (and Papia) are running on the latest
> > > rev of the 0.93 branch?
> > >
> > > - Mike
> > >
> > > ----- Original Message -----
> > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > > > Rizwan" <papia.rizwan at gmail.com>, "Michael Wilde"
> > > > <wilde at mcs.anl.gov>
> > > > Sent: Wednesday, September 14, 2011 11:04:41 PM
> > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > I was able to reproduce the problem with persistent coasters on
> > > > the MCS servers.
> > > >
> > > > The jstack output is at
> > > > http://www.ci.uchicago.edu/~davidk/swat/jstack.log
> > > >
> > > > The full collection of logs is at
> > > > http://www.ci.uchicago.edu/~davidk/swat.
> > > >
> > > > David
> > > >
> > > > ----- Original Message -----
> > > > > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > > > > Rizwan" <papia.rizwan at gmail.com>
> > > > > Sent: Wednesday, September 14, 2011 10:30:48 PM
> > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > Could you also forward the attachments please?
> > > > >
> > > > > Mihael
> > > > >
> > > > > On Wed, 2011-09-14 at 14:46 -0500, Michael Wilde wrote:
> > > > > > I think I am seeing a similar deadlock on 0.93 in the ParVis
> > > > > > script, and am trying to get a clean log and jstack to confirm.
> > > > > >
> > > > > > As far as I can tell, Papia is running the correct 0.93
> > > > > > code, but please verify.
> > > > > >
> > > > > > David will try to replicate this problem as well.
> > > > > >
> > > > > > - Mike
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "Papia Rizwan" <papia.rizwan at gmail.com>
> > > > > > > To: "swift-devel Devel" <swift-devel at ci.uchicago.edu>,
> > > > > > > "Michael
> > > > > > > Wilde" <wilde at mcs.anl.gov>, "Michael P. Shields"
> > > > > > > <mpshields at anl.gov>
> > > > > > > Sent: Wednesday, September 14, 2011 1:56:13 PM
> > > > > > > Subject: swift 0.93 deadlock
> > > > > > > Attached are the jstack output and the log file.
> > > > > > >
> > > > > > > --
> > > > > > > Papia Rizwan
> > > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Swift-devel mailing list
> > > > > Swift-devel at ci.uchicago.edu
> > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >
> > > --
> > > Michael Wilde
> > > Computation Institute, University of Chicago
> > > Mathematics and Computer Science Division
> > > Argonne National Laboratory
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory


