[Swift-devel] swift 0.93 deadlock

Michael Wilde wilde at mcs.anl.gov
Thu Sep 15 12:37:13 CDT 2011


Excellent, thanks - that's good. I also just verified that Papia is not using the overAllocation tags in the sites file, so this problem is clearly a Java deadlock and has nothing to do with the scheduling problem that the (now fixed) overAllocation issue was causing.
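
For reference, the kind of deadlock check that jstack reports can also be run from inside a JVM through the standard java.lang.management API. The sketch below is only an illustration and is not Swift or Karajan code: the class name DeadlockCheck, its methods, and the two demo threads are all hypothetical. It assumes a plain monitor deadlock (two threads taking two locks in opposite order), which may or may not match what happens in File.lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; not part of Swift or Karajan.
public class DeadlockCheck {

    /** Returns the ids of threads the JVM reports as deadlocked (empty if none). */
    public static long[] findDeadlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? new long[0] : ids;
    }

    /** Polls findDeadlocked() up to 'attempts' times, sleeping 50 ms between tries. */
    public static long[] awaitDeadlock(int attempts) {
        long[] ids = findDeadlocked();
        for (int i = 0; i < attempts && ids.length == 0; i++) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            ids = findDeadlocked();
        }
        return ids;
    }

    /** Starts two daemon threads that acquire two locks in opposite order. */
    public static void startDeadlockedPair() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startWorker("demo-1", lockA, lockB, bothHoldFirstLock);
        startWorker("demo-2", lockB, lockA, bothHoldFirstLock);
    }

    private static void startWorker(String name, Object first, Object second,
                                    CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds 'second' and wants 'first'
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) {
        startDeadlockedPair();
        long[] ids = awaitDeadlock(100);
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread ended between the two calls
            }
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```

A check like this only sees threads in the JVM it runs in, which is the same reason jstack has to be pointed at the right java process.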

My understanding is that this SWAT script is failing under trunk because of the recent token case handling issue (I think the camel-case one). Can you work with Papia to see if either that issue is now fixed, or if her script can be changed to avoid that, so that you can both test the SWAT script with trunk, to see if the deadlock still occurs?

Thanks,

- Mike


----- Original Message -----
> From: "David Kelly" <davidk at ci.uchicago.edu>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> <hategan at mcs.anl.gov>
> Sent: Thursday, September 15, 2011 12:29:03 PM
> Subject: Re: [Swift-devel] swift 0.93 deadlock
> I narrowed down the problem a bit. Last night I ran jstack on the
> wrong java process which is why it didn't report a deadlock.
> 
> Papia and I are seeing the same issue.
> 
> My jstack: http://www.ci.uchicago.edu/~davidk/swat2/jstack.log
> Papia's jstack:
> http://www.ci.uchicago.edu/~davidk/swat2/papia-jstack.log
> 
> It happens in the same place:
> 
> org.griphyn.vdl.karajan.lib.cache.File.lock(File.java:100)
> org.griphyn.vdl.karajan.lib.cache.LRUFileCache.addAndLockEntry(LRUFileCache.java:24)
> 
> Filed as bug #559
> 
> David
> 
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "David Kelly" <davidk at ci.uchicago.edu>
> > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > <hategan at mcs.anl.gov>
> > Sent: Thursday, September 15, 2011 11:46:59 AM
> > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > David, it sounds like more analysis is needed here. If the SWAT runs
> > are not showing a deadlock (but your runs are) then likely we have
> > two different problems here.
> >
> > Another case we saw in 0.93 with scripts failing to progress is due
> > to the overAllocation parameter problem that Mihael fixed yesterday.
> > The symptom there is that Swift starts a coaster with a time slot too
> > small for the apps in the script, and no apps wind up running. I
> > think that situation in general merits a separate ticket, and may
> > have been discussed on swift-devel (but quite a while ago).
> >
> > Can you determine if indeed Papia's SWAT runs are hanging for a
> > reason other than a Java deadlock?
> >
> > - Mike
> >
> >
> > ----- Original Message -----
> > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > <hategan at mcs.anl.gov>
> > > Sent: Thursday, September 15, 2011 8:03:09 AM
> > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > The jstack log corresponds to the most recent log file -
> > > http://www.ci.uchicago.edu/~davidk/swat/cce_ua-20110914-1934-frd3thja.log.
> > > jstack does not report any deadlocks, but I thought it might be
> > > useful so I included it. Swift was not making any progress for
> > > about 5 hours before I sent the logs. I am running the latest 0.93
> > > branch. I will try again today.
> > >
> > > David
> > >
> > > ----- Original Message -----
> > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > > <hategan at mcs.anl.gov>
> > > > Sent: Thursday, September 15, 2011 5:54:11 AM
> > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > David, which of the many Swift logs in that /swat dir does the
> > > > jstack.log pertain to? How many of these runs deadlocked?
> > > >
> > > > And, did you verify that you (and Papia) are running on the
> > > > latest rev of the 0.93 branch?
> > > >
> > > > - Mike
> > > >
> > > > ----- Original Message -----
> > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > > > > Rizwan" <papia.rizwan at gmail.com>, "Michael Wilde"
> > > > > <wilde at mcs.anl.gov>
> > > > > Sent: Wednesday, September 14, 2011 11:04:41 PM
> > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > I was able to reproduce the problem with persistent coasters
> > > > > on the MCS servers.
> > > > >
> > > > > The jstack output is at
> > > > > http://www.ci.uchicago.edu/~davidk/swat/jstack.log
> > > > >
> > > > > The full collection of logs is at
> > > > > http://www.ci.uchicago.edu/~davidk/swat.
> > > > >
> > > > > David
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>,
> > > > > > "Papia
> > > > > > Rizwan" <papia.rizwan at gmail.com>
> > > > > > Sent: Wednesday, September 14, 2011 10:30:48 PM
> > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > Could you also forward the attachments please?
> > > > > >
> > > > > > Mihael
> > > > > >
> > > > > > On Wed, 2011-09-14 at 14:46 -0500, Michael Wilde wrote:
> > > > > > > I think I am seeing a similar deadlock on 0.93 in the
> > > > > > > ParVis script, and am trying to get a clean log and jstack
> > > > > > > to confirm.
> > > > > > >
> > > > > > > As far as I can tell, Papia is running the correct 0.93
> > > > > > > code, but please verify.
> > > > > > >
> > > > > > > David will try to replicate this problem as well.
> > > > > > >
> > > > > > > - Mike
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > > From: "Papia Rizwan" <papia.rizwan at gmail.com>
> > > > > > > > To: "swift-devel Devel" <swift-devel at ci.uchicago.edu>,
> > > > > > > > "Michael
> > > > > > > > Wilde" <wilde at mcs.anl.gov>, "Michael P. Shields"
> > > > > > > > <mpshields at anl.gov>
> > > > > > > > Sent: Wednesday, September 14, 2011 1:56:13 PM
> > > > > > > > Subject: swift 0.93 deadlock
> > > > > > > > Attached are the jstack output and the log file.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Papia Rizwan
> > > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Swift-devel mailing list
> > > > > > Swift-devel at ci.uchicago.edu
> > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > >
> > > > --
> > > > Michael Wilde
> > > > Computation Institute, University of Chicago
> > > > Mathematics and Computer Science Division
> > > > Argonne National Laboratory
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory