[Swift-devel] swift 0.93 deadlock

Mihael Hategan hategan at mcs.anl.gov
Tue Sep 20 21:07:52 CDT 2011


Fixed in r5141.

On Mon, 2011-09-19 at 22:20 -0500, David Kelly wrote:
> I tried today with the 0.93 update. It ran for approximately 7 hours before freezing. It looks to be happening in a different place this time.
> 
> http://www.ci.uchicago.edu/~davidk/swat4/jstack.log
> http://www.ci.uchicago.edu/~davidk/swat4/cce_ua-20110919-1955-h7t8iui2.log
> 
> David
> 
> 
> ----- Original Message -----
> > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > Cc: "David Kelly" <davidk at ci.uchicago.edu>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia Rizwan"
> > <papia.rizwan at gmail.com>
> > Sent: Saturday, September 17, 2011 11:36:25 PM
> > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > I have a tentative fix in the branch and trunk. Revisions 5123 and 5124,
> > respectively. Please let me know how that works out.
> > 
> > Mihael
> > 
> > On Fri, 2011-09-16 at 11:50 -0500, Michael Wilde wrote:
> > > David and Papia, can you report to the list what the status is of
> > > running the SWAT app?
> > >
> > > - I understand that Mihael will work on the 0.93 deadlock fix this
> > > weekend, which is great.
> > >
> > > - I understand that it's happening on trunk as well.
> > >
> > > - Papia, can you try to "perturb" the Swift code in the hope that
> > > some equivalent but different code doesn't trip the same bug? I.e.,
> > > try a different mapper or a different variable strategy (arrays vs.
> > > scalars, structs vs. separate vars) just to see if you can work
> > > around this? Or, put in some shell logic to catch the hang and kill
> > > and re-run (or resume) Swift? If you just kill a hung script and
> > > then resume it, will it work? We could maybe alter the hang checker
> > > to kill Swift on its own, with a return code or message that you
> > > could use to trigger a resume.
> > >
> > > Mike
> > >
> > >
> > > ----- Original Message -----
> > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > > > Rizwan" <papia.rizwan at gmail.com>
> > > > Sent: Thursday, September 15, 2011 4:34:02 PM
> > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > I was able to get it running on PADS with trunk. I ran into the
> > > > > same issue.
> > > >
> > > > http://www.ci.uchicago.edu/~davidk/swat3/jstack.log
> > > > http://www.ci.uchicago.edu/~davidk/swat3/cce_ua-20110915-1617-sd4svyo2.log
> > > >
> > > > David
> > > >
> > > > ----- Original Message -----
> > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > > > > Rizwan" <papia.rizwan at gmail.com>
> > > > > Sent: Thursday, September 15, 2011 2:39:47 PM
> > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > The sites.xml in /homes/papia/SwiftSCE2 seems to be using passive
> > > > > > persistent coasters. Is there a way to use automatic coasters on the
> > > > > > MCS workstations? I'll try copying this over to PADS and running there
> > > > > > to see if I can reproduce it.
> > > > >
> > > > > David
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Papia
> > > > > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > > > > <hategan at mcs.anl.gov>
> > > > > > Sent: Thursday, September 15, 2011 2:18:17 PM
> > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > > Can you make SWAT run under trunk? Papia is testing using standard
> > > > > > > auto coasters and doesn't need any of the missing coaster-service
> > > > > > > options.
> > > > > >
> > > > > > - Mike
> > > > > >
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>,
> > > > > > > "Papia
> > > > > > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > > > > > <hategan at mcs.anl.gov>
> > > > > > > Sent: Thursday, September 15, 2011 2:15:36 PM
> > > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > > > I got past the compilation errors by renaming all the functions with
> > > > > > > > capitalization, but ran into an issue with coaster-service. Last week
> > > > > > > > I noticed coaster-service was missing options for dynamic ports. I
> > > > > > > > found today that it is also missing -passive. I'll try to track down
> > > > > > > > where this changed and restore the previous version.
> > > > > > >
> > > > > > > David
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>,
> > > > > > > > "Papia
> > > > > > > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > > > > > > <hategan at mcs.anl.gov>
> > > > > > > > Sent: Thursday, September 15, 2011 12:37:13 PM
> > > > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > > > > Excellent, thanks - that's good. I also just verified that Papia is not
> > > > > > > > > using the overAllocation tags in the sites file, so this problem is
> > > > > > > > > clearly a Java deadlock and has nothing to do with the scheduling
> > > > > > > > > problem that the (now fixed) overAllocation problem was causing.
> > > > > > > > >
> > > > > > > > > My understanding is that this SWAT script is failing under trunk
> > > > > > > > > because of the recent token case handling issue (I think the
> > > > > > > > > camel-case one). Can you work with Papia to see if either that issue
> > > > > > > > > is now fixed, or if her script can be changed to avoid that, so that
> > > > > > > > > you can both test the SWAT script with trunk, to see if the deadlock
> > > > > > > > > still occurs?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > > - Mike
> > > > > > > >
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>,
> > > > > > > > > "Papia
> > > > > > > > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > > > > > > > <hategan at mcs.anl.gov>
> > > > > > > > > Sent: Thursday, September 15, 2011 12:29:03 PM
> > > > > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > > > > > I narrowed down the problem a bit. Last night I ran jstack on the
> > > > > > > > > > wrong Java process, which is why it didn't report a deadlock.
> > > > > > > > >
> > > > > > > > > Papia and I are seeing the same issue.
> > > > > > > > >
> > > > > > > > > My jstack:
> > > > > > > > > http://www.ci.uchicago.edu/~davidk/swat2/jstack.log
> > > > > > > > > Papia's jstack:
> > > > > > > > > http://www.ci.uchicago.edu/~davidk/swat2/papia-jstack.log
> > > > > > > > >
> > > > > > > > > It happens in the same place:
> > > > > > > > >
> > > > > > > > > org.griphyn.vdl.karajan.lib.cache.File.lock(File.java:100)
> > > > > > > > > org.griphyn.vdl.karajan.lib.cache.LRUFileCache.addAndLockEntry(LRUFileCache.java:24)
> > > > > > > > >
> > > > > > > > > Filed as bug #559
> > > > > > > > >
> > > > > > > > > David
> > > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>,
> > > > > > > > > > "Papia
> > > > > > > > > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > > > > > > > > <hategan at mcs.anl.gov>
> > > > > > > > > > Sent: Thursday, September 15, 2011 11:46:59 AM
> > > > > > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > > > > > > David, it sounds like more analysis is needed here. If the SWAT runs
> > > > > > > > > > > are not showing a deadlock (but your runs are) then likely we have two
> > > > > > > > > > > different problems here.
> > > > > > > > > > >
> > > > > > > > > > > Another case we saw in 0.93 with scripts failing to progress is due to
> > > > > > > > > > > the overAllocation parameter problem that Mihael fixed yesterday. The
> > > > > > > > > > > symptom there is that Swift starts a coaster with a time slot too
> > > > > > > > > > > small for the apps in the script, and no apps wind up running. I think
> > > > > > > > > > > that situation in general merits a separate ticket, and may have been
> > > > > > > > > > > discussed on swift-devel (but quite a while ago).
> > > > > > > > > > >
> > > > > > > > > > > Can you determine if indeed Papia's SWAT runs are hanging for a reason
> > > > > > > > > > > other than a Java deadlock?
> > > > > > > > > >
> > > > > > > > > > - Mike
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > > Cc: "swift-devel Devel"
> > > > > > > > > > > <swift-devel at ci.uchicago.edu>,
> > > > > > > > > > > "Papia
> > > > > > > > > > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > > > > > > > > > <hategan at mcs.anl.gov>
> > > > > > > > > > > Sent: Thursday, September 15, 2011 8:03:09 AM
> > > > > > > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > > > > > > > The jstack log corresponds to the most recent log file -
> > > > > > > > > > > > http://www.ci.uchicago.edu/~davidk/swat/cce_ua-20110914-1934-frd3thja.log.
> > > > > > > > > > > > jstack does not report any deadlocks, but I thought it might be useful
> > > > > > > > > > > > so I included it. Swift was not making any progress for about 5 hours
> > > > > > > > > > > > before I sent the logs. I am running the latest 0.93 branch. I will
> > > > > > > > > > > > try again today.
> > > > > > > > > > >
> > > > > > > > > > > David
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > > > Cc: "swift-devel Devel"
> > > > > > > > > > > > <swift-devel at ci.uchicago.edu>,
> > > > > > > > > > > > "Papia
> > > > > > > > > > > > Rizwan" <papia.rizwan at gmail.com>, "Mihael Hategan"
> > > > > > > > > > > > <hategan at mcs.anl.gov>
> > > > > > > > > > > > Sent: Thursday, September 15, 2011 5:54:11 AM
> > > > > > > > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > > > > > > > > David, which of the many Swift logs in that /swat dir does the
> > > > > > > > > > > > > jstack.log pertain to? How many of these runs deadlocked?
> > > > > > > > > > > > >
> > > > > > > > > > > > > And, did you verify that you (and Papia) are running on the latest rev
> > > > > > > > > > > > > of the 0.93 branch?
> > > > > > > > > > > >
> > > > > > > > > > > > - Mike
> > > > > > > > > > > >
> > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > > > > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > > > > > > > > > > Cc: "swift-devel Devel"
> > > > > > > > > > > > > <swift-devel at ci.uchicago.edu>,
> > > > > > > > > > > > > "Papia
> > > > > > > > > > > > > Rizwan" <papia.rizwan at gmail.com>, "Michael
> > > > > > > > > > > > > Wilde"
> > > > > > > > > > > > > <wilde at mcs.anl.gov>
> > > > > > > > > > > > > Sent: Wednesday, September 14, 2011 11:04:41 PM
> > > > > > > > > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > > > > > > > > > I was able to reproduce the problem with persistent coasters on the
> > > > > > > > > > > > > > MCS servers.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The jstack output is at
> > > > > > > > > > > > > http://www.ci.uchicago.edu/~davidk/swat/jstack.log
> > > > > > > > > > > > >
> > > > > > > > > > > > > > The full collection of logs is at
> > > > > > > > > > > > > http://www.ci.uchicago.edu/~davidk/swat.
> > > > > > > > > > > > >
> > > > > > > > > > > > > David
> > > > > > > > > > > > >
> > > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > > > > > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > > > > > Cc: "swift-devel Devel"
> > > > > > > > > > > > > > <swift-devel at ci.uchicago.edu>,
> > > > > > > > > > > > > > "Papia
> > > > > > > > > > > > > > Rizwan" <papia.rizwan at gmail.com>
> > > > > > > > > > > > > > Sent: Wednesday, September 14, 2011 10:30:48
> > > > > > > > > > > > > > PM
> > > > > > > > > > > > > > Subject: Re: [Swift-devel] swift 0.93 deadlock
> > > > > > > > > > > > > > Could you also forward the attachments please?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Mihael
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, 2011-09-14 at 14:46 -0500, Michael
> > > > > > > > > > > > > > Wilde
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > I think I am seeing a similar deadlock on 0.93 in the ParVis script,
> > > > > > > > > > > > > > > > and am trying to get a clean log and jstack to confirm.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > As far as I can tell, Papia is running the correct 0.93 code, but
> > > > > > > > > > > > > > > > please verify.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > David will try to replicate this problem as well.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Mike
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > > > > From: "Papia Rizwan"
> > > > > > > > > > > > > > > > <papia.rizwan at gmail.com>
> > > > > > > > > > > > > > > > To: "swift-devel Devel"
> > > > > > > > > > > > > > > > <swift-devel at ci.uchicago.edu>,
> > > > > > > > > > > > > > > > "Michael
> > > > > > > > > > > > > > > > Wilde" <wilde at mcs.anl.gov>, "Michael P.
> > > > > > > > > > > > > > > > Shields"
> > > > > > > > > > > > > > > > <mpshields at anl.gov>
> > > > > > > > > > > > > > > > Sent: Wednesday, September 14, 2011
> > > > > > > > > > > > > > > > 1:56:13 PM
> > > > > > > > > > > > > > > > Subject: swift 0.93 deadlock
> > > > > > > > > > > > > > > > > Attached are the jstack output and the log file.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > Papia Rizwan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > > Swift-devel mailing list
> > > > > > > > > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Michael Wilde
> > > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > > Argonne National Laboratory
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Michael Wilde
> > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > Argonne National Laboratory
> > > > > > > >
> > > > > > > > --
> > > > > > > > Michael Wilde
> > > > > > > > Computation Institute, University of Chicago
> > > > > > > > Mathematics and Computer Science Division
> > > > > > > > Argonne National Laboratory
> > > > > >
> > > > > > --
> > > > > > Michael Wilde
> > > > > > Computation Institute, University of Chicago
> > > > > > Mathematics and Computer Science Division
> > > > > > Argonne National Laboratory
> > >
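
The workaround Mike suggests in the thread (catch the hang, kill Swift, then re-run or resume) could be sketched roughly as below. This is only an illustration, written in Python rather than shell, and it is not part of Swift or of the fix in the repository. It assumes that "no progress" can be approximated by the run log no longer changing; the names run_with_watchdog, HUNG, and EXITED are hypothetical, and the command to launch or resume Swift is left to the caller.

```python
import os
import subprocess
import sys
import time

# Hypothetical status labels for this sketch; they are not Swift return codes.
HUNG = "hung"
EXITED = "exited"


def run_with_watchdog(cmd, log_path, stall_seconds, poll_seconds=1.0):
    """Run cmd and kill it if log_path stops changing for stall_seconds.

    Returns (EXITED, returncode) when the process ends on its own, or
    (HUNG, None) when the watchdog had to kill it.
    Assumption: a log file that no longer changes means the run is stuck.
    """
    proc = subprocess.Popen(cmd)
    last_progress = time.monotonic()
    last_state = None
    while True:
        code = proc.poll()
        if code is not None:
            return EXITED, code
        try:
            st = os.stat(log_path)
            state = (st.st_mtime_ns, st.st_size)
        except FileNotFoundError:
            state = None  # log not created yet counts as "no change"
        if state != last_state:
            # The log changed since the last poll: record progress.
            last_state = state
            last_progress = time.monotonic()
        elif time.monotonic() - last_progress > stall_seconds:
            # No change for too long: treat the run as hung and kill it.
            proc.kill()
            proc.wait()
            return HUNG, None
        time.sleep(poll_seconds)
```

A caller would loop on the result, starting the run again (or resuming it) whenever the status is HUNG, and stopping on EXITED. The stall threshold has to be longer than the longest quiet period of a healthy run, otherwise the watchdog kills runs that are merely slow.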




