[Swift-devel] Deadlocks running ParVis script under 0.93

Michael Wilde wilde at mcs.anl.gov
Mon Sep 12 22:14:02 CDT 2011


Yes, I'm very sorry! I was careful to build a fresh 0.93, but obviously ran my svn ups on the wrong tree. I'm re-testing now.

- Mike

----- Original Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Monday, September 12, 2011 7:55:32 PM
> Subject: Re: [Swift-devel] Deadlocks running ParVis script under 0.93
> On Mon, 2011-09-12 at 17:53 -0700, Mihael Hategan wrote:
> > This is a rather old 0.93. One of the deadlocks was fixed a while
> > ago.
> 
> And so was the other one.
> 
> > I'm investigating the other one.
> >
> > On Mon, 2011-09-12 at 18:04 -0500, Michael Wilde wrote:
> > > Mihael, I'm getting Java-level deadlocks running a ParVis
> > > script. The user is seeing this as well (in at least one
> > > instance).
> > >
> > > I'm running on Fusion in the directory /home/wilde/amwg/run01.
> > >
> > > The script is complex; it should run more than 325 app calls (I'm
> > > not yet sure how many; I'm guessing at least 100 more).
> > >
> > > The two runs (swift work dirs) that show the deadlocks are:
> > >
> > > fusion$ ls -lt */jstack.out
> > > -rw-r--r-- 1 wilde mcsz  80249 Sep 12 17:39 amwg_stats-20110912-1546-5aqzkvhe/jstack.out
> > > -rw-r--r-- 1 wilde mcsz 135539 Sep 11 10:38 amwg_stats-20110911-1033-fd1brig2/jstack.out
> > > fusion$
> > >
> > > The log files are in the run01 directory. The Swift stdout
> > > progress logs are at the top level of the respective Swift work dirs.
> > >
> > > One of my runs, amwg_stats-20110912-1546-5aqzkvhe (as well as one
> > > of the user's runs), hung after 323 app calls with this Java
> > > deadlock:
> > >
> > > Found one Java-level deadlock:
> > > =============================
> > > "pool-1-thread-32":
> > >   waiting to lock monitor 0x000000005ccf97b0 (object 0x00002aaab56f0e30, a org.griphyn.vdl.mapping.RootDataNode),
> > >   which is held by "pool-1-thread-11"
> > > "pool-1-thread-11":
> > >   waiting to lock monitor 0x000000005ce1cad8 (object 0x00002aaab5a50768, a org.griphyn.vdl.karajan.DSHandleFutureWrapper),
> > >   which is held by "pool-1-thread-32"
> > >
> > > Java stack information for the threads listed above:
> > > ===================================================
> > > "pool-1-thread-32":
> > >         at org.griphyn.vdl.karajan.lib.SwiftArg.unwrap(SwiftArg.java:52)
> > >         - waiting to lock <0x00002aaab56f0e30> (a org.griphyn.vdl.mapping.RootDataNode)
> > >         at org.griphyn.vdl.karajan.lib.SwiftArg$Vargs.asArray(SwiftArg.java:177)
> > >         at org.griphyn.vdl.karajan.lib.swiftscript.Misc.swiftscript_strcat(Misc.java:82)
> > >         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >         at java.lang.reflect.Method.invoke(Method.java:597)
> > >         at org.globus.cog.karajan.workflow.nodes.functions.FunctionsCollection.function(FunctionsCollection.java:82)
> > >         at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:27)
> > > ...
> > > "pool-1-thread-11":
> > >         at org.griphyn.vdl.karajan.DSHandleFutureWrapper.notifyListeners(DSHandleFutureWrapper.java:68)
> > >         - waiting to lock <0x00002aaab5a50768> (a org.griphyn.vdl.karajan.DSHandleFutureWrapper)
> > >         at org.griphyn.vdl.karajan.DSHandleFutureWrapper.handleClosed(DSHandleFutureWrapper.java:122)
> > >         at org.griphyn.vdl.mapping.AbstractDataNode.notifyListeners(AbstractDataNode.java:605)
> > >         at org.griphyn.vdl.mapping.AbstractDataNode.closeShallow(AbstractDataNode.java:408)
> > >         - locked <0x00002aaab56f0e30> (a org.griphyn.vdl.mapping.RootDataNode)
> > >         at org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:358)
> > >         at org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:227)
> > >         at org.griphyn.vdl.karajan.lib.SetFieldValue.deepCopy(SetFieldValue.java:90)
> > >         at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:49)
> > >         - locked <0x00002aaab56f0e30> (a org.griphyn.vdl.mapping.RootDataNode)
> > >         at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:67)
> > >         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
> > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> > >
> > >
> > > ---
> > > The other script hung after only about 20 app calls with two
> > > deadlocks, one of which is:
> > >
> > > Found one Java-level deadlock:
> > > =============================
> > > "pool-1-thread-32":
> > >   waiting to lock monitor 0x000000005b7a1da8 (object 0x00002aaab46d7490, a org.griphyn.vdl.mapping.RootDataNode),
> > >   which is held by "pool-1-thread-26"
> > > "pool-1-thread-26":
> > >   waiting to lock monitor 0x000000005b5e6620 (object 0x00002aaab46d63d0, a org.griphyn.vdl.karajan.WrapperMap),
> > >   which is held by "pool-1-thread-10"
> > > "pool-1-thread-10":
> > >   waiting to lock monitor 0x000000005b4e47e0 (object 0x00002aaac2015f90, a org.griphyn.vdl.mapping.RootArrayDataNode),
> > >   which is held by "pool-1-thread-15"
> > > "pool-1-thread-15":
> > >   waiting to lock monitor 0x000000005b5e6620 (object 0x00002aaab46d63d0, a org.griphyn.vdl.karajan.WrapperMap),
> > >   which is held by "pool-1-thread-10"
> > >
> > > ---
> > >
> > >
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
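
The first deadlock quoted above is a lock-order inversion: pool-1-thread-11 holds the RootDataNode monitor (taken in SetFieldValue.function and again in AbstractDataNode.closeShallow) and waits for the DSHandleFutureWrapper monitor, while pool-1-thread-32 holds that DSHandleFutureWrapper monitor and waits for the RootDataNode. The minimal Java sketch below reproduces only that pattern; it is not the Swift/Karajan code. The class LockOrderDeadlock, its helpers start and awaitDeadlock, and the two plain Object monitors standing in for RootDataNode and DSHandleFutureWrapper are all hypothetical. It uses the JDK's ThreadMXBean.findMonitorDeadlockedThreads(), which reports the same kind of monitor cycle that jstack prints as "Found one Java-level deadlock".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlock {
    // Hypothetical stand-ins for the RootDataNode and DSHandleFutureWrapper monitors.
    static final Object dataNode = new Object();
    static final Object futureWrapper = new Object();

    /** Starts a daemon thread that locks `first`, waits until its peer holds a lock, then tries `second`. */
    static Thread start(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // returns once the peer also holds its first monitor
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) { // with opposite orders, each thread holds what the other needs
                    System.out.println(name + " acquired both monitors");
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit even if these threads never finish
        t.start();
        return t;
    }

    /** Polls until every given thread is reported as monitor-deadlocked, or the timeout expires. */
    static boolean awaitDeadlock(ThreadMXBean mx, long timeoutMillis, Thread... threads) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
            if (ids != null && Arrays.stream(threads).allMatch(
                    t -> Arrays.stream(ids).anyMatch(id -> id == t.getId()))) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(2);
        // Like pool-1-thread-11: RootDataNode first, then DSHandleFutureWrapper.
        Thread setter = start("setter", dataNode, futureWrapper, latch);
        // Like pool-1-thread-32: DSHandleFutureWrapper first, then RootDataNode.
        Thread reader = start("reader", futureWrapper, dataNode, latch);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!awaitDeadlock(mx, 5000, setter, reader)) {
            System.out.println("no deadlock detected");
            return;
        }
        for (ThreadInfo info : mx.getThreadInfo(new long[] {setter.getId(), reader.getId()})) {
            System.out.println(info.getThreadName() + " waiting to lock " + info.getLockName()
                    + ", held by " + info.getLockOwnerName());
        }
    }
}
```

In general, this kind of cycle goes away when every code path acquires the two monitors in the same order, or when listener callbacks such as handleClosed() run only after the data node's lock has been released.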
