[Swift-devel] New 0.93 deadlock - mapping.RootArrayDataNode?
Michael Wilde
wilde at mcs.anl.gov
Tue Sep 20 17:34:50 CDT 2011
0.93 just deadlocked for me on Fusion, starting the ParVis script.
The log and jstack output are on the CI net in ~wilde/, as copies of these files:
-rw-r--r-- 1 wilde mcsz 69992 Sep 20 17:24 amwg_stats-20110920-1718-bj4kcf16.jstack
-rw-r--r-- 1 wilde mcsz 575149 Sep 20 17:18 amwg_stats-20110920-1718-bj4kcf16.log
The script didn't get very far; it never started a job:
RunID: 20110920-1718-bj4kcf16
Progress: time: Tue, 20 Sep 2011 17:18:38 -0500
SwiftScript trace: test_inst: -1
SwiftScript trace: test_djf: NEXT
SwiftScript trace: test_case: HRC06
SwiftScript trace: test_nyrs: 10
SwiftScript trace: workdir: /home/wilde/amwg/run01/output/diag/HRC06/
SwiftScript trace: diag_code: /home/wilde/amwg/run01/code/
SwiftScript trace: paleo: False
SwiftScript trace: test_path: /fusion/group/climate/Parvis/atmos/HRC06/
SwiftScript trace: test_begin: 110
SwiftScript trace: cntl_djf: NEXT
SwiftScript trace: cntl_out: /home/wilde/amwg/run01/output/diag/HRC06//dummy
SwiftScript trace: cntl_out_climo: /home/wilde/amwg/run01/output/climo/HRC06//dummy
SwiftScript trace: cntl_case: dummy
SwiftScript trace: cntl_path: /fusion/group/climate/Parvis/atmos/HRC06/
SwiftScript trace: plots: DJF
SwiftScript trace: plots: ANN
SwiftScript trace: plots: JJA
Progress: time: Tue, 20 Sep 2011 17:19:08 -0500 Initializing:150 Selecting site:12 Stage in:8
Progress: time: Tue, 20 Sep 2011 17:19:38 -0500 Initializing:150 Selecting site:12 Stage in:8
Progress: time: Tue, 20 Sep 2011 17:20:08 -0500 Initializing:150 Selecting site:12 Stage in:8
jstack says:
Found one Java-level deadlock:
=============================
"pool-1-thread-16":
waiting to lock monitor 0x00000000408c3608 (object 0x00002aaab838d778, a org.griphyn.vdl.mapping.RootArrayDataNode),
which is held by "pool-1-thread-7"
"pool-1-thread-7":
waiting to lock monitor 0x00002aaac88a06d0 (object 0x00002aaab7d93af8, a org.griphyn.vdl.mapping.RootArrayDataNode),
which is held by "pool-1-thread-15"
"pool-1-thread-15":
waiting to lock monitor 0x00000000408c3608 (object 0x00002aaab838d778, a org.griphyn.vdl.mapping.RootArrayDataNode),
which is held by "pool-1-thread-7"
Java stack information for the threads listed above:
===================================================
"pool-1-thread-16":
at org.griphyn.vdl.mapping.RootArrayDataNode.getMapper(RootArrayDataNode.java:99)
- waiting to lock <0x00002aaab838d778> (a org.griphyn.vdl.mapping.RootArrayDataNode)
at org.griphyn.vdl.mapping.AbstractDataNode.getMapper(AbstractDataNode.java:571)
at org.griphyn.vdl.karajan.lib.VDLFunction.leafFileName(VDLFunction.java:270)
at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:187)
at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:175)
at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:171)
at org.griphyn.vdl.karajan.lib.swiftscript.FileName.function(FileName.java:17)
at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:62)
at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
"pool-1-thread-7":
at org.griphyn.vdl.mapping.ArrayDataNode.getFutureWrapper(ArrayDataNode.java:88)
- waiting to lock <0x00002aaab7d93af8> (a org.griphyn.vdl.mapping.RootArrayDataNode)
at org.griphyn.vdl.mapping.RootArrayDataNode.getMapper(RootArrayDataNode.java:103)
- locked <0x00002aaab838d778> (a org.griphyn.vdl.mapping.RootArrayDataNode)
at org.griphyn.vdl.mapping.AbstractDataNode.getMapper(AbstractDataNode.java:571)
at org.griphyn.vdl.karajan.lib.VDLFunction.leafFileName(VDLFunction.java:270)
at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:187)
at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:175)
at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:171)
at org.griphyn.vdl.karajan.lib.swiftscript.FileName.function(FileName.java:17)
at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:62)
at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
"pool-1-thread-15":
at org.griphyn.vdl.mapping.RootArrayDataNode.innerInit(RootArrayDataNode.java:40)
- waiting to lock <0x00002aaab838d778> (a org.griphyn.vdl.mapping.RootArrayDataNode)
at org.griphyn.vdl.mapping.RootArrayDataNode.futureModified(RootArrayDataNode.java:80)
at org.griphyn.vdl.karajan.ArrayIndexFutureList.notifyListeners(ArrayIndexFutureList.java:120)
at org.griphyn.vdl.karajan.ArrayIndexFutureList.addKey(ArrayIndexFutureList.java:57)
at org.griphyn.vdl.mapping.ArrayDataNode.addKey(ArrayDataNode.java:73)
- locked <0x00002aaab7d93af8> (a org.griphyn.vdl.mapping.RootArrayDataNode)
at org.griphyn.vdl.mapping.ArrayDataNode.createDSHandle(ArrayDataNode.java:82)
at org.griphyn.vdl.mapping.AbstractDataNode.getField(AbstractDataNode.java:270)
- locked <0x00002aaab7d93af8> (a org.griphyn.vdl.mapping.RootArrayDataNode)
at org.griphyn.vdl.mapping.AbstractDataNode.getField(AbstractDataNode.java:195)
at org.griphyn.vdl.karajan.lib.SetFieldValue.deepCopy(SetFieldValue.java:127)
at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:46)
at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:62)
[remainder of the "pool-1-thread-15" stack cut off by the pager at 95%]
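
Reading the summary above: "pool-1-thread-7" holds 0x00002aaab838d778 and wants 0x00002aaab7d93af8, while "pool-1-thread-15" holds 0x00002aaab7d93af8 and wants 0x00002aaab838d778. Those two form the cycle; "pool-1-thread-16" is only queued behind thread-7. To make the pattern concrete, here is a minimal Java sketch of the same shape: two objects whose synchronized methods each call into the other. The Node class, its methods, and the thread names are hypothetical stand-ins, not the Swift code, and the latch is there only to force the interleaving every time. It uses the standard ThreadMXBean.findMonitorDeadlockedThreads() call, which I believe reports the same kind of monitor cycle jstack prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDemo {

    /** Hypothetical stand-in for a data node; NOT org.griphyn.vdl.mapping.RootArrayDataNode. */
    static class Node {
        private Node peer;

        void setPeer(Node peer) {
            this.peer = peer;
        }

        /** Takes this node's monitor, then needs the peer's monitor. */
        synchronized void touchPeer(CountDownLatch bothHeld) throws InterruptedException {
            bothHeld.countDown();
            bothHeld.await();   // wait until the other thread holds its own node
            peer.touch();       // blocks: the peer's monitor is already held
        }

        synchronized void touch() {
            // nothing to do; entering the method is what requires the monitor
        }
    }

    /** Starts a daemon thread so the stuck threads cannot keep the JVM alive. */
    private static Thread startDaemon(String name, Node first, CountDownLatch bothHeld) {
        Thread t = new Thread(() -> {
            try {
                first.touchPeer(bothHeld);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Builds the two-node cycle and returns how many of our threads the JVM reports as deadlocked. */
    static int countDeadlockedThreads() {
        Node a = new Node();
        Node b = new Node();
        a.setPeer(b);
        b.setPeer(a);

        CountDownLatch bothHeld = new CountDownLatch(2);
        Thread t1 = startDaemon("demo-holds-a", a, bothHeld);
        Thread t2 = startDaemon("demo-holds-b", b, bothHeld);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            // Poll for up to about five seconds; the call returns null when no cycle exists.
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null) {
                    int ours = 0;
                    for (long id : ids) {
                        if (id == t1.getId() || id == t2.getId()) {
                            ours++;
                        }
                    }
                    if (ours > 0) {
                        return ours;
                    }
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked demo threads: " + countDeadlockedThreads());
    }
}
```

In the sketch, as in the trace, each thread takes one node's monitor and then asks for the other's, so neither can proceed. The usual remedies are to acquire the two monitors in a fixed global order or to avoid calling into another node while holding a monitor; whether either fits RootArrayDataNode.getMapper and ArrayDataNode.addKey is something I have not checked.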
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory