[Swift-devel] New 0.93 deadlock - mapping.RootArrayDataNode?

Michael Wilde wilde at mcs.anl.gov
Tue Sep 20 17:34:50 CDT 2011


0.93 just deadlocked for me on Fusion, starting the ParVis script.

The log and jstack output are on the CI net in ~wilde/ (copies of these files):
-rw-r--r-- 1 wilde mcsz  69992 Sep 20 17:24 amwg_stats-20110920-1718-bj4kcf16.jstack
-rw-r--r-- 1 wilde mcsz 575149 Sep 20 17:18 amwg_stats-20110920-1718-bj4kcf16.log

The script didn't get very far; it never started a job:

RunID: 20110920-1718-bj4kcf16
Progress:  time: Tue, 20 Sep 2011 17:18:38 -0500
SwiftScript trace: test_inst: -1
SwiftScript trace: test_djf: NEXT
SwiftScript trace: test_case: HRC06
SwiftScript trace: test_nyrs: 10
SwiftScript trace: workdir: /home/wilde/amwg/run01/output/diag/HRC06/
SwiftScript trace: diag_code: /home/wilde/amwg/run01/code/
SwiftScript trace: paleo: False
SwiftScript trace: test_path: /fusion/group/climate/Parvis/atmos/HRC06/
SwiftScript trace: test_begin: 110
SwiftScript trace: cntl_djf: NEXT
SwiftScript trace: cntl_out: /home/wilde/amwg/run01/output/diag/HRC06//dummy
SwiftScript trace: cntl_out_climo: /home/wilde/amwg/run01/output/climo/HRC06//dummy
SwiftScript trace: cntl_case: dummy
SwiftScript trace: cntl_path: /fusion/group/climate/Parvis/atmos/HRC06/
SwiftScript trace: plots: DJF
SwiftScript trace: plots: ANN
SwiftScript trace: plots: JJA
Progress:  time: Tue, 20 Sep 2011 17:19:08 -0500  Initializing:150  Selecting site:12  Stage in:8
Progress:  time: Tue, 20 Sep 2011 17:19:38 -0500  Initializing:150  Selecting site:12  Stage in:8
Progress:  time: Tue, 20 Sep 2011 17:20:08 -0500  Initializing:150  Selecting site:12  Stage in:8

jstack says:

Found one Java-level deadlock:
=============================
"pool-1-thread-16":
  waiting to lock monitor 0x00000000408c3608 (object 0x00002aaab838d778, a org.griphyn.vdl.mapping.RootArrayDataNode),
  which is held by "pool-1-thread-7"
"pool-1-thread-7":
  waiting to lock monitor 0x00002aaac88a06d0 (object 0x00002aaab7d93af8, a org.griphyn.vdl.mapping.RootArrayDataNode),
  which is held by "pool-1-thread-15"
"pool-1-thread-15":
  waiting to lock monitor 0x00000000408c3608 (object 0x00002aaab838d778, a org.griphyn.vdl.mapping.RootArrayDataNode),
  which is held by "pool-1-thread-7"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-16":
        at org.griphyn.vdl.mapping.RootArrayDataNode.getMapper(RootArrayDataNode.java:99)
        - waiting to lock <0x00002aaab838d778> (a org.griphyn.vdl.mapping.RootArrayDataNode)
        at org.griphyn.vdl.mapping.AbstractDataNode.getMapper(AbstractDataNode.java:571)
        at org.griphyn.vdl.karajan.lib.VDLFunction.leafFileName(VDLFunction.java:270)
        at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:187)
        at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:175)
        at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:171)
        at org.griphyn.vdl.karajan.lib.swiftscript.FileName.function(FileName.java:17)
        at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:62)
        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
        at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
        at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
        at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
        at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
        at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
"pool-1-thread-7":
        at org.griphyn.vdl.mapping.ArrayDataNode.getFutureWrapper(ArrayDataNode.java:88)
        - waiting to lock <0x00002aaab7d93af8> (a org.griphyn.vdl.mapping.RootArrayDataNode)
        at org.griphyn.vdl.mapping.RootArrayDataNode.getMapper(RootArrayDataNode.java:103)
        - locked <0x00002aaab838d778> (a org.griphyn.vdl.mapping.RootArrayDataNode)
        at org.griphyn.vdl.mapping.AbstractDataNode.getMapper(AbstractDataNode.java:571)
        at org.griphyn.vdl.karajan.lib.VDLFunction.leafFileName(VDLFunction.java:270)
        at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:187)
        at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:175)
        at org.griphyn.vdl.karajan.lib.VDLFunction.filename(VDLFunction.java:171)
        at org.griphyn.vdl.karajan.lib.swiftscript.FileName.function(FileName.java:17)
        at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:62)
        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
        at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
        at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
        at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
        at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
        at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
"pool-1-thread-15":
        at org.griphyn.vdl.mapping.RootArrayDataNode.innerInit(RootArrayDataNode.java:40)
        - waiting to lock <0x00002aaab838d778> (a org.griphyn.vdl.mapping.RootArrayDataNode)
        at org.griphyn.vdl.mapping.RootArrayDataNode.futureModified(RootArrayDataNode.java:80)
        at org.griphyn.vdl.karajan.ArrayIndexFutureList.notifyListeners(ArrayIndexFutureList.java:120)
        at org.griphyn.vdl.karajan.ArrayIndexFutureList.addKey(ArrayIndexFutureList.java:57)
        at org.griphyn.vdl.mapping.ArrayDataNode.addKey(ArrayDataNode.java:73)
        - locked <0x00002aaab7d93af8> (a org.griphyn.vdl.mapping.RootArrayDataNode)
        at org.griphyn.vdl.mapping.ArrayDataNode.createDSHandle(ArrayDataNode.java:82)
        at org.griphyn.vdl.mapping.AbstractDataNode.getField(AbstractDataNode.java:270)
        - locked <0x00002aaab7d93af8> (a org.griphyn.vdl.mapping.RootArrayDataNode)
        at org.griphyn.vdl.mapping.AbstractDataNode.getField(AbstractDataNode.java:195)
        at org.griphyn.vdl.karajan.lib.SetFieldValue.deepCopy(SetFieldValue.java:127)
        at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:46)
        at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:62)
[remaining jstack output truncated at the pager prompt]
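
Reading the summary above as a wait-for graph: thread-7 holds <0x00002aaab838d778> (taken in RootArrayDataNode.getMapper) and wants <0x00002aaab7d93af8>, while thread-15 holds <0x00002aaab7d93af8> (taken in AbstractDataNode.getField) and wants <0x00002aaab838d778>; thread-16 is only queued behind thread-7. A minimal sketch of that reasoning follows. It is not Swift or Karajan code: the class WaitForGraph and the method findCycle are hypothetical names used for illustration, and the map is typed in by hand from the jstack summary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Swift: follows "waits for" edges
// between threads until a thread repeats, then reports the cycle.
public class WaitForGraph {

    // waitsFor maps a blocked thread to the thread holding the monitor it wants.
    static List<String> findCycle(Map<String, String> waitsFor, String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String cur = start;
        // Walk the chain until it ends (no deadlock) or revisits a thread.
        while (cur != null && seen.add(cur)) {
            path.add(cur);
            cur = waitsFor.get(cur);
        }
        if (cur == null) {
            return Collections.emptyList(); // chain ended at a running thread
        }
        // Threads from the first visit of 'cur' onward form the cycle.
        return new ArrayList<>(path.subList(path.indexOf(cur), path.size()));
    }

    public static void main(String[] args) {
        // Edges copied by hand from the "Found one Java-level deadlock" summary.
        Map<String, String> waitsFor = new LinkedHashMap<>();
        waitsFor.put("pool-1-thread-16", "pool-1-thread-7");
        waitsFor.put("pool-1-thread-7", "pool-1-thread-15");
        waitsFor.put("pool-1-thread-15", "pool-1-thread-7");

        // Starting from the thread that is merely queued, print the actual cycle.
        System.out.println(findCycle(waitsFor, "pool-1-thread-16"));
    }
}
```

Under these assumptions the cycle is thread-7 and thread-15, which take the two RootArrayDataNode monitors in opposite order; whether the appropriate fix is a consistent lock order or a narrower synchronized region in getMapper or addKey is not something the trace alone settles.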


-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



