[Swift-devel] hang checker fun

Allan Espinosa aespinosa at cs.uchicago.edu
Fri Mar 25 20:36:57 CDT 2011


Hi Ketan,

Mihael added the HangChecker to shed a bit of light when a workflow is
'hanging'.  Theoretically it should start executing the the job with
the open future but somehow it is still waiting for something.

Btw, the state of the related job is 'Initializing shared directory'

This is with the latest trunk.

jstack doesn't report anything peculiar (like deadlocks):
[aespinosa at communicado cybershake]$ jstack 20444
2011-03-25 20:36:02
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000055b81000 nid=0x67e1
runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"NBS7" daemon prio=10 tid=0x0000000055b1c800 nid=0x500e waiting on
condition [0x0000000042f55000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab61ed0160> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"NBS6" daemon prio=10 tid=0x0000000055b1b800 nid=0x500d waiting on
condition [0x0000000042e54000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab61ed0160> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"NBS5" daemon prio=10 tid=0x0000000055b1a800 nid=0x500b waiting on
condition [0x0000000042d53000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab61ed0160> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"NBS4" daemon prio=10 tid=0x0000000055b18800 nid=0x5009 waiting on
condition [0x0000000042c52000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab61ed0160> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"NBS3" daemon prio=10 tid=0x0000000055b20800 nid=0x5008 waiting on
condition [0x0000000042b51000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab61ed0160> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"NBS2" daemon prio=10 tid=0x0000000055a72800 nid=0x5007 waiting on
condition [0x0000000042a50000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab61ed0160> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"NBS1" daemon prio=10 tid=0x0000000055bfb000 nid=0x5006 waiting on
condition [0x0000000040e7f000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab61ed0160> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"Scheduler" prio=10 tid=0x00002aabb8810000 nid=0x5005 in Object.wait()
[0x0000000040774000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aab62337030> (a
org.griphyn.vdl.karajan.VDSAdaptiveScheduler)
        at org.globus.cog.karajan.scheduler.LateBindingScheduler.sleep(LateBindingScheduler.java:304)
        at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:257)
        - locked <0x00002aab62337030> (a
org.griphyn.vdl.karajan.VDSAdaptiveScheduler)

"Progress ticker" daemon prio=10 tid=0x00002aabc0331000 nid=0x4ffe
waiting on condition [0x0000000040673000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.run(RuntimeStats.java:138)

"Restart Log Sync" daemon prio=10 tid=0x00002aabc00f2800 nid=0x4ffd in
Object.wait() [0x000000004294f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aab61ed0000> (a
org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread)
        at java.lang.Object.wait(Object.java:485)
        at org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread.run(SyncThread.java:47)
        - locked <0x00002aab61ed0000> (a
org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread)

"Overloaded Host Monitor" daemon prio=10 tid=0x00002aabc015e000
nid=0x4ffc waiting on condition [0x000000004284e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:47)

"Timer-0" daemon prio=10 tid=0x0000000055b3b800 nid=0x4ffa in
Object.wait() [0x000000004274d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aab61ec0260> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Timer.java:509)
        - locked <0x00002aab61ec0260> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:462)

"NBS0" daemon prio=10 tid=0x00002aabc01a8000 nid=0x4ff9 waiting on
condition [0x000000004264c000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab61ed0160> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-8" prio=10 tid=0x0000000055ae7000 nid=0x4ff8 waiting on
condition [0x000000004254b000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab62385f10> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-7" prio=10 tid=0x0000000055b69000 nid=0x4ff7 waiting on
condition [0x000000004244a000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab62385f10> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-6" prio=10 tid=0x0000000055eec800 nid=0x4ff6 waiting on
condition [0x0000000042349000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab62385f10> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-5" prio=10 tid=0x0000000055b05800 nid=0x4ff5 waiting on
condition [0x0000000042248000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab62385f10> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-4" prio=10 tid=0x00002aabc029d000 nid=0x4ff4 waiting on
condition [0x0000000042147000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab62385f10> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-3" prio=10 tid=0x00002aabc0106800 nid=0x4ff3 waiting on
condition [0x0000000042046000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab62385f10> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-2" prio=10 tid=0x00002aabc02a0000 nid=0x4ff2 waiting on
condition [0x0000000041d71000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab62385f10> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-1" prio=10 tid=0x00002aabc00fb000 nid=0x4ff1 waiting on
condition [0x0000000041c70000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aab62385f10> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

"Hang checker" prio=10 tid=0x00002aabc0228800 nid=0x4ff0 in
Object.wait() [0x0000000041b6f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aab61ed0718> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Timer.java:509)
        - locked <0x00002aab61ed0718> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:462)

"Low Memory Detector" daemon prio=10 tid=0x00000000558b5000 nid=0x4fe8
runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00000000558b3000 nid=0x4fe7
waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x00000000558ad800 nid=0x4fe6
waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00000000558ab800 nid=0x4fe5
runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x0000000055886800 nid=0x4fe4 in
Object.wait() [0x0000000040d7e000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aab61ec8770> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x00002aab61ec8770> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0000000055884800 nid=0x4fe3
in Object.wait() [0x0000000040c7d000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aab61ed0498> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00002aab61ed0498> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0000000055823000 nid=0x4fdd in Object.wait()
[0x0000000041f45000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aab61ec8850> (a
org.griphyn.vdl.karajan.VDL2ExecutionContext)
        at java.lang.Object.wait(Object.java:485)
        at org.globus.cog.karajan.workflow.ExecutionContext.waitFor(ExecutionContext.java:218)
        - locked <0x00002aab61ec8850> (a
org.griphyn.vdl.karajan.VDL2ExecutionContext)
        at org.griphyn.vdl.karajan.Loader.main(Loader.java:199)

"VM Thread" prio=10 tid=0x0000000055880800 nid=0x4fe2 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000055836000
nid=0x4fde runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000055838000
nid=0x4fdf runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x000000005583a000
nid=0x4fe0 runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x000000005583b800
nid=0x4fe1 runnable

"VM Periodic Task Thread" prio=10 tid=0x00000000558c0000 nid=0x4fe9
waiting on condition

JNI global references: 1621

2011/3/25 Ketan Maheshwari <ketancmaheshwari at gmail.com>:
> I did not understand this message.
>
> On Fri, Mar 25, 2011 at 7:42 PM, Allan Espinosa <aespinosa at cs.uchicago.edu>
> wrote:
>>
>> this has been occurring for 70 times already.  What i expect is for
>> the app with SgtDim sub to run and close the future.
>>
>> 2011-03-25 19:40:12,217-0500 WARN  HangChecker No events in 10s.
>> 2011-03-25 19:40:12,217-0500 WARN  HangChecker
>> Registered futures:
>> Rupture[] rups  Closed, 1 elements, 0 listeners
>> Variation vars - Closed, no listeners
>> SgtDim sub - Open, 1 listeners
>> string site  Closed, no listeners
>> Variation[] vars  Closed, 72 elements, 0 listeners
>> ----
>>
>> Waiting threads:
>> 0-13
>> 0-13-0-7
>> 0-13-0-8-1-1
>> ----
>>
>>



More information about the Swift-devel mailing list