[Swift-devel] Re: transfer-only workload worked!! (was Re: resuming discussion on the hung processes...)
Mihael Hategan
hategan at mcs.anl.gov
Tue Apr 26 15:58:22 CDT 2011
I think the issue is different. In this dump, the thread that writes to
the restart log is idle, so nothing appears to be in the middle of a write.
Can I take a look at the Swift log?
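
For readers following the dump: the "Restart Log Sync" entry is reported as
WAITING (on object monitor) inside Object.wait(), with its own monitor shown
as locked by run(). That is what a thread parked in a wait/notify loop looks
like when it has nothing to do. Below is a minimal sketch of that pattern.
The class and method names (IdleSyncSketch, SyncWorker, requestFlush,
observeIdleState) are hypothetical and are not the Karajan SyncThread code;
the sketch only assumes a worker that sleeps on its own monitor until asked
to flush.

```java
public class IdleSyncSketch {

    // Hypothetical stand-in for a log sync thread; NOT the real Karajan SyncThread.
    static class SyncWorker extends Thread {
        private boolean dirty = false;    // set when there is something to flush
        private boolean stopped = false;  // set when the worker should exit
        private int flushes = 0;          // number of simulated flushes performed

        SyncWorker() {
            super("Sketch Log Sync");
            setDaemon(true);
        }

        // run() holds this object's monitor except while parked in wait(),
        // which is why a dump shows the monitor as both "locked" and "waiting on".
        @Override
        public synchronized void run() {
            try {
                while (!stopped) {
                    while (!dirty && !stopped) {
                        wait(); // idle: reported as WAITING (on object monitor)
                    }
                    if (dirty) {
                        flushes++; // a real implementation would sync the file here
                        dirty = false;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        synchronized void requestFlush() { dirty = true; notifyAll(); }
        synchronized void shutdown() { stopped = true; notifyAll(); }
        synchronized int flushCount() { return flushes; }
    }

    // Starts a worker, gives it up to five seconds to park in wait(),
    // and returns the state observed while it has no work.
    static Thread.State observeIdleState() {
        SyncWorker worker = new SyncWorker();
        worker.start();
        try {
            long deadline = System.currentTimeMillis() + 5000;
            while (worker.getState() != Thread.State.WAITING
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = worker.getState();
            worker.shutdown();
            worker.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing worker", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Idle worker state: " + observeIdleState());
    }
}
```

A thread in this state is not holding anything up by itself; it is waiting
for some other thread to call the equivalent of requestFlush(). That is why
the next useful piece of evidence is the log rather than the dump.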
On Tue, 2011-04-26 at 15:45 -0500, Allan Espinosa wrote:
> Hi Mihael,
>
> This is on the latest stable branch. Here's the dump:
>
> 2011-04-25 11:45:35
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
>
> "Attach Listener" daemon prio=10 tid=0x0000000044cd2800 nid=0x4c5f
> waiting on condition [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
>
> Locked ownable synchronizers:
> - None
>
> "Condor provider queue poller" daemon prio=10 tid=0x00002aabb86eb800
> nid=0x3c7a sleeping[0x0000000043c1f000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:76)
> at java.lang.Thread.run(Thread.java:619)
>
> Locked ownable synchronizers:
> - None
>
> "Scheduler" prio=10 tid=0x00002aabb8763800 nid=0x34c0 in Object.wait()
> [0x0000000041678000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at org.globus.cog.karajan.scheduler.LateBindingScheduler.sleep(LateBindingScheduler.java:305)
> at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:258)
> - locked <0x00002aaab7ca50a0> (a org.griphyn.vdl.karajan.VDSAdaptiveScheduler)
>
> Locked ownable synchronizers:
> - None
>
> "Progress ticker" daemon prio=10 tid=0x00002aabb86d5000 nid=0x2c3f
> waiting on condition [0x0000000041577000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.run(RuntimeStats.java:137)
>
> Locked ownable synchronizers:
> - None
>
> "Restart Log Sync" daemon prio=10 tid=0x0000000044f15800 nid=0x2c38 in
> Object.wait() [0x000000004290c000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00002aaab7c0c808> (a
> org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread)
> at java.lang.Object.wait(Object.java:485)
> at org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread.run(SyncThread.java:45)
> - locked <0x00002aaab7c0c808> (a
> org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread)
>
> Locked ownable synchronizers:
> - None
>
> "Overloaded Host Monitor" daemon prio=10 tid=0x00002aabb857b800
> nid=0x2c33 sleeping[0x000000004280b000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:47)
>
> Locked ownable synchronizers:
> - None
>
> "Timer-0" daemon prio=10 tid=0x00000000451a0000 nid=0x2c32 in
> Object.wait() [0x000000004270a000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.util.TimerThread.mainLoop(Timer.java:509)
> - locked <0x00002aaab7d01f10> (a java.util.TaskQueue)
> at java.util.TimerThread.run(Timer.java:462)
>
> Locked ownable synchronizers:
> - None
>
> "pool-1-thread-4" prio=10 tid=0x00000000452d9800 nid=0x2c17 in
> Object.wait() [0x0000000042508000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00002aaab3b8df68> (a
> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
> at java.lang.Object.wait(Object.java:485)
> at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
> - locked <0x00002aaab3b8df68> (a
> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
> at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
> at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
> at java.lang.Thread.run(Thread.java:619)
>
> Locked ownable synchronizers:
> - None
>
> "pool-1-thread-3" prio=10 tid=0x0000000044ffc800 nid=0x2c16 in
> Object.wait() [0x0000000042407000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00002aaab3b8df68> (a
> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
> at java.lang.Object.wait(Object.java:485)
> at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
> - locked <0x00002aaab3b8df68> (a
> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
> at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
> at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
> at java.lang.Thread.run(Thread.java:619)
>
> Locked ownable synchronizers:
> - None
>
> "pool-1-thread-2" prio=10 tid=0x00002aabc024a800 nid=0x2c15 in
> Object.wait() [0x0000000042306000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00002aaab3b8df68> (a
> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
> at java.lang.Object.wait(Object.java:485)
> at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
> - locked <0x00002aaab3b8df68> (a
> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
> at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
> at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
> at java.lang.Thread.run(Thread.java:619)
>
> Locked ownable synchronizers:
> - None
>
> "pool-1-thread-1" prio=10 tid=0x00002aabb85f4000 nid=0x2c14 in
> Object.wait() [0x0000000042205000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00002aaab3b8df68> (a
> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
> at java.lang.Object.wait(Object.java:485)
> at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
> - locked <0x00002aaab3b8df68> (a
> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
> at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
> at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
> at java.lang.Thread.run(Thread.java:619)
>
> Locked ownable synchronizers:
> - None
>
> "Low Memory Detector" daemon prio=10 tid=0x0000000044c72000 nid=0x2c12
> runnable [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
>
> Locked ownable synchronizers:
> - None
>
> "CompilerThread1" daemon prio=10 tid=0x0000000044c70000 nid=0x2c11
> waiting on condition [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
>
> Locked ownable synchronizers:
> - None
>
> "CompilerThread0" daemon prio=10 tid=0x0000000044c6a800 nid=0x2c10
> waiting on condition [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
>
> Locked ownable synchronizers:
> - None
>
> "Signal Dispatcher" daemon prio=10 tid=0x0000000044c68800 nid=0x2c0f
> runnable [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
>
> Locked ownable synchronizers:
> - None
>
> "Finalizer" daemon prio=10 tid=0x0000000044c44000 nid=0x2c0e in
> Object.wait() [0x0000000041acb000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00002aaab4ec8920> (a java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
> - locked <0x00002aaab4ec8920> (a java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>
> Locked ownable synchronizers:
> - None
>
> "Reference Handler" daemon prio=10 tid=0x0000000044c42000 nid=0x2c0d
> in Object.wait() [0x000000004039b000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00002aaab4ec88a8> (a java.lang.ref.Reference$Lock)
> at java.lang.Object.wait(Object.java:485)
> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> - locked <0x00002aaab4ec88a8> (a java.lang.ref.Reference$Lock)
>
> Locked ownable synchronizers:
> - None
>
> "main" prio=10 tid=0x0000000044be0000 nid=0x2c07 in Object.wait()
> [0x0000000040977000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00002aaab76845f0> (a
> org.griphyn.vdl.karajan.VDL2ExecutionContext)
> at java.lang.Object.wait(Object.java:485)
> at org.globus.cog.karajan.workflow.ExecutionContext.waitFor(ExecutionContext.java:261)
> - locked <0x00002aaab76845f0> (a org.griphyn.vdl.karajan.VDL2ExecutionContext)
> at org.griphyn.vdl.karajan.Loader.main(Loader.java:197)
>
> Locked ownable synchronizers:
> - None
>
> "VM Thread" prio=10 tid=0x0000000044c3d800 nid=0x2c0c runnable
>
> "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000044bf3000
> nid=0x2c08 runnable
>
> "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000044bf5000
> nid=0x2c09 runnable
>
> "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000044bf7000
> nid=0x2c0a runnable
>
> "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000044bf8800
> nid=0x2c0b runnable
>
> "VM Periodic Task Thread" prio=10 tid=0x0000000044c7d000 nid=0x2c13
> waiting on condition
>
> JNI global references: 1093
>
>
> Here are the last few lines of the resume file:
> ...
> ...
> 3-199:peak.36!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_36.bsa
> 13-199:peak.33!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_33.bsa
> 13-199:peak.34!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_34.bsa
> 13-199:peak.39!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_39.bsa
> 13-199:peak.37
>
> 2011/4/26 Mihael Hategan <hategan at mcs.anl.gov>:
> > On Tue, 2011-04-26 at 15:31 -0500, Allan Espinosa wrote:
> >
> >> > - does it run repeatedly without any user-visible errors?
> >>
> >> There's this problem where Swift is waiting to finish writing to the
> >> resume file. But that's another issue that I would like to defer for
> >> now.
> >
> > Can you send me a stack dump of that situation?