[Swift-devel] Re: transfer-only workload worked!! (was Re: resuming discussion on the hung processes...)

Allan Espinosa aespinosa at cs.uchicago.edu
Tue Apr 26 15:45:10 CDT 2011


Hi Mihael,

This is on the latest stable branch.  Here's the dump:

2011-04-25 11:45:35
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000044cd2800 nid=0x4c5f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"Condor provider queue poller" daemon prio=10 tid=0x00002aabb86eb800 nid=0x3c7a sleeping[0x0000000043c1f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:76)
	at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
	- None

"Scheduler" prio=10 tid=0x00002aabb8763800 nid=0x34c0 in Object.wait() [0x0000000041678000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.globus.cog.karajan.scheduler.LateBindingScheduler.sleep(LateBindingScheduler.java:305)
	at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:258)
	- locked <0x00002aaab7ca50a0> (a org.griphyn.vdl.karajan.VDSAdaptiveScheduler)

   Locked ownable synchronizers:
	- None

"Progress ticker" daemon prio=10 tid=0x00002aabb86d5000 nid=0x2c3f waiting on condition [0x0000000041577000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.run(RuntimeStats.java:137)

   Locked ownable synchronizers:
	- None

"Restart Log Sync" daemon prio=10 tid=0x0000000044f15800 nid=0x2c38 in Object.wait() [0x000000004290c000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00002aaab7c0c808> (a org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread)
	at java.lang.Object.wait(Object.java:485)
	at org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread.run(SyncThread.java:45)
	- locked <0x00002aaab7c0c808> (a org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread)

   Locked ownable synchronizers:
	- None

"Overloaded Host Monitor" daemon prio=10 tid=0x00002aabb857b800 nid=0x2c33 sleeping[0x000000004280b000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:47)

   Locked ownable synchronizers:
	- None

"Timer-0" daemon prio=10 tid=0x00000000451a0000 nid=0x2c32 in Object.wait() [0x000000004270a000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.util.TimerThread.mainLoop(Timer.java:509)
	- locked <0x00002aaab7d01f10> (a java.util.TaskQueue)
	at java.util.TimerThread.run(Timer.java:462)

   Locked ownable synchronizers:
	- None

"pool-1-thread-4" prio=10 tid=0x00000000452d9800 nid=0x2c17 in Object.wait() [0x0000000042508000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
	at java.lang.Object.wait(Object.java:485)
	at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
	- locked <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
	at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
	at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
	at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
	- None

"pool-1-thread-3" prio=10 tid=0x0000000044ffc800 nid=0x2c16 in Object.wait() [0x0000000042407000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
	at java.lang.Object.wait(Object.java:485)
	at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
	- locked <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
	at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
	at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
	at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
	- None

"pool-1-thread-2" prio=10 tid=0x00002aabc024a800 nid=0x2c15 in Object.wait() [0x0000000042306000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
	at java.lang.Object.wait(Object.java:485)
	at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
	- locked <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
	at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
	at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
	at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
	- None

"pool-1-thread-1" prio=10 tid=0x00002aabb85f4000 nid=0x2c14 in Object.wait() [0x0000000042205000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
	at java.lang.Object.wait(Object.java:485)
	at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
	- locked <0x00002aaab3b8df68> (a edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
	at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
	at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
	at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
	- None

"Low Memory Detector" daemon prio=10 tid=0x0000000044c72000 nid=0x2c12 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"CompilerThread1" daemon prio=10 tid=0x0000000044c70000 nid=0x2c11 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"CompilerThread0" daemon prio=10 tid=0x0000000044c6a800 nid=0x2c10 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"Signal Dispatcher" daemon prio=10 tid=0x0000000044c68800 nid=0x2c0f runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"Finalizer" daemon prio=10 tid=0x0000000044c44000 nid=0x2c0e in Object.wait() [0x0000000041acb000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00002aaab4ec8920> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
	- locked <0x00002aaab4ec8920> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

   Locked ownable synchronizers:
	- None

"Reference Handler" daemon prio=10 tid=0x0000000044c42000 nid=0x2c0d in Object.wait() [0x000000004039b000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00002aaab4ec88a8> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:485)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
	- locked <0x00002aaab4ec88a8> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
	- None

"main" prio=10 tid=0x0000000044be0000 nid=0x2c07 in Object.wait() [0x0000000040977000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00002aaab76845f0> (a org.griphyn.vdl.karajan.VDL2ExecutionContext)
	at java.lang.Object.wait(Object.java:485)
	at org.globus.cog.karajan.workflow.ExecutionContext.waitFor(ExecutionContext.java:261)
	- locked <0x00002aaab76845f0> (a org.griphyn.vdl.karajan.VDL2ExecutionContext)
	at org.griphyn.vdl.karajan.Loader.main(Loader.java:197)

   Locked ownable synchronizers:
	- None

"VM Thread" prio=10 tid=0x0000000044c3d800 nid=0x2c0c runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000044bf3000 nid=0x2c08 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000044bf5000 nid=0x2c09 runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000044bf7000 nid=0x2c0a runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000044bf8800 nid=0x2c0b runnable

"VM Periodic Task Thread" prio=10 tid=0x0000000044c7d000 nid=0x2c13 waiting on condition

JNI global references: 1093
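
For reference, the same kind of information (thread name, state, stack frames, monitors waited on and held) can also be collected from inside a JVM rather than by attaching to it from outside. The following is only a minimal sketch, assuming the standard java.lang.management API (Java 6 or later). The class name ThreadDumpSketch is hypothetical and is not part of Swift or CoG, and the output only loosely imitates the HotSpot format above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Swift or CoG.
class ThreadDumpSketch {

    // Builds a plain-text dump of all live threads in the current JVM.
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Request lock details only when this JVM supports reporting them.
        boolean monitors = bean.isObjectMonitorUsageSupported();
        boolean syncs = bean.isSynchronizerUsageSupported();

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : bean.dumpAllThreads(monitors, syncs)) {
            if (info == null) {
                continue; // defensive: skip a thread that is no longer alive
            }
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // The lock this thread is blocked or waiting on, if any.
            if (info.getLockName() != null) {
                sb.append("\t- waiting on ").append(info.getLockName()).append('\n');
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            // Object monitors currently held by this thread.
            for (MonitorInfo held : info.getLockedMonitors()) {
                sb.append("\t- locked ").append(held).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Unlike the dump above, this sketch lists held monitors after the stack frames instead of next to the frame that acquired them, and it does not print JVM-internal threads such as the GC task threads.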


Here are the last few lines of the resume file:
...
...
3-199:peak.36!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_36.bsa
13-199:peak.33!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_33.bsa
13-199:peak.34!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_34.bsa
13-199:peak.39!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_39.bsa
13-199:peak.37

2011/4/26 Mihael Hategan <hategan at mcs.anl.gov>:
> On Tue, 2011-04-26 at 15:31 -0500, Allan Espinosa wrote:
>
>> > - does it run repeatedly without any user-visible errors?
>>
>> There's this problem where Swift is waiting to finish writing to the
>> resume file.  But that's another issue that I would like to defer for
>> now.
>
> Can you send me a stack dump of that situation?


