[Swift-devel] hanging resumes (was Re: transfer-only workload worked)

Allan Espinosa aespinosa at cs.uchicago.edu
Tue Apr 26 16:05:34 CDT 2011


The log is in
/home/aespinosa/workflows/cybershake/archive-runs/test-completed/postproc-20110422-2320-mcusga23.log

The log reports that the stageout for the remaining job finished.
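The resume-file excerpt quoted further down ends with a bare `13-199:peak.37` line. Unlike the entries before it, that line has no `!` and no URL. Below is a minimal Java sketch for flagging such incomplete trailing entries. It assumes each complete line has the shape `<id>:<variable>!<url>`. That shape is inferred only from the quoted excerpt and is not a documented Swift format. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResumeFileCheck {
    // ASSUMPTION: entry shape inferred from the excerpt only: "<id>:<variable>!<url>".
    static boolean isComplete(String line) {
        int colon = line.indexOf(':');
        int bang = line.indexOf('!');
        // Require a non-empty id, a non-empty variable, and something after '!'.
        return colon > 0 && bang > colon + 1 && bang < line.length() - 1;
    }

    // Returns the non-blank lines that do not match the assumed shape.
    static List<String> incompleteEntries(List<String> lines) {
        List<String> bad = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty() && !isComplete(t)) {
                bad.add(t);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // One complete entry copied from the excerpt, plus its bare final line.
        List<String> tail = Arrays.asList(
            "13-199:peak.39!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_39.bsa",
            "13-199:peak.37");
        for (String entry : incompleteEntries(tail)) {
            System.out.println("incomplete: " + entry); // prints "incomplete: 13-199:peak.37"
        }
    }
}
```

To check a real file, pass the output of `java.nio.file.Files.readAllLines(...)` to `incompleteEntries`. The check only detects a malformed tail. It says nothing about why the writer stopped.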

2011/4/26 Mihael Hategan <hategan at mcs.anl.gov>:
> I think the issue is different. The thread that writes to the restart
> log is idle.
>
> Can I take a look at the swift log?
>
> On Tue, 2011-04-26 at 15:45 -0500, Allan Espinosa wrote:
>> Hi Mihael,
>>
>> This is on the latest stable branch.  Here's the dump:
>>
>> 2011-04-25 11:45:35
>> Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
>>
>> "Attach Listener" daemon prio=10 tid=0x0000000044cd2800 nid=0x4c5f
>> waiting on condition [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Condor provider queue poller" daemon prio=10 tid=0x00002aabb86eb800
>> nid=0x3c7a sleeping[0x0000000043c1f000]
>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>       at java.lang.Thread.sleep(Native Method)
>>       at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:76)
>>       at java.lang.Thread.run(Thread.java:619)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Scheduler" prio=10 tid=0x00002aabb8763800 nid=0x34c0 in Object.wait()
>> [0x0000000041678000]
>>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       at org.globus.cog.karajan.scheduler.LateBindingScheduler.sleep(LateBindingScheduler.java:305)
>>       at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:258)
>>       - locked <0x00002aaab7ca50a0> (a org.griphyn.vdl.karajan.VDSAdaptiveScheduler)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Progress ticker" daemon prio=10 tid=0x00002aabb86d5000 nid=0x2c3f
>> waiting on condition [0x0000000041577000]
>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>       at java.lang.Thread.sleep(Native Method)
>>       at org.griphyn.vdl.karajan.lib.RuntimeStats$ProgressTicker.run(RuntimeStats.java:137)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Restart Log Sync" daemon prio=10 tid=0x0000000044f15800 nid=0x2c38 in
>> Object.wait() [0x000000004290c000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       - waiting on <0x00002aaab7c0c808> (a
>> org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread)
>>       at java.lang.Object.wait(Object.java:485)
>>       at org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread.run(SyncThread.java:45)
>>       - locked <0x00002aaab7c0c808> (a
>> org.globus.cog.karajan.workflow.nodes.restartLog.SyncThread)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Overloaded Host Monitor" daemon prio=10 tid=0x00002aabb857b800
>> nid=0x2c33 sleeping[0x000000004280b000]
>>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>>       at java.lang.Thread.sleep(Native Method)
>>       at org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:47)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Timer-0" daemon prio=10 tid=0x00000000451a0000 nid=0x2c32 in
>> Object.wait() [0x000000004270a000]
>>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       at java.util.TimerThread.mainLoop(Timer.java:509)
>>       - locked <0x00002aaab7d01f10> (a java.util.TaskQueue)
>>       at java.util.TimerThread.run(Timer.java:462)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "pool-1-thread-4" prio=10 tid=0x00000000452d9800 nid=0x2c17 in
>> Object.wait() [0x0000000042508000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       - waiting on <0x00002aaab3b8df68> (a
>> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
>>       at java.lang.Object.wait(Object.java:485)
>>       at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
>>       - locked <0x00002aaab3b8df68> (a
>> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
>>       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
>>       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
>>       at java.lang.Thread.run(Thread.java:619)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "pool-1-thread-3" prio=10 tid=0x0000000044ffc800 nid=0x2c16 in
>> Object.wait() [0x0000000042407000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       - waiting on <0x00002aaab3b8df68> (a
>> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
>>       at java.lang.Object.wait(Object.java:485)
>>       at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
>>       - locked <0x00002aaab3b8df68> (a
>> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
>>       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
>>       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
>>       at java.lang.Thread.run(Thread.java:619)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "pool-1-thread-2" prio=10 tid=0x00002aabc024a800 nid=0x2c15 in
>> Object.wait() [0x0000000042306000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       - waiting on <0x00002aaab3b8df68> (a
>> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
>>       at java.lang.Object.wait(Object.java:485)
>>       at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
>>       - locked <0x00002aaab3b8df68> (a
>> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
>>       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
>>       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
>>       at java.lang.Thread.run(Thread.java:619)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "pool-1-thread-1" prio=10 tid=0x00002aabb85f4000 nid=0x2c14 in
>> Object.wait() [0x0000000042205000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       - waiting on <0x00002aaab3b8df68> (a
>> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
>>       at java.lang.Object.wait(Object.java:485)
>>       at edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:315)
>>       - locked <0x00002aaab3b8df68> (a
>> edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingQueue$SerializableLock)
>>       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
>>       at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:667)
>>       at java.lang.Thread.run(Thread.java:619)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Low Memory Detector" daemon prio=10 tid=0x0000000044c72000 nid=0x2c12
>> runnable [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "CompilerThread1" daemon prio=10 tid=0x0000000044c70000 nid=0x2c11
>> waiting on condition [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "CompilerThread0" daemon prio=10 tid=0x0000000044c6a800 nid=0x2c10
>> waiting on condition [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Signal Dispatcher" daemon prio=10 tid=0x0000000044c68800 nid=0x2c0f
>> runnable [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Finalizer" daemon prio=10 tid=0x0000000044c44000 nid=0x2c0e in
>> Object.wait() [0x0000000041acb000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       - waiting on <0x00002aaab4ec8920> (a java.lang.ref.ReferenceQueue$Lock)
>>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
>>       - locked <0x00002aaab4ec8920> (a java.lang.ref.ReferenceQueue$Lock)
>>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
>>       at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "Reference Handler" daemon prio=10 tid=0x0000000044c42000 nid=0x2c0d
>> in Object.wait() [0x000000004039b000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       - waiting on <0x00002aaab4ec88a8> (a java.lang.ref.Reference$Lock)
>>       at java.lang.Object.wait(Object.java:485)
>>       at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>       - locked <0x00002aaab4ec88a8> (a java.lang.ref.Reference$Lock)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "main" prio=10 tid=0x0000000044be0000 nid=0x2c07 in Object.wait()
>> [0x0000000040977000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>       at java.lang.Object.wait(Native Method)
>>       - waiting on <0x00002aaab76845f0> (a
>> org.griphyn.vdl.karajan.VDL2ExecutionContext)
>>       at java.lang.Object.wait(Object.java:485)
>>       at org.globus.cog.karajan.workflow.ExecutionContext.waitFor(ExecutionContext.java:261)
>>       - locked <0x00002aaab76845f0> (a org.griphyn.vdl.karajan.VDL2ExecutionContext)
>>       at org.griphyn.vdl.karajan.Loader.main(Loader.java:197)
>>
>>    Locked ownable synchronizers:
>>       - None
>>
>> "VM Thread" prio=10 tid=0x0000000044c3d800 nid=0x2c0c runnable
>>
>> "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000044bf3000
>> nid=0x2c08 runnable
>>
>> "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000044bf5000
>> nid=0x2c09 runnable
>>
>> "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000044bf7000
>> nid=0x2c0a runnable
>>
>> "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000044bf8800
>> nid=0x2c0b runnable
>>
>> "VM Periodic Task Thread" prio=10 tid=0x0000000044c7d000 nid=0x2c13
>> waiting on condition
>>
>> JNI global references: 1093
>>
>>
>> Here are the last few lines of the resume file:
>> ...
>> ...
>> 3-199:peak.36!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_36.bsa
>> 13-199:peak.33!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_33.bsa
>> 13-199:peak.34!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_34.bsa
>> 13-199:peak.39!gsiftp://gridftp.ranger.tacc.teragrid.org//scratch/01035/tg802895/science/cybershake/Results/TEST/219/206/PeakVals_TEST_219_206_39.bsa
>> 13-199:peak.37
>>
>> 2011/4/26 Mihael Hategan <hategan at mcs.anl.gov>:
>> > On Tue, 2011-04-26 at 15:31 -0500, Allan Espinosa wrote:
>> >
>> >> > - does it run repeatedly without any user-visible errors?
>> >>
>> >> There's this problem where Swift is waiting to finish writing to the
>> >> resume file.  But that's another issue that I would like to defer for
>> >> now.
>> >
>> > Can you send me a stack dump of that situation?
>
>
>
>
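Mihael asks for a stack dump. The one quoted above has the layout HotSpot prints for a full thread dump, for example from `jstack -l <pid>`. A hung run can also report the same per-thread state from inside the JVM, for example to log which threads are idle, using the standard `java.lang.management.ThreadMXBean` API. Below is a minimal sketch. The class name and the one-line output format are illustrative only and are not part of Swift.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class WaitingThreads {
    // One line per WAITING / TIMED_WAITING thread: "\"name\" STATE on lock".
    static List<String> waitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request monitor/synchronizer details only where this JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        List<String> out = new ArrayList<>();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive null check
            }
            Thread.State state = info.getThreadState();
            if (state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING) {
                // null when the thread is not waiting on an object (e.g. Thread.sleep)
                String lock = info.getLockName();
                out.add("\"" + info.getThreadName() + "\" " + state
                        + (lock == null ? "" : " on " + lock));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : waitingThreads()) {
            System.out.println(line);
        }
    }
}
```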



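Mihael notes that the thread writing the restart log is idle. In the quoted dump, "Restart Log Sync" sits in `Object.wait()` on its own `SyncThread` monitor, not in an I/O call. Below is a minimal sketch of a wait/notify writer of that general kind. It assumes producers queue entries and notify the thread. This is a hypothetical illustration, not the Karajan `SyncThread` source. The class and the methods `submit`, `awaitDrained` and `waitForState` are invented for the example. A writer like this parks in WAITING whenever nothing is queued, so a dump can show it idle even after the stage-out itself has finished.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// HYPOTHETICAL wait/notify restart-log writer; NOT the Karajan SyncThread source.
public class RestartLogSyncSketch extends Thread {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> written = new ArrayList<>();

    public RestartLogSyncSketch() {
        super("Restart Log Sync");
        setDaemon(true);
    }

    // Producer side (e.g. after an output is staged out): queue an entry, wake the writer.
    public synchronized void submit(String entry) {
        pending.add(entry);
        notifyAll();
    }

    // Snapshot of the entries the writer has "written" so far.
    public synchronized List<String> written() {
        return new ArrayList<>(written);
    }

    // Waits up to timeoutMillis for the queue to drain; false on timeout or interrupt.
    public synchronized boolean awaitDrained(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (!pending.isEmpty()) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false;
                }
                wait(left);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    @Override
    public void run() {
        try {
            synchronized (this) {
                while (true) {
                    // Nothing queued: park here; a dump reports WAITING (on object monitor).
                    while (pending.isEmpty()) {
                        wait();
                    }
                    // Stand-in for appending each entry to the restart log and syncing it.
                    while (!pending.isEmpty()) {
                        written.add(pending.poll());
                    }
                    notifyAll(); // wake callers blocked in awaitDrained()
                }
            }
        } catch (InterruptedException e) {
            // An interrupt ends the writer loop.
        }
    }

    // Spins until t reaches the given state or the timeout expires.
    static boolean waitForState(Thread t, Thread.State state, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (t.getState() != state && System.currentTimeMillis() < deadline) {
            Thread.yield();
        }
        return t.getState() == state;
    }
}
```

In this design an idle writer only means that no entry is queued. Whether the last entry in the real run was never handed over, or was handed over but left partly written, can't be told from the dump alone.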
-- 
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
