[Swift-user] Block task failed: Connection to worker lost

Yadu Nand Babuji yadunand at uchicago.edu
Wed Dec 3 11:04:36 CST 2014


Hi Jonathan,

The issue you are seeing sounds very close to what David reported a 
while back.
Could you send us a tarball of your run directory from a failed run?

Could you also check whether you've set lowOverAllocation and 
highOverAllocation in your sites definition?

Thanks,
Yadu

On 12/03/2014 10:50 AM, Ozik, Jonathan wrote:
> Hi all,
>
> I’m trying to run a large set of simulations on Midway using Swift 0.95-RC5.
> 768 of the 2187 tasks completed successfully and then I got the exception:
>
> 	exception @ swift-int.k, line: 530
> Caused by: Block task failed: Connection to worker lost
> org.globus.cog.coaster.TimeoutException: Channel timed out. lastTime=141203-145449.325, now=141203-145649.844, channel=TCPChannel [type: server, contact: 1202-5410010-000072-000000]
> 	at org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)
> 	at org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)
> 	at java.util.TimerThread.mainLoop(Timer.java:555)
> 	at java.util.TimerThread.run(Timer.java:505)
>
> Progress: Wed, 03 Dec 2014 14:59:51+0000  Submitted:651  Failed:6  Finished successfully:768  Failed but can retry:762
> Progress: Wed, 03 Dec 2014 14:59:52+0000  Submitted:651  Failed:44  Finished successfully:768  Failed but can retry:724
>
> And the process seems to have stopped.
>
> What log file would be helpful for diagnosing this?
>
> Jonathan
>
>



