[Swift-user] Block task failed: Connection to worker lost
Yadu Nand Babuji
yadunand at uchicago.edu
Wed Dec 3 11:04:36 CST 2014
Hi Jonathan,
The issue you are seeing sounds very similar to the one David reported a
while back.
Could you send us a tarball of your run directory from a failed run?
Could you also check whether you've set lowOverAllocation and
highOverAllocation in your sites definition?
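A side note on the trace quoted below. I am assuming the lastTime and now fields use a yyMMdd-HHmmss.SSS layout; I haven't confirmed that, but it is consistent with the date of your run. If so, the channel went about two minutes with no traffic before the timeout fired. Here is a quick Java sketch that computes the gap. The class and method names (ChannelGap, gap) are made up for illustration:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ChannelGap {
    // Assumed layout of the coaster channel timestamps, e.g. 141203-145449.325
    // meaning 2014-12-03 14:54:49.325 (not confirmed against the coaster source).
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyMMdd-HHmmss.SSS");

    // Returns the time elapsed between the last channel activity and the timeout check.
    static Duration gap(String lastTime, String now) {
        LocalDateTime last = LocalDateTime.parse(lastTime, FMT);
        LocalDateTime current = LocalDateTime.parse(now, FMT);
        return Duration.between(last, current);
    }

    public static void main(String[] args) {
        // Values copied from the TimeoutException in the quoted message.
        Duration d = gap("141203-145449.325", "141203-145649.844");
        System.out.println(d.toMillis() + " ms"); // prints "120519 ms"
    }
}
```

That works out to roughly 120.5 s of silence on the channel. That window is worth checking against the worker logs in the run directory.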
Thanks,
Yadu
On 12/03/2014 10:50 AM, Ozik, Jonathan wrote:
> Hi all,
>
> I’m trying to run a large set of simulations on Midway using Swift 0.95-RC5.
> 768 of the 2187 tasks completed successfully and then I got the exception:
>
> exception @ swift-int.k, line: 530
> Caused by: Block task failed: Connection to worker lost
> org.globus.cog.coaster.TimeoutException: Channel timed out. lastTime=141203-145449.325, now=141203-145649.844, channel=TCPChannel [type: server, contact: 1202-5410010-000072-000000]
> at org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)
> at org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
>
> Progress: Wed, 03 Dec 2014 14:59:51+0000 Submitted:651 Failed:6 Finished successfully:768 Failed but can retry:762
> Progress: Wed, 03 Dec 2014 14:59:52+0000 Submitted:651 Failed:44 Finished successfully:768 Failed but can retry:724
>
> And the process seems to have stopped.
>
> What log file would be helpful for diagnosing this?
>
> Jonathan
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user