[Swift-user] Block task failed: Connection to worker lost

Ozik, Jonathan jozik at anl.gov
Wed Dec 3 10:50:40 CST 2014


Hi all,

I’m trying to run a large set of simulations on Midway using Swift 0.95-RC5.
After 768 of the 2187 tasks had completed successfully, I got the following exception:

	exception @ swift-int.k, line: 530
Caused by: Block task failed: Connection to worker lost
org.globus.cog.coaster.TimeoutException: Channel timed out. lastTime=141203-145449.325, now=141203-145649.844, channel=TCPChannel [type: server, contact: 1202-5410010-000072-000000]
	at org.globus.cog.coaster.channels.AbstractCoasterChannel.checkTimeouts(AbstractCoasterChannel.java:133)
	at org.globus.cog.coaster.channels.AbstractCoasterChannel$1.run(AbstractCoasterChannel.java:124)
	at java.util.TimerThread.mainLoop(Timer.java:555)
	at java.util.TimerThread.run(Timer.java:505) 

Progress: Wed, 03 Dec 2014 14:59:51+0000  Submitted:651  Failed:6  Finished successfully:768  Failed but can retry:762
Progress: Wed, 03 Dec 2014 14:59:52+0000  Submitted:651  Failed:44  Finished successfully:768  Failed but can retry:724

The process then seems to have stopped.
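
For reference, here is a quick sketch that computes the gap between the lastTime and now values in the exception above. It assumes the timestamps are laid out as yyMMdd-HHmmss.SSS, which is my reading of the message and not something I have confirmed against the coaster source.

```python
from datetime import datetime

# Assumed layout of the timestamps in the exception: yyMMdd-HHmmss.SSS
FMT = "%y%m%d-%H%M%S.%f"

# Values copied from the TimeoutException message
last_time = datetime.strptime("141203-145449.325", FMT)
now = datetime.strptime("141203-145649.844", FMT)

# Elapsed time between the two timestamps, in seconds
gap_seconds = (now - last_time).total_seconds()
print(f"{gap_seconds:.3f} s between lastTime and now")
```

Under that assumption the two values are roughly two minutes apart.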

Which log file would be helpful for diagnosing this?

Jonathan
