[Swift-devel] timeout on OSG with coasters provider staging
Ketan Maheshwari
ketancmaheshwari at gmail.com
Mon Jan 16 11:05:38 CST 2012
Hi Mihael,
I could reproduce this timeout exception on OSG with catsn Swift jobs.
These are 100 jobs with a data size of 10MB each. So, 2000MB of data
movement in all.
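The 2000MB total is twice 100 x 10MB, which is what you get if every job's data crosses the wire twice. Here is a minimal sketch of that arithmetic. It assumes one 10MB stage-in and one 10MB stage-out per catsn job; the message does not actually say this.

```python
# Sanity check of the data-movement total quoted above.
# ASSUMPTION: each catsn job stages one 10 MB file in and one 10 MB file out;
# the message states only the 10 MB size and the 2000 MB total.
JOBS = 100
FILE_MB = 10
TRANSFERS_PER_JOB = 2  # assumed: one stage-in plus one stage-out


def total_movement_mb(jobs: int, file_mb: int, transfers_per_job: int) -> int:
    """Total megabytes moved across all jobs."""
    return jobs * file_mb * transfers_per_job


print(total_movement_mb(JOBS, FILE_MB, TRANSFERS_PER_JOB))  # 2000
print(total_movement_mb(JOBS, FILE_MB, 1))  # 1000 if each file moved only once
```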
I tried with a single worker running on one OSG site at a time, on three
different OSG sites: Nebraska, UChicago and RENCI.
In each case, I ran into the following timeout after about 4 minutes of
running (15-70 jobs completed during that period):
Timeout
org.globus.cog.karajan.workflow.service.TimeoutException: Handler(562, PUT): timed out receiving request. Last time 940817-011255.807, now: 120115-194100.072
    at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.handleTimeout(RequestHandler.java:124)
    at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:131)
    at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:123)
    at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel$1.run(AbstractKarajanChannel.java:116)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
Command(168, SUBMITJOB): handling reply timeout; sendReqTime=120115-193900.255, sendTime=120115-193900.255, now=120115-194100.416, channel=SC-null
This is followed by more messages similar to the last line above, and the
workflow stops making progress.
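The SUBMITJOB line works as a small worked example. Its timestamps appear to use a yyMMdd-HHmmss.SSS layout (inferred from the log; for example, 120115-193900.255 would be 2012-01-15 19:39:00.255). On that reading, the gap from sendReqTime to now is about 120 seconds. That suggests a two-minute reply timeout, but does not confirm it.

The sketch below models one sweep of a periodic timeout check, similar in spirit to the Timer-driven checkTimeouts call in the trace. It is not the CoG implementation. The timestamp layout, the 120-second limit, the helper names and the second pending entry are all assumptions.

```python
from datetime import datetime
from typing import Dict, List

# Timestamp layout inferred from the log lines (yyMMdd-HHmmss.SSS);
# an assumption, not taken from the CoG source.
FMT = "%y%m%d-%H%M%S.%f"

# ASSUMED reply timeout, suggested by the ~120 s gap in the SUBMITJOB line.
TIMEOUT_S = 120.0


def parse_ts(ts: str) -> datetime:
    """Parse a log timestamp such as '120115-193900.255'."""
    return datetime.strptime(ts, FMT)


def elapsed_s(start: str, end: str) -> float:
    """Seconds between two log timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


def expired(pending: Dict[str, str], now: str, limit: float = TIMEOUT_S) -> List[str]:
    """One sweep of a hypothetical timeout check: return the requests whose
    send time is more than `limit` seconds before `now`."""
    return [name for name, sent in pending.items() if elapsed_s(sent, now) > limit]


# The first entry uses the values from the SUBMITJOB line; the second is hypothetical.
pending = {
    "Command(168, SUBMITJOB)": "120115-193900.255",
    "Command(169, SUBMITJOB)": "120115-194030.000",  # hypothetical
}
print(elapsed_s("120115-193900.255", "120115-194100.416"))  # 120.161
print(expired(pending, "120115-194100.416"))  # ['Command(168, SUBMITJOB)']
```

In a running service, a sweep like this would be repeated on a timer (the trace shows java.util.Timer driving checkTimeouts). Each pending request would then be reported as it crosses the limit, which would be consistent with the repeated SUBMITJOB timeout lines.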
Here is the tarball of the experiment:
http://ci.uchicago.edu/~ketan/catsn-exp-formihael.tgz
It contains a README with the steps to run it, basically: start-service on
localhost -> start a worker on an OSG site -> run swift
Regards,
--
Ketan