[Swift-devel] timeout on OSG with coasters provider staging

Ketan Maheshwari ketancmaheshwari at gmail.com
Mon Jan 16 11:05:38 CST 2012


Hi Mihael,

I could reproduce this timeout exception on OSG with catsn Swift jobs.

These are 100 jobs with a data size of 10MB each. So, 2000MB of data
movement in all, counting both the input and the output of each job.

I tried with 1 worker running on a single OSG site. I tried three different
OSG sites: Nebraska, UChicago and RENCI.

In each of these cases, I run into the following timeout after ~4 minutes
of running (15-70 jobs complete during this period):

Timeout
org.globus.cog.karajan.workflow.service.TimeoutException: Handler(562, PUT): timed out receiving request. Last time 940817-011255.807, now: 120115-194100.072
    at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.handleTimeout(RequestHandler.java:124)
    at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:131)
    at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:123)
    at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel$1.run(AbstractKarajanChannel.java:116)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
Command(168, SUBMITJOB): handling reply timeout; sendReqTime=120115-193900.255, sendTime=120115-193900.255, now=120115-194100.416, channel=SC-null

This is followed by more messages similar to the last line above, and the
workflow makes no further progress.
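
For anyone reading the timestamps in the trace, here is a minimal sketch that
decodes them. It assumes the log timestamps use the layout yyMMdd-HHmmss.SSS;
that layout is my reading of the values, not something confirmed from the
CoG source. The helper names (parse_log_ts, elapsed_seconds) are made up for
illustration only.

```python
from datetime import datetime

# Assumed layout of the timestamps in the log: yyMMdd-HHmmss.SSS
LOG_TS_FORMAT = "%y%m%d-%H%M%S.%f"


def parse_log_ts(text):
    """Parse a timestamp such as '120115-194100.416' (assumed format)."""
    return datetime.strptime(text, LOG_TS_FORMAT)


def elapsed_seconds(start, end):
    """Seconds between two log timestamps given as strings."""
    return (parse_log_ts(end) - parse_log_ts(start)).total_seconds()


# Values copied from the SUBMITJOB line of the trace.
submit_wait = elapsed_seconds("120115-193900.255", "120115-194100.416")
print(f"SUBMITJOB waited {submit_wait:.3f} s for a reply")

# Value copied from the Handler(562, PUT) line of the trace.
last_seen = parse_log_ts("940817-011255.807")
print(f"PUT handler 'Last time' decodes to {last_seen.isoformat()}")
```

Under that assumption, the SUBMITJOB command waited roughly two minutes
before the reply timeout fired, and the "Last time" of the PUT handler
decodes to a date in 1994, which may mean the field never held a real
activity time.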

Here is the tarball of the experiment:
http://ci.uchicago.edu/~ketan/catsn-exp-formihael.tgz

It contains a README with the steps to run: basically start-service on
localhost -> start worker on OSG site -> run swift.

Regards,
-- 
Ketan