[Swift-devel] timeout on OSG with coasters provider staging
Mihael Hategan
hategan at mcs.anl.gov
Mon Jan 16 13:38:24 CST 2012
Nothing interesting there. Do you also happen to have the service and
worker logs?
On Mon, 2012-01-16 at 11:05 -0600, Ketan Maheshwari wrote:
> Hi Mihael,
>
>
> I could reproduce this timeout exception on OSG with catsn Swift jobs.
>
>
> These are 100 jobs with a data size of 10MB each. So, 2000MB of data
> movement in all.
>
>
> I tried with 1 worker running on a single OSG site. I tried three
> different OSG sites: Nebraska, UChicago and RENCI.
>
>
> In each of these cases, I run into the following timeout after ~4
> minutes of running (15-70 jobs complete during this period):
>
>
> Timeout
> org.globus.cog.karajan.workflow.service.TimeoutException: Handler(562,
> PUT): timed out receiving request. Last time 940817-011255.807, now:
> 120115-194100.072
> at
> org.globus.cog.karajan.workflow.service.handlers.RequestHandler.handleTimeout(RequestHandler.java:124)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:131)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:123)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel$1.run(AbstractKarajanChannel.java:116)
> at java.util.TimerThread.mainLoop(Timer.java:512)
> at java.util.TimerThread.run(Timer.java:462)
> Command(168, SUBMITJOB): handling reply timeout;
> sendReqTime=120115-193900.255, sendTime=120115-193900.255,
> now=120115-194100.416, channel=SC-null
>
>
> This is followed by messages similar to the last line above, but the
> progress of the workflow halts.
>
>
> Here is the tarball of the
> experiment: http://ci.uchicago.edu/~ketan/catsn-exp-formihael.tgz
>
>
> It contains a README with the steps to run: basically
> start-service on localhost -> start worker on OSG site -> run swift
>
>
> Regards,
> --
> Ketan
>
>
>
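For reference, the timestamps in the handler messages appear to use a yyMMdd-HHmmss.SSS format. A minimal sketch (assuming that format) confirms the SUBMITJOB reply above timed out roughly 120 seconds after it was sent, i.e. the channel's ~2-minute reply timeout:

```python
from datetime import datetime

def parse_ts(ts):
    # Timestamps in the log look like yyMMdd-HHmmss.SSS, e.g. 120115-193900.255
    return datetime.strptime(ts, "%y%m%d-%H%M%S.%f")

send = parse_ts("120115-193900.255")  # sendReqTime from the SUBMITJOB message
now = parse_ts("120115-194100.416")   # now= from the same message
elapsed = (now - send).total_seconds()
print(elapsed)  # about 120 seconds between send and the timeout check
```

(The "Last time 940817-011255.807" in the PUT handler parses as a 1994 date under the same format, which suggests an uninitialized last-receive time rather than a real timestamp.)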