[Swift-devel] timeout on OSG with coasters provider staging

Ketan Maheshwari ketancmaheshwari at gmail.com
Mon Jan 16 14:24:35 CST 2012


Mihael,

Please find service log here:
http://ci.uchicago.edu/~ketan/swift.log.tar.gz

The worker logs seem to have been lost. I'll see if I can find them.
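
In case it helps when reading the trace quoted below: a minimal sketch for turning the log timestamps into elapsed times. It assumes the stamps follow a yyMMdd-HHmmss.SSS layout, which is a guess from the values rather than something confirmed in the coaster source. The helper names are made up for illustration.

```python
from datetime import datetime

# Assumed layout of the timestamps in the log: yyMMdd-HHmmss.SSS
# (inferred from values such as 120115-194100.416, not confirmed).
STAMP_FORMAT = "%y%m%d-%H%M%S.%f"


def parse_stamp(stamp):
    """Parse one log timestamp under the assumed layout."""
    return datetime.strptime(stamp, STAMP_FORMAT)


def elapsed_seconds(start, end):
    """Seconds between two log timestamps under the assumed layout."""
    return (parse_stamp(end) - parse_stamp(start)).total_seconds()


if __name__ == "__main__":
    # Values copied from the SUBMITJOB reply-timeout line.
    print(elapsed_seconds("120115-193900.255", "120115-194100.416"))
    # "Last time" value copied from the PUT handler timeout line.
    print(parse_stamp("940817-011255.807"))
```

Under that assumption, the SUBMITJOB line shows roughly 120 seconds between sendTime and now, and the "Last time" value in the PUT handler message parses to a date in 1994, so it may not be a real recent timestamp.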

Regards,
Ketan

On Mon, Jan 16, 2012 at 1:38 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:

> Nothing interesting there. Do you also happen to have the service and
> worker logs?
>
> On Mon, 2012-01-16 at 11:05 -0600, Ketan Maheshwari wrote:
> > Hi Mihael,
> >
> >
> > I could reproduce this timeout exception on OSG with catsn Swift jobs.
> >
> >
> > These are 100 jobs with a data size of 10MB each. So, 2000MB of data
> > movement in all.
> >
> >
> > I tried with 1 worker running on a single OSG site. I tried three
> > different OSG sites: Nebraska, UChicago and RENCI.
> >
> >
> > In each of these cases, I run into the following timeout after ~4
> > minutes of running (15-70 jobs complete during this period):
> >
> >
> > Timeout
> > org.globus.cog.karajan.workflow.service.TimeoutException: Handler(562, PUT): timed out receiving request. Last time 940817-011255.807, now: 120115-194100.072
> > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.handleTimeout(RequestHandler.java:124)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:131)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:123)
> > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel$1.run(AbstractKarajanChannel.java:116)
> > at java.util.TimerThread.mainLoop(Timer.java:512)
> > at java.util.TimerThread.run(Timer.java:462)
> > Command(168, SUBMITJOB): handling reply timeout; sendReqTime=120115-193900.255, sendTime=120115-193900.255, now=120115-194100.416, channel=SC-null
> >
> >
> > This is followed by more messages similar to the last line above, and
> > the workflow stops making progress.
> >
> >
> > Here is the tarball of the
> > experiment: http://ci.uchicago.edu/~ketan/catsn-exp-formihael.tgz
> >
> >
> > It contains a README which has the steps to run: basically
> > start-service on localhost -> start worker on OSG site -> run swift
> >
> >
> > Regards,
> > --
> > Ketan
> >
> >
> >
>
>
>


-- 
Ketan