[Swift-devel] timeout on OSG with coasters provider staging

Ketan Maheshwari ketancmaheshwari at gmail.com
Thu Jan 19 13:54:19 CST 2012


Mihael,

I have the logs now. Filed as bug 690:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=690
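
For reference, here is a minimal sketch for checking the interval reported in the SUBMITJOB reply-timeout line quoted below. It assumes the log timestamps use a yyMMdd-HHmmss.SSS layout; the log itself does not state the format, so treat that as a guess. The helper names parse_ts and elapsed_seconds are hypothetical and are not part of Swift or CoG.

```python
from datetime import datetime

# Assumed timestamp layout: yyMMdd-HHmmss.SSS (a guess, not confirmed by the log).
FMT = "%y%m%d-%H%M%S.%f"


def parse_ts(text):
    """Parse a log timestamp such as '120115-193900.255' under the assumed layout."""
    return datetime.strptime(text, FMT)


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


# Values copied from the quoted SUBMITJOB reply-timeout line.
send_time = "120115-193900.255"
now = "120115-194100.416"

print(round(elapsed_seconds(send_time, now), 3))  # prints 120.161
```

Under that assumption, the reply timeout was reported roughly 120 seconds after the request was sent.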

Regards,
Ketan

On Mon, Jan 16, 2012 at 2:24 PM, Ketan Maheshwari <
ketancmaheshwari at gmail.com> wrote:

> Mihael,
>
> Please find service log here:
> http://ci.uchicago.edu/~ketan/swift.log.tar.gz
>
> The worker logs seem to have been lost. I'll see if I can find them.
>
> Regards,
> Ketan
>
> On Mon, Jan 16, 2012 at 1:38 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>
>> Nothing interesting there. Do you also happen to have the service and
>> worker logs?
>>
>> On Mon, 2012-01-16 at 11:05 -0600, Ketan Maheshwari wrote:
>> > Hi Mihael,
>> >
>> >
>> > I could reproduce this timeout exception on OSG with catsn Swift jobs.
>> >
>> >
>> > These are 100 jobs with a data size of 10MB each. So, 2000MB of data
>> > movement in all.
>> >
>> >
>> > I tried with 1 worker running on a single OSG site. I tried three
>> > different OSG sites: Nebraska, UChicago and RENCI.
>> >
>> >
>> > In each of these cases, I run into the following timeout after ~4
>> > minutes of running (15-70 jobs complete during this period):
>> >
>> >
>> > Timeout
>> > org.globus.cog.karajan.workflow.service.TimeoutException: Handler(562, PUT): timed out receiving request. Last time 940817-011255.807, now: 120115-194100.072
>> >     at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.handleTimeout(RequestHandler.java:124)
>> >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:131)
>> >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:123)
>> >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel$1.run(AbstractKarajanChannel.java:116)
>> >     at java.util.TimerThread.mainLoop(Timer.java:512)
>> >     at java.util.TimerThread.run(Timer.java:462)
>> > Command(168, SUBMITJOB): handling reply timeout; sendReqTime=120115-193900.255, sendTime=120115-193900.255, now=120115-194100.416, channel=SC-null
>> >
>> >
>> > This is followed by more messages similar to the last line above, and
>> > the workflow stops making progress.
>> >
>> >
>> > Here is the tarball of the
>> > experiment: http://ci.uchicago.edu/~ketan/catsn-exp-formihael.tgz
>> >
>> >
>> > It contains a README which has the steps to run: basically
>> > start-service on localhost -> start worker on OSG site -> run swift
>> >
>> >
>> > Regards,
>> > --
>> > Ketan
>> >
>> >
>> >
>>
>>
>>
>
>
> --
> Ketan
>
>
>


-- 
Ketan