[Swift-devel] timeout on OSG with coasters provider staging

Ketan Maheshwari ketancmaheshwari at gmail.com
Thu Jan 19 17:22:19 CST 2012


Here is another worker log; this one is for a real SCEC run:

ci.uchicago.edu/~ketan/timeout_worker_log_scec.txt

On Thu, Jan 19, 2012 at 1:54 PM, Ketan Maheshwari <
ketancmaheshwari at gmail.com> wrote:

> Mihael,
>
> I have the logs now. Filed as bug 690:
>
> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=690
>
> Regards,
> Ketan
>
> On Mon, Jan 16, 2012 at 2:24 PM, Ketan Maheshwari <
> ketancmaheshwari at gmail.com> wrote:
>
>> Mihael,
>>
>> Please find service log here:
>> http://ci.uchicago.edu/~ketan/swift.log.tar.gz
>>
>> The worker logs seem to have been lost. I'll see if I can find them.
>>
>> Regards,
>> Ketan
>>
>> On Mon, Jan 16, 2012 at 1:38 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>>
>>> Nothing interesting there. Do you also happen to have the service and
>>> worker logs?
>>>
>>> On Mon, 2012-01-16 at 11:05 -0600, Ketan Maheshwari wrote:
>>> > Hi Mihael,
>>> >
>>> >
>>> > I could reproduce this timeout exception on OSG with catsn Swift jobs.
>>> >
>>> >
>>> > These are 100 jobs with a data size of 10MB each, so 2000MB of data
>>> > movement in all.
>>> >
>>> >
>>> > I tried with one worker running on a single OSG site, repeating the
>>> > test on three different OSG sites: Nebraska, UChicago and RENCI.
>>> >
>>> >
>>> > In each of these cases, I ran into the following timeout after ~4
>>> > minutes of running (15-70 jobs completed during this period):
>>> >
>>> >
>>> > Timeout
>>> > org.globus.cog.karajan.workflow.service.TimeoutException: Handler(562, PUT): timed out receiving request. Last time 940817-011255.807, now: 120115-194100.072
>>> >     at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.handleTimeout(RequestHandler.java:124)
>>> >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:131)
>>> >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:123)
>>> >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel$1.run(AbstractKarajanChannel.java:116)
>>> >     at java.util.TimerThread.mainLoop(Timer.java:512)
>>> >     at java.util.TimerThread.run(Timer.java:462)
>>> > Command(168, SUBMITJOB): handling reply timeout; sendReqTime=120115-193900.255, sendTime=120115-193900.255, now=120115-194100.416, channel=SC-null
>>> >
>>> >
>>> > This is followed by more messages similar to the last line above, and
>>> > the workflow stops making progress.
>>> >
>>> >
>>> > Here is the tarball of the experiment:
>>> > http://ci.uchicago.edu/~ketan/catsn-exp-formihael.tgz
>>> >
>>> >
>>> > It contains a README with the steps to run; basically:
>>> > start-service on localhost -> start a worker on the OSG site -> run swift
>>> >
>>> >
>>> > Regards,
>>> > --
>>> > Ketan
>>> >
>>
>>
>> --
>> Ketan
>>
>
>
> --
> Ketan
>
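For anyone reading the timeout trace quoted above: the timestamps appear to use a yyMMdd-HHmmss.SSS layout. That layout is an assumption inferred from values such as 120115-194100.416, not taken from the CoG source. Under that assumption, the small Java sketch below (the class and method names are hypothetical) computes the gap between sendTime and now on the SUBMITJOB line, which comes out to about two minutes (120161 ms), and applies the same helper to the Last time/now pair of the Handler(562, PUT) exception.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class TimeoutGap {
    // Assumed timestamp layout, inferred from log values such as "120115-194100.416":
    // two-digit year, month, day, '-', hours, minutes, seconds, '.', milliseconds.
    private static final String PATTERN = "yyMMdd-HHmmss.SSS";

    /** Returns the milliseconds between two log timestamps in the assumed layout. */
    static long elapsedMillis(String from, String to) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        // Parse both values in one fixed zone so the difference does not depend on the JVM default.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false);
        try {
            return fmt.parse(to).getTime() - fmt.parse(from).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable log timestamp", e);
        }
    }

    public static void main(String[] args) {
        // sendTime and now from the "Command(168, SUBMITJOB): handling reply timeout" line.
        long submitGap = elapsedMillis("120115-193900.255", "120115-194100.416");
        System.out.println("SUBMITJOB reply wait: " + submitGap + " ms");

        // "Last time" and "now" from the Handler(562, PUT) TimeoutException message.
        long putGap = elapsedMillis("940817-011255.807", "120115-194100.072");
        System.out.println("PUT handler gap: " + putGap / 86_400_000L + " days");
    }
}
```

With two-digit years, SimpleDateFormat resolves "12" to 2012 and "94" to 1994. So the PUT handler's Last time decodes to a date years before this run, while the SUBMITJOB gap is a clean two minutes.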


-- 
Ketan