[Swift-devel] Progress on Bug 690? - Re: timeout on OSG with coasters provider staging

Mihael Hategan hategan at mcs.anl.gov
Wed Jan 25 13:15:12 CST 2012


Sorry. I was busy with the sshcl provider and the merging. I'll have to look
at it this weekend.

On Wed, 2012-01-25 at 08:33 -0600, Michael Wilde wrote:
> Mihael, Ketan, can you send an update on this, and escalate the priority of resolving this problem?
> 
> A resolution is needed rather urgently for the ExTENCI project.
> 
> Mihael, do you know where the problem lies, and have a strategy for a fix?
> 
> Thanks,
> 
> - Mike
> 
> ----- Original Message -----
> > From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > To: "Mihael Hategan" <hategan at mcs.anl.gov>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Thursday, January 19, 2012 5:22:19 PM
> > Subject: Re: [Swift-devel] timeout on OSG with coasters provider staging
> > > Here is another worker log; this one is from a real SCEC run:
> > 
> > 
> > ci.uchicago.edu/~ketan/timeout_worker_log_scec.txt
> > 
> > 
> > On Thu, Jan 19, 2012 at 1:54 PM, Ketan Maheshwari <
> > ketancmaheshwari at gmail.com > wrote:
> > 
> > 
> > Mihael,
> > 
> > 
> > I have the logs now. Filed as bug 690:
> > 
> > 
> > https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=690
> > 
> > Regards,
> > Ketan
> > 
> > 
> > 
> > 
> > 
> > On Mon, Jan 16, 2012 at 2:24 PM, Ketan Maheshwari <
> > ketancmaheshwari at gmail.com > wrote:
> > 
> > 
> > Mihael,
> > 
> > 
> > Please find service log here:
> > http://ci.uchicago.edu/~ketan/swift.log.tar.gz
> > 
> > > The worker logs seem to have been lost. I'll see if I can find them.
> > 
> > Regards,
> > Ketan
> > 
> > 
> > 
> > 
> > 
> > On Mon, Jan 16, 2012 at 1:38 PM, Mihael Hategan < hategan at mcs.anl.gov
> > > wrote:
> > 
> > 
> > Nothing interesting there. Do you also happen to have the service and
> > worker logs?
> > 
> > 
> > 
> > 
> > On Mon, 2012-01-16 at 11:05 -0600, Ketan Maheshwari wrote:
> > > Hi Mihael,
> > >
> > >
> > > I could reproduce this timeout exception on OSG with catsn Swift
> > > jobs.
> > >
> > >
> > > These are 100 jobs with a data size of 10MB each. So, 2000MB of data
> > > movement in all.
> > >
> > >
> > > I tried with 1 worker running on a single OSG site. I tried three
> > > different OSG sites: Nebraska, UChicago and RENCI.
> > >
> > >
> > > > In each of these cases, I run into the following timeout after ~4
> > > > minutes of running (15-70 jobs complete during this period):
> > >
> > >
> > > > Timeout
> > > > org.globus.cog.karajan.workflow.service.TimeoutException: Handler(562, PUT): timed out receiving request. Last time 940817-011255.807, now: 120115-194100.072
> > > > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.handleTimeout(RequestHandler.java:124)
> > > > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:131)
> > > > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:123)
> > > > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel$1.run(AbstractKarajanChannel.java:116)
> > > > at java.util.TimerThread.mainLoop(Timer.java:512)
> > > > at java.util.TimerThread.run(Timer.java:462)
> > > > Command(168, SUBMITJOB): handling reply timeout; sendReqTime=120115-193900.255, sendTime=120115-193900.255, now=120115-194100.416, channel=SC-null
> > >
> > >
> > > > This is followed by more messages similar to the last line above, and
> > > > the workflow makes no further progress.
> > >
> > >
> > > Here is the tarball of the
> > > experiment: http://ci.uchicago.edu/~ketan/catsn-exp-formihael.tgz
> > >
> > >
> > > It contains a README which has the steps to run: basically
> > > start-service on localhost -> start worker on OSG site -> run swift
> > >
> > >
> > > Regards,
> > > --
> > > Ketan
> > >
> > >
> > >
> > 
> > 
> > 
> > 
> > 
> > 
> > --
> > Ketan
> > 
> > 
> > 
> > 
> > 
> > 
> > --
> > Ketan
> > 
> > 
> > 
> > 
> > 
> > 
> > --
> > Ketan
> > 
> > 
> > 
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> 




