[Swift-devel] MODIS freezes on Midway

David Kelly davidk at ci.uchicago.edu
Wed Mar 20 14:05:56 CDT 2013


Yadu, 

As a test, I just untarred swiftdemo.v04.tgz, removed the SBATCH_RESERVATION line from setup.sh, and was able to run on Midway. Send me a message on Skype when you have a few minutes and we can take a closer look at this. 
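
For anyone following along, here is a minimal sketch of that change. It assumes setup.sh contains the line "export SBATCH_RESERVATION=osg" quoted further down in this thread; the function name disable_reservation is only for illustration and is not part of the demo.

```shell
# Sketch only: comment out the reservation line in a setup script and
# clear the variable from the current shell.
# disable_reservation is a hypothetical helper, not part of swiftdemo.
disable_reservation() {
    setup_file="$1"

    # Prefix the export line with "# " and keep a backup as <file>.bak.
    sed -i.bak 's/^\([[:space:]]*export SBATCH_RESERVATION=.*\)$/# \1/' "$setup_file"

    # Drop any value already exported into this shell, so sbatch
    # no longer picks it up from the environment.
    unset SBATCH_RESERVATION
}
```

Running "disable_reservation setup.sh" in the same shell that will launch the run should be enough; the variable has to be unset there, since sourcing the edited setup.sh does not remove a value exported earlier.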

David 

----- Original Message -----

> From: "Yadu Nand" <yadudoc1729 at gmail.com>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "swift-devel" <swift-devel at ci.uchicago.edu>
> Sent: Wednesday, March 20, 2013 12:19:01 PM
> Subject: Re: [Swift-devel] MODIS freezes on Midway

> Hi everyone,

> David, please find the submit files attached to this mail.
> I am running the 5 different variants of modis
> (local, midway, beagle, uc3, multiple)
> from the test system we have. I am not setting the SBATCH_RESERVATION
> variable in the setup scripts.

> Ketan, for modis_local and modis_midway, I am setting
> <filesystem provider="local"/>
> and, interestingly, modis_local works fine from the stress test apps
> group now; the rest fail, though. I think there is a different issue
> here now.

> It looks like most of the failures I'm seeing now are from perl and
> tc.data issues. I've attached the modis.stdout from the 5 test cases,
> if you'd like to take a look. The tc.data supplied is the same as the
> one that came with swiftdemo.

> -Yadu

> On Wed, Mar 20, 2013 at 5:26 AM, Michael Wilde < wilde at mcs.anl.gov >
> wrote:

> > Likely not needed if it's using provider staging.
> 

> > ----- Original Message -----
> 
> > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com >
> 
> > > To: "David Kelly" < davidk at ci.uchicago.edu >
> 
> > > Cc: "swift-devel" < swift-devel at ci.uchicago.edu >
> 
> > > Sent: Tuesday, March 19, 2013 6:04:44 PM
> 
> > > Subject: Re: [Swift-devel] MODIS freezes on Midway
> 
> > >
> 
> > >
> 
> > >
> 
> > > > In addition to what David mentioned, from the logs, it seems that
> 
> > > > your sites file is missing this line:
> 
> > >
> 
> > >
> 
> > > <filesystem provider="local"/>
> 
> > >
> 
> > >
> 
> > >
> 
> > > Or David, correct me if this is not required in this
> > > configuration.
> 
> > >
> 
> > >
> 
> > >
> 
> > > > On Tue, Mar 19, 2013 at 6:00 PM, David Kelly < davidk at ci.uchicago.edu > wrote:
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > Yadu,
> 
> > >
> 
> > >
> 
> > > The setup.sh script sets this environment variable:
> 
> > >
> 
> > >
> 
> > > export SBATCH_RESERVATION=osg
> 
> > >
> 
> > >
> 
> > > I believe sbatch is picking up on this and trying to run with
> > > this
> 
> > > reservation, which is likely expired. Can you try unsetting
> 
> > > SBATCH_RESERVATION, commenting out that line in setup.sh and
> > > trying
> 
> > > again?
> 
> > >
> 
> > >
> 
> > > Thanks,
> 
> > > David
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 

> > > From: "Yadu Nand" < yadudoc1729 at gmail.com >
> 
> > > To: "swift-devel" < swift-devel at ci.uchicago.edu >
> 
> > > Sent: Tuesday, March 19, 2013 5:35:41 PM
> 
> > > Subject: [Swift-devel] MODIS freezes on Midway
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > Hi,
> 
> > >
> 
> > > > I've been running the modis tests on Midway, from the demo that
> 
> > > > Mike had shared:
> 
> > >
> 
> > > /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz
> 
> > >
> 
> > >
> 
> > > > In the logs (please see attachment) I see a failure message:
> 
> > > > "sbatch: error: Batch job submission failed: Requested reservation is invalid"
> 
> > >
> 
> > >
> 
> > > The fact that no error messages are shown on stdout doesn't help,
> 
> > > plus, swift just
> 
> > > seems to hang forever. * Please help! *
> 
> > >
> 
> > >
> 
> > > I see test.midway show no progress, with just the same status for
> 
> > > about 20mins:
> 
> > >
> 
> > > > Progress: time: Tue, 19 Mar 2013 20:21:18 +0000 Selecting site:35 Submitted:65
> 
> > >
> 
> > >
> 
> > > > After this, I tried to kill it with Ctrl+C, and then I got a few
> 
> > > > error messages:
> 
> > >
> 
> > > > Failed to shut down block: Block 0319-5807480-000000 (16x3540.000s)
> > > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Can only cancel an active task
> > > >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:196)
> > > >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85)
> > > >     at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69)
> > > >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:106)
> > > >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:95)
> > > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46)
> > > >     at org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:332)
> > > >     at org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:312)
> > > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdownBlocks(BlockQueueProcessor.java:800)
> > > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdown(BlockQueueProcessor.java:789)
> > > >     at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.shutdown(JobQueue.java:119)
> > > >     at org.globus.cog.abstraction.coaster.service.CoasterService.shutdown(CoasterService.java:271)
> > > >     at org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler.requestComplete(ServiceShutdownHandler.java:28)
> > > >     at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
> > > >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:519)
> > > >     at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:86)
> > > >     at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115)
> 
> 
> > >
> 
> > > --
> 
> > > Yadu Nand B
> 
> > > _______________________________________________
> 
> > > Swift-devel mailing list
> 
> > > Swift-devel at ci.uchicago.edu
> 
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> 
> > >
> 
> > >
> 
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > --
> 
> > > Ketan
> 
> > >
> 
> > >
> 
> 
> > >
> 
> 

> --
> Yadu Nand B