[Swift-devel] MODIS freezes on Midway

Yadu Nand yadudoc1729 at gmail.com
Wed Mar 20 15:20:22 CDT 2013


Hi Mike,

I can do passwordless-ssh to beagle and uc3 from midway.
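
For reference, here is roughly how that check can be scripted. This is only
a sketch: it assumes "beagle" and "uc3" are host aliases that resolve from
the midway login node (they may be named differently in your ssh config),
and SSH_CMD is a made-up override hook so the function can be tried with a
stub instead of the real ssh.

```shell
#!/bin/sh
# Sketch: verify that key-based (password-less) ssh works for a host.
# SSH_CMD is a hypothetical override used only for dry runs; it defaults to ssh.
check_passwordless_ssh() {
    host=$1
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout keeps an unreachable host from hanging the check.
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=10 "$host" true 2>/dev/null; then
        echo "$host: ok"
    else
        echo "$host: FAILED"
        return 1
    fi
}
```

It can be run as: for h in beagle uc3; do check_passwordless_ssh "$h"; done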

I am attaching a log from the failures I'm seeing, and David just confirmed
that he is seeing the same.

-Yadu

On Thu, Mar 21, 2013 at 1:35 AM, Michael Wilde <wilde at mcs.anl.gov> wrote:

> Yadu, I don't think the beagle test requires you to have any directories on
> beagle.
>
> But it does expect that you have set up your ssh keys so that from the
> midway login host you can do a password-less ssh to beagle.  Test that
> manually before trying test.beagle.
>
> - Mike
>
> ----- Original Message -----
> > From: "Yadu Nand" <yadudoc1729 at gmail.com>
> > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > Cc: "David Kelly" <davidk at ci.uchicago.edu>, "swift-devel" <swift-devel at ci.uchicago.edu>
> > Sent: Wednesday, March 20, 2013 2:46:03 PM
> > Subject: Re: [Swift-devel] MODIS freezes on Midway
> >
> >
> >
> > Quick update.
> >
> > The sites.xml file for modis.midway was messed up while copying. Now
> > the tests for local and midway are running fine from the test suite.
> >
> > I'm seeing the test.beagle fail because there isn't a folder in my
> > name on lustre... It fails with
> > Could not submit job
> > Could not start coaster service
> > Task ended before registration was received. Failed to download
> > bootstrap jar.
> >
> > On test.uc3, I can see jobs getting submitted and completed, but it
> > looks like an exception is thrown in perl, halting the test.
> >
> >
> > Sorry about the formatting, I'm mailing from my phone.
> >
> > -yadu
> > On Mar 21, 2013 12:43 AM, "Michael Wilde" < wilde at mcs.anl.gov >
> > wrote:
> >
> >
> > Also the code is now in svn:
> >
> > https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/OSG_2013-03-11/MODIS
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > To: "Yadu Nand" < yadudoc1729 at gmail.com >
> > > Cc: "swift-devel" < swift-devel at ci.uchicago.edu >, "Michael Wilde"
> > > < wilde at mcs.anl.gov >
> > > Sent: Wednesday, March 20, 2013 2:05:56 PM
> > > Subject: Re: [Swift-devel] MODIS freezes on Midway
> > >
> > >
> > > Yadu,
> > >
> > >
> > > As a test, I just untarred swiftdemo.v04.tgz, removed the
> > > SBATCH_RESERVATION line from setup.sh and was able to run on
> > > midway.
> > > Send me a message on Skype when you have a few minutes and we can
> > > take a closer look at this.
> > >
> > > David
> > >
> > >
> > > ----- Original Message -----
> > >
> > >
> > > From: "Yadu Nand" < yadudoc1729 at gmail.com >
> > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > Cc: "swift-devel" < swift-devel at ci.uchicago.edu >
> > > Sent: Wednesday, March 20, 2013 12:19:01 PM
> > > Subject: Re: [Swift-devel] MODIS freezes on Midway
> > >
> > > Hi everyone,
> > >
> > >
> > > > David, please find the submit files attached to the mail.
> > > > I am running the 5 different variants of modis (local, midway,
> > > > beagle, uc3, multiple) from the test system we have. I am not
> > > > setting the SBATCH_RESERVATION variable in the setup scripts.
> > >
> > >
> > > > Ketan, for modis_local and modis_midway, I am setting
> > > > <filesystem provider="local"/>, and interestingly, modis_local now
> > > > works fine from the stress test apps group. The rest still fail,
> > > > though, so I think there is a different issue here now.
> > >
> > >
> > > > It looks like most of the failures I'm seeing now are from perl and
> > > > tc.data issues. I've attached the modis.stdout from the 5 test cases,
> > > > if you'd like to take a look. The tc.data supplied is the same as the
> > > > one that came with swiftdemo.
> > >
> > > -Yadu
> > >
> > >
> > >
> > >
> > > > On Wed, Mar 20, 2013 at 5:26 AM, Michael Wilde < wilde at mcs.anl.gov > wrote:
> > >
> > >
> > > > Likely not needed if it's using provider staging.
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com >
> > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > Cc: "swift-devel" < swift-devel at ci.uchicago.edu >
> > > > Sent: Tuesday, March 19, 2013 6:04:44 PM
> > > > Subject: Re: [Swift-devel] MODIS freezes on Midway
> > > >
> > > >
> > > >
> > > > > In addition to what David mentioned, from the logs, it seems that
> > > > > your sites file is missing this line:
> > > >
> > > >
> > > > <filesystem provider="local"/>
> > > >
> > > >
> > > >
> > > > Or David, correct me if this is not required in this
> > > > configuration.
> > > >
> > > >
> > > >
> > > > > On Tue, Mar 19, 2013 at 6:00 PM, David Kelly < davidk at ci.uchicago.edu > wrote:
> > > >
> > > >
> > > >
> > > >
> > > > Yadu,
> > > >
> > > >
> > > > The setup.sh script sets this environment variable:
> > > >
> > > >
> > > > export SBATCH_RESERVATION=osg
> > > >
> > > >
> > > > > I believe sbatch is picking up on this and trying to run with this
> > > > > reservation, which is likely expired. Can you try unsetting
> > > > > SBATCH_RESERVATION, commenting out that line in setup.sh, and trying
> > > > > again?
> > > >
> > > >
> > > > Thanks,
> > > > David
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > > From: "Yadu Nand" < yadudoc1729 at gmail.com >
> > > > To: "swift-devel" < swift-devel at ci.uchicago.edu >
> > > > Sent: Tuesday, March 19, 2013 5:35:41 PM
> > > > Subject: [Swift-devel] MODIS freezes on Midway
> > > >
> > > >
> > > >
> > > >
> > > > Hi,
> > > >
> > > > > I've been running the modis tests on Midway, from the demo that Mike
> > > > > had shared:
> > > >
> > > > /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz
> > > >
> > > >
> > > > > In the logs (please see attachment) I see a failure message:
> > > > > "sbatch: error: Batch job submission failed: Requested reservation is invalid"
> > > >
> > > >
> > > > > The fact that no error messages are shown on stdout doesn't help,
> > > > > plus, swift just seems to hang forever. * Please help! *
> > > >
> > > >
> > > > > I see test.midway show no progress, with just the same status for
> > > > > about 20 mins:
> > > > >
> > > > > Progress: time: Tue, 19 Mar 2013 20:21:18 +0000 Selecting site:35 Submitted:65
> > > >
> > > >
> > > > > After this, I tried to kill it with Ctrl+C, and then I got a few error
> > > > > messages:
> > > >
> > > > > Failed to shut down block: Block 0319-5807480-000000 (16x3540.000s)
> > > > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Can only cancel an active task
> > > > >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:196)
> > > > >     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85)
> > > > >     at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69)
> > > > >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:106)
> > > > >     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:95)
> > > > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46)
> > > > >     at org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:332)
> > > > >     at org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:312)
> > > > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdownBlocks(BlockQueueProcessor.java:800)
> > > > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdown(BlockQueueProcessor.java:789)
> > > > >     at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.shutdown(JobQueue.java:119)
> > > > >     at org.globus.cog.abstraction.coaster.service.CoasterService.shutdown(CoasterService.java:271)
> > > > >     at org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler.requestComplete(ServiceShutdownHandler.java:28)
> > > > >     at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
> > > > >     at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:519)
> > > > >     at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:86)
> > > > >     at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115)
> > > >
> > > > --
> > > > Yadu Nand B
> > > > _______________________________________________
> > > > Swift-devel mailing list
> > > > Swift-devel at ci.uchicago.edu
> > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Ketan
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > Yadu Nand B
> > >
> > >
> >
>
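
For anyone else hitting the "Requested reservation is invalid" error from
sbatch: below is a minimal sketch of the workaround David described in the
quoted mail (unset SBATCH_RESERVATION and comment out the export line in
setup.sh). The function name and the temp-file handling are my own
assumptions, not something that ships with the demo.

```shell
#!/bin/sh
# Sketch: stop sbatch from picking up a stale reservation.
# disable_reservation is a hypothetical helper; pass it the path to setup.sh.
disable_reservation() {
    setup_file=$1
    # Drop the variable from the current shell so sbatch no longer sees it.
    unset SBATCH_RESERVATION
    # Comment out any "export SBATCH_RESERVATION=..." line in the setup script.
    tmp="$setup_file.tmp.$$"
    sed 's/^\([[:space:]]*export SBATCH_RESERVATION=\)/# \1/' "$setup_file" > "$tmp" || return 1
    # Write back in place so the original file's permissions are kept.
    cat "$tmp" > "$setup_file" && rm -f "$tmp"
}
```

It can be run as: disable_reservation ./setup.sh
After that, "echo ${SBATCH_RESERVATION:-unset}" should print "unset".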



-- 
Yadu Nand B
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Midway-Modis-UC3-Beagle
Type: application/octet-stream
Size: 13753 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20130321/0a22169f/attachment.obj>

