[Swift-devel] recent error on beagle
Mihael Hategan
hategan at mcs.anl.gov
Thu Jun 2 15:39:42 CDT 2011
Yes. Sorry about the delay. The plan is for me to backport the
patch from trunk to 0.92 and then make a patch release. I was waiting
for word from other folks, and I got it yesterday. I will do this
as soon as I have some time, probably sometime between today and
next Tuesday.
Mihael
On Thu, 2011-06-02 at 15:24 -0500, Tim Armstrong wrote:
> Any word on this bug? I have a nice use-case for SwiftR where it
> would be very handy to take advantage of Swift's dynamic resource
> procurement.
>
> - Tim
>
> On Thu, May 26, 2011 at 3:41 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> Given that this has now been reported a number of times, it may make
> sense to backport the fix from trunk and make a patch release for 0.92.
>
> Objections?
>
>
> On Thu, 2011-05-26 at 14:59 -0500, Tim Armstrong wrote:
> > Hi,
> > I've encountered this issue with SwiftR, running release 0.92 from
> > the svn repository. The issue occurs when
> > GLOBUS::maxWallTime="03:55:00" in tc and maxTime is 4 hours in
> > sites.xml. After 5 minutes (or whatever the difference is between the
> > two times), I get the exception copied below. A tarball is attached
> > with the logs, script, etc. replicate.sh shows how to replicate the
> > issue on PADS.
> >
> > Assuming that my problem is the same as the others, it would be good
> > if the fix could be merged to release 0.92, as I'm trying to bundle
> > stable swift releases with SwiftR.
> >
> > - Tim
> >
> >
> > Swift svn swift-r4336 cog-r3096 (cog modified locally)
> >
> > RunID: 20110526-1317-2c8ybi10
> > Progress:
> > SwiftScript trace: top of loop: rserver waiting for input on, /tmp/nbest/SwiftR/swift.0827/requestpipe
> > Progress: Active:1
> > Progress: Finished successfully:1
> > SwiftScript trace: rserver: got dir, /tmp/nbest/SwiftR/requests.P09626/R0000007
> > Progress: uninitialized:1 Finished successfully:1
> > Progress: Submitted:1 Finished successfully:1
> > Progress: Active:1 Finished successfully:1
> > Progress: Active:1 Finished successfully:1
> > Progress: Active:1 Finished successfully:1
> > Progress: Active:1 Finished successfully:1
> > Progress: Active:1 Finished successfully:1
> > Progress: Active:1 Finished successfully:1
> > Progress: Active:1 Finished successfully:1
> > Progress: Active:1 Finished successfully:1
> > Progress: Active:1 Finished successfully:1
> > queuedsize > 0 but no job dequeued. Queued: {}
> > java.lang.Throwable
> >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> > queuedsize > 0 but no job dequeued. Queued: {}
> > java.lang.Throwable
> >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)
> >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> > Progress: Finished successfully:1 Failed but can retry:1
> >
> >
> > On Sun, May 22, 2011 at 1:51 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > The second one looks to me like a coaster problem. Can't say much about
> > the first issue.
> >
> > Can you try with plain pbs if you want to test the pbs provider?
> >
> > Mihael
> >
> >
> > On Sun, 2011-05-22 at 08:39 -0500, ketan wrote:
> > > I can confirm that the trunk is not usable for the pbs provider. I am
> > > using trunk for submitting jobs on beagle and I see a few unexpected
> > > things:
> > >
> > > 1. The stderr is showing inconsistent messages: The results are getting
> > > written to the output even though stderr doesn't report any.
> > > 2. qsub jobs being cancelled inadvertently: I submitted 40 of them
> > > yesterday, however, only 2 survived today. The log is here:
> > >
> > > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-0337-pokpgg89.log
> > >
> > > In addition, the ssh-pbs provider does not seem to be working for large
> > > runs (it worked for a small number of test runs): Getting unexpected
> > > stdouts. Following is the stdout:
> > >
> > > http://www.ci.uchicago.edu/~ketan/files/ssh-pbs.stdout
> > >
> > > Following is the log file for the above run:
> > >
> > > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-1750-b0cot9sa.log
> > >
> > >
> > > Ketan
> > >
> > > On 5/21/11 5:12 PM, Michael Wilde wrote:
> > > >
> > > > ----- Original Message -----
> > > >> On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote:
> > > >>> as I mentioned, I've been running with Mike's swift which was
> > > >>> patched for beagle. are all the things that make running on
> > > >>> beagle work in trunk?
> > > >> No idea.
> > > >>
> > > >> Mike?
> > > > Justin, working with Ketan, just applied changes to trunk which
> > > > should make it work now on Beagle (or any Cray XT5+ or XE). This
> > > > uses a different set of sites.xml tags than the prototype in the
> > > > current Beagle swift 0.92.1 module. Justin has a note on this at:
> > > >
> > > > https://sites.google.com/site/swiftdevel/sites/pbs/cray
> > > >
> > > > It was working before for one-node worker jobs; now it should work
> > > > for multi-node worker jobs as well.
> > > >
> > > > Justin and Ketan should comment on the state of testing and
> > > > readiness of this trunk feature. Don't try trunk on Beagle till
> > > > they give the go-ahead.
> > > >
> > > > - Mike
> > > >
> > > >>> If so i'll update to the latest and test. I don't think I'm
> > > >>> using stable...
> > > >> Ok
> > > >>
> > > >> Mihael
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel