[Swift-devel] recent error on beagle

Mihael Hategan hategan at mcs.anl.gov
Thu Jun 2 15:39:42 CDT 2011


Yes. Sorry about the delay. The word is that I need to backport the
patch from trunk to 0.92 and then make a patch release. I was waiting
for word from other folks, and I got it yesterday. I will do this as
soon as I have some time, probably sometime between today and next
Tuesday.

Mihael

On Thu, 2011-06-02 at 15:24 -0500, Tim Armstrong wrote:
> Any word on this bug?  I have a nice use-case for SwiftR where it
> would be very handy to take advantage of Swift's dynamic resource
> procurement.
> 
> - Tim
> 
> On Thu, May 26, 2011 at 3:41 PM, Mihael Hategan <hategan at mcs.anl.gov>
> wrote:
>         Given that this has now been reported a number of times, it may
>         make sense to backport the fix from trunk and make a patch
>         release for 0.92.
>         
>         Objections?
>         
>         
>         On Thu, 2011-05-26 at 14:59 -0500, Tim Armstrong wrote:
>         > Hi,
>         > I've encountered this issue with SwiftR, running release 0.92
>         > from the svn repository.  The issue occurs when
>         > GLOBUS::maxWallTime="03:55:00" in tc and maxTime is 4 hours in
>         > sites.xml.  After 5 minutes (or whatever the difference is
>         > between the two times), I get the exception copied below.  A
>         > tarball is attached with the logs, script, etc.  replicate.sh
>         > shows how to replicate the issue on PADS.
>         >
>         > Assuming that my problem is the same as the others, it would be
>         > good if the fix could be merged to release 0.92, as I'm trying
>         > to bundle stable swift releases with SwiftR.
>         >
>         > - Tim
>         >
>         >
>         > Swift svn swift-r4336 cog-r3096 (cog modified locally)
>         >
>         > RunID: 20110526-1317-2c8ybi10
>         > Progress:
>         > SwiftScript trace: top of loop: rserver waiting for input on, /tmp/nbest/SwiftR/swift.0827/requestpipe
>         > Progress:  Active:1
>         > Progress:  Finished successfully:1
>         > SwiftScript trace: rserver: got dir, /tmp/nbest/SwiftR/requests.P09626/R0000007
>         > Progress:  uninitialized:1  Finished successfully:1
>         > Progress:  Submitted:1  Finished successfully:1
>         > Progress:  Active:1  Finished successfully:1
>         > Progress:  Active:1  Finished successfully:1
>         > Progress:  Active:1  Finished successfully:1
>         > Progress:  Active:1  Finished successfully:1
>         > Progress:  Active:1  Finished successfully:1
>         > Progress:  Active:1  Finished successfully:1
>         > Progress:  Active:1  Finished successfully:1
>         > Progress:  Active:1  Finished successfully:1
>         > Progress:  Active:1  Finished successfully:1
>         > queuedsize > 0 but no job dequeued. Queued: {}
>         > java.lang.Throwable
>         >         at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)
>         >         at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)
>         >         at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
>         > queuedsize > 0 but no job dequeued. Queued: {}
>         > java.lang.Throwable
>         >         at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)
>         >         at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)
>         >         at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
>         > Progress:  Finished successfully:1 Failed but can retry:1
>         >
>         >
>         > On Sun, May 22, 2011 at 1:51 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>         >         The second one looks to me like a coaster problem.  Can't
>         >         say much about the first issue.
>         >
>         >         Can you try with plain pbs if you want to test the pbs
>         >         provider?
>         >
>         >         Mihael
>         >
>         >
>         >         On Sun, 2011-05-22 at 08:39 -0500, ketan wrote:
>         >         > I can confirm that trunk is not usable for the pbs
>         >         > provider. I am using trunk for submitting jobs on beagle
>         >         > and I see a few unexpected things:
>         >         >
>         >         > 1. The stderr is showing inconsistent messages: The
>         >         > results are getting written to the output even though
>         >         > stderr doesn't report any.
>         >         > 2. qsub jobs being cancelled inadvertently: I submitted
>         >         > 40 of them yesterday, however, only 2 survived today. The
>         >         > log is here:
>         >         >
>         >         > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-0337-pokpgg89.log
>         >         >
>         >         > In addition, the ssh-pbs provider does not seem to be
>         >         > working for large runs (it worked for a small number of
>         >         > test runs): Getting unexpected stdouts. Following is the
>         >         > stdout:
>         >         >
>         >         > http://www.ci.uchicago.edu/~ketan/files/ssh-pbs.stdout
>         >         >
>         >         > Following is the log file for the above run:
>         >         >
>         >         > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-1750-b0cot9sa.log
>         >         >
>         >         >
>         >         > Ketan
>         >         >
>         >         > On 5/21/11 5:12 PM, Michael Wilde wrote:
>         >         > >
>         >         > > ----- Original Message -----
>         >         > >> On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote:
>         >         > >>> as I mentioned, I've been running with Mike's swift
>         >         > >>> which was patched for beagle. are all the things that
>         >         > >>> make running on beagle work in trunk?
>         >         > >> No idea.
>         >         > >>
>         >         > >> Mike?
>         >         > > Justin, working with Ketan, just applied changes to
>         >         > > trunk which should make it work now on Beagle (or any
>         >         > > Cray XT5+ or XE).  This uses a different set of
>         >         > > sites.xml tags than the prototype in the current Beagle
>         >         > > swift 0.92.1 module.  Justin has a note on this at:
>         >         > >
>         >         > > https://sites.google.com/site/swiftdevel/sites/pbs/cray
>         >         > >
>         >         > > It was working before for one-node worker jobs; now it
>         >         > > should work for multi-node worker jobs as well.
>         >         > >
>         >         > > Justin and Ketan should comment on the state of testing
>         >         > > and readiness of this trunk feature.  Don't try trunk on
>         >         > > Beagle till they give the go-ahead.
>         >         > >
>         >         > > - Mike
>         >         > >
>         >         > >>> If so i'll update to the latest and test. I don't think
>         >         > >>> I'm using stable...
>         >         > >> Ok
>         >         > >>
>         >         > >> Mihael
>         >         > _______________________________________________
>         >         > Swift-devel mailing list
>         >         > Swift-devel at ci.uchicago.edu
>         >         > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>         >
>         >
>         >
>         
>         
>         
> 




