[Swift-devel] queuedsize > 0 but no job dequeued

Ketan Maheshwari ketancmaheshwari at gmail.com
Wed Aug 31 09:04:26 CDT 2011


Mihael,

I reran with debug logging enabled on coasters. The logs and related files
for this run are here:

http://www.ci.uchicago.edu/~ketan/run25.tgz<http://www.ci.uchicago.edu/~ketan/run23.tgz>


Note that the run went well at first and got through up to 20k jobs without
issues. After that I could not get nodes, so I stopped it and resumed it this
morning. The resumed run got through 1000+ jobs and then crashed with the same
error message.


Regards,
Ketan

On Tue, Aug 30, 2011 at 3:05 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:

> Any chance you can re-run this with debug enabled on coasters
> (log4j.logger.org.globus.cog.abstraction.coaster=DEBUG)?
>
> On Mon, 2011-08-29 at 20:55 -0700, Mihael Hategan wrote:
> > My bad. The info is in the swift log.
> >
> > On Mon, 2011-08-29 at 20:59 -0500, Ketan Maheshwari wrote:
> > > This is on Beagle. I am running local:pbs from /lustre.
> > >
> > > On Mon, Aug 29, 2011 at 8:30 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > >         On Mon, 2011-08-29 at 19:52 -0500, Ketan Maheshwari wrote:
> > >         > Mihael,
> > >         >
> > >         >
> > >         > This run was with automatic coasters. I do not see any specific
> > >         > coasters.log file written during this run in .globus/coaster nor in
> > >         > the run's work dir.
> > >
> > >
> > >         It's on the remote site in .globus/coasters.
> > >
> > >         >
> > >         >
> > >         > Ketan
> > >         >
> > >         > On Mon, Aug 29, 2011 at 7:16 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > >         >         Can I have the coasters log please?
> > >         >
> > >         >
> > >         >         On Sun, 2011-08-28 at 16:47 -0500, Ketan Maheshwari wrote:
> > >         >         > Hello,
> > >         >         >
> > >         >         >
> > >         >         > I remember this error happened in the past with Glen's and Sheri's
> > >         >         > runs. I saw this today again on Beagle with 0.93 while running the
> > >         >         > DSSAT run.
> > >         >         >
> > >         >         >
> > >         >         > The run stops with the following complete message:
> > >         >         >
> > >         >         >
> > >         >         > queuedsize > 0 but no job dequeued. Queued: {}
> > >         >         > java.lang.Throwable
> > >         >         >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:269)
> > >         >         >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:539)
> > >         >         >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:110)
> > >         >         > queuedsize > 0 but no job dequeued. Queued: {}
> > >         >         > java.lang.Throwable
> > >         >         >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:269)
> > >         >         >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:539)
> > >         >         >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:110)
> > >         >         > Progress:  time: Sun, 28 Aug 2011 13:34:26 -0600  Submitted:76  Active:23  Checking status:1  Finished successfully:597
> > >         >         >
> > >         >         >
> > >         >         >
> > >         >         >
> > >         >         > The logs, properties and sources for this run are:
> > >         >         > http://www.ci.uchicago.edu/~ketan/run23.tgz
> > >         >         >
> > >         >         >
> > >         >         > Regards,
> > >         >         > --
> > >         >         > Ketan
> > >         >         >
> > >         >         >
> > >         >         >
> > >         >
> > >         >         > _______________________________________________
> > >         >         > Swift-devel mailing list
> > >         >         > Swift-devel at ci.uchicago.edu
> > >         >         > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >         >
> > >         >
> > >         >
> > >         >
> > >         >
> > >         >
> > >         > --
> > >         > Ketan
> > >         >
> > >         >
> > >         >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Ketan
> > >
> > >
> >
> >
>
>
>


-- 
Ketan