[Swift-devel] recent error on beagle

Mihael Hategan hategan at mcs.anl.gov
Sat May 21 16:02:39 CDT 2011


On Sat, 2011-05-21 at 12:46 -0400, Glen Hocky wrote:
> Does anyone know what this error means? It just started happening on
> queuedsize > 0 but no job dequeued. Queued: {}
> java.lang.Throwable
>         at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)
>         at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)
>         at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)

This seems to be the same error that Sheri was seeing.

I committed a fix to trunk. The issue is that the accounting for running
jobs doesn't take the passage of time into consideration, whereas the
accounting for the allocated blocks does. As time goes by, things may
reach a state where there appear to be more running jobs than possible.
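
To illustrate the mismatch, here is a minimal sketch. It is not the
actual BlockQueueProcessor code; the class, the method names and the
worker-seconds unit are all hypothetical. It assumes a block's remaining
capacity shrinks as the clock advances, while a running job keeps being
counted at its full requested walltime.

```java
// Hypothetical illustration only; none of these names come from the coaster service.
final class CapacityDrift {
    /** Worker-seconds a block can still provide at time 'now'; shrinks as time passes. */
    static long blockRemaining(int workers, long start, long walltime, long now) {
        long end = start + walltime;
        return workers * Math.max(0L, end - now);
    }

    /** Job demand that ignores elapsed time: always the full requested walltime. */
    static long jobDemandStale(long walltime) {
        return walltime;
    }

    /** Job demand that is reduced by the time the job has already been running. */
    static long jobDemandAged(long start, long walltime, long now) {
        return Math.max(0L, start + walltime - now);
    }
}
```

With one worker in a 100 s block and a 60 s job, both started at t=0,
at t=50 the block has 50 worker-seconds left. The stale job count still
asks for 60, so the job looks like it no longer fits, while the aged
count asks for only 10.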

This can, however, also be triggered if for some reason the number of
workers ends up being larger than what the service thinks it's starting.
I suspect that in Sheri's run(s) that might also be the case.

Could you let me know if you are running with the stable branch? If
that's the case I will port the fix there too.

Mihael



