[Swift-devel] coasters about half the jobs
Michael Wilde
wilde at mcs.anl.gov
Fri Feb 18 21:58:48 CST 2011
It fails for 10- and 1-job runs as well.
- Mike
----- Original Message -----
> Just tried this on Beagle with a similar workload to the one that shows
> the original problem. I got:
>
> Progress: Stage in:2486 Submitting:14
> Progress: Stage in:1712 Submitting:787 Submitted:1
> queuedsize > 0 but no job dequeued. Queued: {}
> java.lang.Throwable
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253)
>
> Logs are in:
>
> login1$ cat out.pdb.all.00
> Swift svn swift-r4061 (swift modified locally) cog-r3052 (cog modified locally)
>
> Output on stdout/err is below.
>
> Thanks!
>
> Mike
>
> RunID: 20110218-2137-v87vupcc
> Progress:
> SwiftScript trace: 10gs-1
> SwiftScript trace: 1a1u-1
> SwiftScript trace: 1m3g-1
> SwiftScript trace: 1a1x-1
> SwiftScript trace: 1a1m-1
> SwiftScript trace: 1a12-1
> SwiftScript trace: 1m62-1
> SwiftScript trace: 1a22-1
> SwiftScript trace: 121p-1
> SwiftScript trace: 1a4p-1
> SwiftScript trace: 1m6b-1
> SwiftScript trace: 1m7b-1
> SwiftScript trace: 1m9i-1
> SwiftScript trace: 1mi1-1
> SwiftScript trace: 1m6b-2
> SwiftScript trace: 1a22-2
> SwiftScript trace: 1mfg-1
> SwiftScript trace: 1m9j-1
> SwiftScript trace: 1a1w-1
> SwiftScript trace: 1mdi-1
> SwiftScript trace: 1mq1-1
> SwiftScript trace: 1mp1-1
> SwiftScript trace: 1mq0-1
> SwiftScript trace: 1mk3-1
> SwiftScript trace: 1mj4-1
> SwiftScript trace: 1mil-1
> SwiftScript trace: 1mr1-1
> SwiftScript trace: 1nbq-1
> SwiftScript trace: 1mr8-1
> SwiftScript trace: 1mr1-2
> SwiftScript trace: 1n4m-2
> SwiftScript trace: 1n83-1
> SwiftScript trace: 1mm2-1
> SwiftScript trace: 1nd7-1
> SwiftScript trace: 1nm8-1
> SwiftScript trace: 1n4m-3
> SwiftScript trace: 1nfi-2
> SwiftScript trace: 1nou-2
> SwiftScript trace: 1nou-1
> SwiftScript trace: 1nfi-1
> SwiftScript trace: 1o5e-1
> SwiftScript trace: 1o6u-2
> SwiftScript trace: 1nty-1
> SwiftScript trace: 1mx3-1
> SwiftScript trace: 1n3u-2
> SwiftScript trace: 1muz-1
> SwiftScript trace: 1o86-1
> SwiftScript trace: 1n3u-1
> SwiftScript trace: 1oa8-1
> SwiftScript trace: 1oc0-1
> Progress: uninitialized:3
> Progress: Initializing:1311 Selecting site:1189
> Progress: Selecting site:2499 Initializing site shared directory:1
> Progress: Selecting site:2340 Initializing site shared directory:1 Stage in:159
> Progress: Stage in:2486 Submitting:14
> Progress: Stage in:1712 Submitting:787 Submitted:1
> queuedsize > 0 but no job dequeued. Queued: {}
> java.lang.Throwable
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253)
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:521)
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> queuedsize > 0 but no job dequeued. Queued: {}
> java.lang.Throwable
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253)
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:521)
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> login1$ finger kelly
>
>
> Logs are on CT net in /home/wilde/mp/mp04:
> cp ftdock-20110218-2137-v87vupcc.log out.pdb.all.00 ~/mp/mp04/
>
> - Mike
>
>
>
> ----- Original Message -----
> > There was a bug in the block allocation scheme that would cause blocks
> > to be kept, in the long run, at about half of what would normally be
> > necessary. This included shutting down perfectly good blocks that could
> > be used for jobs. The effect was more dramatic when the maximum block
> > size was 1.
> >
> > I committed a fix for this in the stable branch (cog r3052). If you've
> > experienced the above, you could give this a try. It would also be
> > helpful if you gave it a try anyway, just to check if things are going
> > ok.
> >
> > Mihael
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory