[Swift-devel] coasters about half the jobs

Mihael Hategan hategan at mcs.anl.gov
Fri Feb 18 22:35:48 CST 2011


And sorry about that.

r3053 should fix that.

On Fri, 2011-02-18 at 20:01 -0800, Mihael Hategan wrote:
> Thanks.
> 
> On Fri, 2011-02-18 at 21:58 -0600, Michael Wilde wrote:
> > It fails for 10- and 1-job runs as well.
> > 
> > - Mike
> > 
> > 
> > ----- Original Message -----
> > > Just tried this on Beagle with a similar workload to the one that showed
> > > the original problem. I got:
> > > 
> > > Progress: Stage in:2486 Submitting:14
> > > Progress: Stage in:1712 Submitting:787 Submitted:1
> > > queuedsize > 0 but no job dequeued. Queued: {}
> > > java.lang.Throwable
> > > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253)
> > > 
> > > Logs are in:
> > > 
> > > login1$ cat out.pdb.all.00
> > > Swift svn swift-r4061 (swift modified locally) cog-r3052 (cog modified locally)
> > > 
> > > Output on stdout/err is below.
> > > 
> > > Thanks!
> > > 
> > > Mike
> > > 
> > > RunID: 20110218-2137-v87vupcc
> > > Progress:
> > > SwiftScript trace: 10gs-1
> > > SwiftScript trace: 1a1u-1
> > > SwiftScript trace: 1m3g-1
> > > SwiftScript trace: 1a1x-1
> > > SwiftScript trace: 1a1m-1
> > > SwiftScript trace: 1a12-1
> > > SwiftScript trace: 1m62-1
> > > SwiftScript trace: 1a22-1
> > > SwiftScript trace: 121p-1
> > > SwiftScript trace: 1a4p-1
> > > SwiftScript trace: 1m6b-1
> > > SwiftScript trace: 1m7b-1
> > > SwiftScript trace: 1m9i-1
> > > SwiftScript trace: 1mi1-1
> > > SwiftScript trace: 1m6b-2
> > > SwiftScript trace: 1a22-2
> > > SwiftScript trace: 1mfg-1
> > > SwiftScript trace: 1m9j-1
> > > SwiftScript trace: 1a1w-1
> > > SwiftScript trace: 1mdi-1
> > > SwiftScript trace: 1mq1-1
> > > SwiftScript trace: 1mp1-1
> > > SwiftScript trace: 1mq0-1
> > > SwiftScript trace: 1mk3-1
> > > SwiftScript trace: 1mj4-1
> > > SwiftScript trace: 1mil-1
> > > SwiftScript trace: 1mr1-1
> > > SwiftScript trace: 1nbq-1
> > > SwiftScript trace: 1mr8-1
> > > SwiftScript trace: 1mr1-2
> > > SwiftScript trace: 1n4m-2
> > > SwiftScript trace: 1n83-1
> > > SwiftScript trace: 1mm2-1
> > > SwiftScript trace: 1nd7-1
> > > SwiftScript trace: 1nm8-1
> > > SwiftScript trace: 1n4m-3
> > > SwiftScript trace: 1nfi-2
> > > SwiftScript trace: 1nou-2
> > > SwiftScript trace: 1nou-1
> > > SwiftScript trace: 1nfi-1
> > > SwiftScript trace: 1o5e-1
> > > SwiftScript trace: 1o6u-2
> > > SwiftScript trace: 1nty-1
> > > SwiftScript trace: 1mx3-1
> > > SwiftScript trace: 1n3u-2
> > > SwiftScript trace: 1muz-1
> > > SwiftScript trace: 1o86-1
> > > SwiftScript trace: 1n3u-1
> > > SwiftScript trace: 1oa8-1
> > > SwiftScript trace: 1oc0-1
> > > Progress: uninitialized:3
> > > Progress: Initializing:1311 Selecting site:1189
> > > Progress: Selecting site:2499 Initializing site shared directory:1
> > > Progress: Selecting site:2340 Initializing site shared directory:1 Stage in:159
> > > Progress: Stage in:2486 Submitting:14
> > > Progress: Stage in:1712 Submitting:787 Submitted:1
> > > queuedsize > 0 but no job dequeued. Queued: {}
> > > java.lang.Throwable
> > > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253)
> > > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:521)
> > > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> > > queuedsize > 0 but no job dequeued. Queued: {}
> > > java.lang.Throwable
> > > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253)
> > > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:521)
> > > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> > > login1$ finger kelly
> > > 
> > > 
> > > Logs are on CT net in /home/wilde/mp/mp04:
> > > cp ftdock-20110218-2137-v87vupcc.log out.pdb.all.00 ~/mp/mp04/
> > > 
> > > - Mike
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > > There was a bug in the block allocation scheme that would cause blocks
> > > > to be kept, in the long run, at about half of what would normally be
> > > > necessary. This included shutting down perfectly good blocks that could
> > > > be used for jobs. The effect was more dramatic when the maximum block
> > > > size was 1.
> > > >
> > > > I committed a fix for this in the stable branch (cog r3052). If you've
> > > > experienced the above, you could give this a try. It would also be
> > > > helpful if you gave it a try anyway, just to check if things are going
> > > > ok.
> > > >
> > > > Mihael
> > > >
> > > > _______________________________________________
> > > > Swift-devel mailing list
> > > > Swift-devel at ci.uchicago.edu
> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > 
> > > --
> > > Michael Wilde
> > > Computation Institute, University of Chicago
> > > Mathematics and Computer Science Division
> > > Argonne National Laboratory
> > > 
> > 
> 
> 




