[Swift-devel] coasters about half the jobs

Mihael Hategan hategan at mcs.anl.gov
Fri Feb 18 22:01:59 CST 2011


Thanks.

On Fri, 2011-02-18 at 21:58 -0600, Michael Wilde wrote:
> It fails for 10- and 1-job runs as well.
> 
> - Mike
> 
> 
> ----- Original Message -----
> > Just tried this on Beagle with a workload similar to the one that shows
> > the original problem. I got:
> > 
> > Progress: Stage in:2486 Submitting:14
> > Progress: Stage in:1712 Submitting:787 Submitted:1
> > queuedsize > 0 but no job dequeued. Queued: {}
> > java.lang.Throwable
> > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253)
> > 
> > Logs are in:
> > 
> > login1$ cat out.pdb.all.00
> > Swift svn swift-r4061 (swift modified locally) cog-r3052 (cog modified locally)
> > 
> > Output on stdout/err is below.
> > 
> > Thanks!
> > 
> > Mike
> > 
> > RunID: 20110218-2137-v87vupcc
> > Progress:
> > SwiftScript trace: 10gs-1
> > SwiftScript trace: 1a1u-1
> > SwiftScript trace: 1m3g-1
> > SwiftScript trace: 1a1x-1
> > SwiftScript trace: 1a1m-1
> > SwiftScript trace: 1a12-1
> > SwiftScript trace: 1m62-1
> > SwiftScript trace: 1a22-1
> > SwiftScript trace: 121p-1
> > SwiftScript trace: 1a4p-1
> > SwiftScript trace: 1m6b-1
> > SwiftScript trace: 1m7b-1
> > SwiftScript trace: 1m9i-1
> > SwiftScript trace: 1mi1-1
> > SwiftScript trace: 1m6b-2
> > SwiftScript trace: 1a22-2
> > SwiftScript trace: 1mfg-1
> > SwiftScript trace: 1m9j-1
> > SwiftScript trace: 1a1w-1
> > SwiftScript trace: 1mdi-1
> > SwiftScript trace: 1mq1-1
> > SwiftScript trace: 1mp1-1
> > SwiftScript trace: 1mq0-1
> > SwiftScript trace: 1mk3-1
> > SwiftScript trace: 1mj4-1
> > SwiftScript trace: 1mil-1
> > SwiftScript trace: 1mr1-1
> > SwiftScript trace: 1nbq-1
> > SwiftScript trace: 1mr8-1
> > SwiftScript trace: 1mr1-2
> > SwiftScript trace: 1n4m-2
> > SwiftScript trace: 1n83-1
> > SwiftScript trace: 1mm2-1
> > SwiftScript trace: 1nd7-1
> > SwiftScript trace: 1nm8-1
> > SwiftScript trace: 1n4m-3
> > SwiftScript trace: 1nfi-2
> > SwiftScript trace: 1nou-2
> > SwiftScript trace: 1nou-1
> > SwiftScript trace: 1nfi-1
> > SwiftScript trace: 1o5e-1
> > SwiftScript trace: 1o6u-2
> > SwiftScript trace: 1nty-1
> > SwiftScript trace: 1mx3-1
> > SwiftScript trace: 1n3u-2
> > SwiftScript trace: 1muz-1
> > SwiftScript trace: 1o86-1
> > SwiftScript trace: 1n3u-1
> > SwiftScript trace: 1oa8-1
> > SwiftScript trace: 1oc0-1
> > Progress: uninitialized:3
> > Progress: Initializing:1311 Selecting site:1189
> > Progress: Selecting site:2499 Initializing site shared directory:1
> > Progress: Selecting site:2340 Initializing site shared directory:1 Stage in:159
> > Progress: Stage in:2486 Submitting:14
> > Progress: Stage in:1712 Submitting:787 Submitted:1
> > queuedsize > 0 but no job dequeued. Queued: {}
> > java.lang.Throwable
> > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253)
> > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:521)
> > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> > queuedsize > 0 but no job dequeued. Queued: {}
> > java.lang.Throwable
> > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253)
> > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:521)
> > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> > login1$ finger kelly
> > 
> > 
> > Logs are on CT net in /home/wilde/mp/mp04:
> > cp ftdock-20110218-2137-v87vupcc.log out.pdb.all.00 ~/mp/mp04/
> > 
> > - Mike
> > 
> > 
> > 
> > ----- Original Message -----
> > > There was a bug in the block allocation scheme that would cause blocks
> > > to be kept, in the long run, at about half of what would normally be
> > > necessary. This included shutting down perfectly good blocks that could
> > > be used for jobs. The effect was more dramatic when the maximum block
> > > size was 1.
> > >
> > > I committed a fix for this in the stable branch (cog r3052). If you've
> > > experienced the above, you could give this a try. It would also be
> > > helpful if you tried it even if you haven't, just to check that things
> > > are going OK.
> > >
> > > Mihael
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > 
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> > 
> 
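The trace above reports a mismatch: a "queuedsize" counter is positive while the printed queue is empty ("{}"). As an illustration only, here is a hypothetical Java sketch of how such a consistency check can fire when queued jobs are tracked both as a collection and as a separately maintained counter, and one removal path forgets to update the counter. The class QueueSizeCheck and all of its methods are invented for this example; they are not the CoG BlockQueueProcessor code, and the sketch is not a diagnosis of the actual bug.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not taken from CoG's BlockQueueProcessor.
public class QueueSizeCheck {
    // Queued jobs keyed by a (hypothetical) job id, mapped to walltime in seconds.
    private final Map<String, Integer> queued = new LinkedHashMap<>();
    // Counter kept separately from the map; it drifts if a code path forgets it.
    private int queuedSize = 0;

    void enqueue(String jobId, int walltimeSeconds) {
        // Only count genuinely new jobs so a re-put does not inflate the counter.
        if (queued.put(jobId, walltimeSeconds) == null) {
            queuedSize++;
        }
    }

    // Correct removal path: removes the oldest job and decrements the counter.
    String dequeue() {
        Iterator<Map.Entry<String, Integer>> it = queued.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        String jobId = it.next().getKey();
        it.remove();
        queuedSize--;
        return jobId;
    }

    // Buggy removal path: drops the job from the map but leaves the counter alone.
    void removeWithoutUpdatingCounter(String jobId) {
        queued.remove(jobId);
    }

    // Consistency check modeled on the message seen in the log.
    String check() {
        if (queuedSize > 0 && queued.isEmpty()) {
            return "queuedsize > 0 but no job dequeued. Queued: " + queued;
        }
        return "ok";
    }

    public static void main(String[] args) {
        QueueSizeCheck q = new QueueSizeCheck();
        q.enqueue("job-1", 600);
        q.dequeue();
        System.out.println(q.check()); // prints "ok": counter and map agree
        q.enqueue("job-2", 600);
        q.removeWithoutUpdatingCounter("job-2");
        System.out.println(q.check()); // prints the mismatch message ending in "Queued: {}"
    }
}
```

The sketch uses a Map only because "{}" is how Java prints an empty Map (an empty List or Set prints as "[]"); the real data structure in the coaster service may differ.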




