[Swift-devel] scheduler stuff for Google Summer of Code 2009

Mihael Hategan hategan at mcs.anl.gov
Wed Feb 11 15:33:48 CST 2009


----- Ben Clifford <benc at hawaga.org.uk> wrote:
> 
> On Wed, 11 Feb 2009, Michael Wilde wrote:
> 
> > - scaling swift to 1M+ task workflows, efficiently (streaming the 
> > mappers)
> 
> There's more to this than simple streaming mappers.
> 
> At the moment, everything is built around having a Java object in memory 
> for every piece of data that can be referenced, and that object tends to 
> stick around for a long time (at least as long as that data can be 
> referenced). For example, if you have an array which has a large number of 
> elements, then each of those elements has at least one object in memory 
> representing it, because as long as you have the array in scope, you can 
> say a[1] or a[anything] and thus get to every element.

I do not think this is the bottleneck here. Every application invocation
gets its own karajan thread, and the problem seems to be that each such
thread takes around 10-20k of memory. By contrast, a piece of Swift data
probably takes less than 1k.

So I think that one order of magnitude improvement could be achieved by 
addressing that 10-20k problem (or by somehow having fewer karajan threads).
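
To put rough numbers on that, here is a back-of-the-envelope sketch for a
1M-task run. It only restates the estimates above and is not a measurement.
It assumes "k" means 1000 bytes, that all 1M karajan threads are alive at
once, and (purely for illustration) one piece of Swift data per task. The
names are made up for this example.

```python
# Back-of-the-envelope memory estimate for a 1M-task workflow.
# All sizes are the rough guesses from this message, not measured values.

TASKS = 1_000_000            # "1M+ task workflows"
THREAD_BYTES_LOW = 10_000    # ~10k per karajan thread (low estimate)
THREAD_BYTES_HIGH = 20_000   # ~20k per karajan thread (high estimate)
DATA_BYTES = 1_000           # "less than 1k" per piece of data, used as an upper bound


def total_gb(count, bytes_each):
    """Total footprint in decimal gigabytes for `count` items of `bytes_each` bytes."""
    return count * bytes_each / 1e9


# Assumption: one live karajan thread and one data item per task.
threads_low = total_gb(TASKS, THREAD_BYTES_LOW)
threads_high = total_gb(TASKS, THREAD_BYTES_HIGH)
data = total_gb(TASKS, DATA_BYTES)

print(f"karajan threads: {threads_low:.0f}-{threads_high:.0f} GB")
print(f"Swift data:      at most {data:.0f} GB")
print(f"ratio:           {threads_low / data:.0f}x-{threads_high / data:.0f}x")
```

Under those assumptions the threads account for 10-20 GB against at most
1 GB for the data, which is where the order-of-magnitude figure comes from.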

> 
> The in-memory implementation of the data model and anything that touches 
> it would need some fairly serious work to cope with having stuff kept out 
> of core; and I think keeping stuff out of core is something that would 
> need to happen.
> 
> (that is, 'streaming mappers' as a phrase seems to deal with "not getting 
> knowledge about data too fast" but does not deal with "forgetting 
> knowledge about data fast enough")
> 



