[Swift-devel] Data-aware scheduling in Swift ?

Matei Ripeanu matei.ripeanu at gmail.com
Fri Mar 30 06:04:03 CDT 2012


Emalayan, Mike, Justin, all,

 

There are a number of points worth discussing before we fully embark on
this:

 

First:  We need to better understand what gains we expect to have on
BG/P from locality.  We know we have sizeable gains on our cluster with data
stored on disk (and where we have much lower cross-section bandwidth).  I
expect that most of these gains are preserved when we use RAM disks on our
cluster, and that they will remain as long as we do not have to transfer huge
volumes of data.  Unfortunately we can test this only with 20 nodes - I have
no good intuition about what will happen on BG/P at large scale.

 

Second: We should discuss how critical it is to have this feature in Swift on
BG/P for all the other points we want to prove for the paper.  I think support
for only one of the patterns we want to optimize with the cross-layer
communication (e.g., the one for broadcast) can be demonstrated without it,
while the other two (pipelines and gather) cannot.  On the other hand,
is there a way to run our benchmark scripts on BG/P (I guess not) to
demonstrate the potential gains if Swift implemented that? Or can we run
(some of) the applications without Swift on our cluster?

 

Third:  I am afraid getting this functionality into Swift/Coasters is quite
some work.  On the other hand, Mike suggests a relatively clear
implementation path. (It will probably work for pipelines but I'm not sure
it will work for 'gather'.)

 

What I suggest:  Let's discuss three things among ourselves before
embarking on changing Swift/Coasters:  (1) how to increase the
certainty that we'll see performance gains if we implement this; (2)
whether there are ways to demonstrate (some of) what we want outside
Swift; (3) the schedule and priorities, which we should re-evaluate - we
have roughly four weeks to the deadline.

 

Let me know what you think,

 

-Matei

From: Emalayan Vairavanathan [mailto:svemalayan at yahoo.com] 
Sent: March-29-12 7:06 PM
To: mosastore at googlegroups.com; matei
Cc: swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] Data-aware scheduling in Swift ?

 

Thank you Jon, Mike and Justin.

 

Having this functionality would be really useful for us to demonstrate how
useful extended attributes are in MosaStore in the long term. Further, for our
SC paper this is a critical functionality and we need it to support both the
pipeline and reduce patterns.

 

Mike: I will be happy to help with this. In terms of effort and priorities,
how much time do we need to spend to get this done? Is it feasible to target
this for our SC paper?

 

Justin: We do have numbers for the difference between a local MosaStore
access and a remote access on our cluster. This is what we published at
CCGrid 2012 (I have attached the paper). But we do not have numbers on BG/P.
I can try it on BG/P and get back to you.

 

Matei: Do you have any suggestions?

 

Thank you

Emalayan

 

  _____  

From: Michael Wilde <wilde at mcs.anl.gov>
To: Emalayan Vairavanathan <svemalayan at yahoo.com> 
Cc: MosaStore <mosastore at googlegroups.com>; swift-devel at ci.uchicago.edu 
Sent: Thursday, 29 March 2012 9:15 AM
Subject: Re: [Swift-devel] Data-aware scheduling in Swift ?


Swift will place an app() call on any free node. (As Jon just replied, while
I was writing this...)

If we want to do an experiment with some kind of data affinity, we can try
the following hack:

- Stage-A returns the node that it ran on
- swift script passes that as an arg "preferredNode(nodeName)" to Stage-B
- scheduler tries to place Stage-B on the coaster named nodeName.

It's that last part that's the trickiest, as this will require a mod to the
scheduler. And it gets trickier if the scheduler needs to try to defer
Stage-B until nodeName can take a new job.  It *might* be easier, in a first
pass, to only place Stage-B on nodeName if nodeName has a free job slot,
else to place it anywhere.
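
The first-pass policy above can be sketched roughly as follows. This is only
an illustration in Python, not Swift or Coasters code: the function
`pick_node`, the `free_slots` map, and the node names are all hypothetical
stand-ins for whatever state the coaster scheduler actually keeps.

```python
from typing import Dict, Optional


def pick_node(free_slots: Dict[str, int],
              preferred: Optional[str] = None) -> Optional[str]:
    """Return the node to run the next job on, or None if no slot is free.

    free_slots maps a node name to its number of free job slots
    (a hypothetical stand-in for the scheduler's real bookkeeping).
    """
    # Data affinity: use the preferred node if it can take a job right now.
    if preferred is not None and free_slots.get(preferred, 0) > 0:
        chosen = preferred
    else:
        # Fallback: any node with a free slot, i.e. no affinity.
        chosen = next((n for n, s in free_slots.items() if s > 0), None)
    if chosen is not None:
        free_slots[chosen] -= 1  # occupy one slot on the chosen node
    return chosen


# Stage-A runs anywhere and reports the node it ran on.
slots = {"node1": 1, "node2": 2}
node_a = pick_node(slots)
slots[node_a] += 1  # Stage-A finishes and frees its slot

# Stage-B asks for the node that holds Stage-A's output.
node_b = pick_node(slots, preferred=node_a)
```

Note that this sketch never defers Stage-B: if the preferred node is busy,
the job simply goes elsewhere and loses locality. Deferring until nodeName
frees up would need a wait queue per node, which is the harder variant.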

But all of this will require going into the coaster scheduler code.

I suggest we do this as a joint effort; I can try, with help from Mihael and
Justin, to locate the code that we'd need to modify, if you are willing to
do some experiments and hacking.

- Mike


----- Original Message -----
> From: "Emalayan Vairavanathan" <svemalayan at yahoo.com>
> To: swift-devel at ci.uchicago.edu
> Cc: "MosaStore" <mosastore at googlegroups.com>
> Sent: Thursday, March 29, 2012 10:59:41 AM
> Subject: [Swift-devel] Data-aware scheduling in Swift ?
> Hi All,
> 
> 
> I have a question about how Swift schedules computations.
> 
> 
> Suppose there are two computation stages, namely Stage-A and Stage-B, in
> an application. Stage-A produces the data and Stage-B consumes the
> data. Could you please tell me how Swift schedules these
> computations? Does it schedule Stage-A and Stage-B on the same node
> or on multiple nodes?
> Is it possible to configure Swift to schedule these computations
> on the same node (or is this the default behavior of Swift)?
> 
> 
> 
> 
> Thank you
> Emalayan
> 
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
