[Swift-devel] MapReduce, doubts

Michael Wilde wilde at mcs.anl.gov
Sun Aug 28 10:15:02 CDT 2011


----- Original Message -----
> From: "Yadu Nand" <yadudoc1729 at gmail.com>
> To: "swift-devel" <swift-devel at ci.uchicago.edu>, "Justin M Wozniak" <wozniak at mcs.anl.gov>, "Mihael Hategan"
> <hategan at mcs.anl.gov>, "Michael Wilde" <wilde at mcs.anl.gov>
> Sent: Sunday, August 28, 2011 8:03:29 AM
> Subject: MapReduce, doubts
> Hi,
> 
> I was going through some materials ([1], [2] , [3]) to understand
> Google's MapReduce system and I have a couple of queries :
> 
> 1. How do we address the issue of data locality ?
> When we run a map job, it is a priority to run it such that least
> network overhead is incurred, so preferably on the same system
> holding the data (or one which is nearest , I don't know how this
> works).

Currently, we don't. We have discussed a new feature to do this (it's listed as a GSoC project, and I can probably locate a discussion with a 2010 GSoC candidate in which I detailed a possible strategy).

We can currently implement a similar scheme using an external mapper to select input files from multiple sites and map them to gsiftp URIs. Then an enhancement in the scheduler could select a site based on the URIs of some or all of the input files.
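As a rough sketch of that scheme (the mapper script name and its output format here are purely illustrative, not an existing tool):

```
// Hypothetical external mapper: locate_replicas.sh would print one
// gsiftp:// URI per input file, chosen from among the sites holding
// replicas of that file.
file inputs[] <ext; exec="locate_replicas.sh">;

// The proposed scheduler enhancement would then parse the host out of
// each gsiftp://host/path URI and prefer that site when dispatching
// the job that consumes the file.
```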

> 2. Is it possible to somehow force the reduce tasks to wait till all
> map jobs are done ?

Isn't that just normal Swift semantics? If we coded a simple-minded reduce job whose input was the array of outputs from the map() stage, the reduce (assuming it's an app function) would wait for all the map() ops to finish, right?
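In Swift script that dependency falls out of the dataflow for free; a minimal sketch, where the app names, executables, and mapper patterns are all illustrative:

```
type file;

// Placeholder app wrappers; "map_app" and "reduce_app" stand in for
// real executables.
app (file out) map_app (file in) {
  map_app @filename(in) @filename(out);
}

app (file out) reduce_app (file parts[]) {
  reduce_app @filenames(parts) @filename(out);
}

file inputs[] <filesys_mapper; pattern="input*.dat">;
file mapped[] <simple_mapper; prefix="mapped.">;

foreach f, i in inputs {
  mapped[i] = map_app(f);
}

// reduce_app consumes the whole mapped[] array, so Swift's dataflow
// semantics make it wait until every map_app invocation has finished.
file result <"result.dat">;
result = reduce_app(mapped);
```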

I would ask instead "do we want to?" Do the distributed reduce ops in map-reduce really wait? Doesn't MR do distributed reduction in batches, asynchronously with the completion of the map() operations? Isn't this a key property made possible by the name/value pair-based nature of the MR data model? I thought MR reduce ops take place at any location, on any input chunk size, in a tree-based manner, and that this is possible because the reduction operator is associative (and typically commutative).
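If we did want that batched, tree-shaped behavior in Swift, an associative reduce could be applied in two levels. A hedged sketch, assuming an array mapped[] of map outputs, an illustrative reduce_app app function taking an array of files, and a hypothetical batch() helper that selects the b-th group of map outputs:

```
// Two-level (batched) reduction sketch; batch() is invented here to
// stand for whatever mechanism groups mapped[] into slices.
int nbatches = 4;
file partial[] <simple_mapper; prefix="partial.">;

foreach b in [0:nbatches-1] {
  // Each partial reduce can start as soon as its own batch of map
  // outputs is ready, independent of the other batches.
  partial[b] = reduce_app(batch(mapped, b));
}

// Final combine; valid only because the reduction operator is
// associative.
file total <"total.dat">;
total = reduce_app(partial);
```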

> The MapReduce uses a system which permits reduce to run only
> after all the map jobs are done executing. I'm not entirely sure why
> this is a requirement but this has its own issues, such as a single
> slow mapper. This is usually tackled by the main-controller noticing
> the slow one and running multiple instances of the map job to get
> results faster. Does swift at some level use the concept of a central
> controller ? How do we tackle this ?
> 
> 3. How does swift handle failures ? Is there a facility for
> re-execution ?

Yes, Swift retries failing app invocations, as controlled by the properties execution.retries and lazy.errors. You can read about these in the user guide and in the properties file.
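For reference, those knobs live in swift.properties; the values below are just an example:

```
# Retry each failing app invocation up to 3 additional times.
execution.retries=3

# Defer error reporting so independent work keeps running after a
# failure, rather than aborting the whole run immediately.
lazy.errors=true
```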

> Is this documented somewhere ? Do we use any file-system that
> handles loss of a particular file /input-set ?

No, we don't, but some of this would come with using a replication-based model for the input dataset, where the mapper could supply a list of possible inputs instead of one and the scheduler could pick a replica each time it selects a site for a (retried) job.

Also, we might think of a "forMostOf" statement that would implement semantics suitable for runs in which you don't need every single map() to complete, i.e. the target array could be considered closed when "most of" (threshold TBD) the input collection has been processed. The forMostOf() could complete when it enters the "tail" of the loop (see Tim Armstrong's paper on the tail phenomenon).
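Purely hypothetical syntax for that idea, to make it concrete (no such statement exists in Swift today, and the threshold parameter name is invented):

```
// Hypothetical: mapped[] is considered closed once 95% of the inputs
// have been processed; stragglers in the tail are abandoned.
forMostOf f, i in inputs, threshold=0.95 {
  mapped[i] = map_app(f);
}
```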

> I'm stopping here, there are more questions nagging me, but its
> probably best to not blurt it out all at once :)

I think you are hitting the right issues here, and I encourage you to keep pushing toward something that you could readily experiment with. This is exactly where we need to go to provide a convenient way to express map-reduce as an elegant high-level script.

I also encourage you to read about what Ed Walker did for map-reduce in his parallel shell.

- Mike

> [1] http://code.google.com/edu/parallel/mapreduce-tutorial.html
> [2] http://www.youtube.com/watch?v=-vD6PUdf3Js
> [3]
> http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html
> --
> Thanks and Regards,
> Yadu Nand B

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



