[Swift-devel] MapReduce, doubts

Jonathan Monette jonmon at mcs.anl.gov
Sun Aug 28 11:32:44 CDT 2011


On Aug 28, 2011, at 10:15 AM, Michael Wilde wrote:

> ----- Original Message -----
>> From: "Yadu Nand" <yadudoc1729 at gmail.com>
>> To: "swift-devel" <swift-devel at ci.uchicago.edu>, "Justin M Wozniak" <wozniak at mcs.anl.gov>, "Mihael Hategan"
>> <hategan at mcs.anl.gov>, "Michael Wilde" <wilde at mcs.anl.gov>
>> Sent: Sunday, August 28, 2011 8:03:29 AM
>> Subject: MapReduce, doubts
>> Hi,
>> 
>> I was going through some materials ([1], [2] , [3]) to understand
>> Google's MapReduce system and I have a couple of queries :
>> 
>> 1. How do we address the issue of data locality ?
>> When we run a map job, it is preferable to run it where the least
>> network overhead is incurred, so ideally on the system holding
>> the data (or the nearest one; I don't know how this works).
> 
> Currently, we don't. We have discussed a new feature to do this (it's listed as a GSoC project, and I can probably locate a discussion with a 2010 GSoC candidate in which I detailed a possible strategy).
> 
> We can currently implement a similar scheme using an external mapper to select input files from multiple sites and map them to gsiftp URIs.  Then an enhancement in the scheduler could select a site based on the URI of some or all of the input files.

In my work with GOSwift I found that mappers will follow a gsiftp URI path.

For a single file:
file input1 <"gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/jonmon/data/input1.txt">;

The above will map the file input1.txt that resides on PADS.

For a group of files:
file inputs[] <filesys_mapper; location="gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/jonmon/data", suffix=".txt">;

The above will map all files with a ".txt" extension in the directory data on PADS.

I think this is what you were talking about having the external mapper do.
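A minimal sketch of that external-mapper scheme, using Swift's ext mapper (select_replicas.sh is a hypothetical script that would pick a replica for each input and print one gsiftp URI per array element):

file inputs[] <ext; exec="select_replicas.sh">;

The scheduler enhancement would then inspect the gsiftp hostnames in the mapped URIs when ranking sites.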

> 
>> 2. Is it possible to somehow force the reduce tasks to wait till all
>> map jobs are done ?
> 
> Isn't that just normal swift semantics? If we coded a simple-minded reduce job whose input was the array of outputs from the map() stage, the reduce (assuming it's an app function) would wait for all the map() ops to finish, right?
> 
> I would ask instead "do we want to?". Do the distributed reduce ops in map-reduce really wait? Doesn't MR do distributed reduction in batches, asynchronously to the completion of the map() operations? Isn't this a key property that is made possible by the name/value pair-based nature of the MR data model?  I thought MR reduce ops take place at any location, in any input chunk size, in a tree-based manner, and that this is possible because the reduction operator is "distributive" in the mathematical sense.
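
Right - in today's Swift that "reduce waits for all maps" pattern just falls out of the array semantics. A minimal sketch (my_mapper and my_reducer are placeholder executables; file names are illustrative):

type file;

app (file o) do_map (file i) {
    my_mapper @i @o;     // placeholder map executable
}

app (file o) do_reduce (file i[]) {
    my_reducer @filenames(i) @o;    // placeholder reduce executable
}

file inputs[] <filesys_mapper; location="data", suffix=".txt">;
file mapped[] <simple_mapper; prefix="mapped.">;

foreach f, ix in inputs {
    mapped[ix] = do_map(f);
}

// do_reduce takes the whole mapped[] array, so it cannot start
// until every do_map() has completed and the array is closed.
file result <"result.txt">;
result = do_reduce(mapped);

A tree-based, batched reduction would instead be coded as multiple reduce stages over slices of the array.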
> 
>> MapReduce uses a system which permits reduce to run only
>> after all the map jobs have finished executing. I'm not entirely sure
>> why this is a requirement, but it has its own issues, such as a single
>> slow mapper. This is usually tackled by the master noticing
>> the slow one and running multiple instances of the map job to get
>> results faster. Does swift at some level use the concept of a central
>> controller ? How do we tackle this ?
>> 
>> 3. How does swift handle failures ? Is there a facility for
>> re-execution ?
> 
> Yes, Swift retries failing app invocations as controlled by the properties execution.retries and lazy.errors. You can read about these in the users guide and in the properties file.
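
For reference, those are set in swift.properties; the values below are just illustrative:

# retry each failing app invocation up to 3 times
execution.retries=3
# defer errors and keep running the rest of the workflow
lazy.errors=true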
> 
>> Is this documented somewhere ? Do we use any file-system that
>> handles loss of a particular file /input-set ?
> 
> No, we don't, but some of this would come with using a replication-based model for the input dataset, where the mapper could supply a list of possible inputs instead of one, and the scheduler could pick a replica each time it selects a site for a (retried) job.
> 
> Also, we might think of a "forMostOf" statement which could implement semantics that would be suitable for runs in which you don't need every single map() to complete. I.e., the target array can be considered closed when "most of" (tbd) the input collection has been processed. The forMostOf() could complete when it enters the "tail" of the loop (see Tim Armstrong's paper on the tail phenomenon).

I think this has been discussed before.  In the Montage application there is a step where I map more filenames than will be created, so I don't need all the maps to complete for the workflow to keep progressing.  I made a workaround, but I think this "forMostOf" feature would be useful.  I will locate the thread in which Mihael and I had this discussion.
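
Purely to illustrate the idea, in made-up syntax (neither forMostOf nor a threshold parameter exists in Swift today, and process() is a placeholder app function):

file outputs[];
forMostOf f, ix in inputs, threshold=0.9 {
    outputs[ix] = process(f);
}
// outputs[] would be considered closed once 90% of inputs
// have been processed; stragglers could be abandoned.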
> 
>> I'm stopping here, there are more questions nagging me, but its
>> probably best to not blurt it out all at once :)
> 
> I think you are hitting the right issues here, and I encourage you to keep pushing towards something that you could readily experiment with.  This is exactly where we need to go to provide a convenient method for expressing map-reduce as an elegant high-level script.
> 
> I also encourage you to read about what Ed Walker did for map-reduce in his parallel shell.
> 
> - Mike
> 
>> [1] http://code.google.com/edu/parallel/mapreduce-tutorial.html
>> [2] http://www.youtube.com/watch?v=-vD6PUdf3Js
>> [3]
>> http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html
>> --
>> Thanks and Regards,
>> Yadu Nand B
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel



