[Swift-devel] MapReduce, doubts

Michael Wilde wilde at mcs.anl.gov
Sun Aug 28 12:11:57 CDT 2011


Jon, yes, that's right. Can you add the info below to the User Guide or, if time is pressing, to the cookbook?

Thanks,

- Mike

----- Original Message -----
> From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Yadu Nand" <yadudoc1729 at gmail.com>, "swift-devel" <swift-devel at ci.uchicago.edu>
> Sent: Sunday, August 28, 2011 11:32:44 AM
> Subject: Re: [Swift-devel] MapReduce, doubts
> On Aug 28, 2011, at 10:15 AM, Michael Wilde wrote:
> 
> > ----- Original Message -----
> >> From: "Yadu Nand" <yadudoc1729 at gmail.com>
> >> To: "swift-devel" <swift-devel at ci.uchicago.edu>, "Justin M Wozniak"
> >> <wozniak at mcs.anl.gov>, "Mihael Hategan"
> >> <hategan at mcs.anl.gov>, "Michael Wilde" <wilde at mcs.anl.gov>
> >> Sent: Sunday, August 28, 2011 8:03:29 AM
> >> Subject: MapReduce, doubts
> >> Hi,
> >>
> >> I was going through some materials ([1], [2] , [3]) to understand
> >> Google's MapReduce system and I have a couple of queries :
> >>
> >> 1. How do we address the issue of data locality?
> >> When we run a map job, it is a priority to run it where the least
> >> network overhead is incurred, so preferably on the system holding
> >> the data (or the one nearest to it; I don't know how this works).
> >
> > Currently, we don't. We have discussed a new feature to do this (it's
> > listed as a GSoC project, and I can probably locate a discussion with
> > a 2010 GSoC candidate in which I detailed a possible strategy).
> >
> > We can currently implement a similar scheme using an external mapper
> > to select input files from multiple sites and map them to gsiftp
> > URIs. Then an enhancement in the scheduler could select a site based
> > on the URIs of some or all of the input files.
> 
> In my work with GOSwift I found that mappers will follow a gsiftp URI
> path.
> 
> For a single file:
> file input1
> <"gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/jonmon/data/input1.txt">;
> 
> The above will map the file input1.txt that resides on PADS.
> 
> For a group of files:
> file inputs[] <filesys_mapper; location =
> "gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/jonmon/data",
> suffix=".txt">;
> 
> The above will map all files with a ".txt" extension in the directory
> data on PADS.
> 
> I think this is what you were talking about having the external mapper
> do.
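> 
> As a hedged sketch of that external-mapper idea (the script name
> pick_replicas.sh is hypothetical; the ext mapper runs the given
> script and reads "[index] path" lines from its stdout):
> 
> ```
> // pick_replicas.sh would emit one gsiftp URI per input index,
> // chosen by site, e.g.:
> //   [0] gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/jonmon/data/input1.txt
> //   [1] gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/jonmon/data/input2.txt
> file inputs[] <ext; exec="pick_replicas.sh">;
> ```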
> 
> >
> >> 2. Is it possible to somehow force the reduce tasks to wait till
> >> all
> >> map jobs are done ?
> >
> > Isn't that just normal Swift semantics? If we coded a simple-minded
> > reduce job whose input was the array of outputs from the map()
> > stage, the reduce (assuming it's an app function) would wait for all
> > the map() ops to finish, right?
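> >
> > A minimal sketch of that (app commands and file names here are
> > illustrative, not from any real workflow):
> >
> > ```
> > app (file o) map (file i)      { cat @i stdout=@o; }
> > app (file o) reduce (file a[]) { cat @filenames(a) stdout=@o; }
> >
> > file inputs[] <filesys_mapper; location="data", suffix=".txt">;
> > file mapped[] <simple_mapper; prefix="mapped.">;
> > foreach f, i in inputs {
> >     mapped[i] = map(f);
> > }
> > // reduce() consumes the whole array, so by ordinary dataflow
> > // semantics it cannot start until every map() above has produced
> > // its element.
> > file result <"result.txt">;
> > result = reduce(mapped);
> > ```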
> >
> > I would ask instead, "do we want to?" Do the distributed reduce ops
> > in map-reduce really wait? Doesn't MR do distributed reduction in
> > batches, asynchronously to the completion of the map() operations?
> > Isn't this a key property that is made possible by the name/value
> > pair-based nature of the MR data model? I thought MR reduce ops take
> > place at any location, in any input chunk size, in a tree-based
> > manner, and that this is possible because the reduction operator is
> > "distributed" in the mathematical sense.
> >
> >> MapReduce uses a system which permits reduce to run only after
> >> all the map jobs are done executing. I'm not entirely sure why
> >> this is a requirement, but it has its own issues, such as a single
> >> slow mapper. This is usually tackled by the main controller
> >> noticing the slow one and running multiple instances of the map
> >> job to get results faster. Does Swift at some level use the
> >> concept of a central controller? How do we tackle this?
> >>
> >> 3. How does swift handle failures ? Is there a facility for
> >> re-execution ?
> >
> > Yes, Swift retries failing app invocations as controlled by the
> > properties execution.retries and lazy.errors. You can read up on
> > these in the User Guide and in the properties file.
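> >
> > For example, in swift.properties (the values here are just
> > illustrative):
> >
> > ```
> > # Retry each failing app invocation up to 3 more times.
> > execution.retries=3
> > # Defer reporting errors so the rest of the workflow keeps running.
> > lazy.errors=true
> > ```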
> >
> >> Is this documented somewhere ? Do we use any file-system that
> >> handles loss of a particular file /input-set ?
> >
> > No, we don't, but some of this would come with using a
> > replication-based model for the input dataset, where the mapper could
> > supply a list of possible inputs instead of one, and the scheduler
> > could pick a replica each time it selects a site for a (retried)
> > job.
> >
> > Also, we might think of a "forMostOf" statement, which could
> > implement semantics suitable for runs in which you don't need
> > every single map() to complete. I.e., the target array can be
> > considered closed when "most of" (tbd) the input collection has
> > been processed. The forMostOf() could complete when it enters the
> > "tail" of the loop (see Tim Armstrong's paper on the tail
> > phenomenon).
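> >
> > Purely as a strawman, the syntax might look something like this
> > (this construct does not exist; the keyword and the threshold
> > parameter are hypothetical):
> >
> > ```
> > // close the mapped[] array once, say, 95% of inputs are processed
> > forMostOf f, i in inputs, threshold=0.95 {
> >     mapped[i] = map(f);
> > }
> > ```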
> 
> I think this has been discussed before. In the Montage application
> there is a step where I map more filenames than will be created. So I
> don't need all the maps to complete for the workflow to keep
> progressing. I made a workaround but I think this "forMostOf" feature
> would be useful. I will locate the thread in which Mihael and I had
> this discussion.
> >
> >> I'm stopping here; there are more questions nagging me, but it's
> >> probably best not to blurt them out all at once :)
> >
> > I think you are hitting the right issues here, and I encourage you
> > to keep pushing towards something that you could readily experiment
> > with. This is exactly where we need to go to provide a convenient
> > method for expressing map-reduce as an elegant high-level script.
> >
> > I also encourage you to read up on what Ed Walker did for
> > map-reduce in his parallel shell.
> >
> > - Mike
> >
> >> [1] http://code.google.com/edu/parallel/mapreduce-tutorial.html
> >> [2] http://www.youtube.com/watch?v=-vD6PUdf3Js
> >> [3]
> >> http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html
> >> --
> >> Thanks and Regards,
> >> Yadu Nand B
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



