[Swift-devel] Re: Clarifications regarding a GSoC project idea [swift]
Michael Wilde
wilde at mcs.anl.gov
Thu Mar 17 16:42:42 CDT 2011
Hi Yadu,
A good source of ideas for how to do map reduce in Swift might be the work that Ed Walker did to implement map reduce in his parallel shell:
http://portal.acm.org/citation.cfm?id=1645175
and
http://sites.google.com/site/ewalker544/research-2/dataflowshell
(where you can download Ed's parallel bash)
I think there are two main (and somewhat separate) aspects here:
- how to work both with and without name/value pairs: Swift has no intrinsic name/value concept, and one can find good use cases both with and without keys.
Note that the separate project to add associative arrays to Swift is one way to integrate the concept of keys.
- how to build reduction trees (especially for non-key-based workflows) in a way that reduces the amount of data moved and avoids having to send the output of every map operation back to a single site for reduction
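To make the reduction-tree point concrete, here is a minimal sketch in plain Python. It is not Swift code, and it is not how Swift or Ed's parallel shell actually do it. The names `tree_reduce` and `fan_in` are hypothetical. The sketch assumes the combine step is associative, so partial results can be merged in any grouping. Under that assumption, each group at a level could run as a separate task at the site that holds its inputs. The same function covers both aspects above: summing plain values (no keys) and merging per-word counts (name/value pairs).

```python
from collections import Counter
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_reduce(partials: List[T], combine: Callable[[T, T], T], fan_in: int = 2) -> T:
    """Merge map outputs level by level instead of at one site.

    Hypothetical helper; assumes `combine` is associative. At each level,
    groups of up to `fan_in` partial results are merged, so no single
    reducer ever receives more than `fan_in` inputs. In a real workflow,
    each group could be a separate task placed near its data.
    """
    if not partials:
        raise ValueError("nothing to reduce")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        # One combine "task" per group of at most fan_in partial results.
        level = [reduce(combine, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]


# Non-keyed reduction: each map task produced a single number.
sums = [3, 1, 4, 1, 5, 9, 2, 6]
total = tree_reduce(sums, lambda a, b: a + b)
print(total)  # 31

# Keyed (name/value) reduction: each map task produced word counts.
counts = [Counter("a b a".split()), Counter("b c".split()), Counter("a c c".split())]
merged = tree_reduce(counts, lambda a, b: a + b, fan_in=2)
```

With fan_in=2 and eight map outputs, the sums are merged in three levels of four, two and one combine tasks, and no reducer ever sees more than two inputs. A single-site reduction would instead ship all eight outputs to one place.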
I should also mention that this project is one of the more research-oriented and less focused projects on our list. There are several more concrete projects that you may also find interesting. So if you find this one fascinating, by all means keep thinking about it. But if you want something more concrete, I can discuss a few other possibilities with you.
- Mike
----- Original Message -----
> Hi,
>
> I am interested in working on the project on Implementing efficient
> Map-Reduce models using the Swift parallel scripting language as
> mentioned on the ideas page [1]. I have a fair understanding of the
> map-reduce concept and have written a toy Erlang implementation [2].
>
> So far, I have Swift compiled and running. I am also going
> through the papers on Swift [3] and map-reduce [4] [5]. Any help or
> pointers toward a clearer picture of the problem at hand would be
> greatly appreciated.
>
> Secondly, I am working on a wrapper for RBUDP [6] on XIO (for a college
> project, as part of a team). If there are any projects around
> profiling GridFTP over UDT and over RBUDP separately and comparing
> their performance, I would be interested in those as well.
> Implementing an RBUDP driver for XIO was mentioned here [7].
>
> I am not quite sure where exactly I should be mailing this, so if this
> gets posted to the wrong mailing list please excuse my clumsiness.
>
> [1] http://dev.globus.org/wiki/Google_Summer_of_Code_2011_Ideas
> [2] https://github.com/yadudoc/erlang/blob/master/mapred.erl
> [3] http://www.ci.uchicago.edu/swift/papers/SwiftParallelScripting.pdf
> [4] http://labs.google.com/papers/mapreduce-osdi04.pdf
> [5] Hadoop: The Definitive Guide, O'Reilly Media (Chapter 6)
> [6] http://www.evl.uic.edu/cavern/papers/cluster2002.pdf
> [7] http://dev.globus.org/wiki/Project_Ideas
>
> --
> Thanks and Regards,
> Yadu Nand B
> (+91 94477 80725)
> ( http://humanint.posterous.com )
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory