[Swift-commit] r7124 - SwiftApps/Swift-MapRed/Paper
yadunandb at ci.uchicago.edu
Wed Oct 2 17:15:32 CDT 2013
Author: yadunandb
Date: 2013-10-02 17:15:32 -0500 (Wed, 02 Oct 2013)
New Revision: 7124
Modified:
SwiftApps/Swift-MapRed/Paper/swifthadoop.tex
Log:
Added notes (partial)
Modified: SwiftApps/Swift-MapRed/Paper/swifthadoop.tex
===================================================================
--- SwiftApps/Swift-MapRed/Paper/swifthadoop.tex 2013-10-02 20:54:03 UTC (rev 7123)
+++ SwiftApps/Swift-MapRed/Paper/swifthadoop.tex 2013-10-02 22:15:32 UTC (rev 7124)
@@ -559,3 +559,80 @@
== Yadu Notes ==
+Current flow is designed like this:
+
+0. To associate files with their location we define a new Swift type, *fileptr*.
+   A fileptr (for "file pointer") is a regular file to Swift, but it can be
+   interpreted by apps or (in future) swiftwrap as a globally addressable
+   file path. Currently a fileptr has the structure:
+       Node0 /path/to/file0 /path/to/file1 /path/to/file2 ...
+       ...
+       Noden /path/to/file0 ...
+
+1. mapper_func takes some input, generates some files as output, and returns
+   a fileptr. The map stage is therefore just a foreach loop that fills the
+   map_results[] array:
+
+       map_results[] = map ( mapper_func, input_array );
+
+2. Process the fileptrs returned from the map stage to generate the list of
+   nodes where the map jobs left intermediate data.
+   Currently the get_uniq_nodes() app returns a file with the format:
+       node1 file1 file2 ...
+       node2 file1 file2 ...
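What get_uniq_nodes() computes can be sketched as follows. This is a hypothetical Python illustration of the merge step, not the actual app: it folds the per-map fileptrs into one record per unique node, in the format shown above.

```python
# Hypothetical sketch of the get_uniq_nodes() merge: combine the
# fileptrs from the map stage into one "node file1 file2 ..." line
# per unique node.
def uniq_nodes(fileptrs):
    merged = {}
    for fp in fileptrs:                  # each fp is {node: [files]}
        for node, files in fp.items():
            merged.setdefault(node, []).extend(files)
    return "\n".join(f"{n} {' '.join(fs)}" for n, fs in sorted(merged.items()))

print(uniq_nodes([{"node1": ["a.out"]}, {"node1": ["b.out"], "node2": ["c.out"]}]))
```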
+
+3. Run a combiner job on every node returned by get_uniq_nodes().
+   The combiner reduces the results in a distributed fashion. A combine
+   stage is possible only when the combine function is commutative and
+   associative. As in MapReduce, a combine function may be called on one
+   or more files. For more complex reduce functions a proper combine stage
+   may be impossible; in such cases the combiner can simply concatenate or
+   compress the files, which still improves file transfer performance to
+   later stages.
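The commutativity/associativity requirement above can be made concrete with a small Python sketch (an illustration, not part of the Swift code): merging word counts is a valid combine, because addition gives the same total regardless of the order or grouping of partial results.

```python
# Hypothetical sketch: word-count merging is a valid combine function
# because counter addition is commutative and associative, so partial
# results can be merged in any order and any grouping.
from collections import Counter

def combine(counts_list):
    total = Counter()
    for c in counts_list:
        total.update(c)          # element-wise addition
    return total

a, b = Counter(the=2, swift=1), Counter(the=1, hadoop=4)
assert combine([a, b]) == combine([b, a])          # order-independent
print(dict(combine([a, b])))
```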
+
+4. Optional tree reduction. There are several strategies for doing a
+   distributed reduce of results from several nodes. A general K-way tree
+   reduction is a configurable method to efficiently perform compute-heavy
+   reduction operations. In cases where the cost lies in the bulk of file
+   transfers and the reduction stage does not produce smaller files, a tree
+   reduction may not be the best fit, due to the additional transfers
+   required between stages. If a tree reduction is not chosen, a simple
+   single-level reduction can be used instead: the results from the
+   map/combine stages are fetched from every node to generate the final
+   result.
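The K-way tree reduction described above can be sketched in a few lines of Python. This is a hypothetical illustration of the control structure only: repeatedly reduce groups of at most k partial results until one remains; with an associative reduce function the result matches a flat single-level reduction.

```python
# Hypothetical sketch of a K-way tree reduction: each round reduces
# groups of at most k partial results; rounds repeat until one result
# remains. k=2 gives a binary tree; large k approaches a single-level
# reduction.
def tree_reduce(reduce_fn, items, k=2):
    while len(items) > 1:
        items = [reduce_fn(items[i:i + k]) for i in range(0, len(items), k)]
    return items[0]

print(tree_reduce(sum, [1, 2, 3, 4, 5], k=2))  # 15
```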
+
+
+== Points from discussion with Mike ==
+
+1. Function pointers, or the ability to pass functions as args to other
+   functions. We could make a mockup with preprocessors:
+
+       map_results = map ( mapper_func, input_array );
+
+   translates to:
+
+       foreach item, i in input_array {
+           map_results[i] = mapper_func (item);
+       }
+
+   This is just syntactic sugar; it is worth considering what additional
+   capability truly being able to pass functions/apps gives us.
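For comparison, the translation above is what map() as a higher-order function looks like in a language that already has first-class functions. A hypothetical Python sketch (not Swift code):

```python
# Hypothetical sketch: map() as a higher-order function is sugar over
# the foreach-style loop shown above; mapper_func arrives as a value.
def swift_map(mapper_func, input_array):
    results = [None] * len(input_array)
    for i, item in enumerate(input_array):   # the foreach loop
        results[i] = mapper_func(item)
    return results

print(swift_map(lambda x: x * x, [1, 2, 3]))  # [1, 4, 9]
```

A preprocessor can emit the loop, but true first-class functions would also let the mapper be chosen or composed at run time.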
+
+2. Extend current filepointers to function as filepointer sets. Instead of
+   one filepointer pointing to one file, a filepointer could point at one
+   or more files, as long as the format is maintained. The current code
+   works with this.
+
+   Filepointers should be treated as plain files when they are returned or
+   copied. When a filepointer is passed as an arg to an app, it should go
+   through a conceptual dereference: the filepointer is interpreted, and
+   the actual files it points to are fetched if not present locally.
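The conceptual dereference above can be sketched as follows. This is a hypothetical Python illustration; fetch_remote is a made-up placeholder for whatever transfer mechanism (e.g. swiftwrap) would actually stage the files.

```python
# Hypothetical sketch of dereferencing a filepointer: resolve it to
# local paths, fetching any file not already present. fetch_remote is
# a placeholder for the real transfer mechanism.
import os

def dereference(fileptr, fetch_remote):
    local_paths = []
    for line in fileptr.strip().splitlines():
        node, *paths = line.split()
        for path in paths:
            if not os.path.exists(path):
                fetch_remote(node, path)   # placeholder transfer call
            local_paths.append(path)
    return local_paths
```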
+
+ 3.
+
+
+
+
+
+
+
+
+
+
More information about the Swift-commit mailing list