[Swift-commit] r7124 - SwiftApps/Swift-MapRed/Paper
yadunandb at ci.uchicago.edu
Wed Oct 2 17:15:32 CDT 2013
Author: yadunandb
Date: 2013-10-02 17:15:32 -0500 (Wed, 02 Oct 2013)
New Revision: 7124
Modified:
SwiftApps/Swift-MapRed/Paper/swifthadoop.tex
Log:
Added notes (partial)
Modified: SwiftApps/Swift-MapRed/Paper/swifthadoop.tex
===================================================================
--- SwiftApps/Swift-MapRed/Paper/swifthadoop.tex 2013-10-02 20:54:03 UTC (rev 7123)
+++ SwiftApps/Swift-MapRed/Paper/swifthadoop.tex 2013-10-02 22:15:32 UTC (rev 7124)
@@ -559,3 +559,80 @@
== Yadu Notes ==
+Current flow is designed like this:
+
+0. To associate files with their location we define a new Swift type, *fileptr*.
+   A fileptr (for "file pointer") is a regular file to Swift, but it can be
+   interpreted by apps or (in future) swiftwrap as a globally addressable
+   file path. Currently a fileptr has the structure:
+       Node0 /path/to/file0 /path/to/file1 /path/to/file2 ...
+       ...
+       Noden /path/to/file0 ...
+
+1. mapper_func takes some input, generates some files as output, and returns
+   a fileptr. The map stage is therefore just a foreach loop that fills the
+   map_results[] array:
+
+       map_results[] = map ( mapper_func, input_array );
+
+2. Process the fileptrs returned from the map stage to generate the list of
+   nodes where the map jobs left intermediate data.
+   Currently the get_uniq_nodes() app returns a file with the format:
+       node1 file1 file2 ...
+       node2 file1 file2 ...
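What get_uniq_nodes() computes can be sketched as follows. This is a hypothetical Python illustration of the merge step, not the actual app: it folds the per-map fileptrs into one record per unique node, in the format shown above.

```python
# Hypothetical sketch of the get_uniq_nodes() merge: combine the
# fileptrs from the map stage into one "node file1 file2 ..." line
# per unique node.
def uniq_nodes(fileptrs):
    merged = {}
    for fp in fileptrs:                  # each fp is {node: [files]}
        for node, files in fp.items():
            merged.setdefault(node, []).extend(files)
    return "\n".join(f"{n} {' '.join(fs)}" for n, fs in sorted(merged.items()))

print(uniq_nodes([{"node1": ["a.out"]}, {"node1": ["b.out"], "node2": ["c.out"]}]))
```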
+
+3. Run a combiner job on every node returned by get_uniq_nodes().
+   The combiner reduces the results in a distributed fashion. A combine
+   stage is possible only when the combine function is commutative and
+   associative. As in MapReduce, a combine function may be called on one
+   or more files. For more complex reduce functions a proper combine stage
+   may be impossible; in such cases the combiner can simply concatenate or
+   compress the files, which still improves file transfer performance to
+   later stages.
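The commutativity/associativity requirement above can be made concrete with a small Python sketch (an illustration, not part of the Swift code): merging word counts is a valid combine, because addition gives the same total regardless of the order or grouping of partial results.

```python
# Hypothetical sketch: word-count merging is a valid combine function
# because counter addition is commutative and associative, so partial
# results can be merged in any order and any grouping.
from collections import Counter

def combine(counts_list):
    total = Counter()
    for c in counts_list:
        total.update(c)          # element-wise addition
    return total

a, b = Counter(the=2, swift=1), Counter(the=1, hadoop=4)
assert combine([a, b]) == combine([b, a])          # order-independent
print(dict(combine([a, b])))
```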
+
+4. Optional tree reduction. There are several strategies for doing a
+   distributed reduce of results from several nodes. A general K-way tree
+   reduction is a configurable method to efficiently perform compute-heavy
+   reduction operations. In cases where the cost lies in the bulk of file
+   transfers and the reduction stage does not produce smaller files, a tree
+   reduction may not be the best fit, due to the additional transfers
+   required between stages. If a tree reduction is not chosen, a simple
+   single-level reduction can be used instead: the results from the
+   map/combine stages are fetched from every node to generate the final
+   result.
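The K-way tree reduction described above can be sketched in a few lines of Python. This is a hypothetical illustration of the control structure only: repeatedly reduce groups of at most k partial results until one remains; with an associative reduce function the result matches a flat single-level reduction.

```python
# Hypothetical sketch of a K-way tree reduction: each round reduces
# groups of at most k partial results; rounds repeat until one result
# remains. k=2 gives a binary tree; large k approaches a single-level
# reduction.
def tree_reduce(reduce_fn, items, k=2):
    while len(items) > 1:
        items = [reduce_fn(items[i:i + k]) for i in range(0, len(items), k)]
    return items[0]

print(tree_reduce(sum, [1, 2, 3, 4, 5], k=2))  # 15
```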
+
+
+== Points from discussion with Mike ==
+
+1. Function pointers, or the ability to pass functions as args to other
+   functions. We could make a mockup with preprocessors:
+
+       map_results = map ( mapper_func, input_array );
+
+   translates to:
+
+       foreach item, i in input_array {
+           map_results[i] = mapper_func (item);
+       }
+
+   This is just syntactic sugar; it is worth considering what additional
+   capability truly being able to pass functions/apps gives us.
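For comparison, the translation above is what map() as a higher-order function looks like in a language that already has first-class functions. A hypothetical Python sketch (not Swift code):

```python
# Hypothetical sketch: map() as a higher-order function is sugar over
# the foreach-style loop shown above; mapper_func arrives as a value.
def swift_map(mapper_func, input_array):
    results = [None] * len(input_array)
    for i, item in enumerate(input_array):   # the foreach loop
        results[i] = mapper_func(item)
    return results

print(swift_map(lambda x: x * x, [1, 2, 3]))  # [1, 4, 9]
```

A preprocessor can emit the loop, but true first-class functions would also let the mapper be chosen or composed at run time.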
+
+2. Extend current filepointers to function as filepointer sets. Instead of
+   one filepointer pointing to one file, a filepointer could point at one
+   or more files, as long as the format is maintained. The current code
+   works with this.
+
+   Filepointers should be treated as plain files when they are returned or
+   copied. When a filepointer is passed as an arg to an app, it should go
+   through a conceptual dereference: the filepointer is interpreted, and
+   the actual files it points to are fetched if not present locally.
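The conceptual dereference above can be sketched as follows. This is a hypothetical Python illustration; fetch_remote is a made-up placeholder for whatever transfer mechanism (e.g. swiftwrap) would actually stage the files.

```python
# Hypothetical sketch of dereferencing a filepointer: resolve it to
# local paths, fetching any file not already present. fetch_remote is
# a placeholder for the real transfer mechanism.
import os

def dereference(fileptr, fetch_remote):
    local_paths = []
    for line in fileptr.strip().splitlines():
        node, *paths = line.split()
        for path in paths:
            if not os.path.exists(path):
                fetch_remote(node, path)   # placeholder transfer call
            local_paths.append(path)
    return local_paths
```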
+
+ 3.
+
+
+
+
+
+
+
+
+
+
More information about the Swift-commit mailing list