[Swift-devel] ConcurrentMapper changes

Ben Clifford benc at hawaga.org.uk
Thu Nov 1 11:57:10 CDT 2007


I just modified the way that ConcurrentMapper lays out files (r1437).

You will likely not have encountered ConcurrentMapper by name. It is used 
when you do not specify a mapper for a dataset, for example for 
intermediate variables.

Previously, all files named by this mapper were given a long name in the 
root directory of the submit and cache directories.

When a large number of files were named in this fashion, for example in an 
array with thousands of elements, this would result in a file for each 
element and a root directory with thousands of files.

I ran into this problem most immediately while working with Andrew
Jamieson, running on TeraPort using GPFS. Having many hosts access a
single directory scales very poorly on GPFS.

The changes I have made add more structure to filenames generated by the 
ConcurrentMapper:


 1. All files appear in a _concurrent/ subdirectory.


 2. Simple/marker data typed files appear directly below _concurrent, 
named as before. For example:

  file outfile;

might give a filename:

  _concurrent//outfile-3339612a-08e1-443d-bd14-2329080d2d94-


 3. Structures are mapped to a sub-directory, with each element being a 
file in that subdirectory. For example,

 type footype { file left; file right; }
 footype structurefile;

might give a directory:

_concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field

containing two files:

_concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/left
_concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/right


 4. Array elements are placed in a subdirectory. Within that subdirectory,
the index is used to construct a further hierarchy such that there will
never be more than 50 directories/files in any one directory. For example:

  file manyfile[];

might give mappings like this:

manyfile[0] stored in:
 _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-0

manyfile[22] stored in:
 _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-22

manyfile[30] stored in:
 _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/h5/elt-30

manyfile[734] stored in:
 _concurrent//manyfile-bcdeedee-4df7-4d21-a207-d8051da3d133--array/h9/h4/elt-734

To form the paths, roughly the following happens: convert each index
into base 25 and discard the most significant digit. Then, starting at
the least significant digit and working towards the most significant
digit, make each remaining digit into a subdirectory.

For example, 734 in base 10 is (1) (4) (9) in base 25. Discarding the
leading (1) leaves (4) (9), and reading those from the least significant
digit gives the intermediate path /h9/h4/
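
The steps above can be sketched in a few lines of Python. This is only an
illustration of the scheme as described in this mail, not the mapper's
actual code; the function name element_path and its base parameter are
made up for the example.

```python
def element_path(index, base=25):
    """Return the path of an array element relative to the --array directory.

    Hypothetical helper: it follows the description in this mail, not the
    real ConcurrentMapper source.
    """
    if index < 0:
        raise ValueError("array index must be non-negative")

    # Collect the digits of index in the given base, least significant first.
    digits = []
    n = index
    while True:
        digits.append(n % base)
        n //= base
        if n == 0:
            break

    # The most significant digit is the last one collected; discard it.
    # Each remaining digit becomes one directory level, least significant
    # digit outermost.
    dirs = ["h%d" % d for d in digits[:-1]]
    return "/".join(dirs + ["elt-%d" % index])


if __name__ == "__main__":
    for i in (0, 22, 30, 734):
        print(i, "->", element_path(i))
```

For the indices 0, 22, 30 and 734 this gives elt-0, elt-22, h5/elt-30 and
h9/h4/elt-734, matching the mappings listed above.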

Doing this means that for large arrays directory paths will grow, whilst
for small arrays they will stay short; and the size of the array does not
need to be known ahead of time.

The constant '25' can easily be adjusted. It is a compiled-in constant
defined in one place at the moment, but could be made into a mapper
parameter.
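
As a rough check of the "never more than 50" bound, and of what changing
the constant would do, here is a small Python simulation. It is a sketch
under the assumption that paths are formed exactly as described in this
mail; the names element_dirs and max_entries_per_directory are invented
for the example and do not exist in Swift.

```python
from collections import defaultdict


def element_dirs(index, base):
    """Directory names leading to an element (hypothetical helper)."""
    digits = []
    n = index
    while True:
        digits.append(n % base)  # least significant digit first
        n //= base
        if n == 0:
            break
    # Drop the most significant digit; the rest become directory levels.
    return ["h%d" % d for d in digits[:-1]]


def max_entries_per_directory(array_size, base):
    """Largest number of files plus subdirectories found in any directory."""
    entries = defaultdict(set)
    for index in range(array_size):
        dirs = element_dirs(index, base)
        # Record each subdirectory as an entry of its parent directory.
        for depth in range(len(dirs)):
            entries[tuple(dirs[:depth])].add(dirs[depth])
        # Record the element file in its innermost directory.
        entries[tuple(dirs)].add("elt-%d" % index)
    return max((len(names) for names in entries.values()), default=0)


if __name__ == "__main__":
    print(max_entries_per_directory(20000, 25))  # 50
    print(max_entries_per_directory(2000, 10))   # 20
```

With the constant at 25 the fullest directory holds 25 element files and
25 subdirectories, hence 50 entries. In this model the bound is simply
twice the constant, so a value of 10 would give at most 20 entries per
directory at the cost of deeper paths.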
