[Swift-devel] ConcurrentMapper changes
Ben Clifford
benc at hawaga.org.uk
Thu Nov 1 11:57:10 CDT 2007
I just modified the way that ConcurrentMapper lays out files (r1437).
You will likely not have encountered ConcurrentMapper by name. It is used
when you do not specify a mapper for a dataset, for example for
intermediate variables.
Previously, all files named by this mapper were given a long name in the
root directory of the submit and cache directories.
When a large number of files were named in this fashion, for example in an
array with thousands of elements, this would result in a file for each
element and a root directory with thousands of files.
Most immediately, I encountered this problem while working with Andrew
Jamieson, running on TeraPort using GPFS. Having many hosts access a single
directory scales very poorly on GPFS.
The changes I have made add more structure to filenames generated by the
ConcurrentMapper:
1. All files appear in a _concurrent/ subdirectory.
2. Simple/marker data typed files appear directly below _concurrent,
named as before. For example:
file outfile;
might give a filename:
_concurrent//outfile-3339612a-08e1-443d-bd14-2329080d2d94-
3. Structures are mapped to a sub-directory, with each element being a
file in that subdirectory. For example,
type footype { file left; file right; }
footype structurefile;
might give a directory:
_concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field
containing two files:
_concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/left
_concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/right
4. Array elements are placed in a subdirectory. Within that subdirectory,
the index is used to construct a further hierarchy such that there will
never be more than 50 directories/files in any one directory. For example:
file manyfile[];
might give mappings like this:
manyfile[0] stored in:
_concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-0
manyfile[22] stored in:
_concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-22
manyfile[30] stored in:
_concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/h5/elt-30
manyfile[734] stored in:
_concurrent//manyfile-bcdeedee-4df7-4d21-a207-d8051da3d133--array/h9/h4/elt-734
To form the paths, basically something like this happens:
Convert each index into base 25 and discard the most significant digit.
Then, starting at the least significant digit and working towards
the most significant digit, make each remaining digit into a subdirectory.
For example, 734 in base 10 is (1) (4) (9) in base 25,
so we form the intermediate path /h9/h4/
Doing this means that for large arrays directory paths will grow, whilst
for small arrays they will stay short; and the size of the array does not
need to be known ahead of time.
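The path construction described above can be sketched roughly as follows. This is an illustrative Python sketch, not the ConcurrentMapper code itself (which is not shown in this message); the function names to_digits and element_path are hypothetical, and the sketch only produces the part of the path below the --array directory.

```python
def to_digits(index, base=25):
    """Return the digits of index in the given base, least significant first."""
    if index < 0:
        raise ValueError("array index must be non-negative")
    digits = []
    while True:
        digits.append(index % base)
        index //= base
        if index == 0:
            break
    return digits


def element_path(index, base=25):
    """Build the path of an array element relative to the --array directory.

    Assumption: this follows the description in the text, i.e. drop the
    most significant digit and turn each remaining digit, least
    significant first, into an 'h<digit>' subdirectory.
    """
    digits = to_digits(index, base)
    # digits is least-significant-first, so the most significant digit
    # is the last entry; slicing it off discards it.
    dirs = ["h%d" % d for d in digits[:-1]]
    return "/".join(dirs + ["elt-%d" % index])


for i in (0, 22, 30, 734):
    print(i, "->", element_path(i))
```

For the indices used in the examples above (0, 22, 30, 734), this yields elt-0, elt-22, h5/elt-30 and h9/h4/elt-734, matching the mappings shown. Only the index itself is needed to compute a path, which is why the array size need not be known in advance.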
The constant '25' can easily be adjusted. It's a compiled-in constant
defined in one place at the moment, but could be made into a mapper
parameter.
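To see how the choice of constant affects directory sizes, here is a rough Python sketch that lays out a hypothetical array under the scheme described above and counts the entries in each directory. It is not the mapper's code; element_path and max_entries_per_directory are made-up names, and the layout rule is assumed to be exactly the "drop the most significant digit" rule from the text.

```python
from collections import defaultdict


def element_path(index, base=25):
    """Relative path for an array element (assumed scheme, see lead-in)."""
    digits = []
    while True:
        digits.append(index % base)
        index_rest = index // base
        if index_rest == 0:
            break
        index = index_rest
    # Rebuild the original index from its digits for the file name.
    original = sum(d * base ** i for i, d in enumerate(digits))
    dirs = ["h%d" % d for d in digits[:-1]]  # drop most significant digit
    return "/".join(dirs + ["elt-%d" % original])


def max_entries_per_directory(n_elements, base=25):
    """Lay out n_elements paths and return the largest directory size."""
    entries = defaultdict(set)  # directory path -> names directly inside it
    for i in range(n_elements):
        parts = element_path(i, base).split("/")
        for depth in range(len(parts)):
            parent = "/".join(parts[:depth])
            entries[parent].add(parts[depth])
    return max((len(names) for names in entries.values()), default=0)


print(max_entries_per_directory(20000, base=25))
print(max_entries_per_directory(5000, base=10))
```

Under this assumed scheme, a directory holds at most one elt- file and one h subdirectory per digit value, so the bound is twice the constant: 50 entries for 25, or 20 if the constant were 10.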