[Swift-devel] How best to distribute named input and output files across dirs?

Michael Wilde wilde at mcs.anl.gov
Mon Nov 5 09:27:24 CST 2007


What's the best way to spread output files across directories if they are 
mapped, as opposed to anonymous?

In awf2.swift the outputs went into a single big dir (below _concurrent) 
because they are neither mapped nor members of an array.

In awf3.swift I switched to an array, and they were nicely (albeit 
verbosely ;) mapped to an array structure automatically.

In awf4.swift I name the outputs, and the files are now nicely named but 
all reside back in the client submit directory.

Now I want to make awf5, and spread named inputs and outputs across 
dirs. I recall suggesting a way to do this to Andrew, but didn't track 
how he and you did it, Ben.

Andrew, can you send me your latest swift code?

Ben, Mihael, is the best way to do this to manually spread the inputs 
across dirs, and map both the inputs and outputs using readdata?

angleinput/{00 through 99}/pcNNNN.pcap

angleout/{00 through 99}/{ofNNNN.angle,cfNNNN.center}
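
To make the layout above concrete, here is a minimal Python sketch of how a file number might be turned into those three paths. The rule for choosing the {00 through 99} directory is not stated above; taking the file number modulo 100 is purely an assumption for illustration, and spread_paths is a hypothetical helper, not part of Swift.

```python
def spread_paths(n, buckets=100):
    """Return the input and output paths for file number n.

    Assumption (not from the original message): the two-digit
    directory is n modulo `buckets`, and NNNN is n zero-padded
    to four digits.
    """
    bucket = "%02d" % (n % buckets)
    tag = "%04d" % n
    return {
        "input": "angleinput/%s/pc%s.pcap" % (bucket, tag),
        "angle": "angleout/%s/of%s.angle" % (bucket, tag),
        "center": "angleout/%s/cf%s.center" % (bucket, tag),
    }


if __name__ == "__main__":
    for name, path in sorted(spread_paths(1234).items()):
        print(name, path)
```

With this rule, an input and its two outputs always share the same two-digit directory, so a table of such paths could be generated once and then read back by whichever mapper is used.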

I need to focus on a few admin things for a bit, but any/all advice is 
welcome.



::::::::::::::
awf2.swift
::::::::::::::
type pcapfile;
type angleout;
type anglecenter;

(angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
{
   app { angle4 @ifile @ofile @cfile; }
}

pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;

foreach pf in pcapfiles {
   angleout of;
   anglecenter cf;
   (of,cf) = angle4(pf);
}
::::::::::::::
awf3.swift
::::::::::::::
type pcapfile;
type angleout;
type anglecenter;

(angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
{
   app { angle4 @ifile @ofile @cfile; }
}

pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;

angleout of[];
anglecenter cf[];

foreach pf,i in pcapfiles {
   (of[i],cf[i]) = angle4(pf);
}
::::::::::::::
awf4.swift
::::::::::::::
type pcapfile;
type angleout;
type anglecenter;

(angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
{
   app { angle4 @ifile @ofile @cfile; }
}

pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;

angleout    of[] <simple_mapper;prefix="of",suffix=".angle">;
anglecenter cf[] <simple_mapper;prefix="cf",suffix=".center">;
                  // note i used .angle for both in current tests...

foreach pf,i in pcapfiles {
   (of[i],cf[i]) = angle4(pf);
}



On 11/1/07 11:57 AM, Ben Clifford wrote:
> I just modified the way that ConcurrentMapper lays out files (r1437)
> 
> You will likely not have encountered ConcurrentMapper by name. It is used 
> when you do not specify a mapper for a dataset, for example for 
> intermediate variables.
> 
> Previously, all files named by this mapper were given a long name in the 
> root directory of the submit and cache directories.
> 
> When a large number of files were named in this fashion, for example in an 
> array with thousands of elements, this would result in a file for each 
> element and a root directory with thousands of files.
> 
> Most immediately I encountered this problem working with Andrew Jamieson 
> running on TeraPort using GPFS. Many hosts attempting to access one 
> directory is severely unscalable on GPFS.
> 
> The changes I have made add more structure to filenames generated by the 
> ConcurrentMapper:
> 
> 
>  1. All files appear in a _concurrent/ subdirectory.
> 
> 
>  2. Simple/marker data typed files appear directly below _concurrent, 
> named as before. For example:
> 
>   file outfile;
> 
> might give a filename:
> 
>   _concurrent//outfile-3339612a-08e1-443d-bd14-2329080d2d94-
> 
> 
>  3. Structures are mapped to a sub-directory, with each element being a 
> file in that subdirectory. For example,
> 
>  type footype { file left; file right; }
>  footype structurefile;
> 
> might give a directory:
> 
> _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field
> 
> containing two files:
> 
> _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/left
> _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/right
> 
> 
> 4. Array elements are placed in a subdirectory. Within that subdirectory, 
> the index is used to construct a further hierarchy such that there will 
> never be more than 50 directories/files in any one directory. For example:
> 
>   file manyfile[];
> 
> might give mappings like this:
> 
> manyfile[0] stored in:
>  _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-0
> 
> manyfile[22] stored in:
>  _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-22
> 
> manyfile[30] stored in:
>  _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/h5/elt-30
> 
> manyfile[734] stored in:
>  _concurrent//manyfile-bcdeedee-4df7-4d21-a207-d8051da3d133--array/h9/h4/elt-734
> 
> To form the paths, basically something like this happens:
> convert each number into base 25. discard the most significant digit. 
> then starting at the least significant digit and working towards 
> the most significant digit, make that digit into a subdirectory.
> 
> For example, 734 in base 10 is  (1) (4) (9) in base 25
> 
> so we form intermediate path /h9/h4/
> 
> Doing this means that for large arrays directory paths will grow, whilst 
> for small arrays they will be short; and the size of the array does not 
> need to be known ahead of time.
> 
> The constant '25' can easily be adjusted. It's a compiled-in constant 
> defined in one place at the moment, but could be made into a mapper 
> parameter.
> 
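
The path-forming rule described in the quoted message (convert the index to base 25, discard the most significant digit, then turn the remaining digits into subdirectories from least to most significant) can be sketched in Python as follows. This is only an illustration that reproduces the four example paths given there; it is not the ConcurrentMapper source, and the names BASE, digits_base and element_path are made up for this sketch.

```python
BASE = 25  # the constant '25' mentioned in the quoted message


def digits_base(n, base=BASE):
    """Return the digits of non-negative n in `base`, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits[::-1]


def element_path(index, base=BASE):
    """Return the path of an array element relative to its --array directory."""
    digits = digits_base(index, base)
    # Drop the most significant digit, then walk the remaining digits
    # from least significant to most significant.
    subdirs = ["h%d" % d for d in reversed(digits[1:])]
    return "/".join(subdirs + ["elt-%d" % index])


if __name__ == "__main__":
    for i in (0, 22, 30, 734):
        print(i, element_path(i))
```

Because the depth depends only on how many base-25 digits the index has, the array length never has to be known in advance, which matches the point made in the quoted message.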