[Swift-devel] How best to distribute named input and output files across dirs?

Andrew Robert Jamieson andrewj at uchicago.edu
Mon Nov 5 09:55:12 CST 2007


Hey Mike and others,

   I used that splitting bash script to separate the files into 
subdirectories.

Then I used that other script you helped me with to find where I put those 
files.  This script generated the .csv which was then read by the csv 
mapper.

Nothing fancy.

-Andrew


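The two steps described above (split the files into subdirectories, then list where they ended up in a .csv for the csv mapper) might look roughly like the sketch below. It is not the actual bash script from this thread. The function names, the round-robin bucket assignment, and the CSV column names (ifile, ofile, cfile) are all assumptions made for illustration. The output names follow the ofNNNN.angle / cfNNNN.center pattern from the quoted message.

```python
import csv
import shutil
from pathlib import Path


def split_into_subdirs(src_dir, dest_dir, buckets=100, pattern="pc*.pcap"):
    """Move files matching `pattern` from src_dir into numbered subdirectories.

    Files are assigned round-robin in sorted order, so each subdirectory
    (00 .. 99 for buckets=100) holds roughly the same number of files.
    Returns the list of new paths. (Hypothetical helper, not from the thread.)
    """
    width = len(str(buckets - 1))  # zero-pad bucket names: 100 buckets -> "00".."99"
    moved = []
    for i, path in enumerate(sorted(Path(src_dir).glob(pattern))):
        bucket = Path(dest_dir) / f"{i % buckets:0{width}d}"
        bucket.mkdir(parents=True, exist_ok=True)
        target = bucket / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved


def write_mapping_csv(inputs, out_root, csv_path):
    """Write one CSV row per input: the input path and its two output paths.

    Outputs go into the same-numbered bucket under out_root. The header
    names are assumptions; they must match whatever the workflow expects.
    """
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["ifile", "ofile", "cfile"])
        for path in inputs:
            tag = path.stem[len("pc"):]  # "pc0042" -> "0042"
            bucket = Path(out_root) / path.parent.name
            writer.writerow([
                path.as_posix(),
                (bucket / f"of{tag}.angle").as_posix(),
                (bucket / f"cf{tag}.center").as_posix(),
            ])
```

Calling split_into_subdirs("flat", "angleinput") and then write_mapping_csv(moved, "angleout", "files.csv") would produce a listing in the angleinput/NN/pcNNNN.pcap and angleout/NN/... shape. The sketch does not create the output directories.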

On Mon, 5 Nov 2007, Michael Wilde wrote:

> What's the best way to spread output files across a directory if they are 
> mapped, as opposed to anonymous?
>
> In awf2.swift the outputs went into a single big dir (below _concurrent) 
> because they are neither mapped nor members of an array.
>
> In awf3.swift I switched to an array, and they were nicely (albeit verbosely 
> ;) mapped to an array structure automatically.
>
> In awf4.swift I name the outputs, and the files are now nicely named but all 
> reside back in the client submit directory.
>
> Now I want to make awf5, and spread named inputs and outputs across dirs. I 
> recall suggesting a way to do this to Andrew, but didn't track how he and you 
> did it, Ben.
>
> Andrew, can you send me your latest swift code?
>
> Ben, Mihael, is the best way to do this to manually spread the inputs across 
> dirs, and map both the inputs and outputs using readdata?
>
> angleinput/{00 through 99}/pcNNNN.pcap
>
> angleout/{00 through 99}/{ofNNNN.angle,cfNNNN.center}
>
> I need to focus on a few admin things for a bit, but any/all advice is 
> welcome.
>
>
>
> ::::::::::::::
> awf2.swift
> ::::::::::::::
> type pcapfile;
> type angleout;
> type anglecenter;
>
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
>  app { angle4 @ifile @ofile @cfile; }
> }
>
> pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;
>
> foreach pf in pcapfiles {
>  angleout of;
>  anglecenter cf;
>  (of,cf) = angle4(pf);
> }
> ::::::::::::::
> awf3.swift
> ::::::::::::::
> type pcapfile;
> type angleout;
> type anglecenter;
>
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
>  app { angle4 @ifile @ofile @cfile; }
> }
>
> pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;
>
> angleout of[];
> anglecenter cf[];
>
> foreach pf,i in pcapfiles {
>  (of[i],cf[i]) = angle4(pf);
> }
> ::::::::::::::
> awf4.swift
> ::::::::::::::
> type pcapfile;
> type angleout;
> type anglecenter;
>
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
>  app { angle4 @ifile @ofile @cfile; }
> }
>
> pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;
>
> angleout    of[] <simple_mapper;prefix="of",suffix=".angle">;
> anglecenter cf[] <simple_mapper;prefix="cf",suffix=".center">;
>                 // note: I used .angle for both in current tests...
>
> foreach pf,i in pcapfiles {
>  (of[i],cf[i]) = angle4(pf);
> }
>
>
>
> On 11/1/07 11:57 AM, Ben Clifford wrote:
>> I just modified the way that ConcurrentMapper lays out files (r1437)
>> 
>> You will likely not have encountered ConcurrentMapper by name. It is used 
>> when you do not specify a mapper for a dataset, for example for 
>> intermediate variables.
>> 
>> Previously, all files named by this mapper were given a long name in the 
>> root directory of the submit and cache directories.
>> 
>> When a large number of files were named in this fashion, for example in an 
>> array with thousands of elements, this would result in a file for each 
>> element and a root directory with thousands of files.
>> 
>> Most immediately I encountered this problem working with Andrew Jamieson 
>> running on TeraPort using GPFS. Many hosts attempting to access one 
>> directory is severely unscalable on GPFS.
>> 
>> The changes I have made add more structure to filenames generated by the 
>> ConcurrentMapper:
>> 
>> 
>>  1. All files appear in a _concurrent/ subdirectory.
>> 
>> 
>>  2. Simple/marker data typed files appear directly below _concurrent, named 
>> as before. For example:
>> 
>>   file outfile;
>> 
>> might give a filename:
>> 
>>   _concurrent//outfile-3339612a-08e1-443d-bd14-2329080d2d94-
>> 
>> 
>>  3. Structures are mapped to a sub-directory, with each element being a 
>> file in that subdirectory. For example,
>> 
>>  type footype { file left; file right; }
>>  footype structurefile;
>> 
>> might give a directory:
>> 
>> _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field
>> 
>> containing two files:
>> 
>> _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/left
>> _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/right
>> 
>> 
>> 4. Array elements are placed in a subdirectory. Within that subdirectory, 
>> the index is used to construct a further hierarchy such that there will 
>> never be more than 50 directories/files in any one directory. For example:
>> 
>>   file manyfile[];
>> 
>> might give mappings like this:
>> 
>> manyfile[0] stored in:
>>  _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-0
>> 
>> manyfile[22] stored in:
>>  _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-22
>> 
>> manyfile[30] stored in:
>>  _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/h5/elt-30
>> 
>> manyfile[734] stored in:
>>  _concurrent//manyfile-bcdeedee-4df7-4d21-a207-d8051da3d133--array/h9/h4/elt-734
>> 
>> To form the paths, something like this happens: convert each index to 
>> base 25, discard the most significant digit, then, starting at the least 
>> significant digit and working towards the most significant, make each 
>> remaining digit into a subdirectory.
>> 
>> For example, 734 in base 10 is (1) (4) (9) in base 25 
>> (1*625 + 4*25 + 9 = 734),
>> 
>> so we form the intermediate path /h9/h4/
>> 
>> Doing this means that for large arrays directory paths will grow, whilst 
>> for small arrays they will be short; and the size of the array does not need to 
>> be known ahead of time.
>> 
>> The constant '25' can easily be adjusted. It's a compiled-in constant 
>> defined in one place at the moment, but could be made into a mapper 
>> parameter.
>> 

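The array layout rule described in the quoted r1437 message can be written out as a short sketch. This is a Python rendering of the prose description only, not the ConcurrentMapper source. The function name and the `base` parameter are hypothetical, and the sketch covers only the part of the path below the --array directory.

```python
def concurrent_array_path(index, base=25):
    """Return the path of array element `index` relative to the --array dir.

    Follows the quoted description: write the index in the given base,
    drop the most significant digit, and turn each remaining digit into
    an h<digit> directory, least significant digit first.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    digits = []  # collected least significant digit first
    n = index
    while True:
        digits.append(n % base)
        n //= base
        if n == 0:
            break
    # digits[-1] is the most significant digit, which is discarded
    dirs = [f"h{d}" for d in digits[:-1]]
    return "/".join(dirs + [f"elt-{index}"])


if __name__ == "__main__":
    for i in (0, 22, 30, 734):
        print(i, concurrent_array_path(i))
```

For the four indices quoted above this reproduces elt-0, elt-22, h5/elt-30 and h9/h4/elt-734. Because the directory depth depends only on the index, the array size does not need to be known in advance.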

