[Swift-devel] Re: How best to distribute named input and output files across dirs?

Ben Clifford benc at hawaga.org.uk
Mon Nov 5 10:54:03 CST 2007


I'm confused by your use of the concurrent mapper with the word 'output' - 
anything appearing under _concurrent is rather arbitrarily named.

For inputs, how are you specifying input mapping at the moment? Can you 
give the mapper declaration you use for inputs?

For outputs, some ideas:

  i) explicitly map output paths using the CSV mapper or execution mapper.
 ii) write a custom mapper (or have one of us write it) with more 
     hierarchical behaviour.


On Mon, 5 Nov 2007, Michael Wilde wrote:

> What's the best way to spread output files across a directory if they are
> mapped, as opposed to anonymous?
> 
> In awf2.swift the outputs went into a single big dir (below _concurrent)
> because they are neither mapped nor members of an array.
> 
> In awf3.swift I switched to an array, and they were nicely (albeit verbosely
> ;) mapped to an array structure automatically.
> 
> In awf4.swift I name the outputs, and the files are now nicely named but all
> reside back in the client submit directory.
> 
> Now I want to make awf5, and spread named inputs and outputs across dirs. I
> recall suggesting a way to do this to Andrew, but didn't track how he and you
> did it, Ben.
> 
> Andrew, can you send me your latest swift code?
> 
> Ben, Mihael, is the best way to do this to manually spread the inputs across
> dirs, and map both the inputs and outputs using readdata?
> 
> angleinput/{00 through 99}/pcNNNN.pcap
> 
> angleout/{00 through 99}/{ofNNNN.angle,cfNNNN.center}
> 
> I need to focus on a few admin things for a bit, but any/all advice is
> welcome.
> 
> 
> 
> ::::::::::::::
> awf2.swift
> ::::::::::::::
> type pcapfile;
> type angleout;
> type anglecenter;
> 
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
>   app { angle4 @ifile @ofile @cfile; }
> }
> 
> pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;
> 
> foreach pf in pcapfiles {
>   angleout of;
>   anglecenter cf;
>   (of,cf) = angle4(pf);
> }
> ::::::::::::::
> awf3.swift
> ::::::::::::::
> type pcapfile;
> type angleout;
> type anglecenter;
> 
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
>   app { angle4 @ifile @ofile @cfile; }
> }
> 
> pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;
> 
> angleout of[];
> anglecenter cf[];
> 
> foreach pf,i in pcapfiles {
>   (of[i],cf[i]) = angle4(pf);
> }
> ::::::::::::::
> awf4.swift
> ::::::::::::::
> type pcapfile;
> type angleout;
> type anglecenter;
> 
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
>   app { angle4 @ifile @ofile @cfile; }
> }
> 
> pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;
> 
> angleout    of[] <simple_mapper;prefix="of",suffix=".angle">;
> anglecenter cf[] <simple_mapper;prefix="cf",suffix=".center">;
>                  // note i used .angle for both in current tests...
> 
> foreach pf,i in pcapfiles {
>   (of[i],cf[i]) = angle4(pf);
> }
> 
> 
> 
> On 11/1/07 11:57 AM, Ben Clifford wrote:
> > I just modified the way that ConcurrentMapper lays out files (r1437)
> > 
> > You will likely not have encountered ConcurrentMapper by name. It is used
> > when you do not specify a mapper for a dataset, for example for intermediate
> > variables.
> > 
> > Previously, all files named by this mapper were given a long name in the
> > root directory of the submit and cache directories.
> > 
> > When a large number of files were named in this fashion, for example in an
> > array with thousands of elements, this would result in a file for each
> > element and a root directory with thousands of files.
> > 
> > Most immediately I encountered this problem working with Andrew Jamieson
> > running on TeraPort using GPFS. Many hosts attempting to access one
> > directory is severely unscalable on GPFS.
> > 
> > The changes I have made add more structure to filenames generated by the
> > ConcurrentMapper:
> > 
> > 
> >  1. All files appear in a _concurrent/ subdirectory.
> > 
> > 
> >  2. Simple/marker data typed files appear directly below _concurrent, named
> > as before. For example:
> > 
> >   file outfile;
> > 
> > might give a filename:
> > 
> >   _concurrent//outfile-3339612a-08e1-443d-bd14-2329080d2d94-
> > 
> > 
> >  3. Structures are mapped to a sub-directory, with each element being a file
> > in that subdirectory. For example,
> > 
> >  type footype { file left; file right; }
> >  footype structurefile;
> > 
> > might give a directory:
> > 
> > _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field
> > 
> > containing two files:
> > 
> > _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/left
> > _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/right
> > 
> > 
> > 4. Array elements are placed in a subdirectory. Within that subdirectory,
> > the index is used to construct a further hierarchy such that there will
> > never be more than 50 directories/files in any one directory. For example:
> > 
> >   file manyfile[];
> > 
> > might give mappings like this:
> > 
> > manyfile[0] stored in:
> >  _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-0
> > 
> > manyfile[22] stored in:
> >  _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-22
> > 
> > manyfile[30] stored in:
> >  _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/h5/elt-30
> > 
> > manyfile[734] stored in:
> >  _concurrent//manyfile-bcdeedee-4df7-4d21-a207-d8051da3d133--array/h9/h4/elt-734
> > 
> > To form the paths, roughly the following happens: convert each index into
> > base 25, discard the most significant digit, then, starting at the least
> > significant digit and working towards the most significant, make each
> > remaining digit into a subdirectory.
> > 
> > For example, 734 in base 10 is  (1) (4) (9) in base 25
> > 
> > so we form intermediate path /h9/h4/
> > 
> > Doing this means that for large arrays directory paths will grow, whilst for
> > small arrays they will stay short; and the size of the array does not need
> > to be known ahead of time.
> > 
> > The constant '25' can easily be adjusted. It's a compiled-in constant defined
> > in one place at the moment, but could be made into a mapper parameter.
> > 
> 
> 
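
The base-25 path scheme described in the quoted message above can be sketched
roughly as follows. This is an illustrative Python sketch only, not the actual
ConcurrentMapper code: the function name element_path and the BASE constant
are hypothetical, and the sketch is only intended to reproduce the example
paths given in the message (elt-0, elt-22, h5/elt-30, h9/h4/elt-734).

```python
BASE = 25  # assumption: mirrors the compiled-in constant mentioned in the message


def element_path(index, base=BASE):
    """Return the path of an array element relative to its --array directory.

    Hypothetical helper: convert the index to the given base, discard the
    most significant digit, and turn the remaining digits into
    subdirectories, least significant digit first.
    """
    if index < 0:
        raise ValueError("index must be non-negative")

    # Collect the digits of the index, least significant first.
    digits = []
    n = index
    while True:
        digits.append(n % base)
        n //= base
        if n == 0:
            break

    # The last entry is the most significant digit; drop it.
    dirs = ["h%d" % d for d in digits[:-1]]
    return "/".join(dirs + ["elt-%d" % index])


if __name__ == "__main__":
    for i in (0, 22, 30, 734):
        print(i, element_path(i))
```

Because the most significant digit is dropped, indices below 25 get no
intermediate directory at all, which is why small arrays keep short paths.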


