[Swift-devel] Re: How best to distribute named input and output files across dirs?
Ben Clifford
benc at hawaga.org.uk
Mon Nov 5 10:54:03 CST 2007
I'm confused by your use of the concurrent mapper with the word 'output' -
anything appearing under _concurrent is rather arbitrarily named.
For inputs, how are you specifying input mapping at the moment? Can you
give the mapper declaration you use for inputs?
For outputs, some ideas:
i) explicitly map output paths using the CSV mapper or execution mapper.
ii) write a custom mapper (or have one of us write it) with more
hierarchical behaviour.
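
As a rough illustration of option (i), the sketch below generates explicit
output paths in the angleout/{00 through 99}/ layout described in the quoted
message and writes them out as CSV text. It is Python rather than SwiftScript,
and several details are assumptions not stated in this thread: the bucket rule
(index modulo 100), the four-digit zero padding for NNNN, and the column names
"ofile" and "cfile". The file format the CSV mapper actually expects should be
checked against the Swift documentation.

```python
import csv
import io


def spread_paths(index, buckets=100):
    """Return (angle_path, center_path) for one input index.

    Assumption: the bucket directory is the index modulo `buckets`,
    zero-padded to two digits. The thread does not specify the rule.
    """
    bucket = f"{index % buckets:02d}"
    return (
        f"angleout/{bucket}/of{index:04d}.angle",
        f"angleout/{bucket}/cf{index:04d}.center",
    )


def mapping_csv(count):
    """Build CSV text with one row of output paths per input index."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ofile", "cfile"])  # hypothetical column names
    for i in range(count):
        writer.writerow(spread_paths(i))
    return buf.getvalue()


if __name__ == "__main__":
    print(mapping_csv(3), end="")
```

With 100 buckets, ten thousand inputs would leave 200 output files in each
directory (one .angle and one .center per input).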
On Mon, 5 Nov 2007, Michael Wilde wrote:
> What's the best way to spread output files across directories if they are
> mapped, as opposed to anonymous?
>
> In awf2.swift the outputs went into a single big dir (below _concurrent)
> because they are neither mapped nor members of an array.
>
> In awf3.swift I switched to an array, and they were nicely (albeit verbosely
> ;) mapped to an array structure automatically.
>
> In awf4.swift I name the outputs, and the files are now nicely named but all
> reside back in the client submit directory.
>
> Now I want to make awf5, and spread named inputs and outputs across dirs. I
> recall suggesting a way to do this to Andrew, but didn't track how he and you
> did it, Ben.
>
> Andrew, can you send me your latest swift code?
>
> Ben, Mihael, is the best way to do this to manually spread the inputs across
> dirs, and map both the inputs and outputs using readdata?
>
> angleinput/{00 through 99}/pcNNNN.pcap
>
> angleout/{00 through 99}/{ofNNNN.angle,cfNNNN.center}
>
> I need to focus on a few admin things for a bit, but any/all advice is
> welcome.
>
>
>
> ::::::::::::::
> awf2.swift
> ::::::::::::::
> type pcapfile;
> type angleout;
> type anglecenter;
>
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
> app { angle4 @ifile @ofile @cfile; }
> }
>
> pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;
>
> foreach pf in pcapfiles {
> angleout of;
> anglecenter cf;
> (of,cf) = angle4(pf);
> }
> ::::::::::::::
> awf3.swift
> ::::::::::::::
> type pcapfile;
> type angleout;
> type anglecenter;
>
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
> app { angle4 @ifile @ofile @cfile; }
> }
>
> pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;
>
> angleout of[];
> anglecenter cf[];
>
> foreach pf,i in pcapfiles {
> (of[i],cf[i]) = angle4(pf);
> }
> ::::::::::::::
> awf4.swift
> ::::::::::::::
> type pcapfile;
> type angleout;
> type anglecenter;
>
> (angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
> {
> app { angle4 @ifile @ofile @cfile; }
> }
>
> pcapfile pcapfiles[]<filesys_mapper; prefix="pc", suffix=".pcap">;
>
> angleout of[] <simple_mapper;prefix="of",suffix=".angle">;
> anglecenter cf[] <simple_mapper;prefix="cf",suffix=".center">;
> // note: I used .angle for both in current tests...
>
> foreach pf,i in pcapfiles {
> (of[i],cf[i]) = angle4(pf);
> }
>
>
>
> On 11/1/07 11:57 AM, Ben Clifford wrote:
> > I just modified the way that ConcurrentMapper lays out files (r1437)
> >
> > You will likely not have encountered ConcurrentMapper by name. It is used
> > when you do not specify a mapper for a dataset, for example for intermediate
> > variables.
> >
> > Previously, all files named by this mapper were given a long name in the
> > root directory of the submit and cache directories.
> >
> > When a large number of files were named in this fashion, for example in an
> > array with thousands of elements, this would result in a file for each
> > element and a root directory with thousands of files.
> >
> > Most immediately I encountered this problem working with Andrew Jamieson
> > running on TeraPort using GPFS. Many hosts attempting to access one
> > directory is severely unscalable on GPFS.
> >
> > The changes I have made add more structure to filenames generated by the
> > ConcurrentMapper:
> >
> >
> > 1. All files appear in a _concurrent/ subdirectory.
> >
> >
> > 2. Simple/marker data typed files appear directly below _concurrent, named
> > as before. For example:
> >
> > file outfile;
> >
> > might give a filename:
> >
> > _concurrent//outfile-3339612a-08e1-443d-bd14-2329080d2d94-
> >
> >
> > 3. Structures are mapped to a sub-directory, with each element being a file
> > in that subdirectory. For example,
> >
> > type footype { file left; file right; }
> > footype structurefile;
> >
> > might give a directory:
> >
> > _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field
> >
> > containing two files:
> >
> > _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/left
> > _concurrent//structurefile-c68b99dc-de3c-4288-822f-2ab3d4dc6427--field/right
> >
> >
> > 4. Array elements are placed in a subdirectory. Within that subdirectory,
> > the index is used to construct a further hierarchy such that there will
> > never be more than 50 directories/files in any one directory. For example:
> >
> > file manyfile[];
> >
> > might give mappings like this:
> >
> > manyfile[0] stored in:
> > _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-0
> >
> > manyfile[22] stored in:
> > _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/elt-22
> >
> > manyfile[30] stored in:
> > _concurrent//manyfile-0b91d809-37f5-46da-91c8-6c4a9157b06b--array/h5/elt-30
> >
> > manyfile[734] stored in:
> > _concurrent//manyfile-bcdeedee-4df7-4d21-a207-d8051da3d133--array/h9/h4/elt-734
> >
> > To form the paths, roughly the following happens: convert each index into
> > base 25 and discard the most significant digit. Then, starting at the least
> > significant digit and working towards the most significant, make each
> > remaining digit into a subdirectory.
> >
> > For example, 734 in base 10 is (1) (4) (9) in base 25
> >
> > so we form intermediate path /h9/h4/
> >
> > Doing this means that directory paths will grow for large arrays, whilst
> > for small arrays they will stay short; and the size of the array does not
> > need to be known ahead of time.
> >
> > The constant '25' can easily be adjusted. It's a compiled-in constant defined
> > in one place at the moment, but could be made into a mapper parameter.
> >
>
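
The path rule described in the quoted r1437 message can be sketched in a few
lines. This is a Python illustration of the rule as stated there, not the
ConcurrentMapper source; the function name and the `base` parameter are
hypothetical, and the UUID-bearing prefix directory is left out.

```python
def concurrent_path(index, base=25):
    """Return the relative path for one array element.

    Follows the rule in the message above: write the index in `base`,
    discard the most significant digit, then turn each remaining digit
    into an "hN" directory, least significant digit first.
    """
    digits = []
    n = index
    while True:
        digits.append(n % base)  # collected least significant first
        n //= base
        if n == 0:
            break
    # digits[-1] is the most significant digit and is discarded.
    dirs = [f"h{d}" for d in digits[:-1]]
    return "/".join(dirs + [f"elt-{index}"])


if __name__ == "__main__":
    for i in (0, 22, 30, 734):
        print(i, concurrent_path(i))
```

For the indices used in the message this gives elt-0, elt-22, h5/elt-30 and
h9/h4/elt-734. Passing a different `base` corresponds to changing the
compiled-in constant.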
>