[Swift-devel] Use case and examples needed to avoid large directories
Mihael Hategan
hategan at mcs.anl.gov
Fri Sep 28 18:38:22 CDT 2007
My point was that whatever you do with the mappers on the client side
will be reproduced on the server side. For example, if your input data
were broken up like this:
0/00.dat
0/01.dat
...
0/09.dat
1/10.dat
...
9/99.dat
and your output data were split in a similar way, then on the server
side these files would go into shared/0/00.dat ... shared/9/99.dat, and
into whatever directories your output data is in.
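A minimal Python sketch of the layout described above. It assumes ten files per directory, as in the example listing, and that the server side simply reproduces the client-side relative path under shared/. The names bucketed_path and server_path are hypothetical illustrations. They are not part of Swift's mapper API; they only model the directory structure being discussed.

```python
from pathlib import PurePosixPath


def bucketed_path(index: int, files_per_dir: int = 10) -> PurePosixPath:
    """Hypothetical helper: map a file index to the two-level layout
    0/00.dat ... 0/09.dat, 1/10.dat ... 9/99.dat (files_per_dir per directory)."""
    bucket = index // files_per_dir
    return PurePosixPath(str(bucket)) / f"{index:02d}.dat"


def server_path(rel: PurePosixPath, shared_root: str = "shared") -> PurePosixPath:
    """Hypothetical helper: model the relative layout being reproduced
    under the shared directory on the execution side."""
    return PurePosixPath(shared_root) / rel


if __name__ == "__main__":
    for i in (0, 9, 10, 99):
        rel = bucketed_path(i)
        print(rel, "->", server_path(rel))
    # 0/00.dat -> shared/0/00.dat
    # 0/09.dat -> shared/0/09.dat
    # 1/10.dat -> shared/1/10.dat
    # 9/99.dat -> shared/9/99.dat
```

Because the bucket is just the integer quotient of the index, no directory ever holds more than files_per_dir files, so the same function could be tuned to stay under whatever files-per-directory threshold a given filesystem tolerates.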
Getting mappers to do this in the first place is another matter, which
eludes me at the moment.
Mihael
On Fri, 2007-09-28 at 18:32 -0500, Michael Wilde wrote:
> Andrew Jamieson reviewed the needs of his application with me today and
> we noted the following:
>
> When the application ran under VDS, a showstopper problem on TeraPort,
> which runs GPFS, was that too many files had to be created in a single
> output directory. We observed that when parallel jobs placed more than
> roughly 200 files into a single output directory, file creation slowed
> so much that overall workflow speed was badly impacted. I don't know
> whether that's GPFS in general or TeraPort in particular, but in the old
> VDS days we saw the same behavior for GADU workflows on Jazz, at the same
> low threshold of files per directory.
>
> I mentioned the large-number-of-files-per-directory problem to Mihael.
> He says it is "already solved": if you break your input data up into
> subdirectories, the temp directories on the execution nodes that hold
> that data will have the same structure.
>
> I'd like to ask about this in a bit more detail.
>
> Do we still need some "magic" in the mapper to make sure that
> intermediate and output files are similarly structured?
>
> Is there a description anywhere in the Swift docs of how data caching,
> file naming, temporary directory creation, and data transfer are handled
> in Swift, and how properties and mappers affect them? Ben, as you work
> on the mapper text and tutorial examples, is this a good section to
> document that in?
>
> - Mike
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>