[Swift-devel] Use case and examples needed to avoid large directories
Michael Wilde
wilde at mcs.anl.gov
Fri Sep 28 18:32:09 CDT 2007
Andrew Jamieson reviewed the needs of his application with me today and
we noted the following:
When run under VDS, a showstopper problem for TeraPort, which is running
GPFS, was that too many files needed to be created in a single output
directory. The observed behavior was that when more than around 200
files were placed by parallel jobs into a single output directory, the
rate of file creation was so slow that the overall workflow speed was
badly impacted. I don't know if that's GPFS in general or TeraPort in
particular, but in old VDS days we saw the same behavior for GADU
workflows on Jazz, at the same low threshold of files-per-dir.
I mentioned the large-number-of-files-per-directory problem to Mihael.
He says "already solved": if you break your input data up in that
manner, the temp directories on the execution nodes that hold that data
will have the same structure.
I'd like to ask about this in a bit more detail.
Do we still need some "magic" in the mapper to make sure that
intermediate and output files are similarly structured?
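To make the question concrete, here is a minimal sketch in Python (not Swift
mapper code) of the kind of output layout I have in mind. Everything in it is
an assumption for illustration: the names sharded_path and plan_outputs are
hypothetical, the cap of 200 is simply the threshold we observed above, and
bucketing by file index is just one possible scheme.

```python
import posixpath

# Assumed cap, taken from the ~200 files-per-directory threshold observed above.
MAX_FILES_PER_DIR = 200

def sharded_path(root, index, name, max_per_dir=MAX_FILES_PER_DIR):
    """Place file number `index` in a numbered bucket subdirectory of `root`.

    Files 0..max_per_dir-1 go to bucket 0000, the next max_per_dir files
    to bucket 0001, and so on, so no bucket exceeds max_per_dir files.
    """
    bucket = index // max_per_dir
    return posixpath.join(root, f"{bucket:04d}", name)

def plan_outputs(root, names, max_per_dir=MAX_FILES_PER_DIR):
    """Return one sharded output path per file name, in input order."""
    return [sharded_path(root, i, n, max_per_dir) for i, n in enumerate(names)]

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    names = [f"f{i}.dat" for i in range(450)]
    paths = plan_outputs("out", names)
    print(paths[0], paths[200], paths[449])
```

With 450 files this yields three subdirectories holding 200, 200, and 50
files. Whether the mapper could or should produce such a layout for
intermediate and output files is exactly what I am asking.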
Is there a description anywhere in the Swift docs on how data caching,
file naming, temporary dir creation, and data transfer are handled in
Swift, and how properties and mappers affect things? Ben, as you work
on the mapper text and tutorial examples, is this a good section to
document that in?
- Mike