[Swift-devel] Use case and examples needed to avoid large directories
Michael Wilde
wilde at mcs.anl.gov
Sat Sep 29 12:14:24 CDT 2007
Thanks for the mapper details - I need to digest them. Would be nice to
create a simple example job that illustrates the issues.
I *think* that in Andrew's workflow, the problem will occur when a large
dataset of files generated by parallel foreach jobs gets read in by a
single job.
There, we had to consider several issues in VDS:
- too many files per dir
- cmd lines too big for Condor
- cmd lines too big for Linux
Sounds like the mapper can probably handle the first issue; the second
goes away with swift, and the third can probably be handled by creating
a script that takes a huge list of filenames in a file, and invokes the
app as many times as needed to process the whole list (i.e. in batches of
N). This in turn will need yet another mapper to handle (a
list-of-files-file mapper that also avoids big single directories by
using a dir tree).
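A minimal sketch of the batching script described above, assuming one
filename per line in the list file. The function names and the argument
order are hypothetical and are not part of Swift or VDS:

```python
import subprocess


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def read_file_list(path):
    """Read one filename per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def run_in_batches(app, list_path, size):
    """Invoke `app` once per batch of filenames; return the number of runs."""
    runs = 0
    for batch in batches(read_file_list(list_path), size):
        # Each invocation gets at most `size` filenames on its command line.
        subprocess.run([app] + batch, check=True)
        runs += 1
    return runs
```

With a list of 8000 files and N = 500, run_in_batches would invoke the app
16 times. This only works if the app can process its inputs in independent
batches; an app that needs all files at once would have to read the list
file itself.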
I suspect that Andrew's initial tests will not exceed the Linux cmd line
length. But later tests might (8000+ files, etc.).
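One rough way to check whether a given file list would exceed the limit,
as a sketch only: it compares the argument bytes against the value the
system reports for ARG_MAX. The helper names and the headroom value are
assumptions, and the real kernel accounting also includes the environment
and pointer arrays, so treat the result as an estimate:

```python
import os


def command_line_bytes(argv):
    """Approximate size of the argument strings: each one plus a NUL byte."""
    return sum(len(os.fsencode(arg)) + 1 for arg in argv)


def fits_in_arg_max(argv, headroom=4096):
    """Compare argv against ARG_MAX, leaving headroom for the environment."""
    limit = os.sysconf("SC_ARG_MAX")
    if limit <= 0:
        # The system reports no definite limit; nothing to compare against.
        return True
    return command_line_bytes(argv) + headroom <= limit
```

If the check fails for the full list, that is the point at which the
batching approach becomes necessary.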
- Mike
On 9/29/07 3:53 AM, Ben Clifford wrote:
> On Fri, 28 Sep 2007, Mihael Hategan wrote:
>
>> Getting mappers to do this in the first place is another matter, which
>> eludes me at the moment.
>
> Likely a custom mapper if you want a whole tree mapped into a structure.
> Mapping pieces of any one (sub)directory should be possible, at least in
> basic form, with the present mappers.
>
> Mapping a whole tree would not be hugely different from the simple_mapper
> (although it would be some modification). But I'd be interested in working
> with Andrew to get something done there that isn't a hack.
>