[Swift-devel] Use case and examples needed to avoid large directories

Michael Wilde wilde at mcs.anl.gov
Sat Sep 29 12:14:24 CDT 2007


Thanks for the mapper details - I need to digest them.  It would be nice to
create a simple example job that illustrates the issues.

I *think* that in Andrew's workflow, the problem will occur when a large
dataset of files generated by parallel foreach jobs gets read in by a
single job.

There, we had to consider several issues in VDS:
- too many files per dir
- command lines too big for Condor
- command lines too big for Linux
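For the first issue, one common approach is to spread files over a fixed-fanout directory tree so that no single directory grows without bound. The sketch below is only an illustration of that layout, not an existing Swift mapper: `tree_path`, `fanout` and `depth` are hypothetical names, and the three-digit directory names assume `fanout` is at most 1000.

```python
import posixpath

def tree_path(root: str, index: int, name: str,
              fanout: int = 100, depth: int = 2) -> str:
    """Place the index-th file under root/<d1>/.../<d_depth>/name.

    Consecutive indices fill one leaf directory (at most `fanout` files)
    before moving on, and every intermediate directory holds at most
    `fanout` subdirectories. Hypothetical layout, for illustration only.
    """
    capacity = fanout ** (depth + 1)
    if not 0 <= index < capacity:
        raise ValueError(f"index {index} outside tree capacity {capacity}")
    n = index // fanout                    # which leaf directory this file goes in
    parts = []
    for _ in range(depth):
        parts.append(f"{n % fanout:03d}")  # zero-padded; assumes fanout <= 1000
        n //= fanout
    parts.reverse()                        # most significant level first
    return posixpath.join(root, *parts, name)
```

With the defaults, file 12345 lands in `data/001/023/`, and 8000 files occupy 80 leaf directories of 100 files each instead of one directory of 8000.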

Sounds like the mapper can probably handle the first issue; the second
goes away with Swift, and the third can probably be handled by creating
a script that takes a huge list of filenames in a file and invokes the
app as many times as needed to process the whole list (i.e., in batches of
N). This in turn will need yet another mapper to handle it (a
list-of-files-file mapper that also avoids big single directories by
using a directory tree).
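A minimal sketch of such a batching wrapper, assuming the list-of-files file holds one path per line and that the app accepts any number of input paths as trailing arguments. The function names are hypothetical, and the runner is a parameter so the batching logic can be exercised without launching anything.

```python
import subprocess
from typing import Callable, List, Sequence

def read_file_list(list_path: str) -> List[str]:
    """Read a list-of-files file: one path per line, blank lines ignored."""
    with open(list_path) as f:
        return [line.rstrip("\r\n") for line in f if line.strip()]

def batches(items: Sequence[str], n: int) -> List[List[str]]:
    """Split items into consecutive chunks of at most n elements."""
    if n < 1:
        raise ValueError("batch size must be at least 1")
    return [list(items[i:i + n]) for i in range(0, len(items), n)]

def run_subprocess(argv: List[str]) -> int:
    """Default runner: execute argv directly (no shell) and return its exit code."""
    return subprocess.run(argv).returncode

def run_in_batches(app: List[str], files: Sequence[str], n: int,
                   runner: Callable[[List[str]], int] = run_subprocess) -> int:
    """Invoke app once per batch of at most n files; stop at the first failure.

    Returns the number of invocations made.
    """
    count = 0
    for chunk in batches(files, n):
        rc = runner(app + chunk)
        count += 1
        if rc != 0:
            raise RuntimeError(f"{app[0]} exited with {rc} on batch {count}")
    return count
```

Usage would look like `run_in_batches(["myapp", "--merge"], read_file_list("inputs.txt"), 500)`, where `myapp` and `inputs.txt` are placeholders. Whether the app's per-batch outputs can simply be combined afterwards depends on the app, which this sketch does not address.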

I suspect that Andrew's initial tests will not exceed the Linux command-line
length, but later tests might (8000+ files, etc.).
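A rough back-of-the-envelope check of when that happens, assuming the fixed 128 KiB argument area (32 pages of 4 KiB) of Linux kernels before 2.6.23; newer kernels derive the limit from the stack size rlimit, so the real threshold on a given site may differ. The paths are made up for illustration, and the estimate counts only the strings and their NUL terminators.

```python
from typing import Sequence

# Assumption: pre-2.6.23 Linux kernels reserved 32 pages (128 KiB with
# 4 KiB pages) for argv plus the environment.
OLD_LINUX_ARG_LIMIT = 32 * 4096

def argv_bytes(argv: Sequence[str]) -> int:
    """Rough size of argv: each string's UTF-8 bytes plus its NUL terminator
    (pointer overhead is ignored)."""
    return sum(len(arg.encode("utf-8")) + 1 for arg in argv)

def fits_on_command_line(argv: Sequence[str], env_bytes: int = 0,
                         limit: int = OLD_LINUX_ARG_LIMIT) -> bool:
    """True if argv plus env_bytes of environment stays within limit."""
    return argv_bytes(argv) + env_bytes <= limit

# Hypothetical output paths, one per parallel foreach job.
paths = [f"/scratch/andrew/run1/out/{i:05d}.dat" for i in range(8000)]
print(argv_bytes(paths))                   # 280000
print(fits_on_command_line(paths))         # False
print(fits_on_command_line(paths[:100]))   # True
```

At 35 bytes per path, 8000 files need roughly twice the old limit, while a few hundred fit comfortably.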

- Mike

On 9/29/07 3:53 AM, Ben Clifford wrote:
> On Fri, 28 Sep 2007, Mihael Hategan wrote:
> 
>> Getting mappers to do this in the first place is another matter, which
>> eludes me at the moment.
> 
> Likely a custom mapper if you want a whole tree mapped into a structure. 
> Mapping pieces of any one (sub)directory should be possible, at least in 
> basic form, with the present mappers.
> 
> Mapping a whole tree would not be hugely different from the simple_mapper 
> (although it would need some modification). But I'd be interested in working
> with Andrew to get something done there that isn't a hack.
> 


