[Swift-devel] Use case and examples needed to avoid large directories

Michael Wilde wilde at mcs.anl.gov
Sun Sep 30 14:44:05 CDT 2007


This is a very intriguing idea, but a complex one that I'd pursue 
further out in the future.

The approach, however, is exactly what we need to manually code in some 
cases.  This was true, for example, in the "softmean" level of many of 
the fmri examples, where softmean needs to average a large set of 
inputs, and its behavior is, we think, associative.
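
To make the manual version concrete, here is a rough sketch in Python (not Swift) of how a long input list might be broken into stages when the combining step is associative. The function name, the `max_args` limit and the `tmp` naming are all made up for illustration; it only plans the invocations and runs nothing. Note that for an average, the intermediate results would presumably have to carry weights for the staged result to match the one-shot result.

```python
def plan_associative(inputs, out, max_args, tmp_prefix="tmp"):
    """Plan staged invocations of an associative combiner.

    Returns a list of (input_names, output_name) steps in which no step
    takes more than max_args inputs. All names here are hypothetical.
    """
    if max_args < 2:
        raise ValueError("max_args must be at least 2")
    steps = []
    level = list(inputs)
    counter = 0
    # Keep collapsing the current level until it fits in one invocation.
    while len(level) > max_args:
        next_level = []
        for i in range(0, len(level), max_args):
            chunk = level[i:i + max_args]
            if len(chunk) == 1:
                # A lone leftover needs no combining; carry it forward.
                next_level.append(chunk[0])
                continue
            counter += 1
            tmp = f"{tmp_prefix}{counter}"
            steps.append((chunk, tmp))
            next_level.append(tmp)
        level = next_level
    # Final invocation writes the real output.
    steps.append((level, out))
    return steps


if __name__ == "__main__":
    names = [str(i) for i in range(1001)]
    for ins, result in plan_associative(names, "out", max_args=501):
        print(f"combine -type x -files <{len(ins)} inputs> {result}")
```

With 1001 inputs and a limit of 501 arguments this plans two intermediate invocations and one final one, which is the same shape as the tmp1/tmp2/out breakdown quoted below.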

I think we should work on this in three steps:

1) work on the mapper behavior that Ben described and make sure that we 
can represent a large dataset, with too many members to perform well in 
one directory, as a tree of files.  This step would involve making sure 
we can maintain a consistent tree representation regardless of whether 
the files are being produced or consumed by individual programs or by a 
single program. (E.g., in Andrew's workflow, they are processed by a 
sequence of individual invocations and then consumed later by a single 
invocation.)

2) Make it possible, through some combination of swift and external 
scripting, to pass a long list of files to a program through a single 
file. Since the target program is often coded to expect its input on the 
command line, this requires a wrapper.  The question is whether that 
wrapper can or should be in swift.  Seems simplest if it's not in swift, 
and we focus instead on making sure that swift, through mappers and 
built-in functions, can handle the various dataset representations needed 
for this (a tree of files, and a file containing the list of names of 
files in that tree).

3) At a later date, explore Mihael's idea, if we see that this is a 
common enough pattern to automate in this manner.
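
As a starting point for discussion of steps 1 and 2, here is a minimal sketch in Python of the external-scripting side. It is not an existing Swift mapper or wrapper: `tree_path`, `write_file_list`, `build_argv`, the fan-out of 100 and the `.dat` suffix are all assumptions chosen for illustration. The idea is that producers and the consumer compute a member's location from its index with the same rule, and the wrapper turns a list file back into the argument list the program expects.

```python
import os


def tree_path(root, index, fanout=100, depth=2, suffix=".dat"):
    """Map a member index to a path in a tree of directories.

    Each directory holds at most `fanout` entries. The layout is
    hypothetical; the point is that it depends only on the index.
    """
    capacity = fanout ** (depth + 1)
    if not 0 <= index < capacity:
        raise ValueError(f"index must be in [0, {capacity})")
    bucket = index // fanout          # which leaf directory
    parts = []
    for _ in range(depth):
        parts.append(bucket % fanout)  # one directory level per digit
        bucket //= fanout
    width = len(str(fanout - 1))       # zero-pad directory names
    dirs = [f"{p:0{width}d}" for p in reversed(parts)]
    return os.path.join(root, *dirs, f"{index}{suffix}")


def write_file_list(paths, list_file):
    """Write one file name per line to list_file."""
    with open(list_file, "w") as f:
        for p in paths:
            f.write(p + "\n")


def build_argv(program, fixed_args, list_file):
    """Expand a list file into the argument vector a wrapper would exec."""
    with open(list_file) as f:
        names = [line.rstrip("\n") for line in f if line.strip()]
    return [program, *fixed_args, *names]


if __name__ == "__main__":
    print(tree_path("data", 12345))
```

A wrapper built this way still hands the names to the program as arguments, so it only helps where the limit is on the path into the wrapper rather than on the final exec; a program that can read the list file directly avoids that.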

I suspect from Ben's various clarifications that we can do 1 and 2 above 
with little or no language change.  Ben, Andrew, I'm eager to see how you 
solve this and would love to see it in the language docs.  Seems like we 
could use a chapter called "Swift Cookbook" for handling common 
application situations.

- Mike


On 9/29/07 12:22 PM, Mihael Hategan wrote:
> On Sat, 2007-09-29 at 12:14 -0500, Michael Wilde wrote:
> >> - cmd lines too big for linux
>>
> 
> Maybe we can somehow mark applications that are associative in one of
> their vector parameters.
> 
> combine "-type" "x" "-files" @associative(@f) @result(@out);
> 
> This would mean that Swift has the liberty of, say for f = 0...1000, to
> break things into:
> 
> combine -type x -files 0...500 tmp1
> combine -type x -files 501...1000 tmp2
> combine -type x -files tmp1 tmp2 out
> 
> Or something like that.
> 
>> - Mike
>>
>> On 9/29/07 3:53 AM, Ben Clifford wrote:
>>> On Fri, 28 Sep 2007, Mihael Hategan wrote:
>>>
>>>> Getting mappers to do this in the first place is another matter, which
>>>> eludes me at the moment.
>>> Likely a custom mapper if you want a whole tree mapped into a structure. 
>>> Mapping pieces of any one (sub)directory should be possible, at least in 
>>> basic form, with the present mappers.
>>>
>>> Mapping a whole tree would not be hugely different from the simple_mapper 
>>> (although it would be some modification). But I'd be interested in working 
>>> with Andrew to get something done there that isn't a hack.
>>>
> 
> 


