[Swift-devel] on the semantics of 'array closing'

Sat Jun 16 10:05:39 CDT 2007

Hi all,

I'm jumping in late; I re-read the thread a few times but may have 
missed something. So correct me as needed. Also, rather than 
spending more time polishing the thoughts below I just put them out 
here for discussion.

This discussion seems to me very important, as it can close down 
several of the major open issues that are very critical to the 
language, both to give it complete and consistent semantics and to 
make it practical fr the problems that we are applying it to.

Four important but missing aspects of this discussion are: 
pipelining, error handing, restart, and mapping.

I feel that swift needs the following semantics:

1. Pipelining:

The data dependency aspects of swift are carried out at the atomic 
level in a pipelined manner.

  -- elements of an array are written into the stream

  -- readers of the array consume the stream

  -- the entire program remains active in parallel, across function 
boundaries

Array elements [k,v] are identified by their index, k, which can be 
an int or string.

2. Error handling

In practice, many large-scale foreach() operations will never 
complete, yet they will deliver a lot of useful results that we want 
subsequent statements in a program to continue to operate on. Thus 
closing needs to permit different criteria other than just "finishing".

An array is "closed" when its producer function/foreach "shuts 
down".  Can we permit shutdown/closing to occur based on finishing, 
time, or quota/threshold.  These would be parameters of the foreach 
statements that could be overridden.

(For some practical examples, see map-reduce; it has similar 
problems: parallel computations reach a level whwre there is lots of 
parallelism, and as it proceeds, gets to a poiunt where only the 
"stragglers" are left - things waiting in slow queues or for hung 
data transfers, etc.  Ive read this in m/r papers, and found that 
our experiences match those reported by the google m/r people).

3. Restart

We want computations to be restartable.  If 50% of a large 
array/dataset gets created in a 10-hour run, and then fails, we want 
the run to be restartable and continue where it left of with minimal 
  lost of "completed" results.

4. Mapping

Lastly, swift mapping should be connected to this whole process: the 
mapped contents of a dataset should be a stream of xml elements 
rather than a "completed" xml document, so that we can practically 
handle very large datasets.  So when a foreach() statement processes 
  a array, its processing the mapped stream of the array. mappers 
should be parallel processes that produce and consume these streams 
of xml elements.

- Mike

Ben Clifford wrote, On 6/16/2007 8:34 AM:
> 
> On Sat, 16 Jun 2007, Mihael Hategan wrote:
> 
>>> It works because Swift implicitly marks arrays returned from compound 
>>> procedures as closed (which may or may not be correct).
>> We defined it as correct. Something created in one scope cannot be
>> modified in a parent scope.
> 
> That's fine - what was unintuitive to me was that something created in one 
> scope cannot be referred to in that same scope. i.e. you can create a 
> piecewise using a[...]=... but cannot then refer to a.
> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997