[Swift-devel] playing with array closing.

Tue Nov 20 00:04:59 CST 2007

I spent some time at the weekend and today playing with 'the array closing 
problem'.

The array-closing problem is what happens when we combine 
single-assignment semantics (which say that you will only write a=foo; 
once for each variable a) with our array assignment semantics (which say 
that arrays are populated by multiple assignments, a[0]=foo; a[1]=bar;).

Below, exhibit A, is a program which does not work in the present 
trunk implementation - instead it hangs after executing 
top-level statements R,S,T and before executing statement W.

Statement W will not be executed until the array named 'array' is closed, 
that is, until it is known that there are no further writes to the array.

So I prototyped some compile-time dataflow analysis (a bit like the 
input marking code that already exists) to determine that statements 
R,S,T write to (or potentially write to) 'array' and that no other 
statements do.

Armed with this knowledge, the compiled Karajan code is modified so that:
 i) when datasets are created (using vdl:new) they are labelled with a 
list of statements that may write to them.
 ii) those statements are modified so that they notify the appropriate 
datasets when they have finished.

So each statement issues a partial close on the datasets it writes to, and 
each dataset is aware which partial closes to expect.

When a dataset has received partial closes (at runtime) from everything it 
is expecting (which is determined at compile time), it becomes fully 
closed.
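
The partial-close bookkeeping described above can be sketched roughly as 
follows. This is an illustrative Python model only, not the Karajan code 
from the prototype; the Dataset class and its method names are 
hypothetical, and the statement labels are those used in exhibit A.

```python
class Dataset:
    """Hypothetical stand-in for a dataset created with vdl:new."""

    def __init__(self, name, expected_writers):
        self.name = name
        # Statements that may write to this dataset, as found at compile time.
        self.pending = set(expected_writers)
        # A dataset with no writers is closed from the start.
        self.closed = not self.pending
        # Callbacks for statements waiting for the dataset to close.
        self.on_close = []

    def when_closed(self, callback):
        """Run callback once the dataset is fully closed."""
        if self.closed:
            callback()
        else:
            self.on_close.append(callback)

    def partial_close(self, statement):
        """Called by a statement when it has finished writing."""
        self.pending.discard(statement)
        if not self.pending and not self.closed:
            self.closed = True
            for callback in self.on_close:
                callback()
            self.on_close.clear()


log = []
array = Dataset("array", expected_writers={"R", "S", "T"})

# Statement W depends on the array being closed.
array.when_closed(lambda: log.append("W runs"))

# The writers may finish in any order; W only runs after the last one.
for stmt in ["S", "T", "R"]:
    array.partial_close(stmt)
    log.append(f"{stmt} done, closed={array.closed}")
```

In this model W runs as part of the final partial close, whichever 
statement happens to issue it.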

In the example code, that means that statement W's dependency on the array 
being closed is now satisfied, and so it is executed, and so this workflow 
ends.

It's not quite so straightforward - for example, statement U writes to the 
array several times, and we don't want the first write to do the 
corresponding partial close. So the above processing happens only for 
statements in the same scope as the declaration. In the case of sub-scopes, 
such as inside a foreach, partial closes don't happen; instead the 
enclosing statement (the foreach in the example below) is treated as a 
single statement which completes and issues its partial close only when 
the whole loop is finished.
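
One way to picture the scoping rule is the following Python sketch. It is 
an assumption-laden model rather than the prototype's analysis: the Stmt 
class and both functions are hypothetical names, and the statement list is 
hand-built to mirror exhibit A. Writes in a nested scope are attributed to 
the enclosing statement in the scope being analysed.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    label: str
    writes: set = field(default_factory=set)  # variables written directly
    body: list = field(default_factory=list)  # nested statements, e.g. a foreach body


def all_writes(stmt):
    """Variables written by stmt itself or by anything nested inside it."""
    result = set(stmt.writes)
    for child in stmt.body:
        result |= all_writes(child)
    return result


def writers_in_scope(statements):
    """Map each variable to the statements in this scope that may write to it."""
    writers = {}
    for stmt in statements:
        for var in all_writes(stmt):
            writers.setdefault(var, set()).add(stmt.label)
    return writers


# Top-level statements of exhibit A; U is nested inside the foreach T.
top_level = [
    Stmt("R", writes={"array"}),
    Stmt("S", writes={"array"}),
    Stmt("T", body=[Stmt("U", writes={"array"})]),
    Stmt("W", writes={"out"}),
]

writers = writers_in_scope(top_level)
```

Here 'array' ends up with writers R, S and T; U never appears, because its 
writes are accounted for by the foreach that contains it.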

I think this is the right approach to pursue for this problem.

Also, I think that this implementation could join up with the present 
dataset marking code (which is used to determine what is an input and what 
is not), and also be used for better compile-time type checking and 
related things (e.g. checking for variables declared multiple times, 
variables assigned to multiple times when they shouldn't be, ...)

==== EXHIBIT A, being a program which does not work in the present trunk 
implementation ====
type file;

(file f) writefile(int s) {
  app {
    echo s stdout=@f;
  }
}

(file f) listvals(file array[]) {
  app {
    echo @filenames(array) stdout=@f;
  }
}

file array[];                    // (Q)

array[0]=writefile(99999);       // (R)
array[1]=writefile(10000);       // (S)

foreach i in [2:5] {             // (T)
  array[i]=writefile(i+80);      // (U)
}

file out <"out">;                // (V)

out = listvals(array);           // (W)