[Swift-devel] Associative array in Swift [GSoC]

Sun Jun 19 03:18:07 CDT 2011

On Sat, 2011-06-18 at 23:32 -0700, Tim Armstrong wrote:
> Maybe I missed something, but from the discussion I'm not quite clear
> about what the use case is.
> 
> It seems like there are three possibilities for the data structure:
> 
> 1 Order is completely unimportant (i.e. the data structure is a set or
> multi-set)

That is essentially the case.  Arrays are maps. However Ben points out
the issue of restart logs. So let me explain that a bit.

There can be different dynamic scopes for variables. Consider the
following example:

function g(a) {
  for v in a {
    file x = f(v);
    ...
  }
}

And imagine that some of the iterations have completed invoking f(v) and
some haven't. Also picture that there are multiple parallel invocations
of g(). The question is how can one record and distinguish which x-es
have been computed and which haven't.

One relatively simple answer is that given that all invocations of g()
are parallel and all iterations are parallel, one can picture each scope
as a node in a tree, and each spawned thread in that scope as a branch.
If branches are numbered sequentially then it would be sufficient to
record the path in the tree from the root to the scope of the x we're
interested in. We call this path a "thread prefix" and its property is
that no two different scopes have the same thread prefix. This is
something internal to swift.

The issue that can arise however is that if, upon a restart, the
correspondence between the thread prefix and the actual iteration value
is not kept, then we may end up with mismatched stuff (this is only an
issue with iterations, since different invocations of g() are labeled
based on their lexical position) So either we make sure that the order
of iteration is the same, or alternatively use the actual iteration
index to construct the thread prefix.

We do the latter. So normally there is no issue since every new array
index is a swift expression which is deterministic.

But if we do:
file a[];
for i in [1:2] {
  a = a + [f(i)];
  // instead of a[i] = f(i);
}

then the order in which things are put in a is nondeterministic (it is
in both cases). In the commented out case one would log that, say, a[1]
in thread-prefix 0 (the variable was declared in the global scope)  is
already calculated. Which is fine. A restart would check that a[1] was
calculated and it could only have the value of f((1).

In the non-commented out case, well, it depends on whether you assign
based on iteration sequence or completion of f() sequence, but in either
case it's nondeterministic, so a[0] could contain either f(1) or f(2).
Simply checking that a[0] was calculated is not sufficient to restore
the state. The correspondence between a[0] and f(1) or f(2) also needs
to be established. That makes things more complicated and we don't want
that. Also, it makes swift programs nondeterministic and we don't want
that either.

> 2 Relative order is important (i.e. if the assignment happens in a
> later iteration of a for loop, it should be later in the list)
> 3 Absolute order is important - i.e. we're using explicit array
> indices
> 
> It seems like you're proposing 2, which seems a bit odd to me in the
> context of swift as it adds some kind of sequential dependency between
> iterations of a for loop when determining array indices.

I guess what's being proposed, and I'm beginning to forget what the
overall goal was, is a way to not care about indices. And when you don't
care about them, they may as well be sequential since it's
straightforward to implement. I mean I don't see why a random sequence
is better then a sequence here. The random sequence is essentially a
hash of a sequence, so I only see it as introducing nonsense. We're not
trying to "secure" the indices.

But looking back at the initial email, I think that we should dig into
the assumptions that prompted this discussion. There has been a lot of
talking on how to do this, but almost nothing on why we should do it.
> 
> The idea of building a nested (ragged) array and then flattening it
> seems more swift-like to me.
> 
> Ie. 
>  foreach i in a {
>    foreach j in b {
>      output[i][j] = f(i,j)
>    }
>  }
> file output2[];
> output2 = flatten(output)
> 
> That is kind of clunky though.  Would it be possible to have some kind
> of syntax for implicitly flattening arrays, e.g.:
> 
> file output2[];
> foreach i in a {
>   foreach j in b {
>      output[i,j] = f(i,j)
>   }
> }
> 
If you have an upper bound for j, then I guess i*n + j would work.
Alternatively if we allow array keys or tuple keys, output[[i,
j]]/output[(i, j)] might work. Or we could provide some magic injective
f(i, j) to be used as index for the flattened array.

Mihael