[Swift-devel] Standard Library Proposal/Discussion Page

Mihael Hategan hategan at mcs.anl.gov
Wed Jan 21 15:55:22 CST 2015


On Wed, 2015-01-21 at 15:04 -0600, Tim Armstrong wrote:
> I'm assuming that the size of a dense array would be specified when it's
> initialized.  Then a dense array of size n behaves the same as n individual
> variables - there's no reason you have to assign them in order, or iterate
> over them in order unless that's what you actually want/need to do.  I
> agree that trying to have dense arrays that automatically expand in a
> parallel language is a bad idea.

I think what I said was in reply to your question of whether dense
arrays would solve the problem. The answer is that they would probably
help, but they would not lead to ideal solutions. The point was that
you can use them to emulate tail() by passing a starting index to a
recursive function. You need a deterministic way of going through all
the indices; starting at 0 and incrementing is just the most obvious
example. However, the creation of the elements is nondeterministic, so
the order in which elements are created and the order in which an
algorithm would "iterate" through such an array would never be the
same except by unlikely accident.
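
Roughly like this (an untested sketch; the exact declaration syntax
may be off):

// Emulating tail() on a dense array of known size n by passing the
// starting index to a recursive function.
(int r) sumFrom(int a[], int i, int n) {
  if (i == n) {
    r = 0;
  }
  else {
    r = a[i] + sumFrom(a, i + 1, n);
  }
}

Calling sumFrom(a, 0, n) then plays the role that a head()/tail()
recursion would play with lists.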

The way things are iterated now is with foreach, which is also
nondeterministic in the order it picks, but that order roughly matches
the order in which elements are produced.
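
That is, today one writes something like:

foreach v, i in a {
  trace(i, v);  // visit order is whatever order elements become available in
}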

> 
> I think it's really worth considering, because for a lot of standard
> library functions, bad or confusing things will happen if arguments are
> sparse arrays.  Of course we can document all of these functions to warn
> users about doing that, but that requires them to a) actually read the docs
> b) understand the distinction c) remember this when they're writing code
> and d) be able to trace weird deadlocks or error messages back to the
> original cause when they make an inadvertent mistake.  I don't mean this as
> an insult to users - I could see myself doing all of those things.  It
> seems very suboptimal.

Can you be more specific about "bad or confusing things"? One obvious
thing is that a piece of code will access an array element that no other
piece of code will ever write. I think this can be detected at array
closing time. The same applies to dense arrays: you cannot in general
guarantee that all elements in a dense array will be assigned. So you
are back to square one.  In order to avoid deadlocks, you would still
need to track all writes and determine when no more writes to the sparse
array will happen (essentially the current closing semantics) and then
check if any readers are blocked on the array.
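
For example (a trivial sketch):

int a[];
a[0] = 1;
// nothing ever writes a[1]; this read can only be flagged as a problem
// once a is closed and we know that no further writes can arrive
trace(a[1]);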

> 
> This is really a separate discussion that doesn't affect the standard
> library, but I think the updateable idea, at least in the form that was
> proposed on the phone call, is flawed and actually is equivalent to
> changing Swift from a language with parallel semantics to a language with
> sequential semantics that we're trying to auto-parallelise.  I don't think
> we want to try to solve the autoparallelisation problem.

That is what we are doing, IMO, and we are letting futures do the
work for us. It is an elegant solution that avoids complex compilers
to a large extent. I do, however, agree that we should not try to
solve the problem of auto-parallelizing C-like languages unless it
turns out that we can do so *easily* and there is a reasonable
justification for it.
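
To be clear about what I mean by letting futures do the work (f, g and
h standing for arbitrary user functions):

int a = f();
int b = g(a);  // waits on the future for a, so it runs after f() completes
int c = h();   // independent of a, so it can run in parallel with f()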

>  It's easy to
> autoparallelise trivial examples but as soon as a programmer uses something
> exotic like an if statement (or an if statement inside a loop, or many more
> possibilities) it becomes much harder.  I can provide examples that will
> break any specific approach if needed.  For example, the "versioned"
> variables idea doesn't work for this code:
> 
> versioned int x = 1;
> 
> if (...) {
>   x = 2;
> }
> 
> trace(x);
> 
> Does x at the bottom refer to version 0 of the variable or version 1 of the
> variable?  You can't determine that statically.  You might say that you
> could define the semantics so that the version of x it's referring to depends
> on which branch is taken at runtime.

Perhaps. Your example reminds me of the issue of keeping track of the
number of writers.

int x = 1;

if (...) {
  x = 2;
}

After how many writes do you close x?

This is currently solved by the insertion of a balancing else branch:

if (...) {
  x = 2;
}
else {
  partialClose(x);
}

In other words, the compiler keeps track of how many writes are in both
branches and inserts closing statements to bring them in sync. I don't
see why that couldn't be used for other purposes:

if (...) {
  x = 2;
}
else {
  x = x;
}

That is, in a sense, what your example code means: "if the condition
is true, make x 2; otherwise, implicitly let x be the same as before".


>   But then you've reinvented sequential
> control flow and all that entails.  Once you get loops involved it is much
> worse too - it's probably not that hard to come up with an extension that
> works ok for this specific example, but adding loops to the equation breaks
> it further.

Maybe. Or maybe it's in the same class of problems as tracking the
writes. I'm not sure; I was just floating the idea.

How about just providing function pointers and reduce()? That would
solve min/max, sum, avg, etc. It would not solve finding all
sub-strings in a string.
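
Something like this, with the @-syntax for passing a function being,
of course, made up:

(int r) add(int a, int b) {
  r = a + b;
}

int nums[] = [1, 3, 5, 7];
int total = reduce(@add, nums, 0);  // hypothetical library reduce()
trace(total);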

Mihael



