[Swift-devel] Standard Library Proposal/Discussion Page

Tim Armstrong tim.g.armstrong at gmail.com
Wed Jan 21 17:02:14 CST 2015


I see your point with dense/sparse array errors - there is the prospect of
confusing errors in either case.  I think with dense arrays the intent is
clearer - the user knows they have to assign elements [0..N-1] and the
function knows it can expect elements [0..N-1].  With sparse arrays the
user might readily expect the function to "do the right thing" with the
input, since the types check out - tracing the problem back to its root
cause - that the function doesn't do what they expect - would require more
steps of inference.
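
A contrived sketch of what I mean (avg here is just a stand-in for any
library function written against dense input; the literal assignments are
only for illustration):

float B[];
B[0] = 1.0;
B[7] = 2.0;
// indices 1..6 are never assigned
trace(avg(B));

The call type-checks, but avg either blocks forever waiting on B[1] or
quietly averages only the elements that exist, and the user has to work
backwards from that to the mismatch in expectations.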

That transformation of adding an implicit else branch is really just a way
of implementing sequential semantics: data dependencies are added to stand
in for the sequential control dependency.  The other example of closing
an array is a pure data dependency on the entire value of A rather than a
control flow dependency.
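
For example, something like

float total = sum(A);

(with sum standing in for any whole-array operation) can't run until A is
closed: it depends on the entire final value of A, not on which control
path produced any particular element.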

I don't see Swift as a sequential language that we're autoparallelising
though.  For one, we let statements with side-effects execute out of order
as a general rule.  Constructs like foreach are also explicitly allowed to
execute out of order.  I.e. there is no total order in which the statements
of the program have to execute; there is just a partial order defined by
data flow.  Working out which version of a variable is being referred to
requires defining a total order.
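
A trivial illustration (f and g are just placeholder functions):

int a = f();
int b = g();
int c = a + b;

f() and g() can run in either order or concurrently; only the assignment
to c is ordered after both, because it consumes their values.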

For example, if we involve loops:

versioned x = 1;

foreach i in A {
   if (rand()) {
     x = i;
   }
}

I'm going to assume the intended behavior is that the loop iterations
execute in some sequential order and each iteration depends on the
previous.  But this requires that a) the keys of A have a defined order,
and b) we don't start iterating over A until we know what the "first"
element is, i.e. essentially until A is closed.

But that doesn't really square with the previously established idea that
iterations of a foreach loop are independent.  This seems to require that
the presence of a variable of a particular type change the semantics of an
enclosing control flow structure, which is really bizarre from a language
design point of view.

I just think that if we want sequential control flow we need to be
realistic about the pros and cons it entails.

- Tim


On Wed, Jan 21, 2015 at 3:55 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:

> On Wed, 2015-01-21 at 15:04 -0600, Tim Armstrong wrote:
> > I'm assuming that the size of a dense array would be specified when it's
> > initialized.  Then a dense array of size n behaves the same as n
> > individual variables - there's no reason you have to assign them in
> > order, or iterate over them in order unless that's what you actually
> > want/need to do.  I agree that trying to have dense arrays that
> > automatically expand in a parallel language is a bad idea.
>
> I think what I said was in reply to your question whether dense arrays
> would solve the problem. The answer is it would probably help, but it
> would not lead to ideal solutions. The point would have been that you
> can use them to emulate tail() by passing a starting index to a
> recursive function. You need a deterministic way of going through all
> the indices. Starting at 0 and incrementing is just the most obvious
> example. However, the creation of the elements is nondeterministic so
> the order in which elements are created and the order in which an
> algorithm would "iterate" through such an array would never be the same
> except for unlikely accidents.
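>
> Roughly, the emulation would look like this (just a sketch; the
> dense-array and recursion syntax here are hypothetical):
>
> (float s) sum_from (float A[], int i, int n)
> {
>   if (i == n) {
>     s = 0;
>   } else {
>     s = A[i] + sum_from(A, i + 1, n);  // the "tail" of A starting at index i
>   }
> }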
>
> The way things are iterated now is with foreach, which is also
> nondeterministic in the way it picks the order, but it roughly matches
> the order of production.
>
> >
> > I think it's really worth considering, because for a lot of standard
> > library functions, bad or confusing things will happen if arguments are
> > sparse arrays.  Of course we can document all of these functions to warn
> > users about doing that, but that requires them to a) actually read the
> > docs b) understand the distinction c) remember this when they're writing
> > code and d) be able to trace weird deadlocks or error messages back to
> > the original cause when they make an inadvertent mistake.  I don't mean
> > this as an insult to users - I could see myself doing all of those
> > things.  It seems very suboptimal.
>
> Can you be more specific about "bad or confusing things"? One obvious
> thing is that a piece of code will access an array element that no other
> piece of code will ever write. I think this can be detected at array
> closing time. The same applies to dense arrays: you cannot in general
> guarantee that all elements in a dense array will be assigned. So you
> are back to square one.  In order to avoid deadlocks, you would still
> need to track all writes and determine when no more writes to the sparse
> array will happen (essentially the current closing semantics) and then
> check if any readers are blocked on the array.
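>
> For example, even with a hypothetical declared-dense array:
>
> float A[3];
> A[0] = 1.0;
> A[2] = 3.0;
> // A[1] is never written; a reader of A[1] still blocks until, at closing
> // time, we can see that no more writes will happen and flag the error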
>
> >
> > This is really a separate discussion that doesn't affect the standard
> > library, but I think the updateable idea, at least in the form that was
> > proposed on the phone call, is flawed and actually is equivalent to
> > changing Swift from a language with parallel semantics to a language
> > with sequential semantics that we're trying to auto-parallelise.  I
> > don't think we want to try to solve the autoparallelisation problem.
>
> That is what we are doing IMO and we are letting futures do the work for
> us. An elegant solution that avoids complex compilers to a large extent.
> I do, however, agree that we should not try to solve the problem of
> auto-parallelizing C-like languages unless it turns out that we can do
> so *easily* and there is a reasonable justification for it.
>
> >  It's easy to
> > autoparallelise trivial examples but as soon as a programmer uses
> > something exotic like an if statement (or an if statement inside a loop,
> > or many more possibilities) it becomes much harder.  I can provide
> > examples that will break any specific approach if needed.  For example,
> > the "versioned" variables idea doesn't work for this code:
> >
> > versioned int x = 1;
> >
> > if (...) {
> >   x = 2;
> > }
> >
> > trace(x);
> >
> > Does x at the bottom refer to version 0 of the variable or version 1 of
> > the variable?  You can't determine that statically.  You might say that
> > you could define the semantics so that the version of x it refers to
> > depends on which branch is taken at runtime.
>
> Perhaps. Your example reminds me of the issue of keeping track of the
> number of writers.
>
> int x = 1;
>
> if (...) {
>   x = 2;
> }
>
> After how many writes do you close x?
>
> This is currently solved by the insertion of a balancing else branch:
>
> if (...) {
>   x = 2;
> }
> else {
>   partialClose(x);
> }
>
> In other words, the compiler keeps track of how many writes are in both
> branches and inserts closing statements to bring them in sync. I don't
> see why that couldn't be used for other purposes:
>
> if (...) {
>   x = 2;
> }
> else {
>   x = x;
> }
>
> It is in a sense what your example code means: "if the condition is
> true, make x two, otherwise implicitly let x be the same as before".
>
>
> >   But then you've reinvented sequential
> > control flow and all that entails.  Once you get loops involved it is
> > much worse too - it's probably not that hard to come up with an extension
> > that works ok for this specific example, but adding loops to the equation
> > breaks it further.
>
> Maybe. Or maybe it's in the same class of problems as the writes. I'm
> not sure. I was just floating the idea.
>
> How about just providing function pointers and reduce()? Solves min/max,
> sum, avg, etc. Does not solve finding all sub-strings in a string.
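>
> A sketch of the usage (the syntax for passing the function pointer is
> hypothetical):
>
> (float m) min2 (float a, float b) {
>   if (a < b) { m = a; } else { m = b; }
> }
>
> float lowest = reduce(min2, A);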
>
> Mihael
>
>

