[Swift-devel] on the semantics of 'array closing'
Mike Wilde
wilde at mcs.anl.gov
Sat Jun 16 10:05:39 CDT 2007
Hi all,
I'm jumping in late; I re-read the thread a few times but may have
missed something. So correct me as needed. Also, rather than
spending more time polishing the thoughts below I just put them out
here for discussion.
This discussion seems to me very important, as it can close down
several of the major open issues that are very critical to the
language, both to give it complete and consistent semantics and to
make it practical for the problems that we are applying it to.
Four important but missing aspects of this discussion are:
pipelining, error handling, restart, and mapping.
I feel that swift needs the following semantics:
1. Pipelining:
The data dependency aspects of swift are carried out at the atomic
level in a pipelined manner.
-- elements of an array are written into the stream
-- readers of the array consume the stream
-- the entire program remains active in parallel, across function
boundaries
Array elements [k,v] are identified by their index, k, which can be
an int or string.
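To make the pipelining idea concrete, here is a rough sketch in Python (not Swift syntax). Every name in it (StreamedArray, produce, run_pipeline, the _CLOSED sentinel) is made up for illustration and is not part of Swift. It only shows a reader consuming [k,v] elements while the writer is still producing them, with an explicit "close" ending the stream; it assumes a single reader.

```python
import queue
import threading

_CLOSED = object()  # hypothetical sentinel: marks the array as closed


class StreamedArray:
    """Array whose [k, v] elements are delivered to one reader as a stream."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, key, value):
        # Keys may be ints or strings, as in the text above.
        self._q.put((key, value))

    def close(self):
        self._q.put(_CLOSED)

    def __iter__(self):
        # Blocks until the next element arrives; stops once the array is closed.
        while True:
            item = self._q.get()
            if item is _CLOSED:
                return
            yield item


def produce(arr, keys):
    """Writer: emits one element per key, then closes the array."""
    for k in keys:
        arr.put(k, len(str(k)))
    arr.close()


def run_pipeline(keys):
    """Reader runs in the main thread while the writer runs in parallel."""
    arr = StreamedArray()
    producer = threading.Thread(target=produce, args=(arr, keys))
    producer.start()
    consumed = {k: v * 2 for k, v in arr}
    producer.join()
    return consumed
```

The reader never waits for the whole array; it only waits for the next element or for the close.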
2. Error handling
In practice, many large-scale foreach() operations will never
complete, yet they will deliver a lot of useful results that we want
subsequent statements in a program to continue to operate on. Thus
closing needs to permit criteria other than just "finishing".
An array is "closed" when its producer function/foreach "shuts
down". Can we permit shutdown/closing to occur based on finishing,
time, or quota/threshold? These would be parameters of the foreach
statement that could be overridden.
(For some practical examples, see map-reduce; it has similar
problems: a parallel computation reaches a level where there is lots of
parallelism, and as it proceeds, it gets to a point where only the
"stragglers" are left - things waiting in slow queues or for hung
data transfers, etc. I've read this in m/r papers, and found that
our experiences match those reported by the Google m/r people.)
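As a rough illustration of the closing criteria in item 2, here is a Python sketch (not Swift syntax). The function name and the quota, deadline_s and clock parameters are hypothetical. It checks the criteria only when a result arrives, so a real implementation would also need a timed wait to cut off stragglers that never report back.

```python
import time


def collect_until_closed(results, quota=None, deadline_s=None,
                         clock=time.monotonic):
    """Gather (key, value) results until a closing criterion is met.

    Returns (collected, reason), where reason is one of
    "finished", "quota", or "time".
    """
    start = clock()
    collected = {}
    for key, value in results:
        collected[key] = value
        # Quota/threshold: close once enough elements have been produced.
        if quota is not None and len(collected) >= quota:
            return collected, "quota"
        # Time: close once the allotted time has elapsed.
        if deadline_s is not None and clock() - start >= deadline_s:
            return collected, "time"
    # Finishing: the producer ran to completion.
    return collected, "finished"
```

Whatever the reason for closing, subsequent statements get the elements collected so far and can proceed with them.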
3. Restart
We want computations to be restartable. If 50% of a large
array/dataset gets created in a 10-hour run, and then fails, we want
the run to be restartable and continue where it left off with minimal
loss of "completed" results.
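One way to picture restart is a log of completed elements that a rerun consults before recomputing anything. The Python sketch below is only an illustration under that assumption; run_restartable and the JSON-lines log format are invented here and do not describe how Swift records progress. It also ignores the case of a log line that was only partly written when the run died.

```python
import json
import os


def run_restartable(keys, compute, log_path):
    """Compute a value per key, skipping keys a previous run completed."""
    done = {}
    # Reload results that an earlier (possibly failed) run already logged.
    if os.path.exists(log_path):
        with open(log_path) as log:
            for line in log:
                record = json.loads(line)
                done[record["key"]] = record["value"]
    with open(log_path, "a") as log:
        for k in keys:
            if k in done:
                continue  # completed earlier; do not recompute
            value = compute(k)
            # Record each result as soon as it exists, so a crash loses
            # at most the element that was in progress.
            log.write(json.dumps({"key": k, "value": value}) + "\n")
            log.flush()
            done[k] = value
    return done
```

Calling it a second time with the same log_path continues from the first key that is not yet in the log.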
4. Mapping
Lastly, swift mapping should be connected to this whole process: the
mapped contents of a dataset should be a stream of xml elements
rather than a "completed" xml document, so that we can practically
handle very large datasets. So when a foreach() statement processes
an array, it's processing the mapped stream of the array. Mappers
should be parallel processes that produce and consume these streams
of xml elements.
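To show what consuming mapped contents as a stream might look like, here is a small Python sketch using the standard library's incremental parser. The element name "element" and the "key" and "file" attributes are assumptions made up for this example, not Swift's actual mapper output format.

```python
import io
import xml.etree.ElementTree as ET


def stream_mapped_elements(source, tag="element"):
    """Yield (key, file) pairs one at a time as each element is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.get("key"), elem.get("file")
            elem.clear()  # drop the element's contents to keep memory flat


# Hypothetical mapped dataset with two elements.
doc = (b'<dataset>'
       b'<element key="0" file="a.dat"/>'
       b'<element key="x" file="b.dat"/>'
       b'</dataset>')

for key, path in stream_mapped_elements(io.BytesIO(doc)):
    print(key, path)
```

The consumer sees each element as soon as the parser completes it, without building the whole document in memory first.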
- Mike
Ben Clifford wrote, On 6/16/2007 8:34 AM:
>
> On Sat, 16 Jun 2007, Mihael Hategan wrote:
>
>>> It works because Swift implicitly marks arrays returned from compound
>>> procedures as closed (which may or may not be correct).
>> We defined it as correct. Something created in one scope cannot be
>> modified in a parent scope.
>
> That's fine - what was unintuitive to me was that something created in one
> scope cannot be referred to in that same scope. i.e. you can create a
> piecewise using a[...]=... but cannot then refer to a.
>
--
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL 60439 USA
tel 630-252-7497 fax 630-252-1997