[Swift-devel] on the semantics of 'array closing'

Mike Wilde wilde at mcs.anl.gov
Sat Jun 16 10:50:25 CDT 2007


Also to note: Ian has suggested several times that we explore 
map-reduce.  I think this is worth doing: it's possible/likely that 
swift is already pretty close to m-r in many ways, and could benefit 
from a more detailed comparison and assessment of what we can 
borrow, adapt, and/or integrate.

We should use this as a chance to create a "swift library" page 
where we post good papers that we can cite in our discussions to get 
ourselves on a common page.

Some of these might be good material for Thu Grad seminar discussions 
as well.

- Mike


Mike Wilde wrote, On 6/16/2007 10:05 AM:
> Hi all,
> 
> I'm jumping in late; I re-read the thread a few times but may have 
> missed something. So correct me as needed. Also, rather than spending 
> more time polishing the thoughts below I just put them out here for 
> discussion.
> 
> This discussion seems to me very important, as it can close down several 
> of the major open issues that are very critical to the language, both to 
> give it complete and consistent semantics and to make it practical for 
> the problems that we are applying it to.
> 
> Four important but missing aspects of this discussion are: pipelining, 
> error handling, restart, and mapping.
> 
> I feel that swift needs the following semantics:
> 
> 1. Pipelining:
> 
> The data dependency aspects of swift are carried out at the atomic level 
> in a pipelined manner.
> 
>  -- elements of an array are written into the stream
> 
>  -- readers of the array consume the stream
> 
>  -- the entire program remains active in parallel, across function 
> boundaries
> 
> Array elements [k,v] are identified by their index, k, which can be an 
> int or string.
> 
> 2. Error handling
> 
> In practice, many large-scale foreach() operations will never complete, 
> yet they will deliver a lot of useful results that we want subsequent 
> statements in a program to continue to operate on. Thus closing needs to 
> permit criteria other than just "finishing".
> 
> An array is "closed" when its producer function/foreach "shuts down".  
> Can we permit shutdown/closing to occur based on finishing, time, or 
> quota/threshold?  These would be parameters of the foreach statements 
> that could be overridden.
> 
> (For some practical examples, see map-reduce; it has similar problems: 
> parallel computations reach a level where there is lots of parallelism, 
> and as the run proceeds, it gets to a point where only the "stragglers" 
> are left - things waiting in slow queues or for hung data transfers, etc.  
> I've read this in m/r papers, and found that our experiences match those 
> reported by the Google m/r people).
> 
> 3. Restart
> 
> We want computations to be restartable.  If 50% of a large array/dataset 
> gets created in a 10-hour run, and then fails, we want the run to be 
> restartable and continue where it left off with minimal loss of 
> "completed" results.
> 
> 4. Mapping
> 
> Lastly, swift mapping should be connected to this whole process: the 
> mapped contents of a dataset should be a stream of xml elements rather 
> than a "completed" xml document, so that we can practically handle very 
> large datasets.  So when a foreach() statement processes an array, it's 
> processing the mapped stream of the array. Mappers should be parallel 
> processes that produce and consume these streams of xml elements.
> 
> - Mike
> 
> 
> 
> 
> Ben Clifford wrote, On 6/16/2007 8:34 AM:
>>
>> On Sat, 16 Jun 2007, Mihael Hategan wrote:
>>
>>>> It works because Swift implicitly marks arrays returned from 
>>>> compound procedures as closed (which may or may not be correct).
>>> We defined it as correct. Something created in one scope cannot be
>>> modified in a parent scope.
>>
>> That's fine - what was unintuitive to me was that something created in 
>> one scope cannot be referred to in that same scope. i.e. you can 
>> create a piecewise using a[...]=... but cannot then refer to a.
>>
> 
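
To make points 1 and 2 above concrete, here is a rough sketch in Python
(not Swift syntax) of an array that is filled piecewise from a stream of
[k,v] elements and is closed on finishing, time, or quota/threshold.
The names ClosePolicy, run_foreach, quota and timeout are made up for
illustration; nothing like this exists in Swift today, and the real
parameters would have to be decided in this discussion.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClosePolicy:
    """Hypothetical closing criteria for one foreach (illustration only)."""
    total: int                       # elements the foreach was asked to produce
    quota: Optional[float] = None    # close once this fraction has arrived
    timeout: Optional[float] = None  # close once this many seconds have passed

    def reason_to_close(self, produced: int, elapsed: float) -> Optional[str]:
        if produced >= self.total:
            return "finished"
        if self.quota is not None and produced >= self.quota * self.total:
            return "quota"
        if self.timeout is not None and elapsed >= self.timeout:
            return "timeout"
        return None


def run_foreach(results, policy, clock=time.monotonic):
    """Consume (k, v) pairs as they arrive; stop when the policy closes the array.

    `results` is any iterator of (index, value) pairs, where the index may be
    an int or a string. Returns the closed array and the reason it was closed.
    """
    array = {}
    start = clock()
    for k, v in results:
        array[k] = v  # element is visible to readers as soon as it is written
        reason = policy.reason_to_close(len(array), clock() - start)
        if reason is not None:
            return array, reason
    # The producer stopped before any criterion was met (e.g. it failed).
    return array, "producer shut down"
```

With total=10 and quota=0.5, the array closes with reason "quota" after
five elements have arrived, and later statements can proceed without
waiting for the stragglers. One simplification to be aware of: this
sketch only checks the timeout when an element arrives, so a producer
that hangs completely would need a separate timer.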

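For point 3 (restart), one possible approach, again only a sketch under
my own assumptions, is to append each completed element to a log as soon
as it is produced and to skip logged indices on the next run. The
function name run_restartable and the one-JSON-record-per-line log
format are hypothetical, not how Swift does it.

```python
import json
import os


def run_restartable(keys, compute, log_path):
    """Compute compute(k) for each k, skipping indices already in the log.

    Each completed element is appended to `log_path` as one JSON line,
    so a failed run loses at most the element that was in progress.
    """
    done = {}
    if os.path.exists(log_path):
        with open(log_path) as f:
            for line in f:
                rec = json.loads(line)
                done[rec["k"]] = rec["v"]  # result from an earlier run

    with open(log_path, "a") as log:
        for k in keys:
            if k in done:
                continue  # already completed before the failure
            v = compute(k)  # may raise; earlier results are already on disk
            log.write(json.dumps({"k": k, "v": v}) + "\n")
            log.flush()
            done[k] = v
    return done
```

If a run over indices 0..4 fails while computing index 3, rerunning with
the same log only recomputes 3 and 4. This sketch does not handle a log
line that was cut off mid-write; a real implementation would need to.
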
-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


