[Swift-devel] on the semantics of 'array closing'

Ian Foster foster at mcs.anl.gov
Sat Jun 16 10:58:13 CDT 2007


Mike:

That's a great summary of requirements.

Ian.

Mike Wilde wrote:
> Hi all,
>
> I'm jumping in late; I re-read the thread a few times but may have 
> missed something. So correct me as needed. Also, rather than spending 
> more time polishing the thoughts below I just put them out here for 
> discussion.
>
> This discussion seems very important to me, as it can close down 
> several major open issues that are critical to the language, both to 
> give it complete and consistent semantics and to make it practical 
> for the problems that we are applying it to.
>
> Four important aspects missing from this discussion are: pipelining, 
> error handling, restart, and mapping.
>
> I feel that swift needs the following semantics:
>
> 1. Pipelining:
>
> The data dependency aspects of swift are carried out at the atomic 
> level in a pipelined manner.
>
>  -- elements of an array are written into the stream
>
>  -- readers of the array consume the stream
>
>  -- the entire program remains active in parallel, across function 
> boundaries
>
> Array elements [k,v] are identified by their index, k, which can be an 
> int or string.
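A minimal Python sketch of the pipelined semantics listed above, not Swift code. The names StreamArray, put, close and run_pipeline are hypothetical, and the sketch assumes a single reader per array:

```python
import queue
import threading

_CLOSED = object()  # sentinel placed on the stream when the array is closed


class StreamArray:
    """Hypothetical array whose [k, v] elements are consumed as they are written."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, key, value):
        # Keys may be ints or strings, as in the proposal.
        self._q.put((key, value))

    def close(self):
        self._q.put(_CLOSED)

    def __iter__(self):
        # Blocks until the next element arrives; stops once the array is closed.
        while True:
            item = self._q.get()
            if item is _CLOSED:
                return
            yield item


def producer(arr):
    for k in range(5):
        arr.put(k, k * k)
    arr.put("label", "squares")
    arr.close()


def run_pipeline():
    arr = StreamArray()
    t = threading.Thread(target=producer, args=(arr,))
    t.start()
    seen = dict(arr)  # the reader consumes while the producer may still be writing
    t.join()
    return seen
```

The reader never waits for the whole array: it blocks only on the next element, and closing is just one more item on the stream.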
>
> 2. Error handling
>
> In practice, many large-scale foreach() operations will never 
> complete, yet they will deliver a lot of useful results that we want 
> subsequent statements in a program to continue to operate on. Thus 
> closing needs to permit criteria other than just "finishing".
>
> An array is "closed" when its producer function/foreach "shuts down". 
> Can we permit shutdown/closing to occur based on finishing, time, or 
> quota/threshold? These would be parameters of the foreach statement 
> that could be overridden.
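One way the three closing criteria might look, sketched in Python rather than Swift. The function foreach_with_closing and its timeout and quota parameters are assumptions made for illustration, not an existing Swift feature; cancel_futures needs Python 3.9 or later:

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def foreach_with_closing(func, items, timeout=None, quota=None):
    """Apply func to each item in parallel and close the result array on
    finishing, on a time limit, or on a quota of successful results.

    Returns (results, closed_by), with closed_by in {"finished", "time", "quota"}.
    """
    results = {}
    deadline = None if timeout is None else time.monotonic() + timeout
    pool = ThreadPoolExecutor(max_workers=8)
    pending = {pool.submit(func, x): k for k, x in enumerate(items)}
    closed_by = "finished"
    try:
        while pending:
            remaining = None if deadline is None else deadline - time.monotonic()
            if remaining is not None and remaining <= 0:
                closed_by = "time"
                break
            done, _ = wait(list(pending), timeout=remaining,
                           return_when=FIRST_COMPLETED)
            for fut in done:
                k = pending.pop(fut)
                if fut.exception() is None:  # failed elements are simply left out
                    results[k] = fut.result()
            if quota is not None and len(results) >= quota:
                closed_by = "quota"
                break
    finally:
        # Do not wait for stragglers; tasks that have not started are cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
    return results, closed_by
```

Downstream statements can then operate on whatever results exist at closing time, and closed_by records which criterion fired.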
>
> (For some practical examples, see map-reduce; it has similar problems: 
> parallel computations reach a level where there is lots of 
> parallelism, and as the run proceeds, it gets to a point where only the 
> "stragglers" are left - things waiting in slow queues or for hung data 
> transfers, etc.  I've read this in m/r papers, and found that our 
> experiences match those reported by the Google m/r people.)
>
> 3. Restart
>
> We want computations to be restartable.  If 50% of a large 
> array/dataset gets created in a 10-hour run, and the run then fails, we 
> want it to be restartable and to continue where it left off with 
> minimal loss of "completed" results.
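A rough Python sketch of the restart idea, assuming each completed element is appended to a log file so that a rerun skips it. The function run_restartable and the JSON-lines log format are hypothetical, chosen only for illustration:

```python
import json
import os


def run_restartable(func, items, log_path):
    """Compute func(arg) for each key in items, skipping keys already logged.

    items maps an element key (int or str) to its input. Each completed
    element is appended to log_path as one JSON line.
    """
    done = {}
    if os.path.exists(log_path):
        with open(log_path) as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue  # ignore a blank or partially written line
                done[rec["key"]] = rec["value"]
    with open(log_path, "a") as log:
        for key, arg in items.items():
            if key in done:
                continue  # finished in an earlier run
            value = func(arg)  # an exception here ends the run; the log survives
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            done[key] = value
    return done
```

Calling run_restartable again with the same log_path after a failure recomputes only the elements that are not yet in the log.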
>
> 4. Mapping
>
> Lastly, swift mapping should be connected to this whole process: the 
> mapped contents of a dataset should be a stream of xml elements rather 
> than a "completed" xml document, so that we can practically handle 
> very large datasets.  So when a foreach() statement processes an 
> array, it's processing the mapped stream of the array. Mappers should 
> be parallel processes that produce and consume these streams of xml 
> elements.
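To illustrate consuming a mapping as a stream of xml elements rather than as a completed document, here is a small Python sketch built on the standard library's incremental parser. The element name "file" and its attributes are made up for the example and are not Swift's actual mapper output:

```python
import io
import xml.etree.ElementTree as ET


def stream_elements(source, tag):
    """Yield the attributes of each <tag> element as soon as it has been parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield dict(elem.attrib)
            elem.clear()  # drop the element's contents to limit memory use


# Hypothetical mapped dataset, held in memory here to keep the example self-contained.
doc = io.BytesIO(
    b"<dataset>"
    b"<file index='0' name='a.dat'/>"
    b"<file index='1' name='b.dat'/>"
    b"</dataset>"
)
mapped = list(stream_elements(doc, "file"))
```

With a file or pipe as the source, each element can be handed to the consumer as soon as it has been parsed, before the closing tag of the document has been read.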
>
> - Mike
>
>
>
>
> Ben Clifford wrote, On 6/16/2007 8:34 AM:
>>
>> On Sat, 16 Jun 2007, Mihael Hategan wrote:
>>
>>>> It works because Swift implicitly marks arrays returned from 
>>>> compound procedures as closed (which may or may not be correct).
>>> We defined it as correct. Something created in one scope cannot be
>>> modified in a parent scope.
>>
>> That's fine - what was unintuitive to me was that something created 
>> in one scope cannot be referred to in that same scope, i.e. you can 
>> create the array a piecewise using a[...]=... but cannot then refer to a.
>>
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.
