[Swift-devel] Re: How to wait on functions that return no data?

Wed Mar 26 10:51:33 CDT 2008

> I suspect you're not going to like this idea on first consideration. But 
> it's related to ideas on how to leverage map-reduce, as I mentioned 
> earlier, and Ian's suggestion to explore collective operations. Mihael 
> thought my take on this was inelegant and inconsistent with data flow.

Somewhat. What I thought you suggested was pretty much "I don't want to
write my program as dataflow but I want to implement it in a dataflow
language". "And if it doesn't work, then the language should be changed
so that I can".

[...]

> 
> If a swift job could efficiently return a set of swift objects without 
> using a file

In the context of Globus, it seems a bit difficult.

>  (specifically without placing files back in the shared 
> directory) then many of these apps could work beautifully, by returning 
> strings or numeric objects, possibly as structs and/or arrays, that 
> travel back through the job submission interface rather than getting 
> fetched via the data provider. If a cluster of jobs could return data 
> efficiently in a single "package" from the cluster, then we could pretty 
> readily do map-reduce in swift, efficiently, in perfect concordance with 
> the current dataflow model.

One more time: we CAN do map-reduce in Swift. Stop saying we can't.
Please. It's getting silly.
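
To make the shape concrete, here is a minimal sketch of map-reduce written
as plain dataflow. It is Python rather than SwiftScript, and map_task,
reduce_task and the sample chunks are hypothetical names chosen only for
illustration: each map task yields a future, and the reduce step runs once
its inputs are available.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_task(chunk):
    """Map step: count the words in one chunk of input."""
    return Counter(chunk.split())


def reduce_task(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total += partial
    return total


# Hypothetical input, split into independent chunks.
chunks = ["a b a", "b c", "a c c"]

with ThreadPoolExecutor() as pool:
    # Each submit() returns a future; the reduce depends only on these.
    futures = [pool.submit(map_task, c) for c in chunks]
    result = reduce_task(f.result() for f in futures)

print(dict(result))
```

Nothing here needs anything beyond "this output depends on those inputs",
which is the same dependency structure a dataflow language expresses.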

The efficiency issue comes from the fact that the overhead for
distributing very, very small tasks across a wide area network is
very high compared to the task run time. And in the current Swift
implementation it is higher than in the implementation you seem to
have in mind.
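
As a rough illustration of that ratio, here is a small Python sketch. The
numbers (1 second of work, 30 seconds of per-job overhead, 100 tasks per
batch) are made up for the example and are not measurements of Swift or
Globus; the efficiency() helper is hypothetical as well.

```python
def efficiency(run_time_s, overhead_s, tasks_per_job=1):
    """Fraction of wall time spent on useful work when
    tasks_per_job tasks share a single job's fixed overhead."""
    useful = run_time_s * tasks_per_job
    return useful / (useful + overhead_s)


# Assumed, illustrative values only.
single = efficiency(run_time_s=1.0, overhead_s=30.0)
batched = efficiency(run_time_s=1.0, overhead_s=30.0, tasks_per_job=100)

print(f"one task per job:  {single:.3f}")
print(f"100 tasks per job: {batched:.3f}")
```

With these assumed values, a lone task spends about 3% of its wall time on
useful work, while a batch of 100 spends about 77%. The overhead is paid
per job, so it only shrinks in relative terms when more work rides on each
job.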

> 
> Perhaps this latter approach is the best to consider: I suspect it could 
> be readily implemented, could use a simple file to contain an arbitrary 
> set of swift object return values, possibly in a format similar to that 
> of readdata().

How is this different from the current scheme (besides the data files
being in a different format)?

> 
> - Mike
> 
> On 3/25/08 6:04 PM, Ben Clifford wrote:
> > On Tue, 25 Mar 2008, Michael Wilde wrote:
> > 
> >> From a pure language point of view, we should permit the return of data that
> >> can be grouped (batched) into files in arbitrary chunks, determined and
> >> optimized by the implementation. Map-reduce tuples seem to work well for this
> >> model, and it seems that Swift could encompass it with minimal semantic change
> >> to the current language.
> > 
> > For your example, what way do you want to store the data on the remote 
> > side - I'm assuming not individual files.
> > 
> > The present dataset model should fairly easily accommodate the description 
> > of places to store data that aren't files - there's an abstraction in the 
> > implementation to help with that at the moment (DSHandle, which is what 
> > deals with the difference between in-memory values and on-disk files; and 
> > could fairly straightforwardly deal with other storage forms).
> > 
> > One of the project ideas I put in for the Google Summer of Code was to 
> > play around with this, in fact.
> > 
>