[Swift-user] How to wait on functions that return no data?

Tue Mar 25 11:01:11 CDT 2008

I think there is some confusion here between language and
implementation. 

The language can express the problem just fine. That's why I'm saying
you should change doall() to return an array with all the outputs.
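
A minimal, untested sketch of that change might look like the following.
Several things in it are assumptions made for illustration and are not in
the original script: that runam3 writes its result tuples to stdout, the
simple_mapper prefix/suffix values, the "amdata.out" file name, and the
hypothetical "collect" executable.

```
type amout;

type params {
   string id;
   string p1;
   string p2;
}

// Assumption: runam3 writes its result tuples to stdout, so every
// invocation produces an output file that Swift can track.
(amout o) runam (string id, string p1, string p2)
{
   app { runam3 id p1 p2 stdout=@filename(o); }
}

// doall() now returns one output per parameter set. The array is
// complete only once every runam() call has finished.
(amout out[]) doall (params p[])
{
   foreach pset, i in p {
      out[i] = runam(pset.id, pset.p1, pset.p2);
   }
}

// Assumption: "collect" is a hypothetical executable that concatenates
// the per-run outputs into a single file.
(amout merged) collectResults (amout parts[])
{
   app { collect @filenames(parts) stdout=@filename(merged); }
}

// Main

params p[];
p = readdata("paramlist");

// Assumption: mapper parameters and file names are placeholders.
amout results[] <simple_mapper; prefix="am.", suffix=".out">;
amout amdata <"amdata.out">;

results = doall(p);
amdata = collectResults(results);
```

Because collectResults() takes the whole results array as input, it cannot
start until every element has been produced, so no explicit wait is needed.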

It's the implementation that performs poorly when the
applications are very fine-grained. You seem to be trying to solve the
problem by:
1. Doing some magic with the way files are moved around
2. Convincing Swift that it should work without knowing about data
dependencies, despite the fact that it only works properly if it knows
about all data dependencies. By definition.

There is some middle ground here. It may be possible to let Swift know
what the data dependencies are, but also prevent it from dealing with
certain files, by marking them as "virtual" (or whatever the term would be).

Mihael

On Tue, 2008-03-25 at 10:45 -0500, Michael Wilde wrote:
> Your view has merits in terms of language purity, but I disagree with it.
> 
> This was posed as an academic question, and I think it's interesting to 
> discuss.
> 
> The point here is that there's an application that could best be done by 
> batching up its output, and in fact perhaps by using the map-reduce 
> representation of tuples for that output.
> 
> It's still driven by dataflow and data dependencies, just not the 
> simplistic lock-step dependencies that Swift implements today.
> 
> For example, one way to address the problem is to say that batching of 
> function calls, the way swift does today, is helpful but ignores the 
> problem that small tasks often have small data inputs and outputs, and 
> that these should be batched along with the job execution.
> 
> That would leave swift language semantics unchanged, but the 
> implementation would get more efficient and could handle finer-grained 
> tasks.
> 
> An even more efficient and interesting approach, fully in keeping with 
> the language as it stands today, would be to allow tuples to be 
> expressed as inputs and outputs, and to have swift efficiently and 
> automatically route (and batch) tuples in and out of jobs.
> 
> So I view what I was asking for here as a prototype or exploration of 
> that direction.  It would be good to test the performance of an 
> implementation that streamed output tuples into a subsequent ("reduce") 
> stage of processing, before we even consider what the language and/or 
> implementation would need to do for such a case.
> 
> 
> On 3/25/08 10:23 AM, Mihael Hategan wrote:
> ...
>  > Don't use Swift then. Seriously. If you don't want to express things in
>  > a dataflow oriented way, and are not satisfied with its performance for
>  > the given problem, don't use it.
> 
> I want to express things as dataflow, with high performance, in Swift.
> 
> Mike
> 
> 
> On 3/25/08 10:23 AM, Mihael Hategan wrote:
> > On Tue, 2008-03-25 at 10:14 -0500, Michael Wilde wrote:
> >>  >> In the example below, I want collectResults() to get invoked after all
> >>  >> the runam() calls complete in doall().
> >>  >
> >>  > results = doall();
> >>  > collectResults(results);
> >>  >
> >>  > Mihael
> >>
> >> But that's the problem: doall() does not in this example return results. 
> > 
> > Then it should be fixed.
> > 
> >> If it returned an artificial result, how would we get such a return 
> >> to wait until all the runam() calls made within the foreach() have completed?
> >>
> >> Each of the runam() calls runs a small model, and in this proposed 
> >> scenario would leave those results on a local disk for later collection, 
> >> either in a single shared file that many invocations would append to, or 
> >> in a set of files.
> > 
> > I don't think the solution to performance problems in Swift is to hack
> > stuff like that.
> > 
> >> Then collectResults() would run a job that collects all the data when done.
> >>
> >> One approach can be to have collectResults() just run iteratively until 
> >> it has collected a sufficient number of results.  I.e., to have it not 
> >> depend on swift to find out when all the runam() calls have completed. 
> >> That might work.
> > 
> > Don't use Swift then. Seriously. If you don't want to express things in
> > a dataflow oriented way, and are not satisfied with its performance for
> > the given problem, don't use it.
> > 
> > Mihael
> > 
> >> - Mike
> >>
> >>
> >> On 3/25/08 10:00 AM, Mihael Hategan wrote:
> >>> On Tue, 2008-03-25 at 09:46 -0500, Michael Wilde wrote:
> >>>> For the petro-model app I'm working on, it would be interesting to run 
> >>>> the parameter sweep in "map reduce" manner, in which each invocation 
> >>>> bites off a portion of the parameter space and processes it, resulting 
> >>>> in a set of result tuples. Each run of the model will produce a set of 
> >>>> tuples, and when all are done, we want to aggregate and plot the tuples.
> >>>>
> >>>> While with batching this is not strictly needed, it would be interesting 
> >>>> to let the model results accumulate on the local filesystem (as in this 
> >>>> case they are small) and collect them either at the end of the run, or 
> >>>> periodically and perhaps asynchronously during the run.
> >>>>
> >>>> To do this, we'd want to write the model invocation as a swift function 
> >>>> with only scalar numeric parameters, and no output.
> >>> That assertion I'm not sure about.
> >>>
> >>>> The question is how to call a zero-returns function in a Swift foreach() 
> >>>> loop, and embed that foreach() in a function that doesn't return until 
> >>>> all members of the foreach() have been processed.
> >>> The very notion of "return" as it would appear in a strict language
> >>> doesn't make much sense in Swift, so I'm not quite sure.
> >>>
> >>>> I haven't tried to code this yet, because I can't think of a way to 
> >>>> express it in Swift, due to the data-dependency semantics.
> >>>>
> >>>> In the example below, I want collectResults() to get invoked after all 
> >>>> the runam() calls complete in doall().
> >>> results = doall();
> >>> collectResults(results);
> >>>
> >>> Mihael
> >>>
> >>>> Anyone have any ideas?
> >>>>
> >>>> This is a low-priority question, just food for thought, as the batched 
> >>>> way of running this parameter sweep should be straightforward and efficient.
> >>>>
> >>>> Mike
> >>>>
> >>>>
> >>>>
> >>>> // Amiga-Mars Parameter Sweep
> >>>>
> >>>> type amout;
> >>>>
> >>>> runam (string id , string p1, string p2) // no ret val
> >>>> {
> >>>>    app { runam3 id p1 p2 ; }
> >>>> }
> >>>>
> >>>> type params {
> >>>>    string id;
> >>>>    string p1;
> >>>>    string p2;
> >>>> };
> >>>>
> >>>> doall(params p[])
> >>>> {
> >>>>    foreach pset in p {
> >>>>      runam(pset.id, pset.p1, pset.p2);
> >>>>    }
> >>>>    // waitTillAllDone();
> >>>>    // want to block here till all above finish,
> >>>>    // but no data to wait on.  any way to
> >>>>    // achieve this???
> >>>> }
> >>>>
> >>>> // Main
> >>>>
> >>>> params p[];
> >>>> p = readdata("paramlist");
> >>>> doall(p);
> >>>> amout amdata <some mapping>;
> >>>> amdata = collectResults();
> >>>>
> >>>> // ^^^ Want collectResults() to run AFTER all runam() calls finish
> >>>> //     in the doall() function.