[Swift-devel] Re: How to wait on functions that return no data?
Mihael Hategan
hategan at mcs.anl.gov
Tue Mar 25 12:41:54 CDT 2008
On Tue, 2008-03-25 at 11:36 -0500, Michael Wilde wrote:
> Related to this virtual idea, is it possible to add language semantics
> where a function defined as returning an object can decide to return
> "null", in which case it's deemed to be complete but decided not to
> generate a result?
Yes. What I mentioned would be similar.
>
> So a foreach that calls 1000 functions could complete when 10 return
> files and 990 return null?
That we don't have.
>
> I'm moving this discussion to swift-devel by the way, as it's now talking
> about future possibilities.
>
> From a pure language point of view, we should permit the return of data
> that can be grouped (batched) into files in arbitrary chunks,
> determined and optimized by the implementation. Map-reduce tuples seem
> to work well for this model, and it seems that Swift could encompass it
> with minimal semantic change to the current language.
>
> This petro model app seems to be a good illustration of the use case.
Before the petro model, we had the Aphasia model which required pretty
much the same thing. I.e. for some inputs there was no output.
> The function the way I'm calling it is basically z = f (x,y) where x,y,z
> are floats.
>
> To treat it as tuples, the return would be (x,y,z) = f(x,y) - ie the
> return is a triple, so that the reduce step simply merges all the output
> tuples and plots them. (example plot below)
That's a 2d array. t[x][y] = f(x, y); Or even a simple list of arrays
(which we would simulate with an array).
Overall, this does not deal with "missing" elements. That's where user
exceptions, which we spoke of before, would come in:
try {
    t[x][y] = f(x, y);
}
catch (MissingValue) {
    // discard
}
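
The handling of "missing" elements above can be sketched in Python, since user exceptions are only a proposal for Swift at this point. This is a minimal sketch under assumptions: the MissingValue class, the stand-in f, and the rule that f has no output when x equals y are all hypothetical and chosen only for illustration.

```python
class MissingValue(Exception):
    """Raised when an input pair yields no output (hypothetical)."""


def f(x, y):
    # Stand-in for one model run. Assumption: no result when x == y.
    if x == y:
        raise MissingValue(f"no output for ({x}, {y})")
    return x * 10 + y


def sweep(xs, ys):
    # Sparse table keyed by (x, y); missing points are simply absent.
    t = {}
    for x in xs:
        for y in ys:
            try:
                t[(x, y)] = f(x, y)
            except MissingValue:
                pass  # discard
    return t


table = sweep(range(3), range(3))
print(len(table), "of 9 points produced a value")
```

A dictionary is used here instead of a dense 2d array so that discarded points leave no placeholder behind.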
Mihael
>
> - Mike
>
>
> This sweep varied the Low-S Light LL and Med-S Light LL production
> yields for Diesel fuel and plotted the effect on the Discount Investment:
>
> It shows a sweep on $2 and $3 in this line of adj_crude.txt.
>
> > -3 Prod_Yields
> > ...
> > 3 Diesel $2 $3 $4 $5 $6 $7
>
> The production yield is plotted in:
>
> http://www.ci.uchicago.edu/~wilde/psweep1.png
>
>
>
> On 3/25/08 11:01 AM, Mihael Hategan wrote:
> > I think there is some confusion here between language and
> > implementation.
> >
> > The language can express the problem just fine. That's why I'm saying
> > you should change doall() to return an array with all the outputs.
> >
> > It's the implementation that behaves in a very poor way if the
> > applications are very fine grained. You seem to be trying to solve the
> > problem by:
> > 1. Doing some magic with the way files are moved around
> > 2. Convincing Swift that it should work without knowing about data
> > dependencies, despite the fact that it only works properly if it knows
> > about all data dependencies. By definition.
> >
> > There is some middle ground here. It may be possible to let Swift know
> > what the data dependencies are, but also prevent it from dealing with
> > certain files, by marking them as "virtual" (or whatever the term).
> >
> > Mihael
> >
> > On Tue, 2008-03-25 at 10:45 -0500, Michael Wilde wrote:
> >> Your view has merits in terms of language purity, but I disagree with it.
> >>
> >> This was posed as an academic question, and I think it's interesting to
> >> discuss.
> >>
> >> The point here is that there's an application that could best be done by
> >> batching up its output, and in fact perhaps by using the map-reduce
> >> representation of tuples for that output.
> >>
> >> It's still driven by dataflow and data dependencies, just not the
> >> simplistic lock-step dependencies that swift implements today.
> >>
> >> For example, one way to address the problem is to say that batching of
> >> function calls, the way swift does today, is helpful but ignores the
> >> problem that small tasks often have small data inputs and outputs, and
> >> that these should be batched along with the job execution.
> >>
> >> That would leave swift language semantics unchanged, but the
> >> implementation would get more efficient and could handle finer-grained
> >> tasks.
> >>
> >> An even more efficient and interesting approach, fully in keeping with
> >> the language as it stands today, would be to allow tuples to be
> >> expressed as inputs and outputs, and to have swift efficiently and
> >> automatically route (and batch) tuples in and out of jobs.
> >>
> >> So I view what I was asking for here as a prototype or exploration of
> >> that direction. It would be good to test the performance of an
> >> implementation that streamed output tuples into a subsequent ("reduce")
> >> stage of processing, before we even consider what the language and/or
> >> implementation would need to do for such a case.
> >>
> >>
> >> On 3/25/08 10:23 AM, Mihael Hategan wrote:
> >> ...
> >> > Don't use Swift then. Seriously. If you don't want to express things in
> >> > a dataflow oriented way, and are not satisfied with its performance for
> >> > the given problem, don't use it.
> >>
> >> I want to express things as dataflow, with high performance, in Swift.
> >>
> >> Mike
> >>
> >>
> >> On 3/25/08 10:23 AM, Mihael Hategan wrote:
> >>> On Tue, 2008-03-25 at 10:14 -0500, Michael Wilde wrote:
> >>>> >> In the example below, I want collectResults() to get invoked after all
> >>>> >> the runam() calls complete in doall().
> >>>> >
> >>>> > results = doall();
> >>>> > collectResults(results);
> >>>> >
> >>>> > Mihael
> >>>>
> >>>> But that's the problem: doall() does not in this example return results.
> >>> Then it should be fixed.
> >>>
> >>>> If it would return an artificial result, how would we get such a return
> >>>> to wait until all the runam() calls made within the foreach() have completed?
> >>>>
> >>>> Each of the runam() call runs a small model, and in this proposed
> >>>> scenario would leave those results on a local disk for later collection,
> >>>> either in a single shared file that many invocations would append to, or
> >>>> in a set of files.
> >>> I don't think the solution to performance problems in Swift is to hack
> >>> stuff like that.
> >>>
> >>>> Then collectresults() would run a job that collects all the data when done.
> >>>>
> >>>> One approach can be to have collectresults() just run iteratively until
> >>>> it has collected a sufficient number of results. I.e., to have it not
> >>>> depend on swift to find out when all the runam() calls have completed.
> >>>> That might work.
> >>> Don't use Swift then. Seriously. If you don't want to express things in
> >>> a dataflow oriented way, and are not satisfied with its performance for
> >>> the given problem, don't use it.
> >>>
> >>> Mihael
> >>>
> >>>> - Mike
> >>>>
> >>>>
> >>>> On 3/25/08 10:00 AM, Mihael Hategan wrote:
> >>>>> On Tue, 2008-03-25 at 09:46 -0500, Michael Wilde wrote:
> >>>>>> For the petro-model app I'm working on, it would be interesting to run
> >>>>>> the parameter sweep in "map reduce" manner, in which each invocation
> >>>>>> bites off a portion of the parameter space and processes it, resulting
> >>>>>> in a set of result tuples. Each run of the model will produce a set of
> >>>>>> tuples, and when all are done, we want to aggregate and plot the tuples.
> >>>>>>
> >>>>>> While with batching this is not strictly needed, it would be interesting
> >>>>>> to let the model results accumulate on the local filesystem (as in this
> >>>>>> case they are small) and collect them either at the end of the run, or
> >>>>>> periodically and perhaps asynchronously during the run.
> >>>>>>
> >>>>>> To do this, we'd want to write the model invocation as a swift function
> >>>>>> with only scalar numeric parameters, and no output.
> >>>>> That assertion I'm not sure about.
> >>>>>
> >>>>>> The question is how to call a zero-returns function in a swift foreach()
> >>>>>> loop, and embed that foreach() in a function that doesn't return until
> >>>>>> all members of the foreach() have been processed.
> >>>>> The very notion of "return" as it would appear in a strict language
> >>>>> doesn't make much sense in Swift, so I'm not quite sure.
> >>>>>
> >>>>>> I haven't tried to code this yet, because I can't think of a way to
> >>>>>> express it in swift, due to the data-dependency semantics.
> >>>>>>
> >>>>>> In the example below, I want collectResults() to get invoked after all
> >>>>>> the runam() calls complete in doall().
> >>>>> results = doall();
> >>>>> collectResults(results);
> >>>>>
> >>>>> Mihael
> >>>>>
> >>>>>> Anyone have any ideas?
> >>>>>>
> >>>>>> This is a low-priority question, just food for thought, as the batched
> >>>>>> way of running this parameter sweep should be straightforward and efficient.
> >>>>>>
> >>>>>> Mike
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> // Amiga-Mars Parameter Sweep
> >>>>>>
> >>>>>> type amout;
> >>>>>>
> >>>>>> runam (string id , string p1, string p2) // no ret val
> >>>>>> {
> >>>>>>     app { runam3 id p1 p2 ; }
> >>>>>> }
> >>>>>>
> >>>>>> type params {
> >>>>>>     string id;
> >>>>>>     string p1;
> >>>>>>     string p2;
> >>>>>> };
> >>>>>>
> >>>>>> doall(params p[])
> >>>>>> {
> >>>>>>     foreach pset in p {
> >>>>>>         runam(pset.id, pset.p1, pset.p2);
> >>>>>>     }
> >>>>>>     // waitTillAllDone();
> >>>>>>     // want to block here till all above finish,
> >>>>>>     // but no data to wait on. any way to
> >>>>>>     // achieve this???
> >>>>>> }
> >>>>>>
> >>>>>> // Main
> >>>>>>
> >>>>>> params p[];
> >>>>>> p = readdata("paramlist");
> >>>>>> doall(p);
> >>>>>> amout amdata <some mapping>;
> >>>>>> amdata = collectResults();
> >>>>>>
> >>>>>> // ^^^ Want collectresults to run AFTER all runam() calls finish
> >>>>>> // in the doall() function.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Swift-user mailing list
> >>>>>> Swift-user at ci.uchicago.edu
> >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> >>>>>>
> >>>
> >
> >
>
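
The advice earlier in the thread, that doall() should return an array of outputs so that collectResults() has data to wait on, can be illustrated outside Swift with Python futures. This is only a rough analogy, not how Swift is implemented: the parameter tuples, the arithmetic inside runam, and the pool size are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def runam(pset):
    # Stand-in for one model run; it returns its result
    # instead of leaving it in a side file.
    ident, p1, p2 = pset
    return (ident, p1 * p2)


def doall(params, pool):
    # One future per call, so the caller has data to wait on.
    return [pool.submit(runam, pset) for pset in params]


def collect_results(futures):
    # result() blocks until that run has finished, so this
    # returns only after every runam() call has completed.
    return [fut.result() for fut in futures]


params = [("a", 2, 3), ("b", 4, 5), ("c", 6, 7)]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = collect_results(doall(params, pool))
print(results)
```

The ordering constraint comes entirely from the data: collect_results cannot finish before the values it consumes exist, so no separate "wait until all done" call is needed.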