[Swift-devel] Re: [Swift-user] assigning file variables
Michael Wilde
wilde at mcs.anl.gov
Thu Feb 26 09:31:27 CST 2009
On 2/26/09 5:34 AM, Ben Clifford wrote:
> On Thu, 26 Feb 2009, Michael Wilde wrote:
>
>> foreach p, pn in protein {
>>   file result[][]
>>     <simple_mapper; prefix=@strcat("output/",p,"/"),suffix=".pdt">;
>>   iterate i {
>>     result[i] = doRound(p,i);
>>   } until (roundDone(result[i],pn) == 1);
>> }
>
>> But, that test was over-simplified, because it didn't handle the fact
>> that these returns are really 6-file structs, which motivated me to try
>> ext mapper.
>
> Assuming the above is working, what breaks when you change file into a
> 6-member struct?
If I just move to the 6-file struct and leave all else the same, I think
I can get that to work (I'll be trying this next). But I was trying to
preserve the current output structure as well, which is not what I'll
get with the code above.
If you call the loops:
  foreach $protein
    iterate each $round
      foreach $simulation
and the array indices result[$round][$simulation]
I wanted:
  output/r$round/$protein.{pdt,energy,rmsd,...}
and what I think the working code will give me is:
  output/$protein/$round.$simulation.{pdt,energy,rmsd,...}
That's not bad, but I didn't expect it to be so hard to get a specific
output structure. Trying to do so was an interesting learning experience
about the nature of the language.
My conclusion is that the simplest thing that would let me do what I
want is to stay with the 2-d array structure, and extend the ext mapper
to be dynamically called once for each output mapping desired, passing
the ext script the path of the element being mapped.
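To make the "called once per element" idea concrete, here is a rough Python sketch (not SwiftScript, and not how the ext mapper works today). Everything in it is hypothetical: the LazyMappedArray class, the round_first_mapper helper, the "proteinA" placeholder, and the exact file layout, including where the simulation index goes. The point is only that a per-element callback never needs to know the array dimensions in advance.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for an external mapper script: given the path of one
# array element (its indices), it returns the file name for that element.
ElementMapper = Callable[[Tuple[int, ...]], str]


class LazyMappedArray:
    """Maps each element on first access instead of pre-mapping the array."""

    def __init__(self, mapper: ElementMapper) -> None:
        self._mapper = mapper
        self._names: Dict[Tuple[int, ...], str] = {}

    def filename(self, *indices: int) -> str:
        # Call the mapper once per element and remember the answer, so the
        # array size never has to be known ahead of time.
        if indices not in self._names:
            self._names[indices] = self._mapper(indices)
        return self._names[indices]


def round_first_mapper(protein: str) -> ElementMapper:
    # Illustrative layout only: output/r<round>/<protein>.<simulation>.pdt
    def mapper(indices: Tuple[int, ...]) -> str:
        rnd, sim = indices
        return f"output/r{rnd}/{protein}.{sim}.pdt"

    return mapper


# "proteinA" is a placeholder name, not one from the real workflow.
result = LazyMappedArray(round_first_mapper("proteinA"))
print(result.filename(0, 3))
```

Caching the name per index tuple keeps the mapping stable if the same element is looked up more than once.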
Another seemingly-simple solution is a generalization of simple_mapper
that allows a more powerful sprintf-like expression to form the file name.
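As an illustration of the sprintf-like idea, here is a small Python sketch (again not SwiftScript, and not an existing simple_mapper feature). The template_mapper helper, the placeholder names in the templates, and "proteinA" are all made up for the example. It shows how one template string could express either of the two output layouts discussed above.

```python
from typing import Callable


def template_mapper(template: str, **fixed: object) -> Callable[..., str]:
    """Return a function that builds a file name from a format template.

    Values known up front (for example the protein) are bound in `fixed`;
    the remaining placeholders are filled in per element.
    """

    def name_for(**indices: object) -> str:
        return template.format(**fixed, **indices)

    return name_for


# Layout I wanted: one directory per round.
wanted = template_mapper("output/r{round}/{protein}.{sim}.{ext}", protein="proteinA")
# Layout the working code would produce: one directory per protein.
current = template_mapper("output/{protein}/{round}.{sim}.{ext}", protein="proteinA")

print(wanted(round=2, sim=5, ext="pdt"))      # output/r2/proteinA.5.pdt
print(current(round=2, sim=5, ext="energy"))  # output/proteinA/2.5.energy
```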
I wonder if we could actually move *all* our mappers to "ext"
implementations, and implement them with shell, perl, awk, etc. scripts?
This would seem to make testing new ideas and enhancements pretty easy
(and in fact more user-extensible), and would have virtually no
performance impact on most workflows.
(But don't implement anything yet; I think all this needs more thought
and discussion before we bounce around on solutions; I just want to
gather and organize the issues, then have a language review and see
what's most important based on real app needs).
>> - ext-mapper can't pre-map a dynamic output structure with any dimensions whose
>> size can't be passed to the mapper (I think?)
>
> yes.
Can this be lifted, as above?
>> - arrays can only be closed via return from functions
>
> no. Not since r1536 | benc at CI.UCHICAGO.EDU | 2008-01-03
>
> Since that commit, there is static analysis of source code, and when no
> more assignments are left to make to an array, it's regarded as closed.
>
> However, in the case of multidimensional arrays, this only happens when
> the entire top level array has no more assignments at all, not as each
> subarray happens to become finished.
OK, so in my case, effectively that restriction remains (although I
appreciate the explanation below). Note that I'm not complaining about
that restriction in this example. In my case, moving the inner loop into
a separate procedure made the code read a bit nicer, in fact. But it led
to bumping into the other restrictions mentioned.
> Static analysis of arrays (and even runtime analysis to discover when no
> more assignments may happen to a particular piece) is extremely hard
> because you're allowed to construct your own indices, and you're allowed
> to use them in a way that isn't single assignment; I think they're a
> fairly poor structure to have in SwiftScript the way it's going.
By "they're a fairly poor structure" do you mean user-specified array
indices? I fear that removing them will take us too deep into the
imperative/functional debate, but perhaps we need to keep that
discussion going.
> For example, in the code fragment:
>
>> file result[][]
>>   <simple_mapper; prefix=@strcat("output/",p,"/"),suffix=".pdt">;
>> iterate i {
>>   result[i] = doRound(p,i);
>> } until (roundDone(result[i],pn) == 1);
>
> You can look at that and reason that result[i] won't get assigned any more
> after the iterate statement for that i, but in general that i can be any
> expression. In the general case, how do you know that result[2] will never
> get any more assignments?
>
> There are other ways of doing things, for example Haskell's map, fold and
> unfold, that I think would be much easier to analyse in this case.
>
> (hey I get to mention map/reduce here!)
>
> foreach in that case could look like this map (making up ugly syntax)
> with syntax: output = map (range) (code)
>
> file results[] = map proteins (p -> { analyse(p); return p})
>
> This means the same as:
>
> file results[];
> foreach p,i in proteins {
>   results[i] = analyse(p);
> }
>
> What is different is there is now only a single assignment to results. The
> idea of "array closing" collapses down to "has a single assignment been
> made?"
>
> Iterate would look more like an unfoldr:
>
> output = unfold seed step terminateCondition
>
> file results[] = unfold initialStep (\prev -> { evaluate(prev); return prev})
>
> Again, you know when results is fully assigned, because there is now only
> a single statement assigning to it.
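For readers less familiar with map and unfold, the quoted idea can be sketched in ordinary Python. This is only an illustration of the concept, not proposed Swift syntax: the unfold helper, the analyse stand-in, and the doubling step are all invented for the example. In both cases the whole array is produced by one expression, so "is it closed?" reduces to "has that one assignment happened?".

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def unfold(seed: T, step: Callable[[T], T], done: Callable[[T], bool]) -> List[T]:
    """Build a list by applying step to the previous value until done is true.

    The seed is always included, and the value that satisfies `done` is the
    last element, similar to an iterate ... until loop.
    """
    values = [seed]
    while not done(values[-1]):
        values.append(step(values[-1]))
    return values


def analyse(protein: str) -> str:
    # Stand-in for real per-protein work.
    return protein.upper()


proteins = ["p1", "p2", "p3"]

# "map": results is assigned exactly once, as a whole.
results = [analyse(p) for p in proteins]

# "unfold": each round is derived from the previous one until the stop test holds.
rounds = unfold(1, lambda prev: prev * 2, lambda value: value >= 20)

print(results)  # ['P1', 'P2', 'P3']
print(rounds)   # [1, 2, 4, 8, 16, 32]
```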
We could discuss if such things could be added as experiments without
(yet) removing their imperative equivalents. I think that the question
of the attractiveness of the functional model to distributed and
parallel programming is a promising research topic. But it's not at the
top of my priority list for the group, which is usability/productivity,
platform support, performance, and provenance. I do agree that it could
lead to these, but it's uncertain if we can get as many people to use it,
and that's where we need to make progress right now.
If you think that going in the direction above could take us to the goal
quicker than improving the language in its current flavor, I'll listen
to a plan. My view right now is that swift is on the right track as-is
and is *very close* to becoming *very* usable/productive. If we can
identify and make the fewest tweaks we need to iron out current
difficulties, we'll be on the right track. And some of those tweaks
might be to documentation and examples, not even code changes. I do
realize that some of the *tweaks* might be hard.
> In addition, in both of these, you know exactly when a member of the array
> has been assigned - for any element of results, in both the map and unfold
> case, there is exactly one 'iteration' of the map or unfold which can
> assign to that element, and that is easily known to Swift because it knows
> how map/unfold work.
>
> These should be nestable, and in the case of a multidimensional array, you
> know when any particular sub-array has been assigned, because you know
> which iteration of the outer map/unfold generates that value.
>
>> - files and structs with files have limitations on assignments
>
> yes.
>
> It's easy to implement struct assignment, for structs where the members
> have defined assignment semantics already.
>
> for files, see other thread.
The conclusion of that thread (in my opinion) is that case (ii), what I
would call "value assignment of file handles", is what we want. (Where
"file handle" is that "marker type" term that I think the debate is
still open on).
>> - I can't set a mapping any time I want on any member (field or element) of any
>> structure.
>
> Yes.
But that's one of the critical things here. I seem to bump into this
limitation frequently. Does language consistency require these
limitations on setting mappings, or is it an implementation issue that
can be lifted? Is it the case that mapping does not affect data flow
semantics?
>> Here's a related question: Is it the case that if a function returns an array,
>> that array *must* be declared and mapped in the calling function, *not* in the
>> called function? E.g., I can't dynamically declare and map an array *within* a
>> function and return that array out? (I'll try this in the morning).
>
> By function, you mean procedure, I think (code referenced without a @
> prefix).
I was wondering about that difference - I thought it was inconsistent
usage in various documents/tutorials. So we should clarify that
terminology in the user guide. But better to erase the difference - all
callable things, I feel, should have the same name - function or
procedure, and they are either built-in, or user (or eventually library)
defined.
What's the semantic difference between the two today? One distinction I
see is that built-in things like trace() can take varying arg types, but
trace has no @ and thus looks more like a user-defined procedure
syntactically.
> In that case, yes - procedure call semantics are that you pass in
> where the output belongs.
Then this dictates that the caller also do the mapping - hence the names
of the members of an array can not depend on values that will only be
known in the called function, which actually creates the array members.
(in my case, doRound())
>> This makes me more determined to re-open the discussion on the nature of
>> object, variables, handles, scope, and lifetime, as it seems to me that
>> part of the problem comes from an object model that's almost, but not
>> quite, as regular as it should be.
>
> yes, it's riddled with prototypiness from before; mostly from
> imperativeness conflicting with data flow dependencies. It's substantially
> more consistent than it was a few years ago, though.
I agree, it's greatly improved and can do some amazing things.
My gut tells me that if we can address some of the issues I mentioned on
the nature of vars, handles, and mappings, we're in the home stretch.
I don't think that a more regular approach to object structure and
lifetime would conflict with the dataflow semantics.
Maybe we should start a new thread on that specific topic, or resume the
old thread.
For starters (and feel free to move this to a new thread), do you feel
comfortable with the current model of var, dsHandle, and by-value-like
assignment?
I would like to see a more Java-like model with a var being a typed
pointer or scalar value holder, and structs and arrays being dynamic
objects, and files being special vars with mapping and state.
scalar-var:
  value (int/string/boolean/float)
  state (set/unset)
object-var:
  pointer to array or struct
  state (set/unset)
file-var:
  mapping
  state (set/unset)
I have to confess that the above is pretty much the way I *thought*
Swift worked until we tried to write the latest paper, and had the
ensuing email discussions. Then I realized that (even after the
discussions) I still don't understand the model.
I don't feel that we have yet adequately described the model, either for
a CS paper *or* for the programmer. I think that a good start is to
write a data model description (in the user guide, in a detailed "skip
this on first reading" section) that specifies the data model in
language-reference-specification fashion.
From there we can discuss any proposed changes to terminology and/or
implementation.
I *think* that with the model above, one should be able to more flexibly
set mappings - in fact, set them from swift code, with some kind of
assignment (like f=<> expression; or f<expression>).
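To make the three kinds of variable listed earlier a bit more tangible, here is a loose Python sketch of that model. It is only my mental picture, not how Swift is implemented: the class names, the is_set flag, the assign and map_to methods, and the file path are all invented for illustration. Two rules in it are assumptions of mine as well: that a scalar can be assigned only once, and that a file can be remapped only while it is still unset.

```python
from dataclasses import dataclass
from typing import Optional, Union

Scalar = Union[int, str, bool, float]


@dataclass
class ScalarVar:
    value: Optional[Scalar] = None
    is_set: bool = False

    def assign(self, value: Scalar) -> None:
        # Assumption: single assignment, so a second assign is an error.
        if self.is_set:
            raise ValueError("scalar already assigned")
        self.value = value
        self.is_set = True


@dataclass
class ObjectVar:
    # Pointer to an array (list) or struct (dict); None until it is set.
    target: Optional[Union[list, dict]] = None
    is_set: bool = False


@dataclass
class FileVar:
    mapping: Optional[str] = None  # file name this variable is mapped to
    is_set: bool = False           # whether the file content has been produced

    def map_to(self, path: str) -> None:
        # Assumption: mappings may be set from code, but only before the
        # file has been produced.
        if self.is_set:
            raise ValueError("cannot remap a file that is already set")
        self.mapping = path


f = FileVar()
f.map_to("output/r0/proteinA.pdt")  # placeholder path
n = ScalarVar()
n.assign(3)
print(f.mapping, n.value)
```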