[Swift-devel] Re: [Swift-user] assigning file variables

Michael Wilde wilde at mcs.anl.gov
Thu Feb 26 09:31:27 CST 2009



On 2/26/09 5:34 AM, Ben Clifford wrote:
> On Thu, 26 Feb 2009, Michael Wilde wrote:
> 
>> foreach p, pn in protein {                                                      
>>  file result[][]                                                               
>>    <simple_mapper; prefix=@strcat("output/",p,"/"),suffix=".pdt">;             
>>  iterate i {                                                                   
>>    result[i] = doRound(p,i);                                                   
>>  } until (roundDone(result[i],pn) == 1);                                       
>> }                                                                               
>                                                                                 
>> But, that test was over-simplified, because it didn't handle the fact 
>> that these returns are really 6-file structs, which motivated me to try 
>> the ext mapper.
> 
> Assuming the above is working, what breaks when you change file into a 
> 6-member struct?

If I just move to the 6-file struct and leave all else the same, I 
think I can get that to work (I'll be trying this next; a rough sketch 
is below). But I was also trying to preserve the current output 
structure, which is not what I'll get with the code above.
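
For concreteness, the 6-file-struct version I have in mind is roughly
the sketch below (the type and member names are only illustrative,
taken from the suffixes I use further down, and I haven't yet checked
exactly how simple_mapper composes struct member names and array
indices into file names):

   type RoundResult {
     file pdt;
     file energy;
     file rmsd;
     // ...and the other three members
   }

   foreach p, pn in protein {
     RoundResult result[][]
       <simple_mapper; prefix=@strcat("output/",p,"/")>;
     iterate i {
       result[i] = doRound(p,i);
     } until (roundDone(result[i],pn) == 1);
   }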

If you call the loops:

foreach $protein
   iterate each $round
     foreach $simulation

and the array indices result[$round][$simulation]

I wanted:

output/r$round/$protein.{pdt,energy,rmsd,...}

and what I think the working code will give me is:
output/$protein/$round.$simulation.{pdt,energy,rmsd,...}

That's not bad, but I didn't expect it to be so hard to get a specific 
output structure. Trying to do so was an interesting learning 
experience about the nature of the language.

My conclusion is that the simplest thing that would let me do what I 
want is to stay with the 2-d array structure, and extend the ext mapper 
to be dynamically called once for each output mapping desired, passing 
the ext script the path of the element being mapped.
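
Roughly what I'm imagining - purely hypothetical, since today the ext
mapper script runs once up front for the whole structure and gets no
per-element information - is something like:

   // Hypothetical extension: the script named by exec= would be run
   // once per element being mapped, with that element's path (e.g.
   // [3][2].energy) passed to it, instead of once for everything.
   // map_round.sh and per_element are made-up names.
   file result[][]
     <ext; exec="map_round.sh", protein=p, per_element=true>;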

Another seemingly-simple solution is a generalization of simple_mapper 
that allows a more powerful sprintf-like expression to form the file name.
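
Made-up syntax again, but something along these lines, where the
pattern's slots would be filled in from the indices of the element
being mapped (neither format_mapper nor the %-pattern exists today;
the names are just for illustration):

   // Hypothetical printf-style mapper: %1 and %2 would stand for the
   // first and second array index of the element being mapped.
   file result[][]
     <format_mapper; pattern=@strcat("output/r%1/",p,".%2.pdt")>;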

I wonder if we could actually move *all* our mappers to "ext" 
implementations, and implement them with shell, perl, awk, etc. scripts?
This would seem to make testing new ideas and enhancements pretty easy 
(and in fact make mappers more user-extensible), and would have 
virtually no performance impact on most workflows.

(But don't implement anything yet; I think all this needs more thought 
and discussion before we bounce around on solutions. I just want to 
gather and organize the issues, then have a language review and see 
what's most important based on real app needs.)

>> - ext-mapper can't pre-map a dynamic output structure with any dimensions whose
>> size can't be passed to the mapper (I think?)
> 
> yes.

Can this restriction be lifted, as suggested above?

>> - arrays can only be closed via return from functions
> 
> no. Not since r1536 | benc at CI.UCHICAGO.EDU | 2008-01-03
> 
> Since that commit, there is static analysis of source code, and when no 
> more assignments are left to make to an array, it's regarded as closed.
> 
> However, in the case of multidimensional arrays, this only happens when 
> the entire top level array has no more assignments at all, not as each 
> subarray happens to become finished.

OK, so in my case that restriction effectively remains (though I 
appreciate the explanation below). Note that I'm not complaining about 
the restriction in this example; moving the inner loop into a separate 
procedure actually made the code read a bit nicer. But it led to 
bumping into the other restrictions mentioned.

> Static analysis of arrays (and even runtime analysis to discover when no 
> more assignments may happen to a particular piece) is extremely hard 
> because you're allowed to construct your own indices, and you're allowed 
> to use them in a way that isn't single assignment; I think they're a 
> fairly poor structure to have in SwiftScript the way it's going.

By "they're a fairly poor structure" do you mean user-specified array 
indices? I fear that removing them will take us too deep into the 
imperative/functional debate, but perhaps we need to keep that 
discussion going.

> For example, in the code fragment:
> 
>>  file result[][]                                                             
>>    <simple_mapper; prefix=@strcat("output/",p,"/"),suffix=".pdt">;           
>>  iterate i {                                                                 
>>    result[i] = doRound(p,i);                                                 
>>  } until (roundDone(result[i],pn) == 1);
> 
> You can look at that and reason that result[i] won't get assigned any more 
> after the iterate statement for that i, but in general that i can be any 
> expression. In the general case, how do you know that result[2] will never 
> get any more assignments?
> 
> There are other ways of doing things, for example Haskell's map, fold and 
> unfold, that I think would be much easier to analyse in this case.
> 
> (hey I get to mention map/reduce here!)
> 
> foreach in that case could look like this map (making up ugly syntax)
> with syntax: output = map (range) (code)
> 
>    file results[] = map proteins (p -> { analyse(p); return p})
> 
> This means the same as:
> 
> file results[];
> foreach p,i in proteins {
>   results[i] = analyse(p);
> }
> 
> What is different is there is now only a single assignment to results. The 
> idea of "array closing" collapses down to "has a single assignment been 
> made?"
> 
> Iterate would look more like an unfoldr:
> 
> output = unfold seed step terminateCondition
> 
> file results[] = unfold initialStep (\prev -> { evaluate(prev); return prev })
> 
> Again, you know when results is fully assigned, because there is now only 
> a single statement assigning to it.

We could discuss whether such things could be added as experiments 
without (yet) removing their imperative equivalents. I think that the 
attractiveness of the functional model for distributed and parallel 
programming is a promising research topic. But it's not at the top of 
my priority list for the group, which is usability/productivity, 
platform support, performance, and provenance. I do agree that it 
could lead to these, but it's uncertain whether we can get as many 
people to use it, and that's where we need to make progress right now.

If you think that going in the direction above could take us to the 
goal quicker than improving the language in its current flavor, I'll 
listen to a plan. My view right now is that Swift is on the right 
track as-is and is *very close* to becoming *very* usable/productive. 
If we can identify and make the fewest tweaks needed to iron out the 
current difficulties, we'll get there. And some of those tweaks might 
be to documentation and examples, not even code changes. I do realize 
that some of the *tweaks* might be hard.

> In addition, in both of these, you know exactly when a member of the array 
> has been assigned - for any element of results, in both the map and unfold 
> case, there is exactly one 'iteration' of the map or unfold which can 
> assign to that element, and that is easily known to Swift because it knows 
> how map/unfold work.
> 
> These should be nestable, and in the case of a multidimensional array, you 
> known when any particular sub-array has been assigned, because you know 
> which iteration of the outer map/unfold generates that value.
> 
>> - files and structs with files have limitations on assignments
> 
> yes.
> 
> Its easy to implement struct assignment, for structs where the members 
> have defined assignment semantics already.
> 
> for files, see other thread.

The conclusion of that thread (in my opinion) is that case (ii), what 
I would call "value assignment of file handles", is what we want. 
(Where "file handle" is the "marker type" term on which I think the 
debate is still open.)

>> - I cant set a mapping any time I want on any member (field or element) of any
>> structure.
> 
> Yes.

But that's one of the critical things here. I seem to bump into this 
limitation frequently. Does language consistency require these 
limitations on setting mappings, or is it an implementation issue that 
can be lifted? Is it the case that mapping does not affect data flow 
semantics?

>> Here's a related question: Is it the case that if a function returns an array,
>> that array *must* be declared and mapped in the calling function, *not* in the
>> called function? E.g., I can't dynamically declare and map an array *within* a
>> function and return that array out? (I'll try this in the morning).
> 
> By function, you mean procedure, I think (code referenced without a @ 
> prefix).

I was wondering about that difference - I thought it was inconsistent 
usage in the various documents/tutorials. So we should clarify that 
terminology in the user guide. But better to erase the difference - 
all callable things, I feel, should have the same name - function or 
procedure - and they are either built-in, or user-defined (or 
eventually library-defined).

What's the semantic difference between the two today? One distinction 
I see is that built-in things like trace() can take varying argument 
types, but trace has no @ and thus looks syntactically more like a 
user-defined procedure.

> In that case, yes - procedure call semantics are that you pass in
> where the output belongs.

Then this dictates that the caller also does the mapping - hence the 
names of the members of an array cannot depend on values that will 
only be known in the called procedure, which actually creates the 
array members (in my case, doRound()); see the sketch below.
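
i.e. with today's semantics the shape has to be roughly the following
(a sketch, assuming doRound's signature looks more or less like this,
and that @strcat will take the iteration counter i directly):

   // The caller owns the declaration and the mapping; doRound only
   // fills in values for files whose names the caller's mapper has
   // already fixed.
   (file out[]) doRound(string p, int round) {
     // ...apps that produce out[0], out[1], ... for this round
   }

   // p and i as in the foreach/iterate further up
   file r[] <simple_mapper;
             prefix=@strcat("output/",p,"/",i,"."), suffix=".pdt">;
   r = doRound(p,i);

so nothing computed inside doRound can influence those file names.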

>> This makes me more determined to re-open the discussion on the nature of 
>> objects, variables, handles, scope, and lifetime, as it seems to me that 
>> part of the problem comes from an object model that's almost, but not 
>> quite, as regular as it should be.
> 
> yes, it's riddled with prototypiness from before; mostly from 
> imperativeness conflicting with data flow dependencies. It's substantially 
> more consistent than it was a few years ago, though.

I agree, it's greatly improved and can do some amazing things.

My gut tells me that if we can address some of the issues I mentioned 
on the nature of vars, handles, and mappings, we're in the home 
stretch. I don't think that a more regular approach to object 
structure and lifetime would conflict with the dataflow semantics.

Maybe we should start a new thread on that specific topic, or resume the 
old thread.

For starters (and feel free to move this to a new thread), do you feel 
comfortable with the current model of var, dsHandle, and by-value-like 
assignment?

I would like to see a more Java-like model with a var being a typed 
pointer or scalar value holder, and structs and arrays being dynamic 
objects, and files being special vars with mapping and state.

scalar-var:
   value (int/string/boolean/float)
   state (set/unset)

object-var
   pointer to array or struct
   state (set/unset)

file-var
   mapping
   state (set/unset)

I have to confess that the above is pretty much the way I *thought* 
Swift worked until we tried to write the latest paper and had the 
ensuing email discussions. Then I realized that (even after the 
discussions) I still don't understand the model.

I don't feel that we have yet adequately described the model, either 
for a CS paper *or* for the programmer. I think a good start would be 
to write a data model description (in the user guide, in a detailed 
"skip this on first reading" section) that specifies the data model in 
language-reference-specification fashion.

From there we can discuss any proposed changes to the terminology 
and/or the implementation.

I *think* that with the model above, one should be able to set 
mappings more flexibly - in fact, set them from Swift code, with some 
kind of assignment (like f=<> expression; or f<expression>).
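
Something like this, say - completely hypothetical, just to make the
idea concrete:

   // Hypothetical: the mapping is applied by an assignment-like
   // statement, so the mapping expression can use values (the round
   // number i, say) that only become known at that point in the
   // dataflow.
   file f;
   f = <simple_mapper; prefix=@strcat("output/r",i,"/",p,"."),
        suffix=".pdt">;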




