[Swift-devel] concurrent mapper and restart

Ben Clifford benc at hawaga.org.uk
Fri May 23 17:05:37 CDT 2008


For simple tests, files mapped through the concurrent mapper do appear to
be handled correctly by the present filename-based restart mechanism.

This contradicts bug 107 comment 5:

> This fixes the latest problem, but will not recognize as done variables 
> mapped by the concurrent mapper.

which I interpret to mean that concurrently mapped values will be
recomputed unnecessarily after a restart [thus leading to inefficiency
(perhaps to the extent that the workflow can never finish in a real
failure-prone environment)].

However, I'm more worried that different restarts of a workflow will have 
files mapped differently, such that sometimes a file will be mapped to a 
filename that was previously used for a different file in an earlier 
restart (or initial run).

That would lead to a situation where workflows might appear to complete, 
but would actually be jumbling up intermediate datafiles and delivering 
incorrect output results, which is extremely bad.

I haven't tried this to see if I can make it happen, nor am I sure I can.
I think it's probably very sensitive to the way in which restarts interact
with foreach loops to create threads: if a restart causes threads to be
created in a different order in a foreach loop, then I think this problem
exists.
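
To make the suspected failure concrete, here is a toy model in Python. It
is not Swift or Karajan code: map_by_creation_order, the "_concurrent/"
prefix and the "run-N" suffix are all made up for illustration, and it
assumes the filename suffix depends only on the order in which threads
happen to be created.

```python
def map_by_creation_order(items, run_prefix="run"):
    """Hypothetical mapper: the suffix is the thread-creation position."""
    return {item: f"_concurrent/{run_prefix}-{n}" for n, item in enumerate(items)}

# Same foreach loop, but the restart creates its threads in a different order.
first_run = map_by_creation_order(["a", "b", "c"])
restart = map_by_creation_order(["b", "a", "c"])

# Filenames that refer to a different loop element after the restart.
clashes = {
    name: (old_item, new_item)
    for old_item, name in first_run.items()
    for new_item, new_name in restart.items()
    if name == new_name and old_item != new_item
}
print(clashes)
```

If the real identifiers behave like this, a restart that trusts existing
filenames would pick up the file written for "a" as the result for "b".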

If this really is a problem, two approaches to avoiding this more serious
problem spring to mind:

i) make concurrent filenames different on each restart (with a per-restart
rather than per-kml-compilation unique identifier); this would reduce the
problem to the efficiency-reducing one described above - unpleasant, but
not producing incorrect results.

ii) the same lexical/runtime scope ID stuff that I talked about yesterday
for identifying variables might apply here. Instead of putting a
potentially random karajan thread identifier on the end of a concurrent
variable, use a SwiftScript level equivalent: those SwiftScript level
scope identifiers, which I think are recreatable no matter the order in
which Karajan evaluates things. That would give mappings that are
repeatable between runs, which in turn would mean filename-based restarts
are perhaps not so bad in general.
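
A rough Python sketch of how the two naming schemes differ. Nothing here
is existing Swift or Karajan API: per_restart_filename, scoped_filename,
the "_concurrent/" prefix, the "0-1" thread identifier and the
(loop label, index) scope path are all hypothetical stand-ins.

```python
import uuid

def per_restart_filename(restart_id, thread_id):
    """Approach (i): prefix each name with an ID generated once per restart."""
    return f"_concurrent/{restart_id}-{thread_id}"

def scoped_filename(variable, scope_path):
    """Approach (ii): derive the name from SwiftScript-level scope.

    scope_path is a tuple of (loop label, iteration index) pairs,
    outermost first; thread-creation order is not involved at all.
    """
    suffix = "-".join(f"{label}.{index}" for label, index in scope_path)
    return f"_concurrent/{variable}-{suffix}"

# (i) random per-restart IDs keep names from two restarts apart, even for
# the same thread identifier, so stale files are never mistaken for results.
first_id, second_id = uuid.uuid4().hex, uuid.uuid4().hex
name_in_first = per_restart_filename(first_id, "0-1")
name_in_second = per_restart_filename(second_id, "0-1")

# (ii) the mapping is identical whatever order the iterations are visited in.
def map_loop(visit_order):
    return {i: scoped_filename("out", (("foreach0", i),)) for i in visit_order}

mapping_first_run = map_loop([0, 1, 2])
mapping_restart = map_loop([2, 0, 1])
```

Under (i) nothing is ever reused, so everything concurrently mapped gets
recomputed; under (ii) names can be reused safely because they depend only
on position in the script, assuming such scope identifiers really are
reproducible.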
