[Swift-devel] this feels almost swiftable but not.
Ben Clifford
benc at hawaga.org.uk
Tue Jul 5 09:03:36 CDT 2011
Here's something I've been doing with ALCF machines. The workflow feels almost right for Swift, but not quite. I'm not really interested in implementing this myself, because I finish working on this project in six weeks' time, but I thought it would be interesting food for thought.
There's a big physics simulation app that runs on Intrepid, the big Blue Gene. Every so often it outputs a dump into its run directory. A dump takes the form of a directory called runname-<stepnumber>.d/, where stepnumber is a number that increases as the simulation progresses (inside the simulation it increases by one each step, but dumps are made somewhat irregularly, so it goes up with each dump but by a different amount each time).
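To make the naming scheme concrete, here is a small sketch (names like "myrun" are illustrative, not from the actual runs) of parsing the step number out of a runname-<stepnumber>.d/ directory name and sorting dumps into step order:

```python
import re

# Matches "<runname>-<stepnumber>.d" -- the dump naming scheme described
# above. The runname and step values below are made up for illustration.
DUMP_RE = re.compile(r"^(?P<run>.+)-(?P<step>\d+)\.d$")

def dump_step(dirname):
    """Return (runname, stepnumber) for a dump directory name, or None
    if the name doesn't look like a dump directory."""
    m = DUMP_RE.match(dirname)
    if m is None:
        return None
    return m.group("run"), int(m.group("step"))

# Dumps arrive at irregular step intervals, so sort by parsed step
# number rather than trusting directory-listing order.
dumps = ["myrun-17.d", "myrun-3.d", "myrun-120.d", "notes.txt"]
steps = sorted(s for s in (dump_step(d) for d in dumps) if s is not None)
```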
Swift-related note: dumps are directories. They contain some files, the files are named according to the enclosing directory, and the enclosing directory is named according to the name of the job. That's unusual compared to how Swift does things.
Once dumps are made, I want to post-process them to make plots in several different ways: 2d plots, 3d plots, and (4d!) movies.
Each 2d and 3d plot output comes from a single dump file. The plots are made by invoking the application again, this time on Eureka, the associated viz cluster.
Eureka shares a GPFS file system with Intrepid. The dump directories are large: for example, one I just looked at is 184 GB.
So: these directories pretty much cannot be copied. They can be symlinked easily enough, though.
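Since the dumps live on the shared GPFS file system anyway, "making a dump available" to a post-processing work directory can be just a symlink. A minimal sketch (the function name and layout are my own, not part of any existing tool):

```python
import os

def link_dump(dump_dir, workdir):
    """Make a dump directory visible in a work directory without
    copying ~184 GB of data: create a symlink back to the original on
    the shared file system. Idempotent: re-linking is a no-op."""
    target = os.path.join(workdir, os.path.basename(dump_dir.rstrip("/")))
    if not os.path.lexists(target):
        os.symlink(dump_dir, target)
    return target
```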
The run progresses over many days/weeks. I don't want to wait for "all" the output dumps to be ready before post processing them. Indeed, the definition of "all" is maybe not known programmatically but is decided by looking at the output dumps and deciding (in-head, not in-computer) that we've seen enough.
Swift has almost but not quite enough behaviour at the moment to deal with mapping a data set that increases in size over time like this (rather than being atomic wrt a single Swift run), though it's been talked about on swift-devel. I want to have Swift generate the plots for a dump as soon as the dump is made (where "as soon as" might be hours later, given the timescales involved, but not weeks later).
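Outside Swift, the "growing data set" behaviour could be approximated by a coarse polling scan, which is all the timescales here demand. A sketch, with made-up names:

```python
def new_dumps(rundir_entries, seen):
    """One polling pass over a run-directory listing.

    Returns dump directory names (".d" suffix, per the naming scheme
    above) not yet handled, in sorted order, and marks them handled.
    'seen' would be persisted somewhere so a restart of the watcher
    doesn't reprocess old dumps.
    """
    fresh = [n for n in sorted(rundir_entries)
             if n.endswith(".d") and n not in seen]
    seen.update(fresh)
    return fresh

# The driving loop would poll coarsely -- dumps appear hours apart:
#
#   while True:
#       for dump in new_dumps(os.listdir(rundir), seen):
#           launch_postprocessing(dump)   # hypothetical hook
#       time.sleep(poll_interval)
```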
I don't want to regenerate plots that have already been plotted.
Except that sometimes I want to make a new set of plots with different parameters (for example, different viewing angle, plotting different variables, different parameters to viz method).
Sometimes I will want both the older and the newer plots to continue being generated over time. Sometimes I will be done with the older (or the newer) one, and want to stop one of them from being generated any more.
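One way to frame the last three requirements: key the "already plotted" record by (dump, parameter set), and let the set of *active* parameter sets grow and shrink independently. A sketch under those assumptions (the names are mine):

```python
def plots_to_make(dumps, active_param_sets, done):
    """Return (dump, param_set_name) pairs that still need plotting.

    'done' records pairs already plotted, so nothing is regenerated.
    Adding a new parameter set (new viewing angle, new variables)
    schedules it against every dump, old and new; retiring a set just
    means dropping it from active_param_sets -- its existing outputs
    stay on disk untouched.
    """
    return [(d, p) for d in dumps for p in active_param_sets
            if (d, p) not in done]
```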
The 3d plot happens in two stages: first, the simulation application is invoked on Eureka to produce a directory (per dump step) of .silo files. These end up occupying megabytes (so much smaller than the raw dumps, but still not trivial to copy around). Secondly, a third-party application, VisIt, is used to render those silo files into PNGs.
When I talk about generating different 3d plots above: the silo files do not change. The different plots are specified by parameters to this VisIt step.
That interacts awkwardly with i) silo files being stored as collections of files in directories, where the silo dump is the directory itself; and ii) how ongoing runs happen: a dump (directory) should turn into a silo (directory) only once, soon after the dump itself is created.
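The "convert each dump exactly once" rule can be expressed with a directory-existence check, if one assumes the conversion only creates its output directory on success (or writes to a temporary name and renames atomically when complete). A sketch; the .silo.d naming convention is invented here for illustration:

```python
import os

def silo_dir_for(dump_dir, silo_root):
    """Hypothetical naming: one silo directory per dump directory."""
    base = os.path.basename(dump_dir.rstrip("/"))
    return os.path.join(silo_root, base.replace(".d", ".silo.d"))

def needs_silo(dump_dir, silo_root):
    """A dump turns into a silo directory only once: skip conversion if
    the silo directory already exists. This treats the directory itself
    as the completion marker, which is only safe if partial conversions
    never leave one behind."""
    return not os.path.isdir(silo_dir_for(dump_dir, silo_root))
```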
There's an additional interesting thing with VisIt: it allocates its own workers, and this startup time is non-trivial wrt the cost of rendering a single silo dump. VisIt has a Python API, and I feed in a list of silo dumps that should be rendered; these are then all rendered inside a single run of VisIt, resulting in some of the silo dumps turning into output PNGs, and some not, because I ran out of wall time. At which point, I manually trim the list and submit again. What's interesting there is that I'm launching VisIt with a specification of things to work on, knowing that the specific VisIt run will fail overall, because I'm interested in the more granular production of individual plot PNG files. How that interacts with Swift seems interesting too: there's an "unreliable worker" that I feed multiple tasks into, and those individual tasks may succeed or fail and can be restarted/rerun independently of the worker. There would also be some scope for higher-level parallelisation: I am using 30 worker nodes per VisIt run, which means I could have 3 sets of these going at any one time on Eureka if it was lightly loaded.
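The manual "trim the list and submit again" step amounts to judging success per output PNG rather than per VisIt invocation, which is easy to automate. A sketch (this is generic bookkeeping around the unreliable worker, not VisIt's actual API):

```python
def remaining_tasks(submitted, completed_outputs):
    """After a worker run dies at the wall-time limit, keep only the
    silo dumps whose output PNG did not appear, for resubmission in the
    next run.

    'completed_outputs' maps a silo dump name to True if its PNG
    exists on disk. The worker run as a whole is expected to fail;
    success is tracked per individual output.
    """
    return [t for t in submitted if not completed_outputs.get(t, False)]
```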
Like I said at the top, I'm not interested in implementing this in Swift myself, but I think it gives some interesting pokes at the envelope of what can/cannot be done with Swift.
Ben