[Swift-user] Swift for large-scale batch jobs?

Ben Clifford benc at hawaga.org.uk
Thu Sep 6 03:16:44 CDT 2007


On Wed, 5 Sep 2007, Michael McCracken wrote:

> Can swift scripts control submission of jobs to a batch queue (local
> or remote), either sequentially linked or independent?

Yes.

In terms of linking jobs together, the idea is that those relations are 
expressed by how they share data. So rather than saying 'job A runs before 
job B', you'd say 'job A generates file X, and job B needs file X as an 
input'.
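
To illustrate the idea (this is only a sketch in Python, not Swift
syntax, and the job and file names are made up), an execution order can
be derived purely from which job produces and which job consumes each
file:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each lists the files it reads and the files it writes.
jobs = {
    "A": {"inputs": [], "outputs": ["X"]},
    "B": {"inputs": ["X"], "outputs": ["Y"]},
    "C": {"inputs": ["X"], "outputs": ["Z"]},
    "D": {"inputs": ["Y", "Z"], "outputs": ["result"]},
}

def run_order(jobs):
    """Derive a valid execution order from file producers and consumers."""
    # Map each file to the job that generates it.
    producer = {f: name for name, j in jobs.items() for f in j["outputs"]}
    # A job depends on whichever job produces each of its inputs;
    # inputs with no producer are assumed to exist already.
    deps = {
        name: {producer[f] for f in j["inputs"] if f in producer}
        for name, j in jobs.items()
    }
    return list(TopologicalSorter(deps).static_order())

order = run_order(jobs)
print(order)
```

Nothing here says 'A before B' explicitly; that ordering falls out of B
needing file X. B and C share no data, so they may run in either order
(or in parallel).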

> The experiments I'm working with involve many sequentially-dependent 
> full-system runs (thousands of processors), with large enough data to 
> require transfer to archive or secondary storage between runs.  Has 
> anyone tried something like this in swift?

I don't know what the statistics for our recent large runs are; someone 
else on this list might comment. What you describe is within the scope 
of what we're trying to do, though.

> What I'd like to do, if swift can support experiments like that, is
> build a tool to read swift scripts ( or the appropriate intermediate
> form ) and generate task descriptions for my current tool, which
> simulates large scale experiments to predict total time to solution
> (including queue wait and network transfer). Any advice on which point
> in the swift tool chain to start at would be helpful. I've scanned the
> code, but it is a lot to digest.

A couple of ideas: 

 i) Swift submits jobs for execution through the Java CoG Kit execution 
providers. Various providers are available by default, such as one to run 
programs on the local machine, one to submit jobs to Globus, and one to 
submit directly to the PBS batch queueing system.

Execution providers can be written for other systems. For example, our 
group has a research project called Falkon which does job submission and 
execution; this ties into Swift through a specially written execution 
provider.

If you follow the 'building swift' instructions on the Swift download 
page, the source code for the default providers lives in 
cog/modules/provider-*

Perhaps you could write your own provider which, rather than executing the 
task it is given, instead performs a simulation of that task.
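
As a rough sketch of that idea (written in Python for brevity; a real 
provider would be Java code against the CoG kit interfaces, which this 
does not attempt to reproduce, and every name and number below is 
hypothetical), a simulating provider could advance a clock and record 
output files as produced instead of running anything:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    # All fields are illustrative; the estimates are assumed to come
    # from the user's own performance model.
    name: str
    inputs: list
    outputs: list
    est_queue_wait_s: float
    est_runtime_s: float

@dataclass
class SimulatedProvider:
    clock_s: float = 0.0                          # simulated elapsed time
    available: set = field(default_factory=set)   # files that "exist"
    log: list = field(default_factory=list)       # (task name, finish time)

    def submit(self, task):
        """Pretend to run the task: no process is started."""
        missing = [f for f in task.inputs if f not in self.available]
        if missing:
            raise RuntimeError(f"{task.name}: missing inputs {missing}")
        # Charge queue wait plus run time to the simulated clock.
        self.clock_s += task.est_queue_wait_s + task.est_runtime_s
        # Report the outputs as produced so dependent tasks can proceed.
        self.available.update(task.outputs)
        self.log.append((task.name, self.clock_s))
        return self.clock_s

provider = SimulatedProvider(available={"input.dat"})
provider.submit(Task("run1", ["input.dat"], ["state1.dat"], 600.0, 3600.0))
provider.submit(Task("run2", ["state1.dat"], ["state2.dat"], 300.0, 1800.0))
print(provider.clock_s)  # predicted time to solution, in seconds
```

This handles only one task at a time; concurrent jobs and transfer 
times would need a proper event queue.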

 ii) There are a couple of options, -typecheck and -dryrun, which cause 
normal execution to be replaced by other code at the Karajan runtime 
layer.

The code that gets swapped is in: cog/modules/vdsk/libexec/execute-*.k

By default, execute-default.k is used, which deals with actual execution. 
The much simpler execute-typecheck.k and execute-dryrun.k replace that 
execution code with different behaviour.

You could plug in at that point.

In case i) you could write the code in Java, but I think you would have to 
do a good job of convincing Swift that you really had produced output 
files and the like. In case ii) you'd have to write some code in the 
Karajan language, which you are likely less familiar with, but you would 
have (I think) less to do in terms of faking execution of your jobs.

Both of these approaches would use a large part of Swift as-is, so you'd 
be able to re-use a large part of our codebase (all the language parsing, 
etc).
