[Swift-devel] swift with input and output sandboxes instead of a shared site filesystem
Ben Clifford
benc at hawaga.org.uk
Wed Apr 22 07:12:28 CDT 2009
I implemented an extremely poor quality prototype to try to get my head
around some of the execution semantics when running through:
i) the gLite workload management system (hereafter, WMS) as used in the
South African National Grid (my short-term interest) and in EGEE (my
longer-term interest)
ii) condor used as an LRM to manage nodes which do not have a shared
filesystem, without any "grid stuff" involved
In the case of the WMS, it is a goal to have the WMS perform site
selection, rather than having submitting clients (such as Swift) do so. I
don't particularly agree with this, but there it is.
In the case of condor-with-no-shared-fs, one of the basic requirements of
a Swift site is violated - that of an easily accessible shared file
system.
Both the WMS and condor provide an alternative to Swift's file management
model, and the two approaches look similar to each other.
In a job submission, one specifies the files to be staged into an
arbitrary working directory before execution, and the files to be staged
out after execution.
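
For illustration, the sandbox declarations in a WMS job description
(written in JDL) look roughly like this - the attribute names are JDL's
own, but the file names are invented for the example:

Executable    = "myapp.sh";
Arguments     = "in.dat out.dat";
StdOutput     = "stdout.txt";
StdError      = "stderr.txt";
InputSandbox  = {"myapp.sh", "in.dat"};
OutputSandbox = {"out.dat", "stdout.txt", "stderr.txt"};

Condor's submit description language expresses the same thing with its
transfer_input_files and transfer_output_files settings.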
My prototype is intended to get practical experience interfacing Swift to
a job submission with those semantics.
What I have done in my implementation is rip out almost the entirety of
the execute2/site file cache/wrapper.sh layers, and replace them with a
callout to a user-specified shell script. The shell script is passed the
submit-side paths of the input files and of the output files, and the
command line. The shell script is then entirely responsible for causing
the job to run somewhere and for doing appropriate input and output
staging.
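
(For concreteness, the callout is something along these lines; the
argument convention here is invented for illustration rather than being
exactly what the prototype defines.)

#!/bin/bash
# hypothetical callout skeleton, invoked by Swift once per job
inputs=$1      # file listing the submit-side paths of the input files
outputs=$2     # file listing the submit-side paths of the output files
shift 2        # whatever remains is the application command line
# ... cause "$@" to run somewhere, staging the listed files in and out ...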
Plugged into this shell interface, I then have two scripts: one for
SAGrid/gLite and one for condor-with-no-shared-fs. They are similar to
each other, differing only in the syntax of the submission commands and
files.
These scripts create a single input tarball, create a job submission file,
submit it with the appropriate submit command, hang around polling for
status until the job is finished, and unpack an output tarball (roughly as
sketched below). Tarballs are used rather than explicitly listing each
input and output file for two reasons:
i) if an output file is missing (perhaps due to application failure) I
would like the job submission to still return what it has (most especially
remote log files). As long as a tarball is made with *something*, this
works.
ii) condor (and perhaps the WMS) apparently cannot handle directory
hierarchies in their stagein/stageout parameters.
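
As a concrete sketch, the condor-flavoured script does roughly the
following; the submit file attributes are real condor ones, but the file
names and layout are simplified, and the real script differs in detail
(error handling, scratch directories, how it polls for status):

#!/bin/bash
set -e
inputs=$1; outputs=$2; shift 2      # remaining arguments: the command line
tar czf input.tar.gz --files-from "$inputs"     # single input tarball
cat > job.submit <<EOF
universe                = vanilla
executable              = run.sh
arguments               = "$*"
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.tar.gz
transfer_output_files   = output.tar.gz
log                     = job.log
output                  = job.out
error                   = job.err
queue
EOF
condor_submit job.submit
condor_wait job.log      # the real script polls for status; this just blocks
tar xzf output.tar.gz    # unpack whatever the job sent back
# run.sh (the remote-side wrapper, not shown here) unpacks input.tar.gz,
# runs the application command line, and repacks output.tar.gz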
I have tested on the SAGrid testing environment (for the WMS) and this
works (although quite slowly, as the WMS reports job status changes quite
slowly); and on a condor installation on gwynn.bsd.uchicago.edu (which has
a shared filesystem, so is not a totally satisfactory test). I also sent
this to Mats to test in his environment (as a project of his was my
immediate motivation for the condor side of this).
This prototype approach loses a huge chunk of Swift execution-side
functionality such as replication, clustering and coasters (deliberately -
I was targeting getting SwiftScript programs running, rather than getting
a decent integration with the interesting execution stuff we have made).
As such, it is entirely inappropriate for production (or even most
experimental) use.
However, it has given me another perspective on submitting jobs to the
above two environments.
For condor:
The zipped input/output sandbox approach seems to work nicely.
Moulding this into something more in tune with what is in Swift now is, I
think, not crazy hard - the input and output staging parts of execute2
would need to change into something that creates/unpacks a tarball and
appropriately modifies the job description so that, when it is run by the
existing execution mechanism, the tarballs get carried along. (to test if
you bothered reading this, if you paste me the random string H14n$=N:t)Z
you get a free beer)
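
To make "the tarballs get carried along" concrete: the remote side needs
only a small wrapper that unpacks, runs and repacks. A hypothetical sketch
(run.sh and expected-outputs.txt are invented names, and the real
prototype's wrapper differs):

#!/bin/bash
# run.sh - hypothetical remote-side wrapper; deliberately no "set -e", so
# packing still happens after an application failure
tar xzf input.tar.gz          # unpack the input sandbox into the scratch dir
"$@"                          # run the application command line
rc=$?
# pack whichever expected outputs (and remote logs) actually exist, so that
# even after an application failure there is *something* to stage back;
# expected-outputs.txt is assumed to have been shipped in the input tarball
# (e.g. derived from the outputs list given to the callout script)
for f in $(cat expected-outputs.txt) *.log; do
    [ -e "$f" ] && echo "$f"
done > present.txt
tar czf output.tar.gz --files-from present.txt
exit $rc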
As noted above, that approach does not work with clustering or with
coasters, though both could be modified to support it (for example,
clustering could be made to merge all of the stagein and stageout listings
for the jobs in a cluster; and coasters could be given a different
interface to the existing coaster file transfer mechanism). It might be
that coasters and clusters are not particularly desired in this
environment, though.
For gLite execution, the big loss here I think is coasters, because it is
a very spread-out grid environment. So with this approach, applications
which work well without coasters will probably work well; but applications
which rely on coasters for their performance will work as dismally as when
run without coasters in any other grid environment. I can think of various
modifications, similar to those mentioned in the condor section above, to
try to make them work through this submission system; but it might be that
a totally different approach to my implementation above is warranted for
coaster-based execution on gLite, with more explicit specification of
which sites to run on, rather than allowing the WMS any choice, and only
running on sites which do have a shared filesystem available.
I think in the short term, my interest is in getting this stuff more
closely integrated without focusing too much on coasters and clusters.
Comments?