[Swift-devel] swift with input and output sandboxes instead of a shared site filesystem

Michael Wilde wilde at mcs.anl.gov
Wed Apr 22 07:52:23 CDT 2009


One clarification: when I said "pull input files to the worker node FS 
rather than push them multi-hop", the multi-hop was referring to the 
current system of source dir to work dir to job dir, not to your new 
system below.

- mike

On 4/22/09 7:42 AM, Michael Wilde wrote:
> This sounds good, Ben. I have only skimmed it so far; I need to read it 
> properly and will comment later.
> 
> We should compare this to the experiments on collective I/O (currently 
> named "many-task data management") that Allan, Zhao, Ian, and I are doing.
> 
> The approach involves some things that would apply in the two 
> environments you mention as well:
> 
> - pull input files to the worker node FS rather than push them multi-hop
> - batch output files up into tarballs and expand them back on their 
> target filesystem (on the submit host for now) - a rough worker-side 
> sketch of these two steps follows this list
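> 
> Roughly, and only as an illustration - the file names and scp as the 
> transfer command below are placeholders, not the actual MTDM code:
> 
> #!/bin/sh
> # Runs on the worker node: pull inputs straight to local disk,
> # run the application, then batch all outputs into one tarball.
> set -e
> jobdir=$(mktemp -d /tmp/job.XXXXXX)
> cd "$jobdir"
> 
> # Pull each input in a single hop from the submit host.
> for f in input1.dat input2.dat; do
>     scp submit.host:/path/to/inputs/"$f" .
> done
> 
> # Run the application (placeholder).
> ./app input1.dat input2.dat > result.out
> 
> # Batch the outputs into one tarball and send it back; it gets
> # expanded on the target filesystem (the submit host for now).
> tar czf outputs.tar.gz result.out
> scp outputs.tar.gz submit.host:/path/to/results/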
> 
> It also involves some things that apply more to large clusters, but may 
> be generalizable to generic grid environments:
> 
> - broadcast common files used by many jobs from the submit host to the 
> worker nodes
> - use "intermediate" filesystems striped across cluster nodes, rather 
> than local filesystems on cluster nodes, where this is more efficient or 
> is needed
> - have the worker nodes selectively access files from local, 
> intermediate, or global storage, depending on where the submit host 
> workflow decided to place them
> - keep a catalog in the cluster of what files are on what local host, 
> and a protocol to transfer them from where they were produced to where 
> they need to be consumed (this feature is like data diffusion, but needs 
> more thought and experience to determine how useful it is; not many 
> workflows need it). A minimal sketch of such a catalog follows this list.
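> 
> To make the catalog idea concrete (the flat-file catalog on an 
> intermediate filesystem and scp as the transfer protocol are 
> assumptions for illustration, not the actual design):
> 
> #!/bin/sh
> # Each catalog line: <logical-name> <host> <absolute-path>
> CATALOG=/intermediate/fs/catalog.txt
> 
> # Record where a file just produced on this node lives.
> register() {
>     echo "$1 $(hostname) $PWD/$1" >> "$CATALOG"
> }
> 
> # Pull a file to the current directory from whichever node produced it.
> fetch() {
>     entry=$(grep "^$1 " "$CATALOG" | head -1)
>     host=$(echo "$entry" | awk '{print $2}')
>     path=$(echo "$entry" | awk '{print $3}')
>     if [ "$host" = "$(hostname)" ]; then
>         cp "$path" .           # already on this node
>     else
>         scp "$host:$path" .    # node-to-node transfer
>     fi
> }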
> 
> This is being done so far in discussions among me, Ioan, and Ian, but 
> we'd like to get you, Mihael, and anyone else interested to join in; 
> Kamil Iskra and Justin Wozniak from the MCS ZeptoOS and Radix groups are 
> involved as well.
> 
> We should use this thread to discuss your I/O strategy below first, 
> before we involve the MTDM experiments, but one common theme seems to 
> be mastering the changes in Swift data management that would allow us 
> to explore these new data management modes.
> 
> If you recall, we've had discussions in the past about having something 
> like "pluggable data management strategies" that would allow a given 
> script to be executed in different environments with different 
> strategies - either globally set or set per site.
> 
> I'm offline a lot until Monday with a proposal deadline, and hope to 
> comment and rejoin the discussion by then or shortly after.
> 
> - Mike
> 
> 
> On 4/22/09 7:12 AM, Ben Clifford wrote:
>> I implemented an extremely poor quality prototype to try to get my 
>> head around some of the execution semantics when running through:
>>
>>   i) the gLite workload management system (hereafter, WMS) as used in 
>>      the South African National Grid (my short-term interest) and in 
>>      EGEE (my longer-term interest)
>>
>>  ii) condor used as an LRM to manage nodes which do not have a shared 
>>      filesystem, without any "grid stuff" involved
>>
>> In the case of the WMS, it is a goal to have the WMS perform site 
>> selection, rather than having submitting clients (such as Swift) do so. 
>> I don't particularly agree with this, but there it is.
>>
>> In the case of condor-with-no-shared-fs, one of the basic requirements 
>> of a Swift site is violated - that of an easily accessible shared file 
>> system.
>>
>> Both the WMS and condor provide an alternative to Swift's file 
>> management model, and their two approaches look similar.
>>
>> In a job submission, one specifies the files to be staged into an 
>> arbitrary working directory before execution, and the files to be 
>> staged out after execution.
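>>
>> For concreteness, the two declarative forms look roughly like this 
>> (the file names are placeholders):
>>
>> # condor submit description (fragment) written by a wrapper script:
>> cat > job.sub <<EOF
>> executable              = run.sh
>> log                     = job.log
>> should_transfer_files   = YES
>> when_to_transfer_output = ON_EXIT
>> transfer_input_files    = in.tar.gz
>> transfer_output_files   = out.tar.gz
>> queue
>> EOF
>>
>> # gLite JDL equivalent (fragment):
>> cat > job.jdl <<EOF
>> Executable    = "run.sh";
>> InputSandbox  = {"run.sh", "in.tar.gz"};
>> OutputSandbox = {"out.tar.gz"};
>> EOF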
>>
>> My prototype is intended to give me practical experience interfacing 
>> Swift to a job submission system with those semantics.
>>
>> What I have done in my implementation is rip out almost the entirety 
>> of the execute2, site file cache, and wrapper.sh layers, and replace 
>> them with a callout to a user-specified shell script. The shell script 
>> is passed the submit-side paths of the input and output files, and the 
>> commandline.
>>
>> The shell script is then entirely responsible for causing the job to 
>> run somewhere and for doing appropriate input and output staging.
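>>
>> As an illustration only (the callout's real argument convention is not 
>> spelled out here, so the -i/-o flags below are made up), a stub 
>> implementing that interface could look like:
>>
>> #!/bin/sh
>> # Hypothetical staging-script stub: -i <submit-side input path> and
>> # -o <submit-side output path> may each repeat; the remaining
>> # arguments are the commandline to run.
>> inputs=""; outputs=""
>> while getopts "i:o:" opt; do
>>     case $opt in
>>         i) inputs="$inputs $OPTARG" ;;
>>         o) outputs="$outputs $OPTARG" ;;
>>     esac
>> done
>> shift $((OPTIND - 1))
>>
>> # From here on, this script alone is responsible for running "$@"
>> # somewhere, staging $inputs in first and $outputs back afterwards.
>> echo "stage in : $inputs"
>> echo "run      : $*"
>> echo "stage out: $outputs"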
>>
>> I then have two scripts plugged into this shell interface, one for 
>> sagrid/glite and one for condor-with-no-shared-fs.
>>
>> They are similar to each other, differing only in the syntax of the 
>> submission commands and files.
>>
>> These scripts create a single input tarball, create a job submission 
>> file, submit it with the appropriate submit command, hang around 
>> polling for status until the job is finished, and unpack an output 
>> tarball. 
>> Tarballs are used rather than explicitly listing each input and output 
>> file for two reasons: i) if an output file is missing (perhaps due to 
>> application failure) I would like the job submission to still return 
>> what it has (most especially remote log files). As long as a tarball 
>> is made with *something*, this works. ii) condor (and perhaps WMS) 
>> apparently cannot handle directory hierarchies in their 
>> stagein/stageout parameters.
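>>
>> A minimal sketch of the condor variant's control flow (condor_wait is 
>> shown for the polling step, and the file names are placeholders; the 
>> actual script differs in detail):
>>
>> #!/bin/sh
>> set -e
>>
>> # 1. Pack all inputs into a single tarball.
>> tar czf in.tar.gz -T input-list.txt
>>
>> # 2. Submit with a description like the fragment earlier, which stages
>> #    in.tar.gz in and out.tar.gz back out.
>> condor_submit job.sub
>>
>> # 3. Block until the job finishes (condor_wait watches the job log
>> #    named in the submit description).
>> condor_wait job.log
>>
>> # 4. Unpack whatever came back; the remote side always tars up
>> #    *something*, so remote log files survive an application failure.
>> tar xzf out.tar.gz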
>>
>> I have tested this on the SAgrid testing environment (for WMS), where 
>> it works (although quite slowly, as the WMS reports job status changes 
>> quite slowly), and on a condor installation on gwynn.bsd.uchicago.edu 
>> (which has a shared filesystem, so it is not a totally satisfactory 
>> test). I also sent this to Mats to test in his environment (a project 
>> of his was my immediate motivation for the condor side of this).
>>
>> This prototype approach loses a huge chunk of Swift execution-side 
>> functionality, such as replication, clustering, and coasters 
>> (deliberately - I was targeting getting SwiftScript programs running, 
>> rather than 
>> getting a decent integration with the interesting execution stuff we 
>> have made).
>>
>> As such, it is entirely inappropriate for production (or even most 
>> experimental) use.
>>
>> However, it has given me another perspective on submitting jobs to the 
>> above two environments.
>>
>> For condor:
>>
>> The zipped input/output sandbox approach seems to work nicely.
>>
>> Moulding this into something more in tune with what is in Swift now 
>> is, I think, not crazy hard - the input and output staging parts of 
>> execute2 would need to change into something that creates/unpacks a 
>> tarball and appropriately modifies the job description so that when it 
>> is run by the existing execution mechanism, the tarballs get carried 
>> along. (to test if you bothered reading this, if you paste me the 
>> random string H14n$=N:t)Z you get a free beer)
>>
>> As specified above, that approach does not work with clustering or 
>> with coasters, though both could be modified to support it 
>> (for example, clustering could be made to merge all stagein and 
>> stageout listings for jobs; and coasters could be given a different 
>> interface to the existing coaster file transfer mechanism). It might 
>> be that coasters and clusters are not particularly desired in this 
>> environment, though.
>>
>> For glite execution - the big loss here, I think, is coasters, because 
>> it's a very spread-out grid environment. So with this approach, 
>> applications which work well without coasters will probably work well; 
>> but applications which are reliant on coasters for their performance 
>> will work as dismally as when run without coasters in any other grid 
>> environment. I can think of various modifications, similar to those 
>> mentioned in the condor section above, to try to make them work 
>> through this submission system, but it might be that a totally 
>> different approach from my implementation above is warranted for 
>> coaster-based execution on glite, with more explicit specification of 
>> which 
>> sites to run on, rather than allowing the WMS any choice, and only 
>> running on sites which do have a shared filesystem available.
>>
>> I think in the short term, my interest is in getting this stuff more 
>> closely integrated without focusing too much on coasters and clusters.
>>
>> Comments?
>>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel


