[Swift-devel] swift with input and output sandboxes instead of a shared site filesystem
Michael Wilde
wilde at mcs.anl.gov
Wed Apr 22 07:52:23 CDT 2009
One clarification: when I said "pull input files to the worker node FS
rather than push them multihop", the multi-hop was referring to the
current system of source dir to work dir to job dir, not your new system
below.
- mike
On 4/22/09 7:42 AM, Michael Wilde wrote:
> This sounds good, Ben. I only skimmed, need to read, and will comment
> later.
>
> We should compare to the experiments on collective I/O (currently named
> "many-task data management") that Allan, Zhao, Ian, and I are doing.
>
> The approach involves some things that would apply in the two
> environments you mention as well:
>
> - pull input files to the worker node FS rather than push them multihop
> - batch output files up into tarballs and expand them back on their
> target filesystem (on the submit host for now)
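A minimal sketch of those two steps, with `fetch` standing in for whatever direct-transfer tool the site provides (plain `cp` here so the sketch runs anywhere; all paths are illustrative, not taken from the actual experiments):

```shell
#!/bin/sh
# Hypothetical sketch only: "fetch" stands in for a direct pull to the
# worker-node filesystem (no multi-hop through a site work dir).
set -e
fetch() { cp "$1" "$2"; }           # stand-in for the real transfer tool

src=$(mktemp -d); job=$(mktemp -d); target=$(mktemp -d)
echo "input" > "$src/a.in"

# pull the input file straight into the per-job directory
fetch "$src/a.in" "$job/a.in"

# batch the outputs up into one tarball, then expand it back on the
# target filesystem (the submit host, in the experiments above)
echo "output" > "$job/a.out"
tar -czf "$job/outputs.tar.gz" -C "$job" a.out
tar -xzf "$job/outputs.tar.gz" -C "$target"
```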
>
> It also involves some things that apply more to large clusters, but may
> be generalizable to generic grid environments:
>
> - broadcast common files used by many jobs from the submit host to the
> worker nodes
> - use "intermediate" filesystems striped across cluster nodes, rather
> than local filesystems on cluster nodes, where this is more efficient or
> is needed
> - have the worker nodes selectively access files from local,
> intermediate, or global storage, depending on where the submit host
> workflow decided to place them
> - keep a catalog in the cluster of what files are on what local host,
> and a protocol to transfer them from where they were produced to where
> they need to be consumed (this feature is like data diffusion, but needs
> more thought and experience to determine how useful it is; not many
> workflows need it).
>
> So far this is being discussed among me, Ioan, and Ian, but we'd
> like to get you and Mihael and anyone else interested to join in; Kamil
> Iskra and Justin Wozniak from the MCS ZeptoOS and Radix groups are
> involved as well.
>
> We should use this thread to discuss your IO strategy below first,
> before we involve the MTDM experiments, but one common thread seems to
> be mastering the changes in Swift data management that allow us to
> explore these new data management modes.
>
> If you recall, we've had discussions in the past on having something
> like "pluggable data management strategies" that allowed a given script
> to be executed in different environments with different strategies -
> either globally set or set by site.
>
> I'm offline a lot until Monday with a proposal deadline, and hope to
> comment and rejoin the discussion by then or shortly after.
>
> - Mike
>
>
> On 4/22/09 7:12 AM, Ben Clifford wrote:
>> I implemented an extremely poor quality prototype to try to get my
>> head around some of the execution semantics when running through:
>>
>> i) the gLite workload management system (hereafter, WMS) as used in
>> the South African National Grid (my short-term interest) and in
>> EGEE (my longer term interest)
>>
>> ii) condor used as an LRM to manage nodes which do not have a shared
>> filesystem, without any "grid stuff" involved
>>
>> In the case of the WMS, it is a goal to have the WMS perform site
>> selection, rather than submitting clients (such as Swift). I don't
>> particularly agree with this, but there it is.
>>
>> In the case of condor-with-no-shared-fs, one of the basic requirements
>> of a Swift site is violated - that of an easily accessible shared file
>> system.
>>
>> Both the WMS and condor provide an alternative to Swift's file
>> management model; and both of their approaches look similar.
>>
>> In a job submission, one specifies the files to be staged into an
>> arbitrary working directory before execution, and the files to be
>> staged out after execution.
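For the condor case, that stage-in/stage-out declaration looks roughly like the submit file below. These are the standard Condor file-transfer directives, but the file names are my assumptions, not the prototype's actual ones:

```
# Hypothetical submit file: transfer_input_files is staged into the job's
# scratch directory before execution, transfer_output_files back afterwards.
universe                = vanilla
executable              = run.sh
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = inputs.tar.gz
transfer_output_files   = outputs.tar.gz
log                     = job.log
queue
```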
>>
>> My prototype is intended to get practical experience interfacing Swift
>> to a job submission with those semantics.
>>
>> What I have done in my implementation is rip out almost the entirety
>> of the execute2/site file cache/wrapper.sh layers, and replace it with
>> a callout to a user-specified shell script. The shell script is passed
>> the submit side paths of input files and of output files, and the
>> commandline.
>>
>> The shell script is then entirely responsible for causing the job to
>> run somewhere and for doing appropriate input and output staging.
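As an illustration of that callout contract, here is a sketch in shell; the argument layout (input paths, then `-o` and output paths, then `--` and the command line) is my assumption, not the prototype's actual convention:

```shell
#!/bin/sh
# Hypothetical sketch of the user-script interface: Swift would pass the
# submit-side input paths, output paths, and the command line; the script
# is then entirely responsible for staging and running the job.
stage_and_run() {
    inputs=""; outputs=""
    while [ "$1" != "-o" ]; do inputs="$inputs $1"; shift; done; shift
    while [ "$1" != "--" ]; do outputs="$outputs $1"; shift; done; shift
    echo "stage in :$inputs"
    echo "stage out:$outputs"
    echo "run      : $*"
}

stage_and_run a.in b.in -o a.out -- /usr/bin/myapp -v
```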
>>
>> Behind this shell interface, I then have two scripts, one for
>> sagrid/glite and one for condor-with-no-shared-fs.
>>
>> They are similar to each other, differing only in the syntax of the
>> submission commands and files.
>>
>> These scripts create a single input tarball, create a job submission
>> file, submit it with the appropriate submit command, hang around
>> polling for status until the job is finished, and unpack an output
>> tarball.
>> Tarballs are used rather than explicitly listing each input and output
>> file for two reasons: i) if an output file is missing (perhaps due to
>> application failure) I would like the job submission to still return
>> what it has (most especially remote log files). As long as a tarball
>> is made with *something*, this works. ii) condor (and perhaps WMS)
>> apparently cannot handle directory hierarchies in their
>> stagein/stageout parameters.
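The flow of those scripts, sketched in shell. The condor_submit/condor_wait steps are shown as comments and the remote job is simulated inline so the tarball handling runs anywhere; all names are illustrative, not the prototype's:

```shell
#!/bin/sh
# Hypothetical outline of the wrapper scripts described above.
set -e
workdir=$(mktemp -d); cd "$workdir"

# 1. bundle all submit-side input files into a single tarball
echo "input data" > in.txt
tar -czf inputs.tar.gz in.txt

# 2. write the submit file and submit it:
#       condor_submit job.submit     (glite-wms-job-submit for the WMS)
# 3. hang around polling for status until the job finishes:
#       condor_wait job.log          (glite-wms-job-status in a loop)

# -- simulate the remote side: unpack inputs, run, repack whatever exists --
mkdir job && tar -xzf inputs.tar.gz -C job
tr 'a-z' 'A-Z' < job/in.txt > job/out.txt
tar -czf outputs.tar.gz -C job out.txt

# 4. unpack the output tarball on the submit side; even a partial tarball
#    returns *something*, most importantly the remote log files
tar -xzf outputs.tar.gz
```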
>>
>> I have tested on the SAgrid testing environment (for WMS) and this
>> works (although quite slowly, as the WMS reports job status changes
>> quite slowly); and on a condor installation on gwynn.bsd.uchicago.edu
>> (this has a shared filesystem, so is not a totally satisfactory test).
>> I also sent this to Mats to test in his environment (as a project he
>> has was my immediate motivation for the condor side of this).
>>
>> This prototype approach loses a huge chunk of Swift execution-side
>> functionality such as replication, clustering, coasters (deliberately
>> - I was targeting getting SwiftScript programs running, rather than
>> getting a decent integration with the interesting execution stuff we
>> have made).
>>
>> As such, it is entirely inappropriate for production (or even most
>> experimental) use.
>>
>> However, it has given me another perspective on submitting jobs to the
>> above two environments.
>>
>> For condor:
>>
>> The zipped input/output sandbox approach seems to work nicely.
>>
>> To mould this into something more in tune with what is in Swift now
>> is, I think, not crazy hard - the input and output staging parts of
>> execute2 would need to change into something that creates/unpacks a
>> tarball and appropriately modifies the job description so that when it
>> is run by the existing execution mechanism, the tarballs get carried
>> along. (to test if you bothered reading this, if you paste me the
>> random string H14n$=N:t)Z you get a free beer)
>>
>> As specified above, that approach does not work with clustering or
>> with coasters, though both could be modified so as to support such
>> (for example, clustering could be made to merge all stagein and
>> stageout listings for jobs; and coasters could be given a different
>> interface to the existing coaster file transfer mechanism). It might
>> be that coasters and clusters are not particularly desired in this
>> environment, though.
>>
>> For glite execution - the big loss here I think is coasters, because
>> it's a very spread-out grid environment. So with this approach,
>> applications which work well without coasters will probably work well;
>> but applications which are reliant on coasters for their performance
>> will work as dismally as when run without coasters in any other grid
>> environment. I can think of various modifications, similar to those
>> mentioned in the condor section above, to try to make them work
>> through this submission system, but it might be that a totally
>> different approach to my above implementation is warranted for coaster
>> based execution on glite, with more explicit specification of which
>> sites to run on, rather than allowing the WMS any choice, and only
>> running on sites which do have a shared filesystem available.
>>
>> I think in the short term, my interest is in getting this stuff more
>> closely integrated without focusing too much on coasters and clusters.
>>
>> Comments?
>>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel