[Swift-devel] swift-on-ec2

Ioan Raicu iraicu at cs.uchicago.edu
Tue May 15 16:16:14 CDT 2007


Hi,
See below:

Ben Clifford wrote:
> Ian asked about this elsewhere, but it's perhaps interesting for 
> swift-devel people to look at the questions too.
>
> On Tue, 15 May 2007, Ian Foster wrote:
>
>> Dear All:
>>
>> I asked Kate if she and Tim could look into creating VM images that 
>> would allow us to run Swift applications on Amazon EC2. I think Kate is 
>> meeting with Ioan about this on Thursday (?).
>>
>> One issue that I thought would be good to discuss is what we'd want in 
>> that VM image. Perhaps this is obvious to the rest of you, but it isn't 
>> to me. A few thoughts:
>>
>> * I'm assuming that we want to run "workers" on EC2 nodes, and have the "task
>> dispatch" logic run on some external frontend system outside EC2.
>>
>> * I would think that we want to use Falkon to do the task dispatch. If so, we
>> need a Falkon executor on each VM, configured to check in with the Falkon
>> dispatcher. (Alternatively, we could use, say, SGE: in that case, we would
>> want an SGE agent.)
>>
>> * We need a way of getting data to and from the worker nodes. Do we want to
>> run a file system across the EC2 nodes and the external frontend node? That
>> seems rather inefficient. Other options?
>>
>> * Should we preload the application code on each EC2 node?
>
> Here's a couple of approaches:
>
>  1) swift regards all the EC2 nodes that we are paying for as a single 
>     site.
>
> Something like falkon handles all the task dispatch and worker node 
> management. I don't know what that looks like at the moment in Falkon, but 
> the interface for Swift to send jobs into Falkon sounds pretty 
> straightforward and shouldn't need changing.
>
> All the nodes in a site are required by our site model to have a shared 
> filesystem - we've talked about removing it, but I think that is still the 
> case and if so, isn't going to change soon. timf probably knows more than 
> the people on this list about making shared filesystems.
>   
If we can get the data caching working in Falkon, we might be able to 
run Swift over Falkon without a shared file system.  This is still work 
in progress, but we might be closer to achieving this than not.  BTW, 
the data caching would mean that Swift no longer stages in any data, 
but would essentially stand up a GridFTP server from which Falkon 
workers would fetch the needed data just when they need it.  We are 
still ironing out all this stuff, but it could potentially do away with 
the shared file system assumption.
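
To make the fetch-on-demand idea concrete, here is a minimal sketch of
a worker-side cache. It is an illustration only, not Falkon code: the
WorkerCache class and the fetch callable are hypothetical names, and
fetch stands in for whatever transfer mechanism (e.g. a GridFTP client)
would actually pull the file from the server.

```python
class WorkerCache:
    """Hypothetical per-worker cache: fetch a file only on first use."""

    def __init__(self, fetch):
        # fetch is an assumed callable: file name -> file contents (bytes).
        self.fetch = fetch
        self.store = {}   # file name -> cached contents
        self.misses = 0   # number of remote fetches performed

    def get(self, name):
        if name not in self.store:
            # Cache miss: pull the file from the remote server once.
            self.misses += 1
            self.store[name] = self.fetch(name)
        return self.store[name]
```

A task would call cache.get("input.dat") before running; repeated
tasks on the same worker that need the same file would then reuse the
local copy instead of going back to the server.
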
> In this case, falkon would be doing the site selection.
>
>  2) swift regards each EC2 node as a separate site.
>
> So Swift would be doing site selection between each site (i.e. between 
> each EC2 node), and then submitting to that site.
>
> I don't know if the interface between Swift and eg. Falkon allows swift to 
> tell Falkon which remote node to run on.
>   
No, it does not... but the data caching work has added a data-aware 
scheduler that allows jobs to be run on nodes that have the data and, if 
they don't have the data, allows the respective node to get it.
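
A rough sketch of what such data-aware dispatch could look like, under
my own simplifying assumptions (one input file per job, least-loaded
tie-breaking). The DataAwareScheduler class and its fields are
hypothetical and do not reflect the actual Falkon scheduler.

```python
class DataAwareScheduler:
    """Toy scheduler: prefer a node that already caches the job's input."""

    def __init__(self, nodes):
        # node name -> set of file names cached there (assumed model)
        self.cache = {node: set() for node in nodes}
        # node name -> number of jobs dispatched to it so far
        self.load = {node: 0 for node in nodes}

    def dispatch(self, input_file):
        # Nodes that already hold the input file, if any.
        holders = [n for n in self.cache if input_file in self.cache[n]]
        # Fall back to all nodes when nobody has the data yet.
        candidates = holders or list(self.cache)
        # Pick the least-loaded candidate; break ties by node name.
        node = min(candidates, key=lambda n: (self.load[n], n))
        fetched = input_file not in self.cache[node]
        if fetched:
            # The chosen node gets the data itself and keeps a copy.
            self.cache[node].add(input_file)
        self.load[node] += 1
        return node, fetched
```

Note that always preferring the data holder can overload one node; a
real scheduler would presumably trade off locality against load.
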
> However, Swift would then be able to use something like gridftp to stage 
> to each EC2 node (assuming that EC2 nodes can act as ftp servers - I don't 
> know what their network connectivity is like) - a shared filesystem 
> between all nodes in a site is pretty simple when there is only a single 
> node in the site.
>
>
> Amazon also has a storage cloud, alongside its compute cloud. I know very 
> little about that and have never thought about how it would fit into the 
> above (if at all). Maybe someone else knows more.
>   
I think the idea would be to use the Amazon S3 storage service as a 
common medium from where to get data and where to put it back.

Ioan

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================