[Swift-devel] swift-on-ec2

Thu May 17 11:24:49 CDT 2007

On Thu, 17 May 2007 11:10:16 -0500
Ioan Raicu <iraicu at cs.uchicago.edu> wrote:

> 
> 
> Kate Keahey wrote:
> >
> >
> > Ian Foster wrote:
> >> Kate:
> >>
> >> I want to emphasize that I was *not* dismissing the issues below as 
> >> distractions.
> >>
> >> What I meant was: given that you are working on developing a "virtual 
> >> cluster", which I am pretty sure will be able to execute Swift apps, 
> >> let's focus on getting that done, rather than worrying about "special 
> >> casing" it for Falkon, adding dynamic node acquisition, or the other 
> >> things that people started discussing as potential extensions.
> >
> > We only now really began to discuss how to use VMs with Swift/Falkon 
> > -- the original set of issues you posted was just what was needed, it 
> > clearly inspired a very good discussion, and made me realize that I 
> > should have been talking to a wider set of people about this. Please, 
> > don't go back on us now... It also looks to me like there may be 
> > solutions that both make more sense from the perspective of the 
> > architecture and are easier to implement with the current 
> > state of virtualization tools. For example, if we can set up Falkon to 
> > provision single nodes operating in pull mode (pulling work from a 
> > "master"), various contextualization issues will become much easier.
> >
> >>
> >> I understand from our IM conversation today that the "virtual 
> >> cluster" is ready for use in a "static environment" such as some 
> >> machines in our lab. In a "dynamic environment" such as EC2, it is 
> >> not quite ready for use yet. Thus, you won't be able to get Swift 
> >> running on EC2 tomorrow.
> >
> > This is not quite accurate; static refers to statically assigned IPs 
> > -- we have control over our IPs and can assign them to the cluster 
> > nodes in the same way each time we deploy it. Amazon will choose new 
> > IPs for the nodes each time the cluster is deployed, so each time the 
> > configuration of the cluster will have to be adjusted to reflect 
> > different IP assignment to the nodes (but if we were to change the IPs 
> > on the cluster nodes in a local environment we would be just as dynamic).
> >
> > But if you deploy just one node (e.g., a node operating in the pull 
> > mode as in the example above) the need for this configuration 
> > adjustment may go away (depending on what the node does) so everything 
> > may become much simpler.
> Currently, a Falkon executor (the worker code), upon bootstrapping, makes 
> one WS call to the Falkon dispatcher (running in a GT4 container) to 
> register its name and the port on which its notification engine is 
> listening.  Once this is done, the executor goes into a listen mode 
> for notifications, and only acts (sends WS calls out) upon reception 
> of a notification.  So, the VMs that run the Falkon executors can get 
> DHCP addresses, and the registration message will include all the 
> information the Falkon dispatcher needs to contact the respective 
> Falkon executor.

On EC2 the VM has a private address with a corresponding public one that it
can discover (through very EC2-specific mechanisms).  We've been working on
abstractions and software for doing this in a non ad-hoc way.  I'll let Kate
expound at your meeting. 
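
To make the addressing point concrete, here is a minimal sketch of how an
executor might pick the address it advertises when it registers. It is not
Falkon code: the Registration shape, build_registration, and the
discover_public callback are all hypothetical names, and the callback stands in
for whatever site-specific lookup (EC2's or otherwise) returns the public
address.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Registration:
    """Hypothetical shape of what an executor tells the dispatcher."""
    name: str
    host: str  # address the dispatcher should use to reach the executor
    port: int  # port the notification engine listens on


def build_registration(
    name: str,
    port: int,
    local_address: str,
    discover_public: Optional[Callable[[], Optional[str]]] = None,
) -> Registration:
    """Advertise the public address when one can be discovered.

    discover_public is an assumed hook: it returns the externally
    reachable address, or None when the site has no such notion.
    """
    host = local_address
    if discover_public is not None:
        public = discover_public()
        if public:
            host = public
    return Registration(name=name, host=host, port=port)
```

On a lab cluster the callback would simply be omitted and the DHCP-assigned
address is advertised; on EC2 the callback would wrap the EC2-specific lookup.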

> Now, the one configuration parameter 
> that we must have is the location of the Falkon dispatcher.  If we have 
> it running in a static location (a well known machine and port), then 
> this can be hard coded into the bootstrapping scripts, and there is no 
> configuration needed!  If the dispatcher does not have a static resource 
> to run on (i.e. it runs in another VM), then this information needs to 
> be passed to the executor bootstrapping scripts.

Through those EC2-specific mechanisms you can push data to each VM instance at
deployment, and the VM instance can be coded to discover this bit of
information just like its public IP.
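
As a rough sketch of that bootstrap step (assumptions throughout, not the
actual Falkon scripts): the executor looks for a dispatcher location in
whatever per-instance data was pushed at deployment and otherwise falls back to
a hard-coded, well-known location. The names resolve_dispatcher,
parse_dispatcher, and fetch_instance_data are hypothetical, as is the
"host:port" format; on EC2 the fetcher would wrap the EC2-specific call.

```python
from typing import Callable, Optional, Tuple


def parse_dispatcher(text: str) -> Tuple[str, int]:
    """Parse an assumed 'host:port' string into (host, port)."""
    host, sep, port = text.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {text!r}")
    return host, int(port)


def resolve_dispatcher(
    static_default: Optional[str],
    fetch_instance_data: Optional[Callable[[], Optional[str]]] = None,
) -> Tuple[str, int]:
    """Find the dispatcher location for an executor at boot.

    Data pushed to the instance at deployment wins; the hard-coded
    default covers the case of a dispatcher on a well-known machine.
    """
    if fetch_instance_data is not None:
        data = fetch_instance_data()
        if data:
            return parse_dispatcher(data)
    if static_default:
        return parse_dispatcher(static_default)
    raise RuntimeError("no dispatcher location available")
```

Keeping the fetcher as a plug-in function confines the EC2-specific part to one
place in the boot process.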

Tying VMs + grid computing to EC2-specific mechanisms is totally the wrong way
to go, but it may be necessary to special-case EC2 in the VM's boot +
contextualization process, since we (the grid computing people) don't control
the middleware there. 

Tim