[Swift-devel] swift-on-ec2
Tim Freeman
tfreeman at mcs.anl.gov
Thu May 17 11:24:49 CDT 2007
On Thu, 17 May 2007 11:10:16 -0500
Ioan Raicu <iraicu at cs.uchicago.edu> wrote:
>
>
> Kate Keahey wrote:
> >
> >
> > Ian Foster wrote:
> >> Kate:
> >>
> >> I want to emphasize that I was *not* dismissing the issues below as
> >> distractions.
> >>
> >> What I meant was: given that you are working on developing a "virtual
> >> cluster", which I am pretty sure will be able to execute Swift apps,
> >> let's focus on getting that done, rather than worrying about "special
> >> casing" it for Falkon, adding dynamic node acquisition, or the other
> >> things that people started discussing as potential extensions.
> >
> > We only now really began to discuss how to use VMs with Swift/Falkon
> > -- the original set of issues you posted was just what was needed, it
> > clearly inspired a very good discussion, and made me realize that I
> > should have been talking to a wider set of people about this. Please,
> > don't go back on us now... It also looks to me like there may be
> > solutions that both make more sense from the perspective of the
> > architecture and will be easier to implement with the current
> > state of virtualization tools. For example, if we can set up Falkon to
> > provision single nodes operating in pull mode (pulling work from a
> > "master"), various contextualization issues will become much easier.
> >
> >>
> >> I understand from our IM conversation today that the "virtual
> >> cluster" is ready for us in a "static environment" such as some
> >> machines in our lab. In a "dynamic environment" such as EC2, it is
> >> not quite ready for use yet. Thus, you won't be able to get Swift
> >> running on EC2 tomorrow.
> >
> > This is not quite accurate; static refers to statically assigned IPs
> > -- we have control over our IPs and can assign them to the cluster
> > nodes in the same way each time we deploy it. Amazon will choose new
> > IPs for the nodes each time the cluster is deployed, so each time the
> > cluster's configuration will have to be adjusted to reflect the
> > different IP assignments (but if we were to change the IPs
> > on the cluster nodes in a local environment we would be just as dynamic).
> >
> > But if you deploy just one node (e.g., a node operating in the pull
> > mode as in the example above) the need for this configuration
> > adjustment may go away (depending on what the node does) so everything
> > may become much simpler.
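Here is a minimal Python sketch of the pull-mode arrangement. It assumes an in-process queue stands in for the "master". `Master`, `get_task`, `report`, and `run_worker` are hypothetical names, not Falkon or workspace APIs. The worker starts every exchange, so the master never needs to know the worker's address, and changing IPs stop mattering in this mode.

```python
import queue
from typing import Callable, List, Optional


class Master:
    """Hypothetical work master: holds tasks and hands them out on request."""

    def __init__(self, tasks: List[str]) -> None:
        self._tasks: "queue.Queue[str]" = queue.Queue()
        for t in tasks:
            self._tasks.put(t)
        self.results: List[str] = []

    def get_task(self) -> Optional[str]:
        # The worker initiates every exchange, so the master never needs
        # to know (or be able to reach) the worker's address.
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None

    def report(self, result: str) -> None:
        self.results.append(result)


def run_worker(master: Master, execute: Callable[[str], str]) -> int:
    """Pull tasks until the master has none left; return how many ran."""
    done = 0
    while True:
        task = master.get_task()
        if task is None:
            return done
        master.report(execute(task))
        done += 1


if __name__ == "__main__":
    m = Master(["job-1", "job-2", "job-3"])
    n = run_worker(m, lambda t: t.upper())
    print(n, m.results)  # 3 ['JOB-1', 'JOB-2', 'JOB-3']
```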
> Currently, a Falkon executor (the worker code), upon bootstrapping, makes
> 1 WS call to the Falkon dispatcher (running in a GT4 container) to
> register its name and the port on which its notification engine is
> listening. Once this is done, the executor goes into a listen mode
> for notifications and only acts (sends WS calls out) upon the reception
> of notifications. So, the VMs that run the Falkon executors can get
> DHCP addresses, and the registration message will include all the
> necessary information about where the Falkon dispatcher needs to contact
> the respective Falkon executor.
On EC2 the VM has a private address with a corresponding public one that it
can discover (through very EC2-specific mechanisms). We've been working on
abstractions and software for doing this in a non-ad-hoc way. I'll let Kate
expound at your meeting.
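The registration step described above can be sketched in Python. The sketch assumes the executor already knows its private address and, on EC2, its discovered public one. `Registration` and `build_registration` are hypothetical names. The real Falkon registration is a WS call whose message format is not reproduced here. The sketch illustrates one point: the advertised host must be an address the dispatcher can actually reach.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Registration:
    """Hypothetical payload of the executor's single registration call."""
    name: str
    host: str  # address the dispatcher should use for notifications
    port: int  # port the executor's notification listener is bound to


def build_registration(name: str, port: int, private_ip: str,
                       public_ip: Optional[str] = None) -> Registration:
    # Behind address translation (as on EC2) the private address is not
    # reachable by an outside dispatcher, so advertise the public address
    # whenever one is known; otherwise fall back to the local address.
    host = public_ip if public_ip else private_ip
    return Registration(name=name, host=host, port=port)


if __name__ == "__main__":
    reg = build_registration("executor-7", 50001, "10.0.0.5",
                             public_ip="203.0.113.10")
    print(reg.host)  # 203.0.113.10
```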
> Now, the one configuration parameter
> that we must have is the location of the Falkon dispatcher. If we have
> it running in a static location (a well known machine and port), then
> this can be hard coded into the bootstrapping scripts, and there is no
> configuration needed! If the dispatcher does not have a static resource
> to run on (i.e., it runs in another VM), then this information needs to
> be passed to the executor bootstrapping scripts.
Through those EC2-specific mechanisms you can push per-VM-instance data at
deployment time, and the VM instance can be coded to discover this bit of
information just like its public IP.
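Here is a Python sketch of that per-instance discovery. It assumes the EC2 instance metadata service at 169.254.169.254 accepts the older token-less requests. Instances configured to require IMDSv2 need a session token first. It also assumes the deployer passed the dispatcher location as user data in a hypothetical "host:port" form. `DEFAULT_DISPATCHER`, `parse_dispatcher`, and `discover` are illustrative names. The fallback stands in for the hard-coded, well-known dispatcher location.

```python
import urllib.request
from typing import Callable, Optional, Tuple

METADATA = "http://169.254.169.254/latest"
# Hypothetical well-known dispatcher location, used when nothing is pushed.
DEFAULT_DISPATCHER = ("dispatcher.example.org", 50000)


def fetch(url: str, timeout: float = 2.0) -> Optional[str]:
    """GET a metadata URL; return None if it is unavailable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return None


def parse_dispatcher(user_data: Optional[str]) -> Tuple[str, int]:
    """Parse user data in the assumed 'host:port' form, else fall back."""
    if not user_data:
        return DEFAULT_DISPATCHER
    host, sep, port = user_data.rpartition(":")
    if not sep or not host or not port.isdigit():
        return DEFAULT_DISPATCHER
    return host, int(port)


def discover(get: Callable[[str], Optional[str]] = fetch
             ) -> Tuple[Optional[str], Tuple[str, int]]:
    """Return (public_ip, (dispatcher_host, dispatcher_port))."""
    public_ip = get(f"{METADATA}/meta-data/public-ipv4")
    dispatcher = parse_dispatcher(get(f"{METADATA}/user-data"))
    return public_ip, dispatcher
```

Passing the getter in as a parameter keeps the logic testable off EC2, because a dict's `get` method can stand in for the metadata service. The discovered public IP is what an executor would advertise when it registers.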
Tying VMs + grid computing to EC2-specific mechanisms is totally the wrong way
to go, but it may be necessary to special-case it in the VM's boot +
contextualization process, since we (the grid computing people) don't control
the middleware there.
Tim