[Swift-devel] coaster status summary

Fri Apr 4 07:12:47 CDT 2008

On Fri, 2008-04-04 at 06:59 -0500, Michael Wilde wrote:
> Mihael, this is great progress - very exciting.
> Some questions (don't need answers right away):
> 
> How would the end user use it? Manually start a service?
> Is the service a separate process, or in the swift jvm?

I thought the lines below answered some of these.

A user would specify the coaster provider in sites.xml. The provider
will then automatically deploy a service on the target machine without
the user having to do so. Given that the service is on a different
machine than the client, they can't be in the same JVM.

> How are the number of workers set or adjusted?

Currently, workers are requested as needed, up to a maximum. This is
preliminary, hence the "Better allocation strategy for workers" item.
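To illustrate "requested as needed, up to a maximum", here is a minimal Python sketch. It assumes the service tracks queued jobs plus idle, pending (requested but not yet started) and busy workers. `PoolState` and `workers_to_request` are hypothetical names for illustration, not part of the coaster code.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    """Hypothetical snapshot of one service's view of its cluster."""
    queued_jobs: int      # jobs waiting for a free worker
    idle_workers: int     # workers running but without a job
    pending_workers: int  # workers requested from the scheduler, not yet started
    busy_workers: int     # workers currently running a job


def workers_to_request(state: PoolState, max_workers: int) -> int:
    """Request as many new workers as the queue needs, up to max_workers total."""
    # Idle and pending workers will absorb queued jobs, so they reduce the need.
    needed = state.queued_jobs - state.idle_workers - state.pending_workers
    # Never let existing plus pending workers exceed the configured maximum.
    total = state.idle_workers + state.pending_workers + state.busy_workers
    headroom = max_workers - total
    return max(0, min(needed, headroom))


# 25 queued jobs, 4 busy workers, cap of 10: only 6 more can be requested.
print(workers_to_request(PoolState(25, 0, 0, 4), max_workers=10))
```

This greedy version only looks at counts; a smarter strategy might also weigh batch-queue delays and job durations.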

> Does a service manage workers on one cluster or many?

One service per cluster.

> At 180 jobs/sec with 10 workers, what were the CPU loads on swift, 
> worker and service?

I vaguely recall them being below 50%, for a reason I don't
understand.

> 
> Do you want to try this on the workflows we're running on Falkon on the 
> BGP and SiCortex?

Let me repeat "prototype" and "more testing". In no way do I want to do
preliminary testing with an application that is shaky on an architecture
that is also shaky.

Mihael

> 
> I'm eager to try it when you feel it's ready for others to test.
> 
> Nice work!
> 
> - Mike
> 
> 
> 
> On 4/4/08 4:39 AM, Mihael Hategan wrote:
> > I've been asked for a summary of the status of the coaster prototype, so
> > here it is:
> > - It's a prototype, so bugs are plentiful
> > - It's self deployed (you don't need to start a service on the target
> > cluster)
> > - You can also use it while starting a service on the target cluster
> > - There is a worker written in Perl
> > - It uses encryption between client and coaster service
> > - It uses UDP between the service and the workers (this may prove to be
> > a better or worse choice than TCP)
> > - A preliminary test done locally shows an amortized throughput of
> > around 180 jobs/s (/bin/date). This was done with encryption and with 10
> > workers. Pretty picture attached (total time vs. # of jobs)
> > 
> > To do:
> > - The scheduling algorithm in the service needs a bit more work
> > - When worker messages are lost, some jobs may get lost (i.e., more
> > fault tolerance is needed)
> > - Start testing it on actual clusters
> > - Do some memory consumption benchmarks
> > - Better allocation strategy for workers
> > 
> > Mihael
> > 
> > 
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
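The status summary notes that the service talks to its workers over UDP and that lost worker messages can lose jobs. One common remedy is to tag each message with an id, keep it until the worker acknowledges it, and resend it after a timeout. The Python sketch below illustrates that idea only; it is not how the coaster prototype works, and `ReliableSender` with its `send`, `ack` and `poll` methods are hypothetical names. The clock is injectable so the timing logic can be exercised without sleeping.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Outstanding:
    payload: bytes
    sent_at: float
    attempts: int


class ReliableSender:
    """Hypothetical ack/retransmit layer over an unreliable datagram send."""

    def __init__(self, send: Callable[[int, bytes], None], timeout: float = 1.0,
                 max_attempts: int = 5,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send              # e.g. wraps a UDP socket's sendto()
        self._timeout = timeout        # seconds to wait for an ack
        self._max_attempts = max_attempts
        self._clock = clock
        self._next_id = 0
        self._outstanding: Dict[int, Outstanding] = {}

    def send(self, payload: bytes) -> int:
        """Send a message and remember it until it is acknowledged."""
        msg_id = self._next_id
        self._next_id += 1
        self._outstanding[msg_id] = Outstanding(payload, self._clock(), 1)
        self._send(msg_id, payload)
        return msg_id

    def ack(self, msg_id: int) -> None:
        # Duplicate or late acks are harmless: pop with a default.
        self._outstanding.pop(msg_id, None)

    def poll(self) -> List[int]:
        """Resend timed-out messages; return ids that exhausted their retries."""
        now = self._clock()
        failed = []
        for msg_id, out in list(self._outstanding.items()):
            if now - out.sent_at < self._timeout:
                continue
            if out.attempts >= self._max_attempts:
                del self._outstanding[msg_id]
                failed.append(msg_id)
            else:
                out.attempts += 1
                out.sent_at = now
                self._send(msg_id, out.payload)
        return failed
```

Ids returned by `poll()` could be requeued to another worker instead of being silently dropped. Because a resend may arrive after a delayed original, workers would also need to ignore duplicate message ids.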