[Swift-devel] Re: replication vs site score

Mihael Hategan hategan at mcs.anl.gov
Thu Apr 9 12:22:49 CDT 2009


On Thu, 2009-04-09 at 10:12 -0700, Ioan Raicu wrote:
> 
> Mihael Hategan wrote:
> > What I don't get (and this is what I understand by "static
> > provisioning") is where the benefit lies in having a barrier that waits for
> > all requested workers to start, given that some workers will start
> > before others and will invariably have to sit idle until all workers are
> > started.
> >
> >   
> No workers sit idle, waiting for other workers to start. The resource 
> allocation takes some amount of time to boot up the OS on each node, 
> mount GPFS, start the Falkon service, start Falkon workers, etc.; see
> http://dev.globus.org/wiki/Image:Falkon-BGP-startup-time.jpg. It's true
> that there is some difference between the first worker starting and the
> last worker starting, probably on the order of seconds, or perhaps minutes
> at the largest scale of 160K processors. If the idle time during system
> startup is a concern, you can start Swift before 100% of the system is
> operational. The system is partitioned into 64-node chunks, so, in
> theory, Swift could start as soon as 64 nodes are online. However, this
> approach could also have problems of its own.

This assumes a single site and exact knowledge of how to fit the
workload.

I also assume this works when you have a reservation, otherwise you may
have better chances with smaller chunks.
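The startup trade-off discussed above can be made concrete with a rough model. The sketch below assumes that every node in a chunk becomes usable at the same "ready time" and ignores per-worker skew within a chunk. It compares a full barrier, which waits for every chunk before dispatching, with starting as soon as the first chunk is online. The function names and the input numbers are hypothetical and illustrative only. They are not part of Swift or Falkon and are not measurements.

```python
from typing import Sequence


def barrier_idle_node_seconds(chunk_ready_times: Sequence[float],
                              nodes_per_chunk: int = 64) -> float:
    """Node-seconds spent idle if no work is dispatched until the last chunk is up.

    Hypothetical model: each chunk that comes up early waits
    (last_ready - ready) seconds, and all of its nodes wait together.
    The default of 64 nodes per chunk mirrors the partition size mentioned above.
    """
    last_ready = max(chunk_ready_times)
    return sum(nodes_per_chunk * (last_ready - t) for t in chunk_ready_times)


def first_dispatch_time(chunk_ready_times: Sequence[float], wait_for_all: bool) -> float:
    """Earliest time the first task can be sent out.

    wait_for_all=True models the full barrier. False models starting as soon
    as the first chunk is online, which only avoids idle time if the workload
    is large enough to fill each chunk as it arrives.
    """
    return max(chunk_ready_times) if wait_for_all else min(chunk_ready_times)


if __name__ == "__main__":
    # Illustrative ready times in seconds (assumed, not measured).
    ready = [0.0, 10.0, 30.0]
    print(barrier_idle_node_seconds(ready))                # 3200.0
    print(first_dispatch_time(ready, wait_for_all=True))   # 30.0
    print(first_dispatch_time(ready, wait_for_all=False))  # 0.0
```

Under the barrier, the wasted node-seconds grow with both the startup spread and the chunk size. The incremental alternative removes that cost only if the workload can spread across a worker pool that grows over time.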
