[Swift-devel] Re: replication vs site score
Mihael Hategan
hategan at mcs.anl.gov
Thu Apr 9 12:22:49 CDT 2009
On Thu, 2009-04-09 at 10:12 -0700, Ioan Raicu wrote:
>
> Mihael Hategan wrote:
> > What I don't get is (and this is what I understand by "static
> > provisioning") where is the benefit in having a barrier that waits for
> > all requested workers to start, given that some workers will start
> > before others and will invariably have to sit idle until all workers are
> > started.
> >
> >
> No workers sit idle, waiting for other workers to start. The resource
> allocation takes some amount of time to boot up the OS on each node,
> mount GPFS, start Falkon service, start Falkon workers, etc... see
> http://dev.globus.org/wiki/Image:Falkon-BGP-startup-time.jpg. It's true
> that there is some difference between the 1st worker starting, and the
> last worker starting, probably on the order of seconds to maybe minutes
> at the largest scale of 160K processors. If the idle time during
> system startup is a concern, you can start Swift before 100% of the
> system is operational. The system is partitioned in 64-node chunks, so,
> in theory, Swift could start as soon as 64 nodes are online, although
> this could also have its own problems.
This assumes a single site and exact knowledge of how to fit the
workload.
I also assume this works when you have a reservation; otherwise you may
have better chances with smaller chunks.
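
To put a number on the idle-time question, here is a rough sketch (not
taken from Falkon or Swift) of how much worker time is spent waiting if
no work is dispatched until the last worker is up. The function name and
the start times are made up for illustration.

```python
def idle_behind_barrier(start_times):
    """Total worker-seconds spent waiting if no work is dispatched
    until the last worker has started (hypothetical model)."""
    release = max(start_times)  # barrier opens when the slowest worker is up
    return sum(release - t for t in start_times)


# Hypothetical start times, in seconds after the allocation begins.
starts = [100, 105, 110, 130]
print(idle_behind_barrier(starts))
```

In the same model, workers that pick up tasks as soon as they start
accumulate no such wait, provided tasks are already queued.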