[Swift-devel] Re: replication vs site score
Ioan Raicu
iraicu at cs.uchicago.edu
Thu Apr 9 12:12:00 CDT 2009
Mihael Hategan wrote:
> What I don't get is (and this is what I understand by "static
> provisioning") where is the benefit in having a barrier that waits for
> all requested workers to start, given that some workers will start
> before others and will invariably have to sit idle until all workers are
> started.
>
>
No workers sit idle waiting for other workers to start. The resource
allocation takes some time to boot the OS on each node, mount GPFS,
start the Falkon service, start the Falkon workers, etc.; see
http://dev.globus.org/wiki/Image:Falkon-BGP-startup-time.jpg. It's true
that there is some gap between the first worker starting and the last
worker starting, probably on the order of seconds to maybe minutes at
the largest scale of 160K processors. If the idle time while the system
starts up is a concern, you can start Swift before 100% of the system is
operational. The system is partitioned into 64-node chunks, so, in
theory, Swift could start as soon as 64 nodes are online, although this
could have its own problems.
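To put a rough number on that idle time, here is a minimal sketch, not a measurement. The partition start offsets are made up for illustration, and the figure of 4 cores per node is an assumption (it matches a Blue Gene/P node); only the 64-node partition size comes from the text above.

```python
NODES_PER_PARTITION = 64  # from the message: 64-node chunks
CORES_PER_NODE = 4        # assumption: Blue Gene/P node with 4 cores


def idle_core_seconds(start_offsets, cores_per_partition):
    """Core-seconds spent idle if work begins only once the last partition is up.

    start_offsets: seconds at which each partition became ready
    (hypothetical values supplied by the caller).
    """
    last = max(start_offsets)
    return sum((last - t) * cores_per_partition for t in start_offsets)


if __name__ == "__main__":
    cores = NODES_PER_PARTITION * CORES_PER_NODE
    # Hypothetical: three partitions ready at 0 s, 10 s, and 30 s.
    print(idle_core_seconds([0, 10, 30], cores))
```

Whether that cost matters depends on how it compares with the total core-seconds of the run itself.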
It's not clear to me how dynamic the sites.xml file is. The locations of
the Falkon services are placed in the sites.xml file. Let's take the
example of a 4096-processor run, which would have 16 Falkon services
when it's 100% allocated. That means 16 entries in sites.xml. If we
wait for all 16 entries, we might waste a few idle cycles. If we start
when the first entry is in sites.xml (the first 64 nodes), and
sites.xml is later updated with the remaining 15 entries, will Swift
re-read sites.xml and figure out that there are additional sites to
consider? How often does Swift re-read sites.xml? If it does not
re-read it, then in the current setup we cannot do this, and we have to
wait for all resources to be 100% allocated before we start.
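For illustration only, the kind of re-read being asked about could look like the sketch below. It says nothing about what Swift actually does. The class name SitesWatcher is hypothetical, and the assumption that each site appears as a <pool handle="..."> element should be checked against the real sites.xml.

```python
import os
import xml.etree.ElementTree as ET


def site_handles(path):
    """Return the set of `handle` attributes on <pool> elements (assumed layout)."""
    root = ET.parse(path).getroot()
    handles = set()
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if tag == "pool" and "handle" in elem.attrib:
            handles.add(elem.attrib["handle"])
    return handles


class SitesWatcher:
    """Hypothetical helper: reports sites added to the file since the last poll."""

    def __init__(self, path):
        self.path = path
        self.known = set()
        self.signature = None

    def poll(self):
        st = os.stat(self.path)
        signature = (st.st_mtime_ns, st.st_size)
        if signature == self.signature:
            return set()  # file unchanged since the last successful read
        try:
            current = site_handles(self.path)
        except ET.ParseError:
            return set()  # file may be half-written; try again on the next poll
        self.signature = signature
        added = current - self.known
        self.known |= current
        return added
```

Calling poll() on a timer would return the first entry once it appears and the later entries as they are appended, so work could begin on the first 64 nodes without waiting for the full allocation.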
Ioan
--
===================================================
Ioan Raicu, Ph.D.
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================