[Swift-devel] Re: replication vs site score

Ioan Raicu iraicu at cs.uchicago.edu
Thu Apr 9 12:12:00 CDT 2009



Mihael Hategan wrote:
> What I don't get is (and this is what I understand by "static
> provisioning") where is the benefit in having a barrier that waits for
> all requested workers to start, given that some workers will start
> before others and will invariably have to sit idle until all workers are
> started.
>
>   
No workers sit idle waiting for other workers to start. The resource 
allocation takes some time to boot the OS on each node, 
mount GPFS, start the Falkon service, start the Falkon workers, etc.; see 
http://dev.globus.org/wiki/Image:Falkon-BGP-startup-time.jpg. It's true 
that there is some gap between the first worker starting and the 
last worker starting, probably on the order of seconds to maybe minutes 
at the largest scale of 160K processors. If the idle time while the 
system starts up is a concern, you can start Swift before 100% of the 
system is operational. The system is partitioned into 64-node chunks, so, 
in theory, Swift could start as soon as 64 nodes are online, although 
this could have its own problems.

It's not clear to me how dynamic the sites.xml file is. The location of 
the Falkon services is placed in the sites.xml file. Let's take the 
example of a 4096-processor run, which would have 16 Falkon services when 
it's 100% allocated. That means 16 entries in sites.xml. If we wait 
for all 16 entries, we might waste a few idle cycles. If we start when 
the first entry is in sites.xml (the first 64 nodes), and sites.xml 
is later updated with the remaining 15 entries, will Swift 
re-read sites.xml and figure out that there are additional sites to 
consider? How often does Swift re-read sites.xml? If it does not 
re-read it, then in the current setup we cannot do this, and have to wait 
for all resources to be 100% allocated before we start.
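
To make the question concrete, here is a rough Python sketch of the 
arithmetic and of what "noticing new entries" would involve. It is not 
how Swift behaves today (that is exactly what I am asking). The 4 cores 
per node is my assumption, chosen because it matches 4096 processors / 
16 services / 64 nodes; the function names and the simplified <pool 
handle="..."> layout are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

NODES_PER_CHUNK = 64   # from the text: one Falkon service per 64-node chunk
CORES_PER_NODE = 4     # assumption: consistent with 4096 processors -> 16 services


def expected_services(processors):
    """Number of Falkon services (sites.xml entries) once fully allocated."""
    per_chunk = NODES_PER_CHUNK * CORES_PER_NODE
    return -(-processors // per_chunk)  # ceiling division


def site_handles(sites_xml_text):
    """Collect the handle of every <pool> element, ignoring any XML namespace."""
    root = ET.fromstring(sites_xml_text)
    return {
        el.get("handle")
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "pool"
    }


def new_sites(old_text, new_text):
    """Handles that appear in the updated file but not in the earlier one."""
    return sorted(site_handles(new_text) - site_handles(old_text))


# Hypothetical, simplified file contents: one entry at start, more later.
early = '<config><pool handle="falkon-0"/></config>'
later = ('<config><pool handle="falkon-0"/>'
         '<pool handle="falkon-1"/><pool handle="falkon-2"/></config>')

print(expected_services(4096))
print(new_sites(early, later))
```

A scheduler that re-read the file periodically could feed the result of 
new_sites() into its site list; whether Swift has a hook for that is the 
open question.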

Ioan

-- 
===================================================
Ioan Raicu, Ph.D.
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================



