[Swift-devel] Re: replication vs site score
Ioan Raicu
iraicu at cs.uchicago.edu
Thu Apr 9 12:34:26 CDT 2009
Mihael Hategan wrote:
> On Thu, 2009-04-09 at 10:12 -0700, Ioan Raicu wrote:
>
>>>
>>>
>> No workers sit idle, waiting for other workers to start. The resource
>> allocation takes some amount of time to boot up the OS on each node,
>> mount GPFS, start the Falkon service, start the Falkon workers, etc. See
>> http://dev.globus.org/wiki/Image:Falkon-BGP-startup-time.jpg. It's true
>> that there is some difference between the first worker starting and the
>> last worker starting, probably on the order of seconds to maybe minutes
>> at the largest scale of 160K processors. If the idle time while the
>> system starts up is a concern, you can start Swift before 100% of the
>> system is operational. The system is partitioned in 64-node chunks, so,
>> in theory, Swift could start as soon as 64 nodes are online, although
>> this could also have its own problems.
>>
>
> This assumes a single site and exact knowledge of how to fit the
> workload.
>
Nope, it's a single site if you want to start at the earliest possible
time, but once all nodes are started, it becomes a multi-site
allocation, where each site is a 64-node chunk of the allocation.
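
To make the chunk arithmetic concrete, here is a rough sketch of how an
allocation maps onto 64-node sites. The helper names are hypothetical and
are not part of Swift or Falkon, and the figure of 4 cores per node is an
assumption that is not stated in this thread; adjust it for the machine.

```python
# Hypothetical helpers for illustration only; not Swift or Falkon APIs.
CHUNK_NODES = 64     # from the thread: the system is partitioned in 64-node chunks
CORES_PER_NODE = 4   # assumption (not stated in the thread): quad-core nodes


def chunk_sites(total_cores, cores_per_node=CORES_PER_NODE, chunk_nodes=CHUNK_NODES):
    """Return (full_chunks, leftover_nodes) for an allocation of total_cores."""
    nodes = total_cores // cores_per_node
    return divmod(nodes, chunk_nodes)


def sites_online(nodes_booted, chunk_nodes=CHUNK_NODES):
    """Number of complete chunks that could already act as sites."""
    return nodes_booted // chunk_nodes
```

Under those assumptions, chunk_sites(8 * 1024) gives 32 sites for an 8K-core
run, and sites_online(64) is the first point at which Swift could start,
with one site available.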
> I also assume this works when you have a reservation, otherwise you may
> have better chances with smaller chunks.
>
Up to 8K cores, we usually run without reservations. Beyond that, we do
get reservations.
>
>
--
===================================================
Ioan Raicu, Ph.D.
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================