[Swift-devel] Re: 2 rack run with swift

Michael Wilde wilde at mcs.anl.gov
Thu Jul 24 09:50:30 CDT 2008



On 7/24/08 9:24 AM, Ioan Raicu wrote:
> Hi,
> I did a similar run through Falkon only, and got:
> Number of Tasks: 16384
> Task Duration: 30 sec
> Average Task Execution Time (from Client point of view): 31.851 sec
> Number of CPUs: 8192
> Startup: 5.185 sec
> Execute: 80.656 sec
> Ideal time: 60 sec
> 
> Swift took some 600 seconds, and had an average per task run time of 
> 240.97 sec.  Zhao, was Swift patched up, with Ben's 3 patches from 
> April/May?

No.  But one or more of those patches may have been integrated into the 
source; if so, they still need to be enabled.  We'll look into this more 
in the next few days.  I don't want to spend much time discussing this, 
though, until we have a chance to sort through all the issues we already 
know about: scheduler parameters, data management, wrapper script 
settings and patches, and GPFS issues.

Tests at longer job durations are worth doing.

- Mike
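
For reference, the "ideal time" figures quoted in this thread follow from a simple load-balance calculation; a quick sketch (the helper name is illustrative, not anything from Swift or Falkon):

```python
import math

def ideal_time(tasks, cpus, task_sec):
    """Ideal makespan assuming perfect load balance: the number of
    full waves of tasks times the per-task duration."""
    waves = math.ceil(tasks / cpus)
    return waves * task_sec

# Figures from the Falkon-only run quoted above:
# 16384 tasks on 8192 CPUs at 30 s each -> 2 waves -> 60 s ideal.
ideal = ideal_time(16384, 8192, 30)
efficiency = ideal / 80.656  # vs. the measured "Execute" time, ~74%
```
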

> I am curious what would happen if we throw 256 second tasks
> through Swift, at the same 2 rack scale?
> Ioan
> 
> Michael Wilde wrote:
>> Thanks, Zhao,
>>
>> This is a great initial snapshot of performance on the new BG/P Falkon 
>> server mechanism (1 server per pset).
>>
>> It's also the largest Swift run to date that I know of, in terms of 
>> "sites" (32) and processors used (8192).
>>
>> From a quick scan of the plots, it seems like we have some tuning to do:
>>
>> The ideal time for this run would be 120 seconds. It took 600 seconds. 
>> That's in fact "not bad at all" for a first attempt at this scale, and 
>> very reasonable if the job length were longer. 16K jobs in 10 minutes 
>> is pretty good. The nearest real-world Falkon-only run I can compare 
>> to is the 15Kx9 DOCK run, which did 138K jobs in 40 minutes. This run 
>> performed at somewhat under half that rate.
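
The rate comparison above checks out arithmetically, taking the quoted figures at face value:

```python
# Throughputs implied by the figures quoted above.
swift_rate = 16384 / 600         # this run: ~27.3 jobs/s
dock_rate = 138_000 / (40 * 60)  # 15Kx9 DOCK run: ~57.5 jobs/s
ratio = swift_rate / dock_rate   # ~0.47, i.e. somewhat under half
```
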
>>
>> I suspect that the main bottleneck here is the creation of job 
>> directories on the BG/P. As we learned in the past few months of 
>> Falkon-only runs, creating filesystem objects on GPFS is very 
>> expensive, and creating two objects within the same parent 
>> directory from more than one host is extremely expensive due to 
>> locking contention.
>>
>> I *think* the plots bear this out, but need more assessment.
>>
>> I'd like to start by writing down a detailed description of the 
>> runtime file environment and management logic (i.e., job setup by 
>> Swift and file management by wrapper.sh). Then look to see which of 
>> the options Ben provided when we last did this, in March, were 
>> properly enabled. (Some may still be un-applied test patches.) Then 
>> turn on some of the timing metrics in wrapper.sh to see where time 
>> is spent.
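
As an illustration of the kind of per-stage timing such instrumentation could record (a sketch only; the real wrapper.sh is a shell script and its stage names differ):

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Record wall-clock time spent in one wrapper stage."""
    t0 = time.time()
    try:
        yield
    finally:
        timings[name] = time.time() - t0

# Hypothetical stages for illustration only.
with stage("create_jobdir"):
    pass  # mkdir on GPFS would go here
with stage("stage_in"):
    pass  # copying input files would go here
```
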
>>
>> I also see that job distribution among servers is pretty good - 
>> ranging from 490 to 600 jobs, but for the most part staying within 10 
>> jobs of the ideal, 512.
>>
>> I can't work on this today till our Swift report is done, but can then 
>> turn to it.  Ben, once you're done with the SA Grid School, we could 
>> use your help on this. Mihael, as well, if you're interested and able 
>> to help.
>>
>> For now, I think we know a few steps we can take to measure and 
>> improve things.
>>
>> - Mike
>>
>>
>> On 7/24/08 1:19 AM, Zhao Zhang wrote:
>>> Hi, All
>>>
>>> I just made a swift run of 16384 sleep_30 tasks on 2 racks, which are 
>>> 8192 cores. The log is at
>>> http://www.ci.uchicago.edu/~zzhang/report-sleep-20080724-0030-3zbv20j6/
>>>
>>> Tomorrow, I will try to make a mars run with swift.
>>>
>>> zhao
>>
> 


