[Swift-devel] Re: Another performance comparison of DOCK

Ioan Raicu iraicu at cs.uchicago.edu
Sun Apr 13 15:23:39 CDT 2008


Sorry for being late to the party, putting out other fires :)

Here is what the Falkon logs say for this run:
2544.996 0 0 35 2048 2048 0 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0.0 0.0 0 0 100 1536 1331 1536
2545.996 1 1 35 2048 2008 0 40 0 0 40 0 0 0.0 0 0 0 0 0 0 0 0 0 0.0 0.0 1 0 99 1536 1322 1536
...
3814.999 1 1 35 2048 2047 0 1 0 0 1 0 6083 0.0 6083 0 0 0 0 0 0 0 0 0.0 0.0 0 0 100 1536 1291 1536
3815.999 0 1 35 2048 2048 0 0 0 0 0 0 6084 1.0 6084 0 0 0 0 0 0 0 0 0.0 0.0 1 1 98 1536 1291 1536

Time 2545.996 is the first time Swift sent anything to Falkon, and 
3815.999 is when the last job's exit code was reported back to Swift.

So, that's a runtime of 1270 seconds.  BTW, time 0 in the log maps back to 
an absolute 0-time of 1208032721191 ms.
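
For instance, assuming that 0-time is a Unix epoch timestamp in 
milliseconds (my reading of the log, not something it states explicitly), 
a relative log timestamp maps back to wall-clock time roughly like this 
(Python sketch):

# Sketch only: map a relative Falkon log timestamp to absolute time,
# assuming the "0-time" above is Unix epoch milliseconds.
from datetime import datetime, timezone

zero_time_ms = 1208032721191      # the 0-time reported in the log
rel_seconds = 2545.996            # relative timestamp from the log

abs_ms = zero_time_ms + rel_seconds * 1000
print(datetime.fromtimestamp(abs_ms / 1000, tz=timezone.utc))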

Also, the total CPU time from Falkon's point of view (accurate to the 
millisecond) is 1914115.25 CPU seconds, not 1190260.  So, by my numbers, I get:
1914115.25 / (1270 * 2048) = 0.735926446
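
For reference, here is the same arithmetic as a small Python sketch (the 
variable names are mine; the numbers are the ones quoted in this thread), 
comparing the Falkon-side figures with the Swift-side figures from Mike's 
summary below:

# Sketch: recompute the two efficiency figures discussed in this thread.
# Nothing is measured here; all inputs are the numbers quoted in the emails.

cores = 2048

# Falkon's view: first submission at 2545.996 s, last exit code at 3815.999 s
falkon_runtime = 3815.999 - 2545.996     # ~1270 s
falkon_cpu_seconds = 1914115.25          # total task CPU time per Falkon logs

# Swift's view: 16:09:07 to 16:42:17 wall clock, plus Swift's app-time total
swift_runtime = 1990                     # 33:10 expressed in seconds
swift_cpu_seconds = 1190260

print(falkon_cpu_seconds / (falkon_runtime * cores))   # ~0.736
print(swift_cpu_seconds / (swift_runtime * cores))     # ~0.29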

This is already looking OK, isn't it?  Now, this doesn't actually measure 
the efficiency of the app as it scaled up; to get at that, we would have 
to either repeat the same workload on 1 node, or take a small sample of 
the workload and run it on 1 node to compare against.
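
Roughly, that comparison would look like the following (a sketch only; 
the per-task baseline is a placeholder, not a measurement):

# Sketch of the scaling-efficiency comparison described above.
# t1_per_task would come from running a sample of the workload on 1 node;
# the value below is a placeholder, not a measurement.

cores = 2048
jobs = 6084                    # successful DOCK runs in this workload

t1_per_task = 300.0            # PLACEHOLDER: average seconds per task on 1 node
parallel_makespan = 1270.0     # seconds, from the Falkon log window above

ideal_makespan = jobs * t1_per_task / cores      # perfect-scaling makespan
scaling_efficiency = ideal_makespan / parallel_makespan
print(scaling_efficiency)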

Ioan


Michael Wilde wrote:
> Ben, can you point me to the graphs for this run? (Zhao's *99cy0z4g.log)
>
> Here's a high-level summary of this run:
>
> Swift end   16:42:17
> Swift start 16:09:07
> Runtime        33:10 = 1990 seconds
>
> 2048 cores
>
> Total app wall time = 1190260 seconds
>
> 1190260 / ( 1990 * 2048 ) = .29 efficiency
>
> Once stage-ins start to complete, are the corresponding jobs initiated 
> quickly, or is Swift doing mostly stage-ins for some period?
>
> Zhao indicated he saw data suggesting there was about a 700-second lag 
> from workflow start time until the first Falkon jobs started, if I 
> understood correctly. Do the graphs confirm this or say something 
> different?
>
> If the 700-second delay figure is true, and stage-in was eliminated by 
> copying input files right to the /tmp workdir rather than first to 
> /shared, then we'd have:
>
> 1190260 / ( 1290 * 2048 ) = .45 efficiency
>
> A good gain, but only partway to a number that looks good.
>
> I assume we're paying the same staging price on the output side?
>
> What I think we learned from the MARS app run, which had no input data 
> and only tiny output data files (10 bytes vs 10K bytes), was that the 
> optimized wrapper achieved somewhere between .7 and .8 efficiency.
>
> I'd like to look at whatever data we can get from this or similar 
> subsequent runs to learn what steps we could take next to increase the 
> efficiency metric.  Guidance welcome.
>
> Thanks,
>
> Mike
>
>
> On 4/13/08 11:37 AM, Ben Clifford wrote:
>> On Sat, 12 Apr 2008, Zhao Zhang wrote:
>>
>>> Hi, Ben
>>>
>>> I got a log file of 6084 successful runs on BGP. Check it here,
>>> terninable:/home/zzhang/swift_file/dock2-20080412-1609-99cy0z4g.log
>>
>> This one runs better - it gets up to a peak of 5000 jobs submitted 
>> into Falkon simultaneously, and spends a considerable amount of time 
>> above the 2048 level, which I suppose is what you need to stay above 
>> to keep all 2048 CPUs busy.
>>
>> There's a lot of stage-in activity that could probably be eliminated 
>> or changed for the single-filesystem case.
>>
>
>
> - Mike
>

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================