[Swift-devel] Swift and BGP plots

Ioan Raicu iraicu at cs.uchicago.edu
Mon Oct 26 23:19:01 CDT 2009



Mihael Hategan wrote:
> On Mon, 2009-10-26 at 16:36 -0500, Ioan Raicu wrote:
> [...]
>
>   
>> For all the above stats, I don't understand why the minimum time is
>> 10-20 seconds when the jobs are "sleep 60". Or perhaps you were running
>> "sleep 0" here?
>>     
>
> Well, I don't understand what the log plot tools mean by those numbers,
> but it's clearly not the job time.
>   
OK. So which plot or data should I be looking at to get a summary of the
per-task performance?
>   
>>> 64k jobs, 4000 workers:
>>> http://www.mcs.anl.gov/~hategan/report-dc-4000/
>>>   
>>>       
>> Shortest event (s): 106.119999885559
>> Longest event (s): 1246.60699987411
>> Mean event duration (s): 334.987874176266
>> Standard deviation of event duration (s): 290.212811366649
>>
>> Efficiency: 18%?
>>     
>
> Well, I was a bit afraid that this would turn into a numerology
> exercise. I'm not sure whether to be happy or sad that I was right.
>
> I'm not sure what "efficiency" is supposed to mean, 
The definition of efficiency I used here is the simplest one:
ideal event duration / mean event duration

For the 18% above, I took 60 / 335 ~ 0.18, i.e. 18%.

This doesn't account for ramp-up and ramp-down time; in practice, the
actual end-to-end efficiency is usually even lower.
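As a minimal sketch of the per-task definition above (the helper name
`task_efficiency` is hypothetical, and the 60 s ideal duration assumes the
"sleep 60" workload discussed in this thread):

```python
def task_efficiency(ideal_s: float, mean_s: float) -> float:
    """Per-task efficiency: ideal event duration / observed mean duration.

    Ignores ramp-up and ramp-down, so it tends to overestimate
    end-to-end efficiency.
    """
    if mean_s <= 0:
        raise ValueError("mean duration must be positive")
    return ideal_s / mean_s


# Mean event duration reported for the 64k-job, 4000-worker run.
eff = task_efficiency(60.0, 334.987874176266)
print(f"{eff:.1%}")  # 17.9%
```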
> but you can look at
> a few things:
> 1. The coaster panel. That gives you the average utilization of workers in
> each block, measured by the code that sends the requests to the workers.
> It's obtained by dividing the time a CPU is known to be running a job
> by the time a CPU is known to be running a job plus the time a
> CPU is known to be sitting idle. I need to revise that a bit to account
> for delays in sending the messages to the workers, but it should be
> reasonably accurate. Those numbers are above 99% and therefore look
> suspiciously high.
>   
I see block utilization near 100% the whole time, so that doesn't seem
to match the other data I've seen.
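A sketch of the coaster utilization formula from item 1, busy / (busy + idle).
The `CpuTime` record and `block_utilization` function are hypothetical, and
pooling busy and idle time over the whole block (rather than averaging
per-CPU ratios) is an assumption. Since only time the dispatcher knows about
is counted, delays in getting messages to the workers would not show up,
which is one plausible reason the panel can read above 99% while per-task
efficiency is much lower.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CpuTime:
    busy_s: float  # time the CPU is known to be running a job
    idle_s: float  # time the CPU is known to be sitting idle


def block_utilization(cpus: Sequence[CpuTime]) -> float:
    """Pooled utilization of one block: total busy / (total busy + total idle)."""
    busy = sum(c.busy_s for c in cpus)
    total = busy + sum(c.idle_s for c in cpus)
    return busy / total if total > 0 else 0.0


# Made-up numbers for two CPUs, for illustration only.
print(block_utilization([CpuTime(990.0, 10.0), CpuTime(995.0, 5.0)]))
```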
> 2. Multiply 60 s by the number of jobs (65535), divide by the number of
> workers (6*1024) and then by the total time from when the first job starts
> to when the last job finishes (or you could choose the middle of the
> ramp-up to the middle of the ramp-down to get some sort of amortized
> efficiency). That gives you about 91% end-to-end and 96% amortized. Or
> you could divide by the total time, including Swift startup, partition
> boot time, etc., to get 64%.
>   
65535*60/(6*1024) ~ 640 sec. The end-to-end time I see is about 1300
sec, or 1100 sec if we look only at Karajan. The 64% efficiency is in
the ballpark, but I don't see where the 91% and 96% come from.
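The arithmetic from item 2 as a sketch, plugging in the elapsed times read off
the plots above (1300 s and 1100 s) and the 6*1024 worker count. The function
names are hypothetical; speedup is taken as efficiency times the number of
workers, as suggested at the end of Mihael's message.

```python
def ideal_makespan_s(jobs: int, job_s: float, workers: int) -> float:
    """Wall-clock time if every worker ran jobs back to back with no overhead."""
    return jobs * job_s / workers


def efficiency(jobs: int, job_s: float, workers: int, elapsed_s: float) -> float:
    """Ideal makespan divided by the measured elapsed time."""
    return ideal_makespan_s(jobs, job_s, workers) / elapsed_s


def speedup(eff: float, workers: int) -> float:
    """Speedup over a single worker: efficiency times worker count."""
    return eff * workers


JOBS, JOB_S, WORKERS = 65535, 60.0, 6 * 1024

print(f"ideal: {ideal_makespan_s(JOBS, JOB_S, WORKERS):.0f} s")
for label, elapsed_s in [("end-to-end", 1300.0), ("Karajan only", 1100.0)]:
    eff = efficiency(JOBS, JOB_S, WORKERS, elapsed_s)
    print(f"{label}: {eff:.0%} efficiency, ~{speedup(eff, WORKERS):.0f}x speedup")
```

With these inputs the ratio comes out at roughly 49% and 58%, so the choice
of denominator (first job to last job, amortized window, or total time
including startup) matters a great deal.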

Under the Karajan tab
(http://www.mcs.anl.gov/~hategan/report-dc-4000/karajan.html), at the
end of the page, I found:
Total number of events: 131076
Shortest event (s): 0
Longest event (s): 1200.02400016785
Total duration of all events (s): 8119724.183007
Mean event duration (s): 61.9466888141765
Standard deviation of event duration (s): 4.95689953321352
Maximum number of events at one time: 8194

This looks better in a sense: a 61.95 sec mean with a 4.95 sec standard
deviation. It still looks a bit odd, though, since the shortest event is
0 sec and the longest is 1200 sec. These numbers imply an efficiency of
roughly 96-97% (60 / 61.95 ~ 0.969), which matches what you said above.
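Recomputing the Karajan summary from its own totals, as a quick consistency
check (plain arithmetic on the numbers quoted above; the variable names are
illustrative, and the 60 s ideal again assumes "sleep 60" jobs):

```python
# Totals copied from the Karajan tab summary of the 4000-worker run.
TOTAL_DURATION_S = 8119724.183007
NUM_EVENTS = 131076
IDEAL_S = 60.0  # assumes each job is "sleep 60"

mean_s = TOTAL_DURATION_S / NUM_EVENTS
efficiency = IDEAL_S / mean_s

print(f"mean event duration: {mean_s:.2f} s")   # 61.95 s
print(f"implied efficiency: {efficiency:.1%}")  # 96.9%
```

The recomputed mean agrees with the reported 61.9466888141765 s, so the
summary is internally consistent; the 0 s minimum and 1200 s maximum still
need explaining.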

Where are the Swift logs for these runs? I have a tool that converts
Swift logs to Falkon log format, which might help me understand the data better.

Ioan

> From these you can derive various speedups by multiplying those
> percentages by the number of workers (6*1024).
>
> [...]
>
>
>
>   

-- 
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384 
Evanston, IL 60208-3118
=================================================================
Cel:   1-847-722-0876
Tel:   1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web:   http://www.eecs.northwestern.edu/~iraicu/
       https://wiki.cucis.eecs.northwestern.edu/
=================================================================

