[Swift-devel] Swift and BGP plots
Ioan Raicu
iraicu at cs.uchicago.edu
Mon Oct 26 23:19:01 CDT 2009
Mihael Hategan wrote:
> On Mon, 2009-10-26 at 16:36 -0500, Ioan Raicu wrote:
> [...]
>
>
>> For all the above stats, I don't understand why the minimum time is
>> 10~20 seconds, when the jobs are sleep 60? Or perhaps you were doing
>> sleep 0 here?
>>
>
> Well, I don't understand what the log plot tools mean by those numbers,
> but it's clearly not the job time.
>
OK. So which plot/data should I be looking at to get a summary of the
per-task performance?
>
>>> 64k jobs, 4000 workers:
>>> http://www.mcs.anl.gov/~hategan/report-dc-4000/
>>>
>>>
>> Shortest event (s): 106.119999885559
>> Longest event (s): 1246.60699987411
>> Mean event duration (s): 334.987874176266
>> Standard deviation of event duration (s): 290.212811366649
>>
>> Efficiency: 18%?
>>
>
> Well, I was a bit afraid that this would turn into a numerology
> exercise. I'm not sure whether to be happy or sad that I was right.
>
> I'm not sure what "efficiency" is supposed to mean,
The simplest definition of efficiency, which I used here, is:
ideal event duration / mean event duration
In the case of the above 18%, I took 60 / 334 ~ 0.18 = 18%.
This doesn't take into account the ramp-up and ramp-down time. In
reality, the actual end-to-end efficiency is usually even lower.
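As a rough sketch of that definition in Python (the function name is made up for illustration; the 60 s ideal comes from the sleep 60 jobs and the mean is the one quoted from the report):

```python
def simple_efficiency(ideal_s, mean_s):
    """Ideal event duration divided by mean event duration.

    Ignores ramp-up and ramp-down, so it is only a per-event figure.
    """
    if mean_s <= 0:
        raise ValueError("mean event duration must be positive")
    return ideal_s / mean_s


# Values quoted from the report-dc-4000 summary above.
eff = simple_efficiency(60.0, 334.987874176266)
print(f"simple efficiency: {eff:.1%}")
```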
> but you can look at
> a few things:
> 1. The coaster panel. That gives you average utilization of workers in
> each block measured by the code that sends the requests to the workers.
> It's obtained by dividing the time a CPU is known to be running a job
> by the time a CPU is known to be running a job plus the time a
> CPU is known to be sitting idle. I need to revise that a bit to account
> for delays in sending the messages to the workers, but it should be
> reasonably accurate. Those numbers are above 99% and therefore look
> suspiciously high.
>
I see the block utilization near 100% all the time, so that doesn't seem
to match the other data I saw.
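For reference, the utilization described in item 1 is busy / (busy + idle). A minimal sketch, with made-up busy and idle times (these are NOT taken from the report, and the function name is hypothetical):

```python
def utilization(busy_s, idle_s):
    """Busy time divided by busy-plus-idle time for one CPU or block."""
    total_s = busy_s + idle_s
    if total_s <= 0:
        raise ValueError("busy + idle time must be positive")
    return busy_s / total_s


# Hypothetical example: 600 s known busy, 5 s known idle.
u = utilization(600.0, 5.0)
print(f"utilization: {u:.2%}")
```

Any time that is counted in neither bucket (for example, delays in sending messages to the workers) would not lower this ratio, which might be one reason it reads higher than the other numbers.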
> 2. Multiply 60s with the number of jobs (65535), divide by the number of
> workers (6*1024) and then by the total time since the first job starts
> to when the last job finishes (or you could choose the middle of the
> ramp-up to the middle of the ramp-down to get some sort of amortized
> efficiency). That gives you about 91% end-to-end and 96% amortized. Or
> you could divide by the total time, including swift startup, partition
> boot time, etc. to get 64%.
>
65535*60/(6*1024) ~ 640 sec. I see the end-to-end time being about 1300
sec, or 1100 sec if we look at just Karajan. The 64% efficiency is in
the ballpark, but I don't see where the 91% and 96% are coming from.
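The arithmetic can be sketched in Python as follows. The job count, job length and worker count are the ones from item 2; the 1300 s and 1100 s values are only the approximate readings mentioned here, and the helper names are made up for illustration. The last loop inverts the formula to show what elapsed time each quoted percentage would correspond to.

```python
JOBS = 65535          # number of jobs, from item 2
JOB_S = 60.0          # each job is a sleep 60
WORKERS = 6 * 1024    # number of workers, from item 2

# Time the run would take if every worker were busy the whole time.
ideal_makespan_s = JOBS * JOB_S / WORKERS


def makespan_efficiency(observed_s):
    """Ideal makespan divided by an observed elapsed time."""
    return ideal_makespan_s / observed_s


def implied_elapsed_s(efficiency):
    """Elapsed time that a given efficiency would correspond to."""
    return ideal_makespan_s / efficiency


print(f"ideal makespan: {ideal_makespan_s:.1f} s")

# Approximate elapsed times read off the plots (rough readings).
for label, observed_s in [("end-to-end", 1300.0), ("Karajan only", 1100.0)]:
    print(f"{label}: {makespan_efficiency(observed_s):.0%}")

# Percentages quoted in item 2, turned back into elapsed times.
for pct in (0.91, 0.96, 0.64):
    print(f"{pct:.0%} corresponds to about {implied_elapsed_s(pct):.0f} s")
```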
Under the Karajan tab
(http://www.mcs.anl.gov/~hategan/report-dc-4000/karajan.html), at the
end of the page, I found:
Total number of events: 131076
Shortest event (s): 0
Longest event (s): 1200.02400016785
Total duration of all events (s): 8119724.183007
Mean event duration (s): 61.9466888141765
Standard deviation of event duration (s): 4.95689953321352
Maximum number of events at one time: 8194
This looks better in a sense: 61.95 sec mean, with 4.96 sec std dev.
It still looks a bit odd, though, as the shortest event is 0 sec and the
longest is 1200 sec. This would imply an efficiency of about 96-97%
(60 / 61.95 ~ 0.968), which matches what you said above.
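The mean on that page can be cross-checked from the totals it reports. A small sketch in Python, using only the numbers listed above plus the 60 s ideal from the sleep 60 jobs (variable names are just for illustration):

```python
TOTAL_EVENTS = 131076               # "Total number of events"
TOTAL_DURATION_S = 8119724.183007   # "Total duration of all events (s)"
IDEAL_S = 60.0                      # sleep 60

# Mean event duration recomputed from the two totals.
mean_s = TOTAL_DURATION_S / TOTAL_EVENTS

# Per-event efficiency: ideal duration over mean duration.
efficiency = IDEAL_S / mean_s

print(f"mean event duration: {mean_s:.4f} s")
print(f"per-event efficiency: {efficiency:.1%}")
```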
Where are the Swift logs for these runs? I have a tool that will convert
Swift logs to the Falkon log format, so I might understand the data better.
Ioan
> From these you can derive various speedups, by multiplying those
> percentages with the number of workers (6*1024).
>
> [...]
>
>
>
>
--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cel: 1-847-722-0876
Tel: 1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web: http://www.eecs.northwestern.edu/~iraicu/
https://wiki.cucis.eecs.northwestern.edu/
=================================================================