[Swift-devel] Swift and BGP plots

Thu Oct 29 09:30:00 CDT 2009


Mihael Hategan wrote:
> On Wed, 2009-10-28 at 23:11 -0500, Ioan Raicu wrote:
>   
>> Mihael,
>> Did you figure out why I am seeing 8K and 12K active tasks, when we
>> only had 4K and 6K CPU cores?
>>     
>
> Haven't tried.
>
>   
>>  Were there really 128K tasks in the workflow?
>>     
>
> Nope. 64K.
>   
OK, it would be good to look at why we have double the number of tasks.
It must be my filtering of the Swift log. Here is my filtered log:
http://www.ece.northwestern.edu/~iraicu/scratch/logs/dc-4000-active-completed.txt

This filtered log was generated by:
cat dc-4000.log | grep "JOB_SUBMISSION" | grep "TaskImpl" | grep "Active" > dc-4000-active-completed.txt
cat dc-4000.log | grep "JOB_SUBMISSION" | grep "TaskImpl" | grep "Completed" >> dc-4000-active-completed.txt
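
One way to check whether this filter is what doubles the counts is to see
how many matching lines each task contributes. The Python sketch below
applies the same three substring filters as the grep pipeline and groups
the matches by task identifier. The sample lines and the `id=` pattern are
made up for illustration; I am not assuming anything about the real Swift
log layout, so the pattern would have to be adapted to the actual lines.

```python
import re
from collections import Counter

def matching_lines(lines, status):
    """Same filter as the grep pipeline: all three substrings must appear."""
    return [ln for ln in lines
            if "JOB_SUBMISSION" in ln and "TaskImpl" in ln and status in ln]

def lines_per_task(lines, status, id_pattern):
    """Count how many matching lines each task identifier contributes."""
    counts = Counter()
    for ln in matching_lines(lines, status):
        m = re.search(id_pattern, ln)
        if m:
            counts[m.group(1)] += 1
    return counts

# Made-up sample lines; the real Swift log layout may differ.
sample = [
    "JOB_SUBMISSION TaskImpl id=task-1 status=Active",
    "JOB_SUBMISSION TaskImpl id=task-1 status=Active",
    "JOB_SUBMISSION TaskImpl id=task-2 status=Active",
    "JOB_SUBMISSION TaskImpl id=task-2 status=Completed",
]
ID_PATTERN = r"id=(\S+)"  # hypothetical identifier format

active = lines_per_task(sample, "Active", ID_PATTERN)
print(len(matching_lines(sample, "Active")), "Active lines")
print(len(active), "distinct tasks")
print([t for t, n in active.items() if n > 1], "tasks matched more than once")
```

If the number of matching lines is about twice the number of distinct
tasks, the filter is matching each task more than once.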
>   
>>  Just want to make sure the log conversion worked as expected.
>>
>> Also, assuming there were really 128K tasks of 60 sec each, and 8K
>> CPUs, the ideal time to complete the run 4K would be 960 sec.
>>     
>
> That's one calculation that won't be bothered by doubling everything.
> But no, there were 64k tasks.
>
>   
If there were 64K tasks and 4K CPUs, then the ideal time will be the 
same, 960 sec.
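
As a sanity check on the arithmetic, here is a minimal Python sketch of the
ideal-time and end-to-end efficiency numbers in this thread. It assumes
perfectly divisible work (tasks x task length / CPUs) and takes K = 1024;
only the task-to-CPU ratio matters, so K = 1000 gives the same result. The
function names are mine.

```python
def ideal_time(num_tasks, task_sec, num_cpus):
    """Ideal makespan if the work divides evenly across all CPUs."""
    return num_tasks * task_sec / num_cpus

def efficiency(ideal_sec, actual_sec):
    """End-to-end efficiency: ideal run time over measured run time."""
    return ideal_sec / actual_sec

K = 1024  # assumption; the ratio is the same with K = 1000

ideal_4k = ideal_time(64 * K, 60, 4 * K)           # 64K tasks on 4K CPUs
ideal_4k_doubled = ideal_time(128 * K, 60, 8 * K)  # doubled counts, same ratio
ideal_6k = ideal_time(64 * K, 60, 6 * K)           # 64K tasks on 6K CPUs

print(ideal_4k, ideal_4k_doubled, ideal_6k)
print(f"run4K: {efficiency(ideal_4k, 1183):.0%}")
print(f"run6K: {efficiency(ideal_6k, 884):.0%}")
```

Doubling both the task count and the CPU count leaves the ideal time
unchanged, which is why the 960 sec figure holds either way.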
>>  Run4K ran in 1183 sec, giving us an end-to-end efficiency of 81%.
>>
>> For the run6K, the ideal time was 640 sec, so with an actual time of
>> 884, we got an end-to-end efficiency of 72%.
>>     
>
> It depends whether you count from the time the partition boots or from
> the time swift starts. We could count the queue/partition boot time, but
> that doesn't tell us much about swift. On the other hand, if we don't
> there's still some submission happening during that time, so that
> counts.
>   
I count from where the log starts. There are about 20 seconds of
inactivity at the beginning of the log; at around 20 sec in one log and
24 sec in the other, 1 job is submitted and running. At about 120
seconds into the run, the floodgates open and many jobs are submitted
and start running. So, should we count from time 0, 20, or 120? I guess
it's all about what you are trying to measure and show. In all cases, I
think the workers were provisioned; it's just a matter of how much of
the Swift overhead you want to take into account.
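
To show how much the choice of starting point matters, here is a small
Python sketch that recomputes the end-to-end efficiency for the three
candidate start times. It assumes the 1183 sec figure for the 4K run is
measured from the start of the log, and that the 0/20/120 sec offsets apply
to that run; both are assumptions for illustration.

```python
def efficiency_from(start_sec, ideal_sec=960.0, end_sec=1183.0):
    """Efficiency when the clock starts start_sec into the log."""
    return ideal_sec / (end_sec - start_sec)

# Candidate starting points: log start, first job, bulk submission.
for start in (0, 20, 120):
    print(f"start at {start:>3} sec: {efficiency_from(start):.1%}")
```

The later the starting point, the less Swift startup overhead is charged
to the run, so the reported efficiency goes up.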
> The numbers for Falkon, were the workers started already?
>
>   
Yes, the workers were already provisioned in that case.
>> Not quite the 90%+ efficiencies when looking at a per task level, but
>> still quite good! 
>>     
>
> I'm not quite sure what's happening. Maybe I wasn't clear. Though I was.
> Is there some misunderstanding here about the different things being
> measured and how?
>   
No. The real way to compute efficiency is to use the end-to-end time of
the real run compared to the ideal run. The other efficiency I sometimes
throw out is the per-task efficiency, where you take the average real
run time of all tasks and compare it to the ideal time of a task. This
second measure of efficiency is usually optimistic, but it allows us to
compare efficiency across runs that might be too difficult to compare
using the traditional efficiency metric.
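
For concreteness, the per-task measure can be sketched in Python as below.
The task durations are made-up values for illustration, not numbers from
these runs, and the function name is mine.

```python
def per_task_efficiency(ideal_task_sec, real_task_secs):
    """Ideal task time over the average measured task time."""
    avg_real = sum(real_task_secs) / len(real_task_secs)
    return ideal_task_sec / avg_real

# Hypothetical measured run times (sec) for tasks that ideally take 60 sec.
sample_durations = [61.0, 63.5, 66.0, 62.5]

print(f"per-task efficiency: {per_task_efficiency(60.0, sample_durations):.1%}")
```

This measure ignores time when CPUs sit idle before or between tasks,
which is one reason it tends to come out higher than the end-to-end
number.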

Ioan
>
>   

-- 
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384 
Evanston, IL 60208-3118
=================================================================
Cel:   1-847-722-0876
Tel:   1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web:   http://www.eecs.northwestern.edu/~iraicu/
       https://wiki.cucis.eecs.northwestern.edu/
=================================================================
=================================================================

