[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?

Ioan Raicu iraicu at cs.uchicago.edu
Wed Apr 2 15:17:17 CDT 2008


Hi Ben,
Thanks again for the patches. They made a huge difference, increasing 
efficiency from 21% to 81%!

Here are the numbers:

Statistic           1 Node Perf  Falkon    Swift+Falkon  Swift+Falkon (patched)
Min                 63.618       53.782    169.139       58.538
Average             64.76        65.47253  309.1945      80.21246
Median              64.74072     64.774    313.5535      76.5245
Max                 65.863       94.447    605.654       115.237
Standard Deviation  0.488984     3.863944  52.13821      10.95652
Efficiency          100%         99%       21%           81%


The first column shows the per-task statistics when running on 1 node 
(4 CPUs) through Falkon.  The second column shows the statistics for 
running the application at large scale, on 2048 CPUs.  The third column 
is Swift+Falkon (both from SVN) on 256 CPUs.  The fourth column is 
Swift+Falkon again, but with the 3 patches applied to Swift.  
Essentially, the average per-task execution time was reduced from 309 
seconds to 80 seconds, where the ideal would have been 64 seconds.  That 
brought the efficiency from 21% to 81% for this particular workload.  
This looks fantastic! 
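
The efficiency row is consistent with dividing the 1-node average 
per-task time by each configuration's average per-task time.  Here is a 
minimal sketch of that arithmetic, assuming that is the definition in 
use (the function and variable names are only illustrative):

```python
def efficiency(ideal_seconds: float, observed_seconds: float) -> float:
    """Ratio of the ideal per-task time to the observed per-task time."""
    return ideal_seconds / observed_seconds


# Average per-task times in seconds, taken from the table above.
IDEAL_AVERAGE = 64.76  # 1 node (4 CPUs) through Falkon
observed_averages = {
    "Falkon": 65.47253,
    "Swift+Falkon": 309.1945,
    "Swift+Falkon (patched)": 80.21246,
}

for name, average in observed_averages.items():
    print(f"{name}: {efficiency(IDEAL_AVERAGE, average):.0%}")
```

With the averages above this gives 99%, 21% and 81%, matching the 
table.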

We'll have to verify that we can maintain this 81% efficiency at higher 
CPU counts.  In the meantime, if you can think of anything else that 
we could do to keep pushing the 81% efficiency number higher, let us know.

Thanks again,
Ioan

Ben Clifford wrote:
> On Mon, 31 Mar 2008, Ben Clifford wrote:
>
>   
>> This temporary directory handling is pretty ugly - it should be a couple 
>> lines change to wrapper.sh to get similar functionality using the existing 
>> swift temporary directory handling - change the path to /tmp and use cp 
>> instead of ln -s. That way you can take advantage of Swift's existing 
>> unique job IDs and error handling too.
>>     
>
> Attached are three patches that will apply against svn r1775:
>
> The first puts temporary directories in /tmp rather than on shared fs.
> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
>
> The second copies the application file to the worker in each job execution 
> (though doesn't do any worker-node caching of such between jobs)
> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
>
> The third creates the worker node log on /tmp and copies it at the end.
> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
>
> All three modify wrapper.sh and should be applied in the above order.
>
> With the first two patches, the timestamps in the usual info logs will 
> provide information about how long the copies take, in the same way that 
> they usually indicate times for other execution stages.
>
>   
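
For anyone skimming the thread, the idea behind the three patches can 
be sketched as follows.  This is not the wrapper.sh code (the real 
patches are shell changes at the URLs above); it is a rough Python 
illustration of the same pattern, and run_job_locally, its parameters 
and the log file name are all hypothetical:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_job_locally(executable: Path, args: list[str],
                    shared_dir: Path, job_id: str) -> int:
    """Run one job in node-local scratch space, then copy its log back."""
    shared_dir.mkdir(parents=True, exist_ok=True)

    # 1) Scratch directory in the system temp location (usually /tmp)
    #    instead of on the shared file system.
    with tempfile.TemporaryDirectory(prefix=f"{job_id}-") as scratch:
        scratch_dir = Path(scratch)

        # 2) Copy (rather than symlink) the application to the worker,
        #    once per job; nothing is cached between jobs.
        local_exe = scratch_dir / executable.name
        shutil.copy2(executable, local_exe)  # copy2 keeps the exec bit

        # 3) Write the job log locally while the job runs...
        local_log = scratch_dir / f"{job_id}.log"
        with local_log.open("w") as log:
            result = subprocess.run(
                [str(local_exe), *args],
                cwd=scratch_dir,
                stdout=log,
                stderr=subprocess.STDOUT,
            )

        # ...and copy it to the shared file system once, at the end.
        shutil.copy2(local_log, shared_dir / local_log.name)

    return result.returncode
```

The point of the pattern is that the shared file system is touched 
only for one read of the executable and one write of the log per job, 
rather than throughout the job's execution.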

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================

