[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?

Ioan Raicu iraicu at cs.uchicago.edu
Wed Apr 2 15:17:17 CDT 2008

Hi Ben,
Thanks again for the patches. They made a huge difference: efficiency 
increased from 21% to 81%!

Here are the numbers (per-task execution times, in seconds):

Statistic           1 Node Perf  Falkon    Swift+Falkon  Swift+Falkon (patched)
Min                 63.618       53.782    169.139       58.538
Average             64.76        65.47253  309.1945      80.21246
Median              64.74072     64.774    313.5535      76.5245
Max                 65.863       94.447    605.654       115.237
Standard Deviation  0.488984     3.863944  52.13821      10.95652
Efficiency          100%         99%       21%           81%

The first column shows the per-task statistics when running on 1 node (4 
CPUs) through Falkon.  The second column shows the statistics for running 
the application at large scale, on 2048 CPUs, through Falkon.  The third 
column is Swift+Falkon (both from SVN) on 256 CPUs.  The fourth column is 
Swift+Falkon again, but with the 3 patches applied to Swift.  Essentially, 
the average per-task execution time dropped from 309 seconds to 80 seconds, 
where the ideal would have been 64 seconds.  That brought the efficiency 
from 21% to 81% for this particular workload.  This looks fantastic!
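
For reference, here is a minimal sketch of how the Efficiency row can be reproduced from the Average row. It assumes efficiency is the ideal (1-node) average per-task time divided by the measured average; the message does not state the formula, but this ratio matches the reported 99%, 21%, and 81%. The names IDEAL_AVG, averages, and efficiency are illustrative only.

```python
# Average per-task times in seconds, copied from the Average row of the table.
IDEAL_AVG = 64.76  # 1 node (4 CPUs) through Falkon

averages = {
    "Falkon": 65.47253,
    "Swift+Falkon": 309.1945,
    "Swift+Falkon (patched)": 80.21246,
}


def efficiency(ideal, measured):
    """Assumed definition: ideal per-task time over measured per-task time."""
    return ideal / measured


efficiencies = {name: efficiency(IDEAL_AVG, avg) for name, avg in averages.items()}

for name, eff in efficiencies.items():
    # Format as a whole-number percentage, as in the table.
    print(f"{name}: {eff:.0%}")
```

Under this assumed definition, the gap remaining after the patches is the difference between the 80-second measured average and the 64-second ideal.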

We'll have to verify that we can maintain this 81% efficiency at higher 
CPU counts.  In the meantime, if you can think of anything else that 
we could do to push the efficiency beyond 81%, let us know.

Thanks again,

Ben Clifford wrote:
> On Mon, 31 Mar 2008, Ben Clifford wrote:
>> This temporary directory handling is pretty ugly - it should be a couple 
>> lines change to wrapper.sh to get similar functionality using the existing 
>> swift temporary directory handling - change the path to /tmp and use cp 
>> instead of ln -s. That way you can take advantage of Swift's existing 
>> unique job IDs and error handling too.
> Attached are three patches that will apply against svn r1775:
> The first puts temporary directories in /tmp rather than on shared fs.
> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
> The second copies the application file to the worker in each job execution 
> (though doesn't do any worker-node caching of such between jobs)
> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
> The third creates the worker node log on /tmp and copies it at the end.
> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
> All three modify wrapper.sh and should be applied in the above order.
> With the first two patches, the timestamps in the usual info logs will 
> provide information about how long the copies take, in the same way that 
> they usually indicate times for other execution stages.

Ioan Raicu
Ph.D. Candidate
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
