[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?
Ioan Raicu
iraicu at cs.uchicago.edu
Wed Apr 2 15:17:17 CDT 2008
Hi Ben,
Thanks again for the patches; they made a huge difference, increasing
efficiency from 21% to 81%!
Here are the numbers:
Per-task time (s)   1 Node Perf  Falkon       Swift+Falkon  Swift+Falkon (patched)
                    (4 CPUs)     (2048 CPUs)  (256 CPUs)    (256 CPUs)
Min                 63.618       53.782       169.139       58.538
Average             64.76        65.47253     309.1945      80.21246
Median              64.74072     64.774       313.5535      76.5245
Max                 65.863       94.447       605.654       115.237
Standard Deviation  0.488984     3.863944     52.13821      10.95652
Efficiency          100%         99%          21%           81%
The first column shows the per-task statistics when running on 1 node (4
CPUs) through Falkon. The second column shows the statistics for running
the application at large scale, on 2048 CPUs, through Falkon alone. The
third column is Swift+Falkon (both from SVN) on 256 CPUs, and the fourth
column is Swift+Falkon with the 3 patches applied to Swift. In short, the
average per-task execution time dropped from 309 seconds to 80 seconds,
where the ideal would have been 64 seconds. That raised the efficiency
from 21% to 81% for this particular workload. This looks fantastic!
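The Efficiency row appears to be the 1-node average (64.76 s, treated as the ideal per-task time) divided by each column's average per-task time. That formula is inferred from the numbers in the table, not stated in the message. A minimal Python sketch reproducing the row under that assumption:

```python
# Average per-task execution times (seconds) copied from the table above.
BASELINE_AVG = 64.76  # 1 node (4 CPUs) through Falkon, treated as the ideal

RUNS = {
    "Falkon (2048 CPUs)": 65.47253,
    "Swift+Falkon (256 CPUs)": 309.1945,
    "Swift+Falkon patched (256 CPUs)": 80.21246,
}


def efficiency(baseline: float, observed: float) -> float:
    """Ideal per-task time divided by observed per-task time (assumed formula)."""
    return baseline / observed


for name, avg in RUNS.items():
    # Format as a whole-number percentage, as in the table.
    print(f"{name}: {efficiency(BASELINE_AVG, avg):.0%}")
```

Under the same reading, the patched run still spends roughly 15 s per task (80.2 s vs. 64.8 s) above the ideal.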
We'll have to verify that we can maintain this 81% efficiency at higher
CPU counts. In the meantime, if you can think of anything else we could
do to push the 81% efficiency higher, let us know.
Thanks again,
Ioan
Ben Clifford wrote:
> On Mon, 31 Mar 2008, Ben Clifford wrote:
>
>
>> This temporary directory handling is pretty ugly - it should be a couple
>> lines change to wrapper.sh to get similar functionality using the existing
>> swift temporary directory handling - change the path to /tmp and use cp
>> instead of ln -s. That way you can take advantage of Swift's existing
>> unique job IDs and error handling too.
>>
>
> Attached are three patches that will apply against svn r1775:
>
> The first puts temporary directories in /tmp rather than on shared fs.
> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
>
> The second copies the application file to the worker in each job execution
> (though doesn't do any worker-node caching of such between jobs)
> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
>
> The third creates the worker node log on /tmp and copies it at the end.
> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
>
> All three modify wrapper.sh and should be applied in the above order.
>
> With the first two patches, the timestamps in the usual info logs will
> provide information about how long the copies take, in the same way that
> they usually indicate times for other execution stages.
>
>
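The three patches change Swift's wrapper.sh, a shell script whose actual contents are in the linked patches and are not reproduced here. As a rough, hypothetical illustration of the same node-local staging pattern (scratch directory on /tmp, executable copied rather than symlinked, log written locally and copied back once at the end, with timestamps around each stage), here is a Python sketch. The function `run_job`, its parameters, and the stage labels are invented for illustration; they are not Swift's API or its log format.

```python
import os
import shutil
import subprocess
import tempfile
import time


def run_job(job_id: str, executable: str, shared_dir: str, args: list[str]) -> int:
    """Hypothetical node-local staging of one job (not Swift's wrapper.sh)."""
    # Patch 1 idea: per-job scratch directory in node-local temp space
    # (usually /tmp unless TMPDIR is set) instead of the shared filesystem.
    # The job ID in the prefix keeps concurrent jobs on one node apart.
    work = tempfile.mkdtemp(prefix=f"job-{job_id}-")
    # Patch 3 idea: write the job log locally first.
    local_log = os.path.join(work, f"{job_id}.log")
    try:
        # Append mode so the child's output and our timestamps both land
        # at the end of the file.
        with open(local_log, "a") as log:
            def stamp(event: str) -> None:
                # Timestamp each stage so copy and run times can be compared.
                log.write(f"{time.time():.3f} {event}\n")
                log.flush()

            # Patch 2 idea: copy the executable into the scratch directory
            # instead of symlinking it (no caching between jobs).
            stamp("COPY_EXECUTABLE_START")
            local_exe = os.path.join(work, os.path.basename(executable))
            shutil.copy2(executable, local_exe)  # copy2 keeps the exec bit
            stamp("COPY_EXECUTABLE_END")

            stamp("EXECUTE_START")
            result = subprocess.run(
                [local_exe, *args], cwd=work, stdout=log, stderr=subprocess.STDOUT
            )
            stamp(f"EXECUTE_END rc={result.returncode}")

        # Patch 3 idea: copy the finished log to shared storage once, at the end.
        shutil.copy2(local_log, os.path.join(shared_dir, f"{job_id}.log"))
        return result.returncode
    finally:
        # Remove node-local scratch space whether or not the job succeeded.
        shutil.rmtree(work, ignore_errors=True)
```

The point of the pattern is that the shared filesystem is touched only for the one executable read and the one log write per job, rather than for every file operation in the job's working directory.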
--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================