[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?
Zhao Zhang
zhaozhang at uchicago.edu
Thu Apr 3 06:45:14 CDT 2008
Sorry, Ben.
I didn't save the swift log file. If you really need the old -info files,
I could redo the test and try to send them to you.
But for now, I have several urgent issues.
zhao
Ben Clifford wrote:
> I just asked zhao for the log files (both swift and -info) for the patched
> run; but I think I'd like to see the unpatched run logs too.
>
> On Wed, 2 Apr 2008, Ioan Raicu wrote:
>
>
>> Hi Ben,
>> Thanks again for the patches, they made a huge difference, increased
>> efficiency from 21% to 81%!
>>
>> Here are the numbers:
>>
>>                     1 Node Perf  Falkon     Swift+Falkon  Swift+Falkon (patched)
>> Min                 63.618       53.782     169.139       58.538
>> Average             64.76        65.47253   309.1945      80.21246
>> Median              64.74072     64.774     313.5535      76.5245
>> Max                 65.863       94.447     605.654       115.237
>> Standard Deviation  0.488984     3.863944   52.13821      10.95652
>> Efficiency          100%         99%        21%           81%
>>
>>
>> The first column shows the per-task statistics when running on 1 node (4 CPUs)
>> through Falkon. The second column shows the statistics for running the
>> application at large scale, on 2048 CPUs. The 3rd column is running
>> Swift+Falkon (both from SVN) on 256 CPUs. The 4th column is Swift+Falkon, but
>> Swift has the 3 patches applied. Essentially, the per-task execution time was
>> reduced from 309 seconds to 80 seconds, where the ideal would have been 64
>> seconds. It brought the efficiency from 21% to 81% for this particular
>> workload. This looks fantastic!
>> We'll have to verify that we can maintain this 81% efficiency at higher numbers
>> of CPUs. In the meantime, if you can think of anything else that we could do
>> to keep pushing the 81% efficiency number higher, let us know.
>>
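The Efficiency row above appears consistent with dividing the 1-node average per-task time (64.76 s) by the average per-task time observed in each run. That formula is an inference from the numbers, not something stated in the thread. A minimal sketch, where the `efficiency` helper is a hypothetical name introduced only for illustration:

```shell
#!/bin/sh
# Assumption: efficiency = baseline average task time / observed average
# task time, printed as a whole-number percentage.
# "efficiency" is a hypothetical helper, not part of Swift or Falkon.
efficiency() {
    awk -v base="$1" -v observed="$2" \
        'BEGIN { printf "%.0f\n", 100 * base / observed }'
}

baseline=64.76   # average per-task time on 1 node, from the table

efficiency "$baseline" 65.47253   # Falkon at 2048 CPUs
efficiency "$baseline" 309.1945   # Swift+Falkon, unpatched
efficiency "$baseline" 80.21246   # Swift+Falkon, patched
```

Run as written, this prints 99, 21 and 81, which matches the Efficiency row for those three columns.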
>> Thanks again,
>> Ioan
>>
>> Ben Clifford wrote:
>>
>>> On Mon, 31 Mar 2008, Ben Clifford wrote:
>>>
>>>
>>>
>>>> This temporary directory handling is pretty ugly - it should be a couple
>>>> of lines' change to wrapper.sh to get similar functionality using the existing
>>>> swift temporary directory handling - change the path to /tmp and use cp
>>>> instead of ln -s. That way you can take advantage of Swift's existing
>>>> unique job IDs and error handling too.
>>>>
>>>>
>>> Attached are three patches that will apply against svn r1775:
>>>
>>> The first puts temporary directories in /tmp rather than on shared fs.
>>> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
>>>
>>> The second copies the application file to the worker in each job execution
>>> (though it doesn't do any worker-node caching of the file between jobs)
>>> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
>>>
>>> The third creates the worker node log on /tmp and copies it at the end.
>>> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
>>>
>>> All three modify wrapper.sh and should be applied in the above order.
>>>
>>> With the first two patches, the timestamps in the usual info logs will
>>> provide information about how long the copies take, in the same way that
>>> they usually indicate times for other execution stages.
>>>
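The combined effect of the three patches described above can be sketched roughly as follows. This is not the actual wrapper.sh and not the content of the patches; `run_job`, its arguments, the log format and the file names are all hypothetical, chosen only to illustrate the pattern of working on node-local /tmp, copying the executable with cp instead of ln -s, and copying the log back once at the end.

```shell
#!/bin/sh
# Hypothetical sketch only; NOT the real wrapper.sh or Ben's patches.
# Usage: run_job JOBID SHARED_DIR APP [ARGS...]
run_job() {
    jobid=$1; shared=$2; app=$3
    shift 3

    # Patch 1 idea: per-job scratch directory on node-local /tmp
    # rather than on the shared filesystem.
    work=$(mktemp -d "${TMPDIR:-/tmp}/job-$jobid.XXXXXX") || return 1

    # Patch 3 idea: write the worker log locally while the job runs.
    log="$work/wrapper.log"
    stamp() { echo "$(date '+%Y-%m-%dT%H:%M:%S') $*" >> "$log"; }

    # Patch 2 idea: copy the application to the worker (cp, not ln -s).
    # The timestamps on either side show how long the copy took.
    stamp "copy start"
    cp "$app" "$work/app" || { rm -rf "$work"; return 1; }
    chmod +x "$work/app"
    stamp "copy end"

    # Run the job inside the scratch directory, logging locally.
    (cd "$work" && ./app "$@") >> "$log" 2>&1
    rc=$?
    stamp "exit code $rc"

    # Copy the log to the shared filesystem once, then clean up.
    mkdir -p "$shared" && cp "$log" "$shared/$jobid.log"
    rm -rf "$work"
    return "$rc"
}
```

A call might look like `run_job job42 /path/on/shared/fs/logs /path/to/app arg1`, where all three paths are placeholders. The point of the design is that the shared filesystem is touched only twice per job (one read of the executable, one write of the log) instead of on every log line.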
>>>
>>>
>>
>
>