[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?

Zhao Zhang zhaozhang at uchicago.edu
Thu Apr 3 06:45:14 CDT 2008


Sorry, Ben.

I didn't save the swift log file. If you really need the old -info file,
I could redo the test and try to send both logs to you.
But for now, I have several urgent issues to deal with.

zhao

Ben Clifford wrote:
> I just asked zhao for the log files (both swift and -info) for the patched 
> run; but I think I'd like to see the unpatched run logs too.
>
> On Wed, 2 Apr 2008, Ioan Raicu wrote:
>
>   
>> Hi Ben,
>> Thanks again for the patches, they made a huge difference, increased
>> efficiency from 21% to 81%!
>>
>> Here are the numbers:
>>
>>                      1 Node Perf   Falkon      Swift+Falkon   Swift+Falkon (patched)
>> Min                  63.618        53.782      169.139        58.538
>> Average              64.76         65.47253    309.1945       80.21246
>> Median               64.74072      64.774      313.5535       76.5245
>> Max                  65.863        94.447      605.654        115.237
>> Standard Deviation   0.488984      3.863944    52.13821       10.95652
>> Efficiency           100%          99%         21%            81%
>>
>>
>> The first column shows the per-task statistics when running on 1 node (4 CPUs)
>> through Falkon.  The second column shows the statistics for running the
>> application at large scale, on 2048 CPUs.  The 3rd column is running
>> Swift+Falkon (both from SVN) on 256 CPUs.  The 4th column is Swift+Falkon, but
>> Swift has the 3 patches applied.  Essentially, the per task execution time was
>> reduced from 309 seconds to 80 seconds, where the ideal would have been 64
>> seconds.  It brought the efficiency from 21% to 81% for this particular
>> workload.  This looks fantastic! 
>> We'll have to verify that we can maintain this 81% efficiency at higher numbers
>> of CPUs.  In the meantime, if you can think of anything else that we could do
>> to keep pushing the 81% efficiency number higher, let us know.
>>
>> Thanks again,
>> Ioan
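The message does not say how the Efficiency row was computed. The figures are consistent with dividing the 1-node average task time by the average task time of each configuration, so the following is a minimal Python sketch under that assumption. The function and variable names are illustrative only.

```python
# Average per-task times in seconds, copied from the table above.
# Assumption: the 1-node average is treated as the ideal task time.
BASELINE_AVG = 64.76

avg_task_time = {
    "Falkon": 65.47253,
    "Swift+Falkon": 309.1945,
    "Swift+Falkon (patched)": 80.21246,
}


def efficiency(ideal_seconds, observed_seconds):
    """Return efficiency as ideal task time divided by observed task time."""
    return ideal_seconds / observed_seconds


for name, seconds in avg_task_time.items():
    # Prints 99%, 21% and 81%, matching the Efficiency row.
    print(f"{name}: {efficiency(BASELINE_AVG, seconds):.0%}")
```

Under this reading, the patched run spends about 15 seconds per task above the 64-second ideal, compared with roughly 245 seconds before the patches.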
>>
>> Ben Clifford wrote:
>>     
>>> On Mon, 31 Mar 2008, Ben Clifford wrote:
>>>
>>>   
>>>       
>>>> This temporary directory handling is pretty ugly - it should be a couple of
>>>> lines' change to wrapper.sh to get similar functionality using the existing
>>>> swift temporary directory handling - change the path to /tmp and use cp
>>>> instead of ln -s. That way you can take advantage of Swift's existing
>>>> unique job IDs and error handling too.
>>>>     
>>>>         
>>> Attached are three patches that will apply against svn r1775:
>>>
>>> The first puts temporary directories in /tmp rather than on shared fs.
>>> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
>>>
>>> The second copies the application file to the worker in each job execution
>>> (though it doesn't do any worker-node caching of that file between jobs)
>>> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
>>>
>>> The third creates the worker node log on /tmp and copies it at the end.
>>> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
>>>
>>> All three modify wrapper.sh and should be applied in the above order.
>>>
>>> With the first two patches, the timestamps in the usual info logs will
>>> provide information about how long the copies take, in the same way that
>>> they usually indicate times for other execution stages.
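The three patches described above follow a common pattern: do the per-job work on node-local disk and touch the shared filesystem as little as possible. The following is a rough Python sketch of that pattern, not the contents of the patches (which modify wrapper.sh). The helper name `run_job_locally`, its parameters, and the `<job_id>-info` log name are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_job_locally(executable, args, shared_log_dir, job_id, local_root="/tmp"):
    """Run one job in a node-local scratch directory (hypothetical helper)."""
    shared_log_dir = Path(shared_log_dir)
    shared_log_dir.mkdir(parents=True, exist_ok=True)

    # 1. Create the job directory on local disk, not on the shared filesystem.
    job_dir = Path(tempfile.mkdtemp(prefix=f"{job_id}-", dir=local_root))
    try:
        # 2. Copy the executable to the worker instead of symlinking to it.
        #    copy2 also copies the permission bits, so it stays executable.
        local_exe = job_dir / Path(executable).name
        shutil.copy2(executable, local_exe)

        # 3. Write the job log on local disk while the job runs.
        local_log = job_dir / f"{job_id}-info"
        with open(local_log, "w") as log:
            result = subprocess.run(
                [str(local_exe), *args],
                cwd=job_dir,
                stdout=log,
                stderr=subprocess.STDOUT,
            )

        # 4. Copy the log to the shared location once, at the end.
        shutil.copy2(local_log, shared_log_dir / local_log.name)
        return result.returncode
    finally:
        # Remove the local scratch directory whether or not the job succeeded.
        shutil.rmtree(job_dir, ignore_errors=True)
```

As in the second patch, this sketch copies the executable on every job and does no caching between jobs on the same worker.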
>>>