[Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?

Zhao Zhang zhaozhang at uchicago.edu
Thu Apr 3 14:47:04 CDT 2008


Thanks, Ben

zhao

Ben Clifford wrote:
> It's fine for now.
>
> There's a convention for storing log files: put the .log file and the
> whole .d directory somewhere in ~benc/swift-logs/ in CI NFS space.
>
> Most simply, put files directly in there; for a more structured layout, see
> how Mike has organised his logs under ~benc/swift-logs/wilde/.
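Ben's convention (copy a run's .log file together with its whole .d directory into ~benc/swift-logs/) can be scripted. The following is a minimal Python sketch that assumes, hypothetically, that the .log file and the .d directory share one base name; `archive_run_logs` and the example destination subdirectory are illustrative names, not an existing Swift tool.

```python
import os
import shutil


def archive_run_logs(run_base: str, dest_dir: str) -> None:
    """Copy '<run_base>.log' and the whole '<run_base>.d' directory into dest_dir.

    Hypothetical helper: assumes the .log file and .d directory share a base name.
    """
    os.makedirs(dest_dir, exist_ok=True)
    name = os.path.basename(run_base)
    # The top-level run log.
    shutil.copy2(run_base + ".log", os.path.join(dest_dir, name + ".log"))
    # The per-run directory, copied recursively (fails if it already exists there).
    shutil.copytree(run_base + ".d", os.path.join(dest_dir, name + ".d"))
```

For example, `archive_run_logs("myrun", os.path.expanduser("~benc/swift-logs/zhao"))`, where "myrun" and the "zhao" subdirectory are placeholders for your own run name and layout.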
>
> On Thu, 3 Apr 2008, Zhao Zhang wrote:
>
>> Sorry, Ben.
>>
>> I didn't save the Swift log file. If you really need the old -info files, I
>> could redo the test and send them to you.
>> But for now, I have several urgent issues.
>>
>> zhao
>>
>> Ben Clifford wrote:
>>> I just asked Zhao for the log files (both Swift and -info) for the patched
>>> run, but I'd like to see the unpatched run logs too.
>>>
>>> On Wed, 2 Apr 2008, Ioan Raicu wrote:
>>>
>>>> Hi Ben,
>>>> Thanks again for the patches; they made a huge difference, increasing
>>>> efficiency from 21% to 81%!
>>>>
>>>> Here are the numbers:
>>>>
>>>> Per-task time (s)   1 Node Perf   Falkon     Swift+Falkon   Swift+Falkon (patched)
>>>> Min                 63.618        53.782     169.139        58.538
>>>> Average             64.76         65.47253   309.1945       80.21246
>>>> Median              64.74072      64.774     313.5535       76.5245
>>>> Max                 65.863        94.447     605.654        115.237
>>>> Standard Deviation  0.488984      3.863944   52.13821       10.95652
>>>> Efficiency          100%          99%        21%            81%
>>>>
>>>>
>>>> The first column shows the per-task statistics when running on 1 node
>>>> (4 CPUs) through Falkon. The second column shows the statistics for
>>>> running the application at large scale, on 2048 CPUs. The third column is
>>>> Swift+Falkon (both from SVN) on 256 CPUs. The fourth column is
>>>> Swift+Falkon with the 3 patches applied to Swift. Essentially, the per-task
>>>> execution time was reduced from 309 seconds to 80 seconds, where the ideal
>>>> would have been 64 seconds. This brought the efficiency from 21% to 81% for
>>>> this particular workload. This looks fantastic! We'll have to verify that
>>>> we can maintain this 81% efficiency at higher numbers of CPUs. In the
>>>> meantime, if you can think of anything else we could do to keep pushing
>>>> the 81% efficiency higher, let us know.
>>>>
>>>> Thanks again,
>>>> Ioan
>>>>
>>>> Ben Clifford wrote:
>>>>> On Mon, 31 Mar 2008, Ben Clifford wrote:
>>>>>
>>>>>> This temporary directory handling is pretty ugly; it should be a couple
>>>>>> of lines' change to wrapper.sh to get similar functionality using the
>>>>>> existing Swift temporary directory handling: change the path to /tmp and
>>>>>> use cp instead of ln -s. That way you can take advantage of Swift's
>>>>>> existing unique job IDs and error handling too.
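The suggested change amounts to: make a per-job scratch directory on node-local /tmp, named after the job's unique ID, and copy inputs into it rather than symlinking back to the shared filesystem. Below is a minimal Python sketch of that pattern, not the actual wrapper.sh change (which is shell); `stage_inputs_locally` and its parameters are hypothetical names.

```python
import os
import shutil
from typing import Iterable


def stage_inputs_locally(job_id: str, shared_inputs: Iterable[str],
                         local_base: str = "/tmp") -> str:
    """Create a per-job directory under local_base and copy inputs into it.

    Hypothetical helper illustrating the pattern; returns the job directory.
    """
    # A unique job ID keeps concurrent jobs on the same node from colliding;
    # exist_ok=False makes a reused ID fail loudly instead of mixing files.
    job_dir = os.path.join(local_base, f"swift-job-{job_id}")
    os.makedirs(job_dir, exist_ok=False)
    for src in shared_inputs:
        # cp instead of ln -s: later reads hit local disk, not the shared fs.
        shutil.copy2(src, os.path.join(job_dir, os.path.basename(src)))
    return job_dir
```

Raising on a reused job ID mirrors the idea of leaning on unique IDs for isolation; cleanup of the directory after the job is left out of this sketch.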
>>>>> Attached are three patches that will apply against svn r1775:
>>>>>
>>>>> The first puts temporary directories in /tmp rather than on the shared filesystem.
>>>>> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp
>>>>>
>>>>> The second copies the application file to the worker node on each job
>>>>> execution (though it doesn't cache the file on the worker node between jobs).
>>>>> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable
>>>>>
>>>>> The third creates the worker node log on /tmp and copies it at the end.
>>>>> http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally
>>>>>
>>>>> All three modify wrapper.sh and should be applied in the above order.
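The third patch's idea (write the worker-node log on local /tmp and copy it to shared space once at the end) can be sketched in Python as below. This is an illustration under assumptions, not the wrapper.sh patch: `run_with_local_log`, the `steps` list and the `<epoch-seconds> START|END <stage>` line format are all hypothetical.

```python
import os
import shutil
import time
from typing import Callable, Iterable, Tuple


def run_with_local_log(job_id: str, local_base: str, shared_log_dir: str,
                       steps: Iterable[Tuple[str, Callable[[], None]]]) -> str:
    """Run named steps, logging to local disk; copy the log to shared space at the end."""
    local_log = os.path.join(local_base, f"{job_id}.log")
    shared_log = os.path.join(shared_log_dir, f"{job_id}.log")
    try:
        # Frequent small writes go to node-local storage only.
        with open(local_log, "w") as log:
            for name, step in steps:
                log.write(f"{time.time():.3f} START {name}\n")
                step()
                log.write(f"{time.time():.3f} END {name}\n")
    finally:
        # One copy per job to the shared filesystem, even if a step raised.
        if os.path.exists(local_log):
            os.makedirs(shared_log_dir, exist_ok=True)
            shutil.copy(local_log, shared_log)
    return shared_log
```

Putting the copy in `finally` means a failing job still leaves its partial log on shared storage, which is what makes the log useful for debugging failures.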
>>>>>
>>>>> With the first two patches, the timestamps in the usual info logs will
>>>>> provide information about how long the copies take, in the same way that
>>>>> they usually indicate times for other execution stages.
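Getting copy durations from the timestamps means pairing each stage's start and end records. A minimal Python sketch, assuming a hypothetical `<epoch-seconds> START|END <stage>` line format; the real -info log format may differ, so the parsing would need adapting.

```python
from typing import Dict, Iterable


def stage_durations(lines: Iterable[str]) -> Dict[str, float]:
    """Pair START/END records per stage and return seconds spent in each."""
    starts: Dict[str, float] = {}
    durations: Dict[str, float] = {}
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or unrelated lines
        ts, event, stage = parts
        if event == "START":
            starts[stage] = float(ts)
        elif event == "END" and stage in starts:
            durations[stage] = float(ts) - starts.pop(stage)
    return durations
```

For example, lines `100.0 START copy-executable`, `100.5 END copy-executable`, `100.5 START run` and `164.0 END run` give 0.5 s for the copy and 63.5 s for the run.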
>>>>>