[Swift-devel] Clustering and Temp Dirs with Swift

Ioan Raicu iraicu at cs.uchicago.edu
Sun Oct 28 19:51:00 CDT 2007


At the Microsoft workshop I just attended, someone had a 25-million-task 
application that dealt with AIDS research :)

Mihael Hategan wrote:
>   
>> Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range
>> of values:
>>
>> 00000000000000/angle4-00000000000001-kickstart.xml
>>     
>
> Although that's silly. We'll never have more than 10 million jobs of a
> kind (pretty much like 640K should be enough for everybody).
>
>   
>>     
>>> 000000/angle4-00000002-kickstart.xml
>>> ...
>>> 000000/angle4-00000099-kickstart.xml
>>> ...
>>> 000020/angle4-00002076-kickstart.xml
>>> etc.
>>>
>>> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other 
>>> splits can be done with mod() functions.
>>>
>>> Can we start heading in this or some similar direction?
>>>
>>> We need to coordinate a plan for this, I suspect, to make Andrew's 
>>> workflows perform acceptably.
>>>
>>> - Mike
>>>
>>>
>>>
>>> On 10/27/07 2:08 PM, Ben Clifford wrote:
>>>       
>>>> On Sat, 27 Oct 2007, Mihael Hategan wrote:
>>>>
>>>>         
>>>>> Quickly before I leave the house:
>>>>> Perhaps we could try copying to local FS instead of linking from shared
>>>>> dir and hence running the jobs on the local FS.
>>>>>           
>>>> Maybe. I'd be suspicious that doesn't reduce access to the directory too 
>>>> much.
>>>>
>>>> I think the directories where there are lots of files being read/written 
>>>> by lots of hosts are:
>>>>
>>>> the top directory (one job directory per job)
>>>> the info directory
>>>> the kickstart directory
>>>> the file cache
>>>>
>>>> In the case where directories get too many files in them because of 
>>>> directory size constraints, it's common to split that directory into many 
>>>> smaller directories (e.g. how squid caching, or git object storage works). 
>>>> E.g., given a file fubar.txt, store it in fu/fubar.txt, with 'fu' being some 
>>>> short hash of the filename (with the hash here being 'extract the first 
>>>> two characters').
>>>>
>>>> Pretty much I think Andrew wanted to do that for his data files anyway, 
>>>> which would then reflect in the layout of the data cache directory 
>>>> structure.
>>>>
>>>> For job directories, it may not be too hard to split the big directories 
>>>> into smaller ones. There will still be write-lock conflicts, but this 
>>>> might mean the contention for each directory's write-lock is lower.
>>>>
>>>>         
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>>     

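The two layouts quoted above (numbered buckets such as 
000020/angle4-00002076-kickstart.xml, and Ben's fu/fubar.txt prefix split) 
could be sketched roughly as below. This is only an illustration, not Swift 
code: the function names are made up, and the bucket size of 100 jobs per 
directory is an assumption inferred from the example paths in the thread.

```python
# Hypothetical helpers illustrating the two directory-splitting schemes
# discussed in this thread. None of these names come from Swift itself.

def bucketed_path(job_id, name="angle4", suffix="kickstart.xml", per_dir=100):
    """Place a job's file in a zero-padded numeric bucket directory.

    per_dir=100 is an assumption that matches the quoted examples
    (job 2076 lands in bucket 000020); any power of 10 works the same way.
    """
    bucket = job_id // per_dir          # integer division picks the bucket
    return f"{bucket:06d}/{name}-{job_id:08d}-{suffix}"


def prefix_path(filename, width=2):
    """Place a file in a directory named after its first `width` characters,
    i.e. the trivial 'hash' from the fubar.txt example."""
    return f"{filename[:width]}/{filename}"


if __name__ == "__main__":
    for job in (2, 99, 2076):
        print(bucketed_path(job))
    print(prefix_path("fubar.txt"))
```

A mod()-based split, as Mike mentions, would replace job_id // per_dir with 
job_id % n_dirs, which spreads consecutive jobs across directories instead of 
grouping them together.
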
-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
