[Swift-devel] Clustering and Temp Dirs with Swift

Ioan Raicu iraicu at cs.uchicago.edu
Sun Oct 28 22:37:40 CDT 2007


I remember the guy who gave the talk, so when they send out the slides, 
I can point you to the exact source.  In the meantime, from what I 
remember, it was an app that ran over a Microsoft Windows Cluster 
Edition with 300 processors, and the application completed in some 24 
hours (~1 sec / job).  That is an average throughput of roughly 300 jobs/sec, 
pretty impressive.  Now, I don't know if the app was using any workflow 
system, or if it was simply an app that could talk to a cluster to 
submit jobs.  I'll try to find out more details on this, as I think it 
would be great to be able to compare even with Falkon at some level.
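
As a rough check on that throughput figure, here is a minimal sketch of the arithmetic. It assumes the 25 million task count quoted further down the thread applies to this run, and that "some 24 hours" means exactly 24; both numbers are recalled from memory, so the result is only approximate. The function names are made up for illustration.

```python
# Back-of-the-envelope check. All inputs are approximate figures from the
# thread, not measured values.
TOTAL_TASKS = 25_000_000   # task count quoted later in the thread
WALL_CLOCK_HOURS = 24      # "some 24 hours", assumed to be exactly 24
PROCESSORS = 300


def average_throughput(tasks, hours):
    """Jobs completed per second over the whole run."""
    return tasks / (hours * 3600)


def mean_task_seconds(tasks, hours, processors):
    """Average per-job time if every processor stays busy for the whole run."""
    return processors * hours * 3600 / tasks


throughput = average_throughput(TOTAL_TASKS, WALL_CLOCK_HOURS)
per_task = mean_task_seconds(TOTAL_TASKS, WALL_CLOCK_HOURS, PROCESSORS)
print(f"{throughput:.0f} jobs/sec, {per_task:.2f} sec per job")
```

Under those assumptions the average comes out just under 300 jobs/sec, with a bit over one second per job on each processor.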

Ioan

Mihael Hategan wrote:
> On Sun, 2007-10-28 at 19:51 -0500, Ioan Raicu wrote:
>   
>> At the Microsoft workshop I just attended, someone had a 25 million
>> task application that dealt with AIDS research :)
>>     
>
> :)
>
> We might also get there at some undetermined point in the future.
> Luckily we can easily change the scheme at that time without causing too
> much trouble.
>
> Do you know the name of the system? It may be very useful to learn how
> they do it, and what problems they have hit.
>
>   
>> Mihael Hategan wrote: 
>>     
>>>> Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range
>>>> of values:
>>>>
>>>> 00000000000000/angle4-00000000000001-kickstart.xml
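
The digit estimate above can be checked directly. This is a small sketch that takes the base 37 and length 9 from the quoted message as given (the naming scheme they describe is not shown in this excerpt) and reads lg as log base 10.

```python
import math

BASE = 37    # alphabet size, taken from the quoted message
LENGTH = 9   # identifier length, taken from the quoted message

distinct_values = BASE ** LENGTH            # how many identifiers the scheme covers
approx_digits = LENGTH * math.log10(BASE)   # the lg(37^9) estimate
exact_digits = len(str(distinct_values - 1))  # decimal digits for the largest value

print(distinct_values, round(approx_digits, 2), exact_digits)
```

The logarithm is a little over 14, so covering every value strictly takes 15 decimal digits; 14 digits covers most of the range, which is consistent with "about 14".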
>>>>     
>>>>         
>>> Although that's silly. We'll never have more than 10 million jobs of a
>>> kind (pretty much like 640K should be enough for everybody).
>>>
>>>   
>>>       
>>>>> 000000/angle4-00000002-kickstart.xml
>>>>> ...
>>>>> 000000/angle4-00000099-kickstart.xml
>>>>> ...
>>>>> 000020/angle4-00002076-kickstart.xml
>>>>> etc.
>>>>>
>>>>> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other 
>>>>> splits can be done with mod() functions.
>>>>>
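
A minimal sketch of the numbered layout shown above, inferred only from the three example paths: the directory is assumed to be the job number divided by 100, zero-padded to six digits, and the job number is zero-padded to eight. The function names, field widths, and bucket size are assumptions for illustration, not Swift's actual naming code.

```python
def bucketed_path(job_number, prefix="angle4", suffix="kickstart.xml",
                  bucket_size=100, dir_width=6, job_width=8):
    """Return 'BUCKET/prefix-JOBNUMBER-suffix' with zero-padded fields."""
    # Power-of-10 split: integer division drops the low-order digits.
    bucket = job_number // bucket_size
    return f"{bucket:0{dir_width}d}/{prefix}-{job_number:0{job_width}d}-{suffix}"


def mod_bucket(job_number, n_buckets):
    """Alternative split: spread consecutive job numbers across n_buckets."""
    return job_number % n_buckets


for n in (2, 99, 2076):
    print(bucketed_path(n))
```

With a bucket size of 100, no directory holds more than 100 entries per file kind. The mod() variant instead spreads consecutive jobs over a fixed number of directories, which may matter when jobs that start together would otherwise all write into the same directory.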
>>>>> Can we start heading in this or some similar direction?
>>>>>
>>>>> We need to coordinate a plan for this, I suspect, to make Andrew's 
>>>>> workflows perform acceptably.
>>>>>
>>>>> - Mike
>>>>>
>>>>>
>>>>>
>>>>> On 10/27/07 2:08 PM, Ben Clifford wrote:
>>>>>       
>>>>>           
>>>>>> On Sat, 27 Oct 2007, Mihael Hategan wrote:
>>>>>>
>>>>>>         
>>>>>>             
>>>>>>> Quickly before I leave the house:
>>>>>>> Perhaps we could try copying to the local FS instead of linking from the
>>>>>>> shared dir, and hence run the jobs on the local FS.
>>>>>>>           
>>>>>>>               
>>>>>> Maybe. I suspect that wouldn't reduce access to the directory by 
>>>>>> much.
>>>>>>
>>>>>> I think the directories where there are lots of files being read/written 
>>>>>> by lots of hosts are:
>>>>>>
>>>>>> the top directory (one job directory per job)
>>>>>> the info directory
>>>>>> the kickstart directory
>>>>>> the file cache
>>>>>>
>>>>>> In the case where directories get too many files in them because of 
>>>>>> directory size constraints, it's common to split that directory into many 
>>>>>> smaller directories (e.g. how squid caching or git object storage works). 
>>>>>> E.g., given a file fubar.txt, store it in fu/fubar.txt, with 'fu' being some 
>>>>>> short hash of the filename (with the hash here being 'extract the first 
>>>>>> two characters').
>>>>>>
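
The fu/fubar.txt idea can be sketched in a few lines. The first function is the "first two characters" hash described above. The second is a swapped-in variant, not something proposed in this thread: it takes the subdirectory from a SHA-1 digest of the name, similar in spirit to how git names its object directories. Both function names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def prefix_shard(filename, width=2):
    """Shard by the first `width` characters of the name: fubar.txt -> fu/fubar.txt."""
    return str(PurePosixPath(filename[:width]) / filename)


def digest_shard(filename, width=2):
    """Shard by the first `width` hex characters of the name's SHA-1 digest."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return str(PurePosixPath(digest[:width]) / filename)


print(prefix_shard("fubar.txt"))
print(digest_shard("fubar.txt"))
```

One caveat with the plain prefix: files that share a common stem, such as the angle4-... names elsewhere in this thread, would all land in the same subdirectory, so a digest or a numeric split is likely to spread them more evenly.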
>>>>>> Pretty much I think Andrew wanted to do that for his data files anyway, 
>>>>>> which would then reflect in the layout of the data cache directory 
>>>>>> structure.
>>>>>>
>>>>>> For job directories, it may not be too hard to split the big directories 
>>>>>> into smaller ones. There will still be write-lock conflicts, but this 
>>>>>> might mean the contention for each directory's write-lock is lower.
>>>>>>
>>>>>>         
>>>>>>             
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>
>>>>     
>>>>         
>> -- 
>> ============================================
>> Ioan Raicu
>> Ph.D. Student
>> ============================================
>> Distributed Systems Laboratory
>> Computer Science Department
>> University of Chicago
>> 1100 E. 58th Street, Ryerson Hall
>> Chicago, IL 60637
>> ============================================
>> Email: iraicu at cs.uchicago.edu
>> Web:   http://www.cs.uchicago.edu/~iraicu
>>        http://dsl.cs.uchicago.edu/
>> ============================================
>> ============================================
>>     
>
>
>   

