[Swift-devel] Clustering and Temp Dirs with Swift
Ioan Raicu
iraicu at cs.uchicago.edu
Sun Oct 28 22:37:40 CDT 2007
I remember the guy who gave the talk, so when they send out the slides,
I can point you to the exact source. In the meantime, from what I
remember, it was an app that ran over a Microsoft Windows Cluster
Edition with 300 processors, and the application completed in some 24
hours (~1 sec / job). With 25 million tasks, that is an average
throughput of roughly 290 jobs/sec,
pretty impressive. Now, I don't know if the app was using any workflow
system, or if it was simply an app that could talk to a cluster to
submit jobs. I'll try to find out more details on this, as I think it
would be great to be able to compare even with Falkon at some level.
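
As a quick sanity check on that throughput figure, here is a small Python
sketch. It assumes exactly 25 million tasks and exactly 24 hours of wall
time, both of which are only approximate from my recollection; the
variable names are just for illustration.

```python
# Back-of-the-envelope throughput check (all inputs are approximate assumptions).
tasks = 25_000_000            # "25 million task application"
wall_seconds = 24 * 60 * 60   # "some 24 hours"
processors = 300

jobs_per_sec = tasks / wall_seconds
# Implied average job length if all processors stay busy the whole time.
avg_job_seconds = processors / jobs_per_sec

print(f"{jobs_per_sec:.0f} jobs/sec, ~{avg_job_seconds:.2f} s per job")
```

Under those assumptions the rate comes out just under 300 jobs/sec, which
is consistent with about one second per job on 300 processors.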
Ioan
Mihael Hategan wrote:
> On Sun, 2007-10-28 at 19:51 -0500, Ioan Raicu wrote:
>
>> At the Microsoft workshop I just attended, someone had a 25 million
>> task application that dealt with AIDS research :)
>>
>
> :)
>
> We might also get there at some undetermined point in the future.
> Luckily we can easily change the scheme at that time without causing too
> much trouble.
>
> Do you know the name of the system? It may be very useful to learn how
> they do it, and what problems they have hit.
>
>
>> Mihael Hategan wrote:
>>
>>>> Well log10(37^9) =~ 14, so you need about 14 digits to cover the same range
>>>> of values:
>>>>
>>>> 00000000000000/angle4-00000000000001-kickstart.xml
>>>>
>>>>
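
To make the digit estimate concrete, here is a short sketch of the
arithmetic for the 37^9 range mentioned above. It is plain Python, not
Swift code, and the variable names are hypothetical.

```python
import math

id_space = 37 ** 9                       # size of the range being compared against
approx_digits = math.log10(id_space)     # base-10 logarithm, a little over 14
exact_digits = len(str(id_space - 1))    # digits needed to write the largest value

print(id_space, round(approx_digits, 2), exact_digits)
```

Since the logarithm is slightly above 14, a 14-digit field covers 10^14
values, which is a bit less than 37^9; covering the whole range exactly
would take 15 digits.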
>>> Although that's silly. We'll never have more than 10 million jobs of a
>>> kind (pretty much like 640K should be enough for everybody).
>>>
>>>
>>>
>>>>> 000000/angle4-00000002-kickstart.xml
>>>>> ...
>>>>> 000000/angle4-00000099-kickstart.xml
>>>>> ...
>>>>> 000020/angle4-00002076-kickstart.xml
>>>>> etc.
>>>>>
>>>>> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other
>>>>> splits can be done with mod() functions.
>>>>>
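
The numbering scheme above could be generated along these lines. This is
only a sketch: the function names, the bucket size of 100 jobs per
directory, the 64-directory mod() example, and the 6- and 8-digit field
widths are assumptions inferred from the example paths, not taken from
Swift's code.

```python
def kickstart_path(prefix: str, job_number: int, jobs_per_dir: int = 100) -> str:
    """Place a job's kickstart record in a zero-padded bucket directory."""
    bucket = job_number // jobs_per_dir   # split on a power of 10
    return f"{bucket:06d}/{prefix}-{job_number:08d}-kickstart.xml"


def mod_bucket(job_number: int, n_dirs: int = 64) -> str:
    """Alternative split using mod(): consecutive jobs land in different directories."""
    return f"{job_number % n_dirs:06d}"


for n in (2, 99, 2076):
    print(kickstart_path("angle4", n))
```

With the integer-division split, consecutive jobs share a directory; with
the mod() split they are spread round-robin, which may matter if jobs with
adjacent numbers tend to run at the same time.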
>>>>> Can we start heading in this or some similar direction?
>>>>>
>>>>> We need to coordinate a plan for this, I suspect, to make Andrew's
>>>>> workflows perform acceptably.
>>>>>
>>>>> - Mike
>>>>>
>>>>>
>>>>>
>>>>> On 10/27/07 2:08 PM, Ben Clifford wrote:
>>>>>
>>>>>
>>>>>> On Sat, 27 Oct 2007, Mihael Hategan wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Quickly before I leave the house:
>>>>>>> Perhaps we could try copying to local FS instead of linking from shared
>>>>>>> dir and hence running the jobs on the local FS.
>>>>>>>
>>>>>>>
>>>>>> Maybe. I'd be suspicious that doesn't reduce access to the directory too
>>>>>> much.
>>>>>>
>>>>>> I think the directories where there are lots of files being read/written
>>>>>> by lots of hosts are:
>>>>>>
>>>>>> the top directory (one job directory per job)
>>>>>> the info directory
>>>>>> the kickstart directory
>>>>>> the file cache
>>>>>>
>>>>>> In the case where directories get too many files in them because of
>>>>>> directory size constraints, it's common to split that directory into many
>>>>>> smaller directories (e.g. how squid caching or git object storage works).
>>>>>> E.g., given a file fubar.txt, store it in fu/fubar.txt, with 'fu' being some
>>>>>> short hash of the filename (with the hash here being 'extract the first
>>>>>> two characters').
>>>>>>
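
A minimal sketch of that directory-splitting idea, assuming the "first two
characters" rule from the fubar.txt example. The second function is a
hypothetical variant that takes the prefix from a real hash (SHA-1 via
Python's hashlib); it is an illustration only, not something proposed in
this thread.

```python
import hashlib
from pathlib import PurePosixPath


def prefix_split(filename: str, width: int = 2) -> PurePosixPath:
    """Use the first `width` characters of the name as the subdirectory."""
    return PurePosixPath(filename[:width]) / filename


def hashed_split(filename: str, width: int = 2) -> PurePosixPath:
    """Hypothetical variant: use the first hex characters of a SHA-1 digest."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return PurePosixPath(digest[:width]) / filename


print(prefix_split("fubar.txt"))   # fu/fubar.txt
print(hashed_split("fubar.txt"))
```

The plain prefix rule puts every file that starts with the same two
characters into one directory, so names that all share a prefix would not
be spread out at all; a digest-based prefix avoids that at the cost of
less readable directory names.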
>>>>>> Pretty much I think Andrew wanted to do that for his data files anyway,
>>>>>> which would then reflect in the layout of the data cache directory
>>>>>> structure.
>>>>>>
>>>>>> For job directories, it may not be too hard to split the big directories
>>>>>> into smaller ones. There will still be write-lock conflicts, but this
>>>>>> might mean the contention for each directory's write-lock is lower.
>>>>>>
>>>>>>
>>>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>
>>>>
>>>>
>> --
>> ============================================
>> Ioan Raicu
>> Ph.D. Student
>> ============================================
>> Distributed Systems Laboratory
>> Computer Science Department
>> University of Chicago
>> 1100 E. 58th Street, Ryerson Hall
>> Chicago, IL 60637
>> ============================================
>> Email: iraicu at cs.uchicago.edu
>> Web: http://www.cs.uchicago.edu/~iraicu
>> http://dsl.cs.uchicago.edu/
>> ============================================
>> ============================================
>>
>
>
>