[Swift-devel] Clustering and Temp Dirs with Swift

Mihael Hategan hategan at mcs.anl.gov
Sun Oct 28 17:54:10 CDT 2007


The odds of that seem low, indeed :)

On Sun, 2007-10-28 at 17:47 -0500, Michael Wilde wrote:
> We're gonna have a pretty serious party when we complete our first 
> 10M-job workflow.  I look forward to this problem !!!
> 
> :) Mike
> 
> 
> On 10/28/07 5:34 PM, Mihael Hategan wrote:
> > 
> >> Well log10(37^9) =~ 14, so you need about 14 digits to cover the same range
> >> of values:
> >>
> >> 00000000000000/angle4-00000000000001-kickstart.xml
> > 
> > Although that's silly. We'll never have more than 10 million jobs of a
> > kind (pretty much like 640K should be enough for everybody).
> > 
> >>
> >>> 000000/angle4-00000002-kickstart.xml
> >>> ...
> >>> 000000/angle4-00000099-kickstart.xml
> >>> ...
> >>> 000020/angle4-00002076-kickstart.xml
> >>> etc.
> >>>
> >>> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other 
> >>> splits can be done with mod() functions.
> >>>
> >>> Can we start heading in this or some similar direction?
> >>>
> >>> We need to coordinate a plan for this, I suspect, to make Andrew's 
> >>> workflows perform acceptably.
> >>>
> >>> - Mike
> >>>
> >>>
> >>>
> >>> On 10/27/07 2:08 PM, Ben Clifford wrote:
> >>>> On Sat, 27 Oct 2007, Mihael Hategan wrote:
> >>>>
> >>>>> Quickly before I leave the house:
> >>>>> Perhaps we could try copying to local FS instead of linking from shared
> >>>>> dir and hence running the jobs on the local FS.
> >>>> Maybe. I'd be suspicious that doesn't reduce access to the directory too 
> >>>> much.
> >>>>
> >>>> I think the directories where there are lots of files being read/written 
> >>>> by lots of hosts are:
> >>>>
> >>>> the top directory (one job directory per job)
> >>>> the info directory
> >>>> the kickstart directory
> >>>> the file cache
> >>>>
> >>>> In the case where directories get too many files in them because of 
> >>>> directory size constraints, it's common to split that directory into many 
> >>>> smaller directories (e.g. how squid caching, or git object storage works). 
> >>>> E.g., given a file fubar.txt, store it in fu/fubar.txt, with 'fu' being some 
> >>>> short hash of the filename (with the hash here being 'extract the first 
> >>>> two characters').
> >>>>
> >>>> Pretty much I think Andrew wanted to do that for his data files anyway, 
> >>>> which would then reflect in the layout of the data cache directory 
> >>>> structure.
> >>>>
> >>>> For job directories, it may not be too hard to split the big directories 
> >>>> into smaller ones. There will still be write-lock conflicts, but this 
> >>>> might mean the contention for each directory's write-lock is lower.
> >>>>
> >>
> > 
> > 
> 
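
The two directory-splitting schemes discussed in the thread (a short
filename prefix used as the subdirectory, and zero-padded job numbers
bucketed by a power of 10) can be sketched roughly as follows. This is
only an illustration, not Swift's actual layout code: the function names,
the 100-jobs-per-directory bucket size, and the 6- and 8-digit widths are
assumptions read off the example paths quoted above.

```python
import posixpath


def shard_by_prefix(filename, width=2):
    """Place a file under a directory named after its first `width` characters.

    Hypothetical helper; 'extract the first two characters' is the
    trivial hash mentioned in the thread.
    """
    return posixpath.join(filename[:width], filename)


def shard_by_number(prefix, job_id, suffix, per_dir=100,
                    dir_digits=6, id_digits=8):
    """Place a numbered job file in a zero-padded bucket directory.

    Assumed layout: bucket = job_id // per_dir, so with per_dir=100 the
    directory name is the job number with its last two digits dropped.
    """
    bucket = job_id // per_dir
    name = f"{prefix}-{job_id:0{id_digits}d}-{suffix}"
    return posixpath.join(f"{bucket:0{dir_digits}d}", name)


def bucket_by_mod(job_id, n_dirs, dir_digits=6):
    """Alternative split using mod(), spreading consecutive jobs across dirs."""
    return f"{job_id % n_dirs:0{dir_digits}d}"


print(shard_by_prefix("fubar.txt"))
# fu/fubar.txt
print(shard_by_number("angle4", 2076, "kickstart.xml"))
# 000020/angle4-00002076-kickstart.xml
```

With the integer-division split, consecutive jobs land in the same
directory; the mod() variant instead spreads them out, which may matter
if jobs that run at the same time tend to have adjacent numbers.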

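The digit-count estimate quoted above can be checked with a few lines of
arithmetic. This is a sketch only; the helper name is made up for
illustration. It shows that 9 * log10(37) is a little over 14, so 14
digits is a fair approximation, although covering every one of the 37^9
values strictly takes 15 decimal digits.

```python
import math


def decimal_digits_needed(alphabet_size, length):
    """Decimal digits needed to number every string of `length` symbols
    drawn from an alphabet of `alphabet_size` symbols, counting from 0."""
    count = alphabet_size ** length
    return len(str(count - 1))


approx = 9 * math.log10(37)                # the estimate quoted in the thread
exact = decimal_digits_needed(37, 9)       # digits to cover all 37^9 values
ten_million = decimal_digits_needed(10, 7) # digits for 10 million job numbers

print(round(approx, 2), exact, ten_million)
```

For the "never more than 10 million jobs" case, 7 digits already suffice,
so the 8-digit job numbers in the example paths leave one digit to spare.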