[Swift-devel] Clustering and Temp Dirs with Swift
Mihael Hategan
hategan at mcs.anl.gov
Sun Oct 28 17:34:06 CDT 2007
> Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range
> of values:
>
> 00000000000000/angle4-00000000000001-kickstart.xml
Although that's silly. We'll never have more than 10 million jobs of a
kind (pretty much like 640K should be enough for everybody).
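
For what it's worth, the digit count quoted above is easy to check. A quick sketch in Python (not Swift code; the variable names are only for illustration):

```python
import math

# Size of the ID space mentioned above: 37^9 distinct values.
id_space = 37 ** 9

# Base-10 logarithm, i.e. the "lg" figure quoted above.
log10_size = math.log10(id_space)

# Decimal digits needed to write the largest value, id_space - 1.
digits_needed = len(str(id_space - 1))

print(f"log10(37^9) = {log10_size:.2f}")
print(f"digits needed = {digits_needed}")
```

The logarithm comes out just above 14, so "about 14" holds; covering every value strictly takes 15 decimal digits.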
>
>
> > 000000/angle4-00000002-kickstart.xml
> > ...
> > 000000/angle4-00000099-kickstart.xml
> > ...
> > 000020/angle4-00002076-kickstart.xml
> > etc.
> >
> > This makes splitting based on powers of 10 (or 26 or 36) trivial. Other
> > splits can be done with mod() functions.
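
As a rough sketch of the power-of-10 split, here is one way to map a job number to paths like the ones listed above. The helper names `kickstart_path` and `mod_bucket`, the 100-jobs-per-directory figure, and the field widths are read off the example paths, not taken from Swift itself:

```python
def kickstart_path(job: int, prefix: str = "angle4", per_dir: int = 100) -> str:
    """Return '<dir>/<prefix>-<job>-kickstart.xml' for a job number.

    Assumption: the directory index is job // per_dir, zero-padded to
    6 digits, and the job number is zero-padded to 8 digits, which is
    what the example paths in the thread suggest.
    """
    directory = job // per_dir
    return f"{directory:06d}/{prefix}-{job:08d}-kickstart.xml"


def mod_bucket(job: int, buckets: int) -> int:
    """Alternative split: spread jobs over a fixed number of buckets with mod."""
    return job % buckets


for n in (2, 99, 2076):
    print(kickstart_path(n))
```

With these assumptions, jobs 2, 99 and 2076 land on the three paths shown in the listing. Changing `per_dir` to another power of 10 only changes how many trailing digits are dropped to form the directory name.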
> >
> > Can we start heading in this or some similar direction?
> >
> > We need to coordinate a plan for this, I suspect, to make Andrew's
> > workflows perform acceptably.
> >
> > - Mike
> >
> >
> >
> > On 10/27/07 2:08 PM, Ben Clifford wrote:
> > >
> > > On Sat, 27 Oct 2007, Mihael Hategan wrote:
> > >
> > >> Quickly before I leave the house:
> > >> Perhaps we could try copying to local FS instead of linking from shared
> > >> dir and hence running the jobs on the local FS.
> > >
> > > Maybe. I'd be suspicious that doesn't reduce access to the directory too
> > > much.
> > >
> > > I think the directories where there are lots of files being read/written
> > > by lots of hosts are:
> > >
> > > the top directory (one job directory per job)
> > > the info directory
> > > the kickstart directory
> > > the file cache
> > >
> > > In the case where directories get too many files in them because of
> > > directory size constraints, it's common to split that directory into many
> > > smaller directories (e.g. how squid caching or git object storage works).
> > > E.g., given a file fubar.txt, store it in fu/fubar.txt, with 'fu' being some
> > > short hash of the filename (with the hash here being 'extract the first
> > > two characters').
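
A minimal sketch of that layout, with the "hash" being the first two characters of the filename as in the fubar.txt example. The function names `prefix_path` and `hashed_path` are made up for illustration, and the MD5 variant is only one possible choice of hash:

```python
import hashlib


def prefix_path(filename: str, width: int = 2) -> str:
    """Store filename under a directory named after its first `width` characters."""
    return f"{filename[:width]}/{filename}"


def hashed_path(filename: str, width: int = 2) -> str:
    """Same idea, but the directory name is a short hex prefix of an MD5 digest.

    This spreads files more evenly when many names share a leading string.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return f"{digest[:width]}/{filename}"


print(prefix_path("fubar.txt"))  # fu/fubar.txt
print(hashed_path("fubar.txt"))
```

The first-two-characters version is easy to inspect by eye, but names that all begin the same way (for example every angle4-... file) would end up in a single directory, which is where a real hash helps.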
> > >
> > > Pretty much I think Andrew wanted to do that for his data files anyway,
> > > which would then reflect in the layout of the data cache directory
> > > structure.
> > >
> > > For job directories, it may not be too hard to split the big directories
> > > into smaller ones. There will still be write-lock conflicts, but this
> > > might mean the contention for each directory's write lock is lower.
> > >
> >
>