[Swift-devel] Clustering and Temp Dirs with Swift
Mihael Hategan
hategan at mcs.anl.gov
Sun Oct 28 17:53:04 CDT 2007
On Sun, 2007-10-28 at 17:46 -0500, Michael Wilde wrote:
> Workflow IDs don't need to be unique outside of a user or group.
They should.
>
> I'm happy to call my runs angle001, angle002, cnari001, etc.
You're forgetting I2U2, which will likely have lots of workflows
running. I think the workflow IDs should stay as they are.
>
> Having said all that, I don't have strong feelings on it at this point,
> except to note that the small easy numbers make it easier on *most*
> users, for a long time, till their needs outgrow smaller local ID spaces.
>
> I'd rather revisit UUID strategies again down the road when we hit that
> as a scalability problem, and keep simple things simpler for now.
>
> This will be much nicer for examples, tutorials, etc in addition to most
> normal usage.
>
> - Mike
>
> >
> >> it would be easy to eg put 100 files per
> >> dir by taking say the leftmost 6 characters and making that a dirname
> >> within which the rightmost 2 chars would vary:
> >
> > With alpha-numeric ones, it's fairly easy to put 37 files per dir.
> >
> > Anyway. It doesn't matter. Either way. The problem isn't what exact
> > numbering base we're using, but how exactly we put them in
> > subdirectories.
> >
> >> tlivaj/angle4-tlivajim-kickstart.xml
> >> tlivaj/angle4-tlivajin-kickstart.xml
> >> tlivaj/angle4-tlivajio-kickstart.xml
> >> tlivaj/angle4-tlivajip-kickstart.xml
> >> tlivaj/angle4-tlivajiq-kickstart.xml
> >>
> >> but easier on my eyes would be:
> >> 000000/angle4-00000001-kickstart.xml
> >
> > Well log10(37^9) =~ 14, so you need about 14 digits to cover the same range
> > of values:
> >
> > 00000000000000/angle4-00000000000001-kickstart.xml
> >
> >
> >> 000000/angle4-00000002-kickstart.xml
> >> ...
> >> 000000/angle4-00000099-kickstart.xml
> >> ...
> >> 000020/angle4-00002076-kickstart.xml
> >> etc.
> >>
> >> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other
> >> splits can be done with mod() functions.
> >>
> >> Can we start heading in this or some similar direction?
> >>
> >> We need to coordinate a plan for this, I suspect, to make Andrew's
> >> workflows perform acceptably.
> >>
> >> - Mike
> >>
> >>
> >>
> >> On 10/27/07 2:08 PM, Ben Clifford wrote:
> >>> On Sat, 27 Oct 2007, Mihael Hategan wrote:
> >>>
> >>>> Quickly before I leave the house:
> >>>> Perhaps we could try copying to local FS instead of linking from shared
> >>>> dir and hence running the jobs on the local FS.
> >>> Maybe. I'd be suspicious that doesn't reduce access to the directory too
> >>> much.
> >>>
> >>> I think the directories where there are lots of files being read/written
> >>> by lots of hosts are:
> >>>
> >>> the top directory (one job directory per job)
> >>> the info directory
> >>> the kickstart directory
> >>> the file cache
> >>>
> >>> In the case where directories get too many files in them because of
> >>> directory size constraints, it's common to split that directory into many
> >>> smaller directories (e.g. how squid caching, or git object storage works).
> >>> E.g., given a file fubar.txt, store it in fu/fubar.txt, with 'fu' being some
> >>> short hash of the filename (with the hash here being 'extract the first
> >>> two characters').
> >>>
> >>> Pretty much I think Andrew wanted to do that for his data files anyway,
> >>> which would then reflect in the layout of the data cache directory
> >>> structure.
> >>>
> >>> For job directories, it may not be too hard to split the big directories
> >>> into smaller ones. There will still be write-lock conflicts, but this
> >>> might mean the contention for each directory's write-lock is lower.
> >>>
> >
> >
>
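
The directory-splitting ideas in the quoted text above (leftmost characters
as the directory name, splitting sequential numbers on powers of 10, and
mod()-based splits) can be sketched roughly as follows. This is only an
illustration in Python, not how Swift lays out its directories: the function
names, the 8-character ID width, and the 100-files-per-directory figure are
assumptions taken from the examples in the thread.

```python
import math


def prefix_shard(job_id: str, prefix_len: int = 6) -> str:
    """Directory name = leftmost characters of the ID (the 'fu/fubar.txt' idea)."""
    return job_id[:prefix_len]


def numeric_shard(seq: int, per_dir: int = 100, width: int = 8) -> str:
    """Directory name for a sequential ID, with per_dir files per directory.

    With per_dir = 100 and 8-digit IDs this equals the leading 6 digits.
    """
    dir_width = width - len(str(per_dir - 1))
    return str(seq // per_dir).zfill(dir_width)


def mod_shard(seq: int, n_dirs: int = 100) -> str:
    """Alternative split: spread consecutive IDs across n_dirs directories."""
    return str(seq % n_dirs).zfill(len(str(n_dirs - 1)))


def kickstart_path(run: str, job_id: str, shard: str) -> str:
    """Build a path shaped like the examples in the thread (hypothetical helper)."""
    return f"{shard}/{run}-{job_id}-kickstart.xml"


def digits_needed(base: int, length: int) -> float:
    """Decimal digits needed to cover base**length values, i.e. log10(base**length)."""
    return length * math.log10(base)


if __name__ == "__main__":
    print(kickstart_path("angle4", "tlivajim", prefix_shard("tlivajim")))
    print(kickstart_path("angle4", "00002076", numeric_shard(2076)))
    print(kickstart_path("angle4", "00002076", mod_shard(2076)))
    print(digits_needed(37, 9))
```

One thing the sketch makes visible: with sequential IDs, the leading-digits
split sends consecutive jobs to the same directory, while the mod() split
sends them to different ones. Which of the two reduces write-lock contention
more would depend on how jobs are scheduled, and would need measuring.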