[Swift-devel] Clustering and Temp Dirs with Swift

Mihael Hategan hategan at mcs.anl.gov
Sun Oct 28 17:53:04 CDT 2007


On Sun, 2007-10-28 at 17:46 -0500, Michael Wilde wrote:
> Workflow IDs don't need to be unique outside of a user or group.

They should.

> 
> I'm happy to call my runs angle001, angle002, cnari001, etc.

You're forgetting I2U2, which will likely have lots of workflows
running. I think the workflow IDs should stay as they are.

> 
> Having said all that, I don't have strong feelings on it at this point, 
> except to note that the small easy numbers make it easier on *most* 
> users, for a long time, till their needs outgrow smaller local ID spaces.
> 
> I'd rather revisit UUID strategies again down the road when we hit that 
> as a scalability problem, and keep simple things simpler for now.
> 
> This will be much nicer for examples, tutorials, etc in addition to most 
> normal usage.
> 
> - Mike
> 
> > 
> >> it would be easy to eg put 100 files per 
> >> dir by taking say the leftmost 6 characters and making that a dirname 
> >> within which the rightmost 2 chars would vary:
> > 
> > With alpha-numeric ones, it's fairly easy to put 37 files per dir.
> > 
> > Anyway. It doesn't matter. Either way. The problem isn't what exact
> > numbering base we're using, but how exactly we put them in
> > subdirectories.
> > 
> >> tlivaj/angle4-tlivajim-kickstart.xml
> >> tlivaj/angle4-tlivajin-kickstart.xml
> >> tlivaj/angle4-tlivajio-kickstart.xml
> >> tlivaj/angle4-tlivajip-kickstart.xml
> >> tlivaj/angle4-tlivajiq-kickstart.xml
> >>
> >> but easier on my eyes would be:
> >> 000000/angle4-00000001-kickstart.xml
> > 
> > Well lg(37^9) =~ 14, so you need about 14 digits to cover the same range
> > of values:
> > 
> > 00000000000000/angle4-00000000000001-kickstart.xml
> > 
> > 
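As a rough check of the arithmetic above, here is a minimal Python sketch. The
function name is hypothetical, and the base (37) and length (9) are simply the
figures quoted in this message.

```python
import math

def decimal_digits_needed(base: int, length: int) -> int:
    """Digits needed to write every value in 0 .. base**length - 1 in decimal."""
    total = base ** length          # number of distinct IDs in the original scheme
    return len(str(total - 1))      # width of the largest value, in decimal digits

# log10 gives the "about 14" estimate; the integer count is the exact width.
estimate = math.log10(37 ** 9)
exact = decimal_digits_needed(37, 9)
print(f"log10(37^9) = {estimate:.2f}, exact decimal width = {exact}")
```

The logarithm is only an estimate; the integer version rounds up to the width
needed for the largest ID.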
> >> 000000/angle4-00000002-kickstart.xml
> >> ...
> >> 000000/angle4-00000099-kickstart.xml
> >> ...
> >> 000020/angle4-00002076-kickstart.xml
> >> etc.
> >>
> >> This makes splitting based on powers of 10 (or 26 or 36) trivial. Other 
> >> splits can be done with mod() functions.
> >>
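A minimal sketch of the numeric split described above, assuming zero-padded
8-digit job IDs and 100 files per directory as in the quoted example. The
function names, the `low_digits` parameter, and the 64-bucket default are
hypothetical and not anything Swift currently does.

```python
def kickstart_path(run: str, job_id: int, width: int = 8, low_digits: int = 2) -> str:
    """Place a kickstart record in a directory named by the leading digits of its ID.

    With low_digits=2, each directory holds up to 10**2 = 100 consecutive IDs.
    """
    id_str = f"{job_id:0{width}d}"
    dir_name = f"{job_id // 10 ** low_digits:0{width - low_digits}d}"
    return f"{dir_name}/{run}-{id_str}-kickstart.xml"

def mod_bucket(job_id: int, n_dirs: int = 64) -> str:
    """Alternative split: spread consecutive IDs round-robin over n_dirs directories."""
    return f"{job_id % n_dirs:03d}"

print(kickstart_path("angle4", 1))
print(kickstart_path("angle4", 2076))
print(mod_bucket(2076))
```

The power-of-10 split keeps consecutive jobs together in one directory, while
the mod() split scatters them, which may matter if the goal is to reduce
contention on any single directory.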
> >> Can we start heading in this or some similar direction?
> >>
> >> We need to coordinate a plan for this, I suspect, to make Andrew's 
> >> workflows perform acceptably.
> >>
> >> - Mike
> >>
> >>
> >>
> >> On 10/27/07 2:08 PM, Ben Clifford wrote:
> >>> On Sat, 27 Oct 2007, Mihael Hategan wrote:
> >>>
> >>>> Quickly before I leave the house:
> >>>> Perhaps we could try copying to local FS instead of linking from shared
> >>>> dir and hence running the jobs on the local FS.
> >>> Maybe. I'd be suspicious that doesn't reduce access to the directory too 
> >>> much.
> >>>
> >>> I think the directories where there are lots of files being read/written 
> >>> by lots of hosts are:
> >>>
> >>> the top directory (one job directory per job)
> >>> the info directory
> >>> the kickstart directory
> >>> the file cache
> >>>
> >>> In the case where directories get too many files in them because of 
> >>> directory size constraints, it's common to split that directory into many 
> >>> smaller directories (eg. how squid caching, or git object storage works). 
> >>> eg, given a file fubar.txt store it in fu/fubar.txt, with 'fu' being some 
> >>> short hash of the filename (with the hash here being 'extract the first 
> >>> two characters').
> >>>
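The fu/fubar.txt idea above could be sketched like this in Python. Both
function names are hypothetical. The SHA-1 variant is an assumption added for
comparison, in the spirit of hash-named object stores, and is not something
proposed in this thread.

```python
import hashlib
import posixpath

def shard_by_prefix(filename: str, prefix_len: int = 2) -> str:
    """'Hash' is just the first prefix_len characters of the name."""
    return posixpath.join(filename[:prefix_len], filename)

def shard_by_hash(filename: str, prefix_len: int = 2) -> str:
    """Use the leading hex digits of a SHA-1 of the name as the directory.

    This spreads names evenly even when many of them share a common prefix.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return posixpath.join(digest[:prefix_len], filename)

print(shard_by_prefix("fubar.txt"))
print(shard_by_hash("fubar.txt"))
```

The plain prefix is easier to read, but if most names start with the same
characters (a run name, say) they all land in one directory, which is where a
real hash would help.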
> >>> Pretty much I think Andrew wanted to do that for his data files anyway, 
> >>> which would then reflect in the layout of the data cache directory 
> >>> structure.
> >>>
> >>> For job directories, it may not be too hard to split the big directories 
> >>> into smaller ones. There will still be write-lock conflicts, but this 
> >>> might mean the contention for each directory's write-lock is lower.
> >>>
> > 
> > 
> 



