[Swift-devel] Clustering and Temp Dirs with Swift

Mihael Hategan hategan at mcs.anl.gov
Sat Oct 27 14:17:18 CDT 2007


On Sat, 2007-10-27 at 19:08 +0000, Ben Clifford wrote:
> 
> On Sat, 27 Oct 2007, Mihael Hategan wrote:
> 
> > Quickly before I leave the house:

Hmm. How naive.

> > Perhaps we could try copying to local FS instead of linking from shared
> > dir and hence running the jobs on the local FS.
> 
> Maybe. I'd be suspicious that that doesn't reduce access to the directory 
> very much.
> 
> I think the directories where there are lots of files being read/written 
> by lots of hosts are:
> 
> the top directory (one job directory per job)
> the info directory
> the kickstart directory
> the file cache
> 
> In the case where directories get too many files in them because of 
> directory size constraints, it's common to split that directory into many 
> smaller directories (e.g. as in squid caching or git object storage). 
> E.g., given a file fubar.txt, store it in fu/fubar.txt, with 'fu' being some 
> short hash of the filename (with the hash here being 'extract the first 
> two characters').
> 
> Pretty much I think Andrew wanted to do that for his data files anyway, 
> which would then reflect in the layout of the data cache directory 
> structure.
> 
> For job directories, it may not be too hard to split the big directories 
> into smaller ones. There will still be write-lock conflicts, but this 
> might mean the contention for each directory's write-lock is lower.

Right. Some of these are easy to avoid and some are harder.

The hash idea is brilliant. I think.
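A minimal sketch of the directory-splitting scheme quoted above, for illustration only. The function names (`bucket_prefix`, `hashed_bucket`, `sharded_path`, `store`) are hypothetical and not part of Swift. `bucket_prefix` follows the thread's example exactly (fubar.txt goes to fu/fubar.txt). `hashed_bucket` is an assumed alternative, not something proposed in the thread: it takes leading hex digits of an MD5 digest, which may spread names with a shared prefix (such as sequential job IDs) more evenly.

```python
import hashlib
from pathlib import Path


def bucket_prefix(filename: str) -> str:
    """Bucket as in the thread's example: the first two characters of the name."""
    return filename[:2]


def hashed_bucket(filename: str, width: int = 2) -> str:
    """Assumed alternative: the first `width` hex digits of the name's MD5 digest.

    With width=2 there are at most 256 buckets, independent of how the
    names themselves are distributed.
    """
    return hashlib.md5(filename.encode("utf-8")).hexdigest()[:width]


def sharded_path(root: str, filename: str, bucket=bucket_prefix) -> Path:
    """Map root/filename to root/<bucket>/filename."""
    return Path(root) / bucket(filename) / filename


def store(root: str, filename: str, data: bytes, bucket=bucket_prefix) -> Path:
    """Write data to its sharded location, creating the bucket directory if needed."""
    path = sharded_path(root, filename, bucket)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

For example, `sharded_path("cache", "fubar.txt")` yields `cache/fu/fubar.txt`. Each file create then takes the lock of its own bucket directory rather than one shared directory. Whether that actually relieves the contention discussed here depends on how the shared filesystem implements directory locking.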
