[Swift-devel] Clustering and Temp Dirs with Swift

Mihael Hategan hategan at mcs.anl.gov
Sun Oct 28 21:33:49 CDT 2007


On Sun, 2007-10-28 at 19:51 -0500, Ioan Raicu wrote:
> At the Microsoft workshop I just attended, someone had a 25 million
> task application that dealt with AIDS research :)

:)

We might also get there at some undetermined point in the future.
Luckily, we can change the scheme at that point without much trouble.

Do you know the name of the system? It may be very useful to learn how
they do it, and what problems they have hit.

> 
> Mihael Hategan wrote: 
> > > Well, log10(37^9) ≈ 14, so you need about 14 digits to cover the same range
> > > of values:
> > > 
> > > 00000000000000/angle4-00000000000001-kickstart.xml
> > >     
> > 
> > Although that's silly. We'll never have more than 10 million jobs of a
> > kind (pretty much like 640K should be enough for everybody).
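Here is a quick check of the digit estimate quoted above. It assumes, as the estimate appears to, names of 9 characters drawn from a 37-symbol alphabet; the excerpt does not say which symbols, and the helper name is hypothetical. log10(37^9) is about 14.11, so 14 decimal digits cover most of that range, but giving every value its own index takes 15.

```python
import math

# Assumption: the scheme being replaced used 9-character names drawn
# from a 37-symbol alphabet (this excerpt does not say which symbols).
ALPHABET_SIZE = 37
NAME_LENGTH = 9


def decimal_digits_needed(alphabet_size: int, length: int) -> int:
    """Smallest number of decimal digits that can give every name its own index."""
    total = alphabet_size ** length  # number of distinct names
    return len(str(total - 1))       # digits in the largest index (total - 1)


if __name__ == "__main__":
    print(math.log10(ALPHABET_SIZE ** NAME_LENGTH))           # about 14.11
    print(decimal_digits_needed(ALPHABET_SIZE, NAME_LENGTH))  # 15
```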
> > 
> >   
> > > > 000000/angle4-00000002-kickstart.xml
> > > > ...
> > > > 000000/angle4-00000099-kickstart.xml
> > > > ...
> > > > 000020/angle4-00002076-kickstart.xml
> > > > etc.
> > > > 
> > > > This makes splitting based on powers of 10 (or 26 or 36) trivial. Other 
> > > > splits can be done with mod() functions.
> > > > 
> > > > Can we start heading in this or some similar direction?
> > > > 
> > > > We need to coordinate a plan for this, I suspect, to make Andrew's 
> > > > workflows perform acceptably.
> > > > 
> > > > - Mike
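The layout in Mike's listing can be generated directly from a job's sequence number. The sketch below assumes jobs are numbered from 0 and that each directory holds 100 jobs, which matches the listing (job 2076 lands in 000020). The function names and constants are hypothetical, not part of Swift. The second function shows the mod()-based alternative Mike mentions.

```python
# Hypothetical helpers sketching the proposed layout; the field widths are
# read off the example listing (6-digit directory, 8-digit job number).
DIR_DIGITS = 6
JOB_DIGITS = 8
JOBS_PER_DIR = 100  # power-of-10 split: jobs 0-99 -> 000000, 2000-2099 -> 000020


def kickstart_path(tr_name: str, job_id: int) -> str:
    """Group consecutive jobs: directory = job_id // JOBS_PER_DIR."""
    directory = job_id // JOBS_PER_DIR
    return f"{directory:0{DIR_DIGITS}d}/{tr_name}-{job_id:0{JOB_DIGITS}d}-kickstart.xml"


def kickstart_path_mod(tr_name: str, job_id: int, buckets: int) -> str:
    """Mod-based split: a fixed number of directories, with consecutive
    jobs landing in different directories."""
    directory = job_id % buckets
    return f"{directory:0{DIR_DIGITS}d}/{tr_name}-{job_id:0{JOB_DIGITS}d}-kickstart.xml"


if __name__ == "__main__":
    print(kickstart_path("angle4", 2076))          # 000020/angle4-00002076-kickstart.xml
    print(kickstart_path_mod("angle4", 2076, 64))  # 000028/angle4-00002076-kickstart.xml
```

The division-based split keeps neighbouring jobs together and makes the directory obvious from the job number. The mod-based split caps the number of directories and sends consecutive (and probably concurrent) jobs to different directories, which may matter more for write-lock contention.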
> > > > 
> > > > 
> > > > 
> > > > On 10/27/07 2:08 PM, Ben Clifford wrote:
> > > >       
> > > > > On Sat, 27 Oct 2007, Mihael Hategan wrote:
> > > > > 
> > > > >         
> > > > > > Quickly before I leave the house:
> > > > > > Perhaps we could try copying to local FS instead of linking from shared
> > > > > > dir and hence running the jobs on the local FS.
> > > > > >           
> > > > > Maybe. I suspect that wouldn't reduce access to the directory very 
> > > > > much.
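Mihael's suggestion is to copy inputs onto the worker's local filesystem instead of linking to them from the shared directory. That could look roughly like the sketch below. It is an illustration under assumptions: the function name, the `local_root` parameter, and the per-job temporary directory are hypothetical and not taken from Swift's wrapper script.

```python
import os
import shutil
import tempfile


def stage_to_local(shared_inputs, local_root=None):
    """Copy each input from the shared filesystem into a fresh scratch dir.

    local_root is a hypothetical local-disk path (e.g. a node's /tmp);
    None lets tempfile pick the system default temp directory.
    Returns the scratch directory, which the job would use as its cwd.
    """
    workdir = tempfile.mkdtemp(prefix="job-", dir=local_root)
    for src in shared_inputs:
        # A real copy rather than os.symlink(), so reads during the job
        # hit the local disk instead of the shared directory.
        shutil.copy2(src, os.path.join(workdir, os.path.basename(src)))
    return workdir
```

This bears out Ben's doubt: the copy still reads every input from the shared directory once. The saving is mainly in the accesses the job makes while it runs.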
> > > > > 
> > > > > I think the directories where there are lots of files being read/written 
> > > > > by lots of hosts are:
> > > > > 
> > > > > the top directory (one job directory per job)
> > > > > the info directory
> > > > > the kickstart directory
> > > > > the file cache
> > > > > 
> > > > > When directories get too many files in them because of directory 
> > > > > size constraints, it's common to split that directory into many 
> > > > > smaller directories (e.g. how squid caching or git object storage works). 
> > > > > E.g., given a file fubar.txt, store it in fu/fubar.txt, with 'fu' being some 
> > > > > short hash of the filename (with the hash here being 'extract the first 
> > > > > two characters').
> > > > > 
> > > > > I think Andrew wanted to do that for his data files anyway, 
> > > > > which would then be reflected in the layout of the data cache 
> > > > > directory structure.
> > > > > 
> > > > > For job directories, it may not be too hard to split the big directories 
> > > > > into smaller ones. There will still be write-lock conflicts, but 
> > > > > contention for each directory's write-lock might be lower.
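The "first two characters" split Ben describes could apply to the data cache as well as to job directories. The first function below is a minimal sketch of it. The second swaps in a real hash (SHA-1 of the name), which is an assumption not made in the thread. The reason for swapping is that names here share a long common prefix (angle4-...), so their first two characters would all map to one directory. Both function names are hypothetical.

```python
import hashlib


def prefix_shard(filename: str, width: int = 2) -> str:
    """'fubar.txt' -> 'fu/fubar.txt': the 'hash' is just the leading characters."""
    return f"{filename[:width]}/{filename}"


def digest_shard(filename: str, width: int = 2) -> str:
    """Use the leading hex digits of a SHA-1 of the name instead (similar in
    spirit to git's objects/ab/... layout, though git hashes contents)."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return f"{digest[:width]}/{filename}"


if __name__ == "__main__":
    names = [f"angle4-{i:08d}-kickstart.xml" for i in range(1000)]
    # Every name starts with "an", so the prefix scheme uses a single directory.
    print(len({prefix_shard(n).split("/")[0] for n in names}))  # 1
    # Two hex digits allow up to 256 directories; the digest spreads names across many.
    print(len({digest_shard(n).split("/")[0] for n in names}))
```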
> > > > > 
> > > > >         
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > 
> > >     
> > 
> >   
> 
> -- 
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web:   http://www.cs.uchicago.edu/~iraicu
>        http://dsl.cs.uchicago.edu/
> ============================================
