[Swift-devel] Clustering and Temp Dirs with Swift

Mihael Hategan hategan at mcs.anl.gov
Fri Oct 26 23:27:26 CDT 2007


On Fri, 2007-10-26 at 23:02 -0500, Ioan Raicu wrote:
> If it doesn't apply to meta-data operations, such as directories, then
> it means that meta-data changes in the file system are rather
> centralized (maybe this explains the relatively poor performance for
> creating and removing directories).

On GPFS, according to my understanding of their documentation, exactly
one node controls access to a given file at any one time. If, for all
observable purposes, a directory is implemented as a file holding
metadata for the files it contains, then doing things in one directory
from multiple nodes amounts to accessing the same file from multiple
nodes.

Unless I'm blatantly wrong. And even if I'm not, the real model
probably has some complications.
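
If someone wants to test that on TP, a micro-benchmark along these
lines should show whether metadata throughput collapses as concurrency
goes up. This is an untested sketch; the path and the counts are made
up, and to really exercise the lock traffic you'd run one copy per node
against the same GPFS directory rather than multiprocessing on one
node:

    import os
    import time
    import multiprocessing

    SHARED = "/gpfs/scratch/meta-bench"  # made-up path; use the real run dir
    WORKERS = 8
    OPS = 100

    def cycle(worker_id):
        # one create-dir/create-symlink/remove cycle per "job", mimicking
        # what the wrapper script does on the shared file system
        start = time.time()
        for i in range(OPS):
            d = os.path.join(SHARED, "job-%d-%d" % (worker_id, i))
            os.mkdir(d)
            os.symlink("/etc/hosts", os.path.join(d, "input"))
            os.unlink(os.path.join(d, "input"))
            os.rmdir(d)
        return OPS / (time.time() - start)

    if __name__ == "__main__":
        os.makedirs(SHARED, exist_ok=True)
        pool = multiprocessing.Pool(WORKERS)
        rates = pool.map(cycle, range(WORKERS))
        # rough aggregate; the workers overlap in time
        print("aggregate: %.1f cycles/sec" % sum(rates))

If the directory-is-a-file model above is right, the aggregate should
stay roughly flat (or drop) as WORKERS grows, which would be consistent
with the 20-40 ops/sec numbers Ioan reported.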

>   I would be curious to see how well a scheme works that moves data
> to local disk prior to processing, so that jobs avoid the shared file
> system entirely (including the creation and removal of the scratch
> temp directory on GPFS).
> 
> Ioan  
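
That should be easy enough to prototype in the wrapper. Something like
the following (a rough sketch; the paths and the failure-handling
policy are made up) keeps all the per-job directory churn on node-local
disk and only touches GPFS to stage files in and out:

    import os
    import shutil
    import subprocess
    import tempfile

    GPFS_RUN = "/gpfs/scratch/swift-run"  # made-up shared run dir

    def run_job(cmd, inputs, outputs):
        # the per-job sandbox lives on node-local disk, not on GPFS
        sandbox = tempfile.mkdtemp(prefix="swift-job-", dir="/tmp")
        try:
            for name in inputs:  # stage inputs in
                shutil.copy(os.path.join(GPFS_RUN, "shared", name), sandbox)
            rc = subprocess.call(cmd, cwd=sandbox)
            if rc == 0:
                for name in outputs:  # stage outputs out
                    shutil.copy(os.path.join(sandbox, name),
                                os.path.join(GPFS_RUN, "shared", name))
            else:
                # only on failure is the sandbox copied back to the shared
                # FS for later inspection (assumes GPFS_RUN/failed exists)
                shutil.copytree(sandbox,
                                os.path.join(GPFS_RUN, "failed",
                                             os.path.basename(sandbox)))
            return rc
        finally:
            shutil.rmtree(sandbox, ignore_errors=True)

GPFS then only sees the staging copies, and none of the per-job
mkdir/symlink/rmdir traffic on the shared run directory.
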
> 
> Mihael Hategan wrote: 
> > On Fri, 2007-10-26 at 15:11 -0500, Ioan Raicu wrote:
> >   
> > > I am not sure what configuration exists on TP, but on the TeraGrid
> > > ANL/UC cluster, with 8 servers behind GPFS, the wrapper script
> > > performance (create dir, create symbolic links, remove directory... all
> > > on GPFS) is anywhere between 20 and 40 ops/sec, depending on how many
> > > nodes you have doing this concurrently.  The throughput first increases
> > > as you add nodes, but then drops to about 20/sec with 20-30+ nodes.
> > > What this means is that even if you bundle jobs up, you will not get
> > > anything better than this, throughput-wise, regardless of how short the
> > > jobs are.  Now, if TP has fewer than 8 servers, it's likely that the
> > > throughput it can sustain is even lower,
> > >     
> > 
> > Perhaps in terms of bytes/s. But I wouldn't be so sure that it applies
> > to other file system operations.
> > 
> >   
> > > and if you push it over the
> > > edge, it may even get to the point of thrashing, where the throughput
> > > becomes extremely small.   I don't have any suggestions for getting
> > > around this, other than making your job sizes larger on average, and
> > > hence having fewer jobs over the same period of time.
> > > 
> > > Ioan
> > > 
> > > Andrew Robert Jamieson wrote:
> > >     
> > > > I am kind of at a standstill getting anything done on TP right
> > > > now because of this problem. Are there any suggestions to overcome
> > > > this for the time being?
> > > > 
> > > > On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote:
> > > > 
> > > >       
> > > > > Hello all,
> > > > > 
> > > > >  I am encountering the following problem on Teraport.  I submit a
> > > > > clustered Swift WF which should amount to something on the order of
> > > > > 850x3 individual jobs total. I have clustered the jobs because they
> > > > > are very fast (somewhere around 20 sec to 1 min long).  When I submit
> > > > > the WF on TP, things start out fantastic: I get tens of output files
> > > > > in a matter of seconds, and nodes start and finish clustered batches
> > > > > in minutes or less. However, after about 3-5 minutes, when clustered
> > > > > jobs begin to line up in the queue and more start running at the same
> > > > > time, output slows to a trickle.
> > > > > 
> > > > > One thing I noticed is that when I try a simple ls on TP in the Swift
> > > > > temp running directory, where the temp job dirs are created and
> > > > > destroyed, it takes a very long time.  And when it finishes, only five
> > > > > or so things are in the dir (this is the dir with "info  kickstart
> > > > > shared  status  wrapper.log" in it).  What I think is happening is
> > > > > that TP's filesystem can't handle this extremely rapid
> > > > > creation/destruction of directories in that shared location. From
> > > > > what I have been told, these temp dirs come and go as long as the job
> > > > > runs successfully.
> > > > > 
> > > > > What I am wondering is if there is any way to move that dir to the
> > > > > local node's tmp directory rather than the shared file system while
> > > > > the job is running, and, if something fails, then have it sent to the
> > > > > appropriate place.
> > > > > 
> > > > > Or, another layer of temp dir wrapping could be applied, labeled
> > > > > perhaps with respect to the clustered job grouping and not simply the
> > > > > individual jobs (since there are thousands being computed at once).
> > > > > That way these dirs would only be generated/deleted every 5 or 10
> > > > > minutes (if clustered properly on my part) instead of one event every
> > > > > millisecond or what have you.
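> > > > > 
> > > > > Roughly what I am picturing (just a sketch; the names and paths are
> > > > > made up):
> > > > > 
> > > > >     import os
> > > > >     import shutil
> > > > >     import subprocess
> > > > > 
> > > > >     GPFS_ROOT = "/gpfs/scratch/swift-run"  # made up
> > > > > 
> > > > >     def run_batch(batch_id, jobs):
> > > > >         # one GPFS create/remove per clustered batch instead of
> > > > >         # one per job
> > > > >         batch_dir = os.path.join(GPFS_ROOT, "batch-%s" % batch_id)
> > > > >         os.mkdir(batch_dir)
> > > > >         try:
> > > > >             for i, cmd in enumerate(jobs):
> > > > >                 job_dir = os.path.join(batch_dir, "job-%04d" % i)
> > > > >                 # still one mkdir per job, but under the batch dir,
> > > > >                 # so other batches don't contend on the same parent
> > > > >                 os.mkdir(job_dir)
> > > > >                 subprocess.call(cmd, cwd=job_dir)
> > > > >         finally:
> > > > >             shutil.rmtree(batch_dir)
> > > > > 
> > > > > That would cut the metadata traffic on the top-level run dir by the
> > > > > clustering factor.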
> > > > > 
> > > > > I don't know which of these solutions is feasible, if any, but this
> > > > > seems to be a major problem for my WFs.  In general it is never good
> > > > > to have a million things coming and going on a shared file system in
> > > > > one place, from my experience at least.
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > Andrew
> 
> -- 
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web:   http://www.cs.uchicago.edu/~iraicu
>        http://dsl.cs.uchicago.edu/
> ============================================
> ============================================