[Swift-devel] Clustering and Temp Dirs with Swift

Fri Oct 26 16:31:30 CDT 2007

On Fri, 2007-10-26 at 16:23 -0500, Ioan Raicu wrote:
> Hi,
> 
> Andrew Robert Jamieson wrote:
> > Ioan,
> >
> >   Thanks for the explanation.  It seems like you characterized what 
> > is going on pretty well.
> >
> > One question I have: does this occur only when the jobs all work in 
> > the same directory, or anywhere in the shared GPFS at any given 
> > time?
> >
> I don't know, but as far as I can tell, Swift will create these temp 
> scratch directories per job in the same subdirectory (Mihael or Ben, 
> please correct me if I am wrong on this).  I have seen this behavior for 
> certain in this case, but am not sure if things get better if you were 
> to work in completely separate parts of the filesystem. 
> > Furthermore, why can't the short-lived directory live on the local 
> > node's /tmp/* somewhere?  I have wrapped all my programs to ensure 
> > that things are ONLY executed in the local node directories, 
> > specifically to avoid this type of problem. Now Swift is making that 
> > effort irrelevant, it seems.

Right. And Swift has an inefficient implementation there which needs to
be fixed.

> They could, with some modifications to the wrapper script, or with some 
> higher-level logic that manages the data on the local disk and moves it 
> between the local disk and the shared file system.  Your short-term 
> solution would probably be the first option: changing the wrapper script 
> to support local disk usage.  Maybe there are other solutions as well.
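
To make the first option concrete, here is a rough sketch of what a
local-disk wrapper could look like. It is Python rather than the actual
wrapper script, and every name in it (run_in_local_scratch, the failed/
directory, the flat input and output file names) is made up for
illustration. It is not how Swift does it today, only the general shape:
create the per-job directory on the node's local disk, copy inputs in,
run, copy outputs back, and keep the directory on the shared file system
only when the job fails.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_local_scratch(command, shared_dir, inputs=(), outputs=(), scratch_root=None):
    """Run `command` in a node-local temp dir, staging files to and from `shared_dir`.

    Hypothetical helper, not Swift's wrapper. Input and output names are
    assumed to be plain file names directly under `shared_dir`.
    Returns the exit code of the command.
    """
    shared_dir = Path(shared_dir)
    # scratch_root=None lets tempfile pick the system default (usually /tmp).
    work = Path(tempfile.mkdtemp(prefix="job-", dir=scratch_root))
    try:
        # Stage in: copy inputs instead of creating symlinks on the shared FS.
        for name in inputs:
            shutil.copy2(shared_dir / name, work / name)

        result = subprocess.run(command, cwd=work)

        if result.returncode == 0:
            # Stage out only the declared outputs.
            for name in outputs:
                shutil.copy2(work / name, shared_dir / name)
        else:
            # On failure, preserve the whole job dir on the shared FS for debugging.
            shutil.copytree(work, shared_dir / "failed" / work.name)
        return result.returncode
    finally:
        # The short-lived directory is created and removed on local disk only.
        shutil.rmtree(work, ignore_errors=True)
```

With something like this, a successful job touches the shared file system
only for the input and output copies, and the mkdir/rmdir traffic stays on
the node.
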
> 
> Ioan
> >
> > Does this seem reasonable?
> >
> > Thanks,
> > Andrew
> >
> > On Fri, 26 Oct 2007, Ioan Raicu wrote:
> >
> >> I am not sure what configuration exists on TP, but on the TeraGrid 
> >> ANL/UC cluster, with 8 servers behind GPFS, the wrapper script 
> >> performance (create dir, create symbolic links, remove directory... 
> >> all on GPFS) is anywhere between 20 and 40 per second, depending on 
> >> how many nodes you have doing this concurrently.  The throughput 
> >> increases at first as you add nodes, but then decreases to about 
> >> 20/sec with 20~30+ nodes.  What this means is that even if you 
> >> bundle jobs up, you will not get anything better than this, 
> >> throughput-wise, regardless of how short the jobs are.  Now, if TP 
> >> has fewer than 8 servers, it's likely that the throughput it can 
> >> sustain is even lower, and if you push it over the edge, it can 
> >> reach the point of thrashing, where the throughput can be extremely 
> >> small.  I don't have any suggestions for how you can get around 
> >> this, with the exception of making your jobs larger on average, 
> >> and hence having fewer jobs over the same period of time.
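
As a back-of-the-envelope illustration of that ceiling, the sketch below
treats the quoted 20 to 40 wrapper cycles per second as a hard cap on job
completions and computes the resulting floor on wall-clock time. The
function name is hypothetical, the job count is the 850x3 figure from the
original report further down, and the model assumes exactly one wrapper
cycle per job with no thrashing, so real runs can only be slower.

```python
def min_wrapper_seconds(num_jobs, cycles_per_sec):
    """Lower bound on wall-clock time if each job costs one wrapper cycle on the shared FS."""
    if cycles_per_sec <= 0:
        raise ValueError("cycles_per_sec must be positive")
    return num_jobs / cycles_per_sec


jobs = 850 * 3  # workflow size from the original report in this thread
for rate in (40, 20):  # wrapper cycles/sec range quoted for 8 GPFS servers
    print(f"{rate}/sec: at least {min_wrapper_seconds(jobs, rate):.2f} s for {jobs} jobs")
```

Clustering does not change num_jobs in this model, since the temp
directory is created per job rather than per cluster; only fewer, larger
jobs (or moving the directory off the shared file system) lowers the
bound.
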
> >>
> >> Ioan
> >>
> >> Andrew Robert Jamieson wrote:
> >>> I am kind of at a standstill for getting anything done on TP right 
> >>> now with this problem. Are there any suggestions to overcome this 
> >>> for the time being?
> >>>
> >>> On Fri, 26 Oct 2007, Andrew Robert Jamieson wrote:
> >>>
> >>>> Hello all,
> >>>>
> >>>>  I am encountering the following problem on Teraport.  I submit a 
> >>>> clustered Swift WF which should amount to something on the order of 
> >>>> 850x3 individual jobs total. I have clustered the jobs because they 
> >>>> are very fast (somewhere around 20 sec to 1 min long).  When I 
> >>>> submit the WF on TP, things start out fantastic: I get tens of 
> >>>> output files in a matter of seconds, and nodes start and finish 
> >>>> clustered batches in a matter of minutes or less. However, after 
> >>>> about 3-5 mins, when clustered jobs begin to line up in 
> >>>> the queue and more start running at the same time, output 
> >>>> slows down to a trickle.
> >>>>
> >>>> One thing I noticed is that when I try a simple ls on TP in the 
> >>>> Swift temp running directory where the temp job dirs are created 
> >>>> and destroyed, it takes a very long time.  And when it is done, only 
> >>>> five or so things are in the dir (this is the dir with "info  
> >>>> kickstart  shared  status wrapper.log" in it).  What I think is 
> >>>> happening is that TP's filesystem can't handle this extremely rapid 
> >>>> creation/destruction of directories in that shared location. From 
> >>>> what I have been told, these temp dirs come and go as long as the 
> >>>> job runs successfully.
> >>>>
> >>>> What I am wondering is whether there is any way to move that dir to 
> >>>> the local node's tmp directory, not the shared file system, while 
> >>>> the job is running, and if something fails, then have it sent to 
> >>>> the appropriate place.
> >>>>
> >>>> Or, perhaps another layer of temp dir wrapping could be applied, 
> >>>> labeled with respect to the clustered job grouping and not 
> >>>> simply the individual jobs (since there are thousands being 
> >>>> computed at once).
> >>>> That way these things would only be generated/deleted every 5 mins 
> >>>> or 10 mins (if clustered properly on my part) instead of one event 
> >>>> every millisecond or what have you.
> >>>>
> >>>> I don't know which solution is feasible or if any are at all, but 
> >>>> this seems to be a major problem for my WFs.  In general it is 
> >>>> never good to have a million things coming and going on a shared 
> >>>> file system in one place, from my experience at least.
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Andrew
> >>>> _______________________________________________
> >>>> Swift-devel mailing list
> >>>> Swift-devel at ci.uchicago.edu
> >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>
> >>>
> >>
> >> -- 
> >> ============================================
> >> Ioan Raicu
> >> Ph.D. Student
> >> ============================================
> >> Distributed Systems Laboratory
> >> Computer Science Department
> >> University of Chicago
> >> 1100 E. 58th Street, Ryerson Hall
> >> Chicago, IL 60637
> >> ============================================
> >> Email: iraicu at cs.uchicago.edu
> >> Web:   http://www.cs.uchicago.edu/~iraicu
> >>      http://dsl.cs.uchicago.edu/
> >> ============================================
> >>
> >>
> >
>