[Swift-devel] Clustering and Temp Dirs with Swift

Andrew Robert Jamieson andrewj at uchicago.edu
Fri Oct 26 13:59:53 CDT 2007


Hello all,

   I am encountering the following problem on Teraport.  I submit a 
clustered swift WF which should amount to something on the order of 850x3 
individual jobs total. I have clustered the jobs because they are very 
fast (somewhere around 20 sec to 1 min long).  When I submit the WF on TP 
things start out fantastic: I get tens of output files in a matter of 
seconds, and nodes start and finish clustered batches in a matter of 
minutes or less. However, after about 3-5 mins, when clustered 
jobs begin to line up in the queue and more start running at the same 
time, output slows to a trickle.

One thing I noticed is that when I try a simple ls on TP in the swift temp 
running directory, where the temp job dirs are created and destroyed, it 
takes a very long time.  And when it is done only five or so things are in 
the dir (this is the dir with "info  kickstart  shared  status 
wrapper.log" in it).  What I think is happening is that TP's filesystem 
can't handle this extremely rapid creation/destruction of directories in 
that shared location. From what I have been told, these temp dirs come and 
go as long as the job runs successfully.

What I am wondering is whether there is any way to move that dir to the 
local node's tmp directory, not the shared file system, while the job is 
running, and if something fails then have it sent to the appropriate place.
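
To make the idea concrete, here is a rough sketch in Python. It is not 
Swift's actual wrapper, just an illustration of the behaviour I have in 
mind; run_in_local_tmp, shared_failed_dir and local_root are names I made 
up, and I am assuming the node's default temp location is local disk.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_local_tmp(cmd, job_id, shared_failed_dir, local_root=None):
    """Run cmd in a node-local scratch dir; copy the dir out only on failure.

    Hypothetical helper, not part of Swift. local_root=None lets tempfile
    pick the node's default temp location (assumed to be local disk).
    """
    workdir = Path(tempfile.mkdtemp(prefix=f"{job_id}-", dir=local_root))
    try:
        result = subprocess.run(cmd, cwd=workdir)
        if result.returncode != 0:
            # Only failed jobs ever touch the shared file system.
            dest = Path(shared_failed_dir) / job_id
            shutil.copytree(workdir, dest, dirs_exist_ok=True)
        return result.returncode
    finally:
        # The local scratch dir is always cleaned up on the node itself.
        shutil.rmtree(workdir, ignore_errors=True)
```

With something like this, a successful job would create and delete its 
directory entirely on the node, and the shared location would only see 
the (hopefully rare) failures.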

Or, perhaps another layer of temp dir wrapping could be applied, labeled 
with respect to the clustered job grouping and not simply the individual 
jobs (since there are thousands being computed at once), so that these 
things would only be generated/deleted every 5 or 10 mins (if clustered 
properly on my part) instead of one event every millisecond or what have 
you.
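
Again as a rough Python sketch of what I mean, not how Swift does it: 
run_cluster, cluster_id and base_dir are hypothetical names. The top-level 
directory sees one create and at most one delete per clustered batch, and 
the per-job dirs live underneath the batch's own directory.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_cluster(jobs, cluster_id, base_dir):
    """Run (job_id, cmd) pairs inside a single per-cluster directory.

    Hypothetical sketch: base_dir gets one new entry per cluster instead
    of one per job. Returns the ids of the jobs that failed.
    """
    cluster_dir = Path(tempfile.mkdtemp(prefix=f"{cluster_id}-", dir=base_dir))
    failed = []
    for job_id, cmd in jobs:
        job_dir = cluster_dir / job_id
        job_dir.mkdir()
        if subprocess.run(cmd, cwd=job_dir).returncode != 0:
            failed.append(job_id)
    if not failed:
        # One removal for the whole batch; kept for inspection on failure.
        shutil.rmtree(cluster_dir)
    return failed
```

Whether this helps would depend on whether the slowdown comes from 
contention on the one shared parent directory or from the total number 
of metadata operations, which I do not know.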

I don't know which solution is feasible or if any are at all, but this 
seems to be a major problem for my WFs.  In general it is never good to 
have a million things coming and going in one place on a shared file 
system, in my experience at least.


Thanks,
Andrew


