[Swift-devel] Clustering and Temp Dirs with Swift
Andrew Robert Jamieson
andrewj at uchicago.edu
Fri Oct 26 13:59:53 CDT 2007
Hello all,
I am encountering the following problem on Teraport. I submit a
clustered swift WF which should amount to something on the order of 850x3
individual jobs total. I have clustered the jobs because they are very
fast (somewhere around 20 sec to 1 min long). When I submit the WF on TP
things start out fantastic: I get tens of output files in a matter of
seconds, and nodes start and finish clustered batches in a matter of
minutes or less. However, after about 3-5 mins, when clustered
jobs begin to line up in the queue and more start running at the same
time, output slows to a trickle.
One thing I noticed is that when I try a simple ls on TP in the swift temp
running directory, where the temp job dirs are created and destroyed, it
takes a very long time, and when it is done only five or so entries are in
the dir (this is the dir with "info kickstart shared status
wrapper.log" in it). What I think is happening is that TP's filesystem
can't handle this extremely rapid creation/destruction of directories in
that shared location. From what I have been told, these temp dirs come and
go as long as the job runs successfully.
What I am wondering is whether there is any way to move that dir to the
local node tmp directory, rather than the shared file system, while the job
is running, and if something fails, then have it sent to the appropriate place.
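
To make the first idea concrete, here is a rough Python sketch of what I have in mind. It is not how Swift's wrapper actually works; the function name, the job_id argument and the shared_failure_dir location are all hypothetical, and staging the real outputs back out is left aside. The job runs in a node-local temp dir, and the dir is copied to the shared file system only when the job fails.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_local_scratch(cmd, job_id, shared_failure_dir, local_tmp=None):
    """Run cmd in a node-local scratch dir; keep a copy only on failure.

    Hypothetical helper for illustration, not part of Swift.
    """
    # local_tmp=None lets tempfile pick the node's default temp location.
    workdir = Path(tempfile.mkdtemp(prefix=f"{job_id}-", dir=local_tmp))
    try:
        result = subprocess.run(cmd, cwd=workdir)
        if result.returncode != 0:
            # Only failed jobs ever touch the shared file system.
            dest = Path(shared_failure_dir) / job_id
            shutil.copytree(workdir, dest)
        return result.returncode
    finally:
        # The local scratch dir is always removed, success or failure.
        shutil.rmtree(workdir, ignore_errors=True)
```

With something like this, a successful job would leave nothing behind in the shared location, and a failed one would leave exactly one directory there for debugging.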
Or perhaps another layer of temp dir wrapping could be applied,
labeled with respect to the clustered job grouping and not simply
the individual jobs (since there are thousands being computed at once).
That way these dirs would only be generated/deleted every 5 or 10 mins
(if clustered properly on my part) instead of one event every millisecond
or what have you.
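
As a rough sketch of this second idea, again in Python and again with hypothetical names (run_cluster, cluster_id and scratch_root are mine, not Swift's): each cluster gets one top-level dir, the per-job dirs live underneath it, and the top-level dir is removed once, when the whole cluster has finished cleanly.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_cluster(cluster_id, jobs, scratch_root):
    """Run jobs (a dict of job_id -> command list) under one dir per cluster.

    Hypothetical helper for illustration, not part of Swift.
    """
    # One entry is created in scratch_root per cluster, not per job.
    cluster_dir = Path(tempfile.mkdtemp(prefix=f"{cluster_id}-", dir=scratch_root))
    exit_codes = {}
    for job_id, cmd in jobs.items():
        job_dir = cluster_dir / job_id
        job_dir.mkdir()
        exit_codes[job_id] = subprocess.run(cmd, cwd=job_dir).returncode
    if all(code == 0 for code in exit_codes.values()):
        # One removal per cluster; the dir is kept if any job failed.
        shutil.rmtree(cluster_dir, ignore_errors=True)
    return exit_codes
```

The per-job dirs are still created on whatever file system scratch_root lives on, so this only cuts down the churn in the one shared top-level directory; it does not remove the directory operations themselves.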
I don't know which solution is feasible or if any are at all, but this
seems to be a major problem for my WFs. In general it is never good to
have a million things coming and going on a shared file system in one
place, from my experience at least.
Thanks,
Andrew