[Swift-devel] Re: [ci.uchicago.edu #349] Clustering and Temp Dirs with Swift (fwd)

Andrew Robert Jamieson andrewj at uchicago.edu
Sat Oct 27 11:36:54 CDT 2007


---------- Forwarded message ----------
Date: Fri, 26 Oct 2007 17:00:33 -0500 (CDT)
From: "Greg Cross (CI) via RT" <teraport-support at ci.uchicago.edu>
To: andrewj at uchicago.edu
Subject: Re: [ci.uchicago.edu #349] Clustering and Temp Dirs with Swift

You're correct that a lot of localized traffic on GPFS can cause
noticeable degradation in performance.  The question, though, is
whether you want scratch space available on all nodes at any
arbitrary time, or whether you want to instrument your workflow to
copy the data to scratch space as part of your job.  It's difficult
to group filesystem operations on a per-job basis, and we do not
allow access to compute nodes that aren't assigned to an active job
for a given user.

In theory, you could configure your job to request a group of nodes
at once.  You could then disperse the data accordingly and write a
wrapper script that runs your single-node tasks in parallel across
those nodes.  The same wrapper could also use the one job to run many
tasks serially, and the parallel and serial techniques can be
combined.
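As a rough sketch of what such a wrapper might look like (this
assumes a PBS/Torque-style scheduler; run_tasks_serially.sh and the
task_list files are placeholders for a per-node runner and task
partitioning you would supply yourself):

    #!/bin/bash
    #PBS -l nodes=4:ppn=1
    #PBS -l walltime=01:00:00

    # Launch one worker per assigned node; each worker runs its slice of
    # the overall task list serially, so many short tasks share one job.
    cd "$PBS_O_WORKDIR"

    i=0
    for node in $(sort -u "$PBS_NODEFILE"); do
        # run_tasks_serially.sh and task_list.$i are placeholders for your
        # own per-node runner and however you choose to partition the work.
        ssh "$node" "cd $PBS_O_WORKDIR && ./run_tasks_serially.sh task_list.$i" &
        i=$((i + 1))
    done
    wait   # return only once every node's serial batch has finished

The outer loop fans out across the assigned nodes, while each runner
works through its list one task at a time on its node, so the
parallel and serial pieces combine naturally.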

There are environment variables set for each job submission that can
be used to identify the hostnames assigned to a multi-node job.  This
may not be ideal or easy to use in a script, but if it's feasible,
I'd suggest submitting an interactive job for multiple nodes so you
can test and debug any wrapper scripts accordingly.

-- Greg

On Fri 26 Oct 2007, at 13:59, Andrew Jamieson via RT wrote:

>
> Fri Oct 26 13:59:56 2007: Request 349 was acted upon.
> Transaction: Ticket created by andrewj at uchicago.edu
>        Queue: General
>      Subject: Clustering and Temp Dirs with Swift
>        Owner: Nobody
>   Requestors: andrewj at uchicago.edu
>       Status: new
>  Ticket <URL: http://teraport-support.ci.uchicago.edu/Ticket/Display.html?id=349 >
>
>
> Hello all,
>
>    I am encountering the following problem on Teraport.  I submit a
> clustered Swift WF which should amount to something on the order of
> 850x3 individual jobs total.  I have clustered the jobs because they
> are very fast (somewhere around 20 sec to 1 min each).  When I
> submit the WF on TP, things start out fantastic: I get tens of
> output files in a matter of seconds, and nodes start and finish
> clustered batches in a matter of minutes or less.  However, after
> about 3-5 minutes, when clustered jobs begin to line up in the queue
> and more start running at the same time, output slows to a trickle.
>
> One thing I noticed is that when I try a simple ls on TP in the
> Swift temp running directory where the temp job dirs are created and
> destroyed, it takes a very long time, and when it finishes only five
> or so things are in the dir (this is the dir containing "info
> kickstart  shared  status  wrapper.log").  What I think is happening
> is that TP's filesystem can't handle this extremely rapid
> creation/destruction of directories in that shared location.  From
> what I have been told, these temp dirs come and go as long as the
> job runs successfully.
>
> What I am wondering is whether there is any way to move that dir to
> the local node's tmp directory, rather than the shared filesystem,
> while the job is running, and if something fails, then have it sent
> to the appropriate place.
>
> Or, whether another layer of temp dir wrapping could be applied,
> labeled perhaps with respect to the clustered job grouping rather
> than the individual jobs (since there are thousands being computed
> at once), so that these dirs would only be created/deleted every 5
> or 10 minutes (if clustered properly on my part) instead of one
> event every millisecond or so.
>
> I don't know which of these solutions is feasible, or whether any
> are at all, but this seems to be a major problem for my WFs.  In my
> experience, at least, it is never good to have a million things
> coming and going in one place on a shared filesystem.
>
>
> Thanks,
> Andrew
>




