[Swift-user] Deleting no longer necessary anonymous files in _concurrent

Mihael Hategan hategan at mcs.anl.gov
Tue Sep 7 13:53:16 CDT 2010


On Tue, 2010-09-07 at 12:22 -0600, Matthew Woitaszek wrote:
[...]
> 
> > There is "beta" functionality in the Swift trunk to directly utilize a local
> > filesystem (that at least two applications are using).  If there is a
> > "scratch" filesystem that you can use, I can direct you to that.
> 
> By this, do you mean something like a node-local scratch system,
> where files could be staged directly from _concurrent to a node
> instead of a "site", or is it something else?
> 
> If node-local, I fear that might be a step backwards for our
> application. In our case, the staging time vs. capacity tradeoff is
> becoming quite problematic. On one hand, I really only want to keep
> one copy of everything (_concurrent), but limiting the amount of
> storage at a site might increase staging, which negates the
> parallelism, so I'm back to preferring a big site cache to minimize
> that.

The data, intermediate or not, has to be in at least one place.
The stable/traditional version of Swift tends to keep at least 3 copies
of each piece of data:
- on the client (1)
- on the shared fs of a target cluster (2)
- on the compute node (3)
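
Concretely, and roughly (the paths below are hypothetical and the exact
layout depends on your configuration):

    (1) client/submit host:  the run's _concurrent directory,
                             e.g. /gpfs/scratch/you/run/_concurrent/
    (2) cluster shared FS:   the site <workdirectory> from sites.xml
    (3) compute node:        the site <scratch> directory, if configured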

(3) is arguable. One can run apps using data directly from (2). However,
it has been our experience that, due to the way shared file systems
(SFSes) work, copying the data to the compute node yields better
performance in most cases (actually pretty much all cases we've
measured). This may not necessarily apply to your application, and we'd
like to hear about it if it doesn't. You can switch between the two
behaviors by specifying an additional <scratch> directory in sites.xml.
If it's there, (3) applies; if not, symlinks to (2) are used instead.
I'll call this issue (A).
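
To make issue (A) concrete, here's a rough sites.xml sketch (the pool
handle and paths are made up; only the <scratch> element is the point,
the rest is whatever your site entry already looks like):

    <pool handle="mycluster">
      <!-- other elements (execution provider, etc.) as you have them -->
      <!-- (2): the shared work directory on the cluster file system -->
      <workdirectory>/gpfs/scratch/you/swiftwork</workdirectory>
      <!-- (3): node-local scratch; if present, data is copied here and
           apps run against local copies instead of symlinks into the
           shared work directory -->
      <scratch>/tmp/you/swift-scratch</scratch>
    </pool>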

Stuff we're working on currently includes bypassing (2) and copying data
directly between (1) and (3). It turns out that shared file systems are
pretty poor when it comes to parallelism, due to the distributed
consistency they have to enforce. However, given that in Swift all data
is single-assignment (which translates into files being written at most
once), most of the problems that SFSes need to deal with don't really
exist, but there is no way to tell the file system that. So we've got
some prototypes there. At least on the BG/P we see clear performance
improvements (a few times faster) when we do (1) <-> (3).

Ideally we would also want to bypass (3) -> (1) -> (3) for intermediate
data, since we can do (3) -> (3) instead. This is something Justin has
been working on, I believe on single clusters. I'd personally like to
see it working between multiple clusters, too.

> 
> Is there a way to get tasks to read/write directly out of _concurrent
> without staging to the remote site at all? I suspect the answer is
> "no" due to your description of _concurrent's importance as the
> permanent file system and its use in staging to site file systems. But
> in our case, we're coincidentally at one site, so the big GPFS scratch
> file system area ends up holding both _concurrent and the Swift site
> temporary directory, just in different paths.

It is possible, but not currently implemented. Again, issue (A) may
apply here, so provider staging or the plain SFS path may turn out to
be the better option.
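
As a rough sketch of how provider staging is typically switched on (the
exact property name in the current trunk may differ; treat this as an
assumption and check the documentation for your build):

    # swift.properties -- hypothetical; confirm the property name for
    # your Swift version. Stages files directly between the client and
    # the compute nodes, bypassing the shared file system.
    use.provider.staging=true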

Mihael



