[Swift-user] Data transfer error

Mihael Hategan hategan at mcs.anl.gov
Fri May 30 14:06:17 CDT 2014


On Fri, 2014-05-30 at 16:24 +0000, Bronevetsky, Greg wrote:
> I just ran a test where I varied <profile namespace="globus"
> key="maxwalltime"> between 1 and 10 minutes. At 1 it gave me errors
> and for larger values it did not. So, assuming that this is the true
> root cause, how can I resolve it? I can use node-local storage as my
> <workdirectory>. However, when I run my real workload, I'm still
> getting errors even if I use node-local storage.

Assuming you used <scratch>/local/disk</scratch>, there is still some
load on the shared filesystem, since Swift still needs to copy data from
it to the scratch directory and back.

The only true way of avoiding the shared FS is to enable provider
staging and to put both the Swift run directory and the workdirectory
on local disk.

> I'm still following up with our file systems folks but the key issue
> appears to be the large number of meta-data operations that are sent
> to the shared file system (Lustre or NFS here). Is there a way to
> reduce that or at least measure it so that I can tell our admins
> exactly the throughput I need?

This is hard to quantify. You can measure the rate of I/O requests
with strace, and recent versions of Swift have some flags that allow
you to strace the worker and its sub-processes.
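
As a rough sketch of the strace approach: assuming a process was traced
with something like `strace -f -tt -e trace=file -o trace.log <command>`,
the script below counts file-name-based system calls per second from the
resulting log. The log file name, the set of calls treated as "metadata",
and the function names are illustrative assumptions and not part of
Swift. The Swift-specific strace flags are not shown here.

```python
import re
import sys
from collections import Counter

# Calls counted as metadata operations. This set is an assumption;
# adjust it to whatever your file system admins care about.
METADATA_CALLS = {
    "open", "openat", "stat", "lstat", "newfstatat", "statx",
    "access", "unlink", "rmdir", "mkdir", "rename", "readlink",
}

# Assumed line shape from `strace -f -tt -o FILE`:
#   12345 14:06:17.123456 stat("/path", {...}) = 0
# The leading PID is optional so single-process traces also match.
LINE_RE = re.compile(r"^(?:\d+\s+)?(\d{2}):(\d{2}):(\d{2})\.\d+\s+(\w+)\(")


def metadata_ops_per_second(lines):
    """Return a Counter mapping second-of-day to metadata call count."""
    per_second = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            # Skips signals, exit notices and "<... resumed>" lines.
            continue
        hours, minutes, seconds, call = match.groups()
        if call in METADATA_CALLS:
            second = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            per_second[second] += 1
    return per_second


def summarize(per_second):
    """Total, peak and mean rate. Traces crossing midnight are not handled."""
    if not per_second:
        return {"total": 0, "peak_per_s": 0, "mean_per_s": 0.0}
    total = sum(per_second.values())
    span = max(per_second) - min(per_second) + 1
    return {
        "total": total,
        "peak_per_s": max(per_second.values()),
        "mean_per_s": total / span,
    }


if __name__ == "__main__":
    with open(sys.argv[1]) as trace:
        print(summarize(metadata_ops_per_second(trace)))
```

Run it as `python count_meta.py trace.log` (the script name is
hypothetical). The peak and mean rates only cover the traced processes
on one node, so they would need to be scaled by the number of
concurrent workers to estimate the total load on the shared FS.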

As for the actual bandwidth, I don't know. Perhaps iotop or something
like it would work, but I have never personally used it to measure disk
bandwidth for Swift apps on shared filesystems.
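
One rough alternative, offered as an untested-on-Swift sketch: on Linux,
/proc/<pid>/io exposes per-process byte counters, and sampling it twice
gives an approximate rate. The sketch uses the rchar/wchar counters
(bytes passed to read and write calls), since read_bytes/write_bytes are
documented as accurate only for block-backed filesystems. The function
names and the sampling interval are assumptions, and the result covers
only what the kernel attributes to that one PID.

```python
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def rates(before, after, seconds):
    """Bytes per second passed to read/write calls between two samples."""
    return {
        "read_Bps": (after["rchar"] - before["rchar"]) / seconds,
        "write_Bps": (after["wchar"] - before["wchar"]) / seconds,
    }


def sample(pid, interval=5.0):
    """Read /proc/<pid>/io twice, `interval` seconds apart (Linux only)."""
    path = "/proc/%d/io" % pid
    with open(path) as f:
        before = parse_proc_io(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_io(f.read())
    return rates(before, after, interval)
```

Because rchar and wchar count all bytes moved through read and write
calls, including data served from cache, the numbers are an upper bound
on what actually reaches the shared FS.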

Mihael



