[Swift-user] Data transfer error

Mihael Hategan hategan at mcs.anl.gov
Fri May 30 17:22:00 CDT 2014


On Fri, 2014-05-30 at 20:14 +0000, Bronevetsky, Greg wrote:
> The issues I'm running into seem more related to metadata operations
> since in Lustre the metadata server is not distributed. When I used 10
> or 20 nodes I was generating thousands of file opens per second, which
> Lustre cannot deal with. Even when I use node-local storage as scratch
> I still get timeouts. Is there a way to just track metadata
> operations?

strace with the appropriate syscalls selected.

You'd probably want to do this for a single app, since the strace logs
can be large.
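
As a rough sketch of what that could look like: a capture along the lines of
"strace -f -tt -e trace=file -o app.strace ./app" (app.strace and ./app are
placeholder names) logs every syscall that takes a file name, with a
timestamp on each line. The Python below is only an illustration of how such
a log might be summarized. The summarize() helper, the regular expression,
and the set of syscalls counted as "metadata operations" are assumptions of
this sketch, not part of Swift or strace.

```python
import re
from collections import Counter

# Assumption: which syscalls count as "metadata operations" for this purpose.
METADATA_SYSCALLS = {
    "open", "openat", "creat", "stat", "lstat", "newfstatat", "statx",
    "access", "faccessat", "unlink", "unlinkat", "rename", "renameat",
    "mkdir", "rmdir", "readlink",
}

# Matches lines such as:
#   1234 17:22:00.000001 openat(AT_FDCWD, "/lustre/a", O_RDONLY) = 3
# The leading pid is optional (strace adds it with -f); the fractional
# seconds are optional (-t versus -tt).
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\d{2}:\d{2}:\d{2})(?:\.\d+)?\s+(\w+)\("
)


def summarize(lines):
    """Count metadata syscalls per syscall name and per wall-clock second."""
    per_call = Counter()
    per_second = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Skips "<... resumed>" continuations, signals, and exit lines.
            continue
        second, name = match.groups()
        if name in METADATA_SYSCALLS:
            per_call[name] += 1
            per_second[second] += 1
    return per_call, per_second
```

Passing an open log file to summarize() and then looking at
per_second.most_common(5) gives the busiest seconds, which is a quick way to
see whether a single app is already issuing more opens and stats than the
metadata server is likely to tolerate once multiplied across nodes.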

> 
> 
> > The only true way of avoiding the shared FS is with provider staging
> > enabled, and having both the swift run directory and the
> > workdirectory on local disk.
> 
> Does this mean that I'd only be able to do single-node runs or is
> there a way to shuttle data between the node-local storage of
> different nodes?

Swift does that automatically, although it stages files back to and from
the swift client node*. It is no different in bandwidth consumption than
"staging" files to the shared FS storage nodes, but since POSIX
semantics don't need to be enforced, metadata operations are
significantly faster.

If local (RAM) disk space on the compute nodes isn't an issue, this
scheme significantly improves performance, especially at higher scales.

Mihael

(*) provider.staging.pin.files=true provides additional caching on the
compute nodes which can further improve things if multiple apps need the
same input file(s).




