[Swift-devel] transfers of small files

Mihael Hategan hategan at mcs.anl.gov
Wed Nov 28 18:31:58 CST 2007


On Wed, 2007-11-28 at 18:24 -0600, Ian Foster wrote:
> Mihael:
> 
> It isn't clear to me--are you using the "lots of small files" 
> optimization here?

It depends on what you mean by "lots of small files optimization".
Obviously this is an optimization for the lots-of-small-files case.

I'm re-using clients with mode E and only sending PASV once per client.
Let's call this A. There was word of "pipelining". We'll call that B. I
assume it to be different from what I did (A) for the following reasons:
1. Jarek had tests for A in JGlobus, so A is not a new deal.
2. Buzz recently committed some code to JGlobus to enable B, which
implies B was not possible before, therefore B != A.
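
To make the A versus B distinction a bit more concrete, here is a toy
cost model in Python. It is only a sketch: the function names, the
"naive" baseline, the per-file command counts, and the idea that B means
queuing transfer commands without waiting for each reply are all
assumptions made for illustration, not a description of what JGlobus or
the server actually does. Data connection setup cost is ignored.

```python
# Toy cost model. The command counts per strategy and every number used
# with it are illustrative assumptions, not measurements.

def estimated_seconds(n_files, file_bytes, rtt_s, bytes_per_s, strategy):
    """Rough time to fetch n_files of file_bytes each over one client."""
    data_time = n_files * file_bytes / bytes_per_s
    if strategy == "naive":
        # PASV and RETR each cost a round trip, for every file.
        control_time = n_files * 2 * rtt_s
    elif strategy == "A":
        # PASV once per client, then one RETR round trip per file.
        control_time = rtt_s + n_files * rtt_s
    elif strategy == "B":
        # PASV once; RETRs are queued back to back, so only one
        # RETR round trip is left exposed.
        control_time = 2 * rtt_s
    else:
        raise ValueError("unknown strategy: " + strategy)
    return control_time + data_time


def throughput_kb_per_s(n_files, file_bytes, rtt_s, bytes_per_s, strategy):
    """Effective throughput in KB/s under the same model."""
    seconds = estimated_seconds(n_files, file_bytes, rtt_s, bytes_per_s, strategy)
    return n_files * file_bytes / 1024 / seconds


if __name__ == "__main__":
    # Hypothetical inputs: 1000 files of 32 KB, 50 ms round trip, 10 MB/s link.
    for s in ("naive", "A", "B"):
        print(s, round(throughput_kb_per_s(1000, 32 * 1024, 0.05, 10e6, s), 1))
```

In this model, once files are small enough that the round trips dominate,
A roughly halves the per-file cost, and B would remove most of what is
left.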

> 
> I've CCed John Bresnahan so he can comment.
> 
> Ian.
> 
> Mihael Hategan wrote:
> > So I've been playing with that issue. I've made some measurements
> > outside Swift. Here's a summary:
> >
> > 32k files. From terminable to tg-uc
> >
> > 1 - Karajan with connection caching, transfers in parallel: tops out at
> > 200KB/s
> >
> > 2 - n*globus-url-copy - With 32 parallel transfers it starts failing and
> > gets about 10KB/s
> >
> > 3 - globus-url-copy with a list of files: around 300KB/s
> >
> > 4 - globus-url-copy with a list of files, E mode, and data channel
> > re-use: 500KB/s
> >
> > So I figured I should hack the GridFTP provider to re-use data channels
> > by default. This is where it gets strange. I get averages (over multiple
> > runs) of over 1MB/s, with a min of about 130KB/s and a max of 1.9MB/s, but
> > with a lot of variability. I'll debug this. However, I think there is
> > still value in enabling this by default.



