[Swift-devel] lots of very small files vs gridftp

Mihael Hategan hategan at mcs.anl.gov
Tue Sep 30 00:23:39 CDT 2008


On Mon, 2008-09-29 at 20:34 -0500, Ian Foster wrote:
> Remind me again why we aren't just using TAR and GridFTP?

I don't think we're using anything at this point, hence all the testing
and exploring.

I think there is some complexity in figuring out dynamically what
exactly to tar up and how to untar it on the remote site. That means more
(or more complex) code than the other option.
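The hard part is deciding dynamically what goes into each bundle. The pack/unpack mechanics themselves are small. Here is a minimal Python sketch, assuming the set of files for a job is already known as a list of local paths. `bundle_files`, `unbundle` and `roundtrip_demo` are hypothetical names for illustration, not Swift or Karajan APIs.

```python
import io
import os
import tarfile
import tempfile


def bundle_files(paths, base_dir):
    """Pack many small files into one in-memory tar archive.

    Entries are stored relative to base_dir so the receiving side can
    recreate the same layout under its own working directory.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.relpath(path, base_dir))
    return buf.getvalue()


def unbundle(data, dest_dir):
    """Unpack an archive produced by bundle_files into dest_dir.

    Returns the archive member names.
    """
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names


def roundtrip_demo(n_files=16, size=1024):
    """Local check: write n_files files of `size` bytes, bundle them,
    unpack into a second directory, and verify the contents."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        paths = []
        for i in range(n_files):
            path = os.path.join(src, f"f{i}.dat")
            with open(path, "wb") as f:
                f.write(bytes([i % 256]) * size)
            paths.append(path)

        data = bundle_files(paths, src)  # one transfer instead of n_files
        names = unbundle(data, dst)

        ok = True
        for i in range(n_files):
            with open(os.path.join(dst, f"f{i}.dat"), "rb") as f:
                ok = ok and f.read() == bytes([i % 256]) * size
        return len(names), ok, len(data)
```

For example, `roundtrip_demo(1024, 1024)` mirrors the 1024 x 1024-byte test setup locally. In a real run the archive bytes would be shipped with a single GridFTP (or coaster) transfer and unpacked by a small wrapper on the remote site.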

But I'm not otherwise opposed to anything in particular. I suppose
tarring/untarring could be done manually, at the expense of compromising
Swift's abstractions.

> 
> Ian.
> 
> On Sep 29, 2008, at 6:37 PM, Mihael Hategan wrote:
> 
> > On Thu, 2008-09-25 at 13:10 +0000, Ben Clifford wrote:
> >
> >> To transfer 1000 files:
> >>
> >>   # concurrent connections  |  duration of copy (seconds, multiple runs)
> >>                       16    |   7, 16, 16
> >>                        4    |  14, 14, 14
> >>                        2    |  26, 25
> >>                        1    |  48, 52
> >>
> >
> > I tried a similar experiment, this time with the Java libraries, to see
> > how that works.
> >
> > The setup was to transfer 1024 files of 1024 bytes each, with parallelism
> > (at the Karajan level, though this should cause corresponding GridFTP
> > connection parallelism) of 1 to 16 in powers of 2.
> >
> > I got this for Ranger (in ms):
> > 1: 242030
> > 2: 121916
> > 4: 61787
> > 8: 31903
> > 16: died (probably trying to start too many connections concurrently)
> >
> > Then UC:
> > 1: 212192
> > 2: 106872
> > 4: 54790
> > 8: 28838
> > 16: 18166
> >
> > Then I made a quick file provider for coasters, which sends the data
> > over the same connection (and upped the parallelism):
> > UC-coaster
> > 1: 102624
> > 2: 31388
> > 4: 18042
> > 8: 8823
> > 16: 5510
> > 32: 5053
> > 64: 6686
> > 128: 5551
> >
> > Then I ran the same test, but instead of transferring to an NFS directory,
> > the files went to /dev/null:
> > 1: 93997
> > 2: 35694
> > 4: 16269
> > 8: 7349
> > 16: 4462
> > 32: 1865
> > 64: 1332
> > 128: 1304
> >
> > I suppose the poor speed with coasters is because things go over an
> > encrypted connection, but it may be something else.
> >
> > So otherwise, if files are small, one can view this as the task of
> > sending (acknowledged) messages from one side to the other, where the
> > communication lag is the problem and the way to solve it is by
> > increasing parallelism (which is essentially what tarring things up
> > does). That, plus whatever filesystem limitations the remote side has.
> >
> > Mihael
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 
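The latency argument in the quoted message can be made concrete with a toy model. Suppose every small file costs roughly one fixed, acknowledged exchange and p exchanges overlap. Total time is then about ceil(N/p) times that per-file cost, so doubling p should roughly halve the time until something else saturates (connection setup, the remote filesystem). Below is a minimal Python sketch that assumes exactly that simple model (not anything Swift or GridFTP implements) and compares it against the UC GridFTP numbers reported above. All function names are hypothetical.

```python
import math

N_FILES = 1024  # 1024 files of 1024 bytes each, as in the test above

# GridFTP (Java libraries) timings in ms for the UC site, copied from
# the message; keys are the Karajan-level parallelism.
UC_GRIDFTP_MS = {1: 212192, 2: 106872, 4: 54790, 8: 28838, 16: 18166}


def latency_bound_time(n_files, parallelism, per_file_ms):
    """Toy model: each file costs one fixed acknowledged exchange and
    `parallelism` exchanges overlap, so files go out in waves."""
    return math.ceil(n_files / parallelism) * per_file_ms


def speedups(times):
    """Speedup of each parallelism level relative to parallelism 1."""
    base = times[1]
    return {p: base / t for p, t in sorted(times.items())}


def compare(times, n_files):
    """Pair each measured time with the toy model's prediction, using the
    serial run to estimate the per-file cost."""
    per_file_ms = times[1] / n_files
    return {
        p: (t, latency_bound_time(n_files, p, per_file_ms))
        for p, t in sorted(times.items())
    }


if __name__ == "__main__":
    for p, s in speedups(UC_GRIDFTP_MS).items():
        print(f"parallelism {p:2d}: speedup {s:5.2f}, efficiency {s / p:.2f}")
    for p, (measured, model) in compare(UC_GRIDFTP_MS, N_FILES).items():
        print(f"parallelism {p:2d}: measured {measured} ms, model {model:.0f} ms")
```

On these numbers the speedup is close to 2 at each doubling up to 8. At 16 it is only about 11.7, which fits the point that limits other than per-file lag start to show once enough transfers are in flight.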



