[Swift-devel] lots of very small files vs gridftp

Mihael Hategan hategan at mcs.anl.gov
Mon Sep 29 18:37:50 CDT 2008


On Thu, 2008-09-25 at 13:10 +0000, Ben Clifford wrote:

> To transfer 1000 files:
> 
>    # concurrent connections  |   duration of copy (seconds, multiple runs)
>                        16          7, 16, 16
>                         4         14, 14, 14
>                         2         26, 25
>                         1         48, 52
> 

I tried a similar experiment, this time with the Java libraries, to see
how they perform.

The setup was to transfer 1024 files of 1024 bytes each, with
parallelism (at the karajan level, though this should cause
corresponding gridftp connection parallelism) from 1 to 16 in powers
of 2.

I got this for Ranger (in ms):
1: 242030
2: 121916
4: 61787
8: 31903
16: died (probably trying to start too many connections concurrently)

Then UC:
1: 212192
2: 106872
4: 54790
8: 28838
16: 18166

Then I made a quick file provider for coasters, which sends the data
over the same connection (and upped the parallelism):
UC-coaster
1: 102624
2: 31388
4: 18042
8: 8823
16: 5510
32: 5053
64: 6686
128: 5551

Then I ran the same, but instead of transferring to an NFS directory,
things went to /dev/null:
1: 93997
2: 35694
4: 16269
8: 7349
16: 4462
32: 1865
64: 1332
128: 1304
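
To make the four runs easier to compare, here is a rough sketch that
turns the raw numbers above into ms per file, speedup relative to
parallelism 1, and parallel efficiency (speedup divided by
parallelism). The timings are copied from this message; the dictionary
labels and the summarize() helper are names made up for illustration,
and the "/dev/null run" label just marks the last list above.

```python
# Timings in milliseconds, copied from the runs in this message
# (1024 files of 1024 bytes each). Labels are illustrative only.
N_FILES = 1024

RUNS = {
    "Ranger": {1: 242030, 2: 121916, 4: 61787, 8: 31903},
    "UC": {1: 212192, 2: 106872, 4: 54790, 8: 28838, 16: 18166},
    "UC-coaster": {1: 102624, 2: 31388, 4: 18042, 8: 8823, 16: 5510,
                   32: 5053, 64: 6686, 128: 5551},
    "/dev/null run": {1: 93997, 2: 35694, 4: 16269, 8: 7349, 16: 4462,
                      32: 1865, 64: 1332, 128: 1304},
}


def summarize(timings_ms):
    """Return (parallelism, ms_per_file, speedup, efficiency) rows.

    Speedup is measured against the parallelism-1 run of the same set.
    """
    base = timings_ms[1]
    rows = []
    for p in sorted(timings_ms):
        t = timings_ms[p]
        speedup = base / t
        rows.append((p, t / N_FILES, speedup, speedup / p))
    return rows


for name, timings in RUNS.items():
    print(name)
    for p, per_file, speedup, eff in summarize(timings):
        print(f"  {p:>3}: {per_file:7.1f} ms/file  "
              f"speedup {speedup:6.2f}  efficiency {eff:4.2f}")
```

An efficiency close to 1 means doubling the parallelism roughly halved
the time; where it drops off, something other than per-file lag has
become the limit.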

I suppose the bad speed with coasters is because things go up on an
encrypted connection, but it may be something else.

Otherwise, if the files are small, one can look at this as the task of
sending (acknowledged) messages from one side to the other. The
communication lag is the problem, and the way to solve it is to
increase parallelism (which is essentially what tarring things up
does). Beyond that, the limit is whatever FS limitations the remote
side has.
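
The lag argument can be written down as a very rough model: if every
small file costs one fixed acknowledged round trip, and p transfers are
in flight at once, the total time is about ceil(n / p) times that lag.
The sketch below is only an illustration under that assumption, not
what karajan, gridftp or the coaster provider actually do.
estimate_ms() is a made-up helper, and the per-file lag is simply
inferred from the UC run at parallelism 1.

```python
import math


def estimate_ms(n_files, lag_ms, parallelism):
    """Lag-bound estimate: each file costs one fixed lag, and
    `parallelism` transfers are in flight at the same time."""
    rounds = math.ceil(n_files / parallelism)
    return rounds * lag_ms


N_FILES = 1024
# Assumption: per-file lag taken from the UC run at parallelism 1
# (212192 ms for 1024 files).
LAG_MS = 212192 / N_FILES

# Measured UC timings from this message, for comparison.
MEASURED_UC = {1: 212192, 2: 106872, 4: 54790, 8: 28838, 16: 18166}

for p, measured in MEASURED_UC.items():
    predicted = estimate_ms(N_FILES, LAG_MS, p)
    print(f"{p:>2}: predicted {predicted:9.0f} ms, measured {measured} ms")
```

With these inputs the model stays within about 10% of the measured UC
numbers up to parallelism 8 and is optimistic at 16 (roughly 13262 ms
predicted against 18166 ms measured), which is what one would expect
if something besides lag starts to matter there.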

Mihael



