[Swift-devel] lots of very small files vs gridftp

Ben Clifford benc at hawaga.org.uk
Tue Sep 30 19:02:05 CDT 2008


On Tue, 30 Sep 2008, Mihael Hategan wrote:

> But I'm not otherwise opposed to anything in particular. I suppose
> tarring/untarring could be done manually, at the expense of messing the
> abstractness of swift.

I experimented a bit with making Swift tar/untar stage-ins automatically
(so no modifications are needed to the SwiftScript code).
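
To illustrate the general idea of bundling many small stage-in files into
one archive before transfer, here is a minimal Python sketch. It is not the
code used in Swift; the function names bundle_stageins and
unbundle_stageins are hypothetical, and it assumes a flat set of regular
files with unique names.

```python
import tarfile
from pathlib import Path


def bundle_stageins(files, archive_path):
    """Pack many small input files into one tar archive (hypothetical helper)."""
    with tarfile.open(archive_path, "w") as tar:
        for f in files:
            f = Path(f)
            # Store only the base name so the archive unpacks into a flat directory.
            tar.add(f, arcname=f.name)
    return archive_path


def unbundle_stageins(archive_path, dest_dir):
    """Unpack the archive on the receiving side and return the extracted names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip anything that is not a regular file
            # Use only the base name, so a member cannot escape dest_dir.
            target = dest / Path(member.name).name
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            extracted.append(target.name)
    return sorted(extracted)
```

The point of the sketch is that a single archive means one transfer (and
one GridFTP session) instead of one per file, at the cost of a tar step on
each side.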

There's a plot here:
http://www.ci.uchicago.edu/~benc/tmp/report-fakecnari-20080930-1820-0nmtamxg/

Basically, the first 600s are taken up allocating coaster workers, and the
remaining time uses quite a lot of cores at once. So the total duration of
the run doesn't seem that different, but I think the behaviour as the
number of jobs increases will be better: the 600s startup is a fixed cost
(which I also think can be massively reduced in a couple of ways), and the
part that is proportional to the number of jobs is the remaining 300
seconds.
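
As a rough illustration of that fixed-plus-proportional split, the sketch
below estimates run duration for a scaled-up job count. It assumes strictly
linear scaling of the 300s portion, which is a simplification and not
something measured here; the function name estimated_duration and the
job_scale parameter (job count relative to this run) are hypothetical.

```python
def estimated_duration(job_scale, fixed_s=600.0, proportional_s=300.0):
    """Estimate total run time in seconds.

    job_scale: number of jobs relative to the run described in this post
               (1.0 means the same number of jobs).
    fixed_s: coaster worker allocation time, treated as a one-off cost.
    proportional_s: the part of this run assumed to grow linearly with jobs.
    """
    return fixed_s + proportional_s * job_scale


# Same job count, then twice and ten times as many jobs.
for scale in (1, 2, 10):
    print(scale, estimated_duration(scale))
```

Under this assumption, doubling the number of jobs adds 300s to the run
rather than doubling the whole 900s.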


This is a fairly dirty hack: there's no clustering for stage-outs, and the
decision of whether to cluster transfers is fairly crude (basically, queue
file transfers for 30s and after that, if there's more than one, make a
cluster).
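
The queue-then-decide rule can be sketched roughly as follows. This is an
illustrative Python model, not the actual Swift code: plan_transfers and
its (arrival_time, filename) input format are assumptions, and it works on
a pre-recorded list of requests rather than a live queue.

```python
def _close(batch):
    """Label a finished batch: more than one file becomes a cluster."""
    kind = "cluster" if len(batch) > 1 else "single"
    return (kind, list(batch))


def plan_transfers(requests, window_s=30.0):
    """Group transfer requests into windows of window_s seconds.

    requests: iterable of (arrival_time_s, filename) pairs.
    Returns a list of ("cluster", [files]) or ("single", [file]) tuples.
    A window opens when the first request arrives and closes window_s later.
    """
    plans = []
    batch = []
    window_start = None
    for arrival, name in sorted(requests):
        # Close the current window once it has been open for window_s seconds.
        if window_start is not None and arrival - window_start >= window_s:
            plans.append(_close(batch))
            batch = []
            window_start = None
        if window_start is None:
            window_start = arrival
        batch.append(name)
    if batch:
        plans.append(_close(batch))
    return plans
```

For example, requests arriving at 0s, 5s and 29s would form one cluster,
while a lone request at 40s would be transferred on its own.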

The initial startup is slow, I think, because the first coaster workers
are started from a malformed job submission, a result of the low quality
of this clustering code: it doesn't pass the coastersPerNode parameter
through for the initial jobs, so the initial coaster worker is very slow.

-- 


