[Swift-devel] lots of very small files vs gridftp
Ben Clifford
benc at hawaga.org.uk
Tue Sep 30 19:02:05 CDT 2008
On Tue, 30 Sep 2008, Mihael Hategan wrote:
> But I'm not otherwise opposed to anything in particular. I suppose
> tarring/untarring could be done manually, at the expense of messing up the
> abstractness of swift.
I experimented a bit with making Swift tar/untar stage-ins automatically (so
no modifications are needed to the SwiftScript code).
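The underlying idea can be sketched outside Swift with Python's standard `tarfile` module. The submit side bundles many small stage-in files into one archive, so GridFTP handles a single transfer instead of one per file, and the worker side unpacks it into the job directory. This is a minimal sketch under that assumption, not Swift's actual implementation; `pack_stageins` and `unpack_stageins` are hypothetical names.

```python
import os
import tarfile
import tempfile


def pack_stageins(paths, archive_path):
    """Hypothetical helper: bundle many small stage-in files into one tar.

    One archive means one transfer instead of one transfer per file.
    """
    with tarfile.open(archive_path, "w") as tar:
        for path in paths:
            # Store only the base name so files land flat in the job dir.
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


def unpack_stageins(archive_path, job_dir):
    """Hypothetical helper: unpack the archive on the worker side."""
    with tarfile.open(archive_path, "r") as tar:
        names = tar.getnames()
        # Use the safe "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(job_dir, **kwargs)
    return sorted(names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        paths = []
        for i in range(3):
            p = os.path.join(src, f"input{i}.txt")
            with open(p, "w") as f:
                f.write(f"data {i}\n")
            paths.append(p)
        archive = pack_stageins(paths, os.path.join(src, "stagein.tar"))
        print(unpack_stageins(archive, dst))  # ['input0.txt', 'input1.txt', 'input2.txt']
```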
There's a plot here:
http://www.ci.uchicago.edu/~benc/tmp/report-fakecnari-20080930-1820-0nmtamxg/
Basically, the first 600 s are taken up allocating coaster workers, and the
remaining time uses quite a lot of cores at once. So the total duration of
the run doesn't seem that different, but I think the behaviour as the number
of jobs increases will be better: the 600 s startup is a fixed cost (which
I also think can be massively reduced in a couple of ways), and the part
that is proportional to the number of jobs is the remaining three hundred
seconds.
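A rough illustration of why the fixed startup matters less as the workload grows: if, as an assumption, the roughly 300 s of job-proportional time scales linearly with the number of jobs while the 600 s coaster startup stays constant, the startup's share of the total shrinks for larger runs. The linear model and the scale factors below are illustrative assumptions, not measurements.

```python
FIXED_STARTUP_S = 600.0  # coaster worker allocation time, from the plot
PROPORTIONAL_S = 300.0   # job-proportional time at this run's job count


def total_time(scale):
    """Estimated run time when the job count is `scale` times this run's.

    Assumes (hypothetically) the proportional part scales linearly
    and the startup cost stays fixed.
    """
    return FIXED_STARTUP_S + PROPORTIONAL_S * scale


def startup_fraction(scale):
    """Share of the total run time spent on the fixed startup."""
    return FIXED_STARTUP_S / total_time(scale)


if __name__ == "__main__":
    for scale in (1, 4, 16):
        print(scale, total_time(scale), round(startup_fraction(scale), 3))
    # 1 900.0 0.667
    # 4 1800.0 0.333
    # 16 5400.0 0.111
```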
This is a fairly dirty hack: there's no clustering for stage-outs, and the
decision of whether to cluster transfers is fairly crude (basically, queue
file transfers for 30 s, and after that, if there's more than one, make a
cluster).
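The clustering decision described above (hold transfers for 30 s, then cluster if more than one is waiting) might look roughly like the following minimal sketch. The class `TransferBatcher`, its methods, and the injectable clock are hypothetical illustrations, not Swift's code.

```python
import time

WINDOW_S = 30.0  # how long to hold transfer requests before deciding


class TransferBatcher:
    """Queue file transfers for a fixed window, then decide how to send them."""

    def __init__(self, window_s=WINDOW_S, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.pending = []
        self.window_start = None

    def submit(self, path):
        # The window opens with the first request after the previous flush.
        if self.window_start is None:
            self.window_start = self.clock()
        self.pending.append(path)

    def poll(self):
        """Return transfer actions once the window has expired, else [].

        Each action is ("cluster", [paths]) or ("single", path).
        """
        if self.window_start is None:
            return []
        if self.clock() - self.window_start < self.window_s:
            return []
        batch, self.pending, self.window_start = self.pending, [], None
        if len(batch) > 1:
            return [("cluster", batch)]
        return [("single", batch[0])]


if __name__ == "__main__":
    now = [0.0]
    b = TransferBatcher(clock=lambda: now[0])
    b.submit("a.dat")
    b.submit("b.dat")
    print(b.poll())  # [] (window still open)
    now[0] = 31.0
    print(b.poll())  # [('cluster', ['a.dat', 'b.dat'])]
```

Injecting the clock keeps the sketch testable without actually waiting 30 s; a real scheduler would call `poll` from a timer.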
The initial startup is slow, I think, because the first coaster workers are
started from a malformed job submission, a result of the low quality of this
clustering code: it doesn't pass the coastersPerNode parameter through for
the initial jobs, so the initial coaster worker is very slow.