[Swift-devel] lots of very small files vs gridftp
Ben Clifford
benc at hawaga.org.uk
Tue Sep 30 19:02:05 CDT 2008

On Tue, 30 Sep 2008, Mihael Hategan wrote:
> But I'm not otherwise opposed to anything in particular. I suppose
> tarring/untarring could be done manually, at the expense of messing up the
> abstractness of swift.

I experimented with making Swift tar/untar stage-ins automatically (so no
modifications to the SwiftScript code are needed).
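For illustration only, here is a minimal Python sketch of the bundling idea, assuming the stage-in files are local paths under one base directory. The helpers bundle_stageins and unpack_stageins are hypothetical names, not Swift's code (which this post doesn't show); the point is that many small per-file GridFTP transfers become one archive transfer plus a local untar on the execution side.

```python
import os
import tarfile
import tempfile  # used by the demo at the bottom


def bundle_stageins(base_dir, rel_paths, archive_path):
    """Hypothetical helper: pack many small stage-in files into one tar archive,
    so a single transfer replaces one transfer per file."""
    with tarfile.open(archive_path, "w") as tar:
        for rel in rel_paths:
            # Store paths relative to base_dir so the remote layout mirrors the local one.
            tar.add(os.path.join(base_dir, rel), arcname=rel)
    return archive_path


def unpack_stageins(archive_path, dest_dir):
    """Hypothetical helper for the execution side: extract the bundle into the
    job's working directory and return the names of the extracted members."""
    with tarfile.open(archive_path, "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons can reject unsafe members (absolute paths, "..", devices).
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
        return tar.getnames()


if __name__ == "__main__":
    # Demo: three tiny "stage-in" files bundled locally, then unpacked elsewhere.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        names = [f"input{i}.txt" for i in range(3)]
        for i, name in enumerate(names):
            with open(os.path.join(src, name), "w") as f:
                f.write(f"small file {i}\n")
        archive = bundle_stageins(src, names, os.path.join(src, "stagein.tar"))
        print(unpack_stageins(archive, dst))
```

In a real run the archive would be moved by one GridFTP transfer between the two calls; here both sides are local directories so the sketch runs on its own.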

There's a plot here:
http://www.ci.uchicago.edu/~benc/tmp/report-fakecnari-20080930-1820-0nmtamxg/

Basically, the first 600 s are taken up allocating coaster workers, and the
remaining time uses quite a lot of cores at once. So the total duration of
the run doesn't seem very different, but I think the behaviour will be
better as the number of jobs increases: the 600 s startup is a fixed cost
(which I also think can be massively reduced in a couple of ways), and only
the remaining 300 s or so is proportional to the number of jobs.
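As a rough back-of-envelope model (an assumption: it treats the job-dependent part as scaling linearly in the number of jobs n, with n_0 the job count of the plotted run, which the post doesn't state), the total duration and the share spent on startup look like this:

```latex
T(n) \approx T_{\text{startup}} + \frac{n}{n_0}\,T_{\text{jobs}},
\qquad T_{\text{startup}} \approx 600\,\mathrm{s},\quad T_{\text{jobs}} \approx 300\,\mathrm{s}

\frac{T_{\text{startup}}}{T(n_0)} \approx \frac{600}{900} \approx 67\%,
\qquad
\frac{T_{\text{startup}}}{T(10\,n_0)} \approx \frac{600}{600 + 3000} \approx 17\%
```

So under this model the fixed coaster startup dominates this run but becomes a minor share of the total once the job count grows.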

This is a fairly dirty hack: there's no clustering for stage-outs, and the
decision whether or not to cluster transfers is fairly crude (basically,
queue file transfers for 30 s and, after that, if there's more than one,
make a cluster).
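A rough Python sketch of that batching rule, assuming the 30 s window is measured from the first queued transfer (the post doesn't say exactly when the clock starts). TransferBatcher, submit and poll are hypothetical names, not Swift internals; the clock is injectable so the rule can be exercised without actually waiting.

```python
import time
from typing import Callable, List, Optional, Tuple

CLUSTER_WINDOW_S = 30.0  # queueing delay taken from the description above


class TransferBatcher:
    """Hypothetical sketch of the crude clustering rule: hold file transfers for a
    fixed window, then release them as one cluster if more than one arrived.
    Assumption: the window is measured from the first transfer queued."""

    def __init__(self, window: float = CLUSTER_WINDOW_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self.clock = clock  # injectable so the rule can be tested without sleeping
        self._pending: List[str] = []
        self._first_queued: Optional[float] = None

    def submit(self, path: str) -> None:
        """Queue a transfer; the first one into an empty queue opens the window."""
        if not self._pending:
            self._first_queued = self.clock()
        self._pending.append(path)

    def poll(self) -> Optional[Tuple[bool, List[str]]]:
        """Return None while the window is open (or nothing is queued). Otherwise
        empty the queue and return (is_cluster, paths), where is_cluster is True
        only if more than one transfer was queued."""
        if not self._pending or self.clock() - self._first_queued < self.window:
            return None
        batch = self._pending
        self._pending = []
        self._first_queued = None
        return (len(batch) > 1, batch)
```

A caller would poll() periodically, hand a (True, paths) result to whatever packs the files into a single transfer, and send (False, [path]) as an ordinary single transfer. The fixed window is what makes the rule crude: a lone file always waits the full 30 s, and a burst that straddles the window boundary gets split into two clusters.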

The initial startup is slow, I think, because the first coaster workers
are started from a malformed job submission, caused by the low quality of
this clustering code: it doesn't pass the coastersPerNode parameter
through for the initial jobs, so the initial coaster worker is very slow.