[Swift-devel] Re: GridFTP small-file optimizations in Swift

Mihael Hategan hategan at mcs.anl.gov
Sat May 2 11:52:13 CDT 2009


On Sat, 2009-05-02 at 10:16 -0500, Michael Wilde wrote:
> This is an interesting, low-prio background discussion. I'm moving it 
> here to swift-devel for group benefit and comment.
> 
> Ian asked me:
> 
> "do you know if the GridFTP "lots of small files" optimizations have 
> helped Swift at all"?
> 
> Mihael said:
> 
> "There are two such optimizations I can think of: data channel re-use and 
> pipelining of commands"
> 
> Based on this I wonder:
> 
> 1) how does swift-over-cog use these two features? (i.e., how much data 
> gets batched into one channel? All the files to/from a job? All the 
> files to/from a site for the duration of a workflow? Are channels cached 
> or just kept open once opened?)

Swift doesn't use pipelining. We discussed this at length on this
mailing list before.

Data channels are re-used. Clients/connections are cached based not on
jobs but on site and time (i.e. they have a maximum idle time).
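
To make the caching policy concrete, here is a minimal sketch in Python
of a cache keyed by site with a maximum idle time. This is not the CoG
provider code. The class name, the connect factory, the 60-second
default and the close() method on connections are all assumptions made
for illustration.

```python
import time


class IdleConnectionCache:
    """Illustrative cache: connections are keyed by site, not by job,
    and are discarded once they have sat idle longer than max_idle_seconds."""

    def __init__(self, connect, max_idle_seconds=60.0, clock=time.monotonic):
        self._connect = connect            # hypothetical factory: site -> connection
        self._max_idle = max_idle_seconds  # assumed default, not a real Swift/CoG setting
        self._clock = clock
        self._idle = {}                    # site -> list of (connection, released_at)

    def acquire(self, site):
        """Return a cached connection for the site, or open a new one."""
        now = self._clock()
        pool = self._idle.get(site, [])
        while pool:
            conn, released_at = pool.pop()
            if now - released_at <= self._max_idle:
                return conn                # still fresh: re-use it
            conn.close()                   # idle too long: drop it
        return self._connect(site)

    def release(self, site, conn):
        """Hand a connection back so a later transfer to the same site can re-use it."""
        self._idle.setdefault(site, []).append((conn, self._clock()))
```

With a policy like this, two jobs that transfer files to the same site
within the idle window share one connection, while a site that goes
quiet eventually has its connections closed.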

> 
> 2) how hard would it be to turn the feature off, so the CEDPS folks 
> could get a before vs after measurement for a few workflows?

There are no flags for that at this time, but I can add some if you
want.

> 
> Like I said, just background discussion.
> 
> CEDPS has a review coming up; it would be great if we can run some 
> experiments on real workflows to compare the benefits of these 
> optimizations to the "before" case. I'd be willing to try that, time 
> permitting, if I could turn it on or off.  Suggestions welcome.
> 
> - Mike
> 
> On 5/1/09 9:49 AM, Mihael Hategan wrote:
> > On Fri, 2009-05-01 at 08:02 -0500, Ian Foster wrote:
> >> Mihael:
> >>
> >> LOSF optimization refers to the support added for reducing per-file  
> >> startup costs by streaming one after the other--helpful for when  
> >> sending lots of small files.
> > 
> > Right. There are two such optimizations I can think of: data channel
> > re-use and pipelining of commands.
> > 
> >> Ian.
> >>
> >> On Apr 30, 2009, at 7:10 PM, Mihael Hategan wrote:
> >>
> >>> On Thu, 2009-04-30 at 18:58 -0500, Michael Wilde wrote:
> >>>> Ian asked in IM while I was away:
> >>>>
> >>>> Mike, do you know if the GridFTP "lots of small files" optimizations
> >>>> have helped Swift at all?
> >>> 1. "lots of small files" optimizations is ambiguous.
> >>> 2. I don't think we've made any clear measurements with a real-world
> >>> application of anything gridftp compared to anything else gridftp.
> >>> 3. I think that whatever was done to make things faster helps, so if
> >>> you want an opinion, then "yes".
> >>>
> >>>> --
> >>>>
> >>>> I think Mihael has incorporated some aspect of that optimization into
> >>>> the CoG data provider and Swift, but cc him here for the right
> >>>> answer.
> >>>>
> >>>> - Mike
> > 



