[Swift-devel] Re: GridFTP small-file optimizations in Swift
Mihael Hategan
hategan at mcs.anl.gov
Sat May 2 11:52:13 CDT 2009
On Sat, 2009-05-02 at 10:16 -0500, Michael Wilde wrote:
> This is an interesting, low-prio background discussion. I'm moving it
> here to swift-devel for group benefit and comment.
>
> Ian asked me:
>
> "do you know if the GridFTP "lots of small files" optimizations have
> helped Swift at all"?
>
> Mihael said:
>
> "There are two such optimizations I can think of: data channel re-use and
> pipelining of commands"
>
> Based on this I wonder:
>
> 1) how does swift-over-cog use these two features? (i.e., how much data
> gets batched into one channel? All the files to/from a job? All the
> files to/from a site for the duration of a workflow? Are channels cached
> or just kept open once opened?)
Swift doesn't use pipelining. We discussed this at length on this
mailing list before.
Data channels are re-used. Clients/connections are cached based not on
jobs but on site and time (i.e. they have a maximum idle time).
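
To illustrate what "cached by site and time" means, here is a minimal
sketch in Python. It is not the CoG code: the names SiteConnectionCache,
FakeConnection, connect and max_idle are all made up for the example, and
it keeps a single connection per site, where a real provider would
probably hold several.

```python
import time


class FakeConnection:
    """Stand-in for a GridFTP client connection (hypothetical, for illustration)."""

    def __init__(self, site):
        self.site = site
        self.closed = False

    def close(self):
        self.closed = True


class SiteConnectionCache:
    """Keeps one reusable connection per site and drops it after max_idle seconds."""

    def __init__(self, connect, max_idle=60.0, clock=time.monotonic):
        self._connect = connect      # factory: site -> connection (assumed interface)
        self._max_idle = max_idle    # maximum idle time in seconds
        self._clock = clock          # injectable clock, which makes testing easy
        self._entries = {}           # site -> (connection, last_used)

    def acquire(self, site):
        now = self._clock()
        entry = self._entries.get(site)
        if entry is not None:
            conn, last_used = entry
            if now - last_used <= self._max_idle:
                # Still fresh: reuse it, no matter which job asks for it.
                self._entries[site] = (conn, now)
                return conn
            # Idle for too long: close it and fall through to reconnect.
            conn.close()
        conn = self._connect(site)
        self._entries[site] = (conn, now)
        return conn


if __name__ == "__main__":
    cache = SiteConnectionCache(FakeConnection, max_idle=60.0)
    first = cache.acquire("siteA")
    second = cache.acquire("siteA")   # a different job, same site
    print(first is second)
```

The key is the site, not the job, so transfers from different jobs going
to the same site share a connection for as long as it does not sit idle
longer than max_idle.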
>
> 2) how hard would it be to turn the feature off, so the CEDPS folks
> could get a before vs after measurement for a few workflows?
There are no flags for that at this time, but I can add some if you
want.
>
> Like I said, just background discussion.
>
> CEDPS has a review coming up; it would be great if we can run some
> experiments on real workflows to compare the benefits of these
> optimizations to the "before" case. I'd be willing to try that, time
> permitting, if I could turn it on or off. Suggestions welcome.
>
> - Mike
>
> On 5/1/09 9:49 AM, Mihael Hategan wrote:
> > On Fri, 2009-05-01 at 08:02 -0500, Ian Foster wrote:
> >> Mihael:
> >>
> >> LOSF optimization refers to the support added for reducing per-file
> >> startup costs by streaming one after the other--helpful for when
> >> sending lots of small files.
> >
> > Right. There are two such optimizations I can think of: data channel
> > re-use and pipelining of commands.
> >
> >> Ian.
> >>
> >> On Apr 30, 2009, at 7:10 PM, Mihael Hategan wrote:
> >>
> >>> On Thu, 2009-04-30 at 18:58 -0500, Michael Wilde wrote:
> >>>> Ian asked in IM while I was away:
> >>>>
> >>>> Mike, do you know if the GridFTP "lots of small files" optimizations
> >>>> have helped Swift at all?
> >>> 1. "lots of small files" optimizations is ambiguous.
> >>> 2. I don't think we've made any clear measurements with a real-world
> >>> application of anything gridftp compared to anything else gridftp.
> >>> 3. I think that whatever was done to make things faster helps, so if
> >>> you want an opinion, then "yes".
> >>>
> >>>> --
> >>>>
> >>>> I think Mihael has incorporated some aspect of that optimization into
> >>>> the CoG data provider and Swift, but cc him here for the right
> >>>> answer.
> >>>>
> >>>> - Mike
> >