[Swift-devel] GridFTP small-file optimizations in Swift
Michael Wilde
wilde at mcs.anl.gov
Sat May 2 10:16:13 CDT 2009
This is an interesting, low-prio background discussion. I'm moving it
here to swift-devel for group benefit and comment.
Ian asked me:
"do you know if the GridFTP "lots of small files" optimizations have
helped Swift at all"?
Mihael said:
"There are two such optimizations I can think of: data channel re-use and
pipelining of commands"
Based on this I wonder:
1) how does swift-over-cog use these two features? (i.e., how much data
gets batched into one channel? All the files to/from a job? All the
files to/from a site for the duration of a workflow? Are channels cached
or just kept open once opened?)
2) how hard would it be to turn the feature off, so the CEDPS folks
could get a before vs after measurement for a few workflows?
Like I said, just background discussion.
CEDPS has a review coming up; it would be great if we could run some
experiments on real workflows to compare the benefits of these
optimizations to the "before" case. I'd be willing to try that, time
permitting, if I could turn them on and off. Suggestions welcome.
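
For a rough sense of what such a before/after comparison might show, here
is a back-of-the-envelope cost model of the two optimizations Mihael
named. It is only a sketch: the function name, its parameters, and the
sample numbers are all hypothetical, not measurements, and it ignores TCP
slow start, server load, and everything else a real run would include.

```python
def transfer_time(n_files, file_bytes, rtt, setup, bandwidth,
                  reuse_channel, pipeline):
    """Estimate seconds needed to move n_files files of file_bytes each.

    Assumed (hypothetical) model:
      - each data channel setup costs `setup` seconds
      - each transfer command costs one round trip of `rtt` seconds
      - payload moves at `bandwidth` bytes per second
    """
    if n_files == 0:
        return 0.0
    payload = n_files * file_bytes / bandwidth
    # Data channel re-use: pay the setup cost once instead of per file.
    setups = setup * (1 if reuse_channel else n_files)
    # Pipelining: commands are sent without waiting for each reply, so
    # roughly one round trip is exposed instead of one per file.
    round_trips = rtt * (1 if pipeline else n_files)
    return payload + setups + round_trips


if __name__ == "__main__":
    # Made-up workload: 1000 files of 10 KB, 50 ms RTT, 200 ms channel
    # setup, 10 MB/s of usable bandwidth.
    args = (1000, 10_000, 0.05, 0.2, 10_000_000)
    for reuse in (False, True):
        for pipe in (False, True):
            t = transfer_time(*args, reuse_channel=reuse, pipeline=pipe)
            print(f"reuse={reuse!s:5} pipeline={pipe!s:5} -> {t:.2f} s")
```

Under this model the per-file overhead dominates for small files, which
is why an on/off switch for each feature would make the measurement
meaningful.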
- Mike
On 5/1/09 9:49 AM, Mihael Hategan wrote:
> On Fri, 2009-05-01 at 08:02 -0500, Ian Foster wrote:
>> Mihael:
>>
>> LOSF optimization refers to the support added for reducing per-file
>> startup costs by streaming one after the other--helpful for when
>> sending lots of small files.
>
> Right. There are two such optimizations I can think of: data channel
> re-use and pipelining of commands.
>
>> Ian.
>>
>> On Apr 30, 2009, at 7:10 PM, Mihael Hategan wrote:
>>
>>> On Thu, 2009-04-30 at 18:58 -0500, Michael Wilde wrote:
>>>> Ian asked in IM while I was away:
>>>>
>>>> Mike, do you know if the GridFTP "lots of small files" optimizations
>>>> have helped Swift at all?
>>> 1. "lots of small files" optimizations is ambiguous.
>>> 2. I don't think we've made any clear measurements with a real-world
>>> application of anything gridftp compared to anything else gridftp.
>>> 3. I think that whatever was done to make things faster helps, so if
>>> you want an opinion, then "yes".
>>>
>>>> --
>>>>
>>>> I think Mihael has incorporated some aspect of that optimization into
>>>> the CoG data provider and Swift, but cc him here for the right
>>>> answer.
>>>>
>>>> - Mike
>