[Swift-devel] GridFTP small-file optimizations in Swift

Michael Wilde wilde at mcs.anl.gov
Sat May 2 10:16:13 CDT 2009


This is an interesting, low-priority background discussion. I'm moving it 
here to swift-devel for group benefit and comment.

Ian asked me:

"do you know if the GridFTP 'lots of small files' optimizations have 
helped Swift at all?"

Mihael said:

"There are two such optimizations I can think of: data channel re-use and 
pipelining of commands"

Based on this I wonder:

1) how does swift-over-cog use these two features? (i.e., how much data 
gets batched into one channel? All the files to/from a job? All the 
files to/from a site for the duration of a workflow? Are channels cached 
or just kept open once opened?)

2) how hard would it be to turn these features off, so the CEDPS folks 
could get a before vs after measurement for a few workflows?

Like I said, just background discussion.

CEDPS has a review coming up; it would be great if we could run some 
experiments on real workflows to compare the benefits of these 
optimizations to the "before" case. I'd be willing to try that, time 
permitting, if I could turn them on and off. Suggestions welcome.
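
To frame what such a before/after comparison might show, here is a toy 
cost model in Python. It is only a sketch under stated assumptions: the 
function name, the parameters, and the round-trip counts are all 
hypothetical, and none of the numbers are measurements of GridFTP, CoG, 
or Swift. It just separates the per-file costs that data channel re-use 
and command pipelining are meant to remove from the payload time.

```python
# Toy cost model. Every name and default here is an illustrative
# assumption, not a measured or documented GridFTP value.

def transfer_time(n_files, file_bytes, rtt_s, bandwidth_Bps,
                  reuse_channel=False, pipelining=False,
                  channel_setup_rtts=2, command_rtts=1):
    """Estimate total seconds to move n_files files of equal size."""
    # Time spent actually moving bytes; the optimizations do not change it.
    payload = n_files * file_bytes / bandwidth_Bps

    # Data channel setup: paid once with re-use, otherwise once per file.
    setups = 1 if reuse_channel else n_files
    setup_cost = setups * channel_setup_rtts * rtt_s

    # Command round trips: with pipelining, commands are issued without
    # waiting for the previous reply, so this model exposes only one
    # round trip of latency instead of one per file.
    exposed = 1 if pipelining else n_files
    command_cost = exposed * command_rtts * rtt_s

    return payload + setup_cost + command_cost


if __name__ == "__main__":
    # Hypothetical workload: 1000 files of 10 KB, 50 ms RTT, 10 MB/s.
    args = (1000, 10_000, 0.05, 10_000_000)
    print("before:", transfer_time(*args))
    print("after: ", transfer_time(*args, reuse_channel=True,
                                   pipelining=True))
```

In this model the payload term is identical in both cases, so any gap 
between "before" and "after" comes from per-file latency alone. A real 
measurement would show how much of that the actual workflows expose.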

- Mike

On 5/1/09 9:49 AM, Mihael Hategan wrote:
> On Fri, 2009-05-01 at 08:02 -0500, Ian Foster wrote:
>> Mihael:
>>
>> LOSF optimization refers to the support added for reducing per-file
>> startup costs by streaming files one after the other, which helps when
>> sending lots of small files.
> 
> Right. There are two such optimizations I can think of: data channel
> re-use and pipelining of commands.
> 
>> Ian.
>>
>> On Apr 30, 2009, at 7:10 PM, Mihael Hategan wrote:
>>
>>> On Thu, 2009-04-30 at 18:58 -0500, Michael Wilde wrote:
>>>> Ian asked in IM while I was away:
>>>>
>>>> Mike, do you know if the GridFTP "lots of small files" optimizations
>>>> have helped Swift at all?
>>> 1. "lots of small files" optimizations is ambiguous.
>>> 2. I don't think we've made any clear measurements with a real-world
>>> application of anything gridftp compared to anything else gridftp.
>>> 3. I think that whatever was done to make things faster helps, so if you
>>> want an opinion, then "yes".
>>>
>>>> --
>>>>
>>>> I think Mihael has incorporated some aspect of that optimization into
>>>> the CoG data provider and Swift, but cc him here for the right answer.
>>>>
>>>> - Mike
> 


