[Swift-devel] Re: Another performance comparison of DOCK

Sun Apr 13 16:52:56 CDT 2008

 >> If it's set right, any chance that Swift or Karajan is limiting it
 >> somewhere?
 > 2000 for sure,
 > throttle.submit=off
 > throttle.host.submit=off
 > throttle.score.job.factor=off
 > throttle.transfers=2000
 > throttle.file.operation=2000

Looks like a typo in your properties, Zhao - if the text above came from 
your swift.properties directly:

   throttle.file.operation=2000

vs. "operations" with an "s", as per the properties doc:

throttle.file.operations=8
#throttle.file.operations=off

Which still doesn't explain why we're seeing 100 when the default is 8.
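
A misspelled key like this is easy to miss by eye. As a rough sketch (plain Python, not part of Swift), a properties file could be checked against a list of expected keys. The key list below contains only the throttle settings quoted in this thread, so it is an assumption and incomplete; `find_unknown_keys` is a hypothetical helper name, and the sample text is just the settings quoted above.

```python
import difflib

# Assumption: only the throttle keys mentioned in this thread.
# The real swift.properties defines more keys than these.
KNOWN_KEYS = {
    "throttle.submit",
    "throttle.host.submit",
    "throttle.score.job.factor",
    "throttle.transfers",
    "throttle.file.operations",
}

def find_unknown_keys(text, known=KNOWN_KEYS):
    """Return (key, closest_known_key_or_None) for each unrecognized key."""
    problems = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key not in known:
            close = difflib.get_close_matches(key, sorted(known), n=1)
            problems.append((key, close[0] if close else None))
    return problems

# The settings as quoted in this thread.
sample = """\
throttle.submit=off
throttle.host.submit=off
throttle.score.job.factor=off
throttle.transfers=2000
throttle.file.operation=2000
"""

for key, suggestion in find_unknown_keys(sample):
    print(f"unknown key {key!r}; did you mean {suggestion!r}?")
```

Run against the quoted settings, this flags only the singular `throttle.file.operation` line and suggests the plural form.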

- Mike

On 4/13/08 3:39 PM, Zhao Zhang wrote:
> Hi, Mike
> 
> Michael Wilde wrote:
>> Ben, your analysis sounds very good. Some notes below, including 
>> questions for Zhao.
>>
>> On 4/13/08 2:57 PM, Ben Clifford wrote:
>>>
>>>> Ben, can you point me to the graphs for this run? (Zhao's 
>>>> *99cy0z4g.log)
>>>
>>> http://www.ci.uchicago.edu/~benc/report-dock2-20080412-1609-99cy0z4g
>>>
>>>> Once stage-ins start to complete, are the corresponding jobs 
>>>> initiated quickly, or is Swift doing mostly stage-ins for some period?
>>>
>>> In the run dock2-20080412-1609-99cy0z4g, jobs are submitted (to 
>>> falkon) pretty much right as the corresponding stagein completes. I 
>>> have no deeper information about when the worker actually starts to run.
>>>
>>>> Zhao indicated he saw data indicating there was about a 700 second 
>>>> lag from
>>>> workflow start time till the first Falkon jobs started, if I understood
>>>> correctly. Do the graphs confirm this or say something different?
>>>
>>> There is a period of about 500s or so until stuff starts to happen; I 
>>> haven't looked at it. That is before stage-ins start too, though, 
>>> which means that I think this...
>>>
>>>> If the 700-second delay figure is true, and stage-in was eliminated 
>>>> by copying
>>>> input files right to the /tmp workdir rather than first to /shared, 
>>>> then we'd
>>>> have:
>>>>
>>>> 1190260 / ( 1290 * 2048 ) = .45 efficiency
>>>
>>> calculation is not meaningful.
>>>
>>> I have not looked at what is going on during that 500s startup time, 
>>> but I plan to.
>>
>> Zhao, what SVN rev is your Swift at?  Ben fixed an N^2 mapper logging 
>> problem a few weeks ago. Could that cause such a delay, Ben? It would 
>> be very obvious in the swift log.
> The version is Swift svn swift-r1780 cog-r1956
>>
>>>
>>>> I assume we're paying the same staging price on the output side?
>>>
>>> not really - the output stageouts go very fast, and also because job 
>>> ending is staggered, they don't happen all at once.
>>>
>>> This is the same with most of the large runs I've seen (of any 
>>> application) - stageout tends not to be a problem (or at least, 
>>> nowhere near the problems of stagein).
>>>
>>> All stageins happen over a period t=400 to t=1100 fairly smoothly. 
>>> There's rate limiting still on file operations (100 max) and file 
>>> transfers (2000 max) which is being hit still.
>>
>> I thought Zhao set file operations throttle to 2000 as well.  Sounds 
>> like we can test with the latter higher, and find out what's limiting 
>> the former.
>>
>> Zhao, what are your settings for property throttle.file.operations?
>> I assume you have throttle.transfers set to 2000.
>>
>> If it's set right, any chance that Swift or Karajan is limiting it 
>> somewhere?
> 2000 for sure,
> throttle.submit=off
> throttle.host.submit=off
> throttle.score.job.factor=off
> throttle.transfers=2000
> throttle.file.operation=2000
>>>
>>> I think there are two directions to proceed in here that make sense for 
>>> actual use on single clusters running falkon (rather than trying to 
>>> cut out stuff randomly to push up numbers):
>>>
>>>  i) use some of the data placement features in falkon, rather than 
>>> Swift's
>>>     relatively simple data management that was designed more for running
>>>     on the grid.
>>
>> Long term: we should consider how the Coaster implementation could 
>> eventually do a similar data placement approach. In the meantime (mid 
>> term) examining what interface changes are needed for Falkon data 
>> placement might help prepare for that. Need to discuss if that would 
>> be a good step or not.
>>
>>>
>>>  ii) do stage-ins using symlinks rather than file copying. This makes
>>>      sense when everything is living in a single filesystem, which again
>>>      is not what Swift's data management was originally optimised for.
>>
>> I assume you mean symlinks from shared/ back to the user's input files?
>>
>> That sounds worth testing: find out if symlink creation is fast on NFS 
>> and GPFS.
>>
>> Is another approach to copy direct from the user's files to the /tmp 
>> workdir (i.e., wrapper.sh pulls the data in)? Measurement will tell if 
>> symlinks alone get adequate performance. Symlinks do seem an easier 
>> first step.
>>
>>> I think option ii) is substantially easier to implement (on the order 
>>> of days) and is generally useful in the single-cluster, 
>>> local-source-data situation that appears to be what people want to do 
>>> for running on the BG/P and scicortex (that is, pretty much ignoring 
>>> anything grid-like at all).
>>
>> Grid-like might mean pulling data to the /tmp workdir directly by the 
>> wrapper - but that seems like a harder step, and would need 
>> measurement and prototyping of such code before attempting. Data 
>> transfer clients that the wrapper script can count on might be an 
>> obstacle.
>>
>>>
>>> Option i) is much harder (on the order of months), needing a very 
>>> different interface between Swift and Falkon than exists at the moment.
>>>
>>>
>>>
>>