[Fwd: Re: [Swift-devel] Re: swift-falkon problem... plots to explain plateaus...]

Michael Wilde wilde at mcs.anl.gov
Tue Mar 25 00:28:44 CDT 2008


I eyeballed the wrapperlogs to get a rough idea of what was happening.

I ran with wrapperlog saving and no other changes for workflows of 10, 
100 and 500 jobs, to see how the exec time grew.  At 500 jobs it grew to 
about 30+ seconds for a core app exec time of about 1 sec. (I'm just 
recollecting the times, as I didn't write much down at this point.)

First results showed more time spent in the app wrapper than in 
wrapper.sh.  I remedied this by using /tmp as the app-wrapper's working 
dir, and caching the app binary on /tmp.  This brought a 20+ sec app 
exec time down to about 3 seconds.

With this fixed, the total time in wrapper.sh including the app is now 
about 15 seconds, with 3 being in the app-wrapper itself. The time seems 
about evenly spread over the several wrapper.sh operations, which is not 
surprising when 500 wrappers hit NFS all at once.
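
To put numbers on "evenly spread", the per-step times could be summed 
across the saved wrapper logs. Below is a minimal sketch, not an existing 
Swift tool. It assumes each -info file has lines of the form 
"Progress  <date> <time>.<fraction><tz>  <STEP>"; that format, the step 
names in the usage note, and the function names are all assumptions made 
for illustration and should be checked against the real logs.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# ASSUMED line format (not confirmed in this thread):
#   Progress  2008-03-24 22:15:01.123456789-0500  STEP_NAME
LINE_RE = re.compile(
    r"^Progress\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(\.\d+)?\S*\s+(\S+)"
)


def parse_steps(lines):
    """Return [(timestamp, step_name), ...] for lines matching LINE_RE."""
    steps = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a progress record
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if m.group(2):
            # Fractional seconds may have more digits than %f accepts.
            stamp += timedelta(seconds=float(m.group(2)))
        steps.append((stamp, m.group(3)))
    return steps


def sum_step_times(files_lines):
    """Sum, per step, the seconds from that step's record to the next one.

    files_lines is an iterable with one list of lines per -info file.
    The last record in each file has no successor, so it gets no time.
    """
    totals = defaultdict(float)
    for lines in files_lines:
        steps = parse_steps(lines)
        for (start, name), (end, _) in zip(steps, steps[1:]):
            totals[name] += (end - start).total_seconds()
    return dict(totals)


def report(paths):
    """Print the summed seconds per step, largest first."""
    files_lines = []
    for path in paths:
        with open(path) as f:
            files_lines.append(f.readlines())
    totals = sum_step_times(files_lines)
    for name, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{secs:10.3f}  {name}")
```

Calling report() with the list of -info file paths from one run would 
print one line per step. The time zone offset is ignored, which is fine 
as long as it does not change within a single file.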

I then tried 3 more tests:
- a run to see if the app-executable caching on /tmp had an effect
   (it didn't)
- a run to see if turning off wrapperlog retrieval had an effect
- a run with data operation throttles (both) set to 100 from 10

None of these last three things had a significant effect.

Tomorrow I will try some mods to the wrapper script. Turning off wrapper 
logging in a previous trial yesterday *seemed* to shave 20-30% off the 
run time.  I need to verify this.

I'm also going to try to use /tmp for the jobdir and reduce wrapper.sh 
overhead; also will leave the (tiny) job output on /tmp for later 
aggregation (will have some swift questions on that).

Ben, if you want to look at any of these logs, the runs are in 
swift-logs/wilde in the order described above (w/comment files):

346: 10 job workflow
347: 100 job wf
348: 500 job wf
349: 500 jobs w/ improved app-wrapper
350: 500 jobs w/ improved app-wrapper & executable on /tmp
351: 500 jobs, wrapperlog saving off
352: 500 jobs, wrapperlog saving off, data throttles at 100 (from 20)

All but the first of these should have falkon logs saved as well.

I have several ideas on how to proceed, but welcome advice and any 
discoveries from log analysis.

Thanks,

Mike


On 3/24/08 10:15 PM, Michael Wilde wrote:
> Ben, do you have a script to sum the time spent per step of wrapper.sh, 
> over a set of -info files?
> 
> On 3/24/08 6:36 PM, Ben Clifford wrote:
>> On Mon, 24 Mar 2008, Mihael Hategan wrote:
>>
>>> As far as I can remember, Ben added fairly comprehensive logging to the
>>> wrapper. That may shed some light on the issue.
>>
>> Indeed I did; and that logging information can be sent back to the 
>> submit host by enabling wrapperlog.always.transfer=true
>>
> 


