[Fwd: Re: [Swift-devel] Re: swift-falkon problem... plots to explain plateaus...]

Michael Wilde wilde at mcs.anl.gov
Tue Mar 25 08:44:40 CDT 2008


I did runs the day before with a modified wrapper that bypassed the INFO
logging. It saved a good amount: I recall about 30%, but I need to
re-check the numbers.
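One way to make INFO logging switchable in a wrapper script is sketched below, so that runs with and without it can be compared. The names WRAPPER_INFO_LOG, INFO_FILE and log_info are hypothetical and are not taken from wrapper.sh.

```shell
# Hypothetical switch: set WRAPPER_INFO_LOG=0 to skip INFO messages entirely.
# INFO_FILE stands in for the per-job info file (not wrapper.sh's real name).
INFO_FILE=${INFO_FILE:-/dev/null}

log_info() {
    # Return early, without opening the (possibly NFS-hosted) info file,
    # when INFO logging is disabled.
    if [ "${WRAPPER_INFO_LOG:-1}" = "0" ]; then
        return 0
    fi
    # Otherwise append one timestamped line per event.
    echo "$(date +%s) INFO $*" >> "$INFO_FILE"
}
```

Keeping the check inside the function means the call sites stay unchanged; only the environment decides whether any log I/O happens.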

Yes, I came to the same conclusion on the mkdirs. I'm looking at
reducing them, most likely by moving the jobdir to /tmp. I think I can do
that within the current structure. wrapper.sh is very clear and nicely
written. (Ben: yes, eyeballing the log #s was easy and no problem.)
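A minimal sketch of the /tmp idea follows, assuming the compute node has usable local disk and that the job's outputs can simply be copied back to the shared directory when the job finishes. LOCAL_SCRATCH, make_local_jobdir and finish_local_jobdir are hypothetical names, not part of wrapper.sh.

```shell
# Hypothetical local scratch root; defaults to /tmp on the compute node.
LOCAL_SCRATCH=${LOCAL_SCRATCH:-/tmp}

# Create the per-job working directory on local disk instead of under the
# shared (NFS) work directory, and print its path.
make_local_jobdir() {
    jobid=$1
    dir="$LOCAL_SCRATCH/$jobid"
    mkdir -p "$dir" || return 1
    echo "$dir"
}

# Copy the named output files back to the shared directory, then remove
# the local job directory.
# Usage: finish_local_jobdir LOCAL_DIR SHARED_DIR FILE...
finish_local_jobdir() {
    dir=$1
    shared=$2
    shift 2
    for f in "$@"; do
        cp "$dir/$f" "$shared/" || return 1
    done
    rm -rf "$dir"
}
```

Only the final copy touches the shared filesystem, so the per-job mkdir and rm costs move to local disk.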

The first thing I want to do, though, is run some large-scale tests on
our two science workflows, using app-level batching to increase the job
runtime of the petro-modelling one (the sub-second application).

Zhao's latest tests indicate that if we run batches of 40, bringing the
jobs from 0.5 sec to 20 sec, we can saturate the BGP's 4K cores and keep
it running efficiently. Given the extra wrapper.sh overhead, I might
need to increase that by another 10X, but once the app is wrapped in a
loop, it makes little difference to the user how big we make the batches.
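Wrapping the app in a loop could look like the sketch below: one Swift job runs the application once per input, sequentially. run_batch is a hypothetical helper, not an existing Swift or wrapper.sh feature.

```shell
# Run a command once per input argument so that a single Swift job covers
# many sub-second invocations (hypothetical helper).
# Usage: run_batch APP INPUT...
run_batch() {
    app=$1
    shift
    rc=0
    for input in "$@"; do
        # Keep going after a failure, but remember that one occurred.
        "$app" "$input" || rc=1
    done
    return $rc
}
```

With 40 inputs at roughly 0.5 s each, one batch runs for about 40 × 0.5 = 20 s, which matches Zhao's figures; a further 10X would mean 400 inputs and roughly 200 s per job.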

The other app is a molecule-docking app, which can be batched similarly.

Once we get those running nicely at longer, less brutal job times, I'll
come back to wrapper.sh tuning. If you or Ben want to do this in the
meantime, though, that would be great. We have the use-local-disk
scenario on our development stack anyway, so this would be a good time
to do it. If I do it, it will be only a prototype for measurement purposes.

Mike




On 3/25/08 8:34 AM, Mihael Hategan wrote:
> On Tue, 2008-03-25 at 08:16 -0500, Michael Wilde wrote:
>> On 3/25/08 3:31 AM, Mihael Hategan wrote:
>>> On Tue, 2008-03-25 at 00:28 -0500, Michael Wilde wrote:
>>>> I eyeballed the wrapperlogs to get a rough idea of what was happening.
>>>>
>>>> I ran with wrapperlog saving and no other changes for workflows of
>>>> 10, 100 and 500 jobs, to see how the exec time grew. At 500 jobs it
>>>> grew to 30+ seconds for a core app exec time of about 1 sec. (I'm
>>>> recalling these times from memory, as I didn't write much down at
>>>> this point.)
>>>>
>>> I would personally like to see those logs.
>> I listed all the runs in the previous mail (below), Mihael. They are on 
>> CI NFS at ~benc/swift-logs/wilde/run{345-350}.
> 
> Sorry about that.
> 
>>  Let us know what you find.
>>
> 
> It looks like this:
> - 5 seconds between LOG_START and CREATE_JOBDIR. Likely hogs:
> mkdir -p $WFDIR/info/$JOBDIR
> mkdir -p $WFDIR/status/$JOBDIR
> and the creation of the info file.
> - 2.5 seconds between CREATE_JOBDIR and CREATE_INPUTDIR. Likely problem:
> mkdir -p $DIR
> (on a very fuzzy note, if one mkdir takes 2.5 seconds, two will take 5,
> which seems to roughly fit the observed numbers).
> - 3.5 seconds for COPYING_OUTPUTS
> - 2.5 seconds for RM_JOBDIR
> 
> I'd be curious to know how much of the time is actually spent writing to
> the logs. That's because I see one second between EXECUTE_DONE and
> COPYING_OUTPUTS, a place where the only meaningful things that are done
> are two log messages.
> 
> It may be useful to run the whole thing through strace -T.
> 
> Mihael
> 
> 
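The per-phase timings above were read off the wrapper logs by eye. A small awk helper could print the gaps between consecutive events automatically. This sketch assumes a hypothetical log format with one event per line, "<epoch-seconds> <EVENT>"; the real wrapper log layout may differ, and the field positions would need adjusting.

```shell
# Print the elapsed time between consecutive events in a wrapper log.
# Assumed (hypothetical) line format: "<epoch-seconds> <EVENT_NAME> ...".
event_deltas() {
    awk 'NR > 1 { printf "%s -> %s: %.1f s\n", prev_ev, $2, $1 - prev_t }
         { prev_t = $1; prev_ev = $2 }' "$1"
}
```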

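For the strace -T suggestion, `strace -f -T -o FILE` follows child processes and appends the time spent in each system call as `<seconds>` to every line of FILE. The sketch below sums that time for one call name. sum_syscall_time is a hypothetical helper; depending on the coreutils version, `mkdir -p` may issue `mkdir` or `mkdirat`, so it may need to be run for both names.

```shell
# Sum the time strace -T reports for one system call.
# Expects output from, e.g.:  strace -f -T -o wrapper.strace sh wrapper.sh ...
# Calls split by -f into "<unfinished ...>" / "<... NAME resumed>" pairs are
# counted once, on the resumed line that carries the timing.
sum_syscall_time() {
    awk -v name="$1" '
        (index($0, name "(") || index($0, "<... " name " resumed>")) &&
        match($0, /<[0-9.]+>$/) {
            total += substr($0, RSTART + 1, RLENGTH - 2)
            calls++
        }
        END { printf "%s: %d calls, %.6f s\n", name, calls, total }' "$2"
}
```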

