[Fwd: Re: [Swift-devel] Re: swift-falkon problem... plots to explain plateaus...]

Ioan Raicu iraicu at cs.uchicago.edu
Mon Mar 24 12:36:04 CDT 2008



Michael Wilde wrote:
> > Now the real question is, what is the breakdown of the 100 sec
> > invocation (108.645 sec on average to be exact), how much is due to
> > wrapper.sh, and how much is due to the application itself?  Mike, can
> > you comment on this?  I assume you are running amiga which should have
> > 0.5 sec jobs, right?
>
> Amiga is about 0.5 secs and the script that runs (runam3) I think adds 
> another .5 secs (from a quick scan of falkon logs on the actual task 
> run time - but please verify, I think you have all the data from the 
> task log).
In the log with 1000 tasks, the shortest job was 72 secs, the average 108, 
and the max 170 sec.  Is amiga working from RAM, or from NFS?  If it's from 
NFS, how big are the input data and script?  I thought it was about 
10KB?  The overall throughput was 6.6 jobs/sec, so that is only 66KB/s, 
which seems quite small, assuming that each read is done in large 
chunks, and not a few bytes at a time. 
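
As a rough check on that figure, here is a minimal sketch of the arithmetic. 
It assumes every job reads about 10KB of input data and script from NFS; 
that 10KB is my guess above, not a measurement, and the function name is 
only illustrative.

```python
# Back-of-the-envelope NFS read rate implied by the observed job throughput.
# ASSUMPTION: about 10 KB read per job (input data plus script), not measured.

def aggregate_read_rate_kb(jobs_per_sec: float, kb_per_job: float) -> float:
    """Return the aggregate read rate in KB/s summed over all workers."""
    return jobs_per_sec * kb_per_job


rate = aggregate_read_rate_kb(jobs_per_sec=6.6, kb_per_job=10.0)
print(f"{rate:.0f} KB/s")  # prints "66 KB/s"
```

This only counts payload bytes.  If the reads are done a few bytes at a 
time, the number of NFS operations could matter far more than the KB/s.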
>
> I suspect, as you and I both agree, that hundreds of short jobs 
> starting in some small interval causes heavy NFS activity. 
Yes, but is the NFS activity due to the app, or due to wrapper.sh?

I would replace the amiga app with a sleep 0.5, or sleep 1, just to see 
if the graph looks much different or not.  That should tell us whether the 
overhead comes from your app or from wrapper.sh.
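
To compare the two cases with the same statistics I quoted from the task 
log (min, average, max), something like the sketch below could be used.  It 
is only a local timing harness, not part of Swift or Falkon, and the names 
time_command and summarize are hypothetical.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(cmd, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the min, average, and max of a list of durations."""
    return {
        "min": min(durations),
        "avg": statistics.mean(durations),
        "max": max(durations),
    }
```

For example, summarize(time_command(["sleep", "0.5"])) gives the baseline, 
and the same call with the real app command line gives the comparison.  A 
large gap between the two points at the app; similar numbers point at 
wrapper.sh or the file system.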

Ioan
> The next round of testing we'll do should start to pick this apart, 
> determine causes and prototype improvements.
>
> - Mike
>
>
> On 3/24/08 11:52 AM, Ioan Raicu wrote:
>> Not sure if this email made it to the mailing list, due to the larger 
>> size (128KB)...
>>
>> Ioan
>>
>> ------------------------------------------------------------------------
>>
>> Subject: Re: [Swift-devel] Re: swift-falkon problem... plots to explain 
>> plateaus...
>> From: Ioan Raicu <iraicu at cs.uchicago.edu>
>> Date: Mon, 24 Mar 2008 11:48:16 -0500
>> To: Ben Clifford <benc at hawaga.org.uk>
>> CC: swift-devel <swift-devel at ci.uchicago.edu>
>>
>>
>> OK, here is my analysis of the plateaus, from Falkon's point of view.
>>
>> Notice the per task execution (green) is about 100 seconds per job, 
>> where the job is some invocation of the wrapper.sh that Swift sent to 
>> Falkon.  Things look normal so far.  See the 2nd graph for more...
>>
>>
>> This shows that there are 600 workers (600 CPUs), which all get their 
>> work within 10 seconds... then they all churn away until about 100 
>> sec when jobs start completing, and new ones get dispatched.  At 
>> around 132 seconds, the wait queue is empty, and some workers start 
>> becoming idle (the red area)... by time 155, the initial 600 jobs 
>> that started between time 0 and 10, have completed, and from 155 to 
>> 211, the remaining 400 jobs all run to completion; they really only 
>> start completing around 190 sec, and all finish by 211.  So, the 
>> plateau, that is evident here as well, is really when 400 workers are 
>> executing 400 jobs in parallel, and since the jobs are taking around 
>> 100 sec each to complete, the plateau of 50 seconds is completely 
>> normal.  See more after the graph...
>>
>>
>> Now the real question is, what is the breakdown of the 100 sec 
>> invocation (108.645 sec on average to be exact), how much is due to 
>> wrapper.sh, and how much is due to the application itself?  Mike, can 
>> you comment on this?  I assume you are running amiga which should 
>> have 0.5 sec jobs, right?
>>
>> Ioan
>>
>> Ioan Raicu wrote:
>>> I see the plateau, but there are other graphs which seem to go crazy 
>>> during those periods, such as
>>> http://www.ci.uchicago.edu/~benc/report-amps1-20080323-1935-su38n0k5/karatasks.FILE_TRANSFER-total.png 
>>>
>>> http://www.ci.uchicago.edu/~benc/report-amps1-20080323-1935-su38n0k5/karatasks.FILE_OPERATION-total.png 
>>>
>>>
>>> Looking at the Falkon logs might reveal more about if the plateau 
>>> was due to Falkon or not.  Where would I find the Falkon logs that 
>>> correlate to these graphs?
>>>
>>> Ioan
>>>
>>> Ben Clifford wrote:
>>>> you can get plots for your 1000 job run here:
>>>>
>>>> http://www.ci.uchicago.edu/~benc/report-amps1-20080323-1935-su38n0k5/
>>>>
>>>> you're hitting the file transfer and file operation limits (that 
>>>> are 20 in your config) once jobs start staging out.
>>>>
>>>> There's a weird-looking plateau in graph 'number of execute2 tasks 
>>>> at once:' around 170s .. 200s where no jobs complete for some time.
>>>>
>>>> Getting the falkon logs and/or the wrapper (.d) logs would be 
>>>> interesting there.
>>>>
>>>> these were generated on my laptop with:
>>>>
>>>> make \
>>>>  LOG=/Users/benc/work/everylog/amps1-20080323-1935-su38n0k5.log \
>>>>  clean webpage.weights webpage.kara webpage
>>>>
>>>> using the SVN log-processing code.
>>>>   
>>>
>>
>> -- 
>> ===================================================
>> Ioan Raicu
>> Ph.D. Candidate
>> ===================================================
>> Distributed Systems Laboratory
>> Computer Science Department
>> University of Chicago
>> 1100 E. 58th Street, Ryerson Hall
>> Chicago, IL 60637
>> ===================================================
>> Email: iraicu at cs.uchicago.edu
>> Web:   http://www.cs.uchicago.edu/~iraicu
>> http://dev.globus.org/wiki/Incubator/Falkon
>> http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
>> ===================================================
>> ===================================================
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>





