Unsure about some entries in log_summary for 2B DoF problem

Richard Tran Mills rmills at climate.ornl.gov
Mon May 4 13:51:52 CDT 2009


Barry,

Is there a way to dump the log data for all processes so that I can do things 
like look at the time spent in each stage by each process?  I thought you 
or someone else had mentioned adding a capability to do this in PETSc to a 
file format that would be easy to manipulate using something like Python... 
but perhaps I recall incorrectly.

It would be very nice to have this to calculate some more detailed statistics.
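
With per-process numbers in hand, the two readings of "%T" discussed below 
could be compared directly. Here is a minimal Python sketch of that 
arithmetic. It assumes, as Matt suggests, that the stage "%T" is the sum of 
per-process event times divided by the sum of per-process stage times; I 
have not checked this against the PETSc source. The per-process values and 
the helper names are made up for illustration; only the max time 
(4.1529e+01), the ratio (9.7), and the stage time (3.2139e+02) come from 
the log.

```python
# Hypothetical per-process data: 10 processes instead of 32768.
# One slow process at the reported max; the others at max / 9.7.
STAGE_TIME = 321.39          # "transport" stage time from the log (seconds)
MAX_VECDOT = 41.529          # max VecDot time from the log (seconds)
MIN_VECDOT = 4.2813          # assumed: roughly MAX_VECDOT / 9.7

event_times = [MAX_VECDOT] + [MIN_VECDOT] * 9
stage_times = [STAGE_TIME] * len(event_times)


def aggregate_percent(event_times, stage_times):
    """Percent of stage time, summed over all processes (assumed %T)."""
    return 100.0 * sum(event_times) / sum(stage_times)


def max_percent(event_times, stage_times):
    """Percent of stage time using only the slowest process."""
    return 100.0 * max(event_times) / max(stage_times)


def imbalance_ratio(event_times):
    """Max-over-min time ratio, like the 'Ratio' column next to Time."""
    return max(event_times) / min(event_times)


if __name__ == "__main__":
    print("aggregate %%T : %.2f" % aggregate_percent(event_times, stage_times))
    print("max-based %%  : %.2f" % max_percent(event_times, stage_times))
    print("imbalance    : %.1f" % imbalance_ratio(event_times))
```

In this toy case the max-based figure is close to thirteen percent while the 
aggregate figure stays small, because only one process is slow. Real 
per-process data would show how the times are actually distributed.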

--Richard

Barry Smith wrote:
> 
>    Matt is completely correct. What this means is that though some 
> processes wait a long time for the dots, MOST processes don't wait much 
> at all.
> In other words, the dot causes very little idle time integrated over the 
> whole machine.
> 
> Meanwhile for flow (where the percentage is large) the dots cause a 
> LARGE amount of idle time integrated over the machine.
> 
> Why it is high for one and not the other I do not know.
> 
>    Barry
> 
> On May 4, 2009, at 1:09 PM, Matthew Knepley wrote:
> 
>> I believe that the time reported there is the collective sum of times 
>> divided by the collective sum
>> of the stage times. If you look at the time imbalance, it is a 
>> staggering 9.7, which either means
>>
>>   1) The partition is really crap (which we know isn't true)
>>
>>   2) Some procs spend a lot of time waiting
>>
>> We can get at this waiting time with the split VecDot() events.
>>
>>   Matt
>>
>> On Mon, May 4, 2009 at 12:58 PM, Richard Tran Mills 
>> <rmills at climate.ornl.gov> wrote:
>> PETSc folks,
>>
>> I was looking over the log summary data for the 2 billion degrees of 
>> freedom transport problem, and I'm a bit puzzled by some of the things 
>> I'm seeing.  (I sent a tarball of this to the pflotran-dev list on 
>> April 30.)  For instance, looking at the run at 32768 cores, I see 
>> that the total time for the "transport" phase is 3.2139e+02 seconds.  
>> But if I look at the VecDot line for the transport stage, I see
>>
>> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> VecDot              1306 1.0 4.1529e+01 9.7 1.76e+08 1.1 0.0e+00 0.0e+00 1.3e+03  1  0  0  0  1   3  0  0  0 24 128305
>>
>> It's hard to read this the way my email client will wrap it, but it's 
>> saying that 3% of the time in the stage was spent on VecDot()s.  But 
>> the max time in VecDot is 4.1529e+01 seconds, close to thirteen percent 
>> of the stage time.  Does the "%T" for the stage mean something other 
>> than what I think it does?
>>
>> --Richard
>>
>> -- 
>> Richard Tran Mills, Ph.D.            |   E-mail: rmills at climate.ornl.gov
>> Computational Scientist              |   Phone:  (865) 241-3198
>> Computational Earth Sciences Group   |   Fax:    (865) 574-0405
>> Oak Ridge National Laboratory        |   http://climate.ornl.gov/~rmills
>>
>>
>>
>> -- 
>> What most experimenters take for granted before they begin their 
>> experiments is infinitely more interesting than any results to which 
>> their experiments lead.
>> -- Norbert Wiener


-- 
Richard Tran Mills, Ph.D.            |   E-mail: rmills at climate.ornl.gov
Computational Scientist              |   Phone:  (865) 241-3198
Computational Earth Sciences Group   |   Fax:    (865) 574-0405
Oak Ridge National Laboratory        |   http://climate.ornl.gov/~rmills


