Unsure about some entries in log_summary for 2B DoF problem
Richard Tran Mills
rmills at climate.ornl.gov
Mon May 4 13:51:52 CDT 2009
Barry,
Is there a way to dump the log data for all processes so that I can do things
like look at the time spent in each stage by each process? I thought you
or someone else had mentioned adding a capability to do this in PETSc to a
file format that would be easy to manipulate using something like Python...
but perhaps I recall incorrectly.
It would be very nice to have this to calculate some more detailed statistics.
--Richard
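
To make the request concrete, here is a minimal sketch of the kind of per-stage statistics meant above. It assumes a hypothetical layout in which the per-process stage times have already been loaded into a Python dict; it does not use any actual PETSc output format, and the numbers are made up for illustration.

```python
import statistics

# Hypothetical layout: stage name -> list of times in seconds, one entry per process.
# These values are invented for illustration; they are not from any real run.
stage_times = {
    "flow": [10.0, 12.0, 11.0, 30.0],
    "transport": [20.0, 21.0, 19.0, 20.0],
}

def summarize(times):
    """Return basic load-balance statistics for one stage."""
    return {
        "min": min(times),
        "max": max(times),
        "mean": statistics.mean(times),
        "max_over_min": max(times) / min(times),  # imbalance ratio
    }

summary = {stage: summarize(times) for stage, times in stage_times.items()}
for stage, stats in summary.items():
    print(stage, stats)
```

With per-process data in hand, the same loop could be extended to histograms or percentiles, which a single max/ratio pair cannot show.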
Barry Smith wrote:
>
> Matt is completely correct. What this means is that though some
> processes wait a long time for the dots, MOST processes don't wait much
> at all.
> In other words, the dot causes very little idle time integrated over the
> whole machine.
>
> Meanwhile for flow (where the percentage is large) the dots cause a
> LARGE amount of idle time integrated over the machine.
>
> Why it is high for one and not the other I do not know.
>
> Barry
>
> On May 4, 2009, at 1:09 PM, Matthew Knepley wrote:
>
>> I believe that the time reported there is the collective sum of the
>> event times divided by the collective sum
>> of the stage times. If you look at the time imbalance, it is a
>> staggering 9.7, which either means
>>
>> 1) The partition is really crap (which we know isn't true)
>>
>> 2) Some procs spend a lot of time waiting
>>
>> We can get at this waiting time with the split VecDot() events.
>>
>> Matt
>>
>> On Mon, May 4, 2009 at 12:58 PM, Richard Tran Mills
>> <rmills at climate.ornl.gov> wrote:
>> PETSc folks,
>>
>> I was looking over the log summary data for the 2 billion degrees of
>> freedom transport problem, and I'm a bit puzzled by some of the things
>> I'm seeing. (I sent a tarball of this to the pflotran-dev list on
>> April 30.) For instance, looking at the run at 32768 cores, I see
>> that the total time for the "transport" phase is 3.2139e+02 seconds.
>> But if I look at the VecDot line for the transport stage, I see
>>
>> Event   Count        Time (sec)         Flops                                       --- Global ---  --- Stage ---   Total
>>         Max   Ratio  Max         Ratio  Max       Ratio  Mess     Avg len  Reduct   %T %F %M %L %R  %T %F %M %L %R  Mflop/s
>> VecDot  1306  1.0    4.1529e+01  9.7    1.76e+08  1.1    0.0e+00  0.0e+00  1.3e+03   1  0  0  0  1   3  0  0  0 24  128305
>>
>> It's hard to read this the way my email client will wrap it, but it's
>> saying that 3% of the time in the stage was spent on VecDot()s. But
>> the max time in VecDot is 4.1529e+01 seconds, close to thirteen percent of the stage time. Does
>> the "%T" for the stage mean something other than what I think it does?
>>
>> --Richard
>>
>> --
>> Richard Tran Mills, Ph.D. | E-mail: rmills at climate.ornl.gov
>> Computational Scientist | Phone: (865) 241-3198
>> Computational Earth Sciences Group | Fax: (865) 574-0405
>> Oak Ridge National Laboratory | http://climate.ornl.gov/~rmills
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
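
The arithmetic in the quoted thread can be sketched as follows. This assumes, per Matt's description, that the stage %T is the sum of the event times over all processes divided by the sum of the stage times over all processes. The stage time, max VecDot time, and 9.7 ratio are taken from the log line quoted above; every other per-process time is synthetic, chosen only to show how a 3% stage figure and a roughly 13% maximum can coexist.

```python
# Values taken from the quoted log line.
stage_time = 3.2139e+02   # transport stage time (s), assumed equal on every process
t_max = 4.1529e+01        # max VecDot time over processes (s)
t_min = t_max / 9.7       # min VecDot time implied by the reported max/min ratio

# Synthetic per-process VecDot times (NOT measured data): one slow process,
# one fast process, and 14 hypothetical processes that wait very little.
event_times = [t_max, t_min] + [8.0] * 14
stage_times = [stage_time] * len(event_times)

def stage_percent(event_times, stage_times):
    """Assumed %T definition: sum of event times / sum of stage times."""
    return 100.0 * sum(event_times) / sum(stage_times)

pct_stage = stage_percent(event_times, stage_times)   # aggregate share of the stage
pct_max = 100.0 * max(event_times) / stage_time       # share on the slowest process
ratio = max(event_times) / min(event_times)           # time imbalance

print(f"stage %T ~ {pct_stage:.1f}, max/stage ~ {pct_max:.1f}%, ratio ~ {ratio:.1f}")
```

Under these assumptions the aggregate figure stays near 3% because most processes contribute little VecDot time, even though the slowest process spends about 13% of the stage there.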