Unsure about some entries in log_summary for 2B DoF problem
Barry Smith
bsmith at mcs.anl.gov
Mon May 4 14:00:56 CDT 2009
On May 4, 2009, at 1:51 PM, Richard Tran Mills wrote:
> Barry,
>
> Is there a way to dump the log data for all processes so that I can
> do things like look at the time spent in each stage by each
> process? I thought you or someone else had mentioned adding a
> capability to do this in PETSc to a file format that would be easy
> to manipulate using something like Python... but perhaps I recall
> incorrectly.
Hong has worked a little on this, you are welcome to mess with it
if you want (Hong, have you pushed this?).
Barry
>
>
> It would be very nice to have this to calculate some more detailed
> statistics.
>
> --Richard
>
> Barry Smith wrote:
>> Matt is completely correct. What this means is that though some
>> processes wait a long time for the dots, MOST processes don't wait
>> much at all.
>> In other words, the dot causes very little idle time integrated
>> over the whole machine.
>> Meanwhile for flow (where the percentage is large) the dots cause a
>> LARGE amount of idle time integrated over the machine.
>> Why it is high for one and not the other I do not know.
>> Barry
>> On May 4, 2009, at 1:09 PM, Matthew Knepley wrote:
>>> I believe that the time reported there is the collective sum of
>>> times divided by the collective sum of the stage times. If you look
>>> at the time imbalance, it is a staggering 9.7, which either means
>>>
>>> 1) The partition is really crap (which we know isn't true)
>>>
>>> 2) Some procs spend a lot of time waiting
>>>
>>> We can get at this waiting time with the split VecDot() events.
>>>
>>> Matt
>>>
>>> On Mon, May 4, 2009 at 12:58 PM, Richard Tran Mills <rmills at climate.ornl.gov> wrote:
>>> PETSc folks,
>>>
>>> I was looking over the log summary data for the 2 billion degrees
>>> of freedom transport problem, and I'm a bit puzzled by some of the
>>> things I'm seeing. (I sent a tarball of this to the pflotran-dev
>>> list on April 30.) For instance, looking at the run at 32768
>>> cores, I see that the total time for the "transport" phase is
>>> 3.2139e+02 seconds. But if I look at the VecDot line for the
>>> transport stage, I see
>>>
>>> Event    Count      Time (sec)      Flops                               --- Global ---  --- Stage ---   Total
>>>          Max Ratio  Max        Ratio Max      Ratio Mess    Avg len Reduct   %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>> VecDot   1306 1.0   4.1529e+01 9.7   1.76e+08 1.1   0.0e+00 0.0e+00 1.3e+03   1  0  0  0  1   3  0  0  0 24 128305
>>>
>>> It's hard to read this the way my email client will wrap it, but
>>> it's saying that 3% of the time in the stage was spent on
>>> VecDot()s. But the max time in VecDot is 4.1529e+01 seconds, close
>>> to thirteen percent of the stage time. Does the "%T" for the stage
>>> mean something other than what I think it does?
>>>
>>> --Richard
>>>
>>> --
>>> Richard Tran Mills, Ph.D. | E-mail: rmills at climate.ornl.gov
>>> Computational Scientist | Phone: (865) 241-3198
>>> Computational Earth Sciences Group | Fax: (865) 574-0405
>>> Oak Ridge National Laboratory | http://climate.ornl.gov/~rmills
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to
>>> which their experiments lead.
>>> -- Norbert Wiener
>
>
> --
> Richard Tran Mills, Ph.D. | E-mail: rmills at climate.ornl.gov
> Computational Scientist | Phone: (865) 241-3198
> Computational Earth Sciences Group | Fax: (865) 574-0405
> Oak Ridge National Laboratory | http://climate.ornl.gov/~rmills
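
The arithmetic discussed in this thread can be sketched in a few lines of Python. This is only an illustration of the two readings of "%T": the stage time, the VecDot max time, and the 9.7 imbalance are the numbers quoted above, but the per-process times, the process count of 8, and the helper name stage_percent are made up for the example, and the formula follows Matt's description ("I believe ...") rather than a check against the PETSc source. The sketch also assumes the reported ratio is max/min over processes.

```python
# Numbers quoted in the thread (32768-core run, "transport" stage).
stage_time = 3.2139e+02   # total time for the stage, in seconds
vecdot_max = 4.1529e+01   # maximum VecDot time over all processes, in seconds
imbalance = 9.7           # reported time ratio, assumed here to be max/min

# Richard's reading: maximum event time divided by the stage time.
max_based_pct = 100.0 * vecdot_max / stage_time


def stage_percent(event_times, stage_times):
    """%T as Matt describes it: summed event times over summed stage times.

    Hypothetical helper for illustration; not a PETSc function.
    """
    return 100.0 * sum(event_times) / sum(stage_times)


# Hypothetical per-process data (NOT taken from the actual run):
# one process waits as long as the reported maximum, while the
# other seven sit near max / imbalance.
nproc = 8
vecdot_min = vecdot_max / imbalance
event_times = [vecdot_max] + [vecdot_min] * (nproc - 1)
stage_times = [stage_time] * nproc

sum_based_pct = stage_percent(event_times, stage_times)

print(f"max-based share of stage time: {max_based_pct:.1f}%")
print(f"sum-based share of stage time: {sum_based_pct:.1f}%")
```

With a few slow processes and many fast ones, the sum-based percentage lands well below the max-based one, which is consistent with Barry's remark that most processes do not wait much on the dots.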