Unsure about some entries in log_summary for 2B DoF problem

Satish Balay balay at mcs.anl.gov
Mon May 4 14:04:12 CDT 2009


Well, there is MPE logging that one can use, but you then need Jumpshot to
process the output. This stuff hasn't been tested in a while, though.

Satish

On Mon, 4 May 2009, Richard Tran Mills wrote:

> Barry,
> 
> Is there a way to dump the log data for all processes so that I can do things
> like look at the time spent in each stage by each process?  I thought you
> or someone else had mentioned adding a capability to do this in PETSc to a
> file format that would be easy to manipulate using something like Python...
> but perhaps I recall incorrectly.
> 
> It would be very nice to have this to calculate some more detailed statistics.
> 
> --Richard
> 
> Barry Smith wrote:
> > 
> >    Matt is completely correct. What this means is that though some processes
> > wait a long time for the dots, MOST processes don't wait much at all.
> > In other words, the dot causes very little idle time integrated over the
> > whole machine.
> > 
> > Meanwhile for flow (where the percentage is large) the dots cause a LARGE
> > amount of idle time integrated over the machine.
> > 
> > Why it is high for one and not the other I do not know.
> > 
> >    Barry
> > 
> > On May 4, 2009, at 1:09 PM, Matthew Knepley wrote:
> > 
> > > I believe that the time reported there is the collective sum of the event
> > > times divided by the collective sum of the stage times. If you look at the
> > > time imbalance, it is a staggering 9.7, which either means
> > > 
> > >   1) The partition is really crap (which we know isn't true)
> > > 
> > >   2) Some procs spend a lot of time waiting
> > > 
> > > We can get at this waiting time with the split VecDot() events.
> > > 
> > >   Matt
> > > 
> > > On Mon, May 4, 2009 at 12:58 PM, Richard Tran Mills
> > > <rmills at climate.ornl.gov> wrote:
> > > PETSc folks,
> > > 
> > > I was looking over the log summary data for the 2 billion degrees of
> > > freedom transport problem, and I'm a bit puzzled by some of the things I'm
> > > seeing.  (I sent a tarball of this to the pflotran-dev list on April 30.)
> > > For instance, looking at the run at 32768 cores, I see that the total time
> > > for the "transport" phase is 3.2139e+02 seconds.  But if I look at the
> > > VecDot line for the transport stage, I see
> > > 
> > > Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
> > >                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> > > VecDot              1306 1.0 4.1529e+01 9.7 1.76e+08 1.1 0.0e+00 0.0e+00 1.3e+03  1  0  0  0  1   3  0  0  0 24 128305
> > > 
> > > It's hard to read this the way my email client will wrap it, but it's
> > > saying that 3% of the time in the stage was spent on VecDot()s.  But the
> > > max time in VecDot is 4.1529e+01 seconds, close to thirteen percent of the
> > > stage time.  Does the "%T" for the stage mean something other than what I
> > > think it does?
> > > 
> > > --Richard
> > > 
> > > -- 
> > > Richard Tran Mills, Ph.D.            |   E-mail: rmills at climate.ornl.gov
> > > Computational Scientist              |   Phone:  (865) 241-3198
> > > Computational Earth Sciences Group   |   Fax:    (865) 574-0405
> > > Oak Ridge National Laboratory        |   http://climate.ornl.gov/~rmills
> > > 
> > > 
> > > 
> > > -- 
> > > What most experimenters take for granted before they begin their
> > > experiments is infinitely more interesting than any results to which their
> > > experiments lead.
> > > -- Norbert Wiener
> 
> 
> 
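
The two readings of "%T" discussed in the thread can be compared with a
little arithmetic. The sketch below is only an illustration: it follows Matt's
description (sum of event times over all processes divided by the sum of stage
times), which he himself prefaces with "I believe". The function names and the
four-process toy data are hypothetical, and the implied average assumes that
every process has the same stage time, which the log does not state.

```python
def percent_of_stage(event_times, stage_times):
    """Aggregate share: sum of per-process event times over sum of stage times."""
    return 100.0 * sum(event_times) / sum(stage_times)


def percent_from_max(event_times, stage_times):
    """Naive reading: slowest process's event time over the longest stage time."""
    return 100.0 * max(event_times) / max(stage_times)


# Figures quoted in the thread for the transport stage at 32768 cores.
stage_time = 3.2139e+02   # total stage time, seconds
vecdot_max = 4.1529e+01   # maximum VecDot time over all processes, seconds
imbalance = 9.7           # reported max/min ratio of VecDot time
stage_pct = 3.0           # reported stage %T (printed as a rounded integer)

# Dividing the maximum by the stage time gives the "thirteen percent" reading.
max_based_pct = 100.0 * vecdot_max / stage_time

# The max/min ratio implies the fastest process spent this long in VecDot.
vecdot_min = vecdot_max / imbalance

# ASSUMPTION: every process has the same stage time. Under that assumption a
# 3% aggregate share corresponds to this average per-process VecDot time.
implied_mean = stage_pct / 100.0 * stage_time

# HYPOTHETICAL four-process toy data (not taken from the log): one process
# waits a long time in the event, the others barely wait at all.
toy_event = [8.0, 1.0, 1.0, 2.0]
toy_stage = [10.0, 10.0, 10.0, 10.0]

print(f"max-based share of stage:  {max_based_pct:.1f}%")
print(f"implied min VecDot time:   {vecdot_min:.2f} s")
print(f"implied mean VecDot time:  {implied_mean:.2f} s")
print(f"toy aggregate share:       {percent_of_stage(toy_event, toy_stage):.1f}%")
print(f"toy max-based share:       {percent_from_max(toy_event, toy_stage):.1f}%")
```

In the toy data the max-based share is much larger than the aggregate share,
which is the same pattern Barry describes: a few processes wait a long time,
but most do not, so the idle time integrated over the whole machine is small.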
