Unsure about some entries in log_summary for 2B DoF problem
Satish Balay
balay at mcs.anl.gov
Mon May 4 14:04:12 CDT 2009
Well, there is MPE logging that one can use, but then you have to use Jumpshot
to process it. This stuff hasn't been tested in a while, though.
Satish
On Mon, 4 May 2009, Richard Tran Mills wrote:
> Barry,
>
> Is there a way to dump the log data for all processes so that I can do things
> like look at the time spent in each stage by each process? I thought you
> or someone else had mentioned adding a capability to PETSc to do this, writing
> to a file format that would be easy to manipulate using something like Python...
> but perhaps I recall incorrectly.
>
> It would be very nice to have this to calculate some more detailed statistics.
>
> --Richard
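A per-process dump like the one Richard asks for would make such statistics a few lines of Python. The sketch below is only an illustration under a stated assumption: it uses a hypothetical CSV layout with one `rank,stage,time` row per process and stage (PETSc is not claimed to write this format), and the names `SAMPLE` and `stage_stats`, as well as the sample numbers, are made up.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical per-process log dump (an assumption, not a PETSc format):
# one row per (rank, stage) giving that rank's wall time in the stage, in seconds.
SAMPLE = """rank,stage,time
0,transport,320.0
1,transport,318.5
2,transport,319.2
3,transport,321.4
"""


def stage_stats(text):
    """Summarize per-rank stage times as {stage: {min, max, mean, stdev, ratio}}."""
    times = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        times[row["stage"]].append(float(row["time"]))
    stats = {}
    for stage, ts in times.items():
        lo, hi = min(ts), max(ts)
        stats[stage] = {
            "min": lo,
            "max": hi,
            "mean": statistics.mean(ts),
            "stdev": statistics.pstdev(ts),  # population std. dev. over ranks
            "ratio": hi / lo if lo > 0 else float("inf"),  # max/min over ranks
        }
    return stats


if __name__ == "__main__":
    for stage, s in stage_stats(SAMPLE).items():
        print(f"{stage}: min={s['min']:.2f} max={s['max']:.2f} "
              f"mean={s['mean']:.2f} ratio={s['ratio']:.3f}")
```

The same dictionary could feed histograms or per-stage imbalance plots once real per-rank data is available.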
>
> Barry Smith wrote:
> >
> > Matt is completely correct. What this means is that though some processes
> > wait a long time for the dots, MOST processes don't wait much at all.
> > In other words, the dot causes very little idle time integrated over the
> > whole machine.
> >
> > Meanwhile for flow (where the percentage is large) the dots cause a LARGE
> > amount of idle time integrated over the machine.
> >
> > Why it is high for one and not the other I do not know.
> >
> > Barry
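Barry's distinction between the worst-case wait and the idle time integrated over the whole machine can be illustrated with a rough estimate. This sketch assumes, as a simplification that is not how PETSc defines anything, that the fastest process's event time approximates the cost with no waiting, so each other process's excess over that minimum is counted as idle time; `integrated_idle`, `idle_fraction`, and the per-rank times are hypothetical.

```python
def integrated_idle(event_times):
    """Rough machine-wide idle time: sum over ranks of (t_i - min_j t_j).

    Assumes the fastest rank's time approximates the cost without waiting,
    so any excess on other ranks is treated as idle time in the reduction.
    """
    t_min = min(event_times)
    return sum(t - t_min for t in event_times)


def idle_fraction(event_times, stage_time):
    """Idle time as a fraction of the total machine time spent in the stage."""
    return integrated_idle(event_times) / (stage_time * len(event_times))


# Made-up data for 100 ranks: a few wait a long time, most barely wait.
few_wait = [40.0] * 2 + [4.0] * 98
# Made-up data for 100 ranks: most wait a long time.
many_wait = [40.0] * 60 + [4.0] * 40

print("few ranks wait: ", idle_fraction(few_wait, 320.0))
print("many ranks wait:", idle_fraction(many_wait, 320.0))
```

Both made-up cases have the same max/min ratio of 10, yet the integrated idle fraction differs by a factor of 30, which is why a large Ratio column alone does not imply a large share of machine time.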
> >
> > On May 4, 2009, at 1:09 PM, Matthew Knepley wrote:
> >
> > > I believe that the time reported there is the collective sum of the event
> > > times over all processes divided by the collective sum of the stage times.
> > > If you look at the time imbalance, it is a staggering 9.7, which means
> > > either
> > >
> > > 1) The partition is really crap (which we know isn't true)
> > >
> > > 2) Some procs spend a lot of time waiting
> > >
> > > We can get at this waiting time with the split VecDot() events.
> > >
> > > Matt
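Matt's reading of the stage %T column can be checked with a toy calculation. This is a sketch under two assumptions taken from the thread rather than from the PETSc source: %T is the sum over processes of the event time divided by the sum over processes of the stage time, and Ratio is max/min over processes. The eight per-rank times and the helper `summary_columns` are made up for illustration.

```python
def summary_columns(event_times, stage_times):
    """Return (max, ratio, stage %T, max as % of mean stage time) for one event.

    Assumes %T = 100 * sum_i(event_i) / sum_i(stage_i) over processes i, and
    Ratio = max / min over processes.
    """
    t_max, t_min = max(event_times), min(event_times)
    ratio = t_max / t_min
    pct_t = 100.0 * sum(event_times) / sum(stage_times)
    mean_stage = sum(stage_times) / len(stage_times)
    max_pct = 100.0 * t_max / mean_stage
    return t_max, ratio, pct_t, max_pct


# Made-up times for 8 ranks: one rank spends 41.5 s in VecDot, the rest 4.3 s.
vecdot = [41.5] + [4.3] * 7
stage = [321.39] * 8  # assume every rank spends the same time in the stage

t_max, ratio, pct_t, max_pct = summary_columns(vecdot, stage)
print(f"max={t_max:.1f}s ratio={ratio:.2f} %T={pct_t:.1f} max/stage={max_pct:.1f}%")
```

With these numbers the stage %T is about 2.8%, which rounds to 3, while the slowest rank spends about 12.9% of the stage in VecDot: the same pattern as in Richard's run.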
> > >
> > > On Mon, May 4, 2009 at 12:58 PM, Richard Tran Mills
> > > <rmills at climate.ornl.gov> wrote:
> > > PETSc folks,
> > >
> > > I was looking over the log summary data for the 2 billion degrees of
> > > freedom transport problem, and I'm a bit puzzled by some of the things I'm
> > > seeing. (I sent a tarball of this to the pflotran-dev list on April 30.)
> > > For instance, looking at the run at 32768 cores, I see that the total time
> > > for the "transport" phase is 3.2139e+02 seconds. But if I look at the
> > > VecDot line for the transport stage, I see
> > >
> > > Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
> > >                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> > > VecDot              1306 1.0 4.1529e+01 9.7 1.76e+08 1.1 0.0e+00 0.0e+00 1.3e+03  1  0  0  0  1   3  0  0  0 24 128305
> > >
> > > It's hard to read this the way my email client will wrap it, but it's
> > > saying that 3% of the time in the stage was spent in VecDot(). But the
> > > max time in VecDot is 4.1529e+01 seconds, close to thirteen percent of
> > > the stage time. Does the "%T" for the stage mean something other than
> > > what I think it does?
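The quoted VecDot line can also be read backwards to see what it implies about the other processes. A rough sketch, assuming Matt's sum-over-sum reading of %T, that every process spends about the reported 321.39 s in the transport stage, and that the printed %T of 3 is a rounded value; the constant names are illustrative.

```python
# Values copied from the VecDot line of the "transport" stage quoted above.
STAGE_TIME = 3.2139e+02  # total time for the transport stage, in seconds
VECDOT_MAX = 4.1529e+01  # Time (sec) Max: slowest process in VecDot
VECDOT_RATIO = 9.7       # Time (sec) Ratio: max / min over processes
STAGE_PCT_T = 3          # stage %T, printed without decimals

# What Richard computed: the slowest process's VecDot time as a share of the stage.
max_share = 100.0 * VECDOT_MAX / STAGE_TIME

# Implied time on the fastest process, from Ratio = max / min.
vecdot_min = VECDOT_MAX / VECDOT_RATIO

# Implied average VecDot time per process, assuming %T = sum(event) / sum(stage)
# and that every process spends about STAGE_TIME in the stage. If the printed
# %T is rounded, the true value lies in [STAGE_PCT_T - 0.5, STAGE_PCT_T + 0.5).
mean_low = (STAGE_PCT_T - 0.5) / 100.0 * STAGE_TIME
mean_high = (STAGE_PCT_T + 0.5) / 100.0 * STAGE_TIME

print(f"max share of stage: {max_share:.1f}%")
print(f"implied fastest process: {vecdot_min:.2f} s")
print(f"implied average process: {mean_low:.1f} s to {mean_high:.1f} s")
```

Under these assumptions the average process spends roughly 8 to 11 s in VecDot and the fastest about 4.3 s, far below the 41.5 s maximum, which is consistent with Barry's point that most processes wait very little.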
> > >
> > > --Richard
> > >
> > > --
> > > Richard Tran Mills, Ph.D. | E-mail: rmills at climate.ornl.gov
> > > Computational Scientist | Phone: (865) 241-3198
> > > Computational Earth Sciences Group | Fax: (865) 574-0405
> > > Oak Ridge National Laboratory | http://climate.ornl.gov/~rmills
> > >
> > >
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> > > experiments is infinitely more interesting than any results to which their
> > > experiments lead.
> > > -- Norbert Wiener
>
>
>
More information about the petsc-dev mailing list