[petsc-users] RE: How to get bandwidth peak out of PETSc log?
Matthew Knepley
knepley at gmail.com
Fri Jun 21 01:56:28 CDT 2013
On Fri, Jun 21, 2013 at 8:45 AM, HOUSSEN Franck <Franck.Houssen at cea.fr> wrote:
> Hello,
>
> The log I attached was a very small test case (1 iteration): I just
> wanted to get some information about the PETSc log output (time, flops, ...).
> I profiled the code on a "big" test case with Scalasca: I know I spend 95%
> of the time calling PETSc (on realistic test cases).
>
> Barry,
> I have attached 2 logs over 1 and 2 procs running an intermediate test case
> (not small but not big).
> From PETSc_1proc.log, I get:
> Summary of Stages:  ----- Time ------   ----- Flops -----   --- Messages ---   -- Message Lengths --   -- Reductions --
>                        Avg     %Total      Avg     %Total    counts   %Total      Avg        %Total     counts   %Total
>  0:  Main Stage:   2.6379e+02  100.0%   9.8232e+10  100.0%  0.000e+00    0.0%   0.000e+00       0.0%   2.943e+03  100.0%
> My understanding is that the relevant figure is 9.8232e+10 (flop) /
> 2.6379e+02 (sec) = 3.72e+08 flop/s, i.e. about 0.37 Gflop/s for 1 proc.
> From PETSc_2procs.log, I get:
> Summary of Stages:  ----- Time ------   ----- Flops -----   --- Messages ---   -- Message Lengths --   -- Reductions --
>                        Avg     %Total      Avg     %Total    counts   %Total      Avg        %Total     counts   %Total
>  0:  Main Stage:   1.6733e+02  100.0%   1.0348e+11  100.0%  3.427e+04  100.0%   2.951e+04     100.0%   1.355e+04  100.0%
> My understanding is that the relevant figure is 1.0348e+11 (flop) /
> 1.6733e+02 (sec) = 6.18e+08 flop/s, i.e. about 0.62 Gflop/s for 2 procs.
>
> Am I correct?
>
> My understanding is that if the code scales "well", when I double the
> number of MPI processes I should double the flop rate, which is not the case
> (I get 0.62 Gflop/s instead of 2 x 0.37): right? wrong?
> From this, how can I know whether I am at the bandwidth peak?
>
> If I compare 0.62 Gflop/s to the machine characteristic computed by Matt
> (12.9 GB/s * 1 flop / 12 bytes = 1.1 GF/s), there is a difference that I am
> not sure how to interpret... I am not sure I can compare these numbers: I
> guess not?! Did I miss something?
>
The bandwidth limit depends on the operation. Some operations, like DGEMM,
are not limited by memory bandwidth, whereas others, like VecAXPY, are. That is
why it is meaningless to just report flop numbers.
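To make the VecAXPY case concrete: each vector element costs 2 flops and moves
roughly 3 doubles (load x, load y, store y), i.e. about 12 bytes per flop. A
back-of-the-envelope sketch of the resulting bandwidth-limited rate, using the
~12.9 GB/s STREAM figure already quoted in this thread (the little program below
is purely illustrative, not part of PETSc):

    #include <stdio.h>

    /* Back-of-the-envelope bandwidth limit for VecAXPY (y <- y + alpha*x).
       Per vector element: 2 flops and roughly 3 doubles of memory traffic
       (load x, load y, store y), i.e. about 12 bytes per flop. */
    int main(void)
    {
      double stream_bw      = 12.9e9;           /* STREAM Add/Triad rate from this thread, bytes/s */
      double bytes_per_flop = 3.0 * 8.0 / 2.0;  /* = 12 */
      printf("bandwidth-limited VecAXPY peak ~ %.2f Gflop/s\n",
             stream_bw / bytes_per_flop / 1e9);
      return 0;
    }

That is where the 1.1 GF/s number above comes from.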
> Moreover, I realize that I get:
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> VecAXPY  14402 1.0 5.4951e+00 1.0 6.63e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0  1206
>   (in PETSc_1proc.log => 1.2 Gflop/s, which is greater than the 1.1 Gflop/s computed by Matt)
> VecAXPY  14402 1.0 4.3619e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3  6  0  0  0   3  6  0  0  0  1520
>   (in PETSc_2procs.log => 1.5 Gflop/s, which is greater than the 1.1 Gflop/s computed by Matt)
>
> My understanding is that I can conclude I am at the bandwidth peak if I
> rely on:
> VecAXPY  14402 1.0 5.4951e+00 1.0 6.63e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0  1206
>   (in PETSc_1proc.log => 1.2 Gflop/s, which is greater than the 1.1 Gflop/s computed by Matt)
> VecAXPY  14402 1.0 4.3619e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3  6  0  0  0   3  6  0  0  0  1520
>   (in PETSc_2procs.log => 1.5 Gflop/s, which is greater than the 1.1 Gflop/s computed by Matt)
>
> But I cannot conclude anything when I rely on the 0.37 Gflop/s for 1 proc /
> 0.62 Gflop/s for 2 procs.
>
> Can somebody help me see this clearly?!
>
What the above shows is that you get very little improvement from the extra
process, so your machine does not have enough bandwidth to support
two processes for this operation. Thus I would not expect to see
speedup from 1 to 2 processes on this machine.
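A rough way to put numbers on this, assuming the same ~12 bytes of memory traffic
per VecAXPY flop and reading the Mflop/s column as the aggregate rate over all
processes (the program below is just an illustration of that conversion):

    #include <stdio.h>

    /* Convert the VecAXPY Mflop/s reported by -log_summary into an estimate
       of the memory bandwidth actually used, assuming ~12 bytes of traffic
       per flop. The rates are the two VecAXPY lines quoted above. */
    int main(void)
    {
      double mflops[2] = {1206.0, 1520.0};  /* 1-process and 2-process runs */
      int    i;
      for (i = 0; i < 2; i++)
        printf("%d process(es): VecAXPY uses ~ %.1f GB/s of memory bandwidth\n",
               i + 1, mflops[i] * 1e6 * 12.0 / 1e9);
      return 0;
    }

Both figures already meet or exceed the ~12.9 GB/s measured by STREAM, so the
memory bus is essentially saturated with a single process, which is why the
second process buys so little.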
Matt
> Thanks,
>
> FH
>
> ________________________________________
> From: Barry Smith [bsmith at mcs.anl.gov]
> Sent: Thursday, June 20, 2013 23:20
> To: HOUSSEN Franck
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] How to get bandwidth peak out of PETSc log?
>
> Please also send the -log_summary output for 1 process.
>
> Note that in the run you provided, the time spent in PETSc is about 25
> percent of the total run. So how the "other" portion of the code scales
> will make a large difference for speedup; hence we need to see runs with a
> different number of processes.
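To make that concrete, here is a rough Amdahl-style bound (a sketch only: it
assumes the PETSc part speeds up perfectly and the rest of the code not at all):

    #include <stdio.h>

    /* Rough Amdahl-style bound: if only a fraction p of the run is spent in
       PETSc and only that part speeds up, the best overall speedup on np
       processes is 1 / ((1 - p) + p / np). */
    int main(void)
    {
      double p  = 0.25;  /* PETSc fraction of the attached run, per Barry's estimate */
      double np = 2.0;
      printf("best overall speedup on %g processes ~ %.2f\n",
             np, 1.0 / ((1.0 - p) + p / np));
      return 0;
    }

So, for that run, even perfect scaling inside PETSc could not give much more
than a ~1.14x overall speedup on 2 processes.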
>
> Barry
>
> On Jun 20, 2013, at 3:24 PM, HOUSSEN Franck <Franck.Houssen at cea.fr> wrote:
>
> > Hello,
> >
> > I am new to PETSc.
> >
> > I have written an (MPI) PETSc code to solve an AX=B system: how can I know
> > the bandwidth peak for a given run? The code does not scale as I would
> > expect (doubling the number of MPI processes does not halve the elapsed
> > time): I would like to understand whether this behavior is related to a bad
> > MPI parallelization (that I may be able to improve), or to the fact that the
> > bandwidth limit has been reached (and in this case, my understanding is that
> > I cannot do anything to improve either the performance or the scaling).
> > I would like to know what's going on and why!
> >
> > Concerning the computer, I have tried to estimate the bandwidth peak
> > with the STREAM benchmark: www.streambench.org/index.html. I get this:
> > ~>./stream_c.exe
> > ...
> > Function      Best Rate MB/s   Avg time   Min time   Max time
> > Copy:              11473.8     0.014110   0.013945   0.015064
> > Scale:             11421.2     0.014070   0.014009   0.014096
> > Add:               12974.4     0.018537   0.018498   0.018590
> > Triad:             12964.6     0.018683   0.018512   0.019277
> > As a conclusion, my understanding is that the bandwidth peak of my
> > computer is about 12,200 MB/s (the average of 11,400 and 12,900), which
> > is about 12200 / 1024 = 11.9 GB/s.
> >
> > Concerning PETSc, I tried to find (without success) a figure to compare
> > to 11.9 GB/s. First, I tried to run "make streams" but it doesn't compile
> > (I run petsc-3.4.1 on Ubuntu 12.04). Then, I looked into the PETSc log
> > (using -log_summary and -ksp_view) but I was not able to extract an
> > estimate of the bandwidth from it (I get information about time and flops,
> > but not about the amount of data transferred between MPI processes in MB).
> > How can I get the bandwidth peak out of the PETSc log? Is there a specific
> > option for that, or is this not possible?
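One thing that makes the -log_summary output easier to interpret is to put the
solve in its own logging stage, so that its time and flop rate are reported
separately from assembly and the rest of the code. A minimal sketch, assuming
petsc-3.4-era interfaces and a toy diagonal system just for illustration (error
checking omitted):

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat           A;
      Vec           x, b;
      KSP           ksp;
      PetscLogStage solve_stage;
      PetscInt      i, n = 100, Istart, Iend;

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* A trivial diagonal system, just so there is something to solve. */
      MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n,
                   1, NULL, 0, NULL, &A);
      MatGetOwnershipRange(A, &Istart, &Iend);
      for (i = Istart; i < Iend; i++) MatSetValue(A, i, i, 2.0, INSERT_VALUES);
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
      MatGetVecs(A, &x, &b);                              /* petsc-3.4 name for MatCreateVecs */
      VecSet(b, 1.0);

      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);   /* petsc-3.4 signature */
      KSPSetFromOptions(ksp);

      /* Only the solve is charged to the "Solve" stage, so -log_summary reports
         its time, flops and Mflop/s separately from setup and assembly. */
      PetscLogStageRegister("Solve", &solve_stage);
      PetscLogStagePush(solve_stage);
      KSPSolve(ksp, b, x);
      PetscLogStagePop();

      KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
      PetscFinalize();
      return 0;
    }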
> >
> > I have attached the PETSc log (MPI run over 2 processes).
> >
> > Thanks,
> >
> > FH
> > <PETSc.log>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener