[petsc-users] RE: How to get bandwidth peak out of PETSc log?
Matthew Knepley
knepley at gmail.com
Fri Jun 21 01:56:28 CDT 2013
On Fri, Jun 21, 2013 at 8:45 AM, HOUSSEN Franck <Franck.Houssen at cea.fr> wrote:
> Hello,
>
> The log I attached was a very small test case (1 iteration): I just
> wanted to get some information about the PETSc log output (time, flops, ...).
> I profiled the code on a "big" test case with Scalasca: I know I spend 95%
> of the time calling PETSc (on realistic test cases).
>
> Barry,
> I have attached 2 logs over 1 and 2 procs running an intermediate test case
> (not small but not big).
> From PETSc_1proc.log, I get:
> Summary of Stages:  ----- Time ------   ----- Flops -----   --- Messages ---   -- Message Lengths --   -- Reductions --
>                        Avg     %Total      Avg     %Total    counts   %Total      Avg        %Total     counts   %Total
>  0:  Main Stage:   2.6379e+02  100.0%   9.8232e+10  100.0%  0.000e+00    0.0%   0.000e+00       0.0%   2.943e+03  100.0%
> My understanding is that the relevant figure is 9.8232e+10 (flop) /
> 2.6379e+02 (sec) = 3.72e+08 flop/s, i.e. about 0.37 Gflop/s for 1 proc.
> From PETSc_2procs.log, I get:
> Summary of Stages:  ----- Time ------   ----- Flops -----   --- Messages ---   -- Message Lengths --   -- Reductions --
>                        Avg     %Total      Avg     %Total    counts   %Total      Avg        %Total     counts   %Total
>  0:  Main Stage:   1.6733e+02  100.0%   1.0348e+11  100.0%  3.427e+04  100.0%   2.951e+04     100.0%   1.355e+04  100.0%
> My understanding is that the relevant figure is 1.0348e+11 (flop) /
> 1.6733e+02 (sec) = 6.18e+08 flop/s, i.e. about 0.62 Gflop/s for 2 procs.
>
> Am I correct?
>
> My understanding is that if the code scales "well", when I double the
> number of MPI processes I should double the flop rate, which is not the case
> (I get 0.62 Gflop/s instead of 2 x 0.37): right? wrong?
> From this, how can I know whether I am at the bandwidth peak?
>
> If I compare 0.62 Gflop/s to the machine characteristic computed by Matt
> (12.9 GB/s * 1 flop / 12 bytes = 1.1 GF/s), there is a difference that I am
> not sure how to interpret... I am not sure I can compare these numbers: I
> guess not?! Did I miss something?
>
The bandwidth limit depends on the operation. Some operations, like DGEMM,
are not limited by memory bandwidth, whereas others, like VecAXPY, are. That is
why it is meaningless to just report flop numbers.
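To make the VecAXPY case concrete: each vector element costs 2 flops and moves
roughly 3 doubles (load x, load y, store y), i.e. about 12 bytes per flop. A
back-of-the-envelope sketch of the resulting bandwidth-limited rate, using the
~12.9 GB/s STREAM figure already quoted in this thread (the little program below
is purely illustrative, not part of PETSc):

    #include <stdio.h>

    /* Back-of-the-envelope bandwidth limit for VecAXPY (y <- y + alpha*x).
       Per vector element: 2 flops and roughly 3 doubles of memory traffic
       (load x, load y, store y), i.e. about 12 bytes per flop. */
    int main(void)
    {
      double stream_bw      = 12.9e9;           /* STREAM Add/Triad rate from this thread, bytes/s */
      double bytes_per_flop = 3.0 * 8.0 / 2.0;  /* = 12 */
      printf("bandwidth-limited VecAXPY peak ~ %.2f Gflop/s\n",
             stream_bw / bytes_per_flop / 1e9);
      return 0;
    }

That is where the 1.1 GF/s number above comes from.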
> Moreover, I realize that I get:
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> VecAXPY  14402 1.0 5.4951e+00 1.0 6.63e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0  1206
>   (in PETSc_1proc.log => 1.2 Gflop/s, which is greater than the 1.1 Gflop/s computed by Matt)
> VecAXPY  14402 1.0 4.3619e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3  6  0  0  0   3  6  0  0  0  1520
>   (in PETSc_2procs.log => 1.5 Gflop/s, which is greater than the 1.1 Gflop/s computed by Matt)
>
> My understanding is that I can conclude I am at the bandwidth peak if I
> rely on:
> VecAXPY  14402 1.0 5.4951e+00 1.0 6.63e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0  1206
>   (in PETSc_1proc.log => 1.2 Gflop/s, which is greater than the 1.1 Gflop/s computed by Matt)
> VecAXPY  14402 1.0 4.3619e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3  6  0  0  0   3  6  0  0  0  1520
>   (in PETSc_2procs.log => 1.5 Gflop/s, which is greater than the 1.1 Gflop/s computed by Matt)
>
> But I cannot conclude anything when I rely on the 0.37 Gflop/s for 1 proc /
> 0.62 Gflop/s for 2 procs.
>
> Can somebody help me see this clearly?!
>
What the above shows is that you get very little improvement from the extra
process, so your machine does not have enough bandwidth to support
two processes for this operation. Thus I would not expect to see
speedup from 1 to 2 processes on this machine.
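A rough way to put numbers on this, assuming the same ~12 bytes of memory traffic
per VecAXPY flop and reading the Mflop/s column as the aggregate rate over all
processes (the program below is just an illustration of that conversion):

    #include <stdio.h>

    /* Convert the VecAXPY Mflop/s reported by -log_summary into an estimate
       of the memory bandwidth actually used, assuming ~12 bytes of traffic
       per flop. The rates are the two VecAXPY lines quoted above. */
    int main(void)
    {
      double mflops[2] = {1206.0, 1520.0};  /* 1-process and 2-process runs */
      int    i;
      for (i = 0; i < 2; i++)
        printf("%d process(es): VecAXPY uses ~ %.1f GB/s of memory bandwidth\n",
               i + 1, mflops[i] * 1e6 * 12.0 / 1e9);
      return 0;
    }

Both figures already meet or exceed the ~12.9 GB/s measured by STREAM, so the
memory bus is essentially saturated with a single process, which is why the
second process buys so little.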
Matt
> Thanks,
>
> FH
>
> ________________________________________
> From: Barry Smith [bsmith at mcs.anl.gov]
> Sent: Thursday, June 20, 2013 23:20
> To: HOUSSEN Franck
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] How to get bandwidth peak out of PETSc log?
>
> Please also send the -log_summary output for 1 process.
>
> Note that in the run you provided, the time spent in PETSc is about 25
> percent of the total run. So how the "other" portion of the code scales
> will make a large difference for speedup; hence we need to see runs with a
> different number of processes.
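To make that concrete, here is a rough Amdahl-style bound (a sketch only: it
assumes the PETSc part speeds up perfectly and the rest of the code not at all):

    #include <stdio.h>

    /* Rough Amdahl-style bound: if only a fraction p of the run is spent in
       PETSc and only that part speeds up, the best overall speedup on np
       processes is 1 / ((1 - p) + p / np). */
    int main(void)
    {
      double p  = 0.25;  /* PETSc fraction of the attached run, per Barry's estimate */
      double np = 2.0;
      printf("best overall speedup on %g processes ~ %.2f\n",
             np, 1.0 / ((1.0 - p) + p / np));
      return 0;
    }

So, for that run, even perfect scaling inside PETSc could not give much more
than a ~1.14x overall speedup on 2 processes.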
>
> Barry
>
> On Jun 20, 2013, at 3:24 PM, HOUSSEN Franck <Franck.Houssen at cea.fr> wrote:
>
> > Hello,
> >
> > I am new to PETSc.
> >
> > I have written an (MPI) PETSc code to solve an AX=B system: how can I know
> > the bandwidth peak for a given run? The code does not scale as I would
> > expect (doubling the number of MPI processes does not halve the elapsed
> > time): I would like to understand whether this behavior is related to a bad
> > MPI parallelization (that I may be able to improve), or to the fact that the
> > bandwidth limit has been reached (and in this case, my understanding is that
> > I cannot do anything to improve either the performance or the scaling).
> > I would like to know what's going on and why!
> >
> > Concerning the computer, I have tried to estimate the bandwidth peak
> > with the STREAM benchmark: www.streambench.org/index.html. I get this:
> > ~>./stream_c.exe
> > ...
> > Function      Best Rate MB/s   Avg time   Min time   Max time
> > Copy:              11473.8     0.014110   0.013945   0.015064
> > Scale:             11421.2     0.014070   0.014009   0.014096
> > Add:               12974.4     0.018537   0.018498   0.018590
> > Triad:             12964.6     0.018683   0.018512   0.019277
> > As a conclusion, my understanding is that the bandwidth peak of my
> > computer is about 12,200 MB/s (the average of 11,400 and 12,900), which
> > is about 12200 / 1024 = 11.9 GB/s.
> >
> > Concerning PETSc, I tried to find (without success) a figure to compare
> > to 11.9 GB/s. First, I tried to run "make streams" but it doesn't compile
> > (I run petsc-3.4.1 on Ubuntu 12.04). Then, I looked into the PETSc log
> > (using -log_summary and -ksp_view) but I was not able to extract an
> > estimate of the bandwidth from it (I get information about time and flops,
> > but not about the amount of data transferred between MPI processes in MB).
> > How can I get the bandwidth peak out of the PETSc log? Is there a specific
> > option for that, or is this not possible?
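One thing that makes the -log_summary output easier to interpret is to put the
solve in its own logging stage, so that its time and flop rate are reported
separately from assembly and the rest of the code. A minimal sketch, assuming
petsc-3.4-era interfaces and a toy diagonal system just for illustration (error
checking omitted):

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat           A;
      Vec           x, b;
      KSP           ksp;
      PetscLogStage solve_stage;
      PetscInt      i, n = 100, Istart, Iend;

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* A trivial diagonal system, just so there is something to solve. */
      MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n,
                   1, NULL, 0, NULL, &A);
      MatGetOwnershipRange(A, &Istart, &Iend);
      for (i = Istart; i < Iend; i++) MatSetValue(A, i, i, 2.0, INSERT_VALUES);
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
      MatGetVecs(A, &x, &b);                              /* petsc-3.4 name for MatCreateVecs */
      VecSet(b, 1.0);

      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);   /* petsc-3.4 signature */
      KSPSetFromOptions(ksp);

      /* Only the solve is charged to the "Solve" stage, so -log_summary reports
         its time, flops and Mflop/s separately from setup and assembly. */
      PetscLogStageRegister("Solve", &solve_stage);
      PetscLogStagePush(solve_stage);
      KSPSolve(ksp, b, x);
      PetscLogStagePop();

      KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
      PetscFinalize();
      return 0;
    }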
> >
> > I have attached the PETSc log (MPI run over 2 processes).
> >
> > Thanks,
> >
> > FH
> > <PETSc.log>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener