[petsc-users] Tuning performance for simple solver

Michael Becker Michael.Becker at physik.uni-giessen.de
Fri Jun 3 09:34:09 CDT 2016


So using -log_summary helps me find out how much time is actually spent 
in the PETSc routines that are repeatedly called. That part of my 
code is fairly simple:

   // Fill the right-hand side: interior nodes get the scaled charge
   // density, Dirichlet nodes keep their prescribed potential.
   PetscScalar *barray;
   VecGetArray(b,&barray);
   for (int i=0; i<Nall; i++) {
     if (bound[i]==0)
       barray[i] = charge[i]*ih*iepsilon0;
     else
       barray[i] = phi[i];
   }
   VecRestoreArray(b,&barray);

   KSPSolve(ksp,b,x);

   // Copy the solution back into the phi array for the rest of the code.
   KSPGetSolution(ksp,&x);
   PetscScalar *xarray;
   VecGetArray(x,&xarray);
   for (int i=0; i<Nall; i++)
     phi[i] = xarray[i];
   VecRestoreArray(x,&xarray);

Given that, I don't see how additional log stages would help me. So would 
I then still just test which KSP method is the fastest?
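
(For reference, my understanding of Dave's suggestion is a sketch like the 
one below; the stage name is arbitrary and only the KSPSolve call is taken 
from my code.)

   // Sketch only: register a user-defined stage once during setup ...
   PetscLogStage solve_stage;
   PetscLogStageRegister("PoissonSolve",&solve_stage);

   // ... then push/pop it around the repeated solve so it shows up
   // as a separate stage in the -log_summary report.
   PetscLogStagePush(solve_stage);
   KSPSolve(ksp,b,x);
   PetscLogStagePop();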

I ran a test over 1000 iterations; this is the output:
>                          Max Max/Min        Avg      Total
> Time (sec):           1.916e+02      1.00055   1.915e+02
> Objects:              1.067e+03      1.00000   1.067e+03
> Flops:                5.730e+10      1.22776   5.360e+10 1.158e+13
> Flops/sec:            2.992e+08      1.22792   2.798e+08 6.044e+10
> MPI Messages:         1.900e+06      3.71429   1.313e+06 2.835e+08
> MPI Message Lengths:  1.138e+09      2.38189   6.824e+02 1.935e+11
> MPI Reductions:       1.462e+05      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type 
> (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length 
> N --> 2N flops
>                             and VecAXPY() for complex vectors of 
> length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- 
> Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total counts   
> %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.9154e+02 100.0%  1.1577e+13 100.0% 2.835e+08 
> 100.0%  6.824e+02      100.0%  1.462e+05 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on 
> interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() 
> and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in 
> this phase
>       %M - percent messages in this phase     %L - percent message 
> lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time 
> over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec) 
> Flops                             --- Global ---  --- Stage --- Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess Avg len 
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> KSPGMRESOrthog     70070 1.0 7.8035e+01 2.3 1.94e+10 1.2 0.0e+00 
> 0.0e+00 7.0e+04 29 34  0  0 48  29 34  0  0 48 50538
> KSPSetUp               2 1.0 1.5209e-03 1.1 0.00e+00 0.0 0.0e+00 
> 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve            1001 1.0 1.9097e+02 1.0 5.73e+10 1.2 2.8e+08 
> 6.8e+02 1.5e+05100100100100100 100100100100100 60621
> VecMDot            70070 1.0 6.9833e+01 2.8 9.69e+09 1.2 0.0e+00 
> 0.0e+00 7.0e+04 25 17  0  0 48  25 17  0  0 48 28235
> VecNorm            74074 1.0 1.1570e+01 1.7 7.28e+08 1.2 0.0e+00 
> 0.0e+00 7.4e+04  5  1  0  0 51   5  1  0  0 51 12804
> VecScale           73073 1.0 5.6676e-01 1.3 3.59e+08 1.2 0.0e+00 
> 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 128930
> VecCopy             3003 1.0 1.0008e-01 1.6 0.00e+00 0.0 0.0e+00 
> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet             77080 1.0 1.3647e+00 1.4 0.00e+00 0.0 0.0e+00 
> 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> VecAXPY             6006 1.0 1.0779e-01 1.7 5.90e+07 1.2 0.0e+00 
> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 111441
> VecMAXPY           73073 1.0 9.2155e+00 1.3 1.04e+10 1.2 0.0e+00 
> 0.0e+00 0.0e+00  4 18  0  0  0   4 18  0  0  0 229192
> VecScatterBegin    73073 1.0 7.0538e+00 4.4 0.00e+00 0.0 2.8e+08 
> 6.8e+02 0.0e+00  2  0100100  0   2  0100100  0     0
> VecScatterEnd      73073 1.0 7.8382e+00 2.6 0.00e+00 0.0 0.0e+00 
> 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
> VecNormalize       73073 1.0 1.1774e+01 1.6 1.08e+09 1.2 0.0e+00 
> 0.0e+00 7.3e+04  5  2  0  0 50   5  2  0  0 50 18619
> MatMult            73073 1.0 8.6056e+01 1.7 1.90e+10 1.3 2.8e+08 
> 6.8e+02 0.0e+00 36 33100100  0  36 33100100  0 44093
> MatSolve           74074 1.0 5.4865e+01 1.2 1.71e+10 1.2 0.0e+00 
> 0.0e+00 0.0e+00 27 30  0  0  0  27 30  0  0  0 63153
> MatLUFactorNum         1 1.0 4.1230e-03 2.6 9.89e+05241.4 0.0e+00 
> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 36155
> MatILUFactorSym        1 1.0 2.1942e-03 1.3 0.00e+00 0.0 0.0e+00 
> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin       2 1.0 5.6112e-03 4.8 0.00e+00 0.0 0.0e+00 
> 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         2 1.0 6.3889e-03 1.0 0.00e+00 0.0 7.8e+03 
> 1.7e+02 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            1 1.0 2.8849e-0515.1 0.00e+00 0.0 0.0e+00 
> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 1.2279e-04 1.6 0.00e+00 0.0 0.0e+00 
> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> PCSetUp                2 1.0 6.6662e-03 1.8 9.89e+05241.4 0.0e+00 
> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 22361
> PCSetUpOnBlocks     1001 1.0 7.5164e-03 1.7 9.89e+05241.4 0.0e+00 
> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 19832
> PCApply            74074 1.0 5.9613e+01 1.2 1.71e+10 1.2 0.0e+00 
> 0.0e+00 0.0e+00 29 30  0  0  0  29 30  0  0  0 58123
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>        Krylov Solver     2              2        19576     0.
>      DMKSP interface     1              1          656     0.
>               Vector  1043           1043     42492328     0.
>       Vector Scatter     2              2        41496     0.
>               Matrix     4              4      3163588     0.
>     Distributed Mesh     1              1         5080     0.
> Star Forest Bipartite Graph     2              2         1728 0.
>      Discrete System     1              1          872     0.
>            Index Set     7              7        71796     0.
>    IS L to G Mapping     1              1        28068     0.
>       Preconditioner     2              2         1912     0.
>               Viewer     1              0            0     0.
> ========================================================================================================================
> Average time to get PetscTime(): 1.90735e-07
> Average time for MPI_Barrier(): 0.000184202
> Average time for zero size MPI_Send(): 1.03469e-05
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --download-f2cblaslapack --with-fc=0 
> --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3

Regarding Matt's answer: It's generally a rectangular grid (3D) of 
predetermined size (not necessarily a cube). Additionally, objects of 
arbitrary shape can be defined by Dirichlet boundary conditions. Is 
geometric MG still viable?
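
(In case it helps the discussion: this is a minimal sketch of what I 
understand the DMDA-based geometric MG setup to look like, loosely 
following src/ksp/ksp/examples/tutorials/ex45.c; ComputeRHS, ComputeMatrix 
and ctx are placeholders, not my actual code.)

   // Sketch only: attach the DMDA to the KSP so PCMG can coarsen it
   // and build the grid hierarchy itself.
   KSPCreate(PETSC_COMM_WORLD,&ksp);
   KSPSetDM(ksp,da);
   KSPSetComputeRHS(ksp,ComputeRHS,&ctx);
   KSPSetComputeOperators(ksp,ComputeMatrix,&ctx);
   KSPSetFromOptions(ksp);
   KSPSolve(ksp,NULL,NULL);
   // e.g. run with: -pc_type mg -pc_mg_levels 4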

Thanks,
Michael


Am 03.06.2016 um 14:32 schrieb Matthew Knepley:
> On Fri, Jun 3, 2016 at 5:56 AM, Dave May <dave.mayhem23 at gmail.com 
> <mailto:dave.mayhem23 at gmail.com>> wrote:
>
>     On 3 June 2016 at 11:37, Michael Becker
>     <Michael.Becker at physik.uni-giessen.de
>     <mailto:Michael.Becker at physik.uni-giessen.de>> wrote:
>
>         Dear all,
>
>         I have a few questions regarding possible performance
>         enhancements for the PETSc solver I included in my project.
>
>         It's a particle-in-cell plasma simulation written in C++,
>         where Poisson's equation needs to be solved repeatedly on
>         every timestep.
>         The simulation domain is discretized using finite differences,
>         so the solver therefore needs to be able to efficiently solve
>         the linear system A x = b successively with changing b. The
>         solution x of the previous timestep is generally a good
>         initial guess for the solution.
>
>         I wrote a class PETScSolver that holds all PETSc objects and
>         necessary information about domain size and decomposition. To
>         solve the linear system, two arrays, 'phi' and 'charge', are
>         passed to a member function solve(), where they are copied to
>         PETSc vectors, and KSPSolve() is called. After convergence,
>         the solution is then transferred again to the phi array so
>         that other program parts can use it.
>
>         The matrix is created using DMDA. An array 'bound' is used to
>         determine whether a node is either a Dirichlet BC or holds a
>         charge.
>
>         I attached three files, petscsolver.h, petscsolver.cpp and
>         main.cpp, that contain a shortened version of the solver class
>         and a set-up to initialize and run a simple problem.
>
>         Is there anything I can change to generally make the program
>         run faster?
>
>
>     Before changing anything, you should profile your code to see
>     where time is being spent.
>
>     To that end, you should compile an optimized build of petsc, link
>     it to your application and run your code with the option
>     -log_summary. The -log_summary flag will generate a performance
>     profile of specific functionality within petsc (KSPSolve, MatMult
>     etc) so you can see where all the time is being spent.
>
>     As a second round of profiling, you should consider registering
>     specific functionality in your code you think is performance
>     critical.
>     You can do this using the function PetscLogStageRegister()
>
>     http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Profiling/PetscLogStageRegister.html
>
>
>     Check out the examples listed at the bottom of this web page to
>     see how to log stages. Once you've registered stages, these will
>     appear in the report provided by -log_summary
>
>
> Do everything Dave said. I will also note that since you are using FD, 
> I am guessing you are solving on a square. Then
> you should really be using geometric MG. We support this through the 
> DMDA object.
>
>   Thanks,
>
>      Matt
>
>     Thanks,
>       Dave
>
>
>         And, since I'm rather inexperienced with KSP methods, how do I
>         efficiently choose PC and KSP? Just by testing every combination?
>         Would multigrid be a viable option as a pure solver (-ksp_type
>         preonly)?
>
>         Thanks,
>         Michael
>
>
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
