[petsc-users] Tuning performance for simple solver
Michael Becker
Michael.Becker at physik.uni-giessen.de
Fri Jun 3 09:34:09 CDT 2016
So using -log_summary helps me find out how much time is actually spent
on the PETSc routines that are repeatedly called. Since that part of my
code is fairly simple:
PetscScalar *barray;
VecGetArray(b,&barray);
for (int i=0; i<Nall; i++) {
  if (bound[i]==0)                      /* regular node: scaled charge density */
    barray[i] = charge[i]*ih*iepsilon0;
  else                                  /* Dirichlet node: prescribed potential */
    barray[i] = phi[i];
}
VecRestoreArray(b,&barray);

KSPSolve(ksp,b,x);
KSPGetSolution(ksp,&x);

PetscScalar *xarray;
VecGetArray(x,&xarray);
for (int i=0; i<Nall; i++)
  phi[i] = xarray[i];                   /* copy the solution back for the rest of the code */
VecRestoreArray(x,&xarray);
, I don't see how additional log stages would help me. Would I then
still just test which KSP method is the fastest?
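For reference, this is how I understand the suggested log stages would be applied here (just a sketch on my part; the stage names are made up). It would at least separate the two copy loops from the actual KSPSolve() in the -log_summary report:

PetscLogStage stageCopyIn, stageSolve, stageCopyOut;
PetscLogStageRegister("CopyToPetsc",   &stageCopyIn);
PetscLogStageRegister("Solve",         &stageSolve);
PetscLogStageRegister("CopyFromPetsc", &stageCopyOut);

PetscLogStagePush(stageCopyIn);
/* ... fill b from charge/phi as above ... */
PetscLogStagePop();

PetscLogStagePush(stageSolve);
KSPSolve(ksp,b,x);
PetscLogStagePop();

PetscLogStagePush(stageCopyOut);
/* ... copy x back into phi as above ... */
PetscLogStagePop();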
I ran a test over 1000 iterations; this is the output:
> Max Max/Min Avg Total
> Time (sec): 1.916e+02 1.00055 1.915e+02
> Objects: 1.067e+03 1.00000 1.067e+03
> Flops: 5.730e+10 1.22776 5.360e+10 1.158e+13
> Flops/sec: 2.992e+08 1.22792 2.798e+08 6.044e+10
> MPI Messages: 1.900e+06 3.71429 1.313e+06 2.835e+08
> MPI Message Lengths: 1.138e+09 2.38189 6.824e+02 1.935e+11
> MPI Reductions: 1.462e+05 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.9154e+02 100.0%  1.1577e+13 100.0%  2.835e+08 100.0%  6.824e+02      100.0%  1.462e+05 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase          %F - percent flops in this phase
> %M - percent messages in this phase      %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event              Count      Time (sec)      Flops                               --- Global ---   --- Stage ---    Total
>                  Max Ratio   Max     Ratio    Max    Ratio   Mess   Avg len Reduct  %T %F %M %L %R   %T %F %M %L %R  Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> KSPGMRESOrthog     70070 1.0 7.8035e+01 2.3 1.94e+10 1.2 0.0e+00 0.0e+00 7.0e+04 29 34  0  0 48  29 34  0  0 48  50538
> KSPSetUp               2 1.0 1.5209e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0      0
> KSPSolve            1001 1.0 1.9097e+02 1.0 5.73e+10 1.2 2.8e+08 6.8e+02 1.5e+05100100100100100 100100100100100  60621
> VecMDot            70070 1.0 6.9833e+01 2.8 9.69e+09 1.2 0.0e+00 0.0e+00 7.0e+04 25 17  0  0 48  25 17  0  0 48  28235
> VecNorm            74074 1.0 1.1570e+01 1.7 7.28e+08 1.2 0.0e+00 0.0e+00 7.4e+04  5  1  0  0 51   5  1  0  0 51  12804
> VecScale           73073 1.0 5.6676e-01 1.3 3.59e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 128930
> VecCopy             3003 1.0 1.0008e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> VecSet             77080 1.0 1.3647e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0      0
> VecAXPY             6006 1.0 1.0779e-01 1.7 5.90e+07 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 111441
> VecMAXPY           73073 1.0 9.2155e+00 1.3 1.04e+10 1.2 0.0e+00 0.0e+00 0.0e+00  4 18  0  0  0   4 18  0  0  0 229192
> VecScatterBegin    73073 1.0 7.0538e+00 4.4 0.00e+00 0.0 2.8e+08 6.8e+02 0.0e+00  2  0100100  0   2  0100100  0      0
> VecScatterEnd      73073 1.0 7.8382e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0      0
> VecNormalize       73073 1.0 1.1774e+01 1.6 1.08e+09 1.2 0.0e+00 0.0e+00 7.3e+04  5  2  0  0 50   5  2  0  0 50  18619
> MatMult            73073 1.0 8.6056e+01 1.7 1.90e+10 1.3 2.8e+08 6.8e+02 0.0e+00 36 33100100  0  36 33100100  0  44093
> MatSolve           74074 1.0 5.4865e+01 1.2 1.71e+10 1.2 0.0e+00 0.0e+00 0.0e+00 27 30  0  0  0  27 30  0  0  0  63153
> MatLUFactorNum         1 1.0 4.1230e-03 2.6 9.89e+05241.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  36155
> MatILUFactorSym        1 1.0 2.1942e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> MatAssemblyBegin       2 1.0 5.6112e-03 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0      0
> MatAssemblyEnd         2 1.0 6.3889e-03 1.0 0.00e+00 0.0 7.8e+03 1.7e+02 8.0e+00  0  0  0  0  0   0  0  0  0  0      0
> MatGetRowIJ            1 1.0 2.8849e-0515.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> MatGetOrdering         1 1.0 1.2279e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> PCSetUp                2 1.0 6.6662e-03 1.8 9.89e+05241.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  22361
> PCSetUpOnBlocks     1001 1.0 7.5164e-03 1.7 9.89e+05241.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  19832
> PCApply            74074 1.0 5.9613e+01 1.2 1.71e+10 1.2 0.0e+00 0.0e+00 0.0e+00 29 30  0  0  0  29 30  0  0  0  58123
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Krylov Solver 2 2 19576 0.
> DMKSP interface 1 1 656 0.
> Vector 1043 1043 42492328 0.
> Vector Scatter 2 2 41496 0.
> Matrix 4 4 3163588 0.
> Distributed Mesh 1 1 5080 0.
> Star Forest Bipartite Graph 2 2 1728 0.
> Discrete System 1 1 872 0.
> Index Set 7 7 71796 0.
> IS L to G Mapping 1 1 28068 0.
> Preconditioner 2 2 1912 0.
> Viewer 1 0 0 0.
> ========================================================================================================================
> Average time to get PetscTime(): 1.90735e-07
> Average time for MPI_Barrier(): 0.000184202
> Average time for zero size MPI_Send(): 1.03469e-05
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --download-f2cblaslapack --with-fc=0
> --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3
Regarding Matt's answer: It's generally a rectangular grid (3D) of
predetermined size (not necessarily a cube). Additionally, objects of
arbitrary shape can be defined by Dirichlet boundary conditions. Is
geometric MG still viable?
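If it is viable, this is roughly how I would try to attach the existing DMDA to the solver, based on my reading of the manual (just a sketch; the calls and options below are my own assumptions, and the grid sizes would of course have to be coarsenable by the DMDA):

PC pc;
KSPSetDM(ksp, da);                 /* da: the DMDA the matrix was created from */
KSPSetDMActive(ksp, PETSC_FALSE);  /* keep using my own A and b */
KSPSetType(ksp, KSPCG);
KSPGetPC(ksp, &pc);
PCSetType(pc, PCMG);               /* geometric multigrid on the DMDA hierarchy */
KSPSetFromOptions(ksp);            /* e.g. -pc_mg_levels 4 -pc_mg_galerkin */

I would then check with -ksp_view which hierarchy actually gets built.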
Thanks,
Michael
Am 03.06.2016 um 14:32 schrieb Matthew Knepley:
> On Fri, Jun 3, 2016 at 5:56 AM, Dave May <dave.mayhem23 at gmail.com> wrote:
>
> On 3 June 2016 at 11:37, Michael Becker <Michael.Becker at physik.uni-giessen.de> wrote:
>
> Dear all,
>
> I have a few questions regarding possible performance
> enhancements for the PETSc solver I included in my project.
>
> It's a particle-in-cell plasma simulation written in C++,
> where Poisson's equation needs to be solved repeatedly on
> every timestep.
> The simulation domain is discretized using finite differences,
> so the solver needs to be able to efficiently solve
> the linear system A x = b successively with changing b. The
> solution x of the previous timestep is generally a good
> initial guess for the solution.
>
> I wrote a class PETScSolver that holds all PETSc objects and
> necessary information about domain size and decomposition. To
> solve the linear system, two arrays, 'phi' and 'charge', are
> passed to a member function solve(), where they are copied to
> PETSc vectors, and KSPSolve() is called. After convergence,
> the solution is then transferred again to the phi array so
> that other program parts can use it.
>
> The matrix is created using DMDA. An array 'bound' is used to
> determine whether a node is either a Dirichlet BC or holds a
> charge.
>
> I attached three files, petscsolver.h, petscsolver.cpp and
> main.cpp, that contain a shortened version of the solver class
> and a set-up to initialize and run a simple problem.
>
> Is there anything I can change to generally make the program
> run faster?
>
>
> Before changing anything, you should profile your code to see
> where time is being spent.
>
> To that end, you should compile an optimized build of petsc, link
> it to your application and run your code with the option
> -log_summary. The -log_summary flag will generate a performance
> profile of specific functionality within petsc (KSPSolve, MatMult
> etc) so you can see where all the time is being spent.
>
> As a second round of profiling, you should consider registering
> specific functionality in your code you think is performance
> critical.
> You can do this using the function PetscLogStageRegister()
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Profiling/PetscLogStageRegister.html
>
>
> Check out the examples listed at the bottom of this web page to
> see how to log stages. Once you've registered stages, these will
> appear in the report provided by -log_summary
>
>
> Do everything Dave said. I will also note that since you are using FD,
> I am guessing you are solving on a square. Then
> you should really be using geometric MG. We support this through the
> DMDA object.
>
> Thanks,
>
> Matt
>
> Thanks,
> Dave
>
>
> And, since I'm rather inexperienced with KSP methods, how do I
> efficiently choose PC and KSP? Just by testing every combination?
> Would multigrid be a viable option as a pure solver (-ksp_type
> preonly)?
>
> Thanks,
> Michael
>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener