<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    So using -log_summary helps me find out how much time is actually

    spent on the PETSc routines that are repeatedly called. Since that

    part of my code is fairly simple:<br>

    <br>

      PetscScalar *barray;<br>

      VecGetArray(b,&barray);<br>

      for (int i=0; i<Nall; i++) {<br>

        if (bound[i]==0)<br>

          barray[i] = charge[i]*ih*iepsilon0;<br>

        else<br>

          barray[i] = phi[i];<br>

      }<br>

      VecRestoreArray(b,&barray);<br>

    <br>

      KSPSolve(ksp,b,x);<br>

      <br>

      KSPGetSolution(ksp,&x);<br>

      PetscScalar *xarray;<br>

      VecGetArray(x,&xarray);<br>

      for (int i=0; i<Nall; i++)<br>

        phi[i] = xarray[i];<br>

      VecRestoreArray(x,&xarray);<br>

    <br>

    , I don't see how additional log states would help me. So I would

    then still just test which KSP method is the fastest?<br>

    <br>

    I ran a test over 1000 iterations; this is the output:<br>

    <blockquote type="cite">                         Max      

      Max/Min        Avg      Total <br>

      Time (sec):           1.916e+02      1.00055   1.915e+02<br>

      Objects:              1.067e+03      1.00000   1.067e+03<br>

      Flops:                5.730e+10      1.22776   5.360e+10 

      1.158e+13<br>

      Flops/sec:            2.992e+08      1.22792   2.798e+08 

      6.044e+10<br>

      MPI Messages:         1.900e+06      3.71429   1.313e+06 

      2.835e+08<br>

      MPI Message Lengths:  1.138e+09      2.38189   6.824e+02 

      1.935e+11<br>

      MPI Reductions:       1.462e+05      1.00000<br>

      <br>

      Flop counting convention: 1 flop = 1 real number operation of type

      (multiply/divide/add/subtract)<br>

                                  e.g., VecAXPY() for real vectors of

      length N --> 2N flops<br>

                                  and VecAXPY() for complex vectors of

      length N --> 8N flops<br>

      <br>

      Summary of Stages:   ----- Time ------  ----- Flops -----  ---

      Messages ---  -- Message Lengths --  -- Reductions --<br>

                              Avg     %Total     Avg     %Total  

      counts   %Total     Avg         %Total   counts   %Total <br>

       0:      Main Stage: 1.9154e+02 100.0%  1.1577e+13 100.0% 

      2.835e+08 100.0%  6.824e+02      100.0%  1.462e+05 100.0% <br>

      <br>

------------------------------------------------------------------------------------------------------------------------<br>

      See the 'Profiling' chapter of the users' manual for details on

      interpreting output.<br>

      Phase summary info:<br>

         Count: number of times phase was executed<br>

         Time and Flops: Max - maximum over all processors<br>

                         Ratio - ratio of maximum to minimum over all

      processors<br>

         Mess: number of messages sent<br>

         Avg. len: average message length (bytes)<br>

         Reduct: number of global reductions<br>

         Global: entire computation<br>

         Stage: stages of a computation. Set stages with

      PetscLogStagePush() and PetscLogStagePop().<br>

            %T - percent time in this phase         %F - percent flops

      in this phase<br>

            %M - percent messages in this phase     %L - percent message

      lengths in this phase<br>

            %R - percent reductions in this phase<br>

         Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max

      time over all processors)<br>

------------------------------------------------------------------------------------------------------------------------<br>

      Event                Count      Time (sec)    

      Flops                             --- Global ---  --- Stage ---  

      Total<br>

                         Max Ratio  Max     Ratio   Max  Ratio  Mess  

      Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s<br>

------------------------------------------------------------------------------------------------------------------------<br>

      <br>

      --- Event Stage 0: Main Stage<br>

      <br>

      KSPGMRESOrthog     70070 1.0 7.8035e+01 2.3 1.94e+10 1.2 0.0e+00

      0.0e+00 7.0e+04 29 34  0  0 48  29 34  0  0 48 50538<br>

      KSPSetUp               2 1.0 1.5209e-03 1.1 0.00e+00 0.0 0.0e+00

      0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0     0<br>

      KSPSolve            1001 1.0 1.9097e+02 1.0 5.73e+10 1.2 2.8e+08

      6.8e+02 1.5e+05100100100100100 100100100100100 60621<br>

      VecMDot            70070 1.0 6.9833e+01 2.8 9.69e+09 1.2 0.0e+00

      0.0e+00 7.0e+04 25 17  0  0 48  25 17  0  0 48 28235<br>

      VecNorm            74074 1.0 1.1570e+01 1.7 7.28e+08 1.2 0.0e+00

      0.0e+00 7.4e+04  5  1  0  0 51   5  1  0  0 51 12804<br>

      VecScale           73073 1.0 5.6676e-01 1.3 3.59e+08 1.2 0.0e+00

      0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 128930<br>

      VecCopy             3003 1.0 1.0008e-01 1.6 0.00e+00 0.0 0.0e+00

      0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

      VecSet             77080 1.0 1.3647e+00 1.4 0.00e+00 0.0 0.0e+00

      0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0<br>

      VecAXPY             6006 1.0 1.0779e-01 1.7 5.90e+07 1.2 0.0e+00

      0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 111441<br>

      VecMAXPY           73073 1.0 9.2155e+00 1.3 1.04e+10 1.2 0.0e+00

      0.0e+00 0.0e+00  4 18  0  0  0   4 18  0  0  0 229192<br>

      VecScatterBegin    73073 1.0 7.0538e+00 4.4 0.00e+00 0.0 2.8e+08

      6.8e+02 0.0e+00  2  0100100  0   2  0100100  0     0<br>

      VecScatterEnd      73073 1.0 7.8382e+00 2.6 0.00e+00 0.0 0.0e+00

      0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0<br>

      VecNormalize       73073 1.0 1.1774e+01 1.6 1.08e+09 1.2 0.0e+00

      0.0e+00 7.3e+04  5  2  0  0 50   5  2  0  0 50 18619<br>

      MatMult            73073 1.0 8.6056e+01 1.7 1.90e+10 1.3 2.8e+08

      6.8e+02 0.0e+00 36 33100100  0  36 33100100  0 44093<br>

      MatSolve           74074 1.0 5.4865e+01 1.2 1.71e+10 1.2 0.0e+00

      0.0e+00 0.0e+00 27 30  0  0  0  27 30  0  0  0 63153<br>

      MatLUFactorNum         1 1.0 4.1230e-03 2.6 9.89e+05241.4 0.0e+00

      0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 36155<br>

      MatILUFactorSym        1 1.0 2.1942e-03 1.3 0.00e+00 0.0 0.0e+00

      0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

      MatAssemblyBegin       2 1.0 5.6112e-03 4.8 0.00e+00 0.0 0.0e+00

      0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

      MatAssemblyEnd         2 1.0 6.3889e-03 1.0 0.00e+00 0.0 7.8e+03

      1.7e+02 8.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

      MatGetRowIJ            1 1.0 2.8849e-0515.1 0.00e+00 0.0 0.0e+00

      0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

      MatGetOrdering         1 1.0 1.2279e-04 1.6 0.00e+00 0.0 0.0e+00

      0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

      PCSetUp                2 1.0 6.6662e-03 1.8 9.89e+05241.4 0.0e+00

      0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 22361<br>

      PCSetUpOnBlocks     1001 1.0 7.5164e-03 1.7 9.89e+05241.4 0.0e+00

      0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 19832<br>

      PCApply            74074 1.0 5.9613e+01 1.2 1.71e+10 1.2 0.0e+00

      0.0e+00 0.0e+00 29 30  0  0  0  29 30  0  0  0 58123<br>

------------------------------------------------------------------------------------------------------------------------<br>

      <br>

      Memory usage is given in bytes:<br>

      <br>

      Object Type          Creations   Destructions     Memory 

      Descendants' Mem.<br>

      Reports information only for process 0.<br>

      <br>

      --- Event Stage 0: Main Stage<br>

      <br>

             Krylov Solver     2              2        19576     0.<br>

           DMKSP interface     1              1          656     0.<br>

                    Vector  1043           1043     42492328     0.<br>

            Vector Scatter     2              2        41496     0.<br>

                    Matrix     4              4      3163588     0.<br>

          Distributed Mesh     1              1         5080     0.<br>

      Star Forest Bipartite Graph     2              2         1728    

      0.<br>

           Discrete System     1              1          872     0.<br>

                 Index Set     7              7        71796     0.<br>

         IS L to G Mapping     1              1        28068     0.<br>

            Preconditioner     2              2         1912     0.<br>

                    Viewer     1              0            0     0.<br>

========================================================================================================================<br>

      Average time to get PetscTime(): 1.90735e-07<br>

      Average time for MPI_Barrier(): 0.000184202<br>

      Average time for zero size MPI_Send(): 1.03469e-05<br>

      #PETSc Option Table entries:<br>

      -log_summary<br>

      #End of PETSc Option Table entries<br>

      Compiled without FORTRAN kernels<br>

      Compiled with full precision matrices (default)<br>

      sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8

      sizeof(PetscScalar) 8 sizeof(PetscInt) 4<br>

      Configure options: --download-f2cblaslapack --with-fc=0

      --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3</blockquote>

    <br>

    Regarding Matt's answer: It's generally a rectangular grid (3D) of

    predetermined size (not necessarily a cube). Additionally, objects

    of arbitrary shape can be defined by Dirichlet boundary conditions.

    Is geometric MG still viable?<br>

    <br>

    Thanks,<br>

    Michael<br>

    <br>

    <br>

    <div class="moz-cite-prefix">Am 03.06.2016 um 14:32 schrieb Matthew

      Knepley:<br>

    </div>

    <blockquote

cite="mid:CAMYG4GmD=e2-wd8Bk=M2yVLjsTTGnHWCfz92LZ61XXTnP1YSmw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">On Fri, Jun 3, 2016 at 5:56 AM, Dave

            May <span dir="ltr"><<a moz-do-not-send="true"

                href="mailto:dave.mayhem23@gmail.com" target="_blank">dave.mayhem23@gmail.com</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div dir="ltr">

                <div class="gmail_extra">

                  <div class="gmail_quote"><span class="">On 3 June 2016

                      at 11:37, Michael Becker <span dir="ltr"><<a

                          moz-do-not-send="true"

                          href="mailto:Michael.Becker@physik.uni-giessen.de"

                          target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:Michael.Becker@physik.uni-giessen.de">Michael.Becker@physik.uni-giessen.de</a></a>></span>

                      wrote:<br>

                      <blockquote class="gmail_quote" style="margin:0px

                        0px 0px 0.8ex;border-left:1px solid

                        rgb(204,204,204);padding-left:1ex">Dear all,<br>

                        <br>

                        I have a few questions regarding possible

                        performance enhancements for the PETSc solver I

                        included in my project.<br>

                        <br>

                        It's a particle-in-cell plasma simulation

                        written in C++, where Poisson's equation needs

                        to be solved repeatedly on every timestep.<br>

                        The simulation domain is discretized using

                        finite differences, so the solver therefore

                        needs to be able to efficiently solve the linear

                        system A x = b successively with changing b. The

                        solution x of the previous timestep is generally

                        a good initial guess for the solution.<br>

                        <br>

                        I wrote a class PETScSolver that holds all PETSc

                        objects and necessary information about domain

                        size and decomposition. To solve the linear

                        system, two arrays, 'phi' and 'charge', are

                        passed to a member function solve(), where they

                        are copied to PETSc vectors, and KSPSolve() is

                        called. After convergence, the solution is then

                        transferred again to the phi array so that other

                        program parts can use it.<br>

                        <br>

                        The matrix is created using DMDA. An array

                        'bound' is used to determine whether a node is

                        either a Dirichlet BC or holds a charge.<br>

                        <br>

                        I attached three files, petscsolver.h,

                        petscsolver.cpp and main.cpp, that contain a

                        shortened version of the solver class and a

                        set-up to initialize and run a simple problem.<br>

                        <br>

                        Is there anything I can change to generally make

                        the program run faster?<br>

                      </blockquote>

                      <div><br>

                      </div>

                    </span>

                    <div>Before changing anything, you should profile

                      your code to see where time is being spent.<br>

                      <br>

                      To that end, you should compile an optimized build

                      of petsc, link it to you application and run your

                      code with the option -log_summary. The

                      -log_summary flag will generate a performance

                      profile of specific functionality within petsc

                      (KSPSolve, MatMult etc) so you can see where all

                      the time is being spent.<br>

                      <br>

                    </div>

                    <div>As a second round of profiling, you should

                      consider registering specific functionality in

                      your code you think is performance critical. <br>

                      You can do this using the function

                      PetscLogStageRegister()<br>

                      <br>

                      <a moz-do-not-send="true"

href="http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Profiling/PetscLogStageRegister.html"

                        target="_blank">http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Profiling/PetscLogStageRegister.html</a>

                      <br>

                    </div>

                    <div><br>

                    </div>

                    <div>Check out the examples listed at the bottom of

                      this web page to see how to log stages. Once

                      you've registered stages, these will appear in the

                      report provided by -log_summary</div>

                  </div>

                </div>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>Do everything Dave said. I will also note that since

              you are using FD, I am guessing you are solving on a

              square. Then</div>

            <div>you should really be using geometric MG. We support

              this through the DMDA object.</div>

            <div><br>

            </div>

            <div>  Thanks,</div>

            <div><br>

            </div>

            <div>     Matt</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div dir="ltr">

                <div class="gmail_extra">

                  <div class="gmail_quote">

                    <div>Thanks,<br>

                    </div>

                    <div>  Dave<br>

                    </div>

                    <span class="">

                      <div> <br>

                      </div>

                      <div><br>

                      </div>

                      <blockquote class="gmail_quote" style="margin:0px

                        0px 0px 0.8ex;border-left:1px solid

                        rgb(204,204,204);padding-left:1ex">

                        And, since I'm rather unexperienced with KSP

                        methods, how do I efficiently choose PC and KSP?

                        Just by testing every combination?<br>

                        Would multigrid be a viable option as a pure

                        solver (-ksp_type preonly)?<br>

                        <br>

                        Thanks,<br>

                        Michael<br>

                      </blockquote>

                    </span></div>

                  <br>

                </div>

              </div>

            </blockquote>

          </div>

          <br>

          <br clear="all">

          <div><br>

          </div>

          -- <br>

          <div class="gmail_signature" data-smartmail="gmail_signature">What

            most experimenters take for granted before they begin their

            experiments is infinitely more interesting than any results

            to which their experiments lead.<br>

            -- Norbert Wiener</div>

        </div>

      </div>

    </blockquote>

    <br>

  </body>

</html>