[petsc-users] (edit GAMG) petsc 3.7.2 memory usage is much higher when compared to 3.6.1

Hassan Raiesi Hassan.Raiesi at aero.bombardier.com
Wed Jul 6 11:22:11 CDT 2016


Barry,

Thank you for the detailed instructions. I'll try to figure out which change causes this problem.

To answer your question, I re-ran a simple case using fgmres/bjacobi, and there was virtually no difference in the memory footprint reported by PETSc (see the log files ending in _basic). So it seems safe to assume the extra memory is due to GAMG.

I ran a series of tests with GAMG. The full logs are attached, but to summarize:

PETSc 3.6.1:
--- Event Stage 0: Main Stage

              Matrix   368            365    149426856     0
      Matrix Coarsen    16             16         9920     0
              Vector  1181           1181    218526896     0
      Vector Scatter    99             99       115936     0
       Krylov Solver    22             22        72976     0
      Preconditioner    22             22        21648     0
              Viewer     1              0            0     0
           Index Set   267            267       821040     0
Star Forest Bipartite Graph    16             16        13440     0


Using the same options and exactly the same code (just linked against petsc-3.7.2):

PETSc 3.7.2:
--- Event Stage 0: Main Stage

              Matrix   412            409    180705004     0.
      Matrix Coarsen    12             12         7536     0.
              Vector   923            923    214751960     0.
      Vector Scatter    79             79        95488     0.
       Krylov Solver    17             17        67152     0.
      Preconditioner    17             17        16936     0.
         PetscRandom     1              1          638     0.
              Viewer     1              0            0     0.
           Index Set   223            223       790676     0.
Star Forest Bipartite Graph    12             12        10176     0.

GAMG in 3.7.2 creates fewer levels but needs more memory.

For the next test, I changed "pc_gamg_square_graph" from 2 to 1; with this, 3.7.2 now makes 19 levels:

PETSc 3.7.2:
--- Event Stage 0: Main Stage

              Matrix   601            598    188796452     0.
      Matrix Coarsen    19             19        11932     0.
              Vector  1358           1358    216798096     0.
      Vector Scatter   110            110       128920     0.
       Krylov Solver    24             24        76112     0.
      Preconditioner    24             24        23712     0.
         PetscRandom     1              1          638     0.
              Viewer     1              0            0     0.
           Index Set   284            284       857076     0.
Star Forest Bipartite Graph    19             19        16112     0.

The memory usage is similar.

If I limit the number of levels to 17, I get the same number of levels as with version 3.6.1; however, the memory usage is still higher than with 3.6.1:

PETSc 3.7.2:
--- Event Stage 0: Main Stage

              Matrix   506            503    187749632     0.
      Matrix Coarsen    16             16        10048     0.
              Vector  1160           1160    216216344     0.
      Vector Scatter    92             92       100424     0.
       Krylov Solver    21             21        72272     0.
      Preconditioner    21             21        20808     0.
         PetscRandom     1              1          638     0.
              Viewer     1              0            0     0.
           Index Set   237            237       818260     0.
Star Forest Bipartite Graph    16             16        13568     0.

Now, running version 3.6.1 with the same options as the run above:

PETSc 3.6.1:
--- Event Stage 0: Main Stage

              Matrix   338            335    153296844     0
      Matrix Coarsen    16             16         9920     0
              Vector  1156           1156    219112832     0
      Vector Scatter    89             89        94696     0
       Krylov Solver    22             22        72976     0
      Preconditioner    22             22        21648     0
              Viewer     1              0            0     0
           Index Set   223            223       791548     0
Star Forest Bipartite Graph    16             16        13440     0


It looks like GAMG in 3.7.2 creates many more matrices for the same number of levels (506 vs. 338 Matrix creations) and requires about (187749632 - 153296844)/153296844 = 22.5% more matrix memory.
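As a quick check of the percentage above, here is a minimal shell sketch. It compares only the process-0 "Matrix" rows of the two 17-level object tables, not total memory; the helper name pct_increase is hypothetical, and awk is used because POSIX shell arithmetic has no floating-point division.

```shell
#!/bin/sh
# Relative increase between two byte counts, printed as a percentage.
# pct_increase is a hypothetical helper name, not part of PETSc.
pct_increase() {
  new=$1   # e.g. "Matrix" memory reported by PETSc 3.7.2 (bytes)
  old=$2   # e.g. "Matrix" memory reported by PETSc 3.6.1 (bytes)
  awk -v n="$new" -v o="$old" 'BEGIN { printf "%.1f%%\n", 100 * (n - o) / o }'
}

# Process-0 "Matrix" memory from the two 17-level runs quoted above.
pct_increase 187749632 153296844
```

Run as-is, it prints 22.5%, matching the hand calculation.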

I hope the logs help in fixing the issue.

Best Regards

PS: GAMG is great, and by far beats all other AMG libraries we have tried so far :-)


-----Original Message-----
From: Barry Smith [mailto:bsmith at mcs.anl.gov] 
Sent: Tuesday, July 05, 2016 6:19 PM
To: Hassan Raiesi <Hassan.Raiesi at aero.bombardier.com>
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] petsc 3.7.2 memory usage is much higher when compared to 3.6.1


   Hassan,

    This memory usage increase is not expected.  How are you measuring memory usage?

    Since the problem occurs even with a simple solver, you should debug with the simpler solver and, only after resolving that, move on to GAMG to see if the problem persists. Also, do the test on the smallest case that clearly demonstrates the problem; if you have a 1-process run that shows a nontrivial memory usage increase, then debug with that. Don't run a huge problem unless you absolutely have to.

     How much code, if any, did you need to change in your application in going from 3.6.1 to 3.7.2?

     Here is how to track down the problem. It may seem burdensome, but it requires no guesswork or speculation: use the bisection capability of git.

     First, obtain PETSc via git if you have not already done so: http://www.mcs.anl.gov/petsc/download/index.html

     Then, in the PETSc directory, run

       git bisect start

       git bisect good v3.6.1

       git bisect bad v3.7.2

       Git will then check out a new commit; there you need to run configure and make on PETSc, and then compile and run your application.

       If the application uses excessive memory, then in the PETSc directory type

       git bisect bad

       otherwise type

       git bisect good

       If the code won't compile (if the PETSc API changed, you may have to adjust your code slightly to get it to compile, and you should do that; but if PETSc itself won't configure or build at the given commit, just skip it) or it crashes, then type

       git bisect skip 

      Git will now switch to another commit, where you again need to do the same process: configure, make, and run the application.

      After a few iterations, git bisect will show the EXACT commit (code changes) that resulted in the very different memory usage, and we can then look at those code changes in PETSc and figure out how to reduce the memory usage.
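
The manual good/bad/skip loop above can also be driven by `git bisect run`, which treats exit status 0 as good, 125 as skip, 1 to 127 (other than 125) as bad, and anything higher as a reason to abort. The script below is only a sketch under stated assumptions: PETSC_ARCH, CONFIGURE_OPTS, APP_BUILD_CMD, APP_RUN_CMD and MEM_LIMIT are hypothetical environment variables you would set yourself; APP_RUN_CMD is assumed to run the smallest failing case with -log_summary; the build uses the usual PETSc `./configure` plus `make PETSC_DIR=... PETSC_ARCH=... all`; and the good/bad criterion is the process-0 "Matrix" memory from the object table, compared against a threshold you pick between the 3.6.1 and 3.7.2 values.

```shell
#!/bin/sh
# Sketch of a "git bisect run" driver; run from the PETSc source tree as:
#   git bisect run /path/outside/repo/bisect-mem.sh run
# Exit status: 0 = good, 1 = bad, 125 = skip, 128 = abort the bisection.
# PETSC_ARCH, CONFIGURE_OPTS, APP_BUILD_CMD, APP_RUN_CMD and MEM_LIMIT are
# hypothetical placeholders that must be set in the environment.

# Print the process-0 "Matrix" memory (bytes) from a -log_summary log.
# Only the bare "Matrix" row has 5 fields ("Matrix Coarsen" etc. have 6);
# if the log has several such rows, the last one is printed.
matrix_bytes() {
  awk '$1 == "Matrix" && NF == 5 { m = $4 } END { print m }' "$1"
}

# Succeed (good) if the measured bytes do not exceed the limit.
classify() {
  [ "$1" -le "$2" ]
}

main() {
  # A missing threshold is a setup error, so abort rather than mark "bad".
  [ -n "$MEM_LIMIT" ] || { echo "MEM_LIMIT is not set" >&2; exit 128; }

  # A commit where PETSc itself fails to configure or build is skipped.
  # CONFIGURE_OPTS is left unquoted on purpose so it splits into options.
  ./configure $CONFIGURE_OPTS || exit 125
  make PETSC_DIR="$PWD" PETSC_ARCH="$PETSC_ARCH" all || exit 125

  # Rebuild and run the application; a compile failure or crash is skipped.
  sh -c "$APP_BUILD_CMD" || exit 125
  log=$(mktemp) || exit 125
  sh -c "$APP_RUN_CMD" > "$log" 2>&1 || { rm -f "$log"; exit 125; }

  bytes=$(matrix_bytes "$log")
  rm -f "$log"
  [ -n "$bytes" ] || exit 125   # no object table found in the output

  if classify "$bytes" "$MEM_LIMIT"; then
    exit 0   # memory usage comparable to 3.6.1: good
  else
    exit 1   # excessive memory usage: bad
  fi
}

# Only do the build-and-run work when invoked with the "run" argument,
# so the helper functions can be sourced and checked on their own.
if [ "${1:-}" = "run" ]; then
  main
fi
```

Keeping the build-and-run step behind the "run" argument is a design choice for this sketch: it lets you try matrix_bytes on an existing log before committing to a long bisection.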

      I realize this seems like a burdensome process, but remember that a great many changes took place in the PETSc code, and this is the ONLY well-defined way to figure out exactly which change caused the problem. Otherwise we could guess until the end of time.

   Barry


> On Jul 5, 2016, at 3:42 PM, Hassan Raiesi <Hassan.Raiesi at aero.bombardier.com> wrote:
> 
> Hi,
>  
> PETSc 3.7.2 seems to have much higher memory usage than PETSc 3.6.1, to the point that it crashes our code for large problems that we ran with version 3.6.1 in the past.
> I recompiled the code with the same options and ran the same code linked against the two versions; here are the log summaries:
>  
> -flow_ksp_max_it 20
> -flow_ksp_monitor_true_residual
> -flow_ksp_rtol 0.1
> -flow_ksp_type fgmres
> -flow_mg_coarse_pc_factor_mat_solver_package mumps 
> -flow_mg_coarse_pc_type lu -flow_mg_levels_ksp_type richardson 
> -flow_mg_levels_pc_type sor -flow_pc_gamg_agg_nsmooths 0 
> -flow_pc_gamg_coarse_eq_limit 2000 -flow_pc_gamg_process_eq_limit 2500 
> -flow_pc_gamg_repartition true -flow_pc_gamg_reuse_interpolation true 
> -flow_pc_gamg_square_graph 3 -flow_pc_gamg_sym_graph true 
> -flow_pc_gamg_type agg -flow_pc_mg_cycle v -flow_pc_mg_levels 20 
> -flow_pc_mg_type kaskade -flow_pc_type gamg -log_summary
>  
> Note: it is not specific to PCGAMG; even bjacobi+fgmres needs more memory (4.5 GB/core with version 3.6.1 compared to 6.8 GB/core with 3.7.2).
>  
>  
>  
> Using Petsc Development GIT revision: v3.7.2-812-gc68d048  GIT Date: 2016-07-05 12:04:34 -0400
>  
>                          Max       Max/Min        Avg      Total
> Time (sec):           6.760e+02      1.00006   6.760e+02
> Objects:              1.284e+03      1.00469   1.279e+03
> Flops:                3.563e+10      1.10884   3.370e+10  1.348e+13
> Flops/sec:            5.271e+07      1.10884   4.985e+07  1.994e+10
> MPI Messages:         4.279e+04      7.21359   1.635e+04  6.542e+06
> MPI Message Lengths:  3.833e+09     17.25274   7.681e+04  5.024e+11
> MPI Reductions:       4.023e+03      1.00149
>  
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>  
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> 0:      Main Stage: 6.7600e+02 100.0%  1.3478e+13 100.0%  6.533e+06  99.9%  7.674e+04       99.9%  4.010e+03  99.7%
>  
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>  
> --- Event Stage 0: Main Stage
>  
> MatMult              500 1.0 1.0582e+01 1.2 6.68e+09 1.1 1.9e+06 1.0e+04 0.0e+00  1 19 28  4  0   1 19 29  4  0 237625
> MatMultTranspose     120 1.0 7.6262e-01 1.3 3.58e+08 1.1 2.4e+05 1.5e+04 0.0e+00  0  1  4  1  0   0  1  4  1  0 180994
> MatSolve             380 1.0 4.1580e+00 1.1 1.17e+09 1.1 8.6e+03 8.8e+01 6.0e+01  1  3  0  0  1   1  3  0  0  1 105950
> MatSOR               120 1.0 1.4316e+01 1.2 6.75e+09 1.1 9.5e+05 7.4e+03 0.0e+00  2 19 15  1  0   2 19 15  1  0 177298
> MatLUFactorSym         2 1.0 2.3449e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0     0
> MatLUFactorNum        60 1.0 8.8820e+00 1.0 1.95e+08 1.2 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  7877
> MatILUFactorSym        1 1.0 1.9795e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatConvert             6 1.0 2.9893e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
> MatScale               6 1.0 1.8810e-02 1.4 4.52e+06 1.1 2.4e+04 1.5e+03 0.0e+00  0  0  0  0  0   0  0  0  0  0 90171
> MatAssemblyBegin     782 1.0 1.8294e+01 2.9 0.00e+00 0.0 9.2e+05 4.1e+05 4.2e+02  2  0 14 75 10   2  0 14 75 10     0
> MatAssemblyEnd       782 1.0 1.4283e+01 3.0 0.00e+00 0.0 4.1e+05 8.7e+02 4.7e+02  1  0  6  0 12   1  0  6  0 12     0
> MatGetRow        6774900 1.1 9.4289e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            3 3.0 6.6261e-036948.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetSubMatrix       12 1.0 2.6783e+01 1.0 0.00e+00 0.0 1.1e+05 1.3e+05 2.0e+02  4  0  2  3  5   4  0  2  3  5     0
> MatGetOrdering         3 3.0 7.7400e-03 7.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatPartitioning        6 1.0 1.8949e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   0  0  0  0  0     0
> MatCoarsen             6 1.0 9.5692e-02 1.2 0.00e+00 0.0 2.6e+05 1.1e+03 4.1e+01  0  0  4  0  1   0  0  4  0  1     0
> MatZeroEntries       142 1.0 9.7085e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatTranspose           6 1.0 2.1740e-01 1.0 0.00e+00 0.0 1.9e+05 8.5e+02 7.8e+01  0  0  3  0  2   0  0  3  0  2     0
> MatPtAP              120 1.0 6.0157e+01 1.0 1.82e+10 1.1 1.5e+06 2.7e+05 4.2e+02  9 51 22 80 10   9 51 22 80 10 114269
> MatPtAPSymbolic       12 1.0 8.1081e+00 1.0 0.00e+00 0.0 2.2e+05 3.8e+04 8.4e+01  1  0  3  2  2   1  0  3  2  2     0
> MatPtAPNumeric       120 1.0 5.2205e+01 1.0 1.82e+10 1.1 1.2e+06 3.1e+05 3.4e+02  8 51 19 78  8   8 51 19 78  8 131676
> MatTrnMatMult          3 1.0 1.8608e+00 1.0 3.23e+07 1.2 8.3e+04 7.9e+03 5.7e+01  0  0  1  0  1   0  0  1  0  1  6275
> MatTrnMatMultSym       3 1.0 1.3447e+00 1.0 0.00e+00 0.0 6.9e+04 3.8e+03 5.1e+01  0  0  1  0  1   0  0  1  0  1     0
> MatTrnMatMultNum       3 1.0 5.1695e-01 1.0 3.23e+07 1.2 1.3e+04 3.0e+04 6.0e+00  0  0  0  0  0   0  0  0  0  0 22588
> MatGetLocalMat       126 1.0 1.0355e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetBrAoCol        120 1.0 9.5921e+0019.2 0.00e+00 0.0 5.7e+05 3.3e+04 0.0e+00  1  0  9  4  0   1  0  9  4  0     0
> VecDot               320 1.0 1.1400e+00 1.6 2.04e+08 1.1 0.0e+00 0.0e+00 3.2e+02  0  1  0  0  8   0  1  0  0  8 68967
> VecMDot              260 1.0 1.9577e+00 2.8 3.70e+08 1.1 0.0e+00 0.0e+00 2.6e+02  0  1  0  0  6   0  1  0  0  6 72792
> VecNorm              440 1.0 2.6273e+00 1.9 5.88e+08 1.1 0.0e+00 0.0e+00 4.4e+02  0  2  0  0 11   0  2  0  0 11 86035
> VecScale             320 1.0 2.1386e-01 1.2 7.91e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 141968
> VecCopy              220 1.0 7.0370e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet               862 1.0 7.1000e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              440 1.0 8.6790e-01 1.1 3.83e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 169857
> VecAYPX              280 1.0 5.7766e-01 1.5 1.92e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 127599
> VecMAXPY             300 1.0 9.7396e-01 1.2 4.98e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 196768
> VecAssemblyBegin     234 1.0 4.6313e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.8e+02  0  0  0  0 17   0  0  0  0 17     0
> VecAssemblyEnd       234 1.0 5.1503e-0319.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin     1083 1.0 2.9274e-01 4.5 0.00e+00 0.0 3.8e+06 8.5e+03 2.0e+01  0  0 59  6  0   0  0 59  6  0     0
> VecScatterEnd       1063 1.0 3.9653e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPGMRESOrthog        20 1.0 1.7405e+00 3.7 1.28e+08 1.1 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  0   0  0  0  0  0 28232
> KSPSetUp             222 1.0 6.8469e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve              60 1.0 1.4767e+02 1.0 3.55e+10 1.1 6.3e+06 7.2e+04 3.2e+03 22100 96 90 79  22100 96 90 79 91007
> PCGAMGGraph_AGG        6 1.0 6.0792e+00 1.0 4.52e+06 1.1 3.8e+05 9.0e+02 2.5e+02  1  0  6  0  6   1  0  6  0  6   279
> PCGAMGCoarse_AGG       6 1.0 2.0660e+00 1.0 3.23e+07 1.2 4.2e+05 3.1e+03 1.5e+02  0  0  6  0  4   0  0  6  0  4  5652
> PCGAMGProl_AGG         6 1.0 1.8842e+00 1.0 0.00e+00 0.0 7.3e+05 3.3e+03 8.6e+02  0  0 11  0 21   0  0 11  0 22     0
> PCGAMGPOpt_AGG         6 1.0 6.4373e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> GAMG: createProl       6 1.0 1.0036e+01 1.0 3.68e+07 1.2 1.5e+06 2.7e+03 1.3e+03  1  0 23  1 31   1  0 23  1 31  1332
>   Graph               12 1.0 6.0783e+00 1.0 4.52e+06 1.1 3.8e+05 9.0e+02 2.5e+02  1  0  6  0  6   1  0  6  0  6   279
>   MIS/Agg              6 1.0 9.5831e-02 1.2 0.00e+00 0.0 2.6e+05 1.1e+03 4.1e+01  0  0  4  0  1   0  0  4  0  1     0
>   SA: col data         6 1.0 7.7358e-01 1.0 0.00e+00 0.0 6.7e+05 2.9e+03 7.8e+02  0  0 10  0 19   0  0 10  0 19     0
>   SA: frmProl0         6 1.0 1.0759e+00 1.0 0.00e+00 0.0 6.2e+04 7.6e+03 6.0e+01  0  0  1  0  1   0  0  1  0  1     0
> GAMG: partLevel        6 1.0 3.8136e+01 1.0 9.09e+08 1.1 3.8e+05 5.0e+04 5.4e+02  6  3  6  4 13   6  3  6  4 14  9013
>   repartition          6 1.0 2.7910e+00 1.0 0.00e+00 0.0 4.6e+04 1.3e+02 1.6e+02  0  0  1  0  4   0  0  1  0  4     0
>   Invert-Sort          6 1.0 2.5045e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01  0  0  0  0  1   0  0  0  0  1     0
>   Move A               6 1.0 1.4832e+01 1.0 0.00e+00 0.0 8.5e+04 1.7e+05 1.1e+02  2  0  1  3  3   2  0  1  3  3     0
>   Move P               6 1.0 1.2023e+01 1.0 0.00e+00 0.0 2.4e+04 3.8e+03 1.1e+02  2  0  0  0  3   2  0  0  0  3     0
> PCSetUp              100 1.0 1.1212e+02 1.0 1.84e+10 1.1 3.2e+06 1.3e+05 2.2e+03 17 52 49 84 54  17 52 49 84 54 62052
> PCSetUpOnBlocks       40 1.0 1.0386e+00 1.2 1.95e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 67368
> PCApply              380 1.0 2.0034e+01 1.1 8.60e+09 1.1 1.5e+06 9.9e+03 6.0e+01  3 24 22  3  1   3 24 22  3  1 161973
> SFSetGraph            12 1.0 4.9813e-0310.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SFBcastBegin          47 1.0 3.3110e-02 2.6 0.00e+00 0.0 2.6e+05 1.1e+03 6.0e+00  0  0  4  0  0   0  0  4  0  0     0
> SFBcastEnd            47 1.0 1.3497e-02 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SFReduceBegin          6 1.0 1.8593e-02 4.2 0.00e+00 0.0 7.2e+04 4.9e+02 6.0e+00  0  0  1  0  0   0  0  1  0  0     0
> SFReduceEnd            6 1.0 7.1628e-0318.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> BuildTwoSided         12 1.0 3.5771e-02 2.5 0.00e+00 0.0 5.0e+04 4.0e+00 1.2e+01  0  0  1  0  0   0  0  1  0  0     0
> ------------------------------------------------------------------------------------------------------------------------
>  
> Memory usage is given in bytes:
>  
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>  
> --- Event Stage 0: Main Stage
>  
>               Matrix   302            299   1992700700     0.
> Matrix Partitioning     6              6         3888     0.
>       Matrix Coarsen     6              6         3768     0.
>               Vector   600            600   1582204168     0.
>       Vector Scatter    87             87      5614432     0.
>        Krylov Solver    11             11        59472     0.
>       Preconditioner    11             11        11120     0.
>          PetscRandom     1              1          638     0.
>               Viewer     1              0            0     0.
>            Index Set   247            247      9008420     0.
> Star Forest Bipartite Graph    12             12        10176     0.
> ========================================================================================================================
>  
> And for  petsc 3.6.1:
>  
> Using Petsc Development GIT revision: v3.6.1-307-g26c82d3  GIT Date: 2015-08-06 11:50:34 -0500
>  
>                          Max       Max/Min        Avg      Total
> Time (sec):           5.515e+02      1.00001   5.515e+02
> Objects:              1.231e+03      1.00490   1.226e+03
> Flops:                3.431e+10      1.12609   3.253e+10  1.301e+13
> Flops/sec:            6.222e+07      1.12609   5.899e+07  2.359e+10
> MPI Messages:         4.432e+04      7.84165   1.504e+04  6.016e+06
> MPI Message Lengths:  2.236e+09     12.61261   5.027e+04  3.024e+11
> MPI Reductions:       4.012e+03      1.00150
>  
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>  
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> 0:      Main Stage: 5.5145e+02 100.0%  1.3011e+13 100.0%  6.007e+06  99.9%  5.020e+04       99.9%  3.999e+03  99.7%
>  
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>  
> --- Event Stage 0: Main Stage
>  
> MatMult              500 1.0 1.0172e+01 1.2 6.68e+09 1.1 1.9e+06 9.9e+03 0.0e+00  2 19 31  6  0   2 19 31  6  0 247182
> MatMultTranspose     120 1.0 6.9889e-01 1.2 3.56e+08 1.1 2.5e+05 1.4e+04 0.0e+00  0  1  4  1  0   0  1  4  1  0 197492
> MatSolve             380 1.0 3.9310e+00 1.1 1.17e+09 1.1 1.3e+04 5.7e+01 6.0e+01  1  3  0  0  1   1  3  0  0  2 112069
> MatSOR               120 1.0 1.3915e+01 1.1 6.73e+09 1.1 9.5e+05 7.4e+03 0.0e+00  2 20 16  2  0   2 20 16  2  0 182405
> MatLUFactorSym         2 1.0 2.1180e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0     0
> MatLUFactorNum        60 1.0 7.9378e+00 1.0 1.95e+08 1.2 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  8814
> MatILUFactorSym        1 1.0 2.3076e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatConvert             6 1.0 3.2693e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
> MatScale               6 1.0 2.1923e-02 1.7 4.50e+06 1.1 2.4e+04 1.5e+03 0.0e+00  0  0  0  0  0   0  0  0  0  0 77365
> MatAssemblyBegin     266 1.0 1.0337e+01 4.4 0.00e+00 0.0 1.8e+05 3.8e+03 4.2e+02  1  0  3  0 10   1  0  3  0 10     0
> MatAssemblyEnd       266 1.0 3.0336e+00 1.0 0.00e+00 0.0 4.1e+05 8.6e+02 4.7e+02  1  0  7  0 12   1  0  7  0 12     0
> MatGetRow        6730366 1.1 8.6473e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            3 3.0 5.2931e-035550.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetSubMatrix       12 1.0 2.2689e+01 1.0 0.00e+00 0.0 1.1e+05 1.3e+05 1.9e+02  4  0  2  5  5   4  0  2  5  5     0
> MatGetOrdering         3 3.0 6.5000e-03 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatPartitioning        6 1.0 2.9801e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  1  0  0  0  0   1  0  0  0  0     0
> MatCoarsen             6 1.0 9.5374e-02 1.1 0.00e+00 0.0 2.5e+05 1.1e+03 3.8e+01  0  0  4  0  1   0  0  4  0  1     0
> MatZeroEntries        22 1.0 6.1185e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatTranspose           6 1.0 1.9780e-01 1.1 0.00e+00 0.0 1.9e+05 8.6e+02 7.8e+01  0  0  3  0  2   0  0  3  0  2     0
> MatPtAP              120 1.0 5.2996e+01 1.0 1.70e+10 1.1 9.7e+05 2.1e+05 4.2e+02 10 49 16 67 10  10 49 16 67 11 120900
> MatPtAPSymbolic       12 1.0 5.8209e+00 1.0 0.00e+00 0.0 2.2e+05 3.7e+04 8.4e+01  1  0  4  3  2   1  0  4  3  2     0
> MatPtAPNumeric       120 1.0 4.7185e+01 1.0 1.70e+10 1.1 7.6e+05 2.6e+05 3.4e+02  9 49 13 64  8   9 49 13 64  8 135789
> MatTrnMatMult          3 1.0 1.1679e+00 1.0 3.22e+07 1.2 8.2e+04 8.0e+03 5.7e+01  0  0  1  0  1   0  0  1  0  1  9997
> MatTrnMatMultSym       3 1.0 6.8366e-01 1.0 0.00e+00 0.0 6.9e+04 3.9e+03 5.1e+01  0  0  1  0  1   0  0  1  0  1     0
> MatTrnMatMultNum       3 1.0 4.8513e-01 1.0 3.22e+07 1.2 1.3e+04 3.0e+04 6.0e+00  0  0  0  0  0   0  0  0  0  0 24069
> MatGetLocalMat       126 1.0 1.1939e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetBrAoCol        120 1.0 5.9887e-01 2.7 0.00e+00 0.0 5.7e+05 3.3e+04 0.0e+00  0  0  9  6  0   0  0  9  6  0     0
> MatGetSymTrans        24 1.0 1.4878e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecDot               320 1.0 1.5860e+00 1.5 2.04e+08 1.1 0.0e+00 0.0e+00 3.2e+02  0  1  0  0  8   0  1  0  0  8 49574
> VecMDot              260 1.0 1.8154e+00 2.5 3.70e+08 1.1 0.0e+00 0.0e+00 2.6e+02  0  1  0  0  6   0  1  0  0  7 78497
> VecNorm              440 1.0 2.8876e+00 1.8 5.88e+08 1.1 0.0e+00 0.0e+00 4.4e+02  0  2  0  0 11   0  2  0  0 11 78281
> VecScale             320 1.0 2.2738e-01 1.2 7.88e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 133517
> VecCopy              220 1.0 7.1162e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet               862 1.0 7.0683e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              440 1.0 9.0657e-01 1.2 3.83e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 162612
> VecAYPX              280 1.0 5.8935e-01 1.5 1.92e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 125070
> VecMAXPY             300 1.0 9.7644e-01 1.2 4.98e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 196269
> VecAssemblyBegin     234 1.0 5.0308e+00 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 6.8e+02  1  0  0  0 17   1  0  0  0 17     0
> VecAssemblyEnd       234 1.0 1.8253e-03 8.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin     1083 1.0 2.8195e-01 4.7 0.00e+00 0.0 3.8e+06 8.4e+03 2.0e+01  0  0 64 11  0   0  0 64 11  1     0
> VecScatterEnd       1063 1.0 3.4924e+00 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPGMRESOrthog        20 1.0 1.5598e+00 3.2 1.28e+08 1.1 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  0   0  0  0  0  1 31503
> KSPSetUp             222 1.0 9.7521e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve              60 1.0 1.3742e+02 1.0 3.42e+10 1.1 5.7e+06 4.4e+04 3.2e+03 25100 95 83 79  25100 95 83 79 94396
> PCGAMGGraph_AGG        6 1.0 5.7683e+00 1.0 4.50e+06 1.1 3.8e+05 9.1e+02 2.5e+02  1  0  6  0  6   1  0  6  0  6   294
> PCGAMGCoarse_AGG       6 1.0 1.4101e+00 1.0 3.22e+07 1.2 4.0e+05 3.2e+03 1.4e+02  0  0  7  0  4   0  0  7  0  4  8280
> PCGAMGProl_AGG         6 1.0 1.8976e+00 1.0 0.00e+00 0.0 7.2e+05 3.4e+03 8.6e+02  0  0 12  1 22   0  0 12  1 22     0
> PCGAMGPOpt_AGG         6 1.0 5.7220e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> GAMG: createProl       6 1.0 9.0840e+00 1.0 3.67e+07 1.2 1.5e+06 2.7e+03 1.3e+03  2  0 25  1 31   2  0 25  1 31  1472
>   Graph               12 1.0 5.7669e+00 1.0 4.50e+06 1.1 3.8e+05 9.1e+02 2.5e+02  1  0  6  0  6   1  0  6  0  6   294
>   MIS/Agg              6 1.0 9.5481e-02 1.1 0.00e+00 0.0 2.5e+05 1.1e+03 3.8e+01  0  0  4  0  1   0  0  4  0  1     0
>   SA: col data         6 1.0 8.5414e-01 1.0 0.00e+00 0.0 6.6e+05 3.0e+03 7.8e+02  0  0 11  1 19   0  0 11  1 20     0
>   SA: frmProl0         6 1.0 1.0123e+00 1.0 0.00e+00 0.0 6.2e+04 7.6e+03 6.0e+01  0  0  1  0  1   0  0  1  0  2     0
> GAMG: partLevel        6 1.0 3.6150e+01 1.0 8.41e+08 1.1 3.5e+05 5.0e+04 5.3e+02  7  2  6  6 13   7  2  6  6 13  8804
>   repartition          6 1.0 3.8351e+00 1.0 0.00e+00 0.0 4.7e+04 1.3e+02 1.6e+02  1  0  1  0  4   1  0  1  0  4     0
>   Invert-Sort          6 1.0 4.4953e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01  1  0  0  0  1   1  0  0  0  1     0
>   Move A               6 1.0 1.0806e+01 1.0 0.00e+00 0.0 8.5e+04 1.6e+05 1.0e+02  2  0  1  5  3   2  0  1  5  3     0
>   Move P               6 1.0 1.1953e+01 1.0 0.00e+00 0.0 2.5e+04 3.6e+03 1.0e+02  2  0  0  0  3   2  0  0  0  3     0
> PCSetUp              100 1.0 1.0166e+02 1.0 1.72e+10 1.1 2.7e+06 8.3e+04 2.2e+03 18 50 44 73 54  18 50 44 73 54 63848
> PCSetUpOnBlocks       40 1.0 1.0812e+00 1.2 1.95e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 64711
> PCApply              380 1.0 1.9359e+01 1.1 8.58e+09 1.1 1.4e+06 9.6e+03 6.0e+01  3 25 24  5  1   3 25 24  5  2 167605
> SFSetGraph            12 1.0 3.5203e-03 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SFBcastBegin          44 1.0 2.4242e-02 3.0 0.00e+00 0.0 2.5e+05 1.1e+03 6.0e+00  0  0  4  0  0   0  0  4  0  0     0
> SFBcastEnd            44 1.0 3.0994e-02 8.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SFReduceBegin          6 1.0 1.6784e-02 3.8 0.00e+00 0.0 7.1e+04 5.0e+02 6.0e+00  0  0  1  0  0   0  0  1  0  0     0
> SFReduceEnd            6 1.0 8.6989e-0332.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ------------------------------------------------------------------------------------------------------------------------
>  
> Memory usage is given in bytes:
>  
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>  
> --- Event Stage 0: Main Stage
>  
>               Matrix   246            243   1730595756     0
> Matrix Partitioning     6              6         3816     0
>       Matrix Coarsen     6              6         3720     0
>               Vector   602            602   1603749672     0
>       Vector Scatter    87             87      4291136     0
>        Krylov Solver    12             12        60416     0
>       Preconditioner    12             12        12040     0
>               Viewer     1              0            0     0
>            Index Set   247            247      9018060     0
> Star Forest Bipartite Graph    12             12        10080     0
> ========================================================================================================================
>  
> Any idea why more matrices are created with version 3.7.2? I only have 2 MatCreate calls and 4 VecCreate calls in my code, so I assume the others are created internally.
>  
>  
> Thank you,
>  
>  
> Hassan Raiesi, PhD
>  
> Advanced Aerodynamics Department
> Bombardier Aerospace
>  
> hassan.raiesi at aero.bombardier.com
>  
> 2351 boul. Alfred-Nobel (BAN1)
> Ville Saint-Laurent, Québec, H4S 2A9
>  
>  
>  
> Tél. 514-855-5001 # 62204
>  
>  
>  


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_3.6.1_gamg.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160706/125704ab/attachment-0007.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_3.7.2_gamg_run_with_square_graph_1.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160706/125704ab/attachment-0008.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_3.7.2_gamg_square_graph_1_max_level_17.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160706/125704ab/attachment-0009.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_3.6.1_gamg_run_with_square_graph_1_max_level_17.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160706/125704ab/attachment-0010.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_3.7.2_gamg.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160706/125704ab/attachment-0011.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_3.7.2_basic.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160706/125704ab/attachment-0012.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_3.6.1_basic.txt
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160706/125704ab/attachment-0013.txt>

