[petsc-users] Suspect Poor Performance of Petsc

Barry Smith bsmith at mcs.anl.gov
Thu Nov 17 10:24:20 CST 2016


  Ivano,

I have cut and pasted the relevant parts of the logs below and removed a few irrelevant lines to make the analysis simpler.

There is a lot of bad stuff going on that is hurting performance.

1) The percentage of time spent in the linear solve is dropping from 79% with 4 processes to 54% with 16 processes. This means the rest of the code is not scaling well; most likely that is the part generating the matrix. How are you getting the matrix into the program? If you are reading it as ASCII (somehow in parallel?) you should not do that. You should use MatLoad() to get the matrix in efficiently (see for example src/ksp/ksp/examples/tutorials/ex10.c).
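
A minimal sketch of what that looks like, assuming the matrix has already been written to a PETSc binary file (the filename "matrix.dat" here is just a placeholder):

  PetscViewer viewer;
  Mat         A;

  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"matrix.dat",FILE_MODE_READ,&viewer);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetType(A,MATAIJ);CHKERRQ(ierr);
  ierr = MatLoad(A,viewer);CHKERRQ(ierr);      /* reads the matrix in parallel, in binary */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);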

To make the analysis better you need to add a separate logging stage around the KSP solve:

  PetscLogStage stage;

  ierr = PetscLogStageRegister("Solve", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,bb,xx);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

and rerun the three cases.

2) The load balance is bad even with four processes. For example, it is 1.3 in MatSolve; it should be very close to 1.0. How are you dividing the matrix up between processes?
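
If you are computing the per-process row counts yourself, one simple option (a sketch, not necessarily what your code should do; N stands for your global matrix size) is to let PETSc pick an even row distribution:

  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);CHKERRQ(ierr);  /* PETSc chooses the local row counts */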

3) It is spending a HUGE amount of time in VecNorm(): 26% on 4 processes and 42% on 16 processes. This could be partially or completely due to the load imbalance, but there might be other issues as well.

Run with -ksp_norm_type natural in your new set of runs.

Also, always run with -ksp_type cg; since your original solver is a conjugate gradient, it makes no sense to use GMRES or the other KSP methods.
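
For example, a hypothetical command line (substitute your own executable name and process count):

  mpiexec -n 4 ./solver -ksp_type cg -ksp_norm_type natural -log_view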

Eagerly awaiting your response.

Barry



------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
4 processes:

MatMult               75 1.0 4.1466e-03 1.2 3.70e+06 1.6 6.0e+02 4.4e+02 0.0e+00 13 25 97 99  0  13 25 97 99  0  2868
MatSolve              75 1.0 6.1995e-03 1.3 3.68e+06 1.6 0.0e+00 0.0e+00 0.0e+00 19 24  0  0  0  19 24  0  0  0  1908
MatLUFactorNum         1 1.0 3.6880e-04 1.4 5.81e+04 1.7 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0   499
MatILUFactorSym        1 1.0 1.7040e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatAssemblyBegin       1 1.0 2.5113e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  1  0  0  0  1   1  0  0  0  1     0
MatAssemblyEnd         1 1.0 1.8365e-03 1.0 0.00e+00 0.0 1.6e+01 1.1e+02 8.0e+00  6  0  3  1  3   6  0  3  1  3     0
MatGetRowIJ            1 1.0 2.2865e-0517.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 6.1687e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecTDot              150 1.0 3.2991e-03 2.7 2.05e+06 1.0 0.0e+00 0.0e+00 1.5e+02  8 17  0  0 62   8 17  0  0 62  2466
VecNorm               76 1.0 7.5034e-03 1.0 1.04e+06 1.0 0.0e+00 0.0e+00 7.6e+01 26  9  0  0 31  26  9  0  0 31   549
VecSet                77 1.0 2.4495e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY              150 1.0 7.8158e-04 1.1 2.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00  3 17  0  0  0   3 17  0  0  0 10409
VecAYPX               74 1.0 6.8849e-04 1.0 1.01e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2  8  0  0  0   2  8  0  0  0  5829
VecScatterBegin       75 1.0 1.7794e-04 1.2 0.00e+00 0.0 6.0e+02 4.4e+02 0.0e+00  1  0 97 99  0   1  0 97 99  0     0
VecScatterEnd         75 1.0 2.1674e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
KSPSetUp               2 1.0 1.4922e-04 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.2833e-02 1.0 1.36e+07 1.3 6.0e+02 4.4e+02 2.3e+02 79100 97 99 93  79100 97 99 93  2116
PCSetUp                2 1.0 1.0116e-03 1.2 5.81e+04 1.7 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0   182
PCSetUpOnBlocks        1 1.0 6.2872e-04 1.2 5.81e+04 1.7 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0   293
PCApply               75 1.0 7.2835e-03 1.3 3.68e+06 1.6 0.0e+00 0.0e+00 0.0e+00 22 24  0  0  0  22 24  0  0  0  1624

8 processes:

MatMult               77 1.0 3.5985e-03 1.2 2.18e+06 2.4 1.5e+03 3.8e+02 0.0e+00  1 25 97 99  0   1 25 97 99  0  3393
MatSolve              77 1.0 3.8145e-03 1.4 2.16e+06 2.4 0.0e+00 0.0e+00 0.0e+00  1 24  0  0  0   1 24  0  0  0  3163
MatLUFactorNum         1 1.0 9.3037e-04 1.9 3.37e+04 2.6 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   196
MatILUFactorSym        1 1.0 2.1638e-03 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 1.9466e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  1  0  0  0  1   1  0  0  0  1     0
MatAssemblyEnd         1 1.0 2.1234e-02 1.0 0.00e+00 0.0 4.0e+01 9.6e+01 8.0e+00  8  0  3  1  3   8  0  3  1  3     0
MatGetRowIJ            1 1.0 1.0025e-0312.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.4848e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecTDot              154 1.0 4.1220e-03 1.8 1.06e+06 1.0 0.0e+00 0.0e+00 1.5e+02  1 17  0  0 62   1 17  0  0 62  2026
VecNorm               78 1.0 1.5534e-01 1.0 5.38e+05 1.0 0.0e+00 0.0e+00 7.8e+01 60  9  0  0 31  60  9  0  0 31    27
VecSet                79 1.0 1.5549e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              154 1.0 8.0559e-04 1.2 1.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0 17  0  0  0   0 17  0  0  0 10368
VecAYPX               76 1.0 5.8600e-04 1.4 5.24e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  8  0  0  0   0  8  0  0  0  7034
VecScatterBegin       77 1.0 8.4793e-04 3.7 0.00e+00 0.0 1.5e+03 3.8e+02 0.0e+00  0  0 97 99  0   0  0 97 99  0     0
VecScatterEnd         77 1.0 7.7019e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 1.1451e-03 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 1.8231e-01 1.0 7.49e+06 1.5 1.5e+03 3.8e+02 2.3e+02 71100 97 99 93  71100 97 99 94   272
PCSetUp                2 1.0 1.0994e-02 1.1 3.37e+04 2.6 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0    17
PCSetUpOnBlocks        1 1.0 4.9001e-03 1.2 3.37e+04 2.6 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0    37
PCApply               77 1.0 5.2556e-03 1.3 2.16e+06 2.4 0.0e+00 0.0e+00 0.0e+00  2 24  0  0  0   2 24  0  0  0  2296

16 processes:

MatMult               78 1.0 1.2783e-02 4.8 1.16e+06 3.9 3.5e+03 2.5e+02 0.0e+00  1 25 98 99  0   1 25 98 99  0   968
MatSolve              78 1.0 1.4015e-0214.0 1.14e+06 3.9 0.0e+00 0.0e+00 0.0e+00  0 24  0  0  0   0 24  0  0  0   867
MatLUFactorNum         1 1.0 1.0275e-0240.1 1.76e+04 4.5 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    18
MatILUFactorSym        1 1.0 2.0541e-0213.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatAssemblyBegin       1 1.0 2.1347e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  1  0  0  0  1   1  0  0  0  1     0
MatAssemblyEnd         1 1.0 1.5367e-01 1.1 0.00e+00 0.0 9.0e+01 6.5e+01 8.0e+00 12  0  2  1  3  12  0  2  1  3     0
MatGetRowIJ            1 1.0 1.2759e-02159.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.8199e-0221.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecTDot              156 1.0 1.3093e-02 6.1 5.45e+05 1.0 0.0e+00 0.0e+00 1.6e+02  1 17  0  0 62   1 17  0  0 62   646
VecNorm               79 1.0 5.2373e-01 1.0 2.76e+05 1.0 0.0e+00 0.0e+00 7.9e+01 42  9  0  0 31  42  9  0  0 31     8
VecSet                80 1.0 2.1215e-0229.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY              156 1.0 2.5283e-03 1.7 5.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0 17  0  0  0   0 17  0  0  0  3346
VecAYPX               77 1.0 1.5826e-03 2.6 2.69e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  8  0  0  0   0  8  0  0  0  2639
VecScatterBegin       78 1.0 7.8273e-0326.8 0.00e+00 0.0 3.5e+03 2.5e+02 0.0e+00  0  0 98 99  0   0  0 98 99  0     0
VecScatterEnd         78 1.0 4.8130e-0344.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 1.9786e-0232.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
KSPSolve               1 1.0 6.7540e-01 1.0 3.87e+06 1.8 3.5e+03 2.5e+02 2.4e+02 54100 98 99 93  54100 98 99 94    74
PCSetUp                2 1.0 9.6539e-02 1.2 1.76e+04 4.5 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     2
PCSetUpOnBlocks        1 1.0 5.1548e-02 1.8 1.76e+04 4.5 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     4
PCApply               78 1.0 1.7296e-02 5.3 1.14e+06 3.9 0.0e+00 0.0e+00 0.0e+00  1 24  0  0  0   1 24  0  0  0   702
------------------------------------------------------------------------------------------------------------------------




> On Nov 17, 2016, at 6:28 AM, Ivano Barletta <ibarletta at inogs.it> wrote:
> 
> Dear Petsc users
> 
> My aim is to replace the linear solver of an ocean model with PETSc, to see if
> there is room for performance improvement.
> 
> The linear system solves an elliptic equation, and the former solver is a
> preconditioned conjugate gradient with simple diagonal preconditioning.
> The size of the matrix is roughly 27000.
> 
> Prior to nesting PETSc into the model, I built a simple test case where
> the same system is solved by both methods.
> 
> I've noticed that, compared to the former solver (PCG), PETSc's performance
> is quite disappointing.
> 
> PCG does not scale that much, but its solution time remains below
> 4-5e-2 seconds.
> PETSc's solution time, instead, increases the more CPUs I use
> (see the output of -log_view in the attachments).
> 
> I've only tried changing the KSP solver (gmres, cg, and bcgs, with no
> improvement), and the preconditioning is the PETSc default. Maybe these
> options don't suit my problem very well, but I don't think this alone
> justifies such strange behavior.
> 
> I've tried providing d_nnz and o_nnz with the exact numbers of nonzeros in the
> preallocation phase, but there was no gain in this case either.
> 
> At this point, my question is, what am I doing wrong?
> 
> Do you think that the problem is too small for PETSc to
> have any effect?
> 
> Thanks in advance
> Ivano
> 
> <petsc_time_8><petsc_time_4><petsc_time_16>


