[petsc-users] Question about the performance of KSP solver

Jed Brown jed at jedbrown.org
Sun Feb 27 09:16:09 CST 2022


This is pretty typical. You see that the factorization time is significantly better (because it is more compute-limited), but MatMult and MatSolve are about the same because they are limited by memory bandwidth. On most modern architectures, the bandwidth is saturated with 16 cores or so.

https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup

If you haven't yet, I recommend trying AMG for this problem. You should call MatSetNearNullSpace() to set the rigid body modes and then use -pc_type gamg (or, with external packages, -pc_type ml or -pc_type hypre). The iteration count should be much lower and the solves reasonably fast.
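
For concreteness, here is a minimal sketch (not from this thread) of attaching the rigid-body near-null space before the solve. It assumes a Vec named coords holding the interleaved nodal coordinates with block size 3, matching the block size of the assembled elasticity operator A:

#include <petscksp.h>

/* Sketch (assumed setup): attach rigid-body modes as the near-null space
 * so -pc_type gamg (or ml/hypre) can build good coarse spaces for 3D
 * elasticity.  coords holds interleaved nodal coordinates, block size 3. */
PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
{
  MatNullSpace   nearnull;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Then run with -pc_type gamg (or -pc_type ml / -pc_type hypre if those packages were enabled at configure time) and compare the iteration counts and -log_view timings against the ASM/ILU(2) runs below.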

If you're interested in using different data structures, our experience is that we can solve problems of similar size using Q2 elements in a few seconds (2-10) on a single node.

Gong Yujie <yc17470 at connect.um.edu.mo> writes:

> Hi,
>
> I'm using GMRES with an ASM preconditioner and ILU(2) as the sub-domain solver to solve an elasticity problem. First, I used 16 cores to test the computation time, then ran the same code with the same parameters on 32 cores, but I only get about a 10% speedup. From the log file I found that the computation times of KSPSolve() and MatSolve() decrease only a little. My PETSc version is 3.16.0, configured with --with-debugging=0. The matrix size is about 7*10^6. Some details from the log are shown below:
>
> 16-cores:
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatMult              664 1.0 5.0794e+01 1.6 2.70e+10 1.1 7.1e+04 4.8e+04 1.0e+00  7 13 49 20  0   7 13 49 20  0  8010
> MatSolve             663 1.0 1.9868e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10932
> MatLUFactorNum         1 1.0 6.1501e+00 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 35056
> MatILUFactorSym        1 1.0 1.5566e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> KSPSetUp               2 1.0 5.9627e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.5168e+02 1.0 1.90e+11 1.1 1.4e+05 4.8e+04 1.3e+03 44 93 98 40 89  44 93 98 40 90 11437
> KSPGMRESOrthog       641 1.0 1.8980e+01 1.7 1.82e+10 1.1 0.0e+00 0.0e+00 6.4e+02  3  9  0  0 43   3  9  0  0 44 14578
> PCSetUp                2 1.0 2.2480e+01 1.1 1.40e+10 1.1 5.3e+02 6.5e+05 7.0e+00  4  7  0  2  0   4  7  0  2  0  9591
> PCSetUpOnBlocks        1 1.0 2.1555e+01 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  3  7  0  0  0   3  7  0  0  0 10002
> PCApply              663 1.0 2.0296e+02 1.1 1.43e+11 1.1 7.0e+04 4.8e+04 1.0e+00 33 70 49 20  0  33 70 49 20  0 10701
> PCApplyOnBlocks      663 1.0 1.9908e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10910
>
> 32-cores:
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatMult              671 1.0 4.7602e+01 2.0 1.39e+10 1.1 1.7e+05 2.8e+04 1.0e+00  7 13 49 23  0   7 13 49 23  0  8637
> MatSolve             670 1.0 1.7800e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12544
> MatLUFactorNum         1 1.0 3.5714e+00 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 60743
> MatILUFactorSym        1 1.0 8.4088e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> KSPSetUp               2 1.0 3.8060e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.1680e+02 1.0 9.95e+10 1.1 3.5e+05 2.8e+04 1.3e+03 44 93 98 47 89  44 93 98 47 90 13592
> KSPGMRESOrthog       648 1.0 1.6999e+01 2.0 9.39e+09 1.1 0.0e+00 0.0e+00 6.5e+02  2  9  0  0 43   2  9  0  0 44 16450
> PCSetUp                2 1.0 1.2439e+01 1.1 7.16e+09 1.1 1.3e+03 3.7e+05 7.0e+00  2  7  0  2  0   2  7  0  2  0 17440
> PCSetUpOnBlocks        1 1.0 1.1876e+01 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0 18267
> PCApply              670 1.0 1.8235e+02 1.1 7.56e+10 1.1 1.7e+05 2.7e+04 1.0e+00 34 71 49 23  0  34 71 49 23  0 12245
> PCApplyOnBlocks      670 1.0 1.7838e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12517
>
> Hope you can help me!
>
> Best Regards,
> Yujie

