[petsc-users] Question about the performance of KSP solver
Jed Brown
jed at jedbrown.org
Sun Feb 27 09:16:09 CST 2022
This is pretty typical. You see that the factorization time is significantly better (MatLUFactorNum drops from about 6.2 s to 3.6 s, because factorization is more compute-limited), but MatMult and MatSolve are about the same (MatSolve only goes from about 199 s to 178 s) because they are limited by memory bandwidth. On most modern architectures, the bandwidth is saturated with 16 cores or so.
https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
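To make the comparison concrete, the speedups can be computed straight from the timings in the quoted log. This is only a rough sketch: the times are the "Time (sec) Max" values copied from the two tables, while the dictionary layout and the helper names speedup() and efficiency() are made up for illustration.

```python
# "Time (sec) Max" values from the quoted log: (16 cores, 32 cores).
times = {
    "MatMult":         (5.0794e+01, 4.7602e+01),
    "MatSolve":        (1.9868e+02, 1.7800e+02),
    "MatLUFactorNum":  (6.1501e+00, 3.5714e+00),
    "MatILUFactorSym": (1.5566e+01, 8.4088e+00),
    "KSPSolve":        (2.5168e+02, 2.1680e+02),
}


def speedup(t16, t32):
    """Ratio of the 16-core time to the 32-core time."""
    return t16 / t32


def efficiency(t16, t32, core_ratio=2.0):
    """Speedup divided by the factor by which the core count grew."""
    return speedup(t16, t32) / core_ratio


speedups = {name: speedup(*t) for name, t in times.items()}

for name, (t16, t32) in times.items():
    print(f"{name:16s} speedup {speedups[name]:.2f}  "
          f"efficiency {efficiency(t16, t32):.0%}")
```

The factorization events come out well above 1.5x, while MatMult and MatSolve stay close to 1x, which is the pattern you would expect if the latter two are bandwidth-bound.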
If you haven't yet, I recommend trying AMG for this problem. You should call MatSetNearNullSpace() to set the rigid body modes and then use -pc_type gamg (or, with external packages, -pc_type ml or -pc_type hypre). The iteration count should be much lower and the solves reasonably fast.
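For reference, here is a small sketch of what the rigid body modes look like for a 3D elasticity problem: three translations and three infinitesimal rotations, stored with interleaved (ux, uy, uz) degrees of freedom per node. This is plain Python for illustration only, not PETSc code; the function name rigid_body_modes() and the interleaved layout are assumptions. In PETSc itself, MatNullSpaceCreateRigidBody() builds a basis for this space from a coordinate Vec, and the result is what gets passed to MatSetNearNullSpace().

```python
def rigid_body_modes(coords):
    """Return the 6 rigid body modes for a list of 3D node coordinates.

    coords: iterable of (x, y, z) tuples, one per node.
    Each mode is a flat list with 3 entries per node, ordered (ux, uy, uz).
    The modes are not normalized or orthogonalized here.
    """
    modes = [[] for _ in range(6)]
    for x, y, z in coords:
        # Translations along x, y and z.
        modes[0] += [1.0, 0.0, 0.0]
        modes[1] += [0.0, 1.0, 0.0]
        modes[2] += [0.0, 0.0, 1.0]
        # Infinitesimal rotations u = omega x r about the z, x and y axes.
        modes[3] += [-y, x, 0.0]
        modes[4] += [0.0, -z, y]
        modes[5] += [z, 0.0, -x]
    return modes


# Usage: the four vertices of a unit tetrahedron.
nodes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
modes = rigid_body_modes(nodes)
print(len(modes), "modes of length", len(modes[0]))
```

These vectors produce zero strain, so they are what the AMG coarse spaces need to represent well for elasticity.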
If you're interested in using different data structures, our experience is that we can solve similar problem sizes using Q2 elements in a few seconds (2-10) on a single node.
Gong Yujie <yc17470 at connect.um.edu.mo> writes:
> Hi,
>
> I'm using GMRES with an ASM preconditioner and ILU(2) as the sub-domain solver to solve an elasticity problem. First, I used 16 cores to measure the computation time, then ran the same code with the same parameters on 32 cores, but I only get about a 10% speedup. From the log file I found that the computation times of KSPSolve() and MatSolve() decrease only slightly. My PETSc version is 3.16.0, configured with --with-debugging=0. The matrix size is about 7*10^6. Some details of the log are shown below:
>
> 16-cores:
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                                 --- Global ---  --- Stage ----  Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatMult              664 1.0 5.0794e+01 1.6 2.70e+10 1.1 7.1e+04 4.8e+04 1.0e+00  7 13 49 20  0   7 13 49 20  0  8010
> MatSolve             663 1.0 1.9868e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10932
> MatLUFactorNum         1 1.0 6.1501e+00 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 35056
> MatILUFactorSym        1 1.0 1.5566e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> KSPSetUp               2 1.0 5.9627e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.5168e+02 1.0 1.90e+11 1.1 1.4e+05 4.8e+04 1.3e+03 44 93 98 40 89  44 93 98 40 90 11437
> KSPGMRESOrthog       641 1.0 1.8980e+01 1.7 1.82e+10 1.1 0.0e+00 0.0e+00 6.4e+02  3  9  0  0 43   3  9  0  0 44 14578
> PCSetUp                2 1.0 2.2480e+01 1.1 1.40e+10 1.1 5.3e+02 6.5e+05 7.0e+00  4  7  0  2  0   4  7  0  2  0  9591
> PCSetUpOnBlocks        1 1.0 2.1555e+01 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  3  7  0  0  0   3  7  0  0  0 10002
> PCApply              663 1.0 2.0296e+02 1.1 1.43e+11 1.1 7.0e+04 4.8e+04 1.0e+00 33 70 49 20  0  33 70 49 20  0 10701
> PCApplyOnBlocks      663 1.0 1.9908e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10910
>
> 32-cores:
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                                 --- Global ---  --- Stage ----  Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatMult              671 1.0 4.7602e+01 2.0 1.39e+10 1.1 1.7e+05 2.8e+04 1.0e+00  7 13 49 23  0   7 13 49 23  0  8637
> MatSolve             670 1.0 1.7800e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12544
> MatLUFactorNum         1 1.0 3.5714e+00 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 60743
> MatILUFactorSym        1 1.0 8.4088e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> KSPSetUp               2 1.0 3.8060e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.1680e+02 1.0 9.95e+10 1.1 3.5e+05 2.8e+04 1.3e+03 44 93 98 47 89  44 93 98 47 90 13592
> KSPGMRESOrthog       648 1.0 1.6999e+01 2.0 9.39e+09 1.1 0.0e+00 0.0e+00 6.5e+02  2  9  0  0 43   2  9  0  0 44 16450
> PCSetUp                2 1.0 1.2439e+01 1.1 7.16e+09 1.1 1.3e+03 3.7e+05 7.0e+00  2  7  0  2  0   2  7  0  2  0 17440
> PCSetUpOnBlocks        1 1.0 1.1876e+01 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0 18267
> PCApply              670 1.0 1.8235e+02 1.1 7.56e+10 1.1 1.7e+05 2.7e+04 1.0e+00 34 71 49 23  0  34 71 49 23  0 12245
> PCApplyOnBlocks      670 1.0 1.7838e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12517
>
> Hope you can help me!
>
> Best Regards,
> Yujie