[petsc-users] Expected weak scaling behaviour for AMG libraries?
Khurana, Parv
p.khurana22 at imperial.ac.uk
Thu Nov 7 09:08:29 CST 2024
Hello Mark and Matthew,
Apologies for the delay in replying (I was away on vacation). I really appreciate the prompt response.
I am now planning to redo these tests with the load-balancing suggestions you have provided. Would you suggest any load-balancing options to use as defaults when dealing with unstructured meshes in general? I use PETSc as an external linear solver for my software: I supply a Poisson system discretised with 3D simplicial elements and FEM, which is then solved using AMG. I observed poor weak-scaling behaviour for my application at 20k DOF/rank, which prompted me to test something similar in PETSc alone.
I chose ex12 instead of ex56 because it uses 3D FEM; I am not sure I can make ex56 work for tetrahedra out of the box. Maybe ex13 is better suited, as Mark mentioned.
On points 3 and 4 from Matthew:
The plot below is built from the numbers extracted from the -log_view output of all the runs. I have attached a sample log file from my runs and pasted a sample of the output below.
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

KSPSolve               2 1.0 1.4079e-01 1.0 2.14e+07 2.0 1.2e+03 1.1e+04 4.4e+01  2  4 26 16 17   2  4 26 16 18   875
SNESSolve              1 1.0 2.9310e+00 1.0 1.69e+08 1.1 1.7e+03 2.0e+04 6.1e+01 46 46 37 38 23  46 46 37 38 25   445
PCApply               23 1.0 1.2774e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------
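(For reference, a rough sketch of how such per-event timings can be pulled out of the log files; the file-name pattern below is a placeholder, not my actual naming scheme:

# collect the solver-related event rows from each run's -log_view output
grep -E "^(KSPSolve|SNESSolve|PCApply) " petsc_scale.*.log
)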
Thanks and Best,
Parv
________________________________
From: Mark Adams <mfadams at lbl.gov>
Sent: 31 October 2024 11:30
To: Matthew Knepley <knepley at gmail.com>
Cc: Khurana, Parv <p.khurana22 at imperial.ac.uk>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Expected weak scaling behaviour for AMG libraries?
As Matt said, snes ex56 is better because it does a convergence test that refines the grid. You need/want these two parameters to have the same argument (e.g., 2,2,1): -dm_plex_box_faces 2,2,1 -petscpartitioner_simple_process_grid 2,2,1.
This will put one cell per process.
Then use -max_conv_its N to specify the N levels of refinement. It will run the 2,2,1 grid first, then 4,4,2, etc., N times.
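For example, a run along these lines (a sketch only; the MPI launcher, rank count, refinement count, and the explicit -petscpartitioner_type simple are my assumptions, not something specified above) puts one cell per process on 4 ranks and refines 5 times:

mpiexec -n 4 ./ex56 -petscpartitioner_type simple \
  -dm_plex_box_faces 2,2,1 -petscpartitioner_simple_process_grid 2,2,1 \
  -max_conv_its 5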
/src/snes/tests/ex13.c is designed for benchmarking; it uses '-petscpartitioner_simple_node_grid 1,1,1 [default]' to give you a two-level partitioner.
You need, for each dimension i, dm_plex_box_faces_i = petscpartitioner_simple_process_grid_i * petscpartitioner_simple_node_grid_i.
Again, you should put one cell per process (NP = product of dm_plex_box_faces args) and use -dm_refine N to get a single solve.
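As a concrete sketch of that constraint (the rank count, refinement level, and explicit partitioner type are illustrative assumptions): with the default node grid 1,1,1 and a 2,2,2 process grid, the box faces must also be 2,2,2, giving one cell per process on 8 ranks and a single refined solve:

mpiexec -n 8 ./ex13 -petscpartitioner_type simple \
  -dm_plex_box_faces 2,2,2 \
  -petscpartitioner_simple_process_grid 2,2,2 \
  -petscpartitioner_simple_node_grid 1,1,1 \
  -dm_refine 4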
Mark
On Wed, Oct 30, 2024 at 11:02 PM Matthew Knepley <knepley at gmail.com> wrote:
On Wed, Oct 30, 2024 at 4:13 PM Khurana, Parv <p.khurana22 at imperial.ac.uk> wrote:
Hello PETSc Community,
I am trying to understand the scaling behaviour of AMG methods in PETSc (Hypre for now) and how many DOFs/rank are needed for a performant AMG solve.
I’m currently conducting weak scaling tests using src/snes/tutorials/ex12.c in 3D, applying Dirichlet BCs with FEM at P=1. The tests keep DOFs per processor constant while increasing the mesh size and processor count, specifically:
* 20,000 and 80,000 DOF/rank configurations.
* Running SNES twice, using GMRES with a relative tolerance of 1e-5 and Hypre BoomerAMG preconditioning.
A couple of quick points to make sure there is no confusion:
1) Partitioner type "simple" is for the CI. It is a very bad partition and should not be used for timing. The default is ParMetis, which should be good enough.
2) You start out with 6^3 = 216 elements, distribute that, and then refine it. This gives _really_ bad load balance for all process counts except divisors of 216. You usually want to start with something bigger at the larger scales. You can use -dm_refine_pre to refine before distribution (see the sketch after point 4 below).
3) It is not clear you are using the timing for just the solver (SNESSolve). It could be that extraneous things are taking time. When asking questions like this, please always send the output of -log_view for timing, and at least -ksp_monitor_true_residual for convergence.
4) SNES ex56 is the example we use for GAMG scalability testing
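A minimal sketch of how points 2 and 3 could be combined on the ex12 runs (the rank count and the pre-/post-refinement split are illustrative assumptions, not recommended values):

# pre-refine before distribution so there are more than 216 cells to balance,
# then refine further after distribution; always collect solver logs
mpiexec -n 64 ./ex12 -run_type full -dm_plex_dim 3 -dm_plex_simplex 1 \
  -dm_plex_box_faces 6,6,6 -dm_refine_pre 2 -dm_refine 3 \
  -bc_type dirichlet -petscspace_degree 1 \
  -log_view -ksp_monitor_true_residual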
Thanks,
Matt
Unfortunately, parallel efficiency degrades noticeably with increased processor counts. Are there any insights or rules of thumb for using AMG more effectively? I have been looking at this issue for a while now and would love to discuss it further. Please find below the weak scaling results and the options I use to run the tests.
[Attached image (image.png): weak scaling results plot]
#Run type
-run_type full
-petscpartitioner_type simple
#Mesh settings
-dm_plex_dim 3
-dm_plex_simplex 1
-dm_refine 5 #Varied this
-dm_plex_box_faces 6,6,6
#BCs and FEM space
-bc_type dirichlet
-petscspace_degree 1
#Solver settings
-snes_max_it 2
-ksp_type gmres
-ksp_rtol 1.0e-5
#Same settings as what we use for LOR
-pc_type hypre
-pc_hypre_type boomeramg
-pc_hypre_boomeramg_coarsen_type hmis
-pc_hypre_boomeramg_relax_type_all symmetric-sor/jacobi
-pc_hypre_boomeramg_strong_threshold 0.7
-pc_hypre_boomeramg_interp_type ext+i
-pc_hypre_boomeramg_P_max 2
-pc_hypre_boomeramg_truncfactor 0.3
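(For reference, a sketch of how I pass these options in one go; the file name and rank count are placeholders: save the list above to a file and supply it with PETSc's -options_file option:

mpiexec -n 64 ./ex12 -options_file weak_scaling.opts
)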
Best,
Parv
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 119488 bytes
Desc: image.png
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20241107/2eff54f5/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_scale.l144491.pbs-6
Type: application/octet-stream
Size: 24704 bytes
Desc: petsc_scale.l144491.pbs-6
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20241107/2eff54f5/attachment-0001.obj>