[petsc-users] About recent changes in GAMG

Matthew Knepley knepley at gmail.com
Fri Apr 19 15:04:42 CDT 2024


On Fri, Apr 19, 2024 at 3:52 PM Ashish Patel <ashish.patel at ansys.com> wrote:

> Hi Jed,
> VmRss is on a higher side and seems to match what
> PetscMallocGetMaximumUsage is reporting. HugetlbPages was 0 for me.
>
> Mark, running without the near nullspace also gives similar results. I
> have attached the malloc_view and gamg info for serial and 2 core runs.
> Some of the standout functions on rank 0 for parallel run seems to be
> 5.3 GB MatSeqAIJSetPreallocation_SeqAIJ
> 7.7 GB MatStashSortCompress_Private
> 10.1 GB PetscMatStashSpaceGet
>

This is strange. We would expect the MatStash to be much smaller than the
allocation, but it is larger.
That suggests that you are sending a large number of off-process values. Is
this by design?
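For reference, here is a minimal sketch (not your ex1.c; sizes and values are
made up) of how entries set into rows owned by another rank are buffered in
the MatStash until assembly:

  /* Illustrative only: local insertions go straight into the preallocated
     storage, while insertions into rows owned by other ranks are copied into
     the MatStash and only communicated during MatAssemblyBegin/End. */
  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat         A;
    PetscInt    rstart, rend, i, N = 100;
    PetscScalar one = 1.0;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, N, N, 3, NULL, 3, NULL, &A));
    PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
    /* local rows: written directly into the preallocated storage */
    for (i = rstart; i < rend; i++) PetscCall(MatSetValue(A, i, i, one, ADD_VALUES));
    /* one off-process row: held in the MatStash until assembly */
    if (rend < N) PetscCall(MatSetValue(A, rend, rstart, one, ADD_VALUES));
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY)); /* stash is communicated here */
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
  }

If your assembly loop generates many such off-process rows, the stash (and the
MatStashSortCompress_Private / PetscMatStashSpaceGet numbers you quoted) grows
accordingly; running with -info should report the stash entry counts at
assembly.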

  Thanks,

     Matt


> 7.7 GB  PetscSegBufferAlloc_Private
>
> malloc_view also says the following:
> [0] Maximum memory PetscMalloc()ed 32387548912 maximum size of entire
> process 8270635008
> which is consistent with the PetscMallocGetMaximumUsage >
> PetscMemoryGetMaximumUsage output above.
>
> Let me know if you need some other info.
>
> Thanks,
> Ashish
>
> ------------------------------
> *From:* Jed Brown <jed at jedbrown.org>
> *Sent:* Thursday, April 18, 2024 2:16 PM
> *To:* Mark Adams <mfadams at lbl.gov>; Ashish Patel <ashish.patel at ansys.com>;
> PETSc users list <petsc-users at mcs.anl.gov>
> *Cc:* Scott McClennan <scott.mcclennan at ansys.com>
> *Subject:* Re: [petsc-users] About recent changes in GAMG
>
> [External Sender]
>
> Mark Adams <mfadams at lbl.gov> writes:
>
> >>> Yea, my interpretation of these methods is also that
> >>> "PetscMemoryGetMaximumUsage" should be >= "PetscMallocGetMaximumUsage".
> >>> But you are seeing the opposite.
> >
> >
> > We are using PETSc main and have found a case where memory consumption
> > explodes in parallel.
> > Also, we see a non-negligible difference between
> > PetscMemoryGetMaximumUsage() and PetscMallocGetMaximumUsage().
> > Running in serial through /usr/bin/time, the max. resident set size
> > matches the PetscMallocGetMaximumUsage() result.
> > I would have expected it to match PetscMemoryGetMaximumUsage() instead.
>
> PetscMemoryGetMaximumUsage uses procfs (if PETSC_USE_PROCFS_FOR_SIZE,
> which should be typical on Linux anyway) in PetscHeaderDestroy to update a
> static variable. If you haven't destroyed an object yet, its value will be
> nonsense.
>
> If your program is using huge pages, it might also be inaccurate (I don't
> know). You can look at /proc/<pid>/statm to see what PETSc is reading
> (the second field, the number of pages in RSS). You can also look at the
> VmRSS field in /proc/<pid>/status, which is reported in kB. See also the
> HugetlbPages field in /proc/<pid>/status.
>
> https://www.kernel.org/doc/Documentation/filesystems/proc.txt
>
> If your app is swapping, these will be inaccurate because swapped memory
> is not resident. We don't use the first field (VmSize) because there are
> reasons why programs sometimes map much more memory than they'll actually
> use, making such numbers irrelevant for most purposes.
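> For what it's worth, here is a minimal sketch (plain C, Linux-only, not part
> of PETSc) of pulling those two fields out of /proc/self/status by hand:
>
>   #include <stdio.h>
>   #include <string.h>
>
>   int main(void)
>   {
>     /* Print the VmRSS and HugetlbPages lines; both are reported in kB. */
>     FILE *f = fopen("/proc/self/status", "r");
>     char  line[256];
>
>     if (!f) return 1;
>     while (fgets(line, sizeof(line), f)) {
>       if (!strncmp(line, "VmRSS:", 6) || !strncmp(line, "HugetlbPages:", 13)) fputs(line, stdout);
>     }
>     fclose(f);
>     return 0;
>   }
>
> Comparing that VmRSS value against what PetscMemoryGetMaximumUsage reports
> (after at least one object has been destroyed) should tell you whether the
> procfs path is where the discrepancy comes from.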
>
> >
> >
> >                      PetscMemoryGetMaximumUsage   PetscMallocGetMaximumUsage   Time
> > Serial + Option 1     4.8 GB                        7.4 GB                     112 sec
> > 2 core + Option 1    15.2 GB                       45.5 GB                     150 sec
> > Serial + Option 2     3.1 GB                        3.8 GB                     167 sec
> > 2 core + Option 2    13.1 GB                       17.4 GB                      89 sec
> > Serial + Option 3     4.7 GB                        5.2 GB                     693 sec
> > 2 core + Option 3    23.2 GB                       26.4 GB                     411 sec
> >
> >
> > On Thu, Apr 18, 2024 at 4:13 PM Mark Adams <mfadams at lbl.gov> wrote:
> >
> >> The next thing you might try is not using the null space argument.
> >> Hypre does not use it, but GAMG does.
> >> You could also run with -malloc_view to see some info on mallocs. It is
> >> probably in the Mat objects.
> >> You can also run with "-info" and grep on GAMG in the output and send
> >> that.
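> >>
> >> For example (illustrative only; substitute your actual options):
> >> mpirun -n 2 ./ex1 <your options> -pc_type gamg -info :pc 2>&1 | grep GAMG > gamg_info.txt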
> >>
> >> Mark
> >>
> >> On Thu, Apr 18, 2024 at 12:03 PM Ashish Patel <ashish.patel at ansys.com>
> >> wrote:
> >>
> >>> Hi Mark,
> >>>
> >>> Thanks for your response and suggestion. With hypre, both memory and
> >>> time look good; here is the data for that:
> >>>
> >>>                      PetscMemoryGetMaximumUsage   PetscMallocGetMaximumUsage   Time
> >>> Serial + Option 4     5.55 GB                       5.17 GB                    15.7 sec
> >>> 2 core + Option 4     5.85 GB                       4.69 GB                    21.9 sec
> >>>
> >>> Option 4
> >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
> >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type hypre
> >>> -pc_hypre_boomeramg_strong_threshold 0.9 -ksp_view -log_view
> >>> -log_view_memory -info :pc
> >>>
> >>> I am also attaching a standalone program to reproduce these options and
> >>> the link to the matrix, rhs and near null spaces (serial.tar 2.xz,
> >>> <https://ansys-my.sharepoint.com/:u:/p/ashish_patel/EbUM5Ahp-epNi4xDxR9mnN0B1dceuVzGhVXQQYJzI5Py2g>)
> >>> if you would like to try as well. Please let me know if you have
> >>> trouble accessing the link.
> >>>
> >>> Ashish
> >>> ------------------------------
> >>> *From:* Mark Adams <mfadams at lbl.gov>
> >>> *Sent:* Wednesday, April 17, 2024 7:52 PM
> >>> *To:* Jeremy Theler (External) <jeremy.theler-ext at ansys.com>
> >>> *Cc:* Ashish Patel <ashish.patel at ansys.com>; Scott McClennan <
> >>> scott.mcclennan at ansys.com>
> >>> *Subject:* Re: About recent changes in GAMG
> >>>
> >>>
> >>> *[External Sender]*
> >>>
> >>>
> >>> On Wed, Apr 17, 2024 at 7:20 AM Jeremy Theler (External) <
> >>> jeremy.theler-ext at ansys.com> wrote:
> >>>
> >>> Hey Mark. Long time no see! How are thing going over there?
> >>>
> >>> We are using PETSc main and have found a case where memory consumption
> >>> explodes in parallel.
> >>> Also, we see a non-negligible difference between
> >>> PetscMemoryGetMaximumUsage() and PetscMallocGetMaximumUsage().
> >>> Running in serial through /usr/bin/time, the max. resident set size
> >>> matches the PetscMallocGetMaximumUsage() result.
> >>> I would have expected it to match PetscMemoryGetMaximumUsage() instead.
> >>>
> >>>
> >>> Yea, my interpretation of these methods is also that "Memory" should be
> >>> >= "Malloc". But you are seeing the opposite.
> >>>
> >>> I don't have any idea what is going on with your big memory penalty going
> >>> from 1 to 2 cores on this test, but the first thing to do is try other
> >>> solvers and see how they behave. Hypre in particular would be a good thing
> >>> to try because it is a similar algorithm.
> >>>
> >>> Mark
> >>>
> >>>
> >>>
> >>> The matrix size is around 1 million. We can share it with you if you
> >>> want, along with the RHS and the 6 near nullspace vectors and a modified
> >>> ex1.c which will read these files and show the following behavior.
> >>>
> >>> Observations using latest main for an elastic matrix with a block size of
> >>> 3 (after removing bonded glue-like DOFs with direct elimination) and near
> >>> null space provided:
> >>>
> >>>    - Big memory penalty going from serial to parallel (2 cores)
> >>>    - Big difference between PetscMemoryGetMaximumUsage and
> >>>      PetscMallocGetMaximumUsage, why?
> >>>    - The memory penalty decreases with
> >>>      -pc_gamg_aggressive_square_graph false (option 2)
> >>>    - The difference between PetscMemoryGetMaximumUsage and
> >>>      PetscMallocGetMaximumUsage reduces when -pc_gamg_threshold is
> >>>      increased from 0 to 0.01 (option 3), though the solve time
> >>>      increases a lot.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>                      PetscMemoryGetMaximumUsage   PetscMallocGetMaximumUsage   Time
> >>> Serial + Option 1     4.8 GB                        7.4 GB                     112 sec
> >>> 2 core + Option 1    15.2 GB                       45.5 GB                     150 sec
> >>> Serial + Option 2     3.1 GB                        3.8 GB                     167 sec
> >>> 2 core + Option 2    13.1 GB                       17.4 GB                      89 sec
> >>> Serial + Option 3     4.7 GB                        5.2 GB                     693 sec
> >>> 2 core + Option 3    23.2 GB                       26.4 GB                     411 sec
> >>>
> >>> Option 1
> >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
> >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg
> >>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory
> >>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold 0.0 -info :pc
> >>>
> >>> Option 2
> >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
> >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg
> >>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory
> >>> -pc_gamg_aggressive_square_graph *false* -pc_gamg_threshold 0.0 -info :pc
> >>>
> >>> Option 3
> >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
> >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg
> >>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory
> >>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold *0.01* -info :pc
> >>> ------------------------------
> >>> *From:* Mark Adams <mfadams at lbl.gov>
> >>> *Sent:* Tuesday, November 14, 2023 11:28 AM
> >>> *To:* Jeremy Theler (External) <jeremy.theler-ext at ansys.com>
> >>> *Cc:* Ashish Patel <ashish.patel at ansys.com>
> >>> *Subject:* Re: About recent changes in GAMG
> >>>
> >>>
> >>> *[External Sender]*
> >>> Sounds good,
> >>>
> >>> I think the not-square graph "aggressive" coarsening is the only issue
> >>> that I see, and you can fix this by using:
> >>>
> >>> -mat_coarsen_type mis
> >>>
> >>> As an aside, '-pc_gamg_aggressive_square_graph' should do it also; you can
> >>> use both, and they will be ignored in earlier versions.
> >>>
> >>> If you see a difference, then the first thing to do is run with '-info
> >>> :pc' and send that to me (you can grep on 'GAMG' and send that if you
> >>> like, to reduce the data).
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>>
> >>> On Tue, Nov 14, 2023 at 8:49 AM Jeremy Theler (External) <
> >>> jeremy.theler-ext at ansys.com> wrote:
> >>>
> >>> Hi Mark.
> >>> Thanks for reaching out. For now, we are going to stick to 3.19 for our
> >>> production code because the changes in 3.20 impact our tests in
> >>> different ways (some of them perform better, some perform worse).
> >>> I have now switched to another task, investigating structural elements in
> >>> DMPlex.
> >>> I'll go back to analyzing the new changes in GAMG in a couple of weeks, so
> >>> we can then see whether we upgrade to 3.20 or wait until 3.21.
> >>>
> >>> Thanks for your work and your kindness.
> >>> --
> >>> jeremy
> >>> ------------------------------
> >>> *From:* Mark Adams <mfadams at lbl.gov>
> >>> *Sent:* Tuesday, November 14, 2023 9:35 AM
> >>> *To:* Jeremy Theler (External) <jeremy.theler-ext at ansys.com>
> >>> *Cc:* Ashish Patel <ashish.patel at ansys.com>
> >>> *Subject:* Re: About recent changes in GAMG
> >>>
> >>>
> >>> *[External Sender]*
> >>> Hi Jeremy,
> >>>
> >>> Just following up.
> >>> I appreciate your digging into performance regressions in GAMG.
> >>> AMG is really a pain sometimes and we want GAMG to be solid, at least for
> >>> mainstream options, and your efforts are appreciated.
> >>> So feel free to start this discussion up.
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>> On Wed, Oct 25, 2023 at 9:52 PM Jeremy Theler (External) <
> >>> jeremy.theler-ext at ansys.com> wrote:
> >>>
> >>> Dear Mark
> >>>
> >>> Thanks for the follow up and sorry for the delay.
> >>> I'm taking some days off. I'll be back to full throttle next week, so I
> >>> can continue the discussion about these changes in GAMG.
> >>>
> >>> Regards,
> >>> Jeremy
> >>>
> >>> ------------------------------
> >>> *From:* Mark Adams <mfadams at lbl.gov>
> >>> *Sent:* Wednesday, October 18, 2023 9:15 AM
> >>> *To:* Jeremy Theler (External) <jeremy.theler-ext at ansys.com>; PETSc
> >>> users list <petsc-users at mcs.anl.gov>
> >>> *Cc:* Ashish Patel <ashish.patel at ansys.com>
> >>> *Subject:* Re: About recent changes in GAMG
> >>>
> >>>
> >>> *[External Sender]*
> >>> Hi Jeremy,
> >>>
> >>> I hope you don't mind putting this on the list (w/o data), but this is
> >>> documentation and you are the second user that found regressions.
> >>> Sorry for the churn.
> >>>
> >>> There is a lot here so we can iterate, but here is a pass at your
> >>> questions.
> >>>
> >>> *** Using MIS-2 instead of the square graph was motivated by setup
> >>> cost/performance, but on GPUs with some recent fixes in Kokkos (in a
> >>> branch) the square graph seems OK.
> >>> My experience was that the square graph is better in terms of quality, and
> >>> we have a power user, like you all, who found this also.
> >>> So I switched the default back to the square graph.
> >>>
> >>> It is interesting that you found that MIS-2 (the new method) could be
> >>> faster, but it might be because the two methods coarsen at different
> >>> rates, and that can make a big difference.
> >>> (The way to test would be to adjust parameters to get similar coarsening
> >>> rates, but I digress.)
> >>> It's hard to understand the differences between these two methods in
> >>> terms of aggregate quality, so we need to just experiment and have options.
> >>>
> >>> *** As for your thermal problem: there was a complaint that the eigen
> >>> estimates for the Chebyshev smoother were not recomputed for nonlinear
> >>> problems, so I added an option to do that and turned it on by default.
> >>> Use '-pc_gamg_recompute_esteig false' to get back to the original behavior.
> >>> (I should have turned it off by default.)
> >>>
> >>> Now, if your problem is symmetric and you use CG to compute the eigen
> >>> estimates, there should be no difference.
> >>> If you use CG to compute the eigen estimates in GAMG (and have GAMG give
> >>> them to cheby, the default), then when you recompute the eigen estimates
> >>> the cheby eigen estimator is used, and that will use GMRES by default
> >>> unless you set the SPD property on your matrix.
> >>> So if you set '-pc_gamg_esteig_ksp_type cg' you want to also set
> >>> '-mg_levels_esteig_ksp_type cg' (verify with -ksp_view and -options_left).
> >>> CG is a much better estimator for SPD problems.
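> >>>
> >>> (For concreteness, a minimal sketch of setting the SPD property -- a toy
> >>> 1D Laplacian standing in for your operator, not your actual setup:)
> >>>
> >>>   #include <petscksp.h>
> >>>
> >>>   int main(int argc, char **argv)
> >>>   {
> >>>     Mat      A;
> >>>     Vec      x, b;
> >>>     KSP      ksp;
> >>>     PetscInt i, n = 100, rstart, rend;
> >>>
> >>>     PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
> >>>     PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 1, NULL, &A));
> >>>     PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
> >>>     for (i = rstart; i < rend; i++) { /* assemble a toy SPD operator */
> >>>       if (i > 0) PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
> >>>       if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
> >>>       PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
> >>>     }
> >>>     PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
> >>>     PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
> >>>     PetscCall(MatSetOption(A, MAT_SPD, PETSC_TRUE)); /* lets the eigen estimators default to CG */
> >>>
> >>>     PetscCall(MatCreateVecs(A, &x, &b));
> >>>     PetscCall(VecSet(b, 1.0));
> >>>     PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
> >>>     PetscCall(KSPSetOperators(ksp, A, A));
> >>>     PetscCall(KSPSetType(ksp, KSPCG));
> >>>     PetscCall(KSPSetFromOptions(ksp)); /* e.g. -pc_type gamg -ksp_view to check the estimator types */
> >>>     PetscCall(KSPSolve(ksp, b, x));
> >>>     PetscCall(KSPDestroy(&ksp));
> >>>     PetscCall(VecDestroy(&x));
> >>>     PetscCall(VecDestroy(&b));
> >>>     PetscCall(MatDestroy(&A));
> >>>     PetscCall(PetscFinalize());
> >>>     return 0;
> >>>   }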
> >>>
> >>> And I found that the cheby eigen estimator uses an LAPACK *eigenvalue*
> >>> method to compute the eigen bounds, while GAMG uses a *singular value*
> >>> method.
> >>> The two give very different results on the lid-driven cavity test (ex19).
> >>> The eigenvalue estimate is lower, which is safer but not optimal if it is
> >>> too low.
> >>> I have a branch to have cheby use the singular value method, but I don't
> >>> plan on merging it (enough churn, and I don't understand these differences).
> >>>
> >>> *** '-pc_gamg_low_memory_threshold_filter false' recovers the old
> >>> filtering method.
> >>> This is the default now because there is a bug in the (new) low memory
> >>> filter.
> >>> This bug is very rare and catastrophic.
> >>> We are working on it and will turn it on by default when it's fixed.
> >>> This does not affect the semantics of the solver, just work and memory
> >>> complexity.
> >>>
> >>> *** As far as tet4 vs tet10, I would guess that tet4 wants more
> >>> aggressive coarsening.
> >>> The default is to do aggressive on one (1) level.
> >>> You might want more levels for tet4.
> >>> And the new MIS-k coarsening can use any k (default is 2) with
> >>> '-mat_coarsen_misk_distance k' (e.g., k=3).
> >>> I have not added hooks to have a more complex schedule to specify the
> >>> method on each level.
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>> On Tue, Oct 17, 2023 at 9:33 PM Jeremy Theler (External) <
> >>> jeremy.theler-ext at ansys.com> wrote:
> >>>
> >>> Hey Mark
> >>>
> >>> Regarding the changes in the coarsening algorithm in 3.20 with respect to
> >>> 3.19: in general, we see that for some problems the MIS strategy gives an
> >>> overall performance which is slightly better, and for some others it is
> >>> slightly worse than the "baseline" from 3.19.
> >>> We also saw that current main has switched back to the old square
> >>> coarsening algorithm by default, which again, in some cases is better and
> >>> in others is worse than 3.19 without any extra command-line option.
> >>>
> >>> Now what seems weird to us is that we have a test case which is a heat
> >>> conduction problem with radiation boundary conditions (so it is nonlinear)
> >>> using tet10, and we see
> >>>
> >>>    1. that in parallel v3.20 is way worse than v3.19, although the
> >>>    memory usage is similar
> >>>    2. that petsc main (with no extra flags, just the defaults) recovers
> >>>    the 3.19 performance but memory usage is significantly larger
> >>>
> >>>
> >>> I tried using the -pc_gamg_low_memory_threshold_filter flag and the
> >>> results were the same.
> >>>
> >>> Find attached the log and snes views of 3.19, 3.20 and main using 4 MPI
> >>> ranks.
> >>> Is there any explanation about these two points we are seeing?
> >>> Another weird finding is that if we use tet4 instead of tet10, v3.20 is
> >>> only 10% slower than the other two, and main does not need more memory
> >>> than the other two.
> >>>
> >>> BTW, I have dozens of other log_view outputs comparing 3.19, 3.20 and
> >>> main, should you be interested.
> >>>
> >>> Let me know if it is better to move this discussion into the PETSc
> >>> mailing list.
> >>>
> >>> Regards,
> >>> jeremy theler
> >>>
> >>>
> >>>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

