<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div dir="ltr"></div><div dir="ltr"><br></div><div dir="ltr"><br><blockquote type="cite">On Oct 6, 2023, at 3:48 AM, Mark Adams &lt;mfadams@lbl.gov&gt; wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><div dir="ltr">Pierre, (moved to dev)<div><br></div><div>It looks like there is a subtle bug in the new MatFilter.</div><div>My guess is that after the compression/filter the communication buffers and lists need to be recomputed because the graph has changed.</div></div></div></blockquote><div><br></div><div>Maybe an issue with MatHeaderReplace()?</div><div>Do you have a reproducer?</div><div>I use this routine for AIJ, BAIJ, and SBAIJ and never ran into this (though the subsequent Mat is not involved in the same kind of operations as in GAMG).</div><div><br></div><div>Thanks,</div><div>Pierre</div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div>And, the Mat-Mat Mults failed or hung because the communication requirements, as seen in the graph, did not match the cached communication lists.</div><div>The old way just created a whole new matrix, which took care of that.</div><div><br></div><div>Mark</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 5, 2023 at 8:51 PM Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Fantastic, it will get merged soon.<div><br><div>Thank you for your diligence and patience.</div><div>This would have been a time bomb waiting to explode.</div><div><br></div><div>Mark <br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 5, 2023 at 7:23 PM Stephan Kramer <<a href="mailto:s.kramer@imperial.ac.uk" 
target="_blank">s.kramer@imperial.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Great, that seems to fix the issue indeed - i.e. on the branch with the <br>

low memory filtering switched off (by default) we no longer see the <br>

"inconsistent data" error or hangs, and going back to the square graph <br>

aggressive coarsening brings us back the old performance. So we'd be <br>

keen to have that branch merged indeed<br>

Many thanks for your assistance with this<br>

Stephan<br>

<br>

On 05/10/2023 01:11, Mark Adams wrote:<br>

> Thanks Stephan,<br>

><br>

> It looks like the matrix is in a bad/incorrect state and parallel Mat-Mat<br>

> is waiting for messages that were not sent. A bug.<br>
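To make the suspected failure mode concrete: the set of ranks a process exchanges messages with is derived from which off-process columns its rows reference. The toy sketch below is plain Python, not PETSc's actual data structures (the ownership layout, values and helper names are invented for illustration). It caches that set, then drops a small entry in place, and shows that the cached and recomputed sets no longer agree, which is the kind of mismatch a message like "j 8 not equal to expected number of sends 9" would reflect.

```python
from bisect import bisect_right


def owner(col, ranges):
    """Rank owning global column `col`; ranges[r] is the first column owned by rank r."""
    return bisect_right(ranges, col) - 1


def neighbor_ranks(rows, my_rank, ranges):
    """Ranks this process must message: owners of the off-process columns its rows touch."""
    return {owner(c, ranges) for cols in rows.values() for c in cols
            if owner(c, ranges) != my_rank}


# Hypothetical layout: rank 0 owns columns 0-3, rank 1 owns 4-7, rank 2 owns 8-11.
ranges = [0, 4, 8]
# Rows owned by rank 0, stored as {row: {col: value}}.
rows = {0: {0: 1.0, 5: 0.5}, 1: {1: 1.0, 9: 1e-6}}

cached = neighbor_ranks(rows, 0, ranges)  # communication list built before filtering

# Drop tiny entries in place, without rebuilding the cached communication list.
for cols in rows.values():
    for c in [c for c, v in cols.items() if abs(v) < 1e-3]:
        del cols[c]

current = neighbor_ranks(rows, 0, ranges)  # what the filtered matrix actually needs
print(sorted(cached), sorted(current))
```

Here rank 0 still expects to exchange with ranks 1 and 2, while the filtered rows only reference rank 1; in a real parallel code the peers would then disagree on how many messages to send or receive.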

><br>

> Can you try my branch, which is ready to merge, adams/gamg-fast-filter.<br>

> We added a new filtering method in main that uses low memory but I found it<br>

> was slow, so this branch brings back the old filter code, used by default,<br>

> and keeps the low memory version as an option.<br>

> It is possible this low memory filtering messed up the internals of the Mat<br>

> in some way.<br>

> I hope this is it, but if not we can continue.<br>

><br>

> This MR also makes square graph the default.<br>

> I have found it does create better aggregates and on GPUs, with Kokkos bug<br>

> fixes from Junchao, Mat-Mat is fast. (it might be slow on CPUs)<br>
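For readers less familiar with the "square graph" option: aggregating on the squared graph G^2 connects each vertex to everything within two hops, and in this thread it is computed with a transpose-times-matrix product (MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ in the backtraces). A minimal pure-Python sketch of the resulting connectivity, assuming a symmetric graph stored as adjacency sets; the chain graph and function name are made up for illustration:

```python
def square_graph(adj):
    """Adjacency of G^2: i ~ j if j is within two hops of i (i != j).

    `adj` maps each vertex to the set of its neighbours in a symmetric graph; this
    matches the sparsity of A^T A for a symmetric pattern A with a full diagonal.
    """
    sq = {}
    for i, nbrs in adj.items():
        reach = set(nbrs)
        for k in nbrs:
            reach |= adj[k]  # neighbours of neighbours
        reach.discard(i)
        sq[i] = reach
    return sq


# 1D chain 0-1-2-3-4 (hypothetical example graph)
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
sq = square_graph(chain)
print(sorted(sq[2]))
```

An independent set chosen on G^2 has its roots at least three hops apart in G, which is why squaring gives larger aggregates, i.e. more aggressive coarsening.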

><br>

> Mark<br>

><br>

><br>

><br>

><br>

> On Wed, Oct 4, 2023 at 12:30 AM Stephan Kramer <<a href="mailto:s.kramer@imperial.ac.uk" target="_blank">s.kramer@imperial.ac.uk</a>><br>

> wrote:<br>

><br>

>> Hi Mark<br>

>><br>

>> Thanks again for re-enabling the square graph aggressive coarsening<br>

>> option which seems to have restored performance for most of our cases.<br>

>> Unfortunately we do have a remaining issue, which only seems to occur<br>

>> for the larger mesh size ("level 7" which has 6,389,890 vertices and we<br>

>> normally run on 1536 cpus): we either get a "Petsc has generated<br>

>> inconsistent data" error, or a hang - both when constructing the square<br>

>> graph matrix. So this is with the new<br>

>> -pc_gamg_aggressive_square_graph=true option; without the option there's<br>

>> no error, but of course we are then back to the worse performance.<br>

>><br>

>> Backtrace for the "inconsistent data" error. Note this is actually just<br>

>> petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge<br>

>> of adams/gamg-add-old-coarsening into main - with one unrelated commit<br>

>> from firedrake<br>

>><br>

>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>

>> [0]PETSC ERROR: Petsc has generated inconsistent data<br>

>> [0]PETSC ERROR: j 8 not equal to expected number of sends 9<br>

>> [0]PETSC ERROR: Petsc Development GIT revision:<br>

>> v3.4.2-43104-ga3b76b71a1  GIT Date: 2023-09-18 10:26:04 +0100<br>

>> [0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a  named<br>

>> <a href="http://gadi-cpu-clx-0241.gadi.nci.org.au" rel="noreferrer" target="_blank">gadi-cpu-clx-0241.gadi.nci.org.au</a> by sck551 Wed Oct  4 14:30:41 2023<br>

>> [0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix<br>

>> --with-make-np=4 --with-debugging=0 --with-shared-libraries=1<br>

>> --with-fortran-bindings=0 --with-zlib --with-c2html=0<br>

>> --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx<br>

>> --with-fc=mpifort --download-hdf5 --download-hypre<br>

>> --download-superlu_dist --download-ptscotch --download-suitesparse<br>

>> --download-pastix --download-hwloc --download-metis --download-scalapack<br>

>> --download-mumps --download-chaco --download-ml<br>

>> CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441<br>

>> [0]PETSC ERROR: #1 PetscGatherMessageLengths2() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270<br>

>> [0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867<br>

>> [0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071<br>

>> [0]PETSC ERROR: #4 MatProductSymbolic() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795<br>

>> [0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489<br>

>> [0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969<br>

>> [0]PETSC ERROR: #7 PCSetUp_GAMG() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645<br>

>> [0]PETSC ERROR: #8 PCSetUp() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069<br>

>> [0]PETSC ERROR: #9 PCApply() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484<br>

>> [0]PETSC ERROR: #10 PCApply() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487<br>

>> [0]PETSC ERROR: #11 KSP_PCApply() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383<br>

>> [0]PETSC ERROR: #12 KSPSolve_CG() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162<br>

>> [0]PETSC ERROR: #13 KSPSolve_Private() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910<br>

>> [0]PETSC ERROR: #14 KSPSolve() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082<br>

>> [0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175<br>

>> [0]PETSC ERROR: #16 PCApply() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487<br>

>> [0]PETSC ERROR: #17 KSP_PCApply() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383<br>

>> [0]PETSC ERROR: #18 KSPSolve_PREONLY() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25<br>

>> [0]PETSC ERROR: #19 KSPSolve_Private() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910<br>

>> [0]PETSC ERROR: #20 KSPSolve() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082<br>

>> [0]PETSC ERROR: #21 SNESSolve_KSPONLY() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49<br>

>> [0]PETSC ERROR: #22 SNESSolve() at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635<br>

>><br>

>> Last -info :pc messages:<br>

>><br>

>> [0] &lt;pc:gamg&gt; PCSetUp(): Setting up PC for first time<br>

>> [0] &lt;pc:gamg&gt; PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0)<br>

>> N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536<br>

>> [0] &lt;pc:gamg&gt; PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in<br>

>> graph (1.588710e+07 1.765233e+06)<br>

>> [0] &lt;pc:gamg&gt; PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_:<br>

>> Square Graph on level 1<br>

>> [0] &lt;pc:gamg&gt; fixAggregatesWithSquare(): isMPI = yes<br>

>> [0] &lt;pc:gamg&gt; PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_:<br>

>> New grid 380144 nodes<br>

>> [0] &lt;pc:gamg&gt; PCGAMGOptProlongator_AGG():<br>

>> Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00<br>

>> min=9.015236e-02 PC=jacobi<br>

>> [0] &lt;pc:gamg&gt; PCGAMGOptProlongator_AGG():<br>

>> Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra<br>

>> 0.0901524 4.48938<br>

>> [0] &lt;pc:gamg&gt; PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_:<br>

>> Coarse grid reduction from 1536 to 1536 active processes<br>

>> [0] &lt;pc:gamg&gt; PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1)<br>

>> N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes<br>

>> [0] &lt;pc:gamg&gt; PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in<br>

>> graph (5.310360e+05 5.353000e+03)<br>

>> [0] &lt;pc:gamg&gt; PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_:<br>

>> Square Graph on level 2<br>

>><br>

>> The hang (on a slightly different model configuration but on the same<br>

>> mesh and number of cores) seems to occur in the same location. If I use gdb to<br>

>> attach to the running processes, it seems that on some cores it has somehow<br>

>> managed to fall out of PCSetUp and is waiting in the first norm<br>

>> calculation of the outer CG iteration:<br>

>><br>

>> #0  0x000014cce9999119 in<br>

>> hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from<br>

>> /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so<br>

>> #1  0x000014ccef2c2737 in _coll_ml_allreduce () from<br>

>> /apps/hcoll/4.7.3202/lib/libhcoll.so.1<br>

>> #2  0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1,<br>

>> rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 &lt;ompi_mpi_double&gt;,<br>

>> op=0x14cd26cfbc20 &lt;ompi_mpi_op_sum&gt;, comm=0x3076fb0, module=0x43a0110)<br>

>> at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228<br>

>> #3  0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1,<br>

>> recvbuf=&lt;optimized out&gt;, count=1, datatype=&lt;optimized out&gt;,<br>

>> op=0x14cd26cfbc20 &lt;ompi_mpi_op_sum&gt;, comm=0x3076fb0) at pallreduce.c:113<br>

>> #4  0x000014cd271c9889 in VecNorm_MPI_Default (xin=&lt;optimized out&gt;,<br>

>> type=&lt;optimized out&gt;, z=&lt;optimized out&gt;, VecNorm_SeqFn=&lt;optimized out&gt;)<br>

>> at /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168<br>

>> #5  VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39<br>

>> #6  0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648,<br>

>> val=0x22d) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214<br>

>> #7  0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163<br>

>> etc.<br>

>><br>

>> but with other cores still stuck at:<br>

>><br>

>> #0  0x000015375cf41e8a in ucp_worker_progress () from<br>

>> /apps/ucx/1.12.0/lib/libucp.so.0<br>

>> #1  0x000015377d4bd57b in opal_progress () at<br>

>> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231<br>

>> #2  0x000015377d4c3ba5 in ompi_sync_wait_mt<br>

>> (sync=sync@entry=0x7ffd6aedf6f0) at<br>

>> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85<br>

>> #3  0x000015378bf7cf38 in ompi_request_default_wait_any (count=8,<br>

>> requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at<br>

>> /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124<br>

>> #4  0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0,<br>

>> indx=0x7ffd6aedfa60, status=&lt;optimized out&gt;) at pwaitany.c:86<br>

>> #5  0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ<br>

>> (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884<br>

>> #6  0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ<br>

>> (C=0x2cc7500) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071<br>

>> #7  0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795<br>

>> #8  0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500,<br>

>> Gmat1=0x1, Gmat2=0xc0fe132c) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489<br>

>> #9  0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500,<br>

>> a_Gmat1=0x1, agg_lists=0xc0fe132c) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969<br>

>> #10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645<br>

>> #11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069<br>

>> #12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484<br>

>> #13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply<br>

>> (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524,<br>

>> __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082<br>

>> #14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS<br>

>> (func=0x15378f302890, args=0x83b3218, nargsf=&lt;optimized out&gt;,<br>

>> kwnames=&lt;optimized out&gt;) at ../Objects/descrobject.c:405<br>

>> #15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0,<br>

>> nargsf=&lt;optimized out&gt;, args=0x83b3218, callable=0x15378f302890,<br>

>> tstate=0x23e0020) at ../Include/cpython/abstract.h:114<br>

>> #16 PyObject_Vectorcall (kwnames=0x0, nargsf=&lt;optimized out&gt;,<br>

>> args=0x83b3218, callable=0x15378f302890) at<br>

>> ../Include/cpython/abstract.h:123<br>

>> #17 call_function (kwnames=0x0, oparg=&lt;optimized out&gt;,<br>

>> pp_stack=&lt;synthetic pointer&gt;, trace_info=0x7ffd6aee0390,<br>

>> tstate=&lt;optimized out&gt;) at ../Python/ceval.c:5867<br>

>> #18 _PyEval_EvalFrameDefault (tstate=&lt;optimized out&gt;, f=&lt;optimized out&gt;,<br>

>> throwflag=&lt;optimized out&gt;) at ../Python/ceval.c:4198<br>

>> #19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080,<br>

>> tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46<br>

>> #20 _PyEval_Vector (tstate=&lt;optimized out&gt;, con=&lt;optimized out&gt;,<br>

>> locals=&lt;optimized out&gt;, args=&lt;optimized out&gt;, argcount=4,<br>

>> kwnames=&lt;optimized out&gt;) at ../Python/ceval.c:5065<br>

>> #21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func=&lt;optimized out&gt;,<br>

>> args=0x1, _nargs=&lt;optimized out&gt;, kwargs=&lt;optimized out&gt;) at<br>

>> src/petsc4py/PETSc.c:548022<br>

>> #22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500,<br>

>> __pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979<br>

>> #23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487<br>

>> #24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1,<br>

>> y=0xc0fe132c) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383<br>

>> #25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at<br>

>> /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162<br>

>><br>

>> Let me know if there is anything further we can try to debug this issue<br>

>><br>

>> Kind regards<br>

>> Stephan Kramer<br>

>><br>

>><br>

>> On 02/09/2023 01:58, Mark Adams wrote:<br>

>>> Fantastic!<br>

>>><br>

>>> I fixed a memory free problem. You should be OK now.<br>

>>> I am pretty sure you are good, but I would like to wait to get any feedback<br>

>>> from you.<br>

>>> We should have a release at the end of the month and it would be nice to<br>

>>> get this into it.<br>

>>><br>

>>> Thanks,<br>

>>> Mark<br>

>>><br>

>>><br>

>>> On Fri, Sep 1, 2023 at 7:07 AM Stephan Kramer <<a href="mailto:s.kramer@imperial.ac.uk" target="_blank">s.kramer@imperial.ac.uk</a>><br>

>>> wrote:<br>

>>><br>

>>>> Hi Mark<br>

>>>><br>

>>>> Sorry it took a while to report back. We have tried your branch but hit a<br>

>>>> few issues, some of which we're not entirely sure are related.<br>

>>>><br>

>>>> First switching off minimum degree ordering, and then switching to the<br>

>>>> old version of aggressive coarsening, as you suggested, got us back to<br>

>>>> the coarsening behaviour that we had previously, but then we also<br>

>>>> observed an even further worsening of the iteration count: it had<br>

>>>> previously gone up by 50% already (with the newer main petsc), but now<br>

>>>> was more than double that of "old" petsc. It took us a while to realize this was<br>

>>>> due to the default smoother changing from Cheby+SOR to Cheby+Jacobi.<br>

>>>> Switching this back to the old default as well, we get back to very similar<br>

>>>> coarsening levels (see below for more details if it is of interest) and<br>

>>>> iteration counts.<br>

>>>><br>

>>>> So that's all very good news. However, we also started seeing<br>

>>>> memory errors (double free or corruption) when we switched off the<br>

>>>> minimum degree ordering. Because this was with an earlier version of your<br>

>>>> branch, we then rebuilt, hoping this was just an earlier bug that had<br>

>>>> been fixed, but then we were having MPI-lockup issues. We have now<br>

>>>> figured out the MPI issues are completely unrelated - some combination<br>

>>>> with a newer mpi build and firedrake on our cluster which also occur<br>

>>>> using main branches of everything. So switching back to an older MPI<br>

>>>> build we are hoping to now test your most recent version of<br>

>>>> adams/gamg-add-old-coarsening with these options and see whether the<br>

>>>> memory errors are still there. Will let you know<br>

>>>><br>

>>>> Best wishes<br>

>>>> Stephan Kramer<br>

>>>><br>

>>>> Coarsening details with various options for Level 6 of the test case:<br>

>>>><br>

>>>> In our original setup (using "old" petsc), we had:<br>

>>>><br>

>>>>              rows=516, cols=516, bs=6<br>

>>>>              rows=12660, cols=12660, bs=6<br>

>>>>              rows=346974, cols=346974, bs=6<br>

>>>>              rows=19169670, cols=19169670, bs=3<br>

>>>><br>

>>>> Then with the newer main petsc we had<br>

>>>><br>

>>>>              rows=666, cols=666, bs=6<br>

>>>>              rows=7740, cols=7740, bs=6<br>

>>>>              rows=34902, cols=34902, bs=6<br>

>>>>              rows=736578, cols=736578, bs=6<br>

>>>>              rows=19169670, cols=19169670, bs=3<br>

>>>><br>

>>>> Then on your branch with minimum_degree_ordering False:<br>

>>>><br>

>>>>              rows=504, cols=504, bs=6<br>

>>>>              rows=2274, cols=2274, bs=6<br>

>>>>              rows=11010, cols=11010, bs=6<br>

>>>>              rows=35790, cols=35790, bs=6<br>

>>>>              rows=430686, cols=430686, bs=6<br>

>>>>              rows=19169670, cols=19169670, bs=3<br>

>>>><br>

>>>> And with minimum_degree_ordering False and use_aggressive_square_graph<br>

>>>> True:<br>

>>>><br>

>>>>              rows=498, cols=498, bs=6<br>

>>>>              rows=12672, cols=12672, bs=6<br>

>>>>              rows=346974, cols=346974, bs=6<br>

>>>>              rows=19169670, cols=19169670, bs=3<br>

>>>><br>

>>>> So that is indeed pretty much back to what it was before<br>
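To put these hierarchies on a common scale, one can divide rows by block size to get nodes (on the coarse levels, aggregates, each carrying the six rigid-body modes) and take ratios between consecutive levels. A small sketch using the Level 6 numbers above; the helper name is ours, not a PETSc function:

```python
def coarsening_ratios(levels):
    """levels: (rows, bs) pairs from finest to coarsest; returns node ratios between levels."""
    nodes = [rows // bs for rows, bs in levels]
    return [round(fine / coarse, 1) for fine, coarse in zip(nodes, nodes[1:])]


# Level 6 hierarchies quoted above, listed finest first.
old_petsc = [(19169670, 3), (346974, 6), (12660, 6), (516, 6)]
new_main = [(19169670, 3), (736578, 6), (34902, 6), (7740, 6), (666, 6)]

print("old petsc:", coarsening_ratios(old_petsc))
print("newer main:", coarsening_ratios(new_main))
```

For the first coarsening this gives roughly 110:1 for the "old" hierarchy versus roughly 52:1 with the newer main, which is where most of the difference in coarse-grid size comes from.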

>>>><br>

>>>><br>

>>>><br>

>>>><br>

>>>><br>

>>>><br>

>>>><br>

>>>><br>

>>>> On 31/08/2023 23:40, Mark Adams wrote:<br>

>>>>> Hi Stephan,<br>

>>>>><br>

>>>>> This branch is settling down: adams/gamg-add-old-coarsening<br>

>>>>> &lt;<a href="https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening" rel="noreferrer" target="_blank">https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening</a>&gt;<br>

>>>>> I made the old (not minimum degree) ordering the default but kept the new<br>

>>>>> "aggressive" coarsening as the default, so I am hoping that just adding<br>

>>>>> "-pc_gamg_use_aggressive_square_graph true" to your regression tests will<br>

>>>>> get you back to where you were before.<br>

>>>>> Fingers crossed ... let me know if you have any success or not.<br>

>>>>><br>

>>>>> Thanks,<br>

>>>>> Mark<br>

>>>>><br>

>>>>><br>

>>>>> On Tue, Aug 15, 2023 at 1:45 PM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>

>>>>><br>

>>>>>> Hi Stephan,<br>

>>>>>><br>

>>>>>> I have a branch that you can try: adams/gamg-add-old-coarsening<br>

>>>>>> &lt;<a href="https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening" rel="noreferrer" target="_blank">https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening</a>&gt;<br>

>>>>>> Things to test:<br>

>>>>>> * First, verify that nothing unintended changed by reproducing your bad<br>

>>>>>> results with this branch (the defaults are the same).<br>

>>>>>> * Try not using the minimum degree ordering that I suggested,<br>

>>>>>> with: -pc_gamg_use_minimum_degree_ordering false<br>

>>>>>>      -- I am eager to see if that is the main problem.<br>

>>>>>> * Go back to what I think is the old method:<br>

>>>>>> -pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true<br>

>>>>>><br>

>>>>>> When we get back to where you were, I would like to try to get the modern<br>

>>>>>> stuff working.<br>

>>>>>> I did add a -pc_gamg_aggressive_mis_k &lt;2&gt; option.<br>

>>>>>> You could do another step of MIS coarsening with -pc_gamg_aggressive_mis_k 3<br>

>>>>>><br>

>>>>>> Anyway, lots to look at but, alas, AMG does have a lot of parameters.<br>

>>>>>><br>

>>>>>> Thanks,<br>

>>>>>> Mark<br>

>>>>>><br>

>>>>>> On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>

>>>>>><br>

>>>>>>> On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer &lt;<a href="mailto:s.kramer@imperial.ac.uk" target="_blank">s.kramer@imperial.ac.uk</a>&gt;<br>

>>>>>>> wrote:<br>

>>>>>>><br>

>>>>>>>> Many thanks for looking into this, Mark<br>

>>>>>>>>> My 3D tests were not that different, and I see you lowered the threshold.<br>

>>>>>>>>> Note, you can set the threshold to zero, but your test is running so much<br>

>>>>>>>>> differently from mine that there is something else going on.<br>

>>>>>>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for<br>

>>>>>>>>> in 3D.<br>

>>>>>>>>><br>

>>>>>>>>> So it is not clear what the problem is.  Some questions:<br>

>>>>>>>>><br>

>>>>>>>>> * do you have a picture of this mesh to show me?<br>

>>>>>>>> It's just a standard hexahedral cubed-sphere mesh, with the refinement<br>

>>>>>>>> level giving the number of times each of the six sides has been<br>

>>>>>>>> subdivided: Level_5 means 2^5 x 2^5 squares per side, extruded to 16<br>

>>>>>>>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 =<br>

>>>>>>>> 98304 hexes. Everything doubles in all 3 dimensions (so a factor of 2^3)<br>

>>>>>>>> going to the next level.<br>
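These counts can be checked with a few lines of arithmetic. The sketch below assumes, as described, 6 faces of n x n squares with n = 2^level and 16 radial layers at Level_5, with the layer count doubling per level; it also assumes continuous Q2 velocity (nodes of the once-refined Q1 mesh) with 3 components per node. The function name is ours. Under those assumptions it reproduces figures quoted elsewhere in this thread: 2,433,222 velocity rows at Level 5, 19,169,670 at Level 6, and 6,389,890 vertices and N=152175366 at Level 7.

```python
def cubed_sphere_counts(level, base_level=5, base_layers=16):
    """Return (hexes, Q1 vertices, Q2 velocity rows) for an extruded cubed-sphere mesh.

    Assumes each of the 6 cube faces is split into n x n squares with n = 2**level,
    and that the radial layer count (16 at Level_5) doubles with each level.
    """
    n = 2 ** level
    layers = base_layers * 2 ** (level - base_level)
    hexes = 6 * n * n * layers
    # A quad mesh of the sphere with F = 6n^2 faces has E = 2F edges, so by Euler's
    # formula V - E + F = 2 it has V = 6n^2 + 2 vertices per radial surface.
    q1_vertices = (6 * n * n + 2) * (layers + 1)
    # Continuous Q2 nodes coincide with the vertices of the mesh refined once in
    # every direction; velocity has 3 components per node.
    q2_nodes = (6 * (2 * n) ** 2 + 2) * (2 * layers + 1)
    return hexes, q1_vertices, 3 * q2_nodes


for level in (5, 6, 7):
    print(level, cubed_sphere_counts(level))
```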

>>>>>>>><br>

>>>>>>> I see, and I assume these are pretty stretched elements.<br>

>>>>>>><br>

>>>>>>><br>

>>>>>>>>> * what do you mean by Q1-Q2 elements?<br>

>>>>>>>> Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity<br>

>>>>>>>> and (tri)linear for pressure.<br>

>>>>>>>><br>

>>>>>>>> I guess you could argue we could/should just do good old geometric<br>

>>>>>>>> multigrid instead. More generally, we do use this solver configuration<br>

>>>>>>>> a lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our<br>

>>>>>>>> adaptive mesh runs. Would it be worth checking whether we have the same<br>

>>>>>>>> performance issues with tetrahedral P2-P1?<br>

>>>>>>>><br>

>>>>>>> No, you have a clear reproducer, if not minimal.<br>

>>>>>>> The first coarsening is very different.<br>

>>>>>>><br>

>>>>>>> I am working on this, and I see that I added a heuristic for thin bodies<br>

>>>>>>> where you order the vertices in greedy algorithms with minimum degree first.<br>

>>>>>>> This will tend to pick corners first, then edges, then faces, etc.<br>

>>>>>>> That may be the problem. I would like to understand it better (see below).<br>
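A minimal sketch of the heuristic described above: a greedy maximal independent set in which vertices are visited in order of increasing degree, so that on a structured grid the corners (degree 2) are picked before edge and interior vertices. This is illustrative plain Python, not the agg.c code; sorting by degree plays the role that the PetscSortIntWithArray(nloc, degree, permute) call plays there, and the grid helper is a hypothetical test mesh.

```python
def greedy_mis(adj, min_degree_first=True):
    """Greedy maximal independent set; optionally visit low-degree vertices first."""
    # sorted() is stable, so ties keep the original vertex order.
    order = sorted(adj, key=lambda v: len(adj[v])) if min_degree_first else list(adj)
    state = {}  # vertex -> "root" or "removed"
    for v in order:
        if v in state:
            continue
        state[v] = "root"
        for u in adj[v]:
            state.setdefault(u, "removed")  # neighbours of a root cannot be roots
    return sorted(v for v, s in state.items() if s == "root")


def grid(nx, ny):
    """4-neighbour adjacency of an nx-by-ny structured grid (hypothetical test mesh)."""
    adj = {}
    for i in range(nx):
        for j in range(ny):
            adj[(i, j)] = {(i + di, j + dj)
                           for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= i + di < nx and 0 <= j + dj < ny}
    return adj


print(greedy_mis(grid(4, 4)))
```

With min_degree_first=False the same routine visits vertices in natural order, which makes it easy to compare the two orderings on a small mesh.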

>>>>>>><br>

>>>>>>>>> It would be nice to see if the new and old codes are similar without<br>

>>>>>>>>> aggressive coarsening.<br>

>>>>>>>>> Aggressive coarsening is what the major change in this time frame was<br>

>>>>>>>>> intended to change, as you noticed.<br>

>>>>>>>>> If these jobs are easy to run, could you check that the old and new<br>

>>>>>>>>> versions are similar with "-pc_gamg_square_graph 0" (and you only need<br>

>>>>>>>>> one time step)?<br>

>>>>>>>>> All you need to do is check that the first coarse grid has about the same<br>

>>>>>>>>> (large) number of equations.<br>

>>>>>>>> Unfortunately we're seeing some memory errors when we use this option,<br>

>>>>>>>> and I'm not entirely clear whether we're just running out of memory and<br>

>>>>>>>> need to put it on a special queue.<br>

>>>>>>>><br>

>>>>>>>> The run with square_graph 0 using new PETSc managed to get through one<br>

>>>>>>>> solve at level 5, and is giving the following mg levels:<br>

>>>>>>>><br>

>>>>>>>>             rows=174, cols=174, bs=6<br>

>>>>>>>>               total: nonzeros=30276, allocated nonzeros=30276<br>

>>>>>>>> --<br>

>>>>>>>>               rows=2106, cols=2106, bs=6<br>

>>>>>>>>               total: nonzeros=4238532, allocated nonzeros=4238532<br>

>>>>>>>> --<br>

>>>>>>>>               rows=21828, cols=21828, bs=6<br>

>>>>>>>>               total: nonzeros=62588232, allocated nonzeros=62588232<br>

>>>>>>>> --<br>

>>>>>>>>               rows=589824, cols=589824, bs=6<br>

>>>>>>>>               total: nonzeros=1082528928, allocated nonzeros=1082528928<br>

>>>>>>>> --<br>

>>>>>>>>               rows=2433222, cols=2433222, bs=3<br>

>>>>>>>>               total: nonzeros=456526098, allocated nonzeros=456526098<br>

>>>>>>>><br>

>>>>>>>> comparing with square_graph 100 with new PETSc<br>

>>>>>>>><br>

>>>>>>>>               rows=96, cols=96, bs=6<br>

>>>>>>>>               total: nonzeros=9216, allocated nonzeros=9216<br>

>>>>>>>> --<br>

>>>>>>>>               rows=1440, cols=1440, bs=6<br>

>>>>>>>>               total: nonzeros=647856, allocated nonzeros=647856<br>

>>>>>>>> --<br>

>>>>>>>>               rows=97242, cols=97242, bs=6<br>

>>>>>>>>               total: nonzeros=65656836, allocated nonzeros=65656836<br>

>>>>>>>> --<br>

>>>>>>>>               rows=2433222, cols=2433222, bs=3<br>

>>>>>>>>               total: nonzeros=456526098, allocated nonzeros=456526098<br>

>>>>>>>><br>

>>>>>>>> and old PETSc with square_graph 100<br>

>>>>>>>><br>

>>>>>>>>               rows=90, cols=90, bs=6<br>

>>>>>>>>               total: nonzeros=8100, allocated nonzeros=8100<br>

>>>>>>>> --<br>

>>>>>>>>               rows=1872, cols=1872, bs=6<br>

>>>>>>>>               total: nonzeros=1234080, allocated nonzeros=1234080<br>

>>>>>>>> --<br>

>>>>>>>>               rows=47652, cols=47652, bs=6<br>

>>>>>>>>               total: nonzeros=23343264, allocated nonzeros=23343264<br>

>>>>>>>> --<br>

>>>>>>>>               rows=2433222, cols=2433222, bs=3<br>

>>>>>>>>               total: nonzeros=456526098, allocated nonzeros=456526098<br>

>>>>>>>> --<br>

>>>>>>>><br>
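A common single-number summary of such hierarchies is the operator complexity: total nonzeros over all levels divided by the fine-grid nonzeros. The metric is standard AMG practice; computing it here is an editorial addition, using the nonzero counts from the three Level 5 listings above (the function and variable names are ours):

```python
def operator_complexity(nnz_per_level):
    """Total nonzeros on all levels divided by nonzeros on the finest level (last entry)."""
    return sum(nnz_per_level) / nnz_per_level[-1]


# Nonzeros per level, coarsest to finest, copied from the Level 5 runs above.
new_sq0 = [30276, 4238532, 62588232, 1082528928, 456526098]
new_sq100 = [9216, 647856, 65656836, 456526098]
old_sq100 = [8100, 1234080, 23343264, 456526098]

for name, nnz in [("new PETSc, square_graph 0", new_sq0),
                  ("new PETSc, square_graph 100", new_sq100),
                  ("old PETSc, square_graph 100", old_sq100)]:
    print(f"{name}: operator complexity {operator_complexity(nnz):.2f}")
```

This gives about 3.52, 1.15 and 1.05 respectively; the first value is dominated by the 589824-row level, whose matrix has more than twice as many nonzeros as the fine grid.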

>>>>>>>> Unfortunately old PETSc with square_graph 0 did not complete a single<br>

>>>>>>>> solve before giving the memory error.<br>

>>>>>>>><br>

>>>>>>> OK, thanks for trying.<br>

>>>>>>><br>

>>>>>>> I am working on this and I will give you a branch to test, but if you can<br>

>>>>>>> rebuild PETSc, here is a quick test that might fix your problem.<br>

>>>>>>> In src/ksp/pc/impls/gamg/agg.c you will see:<br>

>>>>>>><br>

>>>>>>>        PetscCall(PetscSortIntWithArray(nloc, degree, permute));<br>

>>>>>>><br>

>>>>>>> If you can comment this out in the new code and compare with the old,<br>

>>>>>>> that might fix the problem.<br>

>>>>>>><br>

>>>>>>> Thanks,<br>

>>>>>>> Mark<br>

>>>>>>><br>

>>>>>>><br>

>>>>>>>>> BTW, I am starting to think I should add the old method back as an option.<br>

>>>>>>>>> I did not think this change would cause large differences.<br>

>>>>>>>> Yes, I think that would be much appreciated. Let us know if we can do<br>

>>>>>>>> any testing.<br>

>>>>>>>><br>

>>>>>>>> Best wishes<br>

>>>>>>>> Stephan<br>

>>>>>>>><br>

>>>>>>>><br>

>>>>>>>>> Thanks,<br>

>>>>>>>>> Mark<br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>>> Note that we are providing the rigid body near nullspace,<br>

>>>>>>>>>> hence the bs=3 to bs=6.<br>
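In 3D the rigid-body near-nullspace has six vectors (three translations and three rotations), so each coarse aggregate carries 6 degrees of freedom while the fine velocity field has 3 per node. PETSc builds such vectors from a coordinate Vec in MatNullSpaceCreateRigidBody; the plain-Python version below is only an illustrative, unnormalized sketch with made-up coordinates and our own function name.

```python
def rigid_body_modes(coords):
    """Six 3D rigid-body modes for nodal coordinates, velocity interleaved as (u, v, w).

    Returns 6 vectors of length 3 * len(coords): three translations, then
    rotations about the z, x and y axes (velocity = omega x r).
    """
    modes = [[0.0] * (3 * len(coords)) for _ in range(6)]
    for node, (x, y, z) in enumerate(coords):
        b = 3 * node
        for d in range(3):  # unit translation in direction d
            modes[d][b + d] = 1.0
        modes[3][b], modes[3][b + 1] = -y, x      # rotation about z: (-y, x, 0)
        modes[4][b + 1], modes[4][b + 2] = -z, y  # rotation about x: (0, -z, y)
        modes[5][b], modes[5][b + 2] = z, -x      # rotation about y: (z, 0, -x)
    return modes


coords = [(1.0, 2.0, 3.0), (0.0, 0.0, 1.0)]
modes = rigid_body_modes(coords)
print(len(modes), "modes of length", len(modes[0]))
```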

>>>>>>>>>> We have tried different values for the gamg_threshold but it doesn't<br>

>>>>>>>>>> really seem to significantly alter the coarsening amount in that first<br>

>>>>>>>>>> step.<br>
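For reference, a sketch of the usual strength-of-connection filter behind a threshold such as -pc_gamg_threshold: scale each off-diagonal edge as |a_ij| / sqrt(|a_ii a_jj|) and drop it when that value does not exceed the threshold. We believe GAMG applies a diagonal scaling of this kind before filtering, but the exact criterion here is an assumption, and the small matrix and function name are invented. The returned fraction is analogous to the "Filtering left ... % edges" line in the -info output quoted in this thread.

```python
import math


def filter_graph(edges, diag, threshold):
    """Keep off-diagonal edges (i, j, a_ij) whose scaled strength exceeds `threshold`.

    Strength is |a_ij| / sqrt(|a_ii| * |a_jj|). Returns (kept_edges, kept_fraction).
    """
    kept = [(i, j, a) for i, j, a in edges
            if abs(a) / math.sqrt(abs(diag[i]) * abs(diag[j])) > threshold]
    return kept, len(kept) / len(edges)


# Hypothetical 3x3 symmetric matrix: strong 0-1 coupling, weak 1-2 coupling.
diag = [4.0, 4.0, 1.0]
edges = [(0, 1, -1.0), (1, 0, -1.0), (1, 2, -0.05), (2, 1, -0.05)]

for threshold in (0.0, 0.05):
    kept, frac = filter_graph(edges, diag, threshold)
    print(f"threshold {threshold}: kept {100 * frac:.1f} % of edges")
```

With a zero threshold every nonzero edge survives, which is why lowering the threshold toward zero mainly affects how much of the graph is available to the aggregation step.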

>>>>>>>>>><br>

>>>>>>>>>> Do you have any suggestions for further things we should try/look at?<br>

>>>>>>>>>> Any feedback would be much appreciated<br>

>>>>>>>>>><br>

>>>>>>>>>> Best wishes<br>

>>>>>>>>>> Stephan Kramer<br>

>>>>>>>>>><br>

>>>>>>>>>> Full logs including log_view timings available from<br>

>>>>>>>>>> <a href="https://github.com/stephankramer/petsc-scaling/" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/</a><br>

>>>>>>>>>><br>

>>>>>>>>>> In particular:<br>

>>>>>>>>>><br>

>>>>>>>>>><br>

>>>>>>>>>><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat</a><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat</a><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat</a><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat</a><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat</a><br>

>> <a href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat" rel="noreferrer" target="_blank">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat</a><br>

>><br>

<br>

</blockquote></div>

</blockquote></div>

</div></blockquote></body></html>