<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><br><div><blockquote type="cite"><div>On 11 Oct 2023, at 6:41 AM, Stephan Kramer <s.kramer@imperial.ac.uk> wrote:</div><br class="Apple-interchange-newline"><div>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div>
On 07/10/2023 06:51, Pierre Jolivet wrote:<br>
<blockquote type="cite" cite="mid:6869AB4E-1343-4D8B-91A9-7206CFE72165@joliv.et">
<pre class="moz-quote-pre" wrap="">Hello Stephan,
Could you share the Amat/Pmat in binary format of the specific fieldsplit block, as well as all inputs needed to generate the same grid hierarchy (block size, options used, near kernel)?
Alternatively, have you been able to generate the same error in a plain PETSc example?</pre>
</blockquote>
I could but unfortunately, as Mark indicated, we only see this on
a very large system, run on 1536 cores. The matrix dump appears to
be 300G. If you want I could try to make it available but I imagine
it's not the most practical thing.<br></div></div></blockquote><div><br></div><div>It should be OK on my end.</div><br><blockquote type="cite"><div><div>
We have tried the one line change you suggested below and it indeed
prevents the problem - i.e. on the <span style="color: rgb(38, 38,
38); font-family: "Source Sans 3 VF", sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures:
normal; font-variant-caps: normal; font-weight: 400;
letter-spacing: normal; orphans: 2; text-align: start;
text-indent: 0px; text-transform: none; widows: 2; word-spacing:
0px; -webkit-text-stroke-width: 0px; white-space: normal;
background-color: rgb(255, 255, 255); text-decoration-thickness:
initial; text-decoration-style: initial; text-decoration-color:
initial; display: inline !important; float: none;"><span></span>adams/gamg-fast-filter
branch we get the "inconsistent data" error with
-pc_gamg_low_memory_filter True but not if we change that line as
suggested<br></span></div></div></blockquote><div><br></div><div>OK, then that means the bug is indeed pretty localized.</div><div>Either MatEliminateZeros(), MatDuplicate(), or MatHeaderReplace().</div><div>Hong (Mr.), do you think there is something missing in MatEliminateZeros_MPIAIJ()? Maybe a call to MatDisAssemble_MPIAIJ() — I have no idea what this function does.</div><br><blockquote type="cite"><div><div><span style="color: rgb(38, 38,
38); font-family: "Source Sans 3 VF", sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures:
normal; font-variant-caps: normal; font-weight: 400;
letter-spacing: normal; orphans: 2; text-align: start;
text-indent: 0px; text-transform: none; widows: 2; word-spacing:
0px; -webkit-text-stroke-width: 0px; white-space: normal;
background-color: rgb(255, 255, 255); text-decoration-thickness:
initial; text-decoration-style: initial; text-decoration-color:
initial; display: inline !important; float: none;">
Note that for our uses, we're happy to just not use the low memory
filter (as is now the default in main), but let us know if we can
provide any further help<br></span></div></div></blockquote><div><br></div><div>I’m not happy with the same function being twice in the library, and having an “improved” version only available to a part of the library.</div><div>I’m also not happy with GAMG having tons of MatAIJ-specific code, which makes it unusable with other MatType, e.g., we can’t even use MatBAIJ or MatSBAIJ whereas PCHYPRE works even though it’s an external package (a good use case here would have been to ask you to use a MatBAIJ with bs = 1 to incriminate MatEliminateZeros_MPIAIJ() or not, but we can’t).</div><div>But that’s just my opinion.</div><div><br></div><div>Thanks,</div><div>Pierre</div><br><blockquote type="cite"><div><div><span style="color: rgb(38, 38,
38); font-family: "Source Sans 3 VF", sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures:
normal; font-variant-caps: normal; font-weight: 400;
letter-spacing: normal; orphans: 2; text-align: start;
text-indent: 0px; text-transform: none; widows: 2; word-spacing:
0px; -webkit-text-stroke-width: 0px; white-space: normal;
background-color: rgb(255, 255, 255); text-decoration-thickness:
initial; text-decoration-style: initial; text-decoration-color:
initial; display: inline !important; float: none;">
Thanks<br>
Stephan<br>
<br>
</span>
<blockquote type="cite" cite="mid:6869AB4E-1343-4D8B-91A9-7206CFE72165@joliv.et">
<pre class="moz-quote-pre" wrap="">
I’m suspecting a bug in MatEliminateZeros(). If you have the chance to, could you please edit src/mat/impls/aij/mpi/mpiaij.c, change the line that looks like:
PetscCall(MatFilter(Gmat, filter, PETSC_TRUE, PETSC_TRUE));
Into:
PetscCall(MatFilter(Gmat, filter, PETSC_FALSE, PETSC_TRUE));
And give that a go? It will be extremely memory-inefficient, but this is just to confirm my intuition.
Thanks,
Pierre
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 6 Oct 2023, at 1:22 AM, Stephan Kramer <a class="moz-txt-link-rfc2396E" href="mailto:s.kramer@imperial.ac.uk"><s.kramer@imperial.ac.uk></a> wrote:
Great, that seems to fix the issue indeed - i.e. on the branch with the low memory filtering switched off (by default) we no longer see the "inconsistent data" error or hangs, and going back to the square graph aggressive coarsening brings us back the old performance. So we'd be keen to have that branch merged indeed
Many thanks for your assistance with this
Stephan
On 05/10/2023 01:11, Mark Adams wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Thanks Stephan,
It looks like the matrix is in a bad/incorrect state and parallel Mat-Mat
is waiting for messages that were not sent. A bug.
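
To illustrate why one underlying bug could show up either as an
"inconsistent data" error or as a hang, here is a toy model in plain
Python. It is not PETSc code: the function name and the 3-rank data are
made up for illustration. It only compares how many messages each rank
expects with how many are actually sent to it; a rank expecting more
than it gets would wait forever.

```python
def unmatched_messages(sends, expected_recvs):
    """Return {rank: (expected, actual)} for ranks whose message counts differ.

    sends maps each rank to the list of ranks it sends to;
    expected_recvs maps each rank to the number of receives it posts.
    """
    arrivals = {rank: 0 for rank in expected_recvs}
    for dests in sends.values():
        for dest in dests:
            arrivals[dest] = arrivals.get(dest, 0) + 1
    return {
        rank: (expected_recvs.get(rank, 0), count)
        for rank, count in arrivals.items()
        if expected_recvs.get(rank, 0) != count
    }


# Hypothetical example: rank 0 posts two receives but only rank 1 sends to it.
sends = {0: [], 1: [0], 2: []}
expected_recvs = {0: 2, 1: 0, 2: 0}
mismatch = unmatched_messages(sends, expected_recvs)
print(mismatch)
```

Whether a mismatch like this is caught by a consistency check or turns
into a hang would depend on where in the exchange it is noticed.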
Can you try my branch, which is ready to merge, adams/gamg-fast-filter.
We added a new filtering method in main that uses low memory but I found it
was slow, so this branch brings back the old filter code, used by default,
and keeps the low memory version as an option.
It is possible this low memory filtering messed up the internals of the Mat
in some way.
I hope this is it, but if not we can continue.
This MR also makes square graph the default.
I have found it does create better aggregates and on GPUs, with Kokkos bug
fixes from Junchao, Mat-Mat is fast. (it might be slow on CPUs)
Mark
On Wed, Oct 4, 2023 at 12:30 AM Stephan Kramer <a class="moz-txt-link-rfc2396E" href="mailto:s.kramer@imperial.ac.uk"><s.kramer@imperial.ac.uk></a>
wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi Mark
Thanks again for re-enabling the square graph aggressive coarsening
option which seems to have restored performance for most of our cases.
Unfortunately we do have a remaining issue, which only seems to occur
for the larger mesh size ("level 7" which has 6,389,890 vertices and we
normally run on 1536 cpus): we either get a "Petsc has generated
inconsistent data" error, or a hang - both when constructing the square
graph matrix. So this is with the new
-pc_gamg_aggressive_square_graph=true option, without the option there's
no error but of course we would get back to the worse performance.
Backtrace for the "inconsistent data" error. Note this is actually just
petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge
of adams/gamg-add-old-coarsening into main - with one unrelated commit
from firedrake
[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: Petsc has generated inconsistent data
[0]PETSC ERROR: j 8 not equal to expected number of sends 9
[0]PETSC ERROR: Petsc Development GIT revision:
v3.4.2-43104-ga3b76b71a1 GIT Date: 2023-09-18 10:26:04 +0100
[0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a named
gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct 4 14:30:41 2023
[0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix
--with-make-np=4 --with-debugging=0 --with-shared-libraries=1
--with-fortran-bindings=0 --with-zlib --with-c2html=0
--with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx
--with-fc=mpifort --download-hdf5 --download-hypre
--download-superlu_dist --download-ptscotch --download-suitesparse
--download-pastix --download-hwloc --download-metis --download-scalapack
--download-mumps --download-chaco --download-ml
CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441
[0]PETSC ERROR: #1 PetscGatherMessageLengths2() at
/jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270
[0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at
/jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867
[0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at
/jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071
[0]PETSC ERROR: #4 MatProductSymbolic() at
/jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795
[0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489
[0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969
[0]PETSC ERROR: #7 PCSetUp_GAMG() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645
[0]PETSC ERROR: #8 PCSetUp() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069
[0]PETSC ERROR: #9 PCApply() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484
[0]PETSC ERROR: #10 PCApply() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487
[0]PETSC ERROR: #11 KSP_PCApply() at
/jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383
[0]PETSC ERROR: #12 KSPSolve_CG() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162
[0]PETSC ERROR: #13 KSPSolve_Private() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910
[0]PETSC ERROR: #14 KSPSolve() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082
[0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175
[0]PETSC ERROR: #16 PCApply() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487
[0]PETSC ERROR: #17 KSP_PCApply() at
/jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383
[0]PETSC ERROR: #18 KSPSolve_PREONLY() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25
[0]PETSC ERROR: #19 KSPSolve_Private() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910
[0]PETSC ERROR: #20 KSPSolve() at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082
[0]PETSC ERROR: #21 SNESSolve_KSPONLY() at
/jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49
[0]PETSC ERROR: #22 SNESSolve() at
/jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635
Last -info :pc messages:
[0] <pc:gamg> PCSetUp(): Setting up PC for first time
[0] <pc:gamg> PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0)
N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536
[0] <pc:gamg> PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in
graph (1.588710e+07 1.765233e+06)
[0] <pc:gamg> PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_:
Square Graph on level 1
[0] <pc:gamg> fixAggregatesWithSquare(): isMPI = yes
[0] <pc:gamg> PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_:
New grid 380144 nodes
[0] <pc:gamg> PCGAMGOptProlongator_AGG():
Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00
min=9.015236e-02 PC=jacobi
[0] <pc:gamg> PCGAMGOptProlongator_AGG():
Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra
0.0901524 4.48938
[0] <pc:gamg> PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_:
Coarse grid reduction from 1536 to 1536 active processes
[0] <pc:gamg> PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1)
N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes
[0] <pc:gamg> PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in
graph (5.310360e+05 5.353000e+03)
[0] <pc:gamg> PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_:
Square Graph on level 2
The hang (on a slightly different model configuration but on the same
mesh and number of cores) seems to occur in the same location. If I use gdb to
attach to the running processes, it seems that on some cores it has somehow
managed to fall out of the pcsetup and is waiting in the first norm
calculation in the outside CG iteration:
#0 0x000014cce9999119 in
hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from
/apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so
#1 0x000014ccef2c2737 in _coll_ml_allreduce () from
/apps/hcoll/4.7.3202/lib/libhcoll.so.1
#2 0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1,
rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 <ompi_mpi_double>,
op=0x14cd26cfbc20 <ompi_mpi_op_sum>, comm=0x3076fb0, module=0x43a0110)
at
/jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228
#3 0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1,
recvbuf=<optimized out>, count=1, datatype=<optimized out>,
op=0x14cd26cfbc20 <ompi_mpi_op_sum>, comm=0x3076fb0) at pallreduce.c:113
#4 0x000014cd271c9889 in VecNorm_MPI_Default (xin=<optimized out>,
type=<optimized out>, z=<optimized out>, VecNorm_SeqFn=<optimized out>)
at
/jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168
#5 VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at
/jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39
#6 0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648,
val=0x22d) at
/jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214
#7 0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163
etc.
but with other cores still stuck at:
#0 0x000015375cf41e8a in ucp_worker_progress () from
/apps/ucx/1.12.0/lib/libucp.so.0
#1 0x000015377d4bd57b in opal_progress () at
/jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231
#2 0x000015377d4c3ba5 in ompi_sync_wait_mt
(sync=sync@entry=0x7ffd6aedf6f0) at
/jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85
#3 0x000015378bf7cf38 in ompi_request_default_wait_any (count=8,
requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at
/jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124
#4 0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0,
indx=0x7ffd6aedfa60, status=<optimized out>) at pwaitany.c:86
#5 0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ
(P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at
/jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884
#6 0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ
(C=0x2cc7500) at
/jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071
#7 0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at
/jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795
#8 0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500,
Gmat1=0x1, Gmat2=0xc0fe132c) at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489
#9 0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500,
a_Gmat1=0x1, agg_lists=0xc0fe132c) at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969
#10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645
#11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069
#12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484
#13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply
(__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524,
__pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082
#14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS
(func=0x15378f302890, args=0x83b3218, nargsf=<optimized out>,
kwnames=<optimized out>) at ../Objects/descrobject.c:405
#15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0,
nargsf=<optimized out>, args=0x83b3218, callable=0x15378f302890,
tstate=0x23e0020) at ../Include/cpython/abstract.h:114
#16 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
args=0x83b3218, callable=0x15378f302890) at
../Include/cpython/abstract.h:123
#17 call_function (kwnames=0x0, oparg=<optimized out>,
pp_stack=<synthetic pointer>, trace_info=0x7ffd6aee0390,
tstate=<optimized out>) at ../Python/ceval.c:5867
#18 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>,
throwflag=<optimized out>) at ../Python/ceval.c:4198
#19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080,
tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46
#20 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>,
locals=<optimized out>, args=<optimized out>, argcount=4,
kwnames=<optimized out>) at ../Python/ceval.c:5065
#21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func=<optimized
out>, args=0x1, _nargs=<optimized out>, kwargs=<optimized out>) at
src/petsc4py/PETSc.c:548022
#22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500,
__pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979
#23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487
#24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1,
y=0xc0fe132c) at
/jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383
#25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at
/jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162
Let me know if there is anything further we can try to debug this issue
Kind regards
Stephan Kramer
On 02/09/2023 01:58, Mark Adams wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Fantastic!
I fixed a memory free problem. You should be OK now.
I am pretty sure you are good but I would like to wait to get any
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">feedback
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">from you.
We should have a release at the end of the month and it would be nice to
get this into it.
Thanks,
Mark
On Fri, Sep 1, 2023 at 7:07 AM Stephan Kramer <a class="moz-txt-link-rfc2396E" href="mailto:s.kramer@imperial.ac.uk"><s.kramer@imperial.ac.uk></a>
wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi Mark
Sorry took a while to report back. We have tried your branch but hit a
few issues, some of which we're not entirely sure are related.
First switching off minimum degree ordering, and then switching to the
old version of aggressive coarsening, as you suggested, got us back to
the coarsening behaviour that we had previously, but then we also
observed an even further worsening of the iteration count: it had
previously gone up by 50% already (with the newer main petsc), but now
was more than double "old" petsc. Took us a while to realize this was
due to the default smoother changing from Cheby+SOR to Cheby+Jacobi.
Switching this also back to the old default we get back to very similar
coarsening levels (see below for more details if it is of interest) and
iteration counts.
So that's all very good news. However, we also started seeing
memory errors (double free or corruption) when we switched off the
minimum degree ordering. Because this was at an earlier version of your
branch we then rebuilt, hoping this was just an earlier bug that had
been fixed, but then we were having MPI-lockup issues. We have now
figured out the MPI issues are completely unrelated - some combination
with a newer mpi build and firedrake on our cluster which also occurs
using main branches of everything. So switching back to an older MPI
build we are hoping to now test your most recent version of
adams/gamg-add-old-coarsening with these options and see whether the
memory errors are still there. Will let you know
Best wishes
Stephan Kramer
Coarsening details with various options for Level 6 of the test case:
In our original setup (using "old" petsc), we had:
rows=516, cols=516, bs=6
rows=12660, cols=12660, bs=6
rows=346974, cols=346974, bs=6
rows=19169670, cols=19169670, bs=3
Then with the newer main petsc we had
rows=666, cols=666, bs=6
rows=7740, cols=7740, bs=6
rows=34902, cols=34902, bs=6
rows=736578, cols=736578, bs=6
rows=19169670, cols=19169670, bs=3
Then on your branch with minimum_degree_ordering False:
rows=504, cols=504, bs=6
rows=2274, cols=2274, bs=6
rows=11010, cols=11010, bs=6
rows=35790, cols=35790, bs=6
rows=430686, cols=430686, bs=6
rows=19169670, cols=19169670, bs=3
And with minimum_degree_ordering False and use_aggressive_square_graph
True:
rows=498, cols=498, bs=6
rows=12672, cols=12672, bs=6
rows=346974, cols=346974, bs=6
rows=19169670, cols=19169670, bs=3
So that is indeed pretty much back to what it was before
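
A rough way to compare these hierarchies is the coarsening ratio between
successive levels. The sketch below assumes that the number of graph
nodes on a level is rows divided by bs; the helper name is made up and
the numbers are copied from the listings above.

```python
def node_coarsening_ratios(levels):
    """levels: (rows, block_size) pairs ordered coarsest to finest.

    Returns the fine/coarse node-count ratio for each pair of adjacent levels.
    """
    nodes = [rows // bs for rows, bs in levels]
    return [fine / coarse for coarse, fine in zip(nodes, nodes[1:])]


# Level 6 hierarchies as listed above (coarsest level first).
old_petsc = [(516, 6), (12660, 6), (346974, 6), (19169670, 3)]
square_graph = [(498, 6), (12672, 6), (346974, 6), (19169670, 3)]

for name, levels in [("old petsc", old_petsc), ("square graph", square_graph)]:
    ratios = node_coarsening_ratios(levels)
    print(name, [round(r, 1) for r in ratios])
```

The last entry of each list is the reduction from the finest grid to the
first coarse grid, which is the same for both since they share the two
finest levels.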
On 31/08/2023 23:40, Mark Adams wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi Stephan,
This branch is settling down. adams/gamg-add-old-coarsening
<
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap=""><a class="moz-txt-link-freetext" href="https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening">https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening</a>>
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">I made the old, not minimum degree, ordering the default but kept the
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">new
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">"aggressive" coarsening as the default, so I am hoping that just adding
"-pc_gamg_use_aggressive_square_graph true" to your regression tests
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">will
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">get you back to where you were before.
Fingers crossed ... let me know if you have any success or not.
Thanks,
Mark
On Tue, Aug 15, 2023 at 1:45 PM Mark Adams <a class="moz-txt-link-rfc2396E" href="mailto:mfadams@lbl.gov"><mfadams@lbl.gov></a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi Stephan,
I have a branch that you can try: adams/gamg-add-old-coarsening
<
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap=""><a class="moz-txt-link-freetext" href="https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening">https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening</a>
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Things to test:
* First, verify that nothing unintended changed by reproducing your
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">bad
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">results with this branch (the defaults are the same)
* Try not using the minimum degree ordering that I suggested
with: -pc_gamg_use_minimum_degree_ordering false
-- I am eager to see if that is the main problem.
* Go back to what I think is the old method:
-pc_gamg_use_minimum_degree_ordering
false -pc_gamg_use_aggressive_square_graph true
When we get back to where you were, I would like to try to get modern
stuff working.
I did add a -pc_gamg_aggressive_mis_k <2>
You could do another step of MIS coarsening with
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">-pc_gamg_aggressive_mis_k
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">3
Anyway, lots to look at but, alas, AMG does have a lot of parameters.
Thanks,
Mark
On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <a class="moz-txt-link-rfc2396E" href="mailto:mfadams@lbl.gov"><mfadams@lbl.gov></a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer <
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap=""><a class="moz-txt-link-abbreviated" href="mailto:s.kramer@imperial.ac.uk">s.kramer@imperial.ac.uk</a>>
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Many thanks for looking into this, Mark
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">My 3D tests were not that different and I see you lowered the
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">threshold.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Note, you can set the threshold to zero, but your test is running
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">so
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">much
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">differently than mine there is something else going on.
Note, the new, bad, coarsening rate of 30:1 is what we tend to
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">shoot
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">for
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">in 3D.
So it is not clear what the problem is. Some questions:
* do you have a picture of this mesh to show me?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">It's just a standard hexahedral cubed sphere mesh with the
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">refinement
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">level giving the number of times each of the six sides have been
subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">16
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">layers. So the total number of elements at Level_5 is 6 x 32 x 32 x
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">16 =
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">98304 hexes. And everything doubles in all 3 dimensions (so 2^3)
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">going
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">to the next Level
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">I see, and I assume these are pretty stretched elements.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">* what do you mean by Q1-Q2 elements?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">velocity
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">and (tri)linear for pressure
I guess you could argue we could/should just do good old geometric
multigrid instead. More generally we do use this solver
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">configuration
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">a
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">lot for tetrahedral Taylor Hood (P2-P1) in particular also for our
adaptive mesh runs - would it be worth to see if we have the same
performance issues with tetrahedral P2-P1?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">No, you have a clear reproducer, if not minimal.
The first coarsening is very different.
I am working on this and I see that I added a heuristic for thin
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">bodies
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">where you order the vertices in greedy algorithms with minimum degree
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">first.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">This will tend to pick corners first, edges then faces, etc.
That may be the problem. I would like to understand it better (see
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">below).
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap=""></pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">It would be nice to see if the new and old codes are similar
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">without
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">aggressive coarsening.
This was the intended effect of the major change in this time frame
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">as
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">you
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">noticed.
If these jobs are easy to run, could you check that the old and new
versions are similar with "-pc_gamg_square_graph 0 ", ( and you
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">only
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">need
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">one time step).
All you need to do is check that the first coarse grid has about the same
number of equations (large).
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Unfortunately we're seeing some memory errors when we use this option,
and I'm not entirely clear whether we're just running out of memory and
need to put it on a special queue.
The run with square_graph 0 using new PETSc managed to get through one
solve at level 5, and is giving the following mg levels:
rows=174, cols=174, bs=6
total: nonzeros=30276, allocated nonzeros=30276
--
rows=2106, cols=2106, bs=6
total: nonzeros=4238532, allocated nonzeros=4238532
--
rows=21828, cols=21828, bs=6
total: nonzeros=62588232, allocated nonzeros=62588232
--
rows=589824, cols=589824, bs=6
total: nonzeros=1082528928, allocated nonzeros=1082528928
--
rows=2433222, cols=2433222, bs=3
total: nonzeros=456526098, allocated nonzeros=456526098
comparing with square_graph 100 with new PETSc
rows=96, cols=96, bs=6
total: nonzeros=9216, allocated nonzeros=9216
--
rows=1440, cols=1440, bs=6
total: nonzeros=647856, allocated nonzeros=647856
--
rows=97242, cols=97242, bs=6
total: nonzeros=65656836, allocated nonzeros=65656836
--
rows=2433222, cols=2433222, bs=3
total: nonzeros=456526098, allocated nonzeros=456526098
and old PETSc with square_graph 100
rows=90, cols=90, bs=6
total: nonzeros=8100, allocated nonzeros=8100
--
rows=1872, cols=1872, bs=6
total: nonzeros=1234080, allocated nonzeros=1234080
--
rows=47652, cols=47652, bs=6
total: nonzeros=23343264, allocated nonzeros=23343264
--
rows=2433222, cols=2433222, bs=3
total: nonzeros=456526098, allocated nonzeros=456526098
--
Unfortunately old PETSc with square_graph 0 did not complete a single
solve before giving the memory error
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">OK, thanks for trying.
I am working on this and I will give you a branch to test, but if you can
rebuild PETSc, here is a quick test that might fix your problem.
In src/ksp/pc/impls/gamg/agg.c you will see:
PetscCall(PetscSortIntWithArray(nloc, degree, permute));
If you can comment this out in the new code and compare with the old,
that might fix the problem.
Thanks,
Mark
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">BTW, I am starting to think I should add the old method back as an option.
I did not think this change would cause large differences.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Yes, I think that would be much appreciated. Let us know if we can do
any testing
Best wishes
Stephan
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Thanks,
Mark
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Note that we are providing the rigid body near nullspace,
hence the bs=3 to bs=6.
We have tried different values for the gamg_threshold but it doesn't
really seem to significantly alter the coarsening amount in that first
step.
Do you have any suggestions for further things we should try/look at?
Any feedback would be much appreciated
Best wishes
Stephan Kramer
Full logs including log_view timings available from
<a class="moz-txt-link-freetext" href="https://github.com/stephankramer/petsc-scaling/">https://github.com/stephankramer/petsc-scaling/</a>
In particular:
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap=""><a class="moz-txt-link-freetext" href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat</a>
<a class="moz-txt-link-freetext" href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat</a>
<a class="moz-txt-link-freetext" href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat</a>
<a class="moz-txt-link-freetext" href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat</a>
<a class="moz-txt-link-freetext" href="https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat">https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat</a>
<a class="moz-txt-link-freetext" href="https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat">https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat</a>
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap=""></pre>
</blockquote>
<pre class="moz-quote-pre" wrap=""></pre>
</blockquote>
<br>
</div>
</div></blockquote></div><br></body></html>