<div dir="ltr">Hi Jeremy,<div><br></div><div>I hope you don't mind me putting this on the list (w/o data), but this is documentation and you are the second user who has found regressions. </div><div>Sorry for the churn. <br></div><div><br></div><div>There is a lot here so we can iterate, but here is a pass at your questions.</div><div><br></div><div>*** Using MIS-2 instead of square graph was motivated by setup cost/performance, but on GPUs, with some recent fixes in Kokkos (in a branch), square graph seems OK.</div><div>My experience was that square graph is better in terms of quality, and another power user, like you all, found the same.<br></div><div><div>So I switched the default back to square graph.</div><br class="gmail-Apple-interchange-newline"></div><div>Interesting that you found that MIS-2 (new method) could be faster, but it might be because the two methods coarsen at different rates and that can make a big difference.</div><div>(the way to test would be to adjust parameters to get similar coarsening rates, but I digress)</div><div>It's hard to understand the differences between these two methods in terms of aggregate quality, so we need to just experiment and have options.</div><div><br></div><div>*** As for your thermal problem: 
There was a complaint that the eigen estimates for the Chebyshev smoother were not recomputed for nonlinear problems, so I added an option to do that and turned it on by default.</div><div>Use '-pc_gamg_recompute_esteig false' to get back to the original behavior.<br></div><div>(I should have turned it off by default)</div><div><br></div><div><div>Now, if your problem is symmetric and you use CG to compute the eigen estimates there should be no difference.</div></div><div>However, if you use CG to compute the eigen estimates in GAMG (and have GAMG give them to Chebyshev, the default), then when the eigen estimates are recomputed the Chebyshev eigen estimator is used, and that uses GMRES by default unless you set the SPD property on your matrix.<br></div><div>So if you set '-pc_gamg_esteig_ksp_type cg' you want to also set '-mg_levels_esteig_ksp_type cg' (verify with -ksp_view and -options_left)</div><div>CG is a much better estimator for SPD problems.</div><div><br></div><div>I also found that the Chebyshev eigen estimator uses a LAPACK *eigen* method to compute the eigen bounds, whereas GAMG uses a *singular value* method.</div><div>The two give very different results on the lid driven cavity test (ex19). </div><div>The eigen estimate is lower, which is safer but not optimal if it is too low.</div><div>I have a branch to have Chebyshev use the singular value method, but I don't plan on merging it (enough churn and I don't understand these differences).</div><div><br></div>*** '-pc_gamg_low_memory_threshold_filter false' recovers the old filtering method. 
<div>This is the default now because there is a bug in the (new) low memory filter.<div>This bug is very rare but catastrophic.</div><div>We are working on it and will turn the new filter on by default when it's fixed.</div><div>This does not affect the semantics of the solver, just work and memory complexity.</div><div><br></div><div>*** As for tet4 vs tet10, I would guess that tet4 wants more aggressive coarsening.</div><div>The default is to do aggressive coarsening on one (1) level.</div><div>You might want more levels for tet4.</div><div>And the new MIS-k coarsening can use any k (default is 2) with '-mat_coarsen_misk_distance k' (e.g., k=3)</div><div>I have not added hooks for a more complex schedule that specifies the method on each level.</div><div><br></div></div><div>Thanks,</div><div>Mark</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Oct 17, 2023 at 9:33 PM Jeremy Theler (External) &lt;<a href="mailto:jeremy.theler-ext@ansys.com">jeremy.theler-ext@ansys.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg2815640872225250034">


<div dir="ltr">

<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

Hey Mark</div>

<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

<br>

</div>

<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

Regarding the changes in the coarsening algorithm in 3.20 with respect to 3.19, in general we see that for some problems the MIS strategy gives an overall performance which is slightly better and for some others it is slightly worse than the "baseline" from

 3.19.</div>

<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

We also saw that current main has switched back to the old square coarsening algorithm by default, which again, in some cases is better and in others is worse than 3.19 without any extra command-line option.</div>

<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

<br>

</div>

<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

Now what seems weird to us is that we have a test case which is a heat conduction problem with radiation boundary conditions (so it is nonlinear) using tet10 and we see</div>

<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

<ol>

<li style="list-style-type:"1. ""><span>that in parallel v3.20 is way worse than v3.19, although the memory usage is similar</span></li><li style="list-style-type:"2. ""><span>that petsc main (with no extra flags, just the defaults) recovers the 3.19 performance but memory usage is significantly larger</span></li></ol>

<div><br>

</div>

<div>I tried using the -pc_gamg_low_memory_threshold_filter flag and the results were the same.</div>

<div><br>

</div>

<div>Find attached the log and snes views of 3.19, 3.20 and main using 4 MPI ranks.</div>

<div>Is there any explanation about these two points we are seeing?</div>

<div>Another weird finding is that if we use tet4 instead of tet10, v3.20 is only 10% slower than the other two and main does not need more memory than the other two.<br>

</div>

<div><br>

</div>

<div>BTW, I have dozens of other log view outputs comparing 3.19, 3.20 and main should you be interested.</div>

<div><br>

</div>

<div>Let me know if it is better to move this discussion into the PETSc mailing list.<br>

</div>

<div><br>

</div>

<div>Regards,</div>

<div>jeremy theler</div>

<div><br>

</div>

</div>

<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

<br>

</div>

</div>


</div></blockquote></div>