Mark,<div><br></div><div>Using "-pc_gamg_square_graph 10" didn't change anything. I tried values of 1, 10, 100, and 1000, and the performance seemed unaffected.</div><div><br></div><div>Raising -pc_gamg_threshold to 0.8 did decrease the wall-clock time, but it required more iterations.</div><div><br></div><div>I am not really sure how to go about tuning GAMG, or even ML. Do you have a manual, reference, or paper that describes what's going on inside GAMG?</div><div><br></div><div>Thanks,</div><div>Justin<br><br>On Thursday, March 3, 2016, Mark Adams &lt;<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">You have a very sparse 3D problem, with about 9 nonzeros per row. It is coarsening very slowly and creating huge coarse grids, which are expensive to construct. The superlinear speedup is most likely from cache effects. First try with:<div><br></div><div>-pc_gamg_square_graph 10</div><div><br></div><div>ML must have some AI in there to do this automatically, because GAMG and ML are pretty similar algorithmically. There is an important threshold parameter (-pc_gamg_threshold &lt;0.0&gt;), and I think ML has the same default. ML is doing OK, but I would guess that if you used something like 0.02 for ML's threshold you would see some improvement.</div><div><br></div><div>Hypre is doing pretty badly as well. I suspect that it is getting confused too. I know less about how to deal with hypre.</div><div><br></div><div>If you use -info and grep for GAMG, you will see about 20 lines that tell you the number of equations on each level and the average number of nonzeros per row. 
In 3D, each level should reduce the number of equations by, very approximately, 30x, and the number of nonzeros per row should not explode, although getting up to several hundred is OK.</div><div><br></div><div>If you care to test this, we should be able to get ML and GAMG to agree pretty well. ML is a nice solver, but our core numerics should be about the same; I tested this on a 3D elasticity problem a few years ago. That said, I think your ML solve is pretty good.</div><div><br></div><div>Mark</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 3, 2016 at 4:36 AM, Lawrence Mitchell <span dir="ltr">&lt;<a href="mailto:lawrence.mitchell@imperial.ac.uk" target="_blank">lawrence.mitchell@imperial.ac.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 02/03/16 22:28, Justin Chang wrote:<br>

...<br>

<span><br>

<br>

>         Down solver (pre-smoother) on level 3<br>


</span>>           KSP Object:          (solver_fieldsplit_1_mg_levels_3_)<br>

<span>>             linear system matrix = precond matrix:<br>

</span>...<br>

<span>>             Mat Object:             1 MPI processes<br>
>               type: seqaij<br>
>               rows=52147, cols=52147<br>
>               total: nonzeros=38604909, allocated nonzeros=38604909<br>
>               total number of mallocs used during MatSetValues calls =2<br>
>                 not using I-node routines<br>

</span><span>>         Down solver (pre-smoother) on level 4<br>


</span>>           KSP Object:          (solver_fieldsplit_1_mg_levels_4_)<br>

<span>>             linear system matrix followed by preconditioner matrix:<br>


>             Mat Object:            (solver_fieldsplit_1_)<br>

<br>

</span>...<br>

<span>>             Mat Object:             1 MPI processes<br>
>               type: seqaij<br>
>               rows=384000, cols=384000<br>
>               total: nonzeros=3416452, allocated nonzeros=3416452<br>

<br>

<br>

</span>This looks pretty suspicious to me.  The original matrix on the finest<br>
level has 3.8e5 rows and ~3.4e6 nonzeros (about 9 per row).  On the next,<br>
coarser level, GAMG produces 5.2e4 rows but ~3.9e7 nonzeros (about 740 per row).<br>

<br>

FWIW, although Justin's PETSc is from Oct 2015, I get the same<br>

behaviour with:<br>

<br>

ad5697c (master as of 1st March).<br>

<br>

If I compare with the coarse operators that ML produces on the same<br>

problem:<br>

<br>

The original matrix has, again:<br>

<span><br>

        Mat Object:         1 MPI processes<br>

          type: seqaij<br>

          rows=384000, cols=384000<br>

          total: nonzeros=3416452, allocated nonzeros=3416452<br>

          total number of mallocs used during MatSetValues calls=0<br>

            not using I-node routines<br>

<br>

</span>While the next coarser level has:<br>

<span><br>

            Mat Object:             1 MPI processes<br>

              type: seqaij<br>

</span>              rows=65258, cols=65258<br>

              total: nonzeros=1318400, allocated nonzeros=1318400<br>

<span>              total number of mallocs used during MatSetValues calls=0<br>

                not using I-node routines<br>

<br>

</span>So ML's next level has 6.5e4 rows and 1.3e6 nonzeros (about 20 per row), which seems more plausible.<br>
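The comparison above boils down to two numbers per coarsening step: how many times fewer rows the coarse level has, and how dense its operator is. A minimal Python sketch of that arithmetic follows. It is not part of PETSc: the Level class and compare_levels helper are hypothetical names, and the row and nonzero counts are copied from the -ksp_view output quoted in this thread. The results can be read against Mark's rough 3D guidance (very approximately 30x fewer equations per level, nonzeros per row not exploding).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One multigrid level: matrix rows and total nonzeros, as printed by -ksp_view."""
    rows: int
    nnz: int

    def nnz_per_row(self) -> float:
        return self.nnz / self.rows


def compare_levels(fine: Level, coarse: Level) -> dict:
    """Summarize one coarsening step from a finer level to the next coarser one.

    reduction:   how many times fewer rows the coarse level has
    nnz_per_row: average nonzeros per row on the coarse level
    fill_growth: coarse nonzeros per row divided by fine nonzeros per row
    """
    return {
        "reduction": fine.rows / coarse.rows,
        "nnz_per_row": coarse.nnz_per_row(),
        "fill_growth": coarse.nnz_per_row() / fine.nnz_per_row(),
    }


# Row and nonzero counts copied from the -ksp_view output in this thread.
fine = Level(rows=384000, nnz=3416452)
gamg_next = Level(rows=52147, nnz=38604909)
ml_next = Level(rows=65258, nnz=1318400)

for name, coarse in [("GAMG", gamg_next), ("ML", ml_next)]:
    s = compare_levels(fine, coarse)
    print(f"{name}: {s['reduction']:.1f}x fewer rows, "
          f"{s['nnz_per_row']:.0f} nonzeros/row "
          f"({s['fill_growth']:.1f}x the fine-level density)")

# Prints:
# GAMG: 7.4x fewer rows, 740 nonzeros/row (83.2x the fine-level density)
# ML: 5.9x fewer rows, 20 nonzeros/row (2.3x the fine-level density)
```

On these numbers both first coarsening steps are well short of a 30x reduction, but only GAMG's coarse operator shows the explosive growth in nonzeros per row (roughly 83-fold versus roughly 2-fold for ML).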

<br>

Cheers,<br>

<br>

Lawrence<br>

<br>

</blockquote></div><br></div>

</blockquote></div>