[petsc-users] Strange GAMG performance for mixed FE formulation

Mark Adams mfadams at lbl.gov
Fri Mar 4 07:40:50 CST 2016


On Fri, Mar 4, 2016 at 2:05 AM, Justin Chang <jychang48 at gmail.com> wrote:

> Mark,
>
> Using "-pc_gamg_square_graph 10" didn't change anything. I used values of
> 1, 10, 100, and 1000 and the performance seemed unaffected.
>

Humm. Please run with -info and grep for GAMG. This will be about 20
lines, but -info is very noisy. Perhaps you can do the same thing for ML
(I'm not sure if the ML interface supports verbose output like this);
that would be useful.
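
Something like this should do it (the executable name ./myapp is just a
placeholder for however you run your problem; the fieldsplit prefix is
taken from the ksp_view output quoted further down):

  # run with -info and keep only the GAMG lines; -info alone is very noisy
  ./myapp -solver_fieldsplit_1_pc_type gamg -info 2>&1 | grep GAMG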

BTW, -pc_gamg_square_graph N is the number of levels on which to square
the graph. You will see in the verbose output that the number of
non-zeros per row in your problem starts at 9 and goes up to ~740 and
~5370, and then 207 on the coarse grid, where there are 207 columns.
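
You can sanity-check those averages straight from the nonzero counts in
the ksp_view output quoted later in this thread:

  # average nnz per row = total nonzeros / rows
  python -c "print(3416452.0/384000)"   # finest grid:     ~8.9 nnz/row
  python -c "print(38604909.0/52147)"   # GAMG level 3:    ~740 nnz/row
  python -c "print(1318400.0/65258)"    # ML's next level: ~20 nnz/row
  python -c "print(384000.0/52147)"     # GAMG level reduction: ~7.4x
                                        # (vs ~30x expected in 3D)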


>
> Changing the threshold of -pc_gamg_threshold to 0.8 did decrease
> wall-clock time but it required more iterations.
>

That is very large.  A more reasonable scan would be: 0, 0.01, 0.04, 0.08.
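
For example (same placeholder executable as above; adjust the option
prefix to match your fieldsplit setup, and note that I believe ML's
equivalent knob is spelled -pc_ml_Threshold):

  # scan GAMG drop thresholds; 0.8 throws away far too much of the graph
  for t in 0 0.01 0.04 0.08; do
    ./myapp -solver_fieldsplit_1_pc_type gamg \
            -solver_fieldsplit_1_pc_gamg_threshold $t -log_summary
  done

  # for comparison, ML with the 0.02 threshold suggested in my earlier
  # mail below
  ./myapp -solver_fieldsplit_1_pc_type ml \
          -solver_fieldsplit_1_pc_ml_Threshold 0.02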


>
> I am not really sure how to go about tinkering with GAMG or even ML.
> Do you have a manual/reference/paper/etc. that describes what's going
> on within GAMG?
>

There is a section in the manual.  It goes through some of these
troubleshooting techniques.


>
> Thanks,
> Justin
>
>
> On Thursday, March 3, 2016, Mark Adams <mfadams at lbl.gov> wrote:
>
>> You have a very sparse 3D problem, with 9 non-zeros per row.  It is
>> coarsening very slowly and creating huge coarse grids, which are
>> expensive to construct.  The superlinear speedup is most likely from
>> cache effects.  First try with:
>>
>> -pc_gamg_square_graph 10
>>
>> ML must have some AI in there to do this automatically, because GAMG
>> and ML are pretty similar algorithmically.  There is a threshold
>> parameter that is important (-pc_gamg_threshold <0.0>), and I think
>> ML has the same default.  ML is doing OK, but I would guess that if
>> you use something like 0.02 for ML's threshold you would see some
>> improvement.
>>
>> Hypre is doing pretty badly also.  I suspect that it is getting
>> confused as well.  I know less about how to deal with hypre.
>>
>> If you use -info and grep for GAMG you will see about 20 lines that
>> will tell you the number of equations on each level and the average
>> number of non-zeros per row.  In 3D the reduction per level should be
>> -- very approximately -- 30x, and the number of non-zeros per row
>> should not explode, but getting up to several hundred is OK.
>>
>> If you care to test this, we should be able to get ML and GAMG to
>> agree pretty well.  ML is a nice solver, but our core numerics should
>> be about the same.  I tested this on a 3D elasticity problem a few
>> years ago.  That said, I think your ML solve is pretty good.
>>
>> Mark
>>
>>
>>
>>
>> On Thu, Mar 3, 2016 at 4:36 AM, Lawrence Mitchell <
>> lawrence.mitchell at imperial.ac.uk> wrote:
>>
>>> On 02/03/16 22:28, Justin Chang wrote:
>>> ...
>>>
>>>
>>> >         Down solver (pre-smoother) on level 3
>>> >           KSP Object:          (solver_fieldsplit_1_mg_levels_3_)
>>> >             linear system matrix = precond matrix:
>>> ...
>>> >             Mat Object:             1 MPI processes
>>> >               type: seqaij
>>> >               rows=52147, cols=52147
>>> >               total: nonzeros=38604909, allocated nonzeros=38604909
>>> >               total number of mallocs used during MatSetValues calls =2
>>> >                 not using I-node routines
>>> >         Down solver (pre-smoother) on level 4
>>> >           KSP Object:          (solver_fieldsplit_1_mg_levels_4_)
>>> >             linear system matrix followed by preconditioner matrix:
>>> >             Mat Object:            (solver_fieldsplit_1_)
>>> ...
>>> >             Mat Object:             1 MPI processes
>>> >               type: seqaij
>>> >               rows=384000, cols=384000
>>> >               total: nonzeros=3416452, allocated nonzeros=3416452
>>>
>>> This looks pretty suspicious to me.  The original matrix on the
>>> finest level has 3.8e5 rows and ~3.4e6 nonzeros.  One level up, the
>>> coarsening produces 5.2e4 rows but ~3.9e7 nonzeros.
>>>
>>> FWIW, although Justin's PETSc is from Oct 2015, I get the same
>>> behaviour with ad5697c (master as of 1st March).
>>>
>>> If I compare with the coarse operators that ML produces on the same
>>> problem:
>>>
>>> The original matrix has, again:
>>>
>>>         Mat Object:         1 MPI processes
>>>           type: seqaij
>>>           rows=384000, cols=384000
>>>           total: nonzeros=3416452, allocated nonzeros=3416452
>>>           total number of mallocs used during MatSetValues calls=0
>>>             not using I-node routines
>>>
>>> While the next finest level has:
>>>
>>>             Mat Object:             1 MPI processes
>>>               type: seqaij
>>>               rows=65258, cols=65258
>>>               total: nonzeros=1318400, allocated nonzeros=1318400
>>>               total number of mallocs used during MatSetValues calls=0
>>>                 not using I-node routines
>>>
>>> So we have 6.5e4 rows and 1.3e6 nonzeros, which seems more plausible.
>>>
>>> Cheers,
>>>
>>> Lawrence
>>>
>>>
>>